SharePoint has grown to be widely acclaimed as one of the best collaboration platforms available. The latest iteration from Microsoft, SharePoint 2010, has seen the fastest adoption rate of any version, thanks to its rich feature set and agile ribbon-based user interface (UI).
SharePoint is a platform that helps organizations move from personal productivity (using the ubiquitous Microsoft Office suite) to organizational productivity. It can slide seamlessly into an organization of any size and become the central window for all common ways of sharing information, be it documents, tasks, images, or videos.
I am repeatedly inclined to call SharePoint a platform, implying that developers and development companies have a huge opportunity (and, in fact, a responsibility) to provide a complete solution to the customer by building and enhancing the features the platform offers. Only then can the technology itself be fully appreciated and consumed.
This is where the real challenge for development companies begins. Instead of trumpeting the superiority of the technology itself, they should think from the customer’s standpoint about which solutions would truly address the customer’s needs and their industry’s needs, and help them stay competitive. Some companies have been reasonably successful in creating solutions that have this depth, along with a verticalized story to share with customers in a particular industry.
SharePoint’s rich feature set across multiple areas (collaboration, portal, content management, search, eForms, workflows, and business intelligence) is both its advantage and its disadvantage. If a solution is not built around one or more of these facets, it becomes very hard to convince a customer to choose SharePoint over anything else.
The solutions could include intranet, extranet, Internet portals, document management systems, content management systems, business process automation solutions, project management solutions, search solutions, and business intelligence solutions. SharePoint can be used to design and develop multiple types of solutions, as shown in Table 5-1.
Document management (DM) solutions face the stiffest competition of the group because every organization around the globe wants to go paperless. They are all hunting for the most suitable solution: one that understands exactly what they do and helps them move from heaps of paper lying around to a sleek and efficient electronic way of managing their documents.
This chapter aims to bring to light some of the practical challenges that one would face in positioning, developing, and deploying DM solutions built on Microsoft SharePoint 2010. I want to bring together all that I have learned from being involved in a number of SharePoint implementations centered around DM for customers worldwide. For a SharePoint professional, I have tried to go into as much detail as possible; for a SharePoint implementation company, the end-to-end coverage and comprehensiveness is something that you might favor.
The chapter assumes reasonable knowledge of SharePoint 2007 or 2010, though the uninitiated may find some of these sections compelling enough to join the club.
Before you start to appreciate why SharePoint could be a good foundation to build a DM system on, you must first understand what document management is and why there’s such a buzz around it.
Document management, or more precisely electronic document management, is a solution that helps companies store, archive, and locate (search for) the documents they receive or create. Although the volume, type, and source of documents vary, companies in every industry (be it manufacturing, oil and gas, construction, automotive, consulting, high tech, IT, or consumer goods) need an efficient way to store their electronic documents.
Companies have moved from allowing users to store documents on their local disks to shared folders, but shared folders are little more than shared storage—they don’t tie into your business structure or processes. The next evolutionary step is to choose a system that is not just a dump yard for documents but one that has the required intelligence to integrate with your business and business applications—and even external organizations such as vendors and customers (see Figure 5-1).
The following questions arise when it comes to looking for a DMS solution:
And so on. Happily, for most of us, SharePoint does indeed have the features to cater to most, if not all, of these requests. Meeting a customer’s every requirement takes a combination of out-of-the-box (OOB) features and configuration/customization. For answers to the specific questions mentioned previously, see Table 5-2.
If you read the previous section, you understand by now that SharePoint is an excellent platform but not a full-fledged DM solution unless it is customized or third-party add-ons are implemented. To understand this a little better, let’s first define the terminology; see Table 5-3.
Table 5-4 lists the features that are not available OOB. I will show you ways to address this missing functionality later in this chapter.
An organization’s document management needs depend on its size, industry, budget, and IT roadmap. Beyond these common parameters, the needs often also depend on how strong and IT-savvy the organization’s IT department is. Many IT departments focus so closely on core business applications that they pay little attention to peripheral applications.
For an organization selling SharePoint-based DM solutions, the first real challenge, if the need has not originated within the customer organization itself, is to convince the IT department of the importance of such systems. In fact, business users feel the need for a DM system more than IT does, because IT doesn’t have to deal with all of the paper lying around!
The matrix in Table 5-5 might help you understand how the requirements vary based on the size of the organization.
DM needs also vary by industry. Table 5-6 summarizes the needs of the most common industries for which implementing a DM system is a priority.
Apart from the size and industry of the organization, its IT roadmap also plays an important role in deciding on a DM system. This is where I believe SharePoint fits in like nothing else: it has an uncanny ability to grow on you as you use it. You could start with just a few document libraries and grow to multiple web applications and site collections spanning thousands of users (subject, of course, to how your SharePoint farm is sized initially). Microsoft’s general recommendation for the upper limit of a content database’s size has been 200 GB. With SharePoint 2010 Service Pack 1, this limit has been raised to 4 TB, provided the storage subsystem meets Microsoft’s additional disk performance requirements. That is usually ample for an organization of any size.
Planning multiple site collections and multiple content databases is a very important design activity for organizations with huge content storage needs.
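The content database guidance above translates into simple planning arithmetic. A site collection cannot span more than one content database, so the number of databases you need also puts a lower bound on how many site collections you must plan for. The Python sketch below estimates that number. The 200 GB and 4 TB figures come from the text. The growth numbers in the usage lines and the 80% headroom factor are assumptions chosen for illustration, not Microsoft guidance.

```python
import math

GB_PER_TB = 1024

def content_databases_needed(initial_gb: float, yearly_growth_gb: float,
                             years: int, limit_gb: float = 200,
                             headroom: float = 0.8) -> int:
    """Estimate how many content databases a projected corpus needs.

    limit_gb defaults to the 200 GB general guidance; pass 4 * GB_PER_TB
    for the 4 TB SP1 figure. headroom is an assumed planning margin that
    keeps each database below its limit so it still has room to grow.
    """
    projected_gb = initial_gb + yearly_growth_gb * years
    usable_gb = limit_gb * headroom
    return max(1, math.ceil(projected_gb / usable_gb))

# Hypothetical corpus: 300 GB today, growing 250 GB per year, planned for 3 years.
print(content_databases_needed(300, 250, 3))                          # 7
print(content_databases_needed(300, 250, 3, limit_gb=4 * GB_PER_TB))  # 1
```

The gap between the two results shows why the SP1 limit matters. Keep in mind that fewer, larger databases also mean longer backup and restore windows.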
We all think about how to deliver a project, but first we need a project to deliver! As mentioned previously, SharePoint by itself has significant gaps as a DM solution.
If you come from a .NET or custom development background, you need to understand a fundamental difference. In a .NET project, every requirement from the customer is new and has to be developed from scratch. In a SharePoint project, the thinking is more along the lines of “let’s see what’s available out of the box, and then decide what needs to be customized.”
Here’s an example to understand the difference: say a customer asks for a place to upload documents with one attribute being a selection of countries. In .NET, you end up creating a table, a UI for the upload process, and classes to do the uploading. The whole exercise might take 30 days to develop. In SharePoint, it would take about 3 minutes!
Now suppose the customer requires that whenever a document is uploaded, a corresponding meeting event is created in a calendar. This requires customization in both .NET and SharePoint, though in SharePoint a no-code solution exists in the form of a simple SharePoint Designer workflow.
The pre-sales stage boils down to the following activities:
I don’t want this to sound like guidance for aspiring sales and marketing professionals, but the proposal for a SharePoint-based DM system can be successfully received only if it is NOT positioned against any other system the customer might be evaluating.
The proposal is going to succeed if it talks about SharePoint as a comprehensive collaboration platform that can grow as needs grow. At the same time, the proposal also needs to talk about how the solution will be built specifically for their industry. This will require research into the industry and interviews with users; it may also require some research on alternative solutions in the marketplace.
The proposal’s technology solution ideally should contain the following sections, apart from your company profile and executive summary:
Once the proposal is accepted, you might be asked to give a demonstration of your solution, and you may have to install certain third-party solutions for it. Simple things like changing the logo and doing a bit of branding will certainly create an excellent impression on the customer. The demo obviously may not have the full functionality, but that doesn’t matter. Make sure it looks attractive.
It’s also important to set expectations with the customer. Any ambiguities in the proposal or the requirements should be clarified at this point, because lingering open issues can lead to big problems during implementation.
It is very important to assemble the right team as you move towards implementation. The team ideally should comprise the people shown in Figure 5-2 and discussed in Table 5-7.
Once the solution is implemented, customer support starts. There are typically two types of clients. Some companies have their own IT department with specific skills in SharePoint administration and development. Others have their own IT department but want it to handle only content-authoring activities, with the actual technical support coming from the vendor who implemented the application.
In the former case, the customer will make fewer support calls. In the latter, the number of support calls may be very high, at least during the initial post-launch period. Another approach is to place a resident support engineer at the company, which also gives the vendor recurring revenue.
In either case, it will be important to set up an online help desk where the support cases can be logged by the customer. SharePoint itself can be used for this. There is a WSS 3.0 HelpDesk Template, which is just a structure, but in SharePoint 2010, a help desk can be developed in a much better fashion using InfoPath-based Form libraries.
When you implement a SharePoint 2010 project, and a DM project in particular, the architect will be expected to make some very important architectural choices. These can be categorized as follows:
This is probably one of the most critical architectural decisions and it must be made at the beginning of a project (in fact, even in the proposal stage when the recommended topology has to be specified). The questions that need to be answered are as follows:
As hardware capacities continue to soar, machines that would once have been considered supercomputers now sit as servers on your own network. At the same time, the resource requirements of software applications keep rising. With this in mind, it is important to answer the following questions:
Before I attempt to answer these questions, note that the approach should differ depending on the size of the company; see Table 5-8. For a small- to medium-sized company, the users typically will be
For a large company, the users will be
These are the pros and cons of a single-instance setup.
These are the multi-instance pros and cons.
If you have a multi-instance SharePoint farm, or intend to keep a primary farm and a secondary farm for disaster recovery (DR), you need to keep the farms synchronized. Traditionally, this was done with third-party tools, SQL replication of content databases, or SharePoint backup/restore, depending on how “soon” you wanted the other instances to be synchronized.
The Content Deployment feature is best used for replication from staging to production environments. Keep in mind that customizations deployed as WSPs always need to be redeployed across these environments. Content Deployment takes care of changes in libraries and lists. Third-party solutions such as AvePoint’s DocAve and products from Syntergy and Infonic also provide excellent replication functionality.
Table 5-9 is a checklist you can use to verify if a particular solution can really fit your requirements.
If you have a multi-instance SharePoint farm, or intend to keep a primary farm and a secondary DR farm in a different location, the choice of tool or methodology depends largely on the type of connectivity and the bandwidth available between these locations. You can’t use the same method for a company with a 1 Mbps link as for one with a 10 Mbps link. TechNet has a good article on this at http://technet.microsoft.com/en-us/library/cc263099(office.12).aspx.
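Some back-of-the-envelope arithmetic shows why connection speed dominates the choice of synchronization method. The sketch below estimates how long it takes just to ship one day’s changes between sites. The 5 GB daily delta and the 70% usable-link efficiency are assumptions for illustration. Real throughput also depends on latency, protocol overhead, and any WAN optimization in place.

```python
def transfer_hours(delta_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to ship delta_gb over a link of link_mbps megabits per second.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually available for replication traffic.
    """
    bits = delta_gb * 1000**3 * 8                 # decimal gigabytes -> bits
    usable_bps = link_mbps * 1000**2 * efficiency  # usable bits per second
    return bits / usable_bps / 3600

# Hypothetical 5 GB of changes per day over the two link speeds mentioned above.
for mbps in (1, 10):
    print(f"{mbps:>2} Mbps: {transfer_hours(5, mbps):.1f} h")
```

Under these assumptions, the 1 Mbps link needs roughly 16 hours per day of changes, which rules out near-real-time replication. The 10 Mbps link needs about 1.6 hours.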
Popular WAN optimization solutions come from F5, Riverbed, Packeteer, Citrix, and Cisco, with Riverbed and F5 being the most widely used. F5 also offers a hardware-based load-balancing solution.
Table 5-10 shows capacity numbers drawn from my own experience of implementing SharePoint-based DM systems. Some customers may demand high availability with clustering even for a smaller number of users, so variations on these numbers are possible.
SharePoint 2010 farms can typically grow to many more servers than what you have seen in the previous section. Servers can be added to either role, web or application.
The farm in Figure 5-3 puts a strong focus on search, giving the search databases and search roles their own processing power. The environment in Figure 5-4 provides separate processing power for search as well as for the SharePoint content databases.
Once you have decided on the topology, the other critical aspect is to plan the information architecture design. This relates to the way you will create your Web Applications, site collections, sites, keywords, and search. Table 5-11 might help in choosing the right direction.
As you know, SharePoint stores all its content in SQL Server content databases. When a database grows beyond a certain size (Microsoft indicates around 200 GB), the performance of the SharePoint sites may deteriorate. In this scenario, an architect has the option of storing SharePoint’s content in remote storage outside the SQL content database. This used to require third-party solutions because Microsoft offered no native support or tools. With SharePoint 2010 and SQL Server 2008, RBS is available to architects without the need for any other solution.
The following set of questions and answers might help you understand these terms better.
Q: What is RBS?
RBS (Remote Blob Store) is a set of standardized APIs that allow storage/retrieval of BLOBs outside of your main SQL database where a dedicated BLOB store is desirable for various reasons. This uses a provider model for plugging in any dedicated BLOB store that implements these RBS APIs.
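A provider model means the database keeps only metadata and a BLOB identifier, and any store that implements the agreed interface can hold the bytes. The Python sketch below illustrates that idea only. All class and method names (BlobStoreProvider, store, fetch, InMemoryProvider, ContentDatabase) are hypothetical and do not reproduce the actual RBS APIs.

```python
import hashlib
from abc import ABC, abstractmethod

class BlobStoreProvider(ABC):
    """Hypothetical provider interface: every BLOB store implements the same calls."""

    @abstractmethod
    def store(self, data: bytes) -> str:
        """Persist data and return an id the database can keep as a reference."""

    @abstractmethod
    def fetch(self, blob_id: str) -> bytes:
        """Return the bytes previously stored under blob_id."""

class InMemoryProvider(BlobStoreProvider):
    """Stand-in for a dedicated BLOB store (for example, a FILESTREAM-backed database)."""

    def __init__(self) -> None:
        self._blobs = {}

    def store(self, data: bytes) -> str:
        # Content-derived id (an assumption of this sketch, not an RBS rule).
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs[blob_id] = data
        return blob_id

    def fetch(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

class ContentDatabase:
    """Keeps only document metadata plus a BLOB id; the bytes live in the provider."""

    def __init__(self, provider: BlobStoreProvider) -> None:
        self.provider = provider
        self.rows = {}

    def save_document(self, name: str, data: bytes) -> None:
        self.rows[name] = self.provider.store(data)

    def open_document(self, name: str) -> bytes:
        return self.provider.fetch(self.rows[name])

db = ContentDatabase(InMemoryProvider())
db.save_document("Contract.docx", b"...binary content...")
print(db.open_document("Contract.docx") == b"...binary content...")  # True
```

To swap storage technologies, you replace only the provider object. The ContentDatabase code does not change, and that is the point of a provider model.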
Q: Which version of SQL Server can I use for SharePoint RBS?
SQL Server 2008 and SQL Server 2008 R2 both support RBS. An RBS library must be downloaded and installed on SQL Server to enable the feature. All SQL Server editions (Express, Standard, and Enterprise) support RBS, although licensing requirements may apply depending on the scenario.
Q: What is FILESTREAM?
FILESTREAM is a SQL Server 2008 feature for storing BLOB content on the file system.
FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used, so that memory remains available for query processing. The SQL FILESTREAM feature does not allow you to store content on anything other than local storage; SMB shares can’t be used to store BLOB content.
Q: What is RBS FILESTREAM Provider?
RBS FILESTREAM Provider is a free OOB provider shipped by the Microsoft SQL RBS team that allows a deployment to use a SQL database (local or remote) as a dedicated BLOB store. This provider utilizes the FILESTREAM as the BLOB storage mechanism and ties the two technologies together.
Q: Is there any benefit in using RBS with SharePoint?
By using RBS with SharePoint, the customer may be able to leverage cheaper storage, improve performance, and enable better integration with third-party technology for their SharePoint databases. But be careful: the benefit varies from case to case. You need to investigate your scenarios to see whether RBS really fits.
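As a first pass at that investigation, you can turn the two thresholds mentioned in this chapter into a quick screening check. The first is a content database approaching about 200 GB. The second is documents larger than 256 KB. The sketch below is a simplification under those assumptions only. It ignores storage cost, backup tooling, and provider support, and you still have to evaluate all of those.

```python
def rbs_worth_evaluating(db_size_gb: float, avg_document_kb: float,
                         db_threshold_gb: float = 200,
                         doc_threshold_kb: float = 256) -> bool:
    """Screen whether RBS deserves a closer look for one content database.

    Both thresholds come from the guidance quoted in this chapter. Requiring
    both conditions is a conservative choice of this sketch: a large database
    made up mostly of small files is a candidate for splitting into more
    content databases rather than for RBS.
    """
    large_database = db_size_gb >= db_threshold_gb
    large_documents = avg_document_kb > doc_threshold_kb
    return large_database and large_documents

# Hypothetical databases: (size in GB, average document size in KB).
print(rbs_worth_evaluating(350, 900))  # True: large database, large documents
print(rbs_worth_evaluating(350, 80))   # False: mostly small files
print(rbs_worth_evaluating(120, 900))  # False: database still well under 200 GB
```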
Q: How does backup and restore get affected when using RBS?
If you use the local FILESTREAM provider with RBS, you can use the built-in SharePoint tools to back up and restore. These operations back up and restore both the metadata and the BLOB store. If you use a remote RBS provider, you must carefully coordinate the backup and restore processes, because they involve both the metadata and the BLOB store. Take this into account when planning the RBS configuration. Not all RBS providers support backup and restore of BLOB data, so check with the provider to confirm support.
You can also use the Microsoft System Center Data Protection Manager to back up and restore the RBS environment.
Q: Can RBS FILESTREAM Provider support SMB shares to store the content, such as a NAS device?
No. The SQL FILESTREAM feature doesn’t allow you to store content on anything other than local storage, so the RBS FILESTREAM Provider has the same limitation. Third-party RBS providers don’t have this limitation if they do not rely on the SQL FILESTREAM feature.
Microsoft is heavily promoting FAST Search Server 2010 for SharePoint as its enterprise search solution, but is it really worth it? What additional features does it offer? Table 5-12 answers these questions. It also lists some possible search requirements that neither SharePoint nor FAST addresses.
Office Web Apps 2010 integrates seamlessly with SharePoint 2010 and lets you view and edit Microsoft Office documents (Word, Excel, and PowerPoint) within the browser itself. Office Web Apps (sometimes abbreviated OWA and mistaken for Outlook Web Access) should definitely be considered, because users generally find it easy (and fast) to view documents in the browser.
Note that the Excel Web Access (EWA) Web Part, which is part of Excel Services in SharePoint 2010 Enterprise, lets you view (but not edit) Excel files in the browser even without Office Web Apps. EWA also lets users interact with parameters (which can be linked to Excel formulas) and PivotTables.
In the analysis phase, requirements gathering is done with business users. It can be done with a requirements-gathering template with certain questions. Though these questions can mostly be answered by business users such as department heads, the internal IT team may also help in providing certain required information.
When performing the requirements gathering, it is important to have the right team on both sides. Figure 5-5 shows a good formula. Each member is expected to provide specific input that helps the team understand the requirements better and propose the most appropriate solution.
A requirements-gathering questionnaire is presented in Table 5-13.
You could add more questions to suit your client’s requirements. The features described in the following section might also help you add or modify questions. It is also important to understand the different types of users and their expectations of the document management system. Table 5-14 summarizes these in a matrix.
Different levels of users in a company will have different expectations of the DM system. Most, if not all, of the expectations need to be catered to in order for your document management implementation to be called successful.
Table 5-15 outlines the ways users currently store documents and information, and the better options SharePoint offers.
Once the requirements have been gathered and understood, you can proceed with the actual implementation. The implementation may involve creating the actual structure required by the client. During implementation, apart from the usual requirements, you may also encounter requirements that need to be customized. There are other aspects that need to be considered as well. This section covers them.
Table 5-16 covers some OOB features and some customization approaches to functionality that are either overlooked or lack a straightforward manner of implementation.
One of the most frequently requested capabilities in SharePoint document libraries is the ability to disable or enable certain fields in the Edit form (where the document library’s metadata is captured) once the data has been captured. For instance, you might have the user enter a reference number for a document in the library. The rest of the document can still be edited, but that field must be locked against changes. You may also want to change a field’s value based on what is entered in other fields. This can be done using SPUtility.js and Prototype.js; SPUtility.js is available from CodePlex at http://sputility.codeplex.com/.
SPUtility.js is a JavaScript library used to make modifications to SharePoint’s list forms (NewForm.aspx and EditForm.aspx in a survey, custom list, or library). This library depends on Prototype.js, a JavaScript framework (www.prototypejs.org/). SPUtility.js has been tested in SharePoint 2007 with WSS 3.0 and MOSS. Although it was primarily written and tested for 2007, it works well with SharePoint 2010, and I have used it in a few of my projects.
PreSaveAction is a JavaScript function that SharePoint calls when Save is clicked in a New or Edit form, so it is the place to perform these validations. If it returns false, the save is cancelled. To use it, insert a Content Editor Web Part into the Edit or New form and add your script section to it using the HTML source editor.
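You can state the locking rule independently of the JavaScript that enforces it in the browser. The Python sketch below models the logic a PreSaveAction function would implement. A save is allowed (True) only if no locked field that already had a value has changed. A second helper derives one field from another. The field names (ReferenceNumber, Country, Region) and the country-to-region mapping are hypothetical. This is a model of the rule, not SPUtility.js or SharePoint code.

```python
from typing import Dict, Iterable

# Hypothetical lookup used to derive one field's value from another.
REGION_BY_COUNTRY = {"France": "Europe", "Japan": "Asia", "Qatar": "Middle East"}

def pre_save_action(original: Dict[str, str], submitted: Dict[str, str],
                    locked_fields: Iterable[str]) -> bool:
    """Mirror PreSaveAction's contract: True allows the save, False cancels it.

    A locked field may be filled in the first time (empty original value)
    but may not be changed once it has been captured.
    """
    for field in locked_fields:
        before = original.get(field, "")
        if before and submitted.get(field, "") != before:
            return False
    return True

def derive_fields(submitted: Dict[str, str]) -> Dict[str, str]:
    """Set Region from Country, as an example of a dependent field."""
    result = dict(submitted)
    result["Region"] = REGION_BY_COUNTRY.get(result.get("Country", ""), "")
    return result

item = {"ReferenceNumber": "DOC-0042", "Country": "France"}
edit = {"ReferenceNumber": "DOC-9999", "Country": "Japan"}
print(pre_save_action(item, edit, ["ReferenceNumber"]))  # False: locked field changed
print(derive_fields({"Country": "Japan"})["Region"])     # Asia
```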
Document libraries, as the central containers for storing documents, can get quite large. Although a document library can hold as many as 30 million documents, you will notice its performance slowing as it crosses the 5,000 or 1,000 item threshold. That is why even a single view’s threshold is capped at 5,000 items. Consider RBS for documents larger than 256 KB.
If you have started working with SharePoint 2010, Document Sets will not have escaped your attention. They go well beyond what a folder has to offer. Leverage Document Sets as much as possible whenever you need to store a set of related files together as a single work unit. The benefits include:
SharePoint Workspace 2010 allows you to take documents offline. Be mindful of its limitations: for example, Workspace does not work with page libraries, and there is a limit of about 5,000 items that can be taken offline. SharePoint Workspace 2010 is part of Office Professional Plus 2010. Figure 5-6 shows a workspace with documents in a library.
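Partitioning a large library into folders or Document Sets helps because a view scoped to one container only needs to return that container’s items. The sketch below computes the minimum number of containers needed so that none exceeds 5,000 items. It then assigns documents round-robin by a numeric ID. Round-robin placement is a hypothetical scheme for illustration. In practice you would partition by a business attribute such as year, department, or project so that navigation stays meaningful.

```python
import math

LIST_VIEW_THRESHOLD = 5000  # default list view threshold in SharePoint 2010

def containers_needed(total_items: int, per_container: int = LIST_VIEW_THRESHOLD) -> int:
    """Minimum number of folders/Document Sets so no container exceeds per_container items."""
    return max(1, math.ceil(total_items / per_container))

def container_for(item_id: int, containers: int) -> int:
    """Hypothetical round-robin placement: spread items evenly by numeric ID."""
    return item_id % containers

# A library at the 30 million document boundary mentioned above.
print(containers_needed(30_000_000))  # 6000
```

With IDs numbered consecutively from zero, round-robin placement keeps every container at or below the threshold, because the total never exceeds containers × 5,000.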
In SharePoint 2007, multilingual sites were limited to Publishing templates with variations and variation labels. Many SharePoint 2007 professionals ignore this capability completely because it is perceived to be cumbersome to set up and maintain. It is also not available for simple team sites.
SharePoint 2010 brings multilingual capabilities to the commonly used team sites themselves. Setting them up is easy and straightforward; one of my blog posts has steps with screenshots (http://karthickmicrosoft.blogspot.com/2010/08/arabic-language-on-sharepoint-2010.html).
There is a difference between customers wanting to store content of different languages and the UI itself being available in different languages. Don’t assume that you have to create a web site or a site collection in multiple languages just because a document library contains documents in multiple languages. For example, the document library for an English site collection can contain documents written in French and Japanese. For publishing sites, content can be created in any language.
When you are planning multilingual sites, you should also consider what locales are necessary to support your sites. A locale is a regional setting that specifies the way numbers, dates, and times are displayed on the site. However, the locale doesn’t change the language in which the site is displayed. For example, selecting the Thai locale changes the default sort order of list items and uses the Buddhist calendar instead of the default calendar. The locale is a setting that is configured independently of the language specified when a site is created, but unlike the language, the locale can be changed at any time.
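To make the language/locale distinction concrete, the sketch below renders the same stored date under two display conventions. It hand-codes the one rule it needs: the Thai Buddhist Era year is the Gregorian year plus 543. The locale strings and date formats are hypothetical choices for illustration. This is not how SharePoint implements regional settings. It only shows that the locale changes how data is presented, not the data itself.

```python
from datetime import date

BUDDHIST_ERA_OFFSET = 543  # Thai Buddhist Era year = Gregorian year + 543

def display_date(d: date, locale: str) -> str:
    """Format a stored date for display; the stored value itself never changes."""
    if locale == "th-TH":
        # Day/month order with the Buddhist Era year (format chosen for illustration).
        return f"{d.day}/{d.month}/{d.year + BUDDHIST_ERA_OFFSET}"
    # Default: US-style month/day/year.
    return f"{d.month}/{d.day}/{d.year}"

stored = date(2010, 8, 15)
print(display_date(stored, "en-US"))  # 8/15/2010
print(display_date(stored, "th-TH"))  # 15/8/2553
```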
It is important to understand the limitations of SharePoint’s multilingual capabilities. A few elements of the SharePoint UI do not support MUI (Multilingual User Interface):
Top executives look for ways to access documents on the move and make approvals from their mobile devices. SharePoint has good OOB support for Windows Mobile devices and provides mobile views for all pages. Beyond that, other mobile platforms need to be considered, and several interesting add-ons could help. Table 5-17 has a summary.
Scanning of documents into SharePoint document libraries can either be done manually by uploading the scanned files or by choosing an add-on that can invoke the scanner interface, perform the scanning, and get the generated file (usually a PDF) into the document library. In the process, OCR can also be applied, thus providing a content-searchable file.
Websio has developed some pretty interesting add-ons. The scanner/OCR add-on achieves the aforementioned sequence of actions. A number of screenshots are available at http://www.websio.com/product.aspx?ID=103
If you are working in the Middle East, you will need Arabic OCR solutions; the leaders in that field are Sakhr (http://international.sakhr.com/index.html) and Verus (from Novodynamics, www.novodynamics.com/verus_pro.htm).
Once development is done, the following phases have to be carried out:
In this chapter, I used my experience to provide approaches, guidelines, and practices that you can adopt while designing and implementing a SharePoint 2010-based DM system. For a quick reference, the following is a list of the best practices you should follow:
All the best for your SharePoint 2010 DM Implementation!