Architecting an IBM FileNet P8 solution
In this chapter, the features of the IBM FileNet P8 reference architecture are applied to a solution illustrating the process of FileNet solution design. The process consists of breaking down a business problem into a set of requirements and fulfilling these requirements using IBM FileNet P8 components and services. The solution is finalized by aligning elements of the solution to non-functional requirements.
This chapter covers the following topics:
 
Disclaimer: The scenarios that we describe in this chapter are fictitious. We provide them here for reference purposes only.
10.1 Basic approach
The basic approach for designing an IBM FileNet P8 solution architecture is to break up the business needs into functional requirements and identify which modules fulfill those requirements. The modules have requirements of their own, such as which operating systems or application servers are supported. These are non-functional requirements.
The following solutions are intended only as examples and must be considered a general methodology for developing a solution. Real-world projects involve many factors that require a detailed analysis before a solution can be developed.
For architectural considerations, functional requirements must be fulfilled. Besides the capabilities needed by users to fulfill their job, there are other specifications to be met. Non-functional requirements describe the quality of the future solution and constraints that cannot be changed within the scope and lifetime of the project. When designing a solution based on a dedicated specification, some general questions must be answered:
What are the functional business requirements?
Which product modules fulfill the business requirements?
Are there any leading architectural mandates in addition to the functional requirements?
 – Legal and IT Governance requirements.
 – Fulfillment of requirements based on corporate strategic directions.
 – Is this environment intended for production or development use?
What are the availability requirements?
What are the performance requirements with respect to response times and throughput?
How many environments have to be built?
How are project costs funded? Is there an emphasis on keeping the project budget low, or are costs based on total cost of ownership (TCO)?
10.1.1 Module selection
Architectural decisions hinge on the selection of the product modules corresponding to functional requirements. Each module has its own dependencies, independent of the other modules, so the selection of modules influences the architectural requirements. The following sections describe a fictitious set of requirements and how they align with product capabilities.
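As an illustration of this mapping step, the following minimal sketch pairs a few functional requirements from the customer services scenario later in this chapter with the modules that address them, and then collects the dependencies that the selected modules bring along. The requirement labels and the dependency entries are placeholder assumptions for illustration only; they are not statements of actual product support.

```python
# Illustrative only: the requirement-to-module pairs follow the customer services
# scenario in this chapter; the dependency notes are hypothetical placeholders.
REQUIREMENT_TO_MODULE = {
    "consistent complaint handling": "IBM FileNet Business Process Manager",
    "SLA-driven work prioritization": "IBM FileNet P8 Advanced Case Management",
    "online issue reporting": "Electronic Forms (eForms)",
    "bulk scanning of letters": "IBM Datacap",
}

MODULE_DEPENDENCIES = {
    "IBM FileNet Business Process Manager": {"application server", "database"},
    "IBM FileNet P8 Advanced Case Management": {"application server", "database"},
    "Electronic Forms (eForms)": {"application server", "web browser"},
    "IBM Datacap": {"scanner hardware"},
}


def select_modules(requirements):
    """Return the modules that cover the requirements, plus any unmatched requirements."""
    modules, unmatched = set(), []
    for requirement in requirements:
        module = REQUIREMENT_TO_MODULE.get(requirement)
        if module is None:
            unmatched.append(requirement)
        else:
            modules.add(module)
    return modules, unmatched


def collect_dependencies(modules):
    """Union of the (hypothetical) dependencies of all selected modules."""
    dependencies = set()
    for module in modules:
        dependencies |= MODULE_DEPENDENCIES.get(module, set())
    return dependencies
```

Requirements that remain unmatched signal either a missing module or a requirement that needs further analysis before the architecture can be fixed.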
10.1.2 Leading architectural requirements
After having selected adequate modules, it is necessary to consider requirements that are not necessarily business requirements. These leading requirements generally fall into one or more of the following three groups:
Legal and IT Governance
All organizations are subject to local laws and most organizations are subject to some degree of IT governance. Sarbanes-Oxley (SOX), for example, requires American companies who publicly offer stock to declare controls ensuring executive accountability. These controls typically refer to security of information relating to the reporting of earnings. Another example is the Health Insurance Portability and Accountability Act (HIPAA), which contains language that is intended to protect the privacy of an individual’s health records. Additional requirements might be mandated by design standards, such as the Standards and Architectures for eGovernment Applications (SAGA) or service delivery frameworks such as Information Technology Infrastructure Library (ITIL). The fulfillment of these requirements can demand architectural considerations over and above the business requirements.
Strategic directions
Corporate strategic directions are general guidelines concerning operating systems, application servers, databases, virtualization, and many other technical matters. They do not respond to specific requirements but instead demand standardization across environments. Conflicts with functional requirements must be identified at an early stage of the architectural design because exceptions might introduce further dependencies.
Requirements based on the type of environment used
An IBM FileNet P8 solution architecture usually includes more than one environment to ensure that there is adequate infrastructure available to run the application in production while other work is kept out of that environment. Additional environments might be needed for development or quality assurance purposes and might have needs that are different from the production environment. Specifications for surrogate environments must be developed while maintaining as much similarity to the production environment as is practical. Table 10-1 on page 336 shows examples of various environments and their usual specifications. The specifications are exemplary; at least three dedicated and independent environments are recommended: development, test, and production.
Table 10-1 Sample environments and their typical specification and characteristics
Environment
Usual specification
Characteristic
Production environment
Functional and non-functional requirements must be fully met
Software and configuration changes must be validated in lower environments
Highest performance and availability requirements
Virtualization where applicable
System or integration test environment
Product modules are equal to the production environment
Physical implementation must be comparable with the production environment
Can usually combine with other test environments
Highly virtualized as long as performance requirements do not prevent virtualization
Performance requirements are medium high (integration test requirements lead the performance requirements)
Disaster and recovery test environment
Product modules are equal to the production environment
Physical implementation must be comparable or equal with the production environment
Can usually combine with the environment for performance tests
The use cases of the disaster and recovery tests determine the characteristics of the environment
If the focus is on functional test cases, virtualization is a valid option for the environment
If the test cases must demonstrate recovery times or other proofs with technical dependencies, the hardware and virtualized systems must be designed to be equal or comparable to the production environment
Performance test environment
Product modules are equal to the production environment
Physical implementation must be identical to the production environment, or at least similar enough that test results can be extrapolated to the production environment
Can usually combine with the environment for disaster and recovery tests
Performance requirements are identical to the production environment (or at least comparable)
Virtualization equals the production environment
Education environment
Product modules are equal to the production environment
Small number of servers
Some functionality might be stubbed
Highly virtualized
Performance requirements follow the number of concurrent training participants
Clustering must be implemented if used in the production environment (not required if higher environments below the production environment are available)
Development environment
Product modules are equal to the production environment except, where appropriate, modules for high availability
Small number of servers
Some functionality might be stubbed
Collocation of modules might differ from production environment
Highly virtualized
Lowest performance requirements
Clustering must be implemented for development purposes if used in the production environment (not required if higher environments below the production environment are available)
10.1.3 Availability requirements
The purpose of collecting availability requirements is to make investments in strategies that minimize the operational loss should the system be unavailable for business use. Requirements concerning the availability of a solution might follow a grading system that categorizes the level of importance the application represents to the business, or might be based on a more direct weighing of risk against the cost of a countermeasure.
Availability is not one particular technology but rather a collection of strategies. Each successive countermeasure provides additional availability and is incrementally more costly to implement than the previous one. The most basic form of availability protection is a system backup, where application binaries and business data are copied to a system other than that used for production. This is followed by system hardening, in which multiple redundant components are employed to reduce the number of single points of failure. High availability builds upon the concept by having multiple redundant servers take over for failed servers. Lastly, disaster recovery protects against the loss of the data center by duplicating an entire solution at an alternate geographic location.
The availability of an IBM FileNet P8 solution depends on the availability of the communication paths that fulfill the business requirements. While sometimes this represents the whole solution, in practice some components are less critical than others. For example, an organization that only rarely uses Content Search Services might not weigh that component as heavily as the Content Engine. The choice of countermeasure must weigh the implementation, maintenance, and follow-up costs against the risk to the business of unrealized revenue and lost worker productivity.
The availability of the solution is determined by its weakest link. Each module might have separate availability requirements because of the design of the software or because of the business requirement it fulfills. The proper strategy can be derived only by weighing the risk, the cost of the risk, and the cost of the countermeasure. Availability requirements are usually defined by service level agreements (SLAs).
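The weakest-link observation can be made concrete with a small calculation. The following sketch assumes that the components on a communication path fail independently, so that the availability of the path is the product of the component availabilities and can never exceed that of the least available component. The component names and percentages are hypothetical values chosen for illustration, not measurements or product figures.

```python
from math import prod


def path_availability(components):
    """Availability of a serial path, assuming the components fail independently."""
    return prod(components.values())


def redundant_availability(single, copies):
    """Availability of identical redundant instances when any one of them suffices."""
    return 1 - (1 - single) ** copies


# Hypothetical availabilities of the components on one communication path.
path = {
    "web application": 0.999,
    "Content Engine": 0.999,
    "database": 0.995,
}

overall = path_availability(path)   # slightly lower than the weakest component
weakest = min(path.values())        # upper bound for the whole path
```

Under the same independence assumption, `redundant_availability` shows why adding a second server to the weakest component is usually the first high availability investment to evaluate.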
Further information about availability is provided by the following IBM Redbooks publications:
IBM High Availability Solution for IBM FileNet P8 Systems, SG24-7700
Disaster Recovery and Backup Solutions for IBM FileNet P8, SG24-7744
10.1.4 Performance requirements
A system specification or an SLA might also define performance requirements. Performance specifications define minimum requirements for response time and throughput.
The core IBM FileNet components (Workplace/Workplace XT, Content Engine, Process Engine) support horizontal scalability, where multiple systems providing the same service work in tandem to minimize response times and maximize throughput. See also 9.3.1, “Horizontal scaling: Scale out” on page 280.
Solution performance is limited by the lowest performing component along a particular communication path. All computer systems are implemented as layers. For example, the performance of checking in a document, a function of Content Engine, is not based on the performance of the Content Engine alone. The system that implements the Content Engine API, the network between the client and the Content Engine server, the Content Engine software, the application server software, the implementation of the Java Virtual Machine, the network between the Content Engine server and the NAS, the network between the database and the Content Engine, the database software, the operating system on which the database software runs, the operating system of the NAS, and the storage area network between the NAS and disk all play a role in the performance of what, at the surface, seems to be a simple operation. Performance management, evaluation, and optimization for the purposes of requirements gathering must consider every component through which the data flows.
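A minimal sketch of this reasoning follows. It assumes a simple model in which the response time of an operation is the sum of the time spent in each layer, and the sustainable throughput is capped by the slowest layer. The layer names are a shortened version of the check-in example; all numbers are hypothetical.

```python
# Hypothetical figures per layer for a document check-in:
# (layer name, latency in milliseconds, maximum operations per second)
CHECK_IN_PATH = [
    ("client API", 5, 500),
    ("client-to-server network", 10, 400),
    ("Content Engine", 40, 120),
    ("database", 25, 200),
    ("storage", 30, 150),
]


def response_time_ms(path):
    """End-to-end response time: every layer adds its own latency."""
    return sum(latency for _, latency, _ in path)


def bottleneck(path):
    """The layer with the lowest throughput limits the whole path."""
    return min(path, key=lambda layer: layer[2])
```

In this model, tuning any layer other than the one returned by `bottleneck` leaves the achievable throughput unchanged, which is why every layer must be measured before optimization starts.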
10.1.5 Number of environments
Besides the purpose of each environment, as described previously, the number of environments to be implemented influences the architecture of each individual environment. Dependencies originate from shared resources and from limitations imposed by third-party systems (for example, when the number of FileNet environments is greater than the number of corresponding external application environments).
10.1.6 Total cost of ownership
Functional requirements and cost directives have an important impact on the architecture. The following competing conditions must be balanced:
SLAs have to be effectively met.
The risks of hardware failures and the resulting system standstills, periods of unavailability, and derived downtime costs (labor time, sales shortfalls) must be considered.
For systems that are to be expanded, the capacities and the phases of their physical implementation must be decided.
For any question, there is usually more than one answer, each with different costs.
At any time, additional costs must be evaluated against the additional value of the related solution. The maintenance costs of the system and the dedicated implementation costs must be weighed against each other.
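One way to compare options is to put the cost of each countermeasure next to the downtime cost that remains with it. The following sketch assumes a simple yearly cost model; the option names, costs, and downtime hours are invented for illustration and are not derived from any real project.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    implementation_cost: float       # one-time cost
    yearly_maintenance: float        # recurring cost per year
    expected_downtime_hours: float   # expected unavailability per year


def total_cost(option, years, downtime_cost_per_hour):
    """Implementation, maintenance, and expected downtime cost over the period."""
    return (
        option.implementation_cost
        + years * option.yearly_maintenance
        + years * option.expected_downtime_hours * downtime_cost_per_hour
    )


def cheapest(options, years, downtime_cost_per_hour):
    """Option with the lowest total cost for the given downtime cost."""
    return min(options, key=lambda o: total_cost(o, years, downtime_cost_per_hour))


# Hypothetical options for the same solution.
OPTIONS = [
    Option("backup only", 10_000, 2_000, 40),
    Option("high availability cluster", 60_000, 8_000, 4),
]
```

With these invented figures, the cluster is the cheaper option over five years when an hour of downtime costs 1,000, whereas backup alone is cheaper when an hour costs 100. The preferred architecture therefore depends on how the business values downtime.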
10.2 Solution overview
To help you architect an IBM FileNet P8 solution, we provide solution templates that show how some IBM FileNet P8 implementations are deployed. Because an IBM FileNet P8 implementation can grow over time, we present the solutions in this chapter in an order that reflects this growth, with the architecture diagrams changing from one solution to the next.
We provide the following solution templates:
Customer Services Support
A single-site implementation with low volumes that uses content, process, and case management. No high availability is involved.
Enterprise-wide Document Management
A worldwide distributed system for all documents in the customer facing and Human Resources departments. Heavy business process and content management use. Multiple offices distributed across the world’s regions. This implementation uses scanning, electronic forms, and productivity suites that are integrated with the IBM FileNet P8 Platform to provide content creation.
Each solution template represents various areas to which the IBM FileNet P8 architecture can be applied, with differing sizes, user interfaces, and integration points, which allows us to discuss the specific points of business value and explain architectural decisions better in a practical context. The solution templates discussed are based on real-life, live IBM FileNet P8 implementations.
For each solution template, we present the type of deployment, either small or large, based on our previous customer experiences. We add a mock business scenario around it to provide explanation as to why we chose to make certain solution and architectural decisions in the design. We also list the particular business problems to be solved and how features of the IBM FileNet P8 Platform solve the problems and provide business value over and above what the customer originally envisioned.
Each solution template consists of multiple sections. The solution overview section provides high-level information that is designed to address the main business problems and identify products to be included and platform features used. As an extension of this, the future enhancements section goes beyond what is needed as part of the core solution to explore additional enhancements that can be made to drive additional value from an IBM FileNet P8 Platform implementation.
The bulk of the information is included in the solution architecture. The architecture section addresses customer and solution-specific issues that require the IBM FileNet P8 Platform to be designed in a particular way. These issues can be constraints, such as multiple sites, existing corporate software, IBM FileNet P8 products used, and sizing.
In addition to the purely IBM FileNet P8 architecture information, we also included, where applicable, information about existing industry process and data models. IBM FileNet P8 has direct mappings into the more content-centric processes and information stores of many of these models. This information provides a greater background to explain how an IBM FileNet P8 solution can fit into an organization’s IT infrastructure.
10.3 Solution template: Customer services support
This is a single-site implementation without a high-availability setup. It has low volumes of small documents (letters and eForms) arriving, each of which launches a management process. The solution requires content, process, and case-management technology. Processes are relatively long lived, lasting up to approximately 30 days.
10.3.1 Scenario
A water and gas utilities company is finding it hard to fix broken pipes and respond to customer requests. This difficulty is partially due to the number of customer requests, the inconsistent process being applied by various staff of different levels of experience, and inefficient paper-based processes.
10.3.2 Business problems and their solutions
Based on the business scenario, the company has the following business problems:
Inconsistent processing
Inconsistent processes lead to missing information and incorrect assumptions and decisions being made, which cost time to fix because technical staff must second guess what the customer originally wanted; the technical staff usually does not have access to the original information.
Solution
Using IBM FileNet P8 active content technology, automatically initiate the correct complaint-handling process for a particular or general problem area, which removes initial manual handling and routing of the complaint.
Build the complaint handling process using IBM FileNet Business Process Manager, which makes the process well defined and removes the chance of it being carried out inconsistently.
Customer Service Level Agreements not being met
The company is in a regulated industry and is subject to heavy fines if customer service requests are not handled in a timely manner.
Solution
When building the complaint handling process, configure timers to escalate work based on Customer Service Level Agreement (SLA) targets. Use IBM FileNet P8 Advanced Case Management (ACM) to manage work items and automatically prioritize work within an inbasket. Using this solution the company can merge cases where the same problem was reported by multiple people, which increases the information that we have about the reported problem and increases the speed in responding to it.
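The following sketch illustrates the idea of SLA-driven escalation and prioritization in plain Python. It is a conceptual model only and does not use the IBM FileNet Business Process Manager or ACM interfaces; the `WorkItem` fields, the four-hour warning period, and the SLA durations are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WorkItem:
    case_id: str
    received: datetime
    sla: timedelta  # hypothetical target time to resolve this type of complaint

    def deadline(self):
        return self.received + self.sla


def needs_escalation(item, now, warning=timedelta(hours=4)):
    """True when the item is within the warning period of its deadline, or past it."""
    return now >= item.deadline() - warning


def prioritize(items):
    """Order an inbasket so that the item with the earliest SLA deadline comes first."""
    return sorted(items, key=WorkItem.deadline)
```

Sorting by deadline rather than by arrival time means that a complaint with a short SLA target moves ahead of older work that still has time remaining.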
Hard to find documents and lost documents
Many customer requests are mislaid in a large area that is used for document storage. Also, files are often sent to people to process and are then unavailable to others when needed, for example, when the person holding a file is on holiday.
Solution
Use Electronic Forms (eForms) to enable customers to report issues online. This solution ensures that the maximum useful information is instantly recorded rather than waiting for paper to arrive. Capturing the information electronically ensures that it is accessible to any permitted personnel at any time and that it is not lost. It also means less paper to handle.
Storage costs increasing
The cost of storage increases due to more complaints.
Solution
Remove paper from the company by introducing bulk scanning with IBM Datacap into an enterprise content management repository. Extract easy-to-identify data, such as customer name, number, account number, date of report, and customer address, which yields immediate savings in the cost of searching for data and of paper storage. We assume that most of the documents are no larger than 150 KB, with some reaching 1 MB, so there is no need for a dedicated file store because all documents are stored within the database.
Costly customer servicing
Customers who phone in to ask about the status of their requests are hard to service, and the calls are often long because of the manual nature of the current work environment, which leads to high customer service costs.
Solution
Steps within the complaint handling process are configured to proactively inform customers of the status of their complaints, thus reducing the likelihood of them needing to call the customer services team and greatly improving efficiencies.
Use filters in IBM FileNet P8 ACM to show active cases that match certain criteria. When customers phone in, it is easy to find their reports by searching by customer account number or area code. Also make customers’ status available online so that they can look up the status on their own.
10.3.3 Customer architectural constraints
The company has about a hundred central staff located in a single office. There are currently 2,000 letters coming in per day, Monday to Friday, arriving at 8:00 AM that need scanning as soon as possible for the workers to start handling them on the same day. This situation requires a high speed, bulk scanner and content repository that can handle high-peak ingestion rates and a constant rate of document retrieval during a day.
Integration is required with the following external systems: a postcode verification system, email systems, and an SMS text message web service.
The company’s current operating environment is Windows with a Microsoft SQL Server database and network-attached storage exposed as a CIFS share. The company is also keen to use existing IBM hardware left over from a previous project. No disaster recovery or high availability is required.
The company has tight timelines and wants to keep implementation costs low. The internal users will use browsers to process work. The external customers are also expected to use browsers to check their status. Therefore, no client-installed applications are considered.
10.3.4 Solution architecture
Figure 10-1 shows the target single-site environment for this IBM FileNet P8 solution. Because of the relatively low demand in processing volumes and speed (per our sizing tool calculation, which we do not cover in this book), the Content Engine and Process Engine can be consolidated onto a single server, which is still sufficient at peak times.
Figure 10-1 Core solution architecture
Target architecture assumptions
By adding eForms as a separate channel, we assume 40% of all future complaints enter the system through the customer or call center staff completing an eForm.
Systems hardware
The Content Engine, Process Engine, Workplace/Workplace XT, MS SQL Server, and Directory Server are all on IBM System x3500 M3 (Windows) machines.
 
Sizing consideration
Although we do not cover the sizing of the system, we discuss some sizing-related considerations:
Consider how long you will retain the content. We assume a content population of five years of operation to be retained.
Be aware that the Content database grows large over time because we record one case object for every document in addition to the document itself. In reality, cases can be merged when the correspondence for multiple complaints reports the same issue, so the overall number of cases is lower.
Consider whether to keep the case history audit fields for all time.
Think about the information life cycle in any system to avoid creating a digital landfill of information that you never intend to use.
Consider the additional processing demand when using eForms. Having eForms as an input mechanism increases the load on the application server because the form is rendered one time for filling in and again later for review. This is different from incoming letters because scanning a letter involves only a create document operation, rather than both create document and render form operations as for eForms.
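The following back-of-the-envelope calculation illustrates the kind of estimate that feeds into these considerations. It uses figures stated in this scenario (2,000 items per working day, five years of retention, and a typical document size of up to 150 KB) and additionally assumes 52 five-day weeks per year, decimal units, and a constant daily volume regardless of input channel. It is a rough upper-bound sketch, not a substitute for the sizing tool, and it ignores case objects, metadata, and audit history.

```python
ITEMS_PER_DAY = 2_000        # from the customer architectural constraints
WORKING_DAYS_PER_WEEK = 5    # Monday to Friday
WEEKS_PER_YEAR = 52          # assumption: holidays are not excluded
RETENTION_YEARS = 5          # from the sizing considerations
TYPICAL_DOC_KB = 150         # assumed upper bound for most documents


def documents_retained(per_day=ITEMS_PER_DAY, years=RETENTION_YEARS):
    """Number of documents held at the end of the retention period."""
    return per_day * WORKING_DAYS_PER_WEEK * WEEKS_PER_YEAR * years


def content_volume_gb(documents, doc_kb=TYPICAL_DOC_KB):
    """Raw content volume in gigabytes, using decimal units (1 GB = 1,000,000 KB)."""
    return documents * doc_kb / 1_000_000


total_documents = documents_retained()
total_gb = content_volume_gb(total_documents)
```

Under these assumptions, the system retains 2.6 million documents, or about 390 GB of raw content, before any merging of cases or information life cycle policy reduces the total.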
10.3.5 Solution processes
For this solution, we create four processes to handle the workload:
Generic Customer Request Handling process
This is launched when we cannot automatically determine the type of complaint. The job goes to an unassigned work queue for someone to manually assign the type of complaint. The relevant process is then launched.
Register Complaint process
When we know the customer request is a complaint, launch this process to extract complaint information and assign to the correct department. This automatically launches the Works Required process.
Works Required process
This handles collation of similar reports and cases, assigning a works team, inspection, and correspondence with the reporting customers.
Incoming Additional Correspondence process
The Works Required process might wait for additional customer correspondence. This process handles additional correspondence documentation, files it against the appropriate case, and reawakens the main handling process.
As an example, Figure 10-2 shows the details of the Works Required process. In this process, we create a case with all of the relevant reports and process the case. For the processing case step, we assign a manager for the case, set the appropriate security, and service the client.
Figure 10-2 Works Required process
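The way that the four processes hand work to each other can be summarized in a short routing sketch. The inputs `document_type` and `case_id` are hypothetical; the text does not specify how the solution detects the type of a request or links correspondence to an existing case, so this is an illustration of the routing rules only.

```python
def select_process(document_type=None, case_id=None):
    """Pick the process to launch for an incoming document.

    document_type: detected request type, or None when it cannot be determined
    case_id: identifier of an existing case that the document refers to, if any
    """
    if case_id is not None:
        # Correspondence for a known case is filed against that case.
        return "Incoming Additional Correspondence"
    if document_type == "complaint":
        return "Register Complaint"
    # Unknown type: route to the work queue for manual classification.
    return "Generic Customer Request Handling"


def follow_on_process(process):
    """Process that is launched automatically when the given process completes."""
    if process == "Register Complaint":
        return "Works Required"
    return None
```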
10.3.6 Future enhancements
For future improvements, the company can optionally do the following enhancements:
Configure IBM FileNet Process Simulator and IBM FileNet Case Analyzer to identify process road blocks based on current and future staffing levels. Use this in the future to improve the process and make it more efficient. Also use the gathered information for business intelligence and reporting.
Use IBM Cognos Real Time Monitoring to provide graphical feedback to managers of work at varying stages and within or out of SLA targets. Also aggregate data by location to give a pictorial view of where problems occur.
Use IBM Classification Module to determine the type of problem being reported, for example, a leak, incorrect bill, or loss of water to street. This feature makes the decision about which team receives the customer complaint more automated, which increases work processing throughput.
Integrate the business process with an existing customer information or CRM system. This solution gives process workers access to customer contact details other than those that are provided during reporting. This solution also enables the system to spot when details change so it can initiate a Customer Details Validation and Modification process to make the internal data more accurate. Storing customer contact preferences means that your customer notifications can be sent through their preferred channel (phone, email, SMS, or post).
If all of the above components are added to the final solution, the core functionality is kept as it currently is, with the extra components shown in Figure 10-3 on page 349 added.
Figure 10-3 Solution with all optional components included
10.4 Solution template: Enterprise-wide document management
This section discusses a worldwide distributed solution for all documents in the customer facing and Human Resources departments. The implementation involves heavy business process and content management use. There are multiple offices distributed across the world regions that use scanning, electronic forms, and productivity suites integrated with the IBM FileNet P8 Platform to provide content creation.
10.4.1 Scenario
The company is a multi-national company with offices and regional hubs around the world. They have a long history of using enterprise content management systems as departmental or country-wide solutions. They now want to move towards completely eliminating paper for all internal business functions.
This change involves capturing all customer-related documents, internal Human Resources information, policies and procedures, engineering documents, legal contracts, and all business-critical correspondence.
10.4.2 Business problems and their solutions
Based on the business scenario, the company has the following business problems:
No single view of all customer data across all locations
Information is fragmented and impossible to report. No single unified search for information across the organization exists.
Solution
We conducted an analysis of metadata and document classification required for customer related and Human Resources documents. The initial assessment found 60 document types with 200 metadata items shared between these classes. There are also 15 record types to consider with another 60 metadata items for these records management classes.
Future document classes will be subclassed from the above classes to enforce minimum required metadata standards. Search Templates and content indexing are set up to provide an infrastructure for finding information across all information stores and types.
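The effect of subclassing on minimum metadata can be illustrated with ordinary class inheritance. The following sketch is a conceptual model in plain Python, not the Content Engine API; the class names and property names are hypothetical examples.

```python
class CustomerDocument:
    """Base class: the minimum metadata that every customer document must carry."""

    required = ("customer_number", "document_date")

    def __init__(self, **properties):
        missing = [name for name in self.all_required() if name not in properties]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self.properties = properties

    @classmethod
    def all_required(cls):
        """Required properties of this class plus those inherited from its parents."""
        names = []
        for klass in reversed(cls.__mro__):
            for name in vars(klass).get("required", ()):
                if name not in names:
                    names.append(name)
        return names


class ChangeOfAddress(CustomerDocument):
    """Subclass: adds its own required property and inherits the parent's."""

    required = ("new_address",)
```

Because a subclass can only add to the inherited requirements, a search that relies on the base class properties works across every document class derived from it.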
These templates will also be used within business processes to find other relevant data based on initial information provided for each request, for example, a customer request for a new product can cause a business process to search for the customer’s last change of address and financial details documents. All of these are presented to internal users, which provides employees with instant access to all required information to complete their tasks without the need for manual search.
No consistent cost analysis or reporting on internal processes
Executives believe that the company can achieve a great deal of cost savings by analyzing current processes and automating tasks, where possible, which includes data validation, manual system updating, and paper-oriented tasks.
Solution
Perform a business process analysis for the Customer Onboarding, Account Opening, and Customer Maintenance processes. The analysis identifies steps that can be automated or run in parallel and identifies the information that people need to adequately perform human tasks.
Inconsistent information security and retention policies
There is a potential for confidential information to leak out from within the company. The company is keen to use security methods to prevent any unauthorized access. There is an existing hierarchical security classification method in the paper world that they want to apply to all content. Any leak of information can result in competitive, public opinion, financial, and criminal repercussions. These risks also occur when information is kept beyond the time it is required. Legal discovery costs hit the corporation hard in previous litigation.
Solution
The company’s information security hierarchy is implemented within IBM FileNet Content Manager as a Marking Set and applied to the Security Classification property of all documents within the system. Each document class sets the default for this property to the most relevant setting.
Security Policies are created to act as application domain Access Control Lists. An example of this is a Human Resources Document whose policy allows all Human Resources users to read metadata, but only senior Human Resources users to view the actual content.
Certain countries have legislation that prevents employee data being sent internationally. To comply, we create a marking set called Country Visibility and populate it when needed. This action effectively denies access to any out-of-country users and is also useful when dealing with security conscious government customers of the company.
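The combined effect of the two markings can be modeled in a few lines. The following sketch is a simplified conceptual model, not the way the Content Engine evaluates marking sets; the classification levels and country codes are hypothetical because the company's actual hierarchy is not given.

```python
# Hypothetical hierarchy, from lowest to highest classification.
LEVELS = ["public", "internal", "confidential", "secret"]


def can_access(user_level, user_country, doc_level, doc_countries=None):
    """Simplified access check combining the two markings.

    Security Classification: the user's clearance must be at least the
    document's level in the hierarchy.
    Country Visibility: when populated, only users in the listed countries
    have access; when empty, the document is not restricted by country.
    """
    cleared = LEVELS.index(user_level) >= LEVELS.index(doc_level)
    visible = not doc_countries or user_country in doc_countries
    return cleared and visible
```

Both conditions must hold, so a user with the highest clearance is still denied access to a document whose Country Visibility excludes that user's country.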
Customer interactions generate cases within IBM FileNet Advanced Case Management. All information that is created to handle a customer request and all correspondence are kept together in one case folder. After a vital business interaction is complete, such as contract signing or account opening, these documents and the managing processes are declared as critical business records to prove compliance.
Difficulty locating information
Employees find it hard to do their day-to-day activities because the information that they require is spread out between web-based systems (including wikis and blogs), collaboration tools, email, file shares, paper, and existing electronic repositories. The organization wants to replace these eventually, but first wants to make all of this content accessible to users through search and internal business processes.
Solution
Remove barriers in accessing information that is relevant to employees doing their job by providing federation at the content level to existing systems, migrating some systems that are not web or ECM interface accessible (such as file shares), and linking to web-based systems. Link this all together at the user interface level to provide all information that is required to make a decision on the same first summary window. Provide links to often used but not mandatory information, such as best practice guides, pricing rules, or Business Intelligence displays.
In addition to this process-oriented view of information, it is often necessary to search for answers across all existing repositories. Historically, search has broken down across vast and varied sources of information because it cannot correctly categorize and index such diverse information domains. By taking the metadata and classification described in the previous point, we can apply that to other information sources to make finding disparate but related data easier. This process can greatly improve enterprise search result accuracy by providing domain summary information to search by. At a simple level, this can be item types, such as news item or product information, but at a more detailed level you can aggregate search results by, for example, country, product, or customer industry.
This solution also makes later transition to a common enterprise content management storage mechanism for all applications much easier to accomplish because all data is described using the same metadata schema.
Regional customization
Internationalization and localization are of paramount importance. The company promotes itself as an international player with local focus. As such, it tries to provide information in all languages, both national and local.
Information might be created in one world region, but primarily consumed in another. Bandwidth to some locations is limited. An advanced feature of the IBM FileNet Content Manager content cache is its write-through caching capability: if a user in New York checks a document in to the Los Angeles-based object store, the document is automatically cached in the New York content cache. This capability greatly reduces network bandwidth usage for often used, but rarely changed, documents.
Solution
By using the same underlying content and process metadata schemes, we can ensure that all content, regardless of origin, meets minimum indexing requirements. Many tools can be used to map country and language (locale) named data into this standard schema. An electronic form, for example, can have the same fields on it, but have a version translated from English with its left-to-right text into a right-to-left language with completely different local terminology and instructions. Other web user interfaces, such as IBM FileNet Advanced Case Management, can have language packs installed and detect the user’s locale to show the interface in the most appropriate language.
Reliability and scalability
If a system goes down, it directly impacts the company in terms of revenue generation and customer satisfaction. The company is keen to maintain the highest levels of uptime and resilience. They carry out over a hundred million customer transactions per day. Every minute they are down, it costs them tens of thousands of dollars.
Solution
The content caching features of IBM FileNet Content Manager can be used to cache a document after the first authorized request at a remote location. So if a document in Los Angeles was requested by an employee working on the islands in the English Channel, for example, the first time they requested it there is a lag. The next time any authorized user requests the same document, however, it is drawn from the local cache.
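The caching behavior described above can be illustrated with a toy model. This is a minimal sketch, not the product's implementation: the ContentCache class and its method names are hypothetical, the remote object store is represented by a plain dictionary, and authorization checks are deliberately left out.

```python
class ContentCache:
    """Toy model of a site-local content cache in front of a remote object store."""

    def __init__(self, remote_store):
        self.remote_store = remote_store   # dict: document id -> content bytes
        self.local = {}                    # content held at the local site
        self.remote_fetches = 0            # number of requests that crossed the WAN

    def get(self, doc_id):
        # The first request pays the network lag; later requests are served locally.
        if doc_id not in self.local:
            self.remote_fetches += 1
            self.local[doc_id] = self.remote_store[doc_id]
        return self.local[doc_id]

    def check_in(self, doc_id, content):
        # Write-through: the remote store is updated and the local copy is kept.
        self.remote_store[doc_id] = content
        self.local[doc_id] = content


los_angeles = {"policy-001": b"pricing rules"}
channel_islands = ContentCache(los_angeles)

first = channel_islands.get("policy-001")    # fetched from Los Angeles
second = channel_islands.get("policy-001")   # served from the local cache

channel_islands.check_in("memo-002", b"new memo")
third = channel_islands.get("memo-002")      # already cached by the check-in
```

Only the first read crosses the network in this model; the checked-in document never triggers a remote fetch from the site that created it.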
As we saw in previous solutions, we can provide horizontal scaling to meet high availability requirements. When a full site goes down, or is cut off from the network, another approach is required. Disaster recovery can be provided by having a duplicate server infrastructure for a site, either active or passive, at another location. When a disaster happens in the primary site, the duplicate server is made available to clients by making a network routing change and pointing all other sites to this duplicate site.
Data integrity is maintained by background replication of the file storage and database tables that the first site uses. This implementation means that even if an entire site is unavailable for a full day because of network issues beyond the company’s control, the alternative site is made available within minutes, enabling the rest of the organization to work on this information and these processes as normal, mitigating business risks due to downtime.
Development and support costs
There are many existing systems that the company wants to eventually replace with off-the-shelf software wherever possible. As a stopgap measure, however, they want to quickly replace the storage layer of their systems with an ECM persistence layer. They also want to ensure that, in the future, any new systems can use the same storage layer technology. Because the company is risk averse, they want to ensure that this layer completely abstracts the underlying content repository, which shields them from any future vendor-specific API changes or changes of vendor.
Solution
Large organizations are increasingly looking at a service-oriented architecture approach to future-proof themselves against software upgrades, dependencies, and migrations. As such, companies create a shared service business layer that abstracts content creation and retrieval to provide a managed shared service for their organization.
This implementation results in a single, independent interface that uses open standards to access content. It also makes the back-end architecture transparent to the application. This is practical when you must monitor storage and access and charge other internal teams for them. It also makes migrating back-end content storage much easier to administer because you only need to update the storage mappings on the shared service business tier, not in every application that might use the content. You can achieve this by using either such a layer or an enterprise content management system that supports SOA and federation, such as IBM FileNet Content Manager. You can use the IBM FileNet P8 stack with its built-in Content Federation Services (CFS) mappings to other content management and heritage systems. The additional advantage of using CFS, beyond the maintenance cost advantage, is that the add-on products all then have visibility over the content. This visibility is particularly useful for adding process-accessible content to an application that is built on top of IBM FileNet Business Process Manager and IBM FileNet Advanced Case Management. In this scenario, federation, as opposed to an intermediary service API, can achieve greater performance and usage features.
As we learned in this book, IBM FileNet Content Manager and IBM FileNet Business Process Manager are fully accessible through Web services and other APIs. You can even invoke and interact with individual running processes using Web services. This fits perfectly into a service-oriented architecture, as required by the Shared Service or Software as a Service (SaaS) models.
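The shared service business layer described in this solution can be sketched as a thin interface with a mapping from content type to back-end repository. The following Python sketch is illustrative only: ContentRepository, InMemoryRepository, SharedContentService, and the team and content type names are hypothetical and do not correspond to any IBM FileNet P8 or CFS API. A real adapter would call the repository's own Web services or API.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict


class ContentRepository(ABC):
    """Back-end neutral contract that every repository adapter implements."""

    @abstractmethod
    def store(self, doc_id: str, content: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, doc_id: str) -> bytes: ...


class InMemoryRepository(ContentRepository):
    """Stand-in for a real repository adapter (assumption for this sketch)."""

    def __init__(self) -> None:
        self._docs: Dict[str, bytes] = {}

    def store(self, doc_id: str, content: bytes) -> None:
        self._docs[doc_id] = content

    def retrieve(self, doc_id: str) -> bytes:
        return self._docs[doc_id]


class SharedContentService:
    """Single entry point; applications never see which back end is used."""

    def __init__(self, mappings: Dict[str, ContentRepository]) -> None:
        self._mappings = dict(mappings)   # content type -> repository
        self.usage = Counter()            # calls per team, for internal chargeback

    def remap(self, content_type: str, repository: ContentRepository) -> None:
        # A back-end migration changes this mapping only, not the applications.
        self._mappings[content_type] = repository

    def store(self, team: str, content_type: str, doc_id: str, content: bytes) -> None:
        self.usage[team] += 1
        self._mappings[content_type].store(doc_id, content)

    def retrieve(self, team: str, content_type: str, doc_id: str) -> bytes:
        self.usage[team] += 1
        return self._mappings[content_type].retrieve(doc_id)


legacy, replacement = InMemoryRepository(), InMemoryRepository()
service = SharedContentService({"contract": legacy})
service.store("sales", "contract", "c-1", b"signed contract")

# Migrate the content, then switch the mapping; the caller's code is unchanged.
replacement.store("c-1", legacy.retrieve("c-1"))
service.remap("contract", replacement)
content = service.retrieve("sales", "contract", "c-1")
```

The usage counter shows where monitoring and chargeback would hook in. With federation instead of an intermediary layer, the mapping is maintained by the content management system rather than by custom code.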
Uncontrolled file shares
There will be a migration from existing departmental and user file shares into the new system to remove duplicates and reduce storage capacity. There are currently billions of documents spread over 50 file servers. Bulk classification is error prone, whereas manual classification is too costly to perform. The company cannot find a good solution to this.
Solution
Content Collector for File System migrates the more well-defined departmental file shares. Rules are developed to classify documents of particular types, names, and filing locations into specific classes. The unstructured user shares, however, require a more flexible and automated approach. IBM Classification Module can be used with Content Collector to suggest the most likely class and filing location for the documents on the share.
As we mentioned in 6.3, “IBM Classification Module” on page 144, IBM Classification Module works by teaching its neural networks with a corpus of documents of a specific classification and filing location. The larger the corpus you train with, the higher the accuracy. Any documents that the system is unsure about are routed for manual processing. In this situation, the user sees the suggested classes and filing locations and either agrees to them or chooses an alternative. As IBM Classification Module learns to be more accurate over time, the accuracy score improves, causing fewer and fewer documents to pass through the manual process.
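The routing of uncertain documents to manual processing can be pictured as a simple confidence threshold. This is a minimal sketch and not the IBM Classification Module API: the Suggestion structure, the score scale from 0 to 1, and the threshold value of 0.8 are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    doc_name: str
    suggested_class: str
    confidence: float   # assumed score between 0.0 and 1.0


def route(suggestions, threshold=0.8):
    """Split suggestions into automatically filed and manually reviewed lists."""
    auto, manual = [], []
    for s in suggestions:
        (auto if s.confidence >= threshold else manual).append(s)
    return auto, manual


batch = [
    Suggestion("invoice_2010_04.pdf", "Invoice", 0.95),
    Suggestion("notes.txt", "Meeting Minutes", 0.55),
    Suggestion("contract_draft.doc", "Contract", 0.80),
]
auto, manual = route(batch)
```

As accuracy improves with training, more suggestions clear the threshold and the manual list shrinks, which matches the behavior described above.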
You might choose, for example, to get 100 internal employees to ingest 100 of their documents each and provide the correct classification and filing locations. This process results in a training corpus of 10,000 documents. You can then use this as a basis for the automated migration, perhaps migrating a portion of your users’ content per week.
After the migration is complete, the Classification Module system can continue to be used to match new incoming documents. At this stage, it is well trained, and can be used to help users classify new documents, or by the system for incoming emails or OCR text from scanned images. This process greatly reduces user error, increases employee buy-in for using the system, and prevents users from sticking to what they are used to, that is, continuing to use personal storage and file shares for new documents.
10.4.3 Customer architectural constraints
The preferred large-scale system is IBM p-Series with AIX and LPARs to support virtualization. Other preferred internal systems include Tivoli Directory Server, Tivoli Access Manager with WebSeal for SSO, WebSphere Application Server, and DB2 RDBMS.
Major regional centers include Los Angeles, London, Dubai, Hong Kong, and Johannesburg. Offshore customers are handled out of New York and the Channel Islands (English Channel, UK). This setup connects to the London regional hub. There are several smaller offices throughout each region. Los Angeles, London, and Hong Kong each handle approximately 10,000 employees, with Johannesburg and Dubai taking approximately 5,000 each in their regions.
The company has two million customers worldwide ranging from individuals up to large multi-nationals. On average, one million customers sign up for new products and services every year with each request requiring an average of 10 documents of 40KB being supplied by the customer for the onboarding process. 80 percent of these documents are eForms. A further 20 documents are generated internally (10 eForms approvals, 10 documents to be sent out) on average, for each request before completion. Individuals account for 95 percent (950,000) of the company’s new requests with the remainder (five percent, 50,000) being from companies.
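The figures in this paragraph translate into yearly document counts as follows. This is a back-of-the-envelope sketch in Python using only the numbers stated above; it assumes 1 GB = 1,000,000 KB, and it does not estimate the size of internally generated documents because no size is given for them.

```python
REQUESTS_PER_YEAR = 1_000_000
CUSTOMER_DOCS_PER_REQUEST = 10
CUSTOMER_DOC_SIZE_KB = 40
EFORM_PERCENT = 80
INTERNAL_DOCS_PER_REQUEST = 20   # 10 eForm approvals + 10 outgoing documents
INDIVIDUAL_PERCENT = 95

# Documents supplied by customers during onboarding.
customer_docs = REQUESTS_PER_YEAR * CUSTOMER_DOCS_PER_REQUEST
customer_eforms = customer_docs * EFORM_PERCENT // 100
customer_volume_gb = customer_docs * CUSTOMER_DOC_SIZE_KB // 1_000_000

# Documents generated internally before each request completes.
internal_docs = REQUESTS_PER_YEAR * INTERNAL_DOCS_PER_REQUEST
total_docs = customer_docs + internal_docs

# Split of requests between individuals and companies.
individual_requests = REQUESTS_PER_YEAR * INDIVIDUAL_PERCENT // 100
company_requests = REQUESTS_PER_YEAR - individual_requests

print(f"Customer documents per year: {customer_docs:,}")
print(f"  of which eForms:           {customer_eforms:,}")
print(f"Customer content per year:   {customer_volume_gb} GB")
print(f"Internal documents per year: {internal_docs:,}")
print(f"Total documents per year:    {total_docs:,}")
```

Under these assumptions the onboarding process adds 30 million documents per year, of which the 10 million customer-supplied documents account for roughly 400 GB.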
The company has 40,000 employees. Internally the company has a team of 100 Human Resources professionals (25 senior employees). Focus has 2,000 people managers, and 10,000 employees working on customer facing duties.
The client onboarding process consists of 10 independent business processes with an average of 15 human steps. Of these, five are automated in the to-be process: two are data validation, two are system queries, and one is a system update. There will be an additional 10 system steps to manage content. It currently takes 30 days to complete a product onboarding request.
There are Human Resources Onboarding, Employee Leaves, Change of Details, Change of Manager, Change of Role, Employee Review, Disciplinary Action, and Document Access Request processes. Typically, there is a 10 percent attrition rate among employees. The Human Resources Onboarding process requires that 10 documents are imported or generated (five of these are eForms). The Employee Review and Disciplinary Action processes generate five documents (all are eForms). On average, these processes have five human steps. Of these five steps, two are information validation and discovery steps.
The document access process is currently manual and paper based and is used 60,000 times a year. It also takes two weeks to complete. Both the volume and the duration need to be reduced by setting default document security and implementing new streamlined employee leaving, change of role, and change of manager processes. The target processing time is five days.
10.4.4 Solution architecture
Disaster recovery can be achieved by mirroring each site at a nearby, but independent, facility. Set up disaster recovery sites as hot standby that normally do not receive user requests. High availability is designed into each site’s infrastructure.
We call Los Angeles, London, and Hong Kong high-load sites, and Dubai and Johannesburg medium-load sites.
For load modelling, we assume that all client onboarding customers are spread throughout each region, according to the number of employees at each regional office (the regional staffing is directly proportional to the number of customers in that region). These requests are handled on the new follow-the-sun operating model.
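The proportional assumption can be worked through with the staffing and request figures given in 10.4.3. This is a minimal sketch of the arithmetic only; the dictionary and variable names are illustrative, and the figures are the approximate values stated in the scenario.

```python
EMPLOYEES_PER_SITE = {
    "Los Angeles": 10_000,
    "London": 10_000,
    "Hong Kong": 10_000,
    "Dubai": 5_000,
    "Johannesburg": 5_000,
}
ONBOARDING_REQUESTS_PER_YEAR = 1_000_000

total_employees = sum(EMPLOYEES_PER_SITE.values())

# Each region's share of requests is proportional to its share of employees.
requests_per_site = {
    site: ONBOARDING_REQUESTS_PER_YEAR * staff // total_employees
    for site, staff in EMPLOYEES_PER_SITE.items()
}

for site, requests in requests_per_site.items():
    print(f"{site:<13} {requests:>9,} requests per year")
```

With these inputs, each of the three larger sites accounts for a quarter of the onboarding requests and each of the two smaller sites for an eighth.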
All Human Resources requests are handled in region and do not follow the follow-the-sun model. They are spread out according to the number of employees per region.
Figure 10-4 on page 358 shows the large-site system architecture. We use mainly IBM p-Series machines. The number of instances and assigned CPUs are based on sizing from IBM SCOUT (which we do not cover in this book). Although not shown, the system also needs load balancing for Workplace/Workplace XT, Content Engine, and Process Engine. We also have not included IBM Classification Module, Content Collector or Content Federation Services (CFS) in the solution diagram. In our scenario, we only talk about using these for migrating content.
Figure 10-4 Large site system architecture
In Figure 10-4, we show only the servers for a sunny-day scenario, which means that we assume that no servers ever fail; therefore, we have not installed a highly available service. In practice, clustering technologies make it easy to make a service highly available by adding extra load-handling nodes and an automatic failover mechanism. A network layer with virtual hosts for the clustering and farming mechanism makes this failover transparent to the client application.

1 IBM Case Manager requires Workplace XT and is not available in Workplace only environments.