Chapter 8
Capacity, Availability, and Information Security Management

THE FOLLOWING ITIL PLANNING, PROTECTION, AND OPTIMIZATION CAPABILITY INTERMEDIATE EXAM OBJECTIVES ARE DISCUSSED IN THIS CHAPTER:

  • Capacity management, availability management, and information security management are discussed in terms of their
    • Purpose
    • Objectives
    • Scope
    • Value
    • Policies
    • Principles and basic concepts
    • Process activities, methods, and techniques
    • Triggers, inputs, outputs, and interfaces
    • Information management
    • Process roles
    • Critical success factors and key performance indicators
    • Challenges
    • Risks

The processes discussed in this chapter cover three of the four aspects of warranty; you should remember from your ITIL Foundation studies that in order for services to deliver value, they must provide both utility (the functionality required, described as fitness for purpose) and warranty (sufficient availability, capacity, service continuity, and security for the service to perform at the required level, described as fitness for use).

Capacity management ensures that the service, and the technology on which it is based, can support the patterns of business activity. Availability management ensures that the service is reliable and resilient, with any downtime kept to a level acceptable to the customer, that is, as agreed in the service level agreements. ITIL defines information security management as “the management process within the corporate governance framework, which provides the strategic direction for security activities and ensures objectives are achieved.”

We are going to examine each of these processes in turn.

Capacity Management

ITIL states that capacity management is responsible for ensuring that the capacity of IT services and the IT infrastructure is able to meet agreed current and future capacity and performance needs in a cost-effective and timely manner. The capacity management process must therefore understand the likely changes in capacity requirements and ensure that the design and ongoing management of a service meet this demand. As we have just said, sufficient capacity is a key warranty aspect of a service that needs to be delivered if the benefits of the service are to be realized.

Capacity management is considered throughout the lifecycle. As part of service strategy, the likely capacity requirements of a new service are assessed during service evaluation to ensure that the service meets a real need. In design, the service is engineered to cope with that demand and to be flexible enough to adjust to changing capacity requirements. Transition ensures that the service, when implemented, delivers according to its specification. The operational phase of the lifecycle ensures that the day-to-day adjustments needed to meet changes in requirements are implemented. Finally, as part of continual service improvement, capacity-related issues are addressed and adjustments are made to ensure that the service is delivered as cost-effectively and reliably as possible.

Purpose of Capacity Management

The purpose of the capacity management process is to understand the current and future capacity needs of a service and to ensure that the service and its supporting services are able to deliver to this level. The actual capacity requirements will have been agreed on as part of service level management; capacity management must not only meet these, but also ensure that the future needs of the business, which may change over time, are met.

Objectives of Capacity Management

The objectives of capacity management are met by the development of a detailed plan that states the current business requirement, the expected future requirement, and the actions that will be taken to meet these requirements. This plan should be reviewed and updated at regular intervals (at least annually) to ensure that changes in business requirements are considered. Similarly, any requests to change the current configuration will be considered by capacity management to ensure that they are in line with expectations or, if not, that the capacity plan is amended to suit the changed requirement. Those responsible for capacity management will review any issues that arise and help resolve any incidents or problems that are the result of insufficient capacity. This helps ensure that the service meets its objectives. An essential objective is to make sure capacity is increased or decreased in a timely manner so that the business is not impacted.

As part of the ongoing management of capacity and its continual improvement, any proactive measures that may improve performance at a reasonable cost are identified and acted on. Advice and guidance on capacity and performance-related issues are provided, and assistance is given to service operations with performance- and capacity-related incidents and problems.

Scope of Capacity Management

The capacity management process has responsibility for ensuring sufficient capacity at all times, including both planning for short-term fluctuations, such as those caused by seasonal variations, and ensuring that the required capacity is there for longer-term business expansion. Changes in demand may sometimes actually be reductions in that demand, and this is also within the scope of the process. Capacity management should ensure that as demand for the service falls, the capacity provided for that service is also reduced or managed to ensure that unnecessary expenditure is avoided.

The process includes all aspects of service provision and therefore may involve the technical, applications, and operations management functions. Other aspects of capacity, such as staff resources, are also considered.


As the “Capacity Management” sidebar illustrates, an increase in capacity requirements may have repercussions across the infrastructure and on the IT staff resources required to manage it. Although staffing is a line management responsibility, the calculation of resource requirements in this area is also part of the overall capacity management process.

Capacity management also involves monitoring “patterns of business activity” to understand how well the infrastructure is meeting the demands on it and making adjustments as required to ensure that the demand is met. Proactive improvements to capacity may also be implemented, where justified, and any incidents caused by capacity issues need to be investigated.

Capacity management may recommend demand management techniques to smooth out excessive peaks in demand. These techniques are discussed in Chapter 9, “IT Service Continuity Management and Demand Management.”

Capacity Management Value to the Business

Capacity management provides value to the business by improving the performance and availability of IT services the business needs; it does so by helping to reduce capacity- and performance-related incidents and problems. The process will also ensure that the required capacity and performance are provided in the most cost-effective manner.

All processes should be contributing in some way to the achievement of customer satisfaction, and capacity management does this by ensuring that all capacity- and performance-related service levels are met. Capacity management needs to be aware of new techniques and technologies as they become available, exploiting them to provide cost-justified performance improvements and support innovation.

Proactive capacity management activities will ensure that capacity aspects are considered during the design and transition of new or changed services. The capacity plan will ensure that business needs and future plans are taken into account when planning services.

As with availability management, capacity management will have the opportunity to improve the ability of the business to follow an environmentally responsible strategy by using green technologies and techniques.

Capacity Management Policies, Principles, and Basic Concepts

Capacity management is essentially a balancing act. It ensures that the capacity and performance of the IT services and systems match the evolving demands of the business in the most cost-effective and timely manner. This requires balancing the costs against the resources needed. Capacity management needs to ensure that the processing capacity that is purchased is cost justifiable in terms of business need. It ensures that the organization makes the most efficient use of those resources.

Capacity management is also about balancing supply against demand. It is important to ensure that the available supply of IT processing power matches the demands made on it by the business, both now and in the future. It may also be necessary to manage or influence the demand for a particular resource.

The policies for capacity management should reflect the need for capacity management to play a significant role across the service lifecycle.

It is important to ensure that capacity management is part of the consideration for all service level and operational level agreements, and of course any supporting contracts with suppliers. These agreements will capture the service requirements of the business, and capacity management should consider these for the current and future business needs.

Capacity Management Process Activities, Methods, and Techniques

Capacity management has three supporting subprocesses: business capacity management, service capacity management, and component capacity management. There are many similar activities that are performed by each of the subprocesses, but each has a very different focus. Business capacity management is focused on the current and future business requirements, whereas service capacity management is focused on the delivery of the existing services that support the business, and component capacity management is focused on the IT infrastructure that underpins service provision.

In Figure 8.1, you can see the full scope of the subprocesses, techniques, and activities for the capacity management process. We will look at each subprocess in turn.

Chart showing the business (strategic), service (tactical), and component (operational) capacity management subprocesses feeding the production of the capacity plan, with capacity management data stored in the CMIS.

FIGURE 8.1 Capacity management subprocesses

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

Business Capacity Management

The business capacity management subprocess is concerned with understanding the business plans and their implications for the IT infrastructure. This enables the necessary changes, such as storage upgrades, to be planned and implemented in time for the required capacity to be available when the business requires it. Capacity management will use existing utilization data for the current services and resources and extrapolate trends to predict future requirements. In addition, capacity management will plan the required capacity for new services due to come on-stream through the service strategy and service portfolio processes.

The capacity requirements are subject to change as the business changes. Business changes may require new services or changes to existing services or generate increased demand for those services. Some services will no longer be required, and this will free up spare capacity. These new requirements are identified by demand management, which analyzes patterns of business activity to understand how these patterns generate demand for IT services. This information enables proactive capacity management to predict and satisfy future requirements.

Capacity management needs to be included in all strategic, planning, and design activities as early as possible if the services are going to be able to meet the level of demand and the performance targets set in the service level agreements. The process will model the capacity requirements for a number of different scenarios in order to understand the impact on capacity requirements. For example, such scenarios might include the impact of fewer or more concurrent users than originally predicted.
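To make the idea of scenario modeling concrete, the following sketch applies a deliberately simple linear model: projected CPU load is a fixed baseline plus a fixed cost per concurrent user. The function names, the per-user cost, the 20-core capacity, and the 70 percent target are hypothetical illustrations rather than ITIL figures, and real capacity models (analytical queuing models or simulation) are usually more sophisticated.

```python
# Hypothetical scenario model: every figure here is an illustrative assumption.

def projected_utilization(concurrent_users: int, cpu_per_user: float,
                          baseline_cpu: float, capacity_cpu: float) -> float:
    """Projected CPU utilization (percent) under a simple linear model:
    a fixed baseline load plus a fixed CPU cost per concurrent user."""
    demand = baseline_cpu + concurrent_users * cpu_per_user
    return 100.0 * demand / capacity_cpu


def evaluate_scenarios(scenarios: dict[str, int], cpu_per_user: float,
                       baseline_cpu: float, capacity_cpu: float,
                       target_pct: float) -> dict[str, tuple[float, bool]]:
    """Map each scenario name to (utilization %, within target?)."""
    results = {}
    for name, users in scenarios.items():
        util = projected_utilization(users, cpu_per_user, baseline_cpu, capacity_cpu)
        results[name] = (round(util, 1), util <= target_pct)
    return results


if __name__ == "__main__":
    # Assumed: 20 cores of capacity, 2 cores of baseline load,
    # 0.02 cores per concurrent user, and a 70 percent utilization target.
    scenarios = {"fewer users": 300, "as predicted": 500, "more users": 700}
    for name, (util, ok) in evaluate_scenarios(scenarios, 0.02, 2.0, 20.0, 70.0).items():
        print(f"{name:13s} {util:5.1f}%  {'within target' if ok else 'exceeds target'}")
```

With these assumed figures, the reduced and predicted scenarios stay within target while the higher-user scenario does not; a result like that would feed a recommendation to procure capacity or to agree different SLA performance targets.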

Once the capacity requirements are understood, capacity management will recommend the procurement of new capacity, if required, or other measures as necessary. It will also ensure that the actions taken are implemented through the change management process, and it will advise service level management on which SLA performance targets are achievable.

Service Capacity Management

The service capacity management subprocess focuses on the management, control, and prediction of the end-to-end performance and capacity of live, operational IT services and their usage and workloads. It ensures that the performance of all services, as detailed in service targets within SLAs and SLRs, is monitored and measured, and that the collected data is recorded, analyzed, and reported. Wherever necessary, proactive and reactive action should be instigated to ensure that the performance of all services meets their agreed business targets. Wherever possible, automated thresholds should be used to manage all operational services to ensure that situations where service targets are breached or threatened are rapidly identified. Service capacity management will monitor workloads to ensure that they do not exceed any specified limitations.

Service capacity management monitors changes in performance levels and assesses the impact of changes so that most issues can be predicted and acted on before the service is impacted.

Component Capacity Management

The component capacity management subprocess focuses on the management, control, and prediction of the performance, utilization, and capacity of individual IT technology components such as processors, memory, disks, network bandwidth, and network connections. It ensures that all components within the IT infrastructure that have finite resources are monitored and measured, and that the collected data is recorded, analyzed, and reported. Wherever possible, automated thresholds should be implemented to manage all components through the event management process to ensure that situations where service targets are breached or threatened by component usage or performance are rapidly identified, and cost-effective actions, such as load-balancing, are implemented to reduce or avoid their potential impact.

Capacity Management Tools

It is important to ensure that the tools used by capacity management conform to the organization’s management architecture as well as integrate with other tools used for the management of IT systems and automation of IT processes.

Service operation monitoring and control activities provide a basis for the tools to support capacity management. The IT operations management function and the technical management departments (such as network management and server management) may carry out the bulk of the day-to-day operational duties. They will participate in the capacity management process by providing it with performance information.

Reactive and Proactive Activities

Capacity management has both reactive and proactive activities (as has availability management, which we will examine later in this chapter). In Figure 8.2, you can see the activities relating to both reactive and proactive capacity management and the interaction between the subprocesses.

Chart showing the service portfolio (business requirements) flowing into business, service, and component capacity management, which connect to three activity blocks (review and improve current services, assess new requirements, and plan new capacity) and finally to the CMIS.

FIGURE 8.2 Capacity management overview with subprocesses

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

Capacity management should include the following proactive activities:

  • Preempting performance issues by taking the necessary actions before the issues occur
  • Producing trends of the current component utilization and using them to estimate the future requirements and for planning upgrades and enhancements
  • Modeling and trending the predicted changes in IT services (including service retirements)
  • Ensuring that upgrades are budgeted, planned, and implemented before SLAs and service targets are breached or performance issues occur
  • Actively seeking to improve service performance wherever the cost is justifiable
  • Producing and maintaining a capacity plan addressing future requirements and plans for meeting them
  • Tuning (also known as optimizing) the performance of services and components

Capacity management should include the following reactive activities:

  • Monitoring, measuring, reporting, and reviewing the current performance of both services and components
  • Responding to all capacity-related “threshold” events and instigating corrective action
  • Reacting to and assisting with specific performance issues

Ongoing Activities

A number of ongoing activities form part of the capacity management process. These activities provide the basic historical information and triggers necessary for all the other activities and processes within capacity management (see Figure 8.3).

Closed-loop flowchart of the iterative activities: monitoring, analysis, tuning, and implementation. Resource and service thresholds and the CMIS feed monitoring; monitoring and analysis produce service and resource utilization exception reports.

FIGURE 8.3 Ongoing iterative activities of capacity management

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

Each service and all of its components should be monitored and compared against defined thresholds to spot trends and identify potential issues, so that the necessary action can be taken. Recommendations can also be made for the purchase of new capacity or the adoption of new technology.

Typical monitored data includes the following:

  • Processor utilization
  • Memory utilization
  • I/O rates (physical and buffer) and device utilization
  • Queue lengths
  • Disk utilization
  • Transaction rates
  • Response times
  • Batch duration
  • Database usage
  • Index usage
  • Hit rates
  • Concurrent user numbers
  • Network traffic rates

Thresholds should be based on historical analysis of normal activity and set below the level at which the service is impacted to allow time for corrective action. If the thresholds are exceeded, alarms should be raised and exception reports produced.
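As a minimal sketch of how a threshold might be derived from historical analysis while staying below the impact level, the code below uses a hypothetical rule: the mean of normal activity plus two standard deviations, capped a fixed margin below the level at which the service is impacted. The rule, the 10-point margin, and the function names are assumptions for illustration; many organizations use percentiles or vendor-recommended values instead.

```python
from statistics import mean, stdev


def derive_threshold(baseline: list[float], impact_level: float,
                     sigmas: float = 2.0, headroom: float = 10.0) -> float:
    """Hypothetical rule: mean + `sigmas` standard deviations of normal
    activity, but never closer than `headroom` to the level at which the
    service is impacted, leaving time for corrective action."""
    statistical = mean(baseline) + sigmas * stdev(baseline)
    return min(statistical, impact_level - headroom)


def exception_report(samples: dict[str, float], threshold: float) -> list[str]:
    """One exception line per sample that exceeds the threshold."""
    return [f"{when}: {value:.1f}% exceeds threshold {threshold:.1f}%"
            for when, value in samples.items() if value > threshold]


if __name__ == "__main__":
    normal_cpu = [40, 45, 50, 55, 60]  # hypothetical "normal" utilization
    limit = derive_threshold(normal_cpu, impact_level=85.0)
    today = {"09:00": 58.0, "10:00": 71.5, "11:00": 64.0}
    for line in exception_report(today, limit):
        print(line)
```

The cap matters when normal activity is very variable: a purely statistical threshold could then sit above the impact level, leaving no time to act.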

The ongoing activities may include balancing services, balancing workloads, changing concurrency levels, and adding or removing resources. The monitoring reports and information about the actions recommended and taken are stored in the capacity management information system (CMIS).

Capacity Management Triggers, Inputs, Outputs, and Interfaces

Let’s consider the triggers, inputs, and outputs for the capacity management process, and its interfaces with other service management processes. Capacity management is a process that has many active connections throughout the organization and its processes. It is important that the triggers, inputs, outputs, and interfaces be clearly defined to avoid duplicated effort or gaps in workflow.

Triggers

Many triggers will initiate capacity management activities:

  • New and changed services requiring additional capacity
  • Service breaches, capacity or performance events, and alerts, including threshold events
  • Exception reports
  • Periodic revision of current capacity and performance and the review of forecasts, reports, and plans
  • Periodic trending and modeling
  • Review and revision of business and IT plans and strategies
  • Review and revision of designs and strategies
  • Review and revision of SLAs, operational level agreements, contracts, or any other agreements
  • Requests from service level management for assistance with capacity and/or performance targets and explanation of achievements

Inputs

A number of sources of information are relevant to the capacity management process:

  • Business information
  • Service and IT information
  • Component performance and capacity information
  • Service performance issue information
  • Service information
  • Financial information
  • Change information
  • Performance information
  • CMS
  • Workload information

Outputs

The outputs of capacity management are used within the process itself as well as by many other processes and other parts of the organization. The information is often reproduced in an electronic format as visual real-time displays of performance. The outputs are as follows:

  • The capacity management information system
  • The capacity plan
  • Service performance information and reports
  • Workload analysis and reports
  • Ad hoc capacity and performance reports
  • Forecasts and predictive reports
  • Thresholds, alerts, and events
  • Improvement actions

Interfaces

As we have already explained, capacity management has strong connections across the service lifecycle with a number of other processes. For example, all changes to service and resource capacity must follow all IT processes such as change, release, configuration, and project management.

The key interfaces are as follows:

  • Availability management works with capacity management to determine the resources needed to ensure the required availability of services and components.
  • Service level management is assisted by capacity management in determining capacity targets and in investigating and resolving breaches related to service and component capacity.
  • IT service continuity management is supported by capacity management through the assessment of business impact and risk, determining the capacity needed to support risk reduction measures and recovery options.
  • Capacity management provides assistance with incident and problem management for the resolution and correction of capacity-related incidents and problems.
  • By anticipating the demand for services based on user profiles and patterns of business activity, and by identifying the means to influence that demand, demand management provides strategic decision making and critical related data on which capacity management can act.

Information Management and Capacity Management

The CMIS is used to provide the relevant capacity and performance information to produce reports and support the capacity management process. The reports provide information to a number of IT and service management processes. These should include the reports described in the following sections.

Component-Based Reports

There is likely to be a team of technical staff responsible for the control and management of each component. Reports must be produced to illustrate how components are performing and how much of their maximum capacity is being used.

Service-Based Reports

Service-based reports will provide the basis of SLM and customer service reports. Reports and information must be produced to illustrate how the service and its constituent components are performing with respect to their overall service targets and constraints.

Exception Reports

Exception reports can be used to show management and technical staff when the capacity and performance of a particular component or service becomes unacceptable. Thresholds can be set for any component, service, or measurement within the CMIS. An example threshold may be that processor utilization for a particular server has breached 70 percent for three consecutive hours or that the concurrent number of logged-in users exceeds the specified limit.

In particular, exception reports are of interest to the SLM process in determining whether the targets in SLAs have been breached. Also, the incident and problem management processes may be able to use the exception reports in the resolution of incidents and problems; for example, slow response may be traced to lack of capacity. Excess capacity should also be identified. Unused capacity may represent an opportunity for cost savings.
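The example threshold given earlier (processor utilization above 70 percent for three consecutive hours) differs from a simple point-in-time check, because a single short spike should not raise an exception. A hedged sketch of that check, plus a hypothetical flag for possible excess capacity, might look like the following; the hourly sampling, the function names, and the 20 percent underuse ceiling are assumptions.

```python
def sustained_breach(hourly_util: list[float], limit: float = 70.0,
                     consecutive: int = 3) -> bool:
    """True if utilization is above `limit` for at least `consecutive`
    consecutive hourly samples (for example, above 70% for three hours)."""
    run = 0
    for value in hourly_util:
        run = run + 1 if value > limit else 0
        if run >= consecutive:
            return True
    return False


def possibly_overprovisioned(hourly_util: list[float], ceiling: float = 20.0) -> bool:
    """Hypothetical excess-capacity flag: the peak never reaches `ceiling`."""
    return max(hourly_util) < ceiling


if __name__ == "__main__":
    server_a = [65, 72, 75, 71, 60]   # three hours above 70% in a row
    server_b = [72, 75, 60, 71, 74]   # spikes, but never three in a row
    server_c = [5, 9, 12, 8, 6]       # lightly used all day
    print(sustained_breach(server_a), sustained_breach(server_b))  # True False
    print(possibly_overprovisioned(server_c))                      # True
```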

Predictive and Forecast Reports

Part of the capacity management process is to predict future workloads and growth. To do this, future component and service capacity and performance must be forecast. This can be done in a variety of ways, depending on the techniques and the technology used. A simple example of a capacity forecast is a correlation between a business driver and component utilization. If the forecasts on future capacity requirements identify a requirement for increased resource, this requirement needs to be input into the capacity plan and included within the IT budget cycle.

Process Roles

In Chapter 1, “Introduction to Operational Support and Analysis,” we explored the generic roles applicable to all processes throughout the service lifecycle. These are relevant to the capacity management process, but specific additional requirements also apply. Remember that these are not “job titles”; they are guidance on the roles that may be needed to successfully run the process.

Capacity Management Process Owner

The capacity management process owner’s responsibilities typically include the following:

  • Carrying out the generic process owner role for the capacity management process (see Chapter 1 for more detail)
  • Working with managers of all functions to ensure acceptance of the capacity management process as the single point of coordination for all capacity- and performance-related issues, regardless of the specific technology involved
  • Working with other process owners to ensure an integrated approach to the design and implementation of capacity management, availability management, IT service continuity management, and information security management

Capacity Management Process Manager

The capacity management process manager’s responsibilities typically include the following:

  • Carrying out the generic process manager role for the capacity management process (see Chapter 1 for more detail)
  • Coordinating interfaces between capacity management and other processes, especially service level management, availability management, IT service continuity management, and information security management
  • Ensuring adequate IT capacity to meet required levels of service
  • Providing advice on matching capacity and demand and on optimizing the use of existing capacity
  • Working with SLM to ascertain capacity requirements from the business
  • Understanding the current usage and the maximum capacity of each component
  • Modeling and sizing all proposed new services and systems
  • Forecasting future capacity requirements
  • Producing, reviewing, and revising the capacity plan, in line with the organization’s business planning cycle
  • Setting appropriate levels of monitoring of resources and systems
  • Analyzing usage and performance data, and reporting on performance against targets contained in SLAs
  • Raising incidents and problems when breaches of capacity or performance thresholds are detected, and assisting with the investigation and diagnosis of capacity-related incidents and problems
  • Performing tuning to optimize and improve capacity or performance
  • Implementing initiatives to improve resource usage—for example, demand management techniques
  • Assessing new technology to improve performance and new techniques and products that could be used to improve the process
  • Assessing all changes for their impact on capacity and performance and attending CAB meetings when appropriate
  • Reporting on current usage of resources, trends and forecasts, and performance against targets contained in SLAs
  • Testing performance of new services and systems
  • Predicting the effects of future demand on performance service levels
  • Determining achievable performance service levels that are cost-justified
  • Acting as a focal point for all capacity and performance issues

Critical Success Factors and Key Performance Indicators for Capacity Management

The following list includes some sample critical success factors for capacity management and some key performance indicators for each.

  • Critical success factor: “Accurate business forecasts”
    • KPI: Production of workload forecasts on time
    • KPI: Accuracy (measured as a percentage) of forecasts of business trends
  • Critical success factor: “Knowledge of current and future technologies”
    • KPI: Timely justification and implementation of new technology in line with business requirements (time, cost, and functionality)
    • KPI: Reduction in the use of old technology that causes SLA breaches because of support or performance problems
  • Critical success factor: “Ability to demonstrate cost effectiveness”
    • KPI: Reduction in last-minute buying to address urgent performance issues
    • KPI: Reduction in the overcapacity of IT
  • Critical success factor: “Ability to plan and implement the appropriate IT capacity to match business needs”
    • KPI: Reduction (measured as a percentage) in the number of incidents due to poor performance
    • KPI: Reduction (measured as a percentage) in lost business due to inadequate capacity
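ITIL does not prescribe how the accuracy of forecasts should be expressed as a percentage. One common convention, assumed in the following sketch, is 100 percent minus the mean absolute percentage error (MAPE) between forecast and actual values; the function name and figures are hypothetical.

```python
def forecast_accuracy_pct(forecast: list[float], actual: list[float]) -> float:
    """Accuracy as 100% minus the mean absolute percentage error (MAPE).
    This is one common convention, assumed here; the result is floored at 0."""
    if not actual or len(forecast) != len(actual):
        raise ValueError("forecast and actual must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    mape = 100.0 * sum(abs(f - a) / abs(a) for f, a in zip(forecast, actual)) / len(actual)
    return max(0.0, 100.0 - mape)


if __name__ == "__main__":
    # Hypothetical quarterly forecasts vs. actual transaction volumes (thousands)
    print(round(forecast_accuracy_pct([110, 95, 120], [100, 100, 125]), 2))
```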

Challenges for Capacity Management

One of the major challenges facing capacity management is persuading the business to provide information on its strategic business plans. Without this information, the IT service provider will find it difficult to provide effective business capacity management. If there are commercial or confidentiality reasons why this data cannot be shared, it becomes even more challenging for the service provider.

Another challenge is the combination of all of the component capacity management data into an integrated set of information that can be analyzed in a consistent manner. This is particularly challenging when the information from the different technologies is provided by different tools in differing formats.

The amount of information produced by business capacity management, and especially service capacity management and component capacity management, is huge, and the analysis of this information is often difficult to achieve.

It is important that the people and the processes focus on the key resources and their usage without ignoring other areas. For this to be done, appropriate thresholds must be used, and reliance must be placed on tools and technology to automatically manage the technology and provide warnings and alerts when things deviate significantly from the norm.

Risks for Capacity Management

The following list includes some of the major risks associated with capacity management:

  • There is a lack of commitment from the business to the capacity management process.
  • There is a lack of appropriate information from the business on future plans and strategies.
  • There is a lack of senior management commitment to or a lack of resources and/or budget for the capacity management process.
  • Service capacity management and component capacity management are performed in isolation because business capacity management is difficult or there is a lack of appropriate and accurate business information.
  • The processes become too bureaucratic or manually intensive.
  • The processes focus too much on the technology (component capacity management) and not enough on the services (service capacity management) and the business (business capacity management).
  • The reports and information provided are too technical and do not give the information required by or appropriate for the customers and the business.

Availability Management

The availability of a service is critical to its value. No matter how clever it is or what functionality it offers (its utility), the service is of no value to the customer unless it delivers the warranty expected. Poor availability is a primary cause of customer dissatisfaction. Availability is one of the four warranty aspects that must be delivered if the service is to be fit for use. Targets for availability are often included in service level agreements, so the IT service provider must understand the factors to be considered when seeking to meet or exceed the availability target. This section covers how availability is measured; the purpose, objectives, and scope of availability management; and a number of key concepts.

Defining Availability

ITIL defines availability as the ability of an IT service or other configuration item to perform its agreed function when required. Any unplanned interruption to a service during its agreed service hours (also called the agreed service time, specified in the service level agreement) is defined as downtime. The availability measure is calculated by subtracting the downtime from the agreed service time and converting it to a percentage of the agreed service time.
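The calculation is Availability % = (AST - DT) / AST × 100, where AST is the agreed service time and DT is the downtime within it, both in the same unit. The Python function below is a minimal sketch of that formula; its name and the sample figures are hypothetical.

```python
def availability_percent(agreed_service_time: float, downtime: float) -> float:
    """Availability as a percentage of the agreed service time (AST).

    Both arguments must use the same unit (e.g. hours). Only unplanned
    downtime that falls inside the agreed service hours should be counted.
    """
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Hypothetical example: a 24x7 service over a 30-day month (720 agreed hours)
# with 3.6 hours of unplanned downtime inside its service hours.
print(round(availability_percent(720, 3.6), 2))  # 99.5
```

Adding hours outside the agreed service time to the denominator would inflate the percentage, which is the distortion the following paragraph warns about.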

It is important to note the inclusion of “when required” in the definition and the word “agreed” in the calculation. The service may be available when the customer does not require it; including time when the customer does not need the service in the calculation gives a false impression of the availability from the customer perspective. If customer perception does not match the reporting provided, the customer will become cynical and distrust the reports.


Keep in mind that the customer experiences the end-to-end service; the availability delivered depends on all links in the chain being operational when required. The customer will complain that a service is unavailable whether the fault is with the application, the network, or the hardware. The availability management process is therefore concerned with reducing service-affecting downtime wherever it occurs. Again, the availability reports should state clearly whether the calculations are based on the end-to-end service or just the application availability. This is why it is essential to understand the difference between service availability and component availability.
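The difference between component and service availability can be illustrated with a standard reliability calculation. Assuming every component in the chain must be working for the service to be available, and that components fail independently (a simplifying assumption that real infrastructure often violates), end-to-end availability is the product of the component availabilities. A minimal Python sketch with hypothetical figures:

```python
from math import prod


def serial_availability(component_availabilities: list[float]) -> float:
    """End-to-end availability of components that must ALL be working.

    Availabilities are fractions between 0 and 1; failures are assumed
    to be independent of one another.
    """
    return prod(component_availabilities)


# Hypothetical chain: application, server, and network, each 99.5% available.
chain = [0.995, 0.995, 0.995]
print(f"{serial_availability(chain):.2%}")
```

Three components that each achieve 99.5% give an end-to-end figure of roughly 98.5%, so reporting only application availability can overstate what the customer actually experiences.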

Purpose of Availability Management

The purpose of the availability management process is to take the necessary steps to deliver the availability requirements defined in the SLA. The process should consider both the current requirements and the future needs of the business. All actions taken to improve availability have an accompanying cost, so all improvements made must be assessed for cost-effectiveness.

Availability management considers all aspects of IT service provision to identify possible improvements to availability. Some improvements will depend on implementing new technology; others will result from more effective use of staff resources or streamlined processes. Availability management analyzes reasons for downtime and assesses the return on investment for improvements to ensure that the most cost-effective measures are taken. The process ensures that the delivery of the agreed availability is prioritized across all phases of the lifecycle.

Objectives of Availability Management

The objectives of availability management are as follows:

  • Producing and maintaining a plan that details how the current and future availability requirements are to be met. This plan should consider requirements 12 to 24 months in advance to ensure that any necessary expenditure is agreed on in the annual budget negotiations and any new equipment is bought and installed before the availability is affected. The plan should be revised regularly to take into account any changes in the business.
  • Providing advice throughout the service lifecycle on all availability-related issues to both the business and IT, ensuring that the impact of any decisions on availability is considered.
  • Managing the delivery of services to meet the agreed targets. Where downtime has occurred, availability management will assist in resolving the incident by utilizing incident management and, when appropriate, resolving the underlying problem by utilizing the problem management process.
  • Assessing all requests for change to ensure that any potential risk to availability has been considered. Any updates to the availability plan required as a result of changes will also be considered and implemented.
  • Considering all possible proactive steps that could be taken to improve availability across the end-to-end service, assessing the risk and potential benefits of these improvements, and implementing them where justified.
  • Implementing monitoring of availability to ensure that targets are being achieved.
  • Optimizing all areas of IT service provision to deliver the required availability consistently to enable the business to use the services provided to achieve its objectives.

Scope of Availability Management

As discussed, the availability management process encompasses all phases of the service lifecycle. It is included in the design phase because the most effective way to deliver availability is to ensure that availability considerations are designed in from the start. Once the service is operational, opportunities are continually sought to remove risks to availability and make the service more robust. The activities for these opportunities are part of proactive availability management. Throughout the live delivery of the service, availability management analyzes any downtime and implements measures to reduce the frequency and length of future occurrences. These are the reactive activities of availability management. Changes to live services are assessed to understand risks to the service, and measurements are put in place to ensure that downtime is measured accurately. This continues throughout the operational phase until the service is retired.

The scope of availability management includes all operational services and technology. Where SLAs are in place, there will be clear, agreed targets. There may be other services, however, where no formal SLA exists but where downtime has a significant business impact. Availability management should not exclude these services from consideration; it should strive to achieve high availability in line with the potential impact of downtime on the business. Service level management should work to negotiate SLAs for all such services in the future. Without them, the IT service provider is left to assess the level of availability required, when this should be a business decision. Availability management should be applied to all new IT services and to existing services for which SLRs or SLAs have been established. Supporting services must be included because their failures impact the customer-facing services. Availability management may also work with supplier management to ensure that the level of service provided by partners does not threaten the overall service availability.

Every aspect of service provision comes within the scope of availability management. Poor processes, untrained staff, and ineffective tools can all contribute to causing or unnecessarily prolonging downtime. The availability management process ensures that the availability of systems and services matches the evolving needs of the business.

The role of IT within businesses is critical. The availability and reliability of IT services can directly influence customer satisfaction and the reputation of the business. Availability management is essential in ensuring that IT delivers the levels of service availability required by the business to satisfy its business objectives and deliver the quality of service demanded by its customers.

Customer satisfaction is an important factor for all businesses and may provide a competitive edge for the organization. Dissatisfaction with the availability and reliability of IT service can be a key factor in customers taking their business to a competitor.

Availability management can also help the business follow an environmentally responsible strategy by using green technologies and techniques.

Availability Management Policies

The policies of availability management should state that the process is included as part of all lifecycle stages, from service strategy to continual service improvement. The appropriate availability and resilience should be designed into services and components from the initial design stages. This will ensure not only that the availability of any new or changed service meets the expected targets, but also that all existing services and components continue to meet all their targets.

Availability policies should be established by the service provider to ensure that availability is considered throughout the lifecycle. Policies should also be established regarding the criteria to be used to define availability and unavailability of a service or component and how each will be measured.

Availability management is performed at two interconnected levels:

  • Service availability involves all aspects of service availability and unavailability. This includes the impact of component availability and the potential impact of component unavailability on service.
  • Component availability involves all aspects of component availability and unavailability.

Availability Management Principles and Basic Concepts

Availability management must align its activities and priorities to the requirements of the business. This requires a firm understanding of the business processes and how they are underpinned by the IT service. Information about future business plans and priorities, and therefore about the future availability requirements of the business, is an essential input to the availability plan. Only with this understanding of the business requirement can the service provider be sure that its efforts to improve availability are correctly targeted.

The response of the IT service provider to failure can improve the customer’s perception of the service, despite the break in service. The service provider’s actions can show an understanding of the impact of the downtime on the business processes, and an eagerness to overcome the issue and prevent recurrences can reassure the business that IT understands its needs.

Additionally, the process requires a strong technical understanding of the individual components that make up each service, their capabilities, and their current performance. Through this combination of business understanding and technical knowledge, the optimal design can be delivered to produce the required level of availability to meet current and future needs.

When designing a new service and discussing its availability requirements, the service provider and the business must focus on how critical the service is to the business achieving its aims. Expenditure to provide high availability across every aspect of a service is unlikely to be justified. The business process that the IT service supports may be a vital business function (VBF), and identifying which services or parts of services are the most critical is therefore a business decision. For example, the ability of an Internet-based bookshop to process credit card payments would be a vital business function. The ability to display a “customers who bought this book also bought these other books” feature is not vital. It may encourage some increased sales, but the purchaser is able to complete their purchase without it. Once these VBFs are understood, the design of the service to ensure the required availability can commence, and the VBFs inform decisions about where expenditure to protect availability is justified.

Determining what the appropriate availability target of a service should be is a business decision, not an IT decision. However, availability comes at a price, and the service provider must ensure that the customer understands the cost implications of too high a target. Customers may otherwise demand a very high availability target (99.99% or greater) and then find the service unaffordable.
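One way to make the cost implications visible is to translate a percentage target into the maximum downtime it permits. The sketch below assumes a 24x7 service measured over a 365-day year; the constant and function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 24x7 service, non-leap year


def max_downtime_minutes(target_percent: float,
                         agreed_hours: float = HOURS_PER_YEAR) -> float:
    """Largest downtime (in minutes) still compatible with an availability target."""
    return agreed_hours * 60 * (1 - target_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {max_downtime_minutes(target):.1f} minutes per year")
```

Each additional “nine” cuts the permitted downtime by a factor of ten: 99.99% allows only about 52.6 minutes a year, which helps explain why such targets are expensive to meet.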

Where the cost of very high availability is justified, the design of the service will include highly reliable components, resilience, and minimal or no planned downtime.

Having considered the importance of availability to the business, in the following sections we examine some of the key availability management activities and concepts that the IT service provider may employ to cut downtime and thus deliver the required availability to the business, enabling it to achieve its business objectives.

Availability Concepts

Availability management comprises both reactive and proactive activities, as shown in Figure 8.4. The reactive activities include regular monitoring of service provisions involving extensive data gathering and reporting of the performance of individual components and processes and the availability delivered by them. Event management is often used to monitor components because this speeds up the identification of any issues through the setting of alert thresholds. It may even be possible to restart the failing service automatically, possibly before the break has been noticed by the customers. Instances of downtime are investigated, and remedial actions are taken to prevent a recurrence. The proactive activities include identifying and managing risks to the availability of the service and implementing measures to protect against such an occurrence. Where protective measures have been put in place to provide resilience in the event of component failure, the measures require regular testing to ensure that they work as designed to protect the service availability. All new or changed services should be subject to continual service improvement; countermeasures should be implemented wherever they can be cost justified. This cost justification requires an understanding of the vital business functions and the cost to the business of any downtime. It is ultimately a business decision, not a technical decision. Figure 8.4 also shows the availability management information system (AMIS); this is the repository for all availability management reports, plans, risk registers, and so on, and it forms part of the service knowledge management system (SKMS).

Chart shows availability management process with reactive activities (example: monitor, measure, analyze report, etc.), proactive activities (example: risk assessment and management, etc.) and AMIS (example: availability management reports, plan, etc.).

FIGURE 8.4 The availability management process

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.


Reliability

The first availability concept we cover is reliability. This is defined by ITIL as “a measure of how long a service, component, or CI can perform its agreed function without interruption.” We normally describe how reliable an item is by stating how frequently it can be expected to break down within a given time: “My car is very reliable. It has broken down only twice in five years.” We measure reliability by calculating the mean (or average) time between failures (MTBF) or the mean (or average) time between service incidents (MTBSI).
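ITIL guidance commonly expresses these reliability measures as MTBSI = agreed service time / number of breaks and MTBF = (agreed service time - total downtime) / number of breaks. The Python sketch below applies those formulas to a hypothetical month; the function names and figures are illustrative assumptions.

```python
def mtbsi_hours(agreed_service_hours: float, breaks: int) -> float:
    """Mean time between service incidents: agreed service time / number of breaks."""
    if breaks <= 0:
        raise ValueError("at least one break is needed to compute MTBSI")
    return agreed_service_hours / breaks


def mtbf_hours(agreed_service_hours: float, total_downtime_hours: float,
               breaks: int) -> float:
    """Mean time between failures: the average uptime between breaks."""
    if breaks <= 0:
        raise ValueError("at least one break is needed to compute MTBF")
    return (agreed_service_hours - total_downtime_hours) / breaks


# Hypothetical month: 720 agreed hours, 3 breaks totaling 6 hours of downtime.
print(mtbsi_hours(720, 3))    # 240.0
print(mtbf_hours(720, 6, 3))  # 238.0
```

MTBSI equals MTBF plus the mean downtime per break (here 238 + 2 = 240 hours).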



Reliability of a service can be improved first by ensuring that the components specified in the design are of good quality and from a supplier with a good reputation. Even the best components will fail eventually; however, the reliability of the service can be improved by designing the service so that a component failure does not result in downtime. This is another availability concept called resilience. By ensuring that the design includes alternate network routes, for example, a network component failure will not lead to service downtime because the traffic will reroute. Carrying out planned maintenance to ensure that all the components are kept in good working order will also help improve reliability.
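Resilience can be quantified under simplifying assumptions. If a path stays up as long as at least one of several redundant components works, failures are independent, and failover is automatic and immediate, the group is unavailable only when every component is down at once. A hypothetical Python sketch:

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant components where ANY working one keeps the path up.

    The group fails only if all components are down at the same time,
    assuming independent failures and instant, automatic failover.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical: two alternate network routes, each 99% available.
print(f"{parallel_availability([0.99, 0.99]):.2%}")
```

Two routes that are each 99% available give roughly 99.99% for the pair, provided the independence and failover assumptions actually hold.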


Maintainability

However reliable the equipment and resilient the design, not all downtime can be prevented. When a fault occurs and there is insufficient resilience in the design to prevent it from affecting the service, the length of the downtime that results can be affected by how quickly the fault can be overcome. This is called maintainability and is measured as the mean time to restore service (MTRS). It may be more cost-effective to concentrate on resilience measures for those items that have a long service restoration time. To calculate MTRS, divide the total downtime by the total number of failures.
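The MTRS calculation is simple enough to express directly. In this hypothetical Python sketch, each entry is the full downtime of one failure, measured from the fault occurring until the service is usable again.

```python
def mtrs(downtimes_minutes: list[float]) -> float:
    """Mean time to restore service: total downtime / number of failures.

    Each entry is the customer-visible downtime of one failure.
    """
    if not downtimes_minutes:
        raise ValueError("no failures recorded")
    return sum(downtimes_minutes) / len(downtimes_minutes)


# Hypothetical month with three failures.
print(mtrs([90, 20, 10]))  # 40.0
```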


Simple measures can be taken to reduce MTRS, such as having common spares available on site, and these measures can have a significant impact on availability.

ITIL recommends the use of MTRS rather than mean time to repair (MTTR) because repair may or may not include the restoration of the service following the repair. From the customer perspective, downtime includes all the time between the fault occurring and the service being fully usable again. MTRS measures this complete time and is therefore a more meaningful measurement.

These concepts are illustrated in Figure 8.5, which shows what ITIL calls the expanded incident lifecycle. It illustrates periods of uptime with incidents causing periods of downtime. MTRS is shown as the average downtime per incident, and MTBF as the average uptime between incidents.

Image shows expanded incident lifecycle with incident start, uptime, service available, downtime, service unavailable, process (diagnose, recover, detect, repair, and restore), and time: MTBSI and MTBF.

FIGURE 8.5 The expanded incident lifecycle

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

Each incident needs to be detected, diagnosed, and repaired, and the data needs to be recovered and the service restored. Any method of shortening any of these steps—speeding up detection through event management or speeding up diagnosis by the use of a knowledge base, for example—will shorten the downtime and improve availability. The figure also shows another concept: MTBSI, which calculates the average time from the start of one incident to the start of the next and is used as a measure of reliability.

Serviceability

Serviceability is defined as the ability of a third-party supplier to meet the terms of its contract. This contract will include agreed levels of availability, reliability, and/or maintainability for a supporting service or component.

In Figure 8.6, you can see the terms and measures used in availability management, which are combined when applied to suppliers providing serviceability.

Flowchart of availability terms and measures, in order: business (customers), service level agreements, services (A, B, and C), IT systems, MTBF and MTRS, internal support teams, and suppliers.

FIGURE 8.6 Availability terms and measures

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.


Measurement of Availability

The term vital business function (VBF) is used to reflect the part of a business process that is critical to the success of the business. Generally, the more vital the business function, the greater the level of resilience and availability that needs to be incorporated into the design of the supporting IT services. The availability requirements for all services, vital or not, should be determined by the business and not by IT.

Certain vital business functions may need special designs, which commonly include the following:

High Availability A characteristic of the IT service that minimizes or masks the effects of IT component failure to the users of a service.

Fault Tolerance The ability of an IT service, component, or configuration item to continue to operate correctly after failure of a component part.

Continuous Operation An approach or design to eliminate planned downtime of an IT service. Individual components or configuration items may be down even though the IT service remains available.

Continuous Availability An approach or design to achieve 100 percent availability. A continuously available IT service has no planned or unplanned downtime.

Within the IT industry, many suppliers commit to high availability or continuous availability solutions but only if specific environmental standards and resilient processes are used. They often agree to such contracts only after additional, sometimes costly, improvements have been made.

The availability management process depends heavily on the measurement of service and component achievements with regard to availability.

The decision on what to measure and how to report it depends on which activity is being supported, who the recipients are, and how the information is to be used. It is important to recognize the differing perspectives of availability from the business, users, and service providers to ensure that measurement and reporting satisfy these varied needs.

The business perspective considers IT service availability in terms of its contribution to, or impact on, the vital business functions that drive the business operation.

The user perspective considers IT service availability as a combination of three factors. These are the frequency, the duration, and the scope of impact. For many applications, poor response times for the user are considered at the same level as failures of technology.

The IT service provider perspective considers IT service and component availability with regard to availability, reliability, and maintainability.

It is important to consider the full scope of measures needed to report the same level of availability in different ways to satisfy the differing perspectives of availability. Measurements need to be meaningful and add value. This is influenced strongly by the combination of “what you measure” and “how you report it.”

Availability Management Process, Methods, and Techniques

We have explored the concepts and measures used in the availability management process. Figure 8.4 showed the key elements of the process, including the availability management information system. A number of different techniques can be used for availability management; we will examine them now.

Expanded Incident Lifecycle

We looked at the expanded incident lifecycle briefly earlier, in Figure 8.5, when considering availability concepts. This technique requires the analysis of the lifecycle of an incident from start to finish and the period between an incident and the next outage. Outages may not always be preventable; the availability management process seeks not only to avoid downtime, but also to minimize its duration and impact when it does occur. The duration of the downtime can be reduced by analyzing how long each step in the process takes, and then exploiting any opportunities to shorten that stage. Let’s look at each of the incident stages in turn:

Incident Detection This is the time that elapses between the incident occurring and the IT service provider becoming aware of the failure. Event management tools can be very helpful in this regard, sending notification of failure, possibly before the users have become aware themselves. This enables the fault to be addressed quickly, thus reducing downtime and improving availability. Tools may have the capability to diagnose the fault and to automatically recover the service. Further information about event management is covered in Chapter 3, “Event Management, Request Fulfillment, and Access Management.”

Incident Diagnosis This is the time at which the cause of the fault has been identified. Faster diagnosis will reduce downtime and increase availability. Again, some monitoring tools can assist in gathering the necessary diagnostic data to help in problem resolution. Gathering this data may occasionally delay restoration of the service, but it will enable the root cause of repeated failures to be identified and removed, thus reducing the overall downtime.

Incident Repair This is the time at which the repair has been implemented. This may be impacted by the design of the component, so care should be taken in the service design stage to choose components that can be repaired quickly. It may also be impacted by the performance of internal teams or suppliers responsible for carrying out the repair actions. The relevant operational level agreements and underpinning contracts should be monitored by service level management and supplier management for breaches, and service improvement plans should be put in place to prevent such breaches in the future.

Incident Recovery This is the time at which component recovery has been completed. The backup and recovery requirements for the hardware, software, and data components should be identified as early as possible within the design cycle to enable appropriate recovery plans to be drawn up and tested. Wherever possible, recovery actions should be automated. Availability requirements should also contribute to determining which spare parts are kept within the definitive spares area. This is a storage area set aside for the secure storage of spare components and assemblies that are maintained at the same revision level as the systems within the controlled test or live environment; the spares are used for testing new services in the transition stage and to replace faulty equipment in the operation stage. We discuss definitive spares further in Chapter 12, “Change Management and Service Asset and Configuration Management.”

Incident Restoration This is the time at which normal business service is resumed. It is important that the ability to work normally is verified before the incident is closed. This may be verified by the service desk by talking to the service users. If the service is one used by the public, such as an ATM or web commerce site, visual checks of transaction throughput or user simulation scripts that validate the end-to-end service may be necessary.

By looking at each stage of the expanded incident lifecycle, we can identify delays and take action to reduce them in the future. For example, an outage that lasted an hour although the repair itself took only 5 minutes points to delays elsewhere, such as time wasted before the fault was identified because of poorly configured event management tools, overlong diagnosis, lack of skills, or lack of documentation. Availability management needs to work in close association with incident and problem management to ensure that repeat occurrences are eliminated.
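The stage-by-stage analysis can be sketched by recording a timestamp for each milestone of an incident and computing how long each stage took. The milestone names, the dictionary layout, and the sample times below are hypothetical; they reproduce the example of an hour-long outage with a five-minute repair.

```python
from datetime import datetime

# Hypothetical milestone names, in expanded incident lifecycle order.
STAGES = ["occurred", "detected", "diagnosed", "repaired", "recovered", "restored"]


def stage_durations(milestones: dict[str, datetime]) -> dict[str, float]:
    """Minutes taken to reach each milestone from the previous one."""
    durations = {}
    for previous, current in zip(STAGES, STAGES[1:]):
        delta = milestones[current] - milestones[previous]
        durations[current] = delta.total_seconds() / 60
    return durations


incident = {
    "occurred":  datetime(2024, 3, 4, 9, 0),
    "detected":  datetime(2024, 3, 4, 9, 25),
    "diagnosed": datetime(2024, 3, 4, 9, 45),
    "repaired":  datetime(2024, 3, 4, 9, 50),
    "recovered": datetime(2024, 3, 4, 9, 55),
    "restored":  datetime(2024, 3, 4, 10, 0),
}
durations = stage_durations(incident)
print(durations)
print("longest stage:", max(durations, key=durations.get))  # detected
```

Here detection accounts for 25 of the 60 minutes, which points at event management configuration, rather than the repair itself, as the first area to improve.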

Fault Tree Analysis

This approach uses Boolean logic, using AND and OR statements, to analyze the sequence of events that lead to a failure. These events may be characterized as basic, resulting, conditional, or trigger events. The AND statement means that all the input events must occur simultaneously for the resulting event to take place (for example, both the primary and failover lines must be down for the network to be down). The OR statement means that the resulting event occurs if any of the input events occurs. There is also an INHIBIT statement, under which the resulting event occurs only when the input condition is not met. Figure 8.7 gives an example of a fault tree and these statements.

Chart of a fault tree analysis example: “service down” results from an INHIBIT gate whose inputs are “system down” and “outside service hours”; “system down” results from an OR gate over server down, application down, and network down.

FIGURE 8.7 Fault tree analysis example

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

In Figure 8.7, the service is down if the network, the server, or the application is down (an example of the OR statement). The network is down if both the primary and failover lines are down (an example of the AND statement). Finally, the service would not be considered down if the fault(s) occurred outside service hours (an example of the INHIBIT statement).
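The gates in Figure 8.7 map directly onto Boolean functions. The following minimal Python sketch evaluates that tree for one set of basic events; the function and parameter names are hypothetical, and only the three gates described above are modeled.

```python
def and_gate(*inputs: bool) -> bool:
    """AND: the resulting event occurs only if all input events occur."""
    return all(inputs)


def or_gate(*inputs: bool) -> bool:
    """OR: the resulting event occurs if any input event occurs."""
    return any(inputs)


def inhibit_gate(event: bool, condition: bool) -> bool:
    """INHIBIT: the event propagates only when the condition is NOT met."""
    return event and not condition


def service_down(server_down: bool, application_down: bool,
                 primary_line_down: bool, failover_line_down: bool,
                 outside_service_hours: bool) -> bool:
    """Evaluate the Figure 8.7 fault tree for one combination of basic events."""
    network_down = and_gate(primary_line_down, failover_line_down)
    system_down = or_gate(server_down, application_down, network_down)
    return inhibit_gate(system_down, outside_service_hours)


# Only the primary line has failed: the failover line keeps the network up.
print(service_down(False, False, True, False, False))  # False
# Both lines have failed during service hours: the service is down.
print(service_down(False, False, True, True, False))   # True
# The server has failed, but outside service hours: not counted as downtime.
print(service_down(True, False, False, False, True))   # False
```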

Component Failure Impact Analysis

Component failure impact analysis (CFIA) is a technique that considers the importance of an individual component to the provision of service. It enables the impact of a component failure on services to be predicted and, in particular, identifies any single points of failure (SPOFs). This, in turn, indicates where resilience or risk reduction measures are required to protect availability. Combined with other techniques, this approach can provide useful information for the design of future services.

CFIA is a relatively simple technique that can be used to analyze all aspects of the IT infrastructure and applications, such as hardware, network, software, applications, data centers, and support staff. Additionally, it can identify impact and dependencies on IT support organization skills and staff competencies.

The technique involves identifying which elements of the IT infrastructure configuration are to be assessed, and then creating a grid with CIs on one axis and the IT services that have a dependency on the CI on the other, as shown in Figure 8.8.

Image shows PC1 and PC2 connected to switch 1. WAN connects to data center and environment (contains server 1 connected to disk system 1 and 2, system software, and application).

FIGURE 8.8 Example of component failure impact analysis

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

The next step is to perform the CFIA and populate the grid for each component with a blank where the failure of the CI does not impact the service, an “X” when the failure of the CI would bring down the service, an “A” when there is an alternative CI to provide the service, and an “M” when there is an alternative CI but it requires manual intervention to recover the service.

The completed grid will highlight the critical CIs. Actions can then be taken to provide resilience and protect availability.
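The grid is easy to represent as data. The sketch below uses hypothetical services and cell values loosely modeled on the CIs in Figure 8.8 (they are not the figure's actual grid) and lists the CIs marked “X”, which are the single points of failure.

```python
# Hypothetical CFIA grid: rows are CIs, columns are services.
# "" = no impact, "X" = failure stops the service,
# "A" = alternative CI takes over automatically,
# "M" = alternative CI exists but recovery needs manual intervention.
SERVICES = ["Service 1", "Service 2"]
GRID = {
    "PC1":             ["M", "M"],
    "PC2":             ["M", "M"],
    "Switch 1":        ["X", "X"],
    "WAN":             ["X", "X"],
    "Data center":     ["X", "X"],
    "Server 1":        ["X", "X"],
    "Disk system 1":   ["A", "A"],
    "Disk system 2":   ["A", "A"],
    "System software": ["X", "X"],
    "Application":     ["X", ""],
}


def single_points_of_failure(grid: dict[str, list[str]]) -> list[str]:
    """CIs whose failure on its own brings down at least one service."""
    return [ci for ci, row in grid.items() if "X" in row]


def services_lost(grid: dict[str, list[str]], services: list[str], ci: str) -> list[str]:
    """Services that stop if the given CI fails."""
    return [svc for svc, cell in zip(services, grid[ci]) if cell == "X"]


print(single_points_of_failure(GRID))
print(services_lost(GRID, SERVICES, "Application"))  # ['Service 1']
```

The disk systems marked “A” drop out of the list because an alternative takes over automatically; every remaining entry is a candidate for resilience measures.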

Service Failure Analysis

Service failure analysis (SFA) is a structured approach to the analysis of an interruption. Each time an interruption takes place, full analysis is undertaken as an assignment or project to identify a preventive action. It takes a holistic view, looking for improvements in technology, the IT support organization, processes, procedures, and tools. Many of its activities are closely aligned with those of problem management. Both processes aim to identify root causes and encourage cross-functional teamwork, lateral thinking, and innovative, and often inexpensive, solutions.

SFA should follow a structured approach, as shown in Figure 8.9.

Structured SFA starts with select opportunity, scope assignment, plan assignment, build hypotheses, analyze data, interview key personnel, findings and conclusions, recommendations, report, and validation.

FIGURE 8.9 The structured approach to SFA

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

The report should categorize the recommendations under the following headings:

Detection Actions to enable better event reporting to ensure that underlying IT service issues are detected early to enable a proactive response

Reduction Actions to minimize the user impact from IT service interruption, possibly by reducing the duration of the impact

Avoidance Actions to eliminate this particular cause of IT service interruption

Risk Analysis and Management

Risk analysis and management assesses the likelihood that availability risks will have a business impact. Business impact analysis, which identifies the potential impact on the business, is a vital part of risk management, and identifying mitigations for risks is a key component of the design of services. Further information on risk management techniques can be found in the relevant ISO standards and in other best practice frameworks such as the Management of Risk framework. One common approach to risk assessment is the one described in ISO 31000, which consists of three steps:

  • Risk identification creates a comprehensive list of risks that could impact the service.
  • Risk analysis involves developing a full understanding of the risks.
  • Risk evaluation decides which risks require action and the relative priorities among them.
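Risk evaluation needs some way of ranking risks. ISO 31000 does not mandate a particular scoring method; the likelihood × impact score, the 1 to 5 scales, and the action threshold below are one common convention, assumed here for illustration, as are the sample risks.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def evaluate(risks: list[Risk], action_threshold: int = 10) -> list[Risk]:
    """Return the risks that require action, highest priority first."""
    needing_action = [r for r in risks if r.score >= action_threshold]
    return sorted(needing_action, key=lambda r: r.score, reverse=True)


register = [
    Risk("Single WAN link to data center", likelihood=3, impact=5),
    Risk("Untested failover procedure", likelihood=4, impact=4),
    Risk("Book recommendation feature outage", likelihood=4, impact=1),
]
for risk in evaluate(register):
    print(risk.score, risk.description)
```

The recommendation feature falls below the threshold because, as in the bookshop example earlier, it does not support a vital business function.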

Possible actions to address the risks may include the following:

  • Avoiding the risk by deciding not to start or continue with the risky activity
  • Deciding to take the risk in order to benefit from an opportunity
  • Removing the risk
  • Taking action to make the risk less likely to occur
  • Lessening the impact of failure

Availability Management Triggers, Inputs, Outputs, and Interfaces

We will now review the triggers, inputs, and outputs of availability management and how the process interfaces with other service management processes.

Triggers

Many events may trigger availability management activity, including the following:

  • New or changed business needs or new or changed services
  • New or changed targets within agreements, such as service level requirements, service level agreements, operational level agreements, and contracts
  • Service or component breaches, availability events, and alerts, including threshold events and exception reports
  • Periodic activities such as reviewing, revising, or reporting against services
  • Review of availability management forecasts, reports, and plans
  • Review and revision of business and IT plans and strategies
  • Review and revision of designs and strategies
  • Recognition or notification of a change of risk or impact of a business process, a vital business function, an IT service, or a component
  • Request from service level management for assistance with availability targets and explanation of achievements

Inputs

A number of sources of information are relevant as inputs to the availability management process. Some of these are as follows:

  • Business information from the organization’s business strategy, plans, and financial plans and information on its current and future requirements, including the availability requirements for new or enhanced IT services
  • Service information from the service level management process, with details of the services from the service portfolio and the service catalog; from service level targets within service level agreements and service level requirements; and possibly from the monitoring of SLAs, service reviews, and breaches of the SLAs
  • Financial information from financial management for IT services, the cost of service provision, and the cost of resources and components
  • Change and release information from the change management process with a change schedule, the release schedule from release and deployment management, and an assessment of all changes for their impact on service availability
  • Configuration information from service asset and configuration management on the relationships between the business, the services, the supporting services, and the technology
  • Component information on the availability, reliability, and maintainability requirements for the technology components that underpin IT service(s)
  • Technology information from the configuration management system
  • Past performance from previous measurements, achievements, reports, and the availability management information system (AMIS)
  • Unavailability and failure information from incidents and problems

Outputs

Availability management produces the following outputs:

  • The availability management information system (AMIS)
  • The availability plan for the proactive improvement of IT services and technology
  • Availability and recovery design criteria and proposed service targets for new or changed services
  • Service availability, reliability, and maintainability reports of achievements against targets, including input for all service reports
  • Component availability, reliability, and maintainability reports of achievements against targets
  • Revised risk assessment reviews and reports and an updated risk register
  • Monitoring, management, and reporting requirements for IT services and components
  • An availability management test schedule for testing all availability, resilience, and recovery mechanisms
  • The planned and preventive maintenance schedules
  • Contributions to the projected service outage (PSO) document to be created by change management in collaboration with release and deployment management
  • Details of the proactive availability techniques and measures that will be deployed
  • Improvement actions for inclusion within the service improvement plan

Availability Management Interfaces

As you would expect for this process, there are a number of interfaces across the lifecycle. In fact, availability management can be linked to the majority of the service management processes. However, the key interfaces that availability management has with other processes are as follows:

Service Level Management This process relies on availability management to determine and validate availability targets and to investigate and resolve service and component breaches. It links to both the reactive and proactive elements of availability management.

Incident and Problem Management As you have seen from the techniques used in availability measurement and management, these processes are assisted by availability management in the resolution of incidents and problems.

Capacity Management This provides appropriate capacity to support resilience and overall service availability. Strong connections exist between the availability of a service and the capacity of the service. Patterns of business activity and user profiles are used to understand business demand for IT for business-aligned availability planning.

Change Management This process supports the implementation of changes arising from investigations into outages or from improvements required by the business. The change schedule is in turn used in the creation of the projected service outage (PSO) document, which projects the availability-related issues during a change, with contributions from availability management.

IT Service Continuity Management Availability management works collaboratively with this process on the assessment of business impact and risk and the provision of resilience, fail-over, and recovery mechanisms. A continuity invocation is the result of an availability management issue that cannot be resolved within the agreed time frames without additional resources as described in the recovery plan.

Information Security Management Put simply, if the data becomes unavailable, the service becomes unavailable. Information security management defines the security measures and policies that must be included in the service design for availability and the design for recovery.

Access Management Availability management provides the methods for appropriately granting and revoking access to services as needed. This should be carefully monitored because unauthorized or uncontrolled access can be a significant risk to service availability.

Information Management in Availability Management

The availability management process stresses the importance of an availability management information system. Although this is shown in the process diagram (Figure 8.4) as a single database or repository, it is far more likely that the information relating to availability is captured and resides in a number of different tools and systems. The challenge, for an availability manager, is to understand these disparate sources and create a single information source that enables the production of the availability plan.
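To make the consolidation challenge concrete, here is a minimal Python sketch that normalizes outage records from two tools into one common format. Both input formats (a semicolon-separated monitoring export and an incident-tool record with `svc`, `opened`, and `down_secs` fields) are hypothetical; real tools will differ, which is why adaptation is always needed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Outage:
    service: str
    start: datetime
    minutes: float


def from_monitoring_export(line: str) -> Outage:
    # Hypothetical tool A format: "service;YYYY-MM-DD HH:MM;duration in minutes"
    service, start, minutes = line.split(";")
    return Outage(service.strip(),
                  datetime.strptime(start.strip(), "%Y-%m-%d %H:%M"),
                  float(minutes))


def from_incident_tool(record: dict) -> Outage:
    # Hypothetical tool B format: ISO timestamp, duration in seconds
    return Outage(record["svc"],
                  datetime.fromisoformat(record["opened"]),
                  record["down_secs"] / 60)


def consolidate(export_lines, incident_records):
    """Merge both sources into one chronologically ordered list."""
    outages = [from_monitoring_export(line) for line in export_lines]
    outages += [from_incident_tool(rec) for rec in incident_records]
    return sorted(outages, key=lambda o: o.start)


def downtime_by_service(outages):
    """Total minutes of downtime per service across all sources."""
    totals = {}
    for o in outages:
        totals[o.service] = totals.get(o.service, 0.0) + o.minutes
    return totals


merged = consolidate(
    ["Email; 2024-03-02 10:00; 30"],
    [{"svc": "Email", "opened": "2024-03-01T09:00:00", "down_secs": 900}],
)
print(downtime_by_service(merged))
```

In practice the same outage may be reported by both tools, so a real consolidation step would also need rules for de-duplication.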

Despite claims by suppliers of availability management tools, it is unlikely that the unique requirements of an individual customer can be met by a generic toolset. Customization, adaptation, and configuration to meet the customer requirements will always be required, and the information obtained must be managed so that it is fit for use and purpose. This information, covering services, components, and supporting services, provides the basis for regular, ad hoc, and exception availability reporting and the identification of trends within the data for the instigation of improvement activities.

The availability plan should have aims, objectives, and deliverables and should consider the wider issues of people, processes, tools, and techniques as well as have a technology focus. As the availability management process matures, the plan should evolve to cover the following:

  • Actual levels of availability versus agreed levels of availability for key IT services. Availability measurements should always be business- and customer-focused and report availability as experienced by the business and users.
  • Activities being progressed to address shortfalls in availability for existing IT services. Where investment decisions are required, options with associated costs and benefits should be included.
  • Details of changing availability requirements for existing IT services. The plan should document the options available to meet these changed requirements. Where investment decisions are required, the associated costs of each option should be included.
  • Details of the availability requirements for forthcoming new IT services. The plan should document the options available to meet these new requirements. Where investment decisions are required, the associated costs of each option should be included.
  • A forward-looking schedule for the planned SFA assignments.
  • Regular reviews of SFA assignments. These reviews should be completed to ensure that the availability of technology is being proactively improved in conjunction with the SIP.
  • A technology futures section to provide an indication of the potential benefits and exploitation opportunities that exist for planned technology upgrades. Anticipated availability benefits should be detailed, where possible based on business-focused measures, in conjunction with capacity management. Where possible, the effort required to realize these benefits should also be quantified.
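The first item in the plan, actual versus agreed availability, is usually based on ITIL's basic availability calculation: availability (%) = (agreed service time − downtime) / agreed service time × 100. The sketch below applies it; the 720-hour month, the 3.6 hours of downtime, and the 99.9% target are hypothetical figures. Downtime should be counted as the business and users experienced it, not as individual components reported it.

```python
def availability_pct(agreed_service_time: float, downtime: float) -> float:
    """ITIL's basic availability measure: (AST - downtime) / AST x 100."""
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    return (agreed_service_time - downtime) / agreed_service_time * 100


def compare_to_target(actual: float, agreed: float) -> str:
    """Express actual versus agreed availability for the availability plan."""
    gap = actual - agreed
    status = "met" if gap >= 0 else "shortfall"
    return f"actual {actual:.2f}% vs agreed {agreed:.2f}% ({status}, {gap:+.2f} points)"


# Hypothetical month: a 24x7 service over 30 days gives 720 hours of AST
actual = availability_pct(720, 3.6)
print(compare_to_target(actual, 99.9))
```

A shortfall reported this way is exactly the kind of gap the second item in the plan (activities to address shortfalls) would then need to explain.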

Covering a period of six months to a year, this plan is often produced as a rolling plan, continually updated to meet the changing needs of the business. At a minimum, it is recommended that publication is aligned with the capacity and business budgeting cycle and that the availability plan is considered complementary to the capacity plan and financial plan. Frequency of updates will depend on the nature of the organization and the rate of technological or business change.

The availability management information system can be utilized to record and store selected data and information required to support key activities such as report generation, statistical analysis, and availability forecasting and planning. It should be the main repository for the recording of IT availability metrics, measurements, targets, and documents, including the availability plan, availability measurements, achievement reports, SFA assignment reports, design criteria, action plans, and testing schedules.


Availability Management Process Roles

As stated earlier in this chapter, Chapter 1 explored the generic roles applicable to all processes throughout the service lifecycle. These are relevant to the availability management process and are similar to the capacity management roles we discussed earlier in this chapter, but once again specific additional requirements also apply. Remember that these are not “job titles”; they are guidance on the roles that may be needed to successfully run the process.

Availability Management Process Owner

The availability management process owner’s responsibilities typically include the following:

  • Carrying out the generic process owner role for the availability management process (see Chapter 1 for more detail)
  • Working with other managers to ensure acceptance of the availability management process as the single point of coordination for all availability-related issues, regardless of the specific technology involved
  • Working with other process owners to ensure an integrated approach to the design and implementation of availability management, service level management, capacity management, IT service continuity management, and information security management

Availability Management Process Manager

The availability management process manager’s responsibilities typically include the following:

  • Carrying out the generic process manager role for the availability management process (see Chapter 1 for more detail)
  • Coordinating interfaces between availability management and other processes, especially service level management, capacity management, IT service continuity management, and information security management
  • Ensuring that all existing and new services deliver the levels of availability agreed to by the business in SLAs
  • Validating that the final design meets the minimum specified levels of availability
  • Assisting with the investigation and diagnosis of all incidents and problems that cause availability issues or unavailability of services or components
  • Participating in the IT infrastructure design, including specifying the availability requirements for hardware and software
  • Specifying the requirements for new or enhanced event management systems for automatic monitoring of availability of IT components
  • Specifying the reliability, maintainability, and serviceability requirements for components supplied by internal and external suppliers
  • Monitoring and reporting actual IT availability achieved against SLA targets to ensure that agreed levels of availability, reliability, and maintainability are measured and monitored on an ongoing basis
  • Proactively improving service availability wherever possible, and optimizing the availability of the IT infrastructure to deliver cost-effective improvements that deliver tangible benefits to the business
  • Creating, maintaining, and regularly reviewing an availability management information system and a forward-looking availability plan aimed at improving the overall availability of IT services and infrastructure components, to ensure that existing and future business availability requirements can be met
  • Ensuring that the availability management process and its associated techniques and methods are regularly reviewed and audited, that they are all subject to continual improvement, and that they remain fit for purpose
  • Creating availability and recovery design criteria to be applied to new or enhanced infrastructure design
  • Working with financial management for IT services, ensuring that the levels of IT availability required are cost-justified
  • Maintaining and completing an availability testing schedule for all availability mechanisms, ensuring that all availability tests and plans are exercised after every major business change
  • Assisting security and IT service continuity management with the assessment and management of risk
  • Assessing changes for their impact on all aspects of availability, including overall service availability and the availability plan (this includes attending CAB meetings when appropriate)

Availability Management Critical Success Factors and Key Performance Indicators

This section includes some sample critical success factors for availability management. There are many more, and they can be obtained from the ITIL Service Design publication, or from your own experience within your organization.

  • Critical success factor: “Manage availability and reliability of IT service”
    • KPI: Reduction (measured as a percentage) in the unavailability of services and components
    • KPI: Increase (measured as a percentage) in the reliability of services and components
    • KPI: Effective review and follow-up of all SLA, OLA, and underpinning contract breaches relating to availability and reliability
  • Critical success factor: “Satisfy business needs for access to IT services”
    • KPI: Reduction (measured as a percentage) in the unavailability of services
    • KPI: Reduction (measured as a percentage) of the cost of business overtime due to unavailable IT
  • Critical success factor: “Availability of IT infrastructure and applications, as documented in SLAs, provided at optimum costs”
    • KPI: Reduction (measured as a percentage) in the cost of unavailability
    • KPI: Improvement (measured as a percentage) in the service delivery costs
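Several of these KPIs are expressed as a percentage change between two measurement periods. A minimal sketch follows; the choice of baseline period is an assumption each organization must make, and the figures (12 versus 9 hours of unavailability per quarter, and a mean time between failures rising from 400 to 500 hours as one possible reliability measure) are hypothetical.

```python
def pct_change(baseline: float, current: float) -> float:
    """Percentage change from baseline to current (negative means a fall)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def pct_reduction(baseline: float, current: float) -> float:
    """Percentage reduction, e.g. in hours of unavailability or its cost."""
    return -pct_change(baseline, current)


# Hypothetical quarters: unavailability fell from 12 to 9 hours
print(f"Reduction in unavailability: {pct_reduction(12, 9):.1f}%")
# Hypothetical reliability measure: MTBF rose from 400 to 500 hours
print(f"Increase in reliability (MTBF): {pct_change(400, 500):.1f}%")
```

The same two functions cover both the "reduction" and "increase" style KPIs; only the sign convention differs.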

Availability Management Challenges and Risks

We’ll begin by looking at the key challenges for the process.

Challenges

The main challenge is to meet and manage the expectations of the customers and the business. The service levels should be publicized to all customers and areas of the business so that when services do fail, the expectation for their recovery is at the right level. It also means that availability management must have access to good-quality information on the current business need for IT services and on the business’s plans for the future.

Another challenge facing availability management is integrating all of the availability data into a single set of information (the AMIS) that can be analyzed consistently to provide details on the availability of all services and components. This is particularly difficult when, as often happens, the information from different technologies is provided by different tools in different formats.

Yet another challenge facing availability management is the investment needed in proactive availability measures. Availability management should work closely with ITSCM, information security management, and capacity management in producing the justifications necessary to secure the appropriate investment.

Risks

The following major risks are among those associated with availability management:

  • A lack of commitment from the business to the availability management process
  • A lack of appropriate information on future plans and strategies from the business
  • A lack of senior management commitment to or a lack of resources and/or budget for the availability management process
  • Labor-intensive reporting processes
  • Processes that focus too much on the technology and not enough on the services and the needs of the business

Another risk is evident when the availability management information system is maintained in isolation and is not shared or consistent with other process areas, especially ITSCM, information security management, and capacity management. This interaction is particularly important when considering the necessary service and component backup and recovery tools, technology, and processes to meet the specified needs.

Information Security Management

Another of the key warranty aspects of a service is security, and it is this aspect that we will discuss in this final section of the chapter. A service that is insecure will not deliver value to the customer and indeed may not be used by the customer at all.

Central to information security management (ISM) is the identification and mitigation of risks to the security of the organization’s information. The ISM process ensures that all security aspects are considered and managed throughout the service lifecycle.


Organizations operate under an overall corporate governance framework, and information security management forms part of this framework. In accordance with organization-wide governance, ISM provides guidance as to what is required, ensuring that risks are managed and the objectives of the organization are achieved.

Purpose of Information Security Management

The purpose of the information security management process is to align IT security with business security. IT and business security requires that the confidentiality, integrity, and availability of the organization’s assets, information, data, and IT services always match the specified needs of the business.

Objectives of Information Security Management

The objective of information security management is to protect the interests of those relying on information. It should also ensure that the systems and communications that deliver the information are protected from harm resulting from failures of confidentiality, integrity, and availability.

For most organizations, the security objective is met when the following terms are fulfilled:

  • Confidentiality, where information is observed by or disclosed to only those who have a right to know.
  • Integrity, where information is complete, accurate, and protected against unauthorized modification.
  • Availability, where information is available and usable when required and the systems that provide it can appropriately resist attacks and recover from or prevent failures.
  • Business transactions, as well as information exchanges between enterprises or with partners, can be trusted. This is referred to as authenticity and nonrepudiation (the assurance that a party cannot later deny having sent or received a transaction).

Scope of Information Security Management

The scope of ISM includes all aspects of information security that are important to the business. It is the responsibility of the business to define what requires protection and how strong this protection should be. Risks to security must be recognized, and appropriate countermeasures should be implemented. These may include physical aspects (restricting access to secure areas through swipe cards) as well as technical aspects (password policies, use of biometrics, and so on). Information security is an integral part of corporate governance.

The information security management process should be the focal point for all IT security issues. A key responsibility of the process is the production of an information security policy that is maintained and enforced and covers the use and misuse of all IT systems and services.

Information security management needs to understand the total IT and business security environment. Important aspects that must be included in the policy are the business security policy and plans along with the current business operation and its security requirements. Consideration must also be given to future business plans and requirements. External factors, such as legislative and regulatory requirements, should also be included in the policy.

IT’s obligations and responsibilities with regard to security should be contained within the service level agreements with its customers. The policy should also include reference to the business and IT risks and their management.

The information security management process should include the production, maintenance, distribution, and enforcement of an information security policy and supporting security policies. This will involve understanding the current and future security requirements of the business and the existing business security policy and plans.

The process will be responsible for implementation of a set of security controls that support the information security policy. This will support the management of risks associated with access to services, information, and systems. Information security management is responsible for the documentation of all security controls together with the operation and maintenance of the controls and their associated risks.

In association with supplier management, the process will also address the management of suppliers and contracts regarding access to systems and services.

Operationally, information security management will be involved in the management of all security breaches, incidents, and problems associated with all systems and services. It will also be responsible for the proactive improvement of security controls and security risk management and the reduction of security risks.

Information security management is also responsible for the integration of security aspects within all other IT service management processes. To achieve effective information security governance, the process must establish and maintain an information security management system (ISMS).

Information Security Management Value to the Business

Security has become a critical issue for organizations as their reliance on IT systems increases and more electronic media is used for confidential transactions within and between organizations.

Information security management ensures that an information security policy that fulfills the needs of the business security policy and the requirements of corporate governance is maintained and enforced. The information security policy provides assurance of business processes by enforcing appropriate security controls in all areas of IT. The process is responsible for the management of IT risk in line with business and corporate risk management processes and guidelines.

Information Security Management Policies

Information security management activities should be focused on and driven by an overall information security policy and a set of underpinning specific security policies.

The information security policy should have the full support of the top executive IT management. Ideally, the top executive business management should also be in support of and committed to the security policy. The policy should cover all areas of security, be appropriate, and meet the needs of the business. Email usage policies, antivirus policies, and remote access policies are examples of specific security policies.

The information security process is responsible for creating, managing, and maintaining an information security management system. The elements of the management system are shown in Figure 8.10. It begins with the identification of the customer requirements and business needs.

The figure shows the ISMS as a cycle of Plan (service and operational level policies), Implement (physical and network security), Evaluate (internal and external audits), and Maintain (routine), with Control at the center.

FIGURE 8.10 Elements of an ISMS for managing IT security

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

Planning the system incorporates use of the details and targets captured in the various agreements and contracts. It also covers use of the various policies agreed to by the business and IT.

Implementation of the system requires awareness of the policies and the systems by all who are affected by them. This will need the engagement of all parts of the organization because the policies will cover everything from personnel security to the procedures for security incidents.

The next stage is evaluation, which requires internal and external audits of the state of system security and may also include self-assessments. Security incidents are also evaluated as part of this stage of the management of the system.

Maintaining the system requires that the information security process capture the lessons learned so that improvements can be planned and implemented.

The overall approach is designed to maintain control and establish a framework for managing security throughout the organization. Part of this will be to allocate appropriate responsibilities for ensuring that the information security management system is maintained, within both the IT department and the rest of the organization.

IT Security Management Process Activities, Methods, and Techniques

In this section we are going to explore the process in detail. You should make sure you are familiar with all the aspects of the process and the management requirements for each. In Figure 8.11, you can see the information security management process and its techniques and activities.

The figure shows the interconnected activities of the process (produce and maintain security policies; communicate and enforce security policies; assess; impose and review security controls; and monitor, report, and review incidents), all of which feed into the security management information system (SMIS).

FIGURE 8.11 Information security management process

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

The information security management process ensures that the security aspects are appropriately managed and controlled in line with business needs and risks.

Process Activities

A key activity within the information security management process is the production and maintenance of an overall information security policy and a set of supporting specific policies. The process is also responsible for the communication, implementation, and enforcement of the security policies, including the provision of advice and guidance to all other areas of the business and IT on all issues related to information security.

Information security management is also responsible for the assessment and classification of all information assets and documentation. The process covers the implementation, review, revision, and improvement of a set of security controls as well as risk assessment and responses, including assessment of the impact of all changes on information security policies, controls, and measures. Where possible, if it is in the business’s interest and the cost is justifiable, the process should implement proactive measures to improve information security.
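Assessment and classification of information assets is often based on how much harm a loss of confidentiality, integrity, or availability would cause. ITIL does not prescribe a scheme, so the sketch below borrows the high-water-mark idea from US federal guidance (FIPS 199 and FIPS 200), in which an asset's overall classification is the highest of its three impact ratings. The three-level scale and the example assets are assumptions for illustration.

```python
from enum import IntEnum


class Impact(IntEnum):
    # Assumed three-level scale; organizations often define their own
    LOW = 1
    MODERATE = 2
    HIGH = 3


def classify(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    """High-water mark: the asset takes the highest of its three ratings."""
    return max(confidentiality, integrity, availability)


# Hypothetical assets and their confidentiality/integrity/availability ratings
assets = {
    "Payroll database": (Impact.HIGH, Impact.HIGH, Impact.MODERATE),
    "Public website content": (Impact.LOW, Impact.MODERATE, Impact.MODERATE),
}
for name, ratings in assets.items():
    print(name, classify(*ratings).name)
```

The high-water mark is deliberately conservative: a single high rating is enough to demand the strongest controls for the whole asset.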

Monitoring and management of all security breaches and major security incidents is a key part of the information security management process. This includes the analysis, reporting, and reduction of the volume and impact of security breaches and incidents.

The process is also responsible for scheduling and completing security reviews, audits, and penetration tests. The outputs from the process will be captured and recorded in the security management information system.

Security Strategy

The information security management process, together with the procedures, methods, tools, and techniques, constitutes the security strategy. It is the responsibility of the security manager to ensure that technologies, products, and services are in place, together with the published security policy. The security manager is also responsible for security architecture, authentication, authorization, administration, and recovery.

A key challenge for information security is to embed good security practices into every area of the business. Ensuring secure behavior depends on training and awareness. Security practices need to be easy to follow if they are to be accepted. As technology changes, whether it is the trend toward bring your own device (BYOD) or computing in the cloud, new security challenges emerge. Information security management must understand these challenges and prepare to meet them.

Security Controls

Information security must be considered as an integral part of all services and systems. It needs to be continuously managed using a set of security controls that enforce the policy and minimize threats. These controls should be considered during the design of new services or changes to existing services; this is much easier and more cost effective than trying to apply them later when the service is live. Once the controls are in place, the day-to-day activities required will usually be carried out by access management. This process is covered in Chapter 3.

Security measures can be used at a specific stage in the prevention and handling of security incidents, as illustrated in Figure 8.12. The majority of such incidents are not technical threats, such as deliberate denial-of-service attacks or attempts to hack into the systems. Most breaches are the result of human error and may even be accidental, or they may involve other types of threat, such as safety, legal, or health threats.

The figure shows the sequence of threat, incident, damage, and control, with prevention, detection, and correction measures applied between successive steps and evaluation or reporting at each stage.

FIGURE 8.12 Security controls for threats and incidents

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

A threat can be anything that disrupts business processes or has negative impact on the business. First there is a risk of a threat actually occurring. Should this happen, it is termed a security incident. This may result in damage (to information or to assets) that has to be repaired or otherwise corrected. Each of these stages requires the appropriate measures to be taken. The choice of measures will depend on the importance attached to the information. These measures should all be documented in the information security management system. They include the following types:

Preventive These security measures aim to prevent a security incident from occurring. An example of this type of measure is restricting access rights to a limited group of authorized people. This requires procedures to control access rights (granting, maintenance, and withdrawal of rights), authorization (identifying who is allowed access to which information and using which tools), identification and authentication (confirming who is seeking access), and access control (ensuring that only authorized personnel can gain access).

Reductive These measures reduce the potential damage of a security breach. One example is ensuring that regular backups are taken in case of any data loss. Ensuring that tested contingency plans are in place is another reductive measure.

Detective It is essential that any security breach is detected quickly; detective measures are concerned with discovering breaches as soon as possible. One example is the use of monitoring tools, linked to an alert procedure. Another example is virus-checking software, which will detect any virus infection.

Repressive These security measures counteract any continuation or repetition of the security incident. One example is when an account or network address is temporarily blocked after numerous failed attempts to log on or when a card is retained when multiple attempts are made with a wrong PIN.

Corrective These measures seek to repair any damage caused. Examples include restoring the backup, or backing out/rolling back to a previous stable situation.
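The repressive measure described above (temporarily blocking an account after numerous failed logon attempts) can be sketched in Python as follows. This is a minimal illustration, not how any particular access management tool works: the `LockoutTracker` class, the threshold of three failures, and the 15-minute block are hypothetical values chosen for the example; real values would come from the organization's security policy. Time is passed in explicitly, in seconds, so the behavior is easy to follow.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; real values are set by the security policy.
MAX_FAILED_ATTEMPTS = 3
LOCKOUT_SECONDS = 15 * 60


@dataclass
class LockoutTracker:
    """Repressive measure: temporarily block an account after repeated failed logons."""
    failures: dict = field(default_factory=dict)      # account -> consecutive failures
    locked_until: dict = field(default_factory=dict)  # account -> time the block expires

    def is_locked(self, account: str, now: float) -> bool:
        return self.locked_until.get(account, 0.0) > now

    def record_attempt(self, account: str, success: bool, now: float) -> bool:
        """Record a logon attempt and return True only if the logon is allowed."""
        if self.is_locked(account, now):
            # While blocked, even correct credentials are refused.
            return False
        if success:
            # A good logon resets the failure counter.
            self.failures.pop(account, None)
            return True
        count = self.failures.get(account, 0) + 1
        self.failures[account] = count
        if count >= MAX_FAILED_ATTEMPTS:
            # Block further attempts; start counting afresh once the block expires.
            self.locked_until[account] = now + LOCKOUT_SECONDS
            self.failures[account] = 0
        return False
```

For example, after failed attempts at times 0, 1, and 2, `record_attempt("alice", True, 3)` returns `False` because the account is blocked, while the same call at `2 + LOCKOUT_SECONDS + 1` returns `True`. Resetting the counter when the block is applied is a design choice; a stricter policy might lengthen the block on each repetition.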

Management of Security Breaches and Incidents

Whenever a serious security breach or incident occurs, it must be handled at the time, but also reviewed afterward to determine what went wrong, what caused it, and how it can be prevented in the future. This evaluation should not be restricted to serious security incidents, however; all breaches of security and security incidents need to be studied to understand the effectiveness or otherwise of the security measures as a whole. Every security incident must be logged as such to enable reporting and analysis. This analysis will also require other evidence, such as log files and audit files, in addition to the security incident record. Information security management should work with problem management to identify the root cause of these incidents and to take the required improvement actions to overcome weaknesses and prevent such incidents from recurring.
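To illustrate why every security incident should be logged, not only the serious ones, here is a minimal Python sketch that analyzes logged incidents by cause so that recurring weaknesses can be passed to problem management. The `SecurityIncident` record, its fields, and the `recurring_causes` helper are hypothetical; a real security incident record would live in the incident management tool and be supplemented by log and audit files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityIncident:
    """A logged security incident (illustrative fields, not an ITIL schema)."""
    incident_id: str
    cause: str      # e.g. "human error", "malware", "unauthorized access"
    serious: bool


def recurring_causes(incidents: list, min_count: int = 2) -> list:
    """Return (cause, count) pairs seen at least min_count times, most frequent first.

    All incidents are counted, serious or not, so that the effectiveness of the
    security measures as a whole can be assessed.
    """
    counts = Counter(incident.cause for incident in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]
```

With three logged incidents, two caused by human error (one minor, one serious) and one by malware, `recurring_causes` returns `[("human error", 2)]`, flagging a candidate for root cause investigation that would be missed if only serious incidents were reviewed.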

Information Security Management Triggers, Inputs, and Outputs

Let’s consider the triggers, inputs, and outputs for the information security management process. Information security management is a process that has many active connections throughout the organization and its processes. It is important that the triggers, inputs, outputs, and interfaces be clearly defined to avoid duplicated effort or gaps in workflow.

Triggers

Information security management activity can be triggered by many events, including these:

  • Changes in statutory or regulatory requirements
  • New or changed corporate governance guidelines
  • New or changed business security policy
  • New or changed corporate risk management processes and guidelines
  • New or changed business needs and new or changed services
  • New or changed requirements within agreements, such as service level requirements, service level agreements, operational level agreements, and contracts
  • Review and revision of business and IT plans and strategies
  • Review and revision of designs and strategies
  • Service or component security breaches or warnings, events, and alerts, including threshold events and exception reports
  • Periodic activities such as reviewing, revising, and reporting, including reviewing and revising information security management policies, reports, and plans
  • Recognition or notification of a change of risk or impact of a business process or vital business functions, an IT service, or a component
  • Requests from other areas, particularly service level management, for assistance with security issues

Inputs

Information security management will need to obtain input from many areas:

  • Business information from the organization’s business strategy, plans and financial plans, and information on its current and future requirements
  • Governance and security from corporate governance and business security policies and guidelines, security plans, and risk assessment and responses
  • IT information from the IT strategy, plans, and current budgets
  • Service information from the SLM process with details of the services from the service portfolio
  • Risk assessment processes and reports from ISM, availability management, and ITSCM
  • Details of all security events and breaches—from all areas of IT and IT service management, especially incident management and problem management
  • Change information from the change management process
  • The configuration management system containing information on the relationships between the business, the services, supporting services, and the technology
  • Details of partner and supplier external access to services and systems from supplier management and availability management

Outputs

The following outputs are produced by the information security management process and used in all areas:

  • An overall information security management policy, together with a set of specific security policies
  • A security management information system (SMIS) containing all the information related to information security management
  • Revised security risk assessment processes and reports
  • A set of security controls with details of their operation and maintenance and their associated risks
  • Security audits and audit reports
  • Security test schedules and plans, including security penetration tests and other security tests and reports
  • A set of security classifications and a set of classified information assets
  • Reviews and reports of security breaches and major incidents
  • Policies, processes, and procedures for managing partners and suppliers and their access to services and information

Information Security Management Interfaces

The key interfaces that information security management has with other processes are as follows:

Service Level Management Information security management provides assistance with determining security requirements and responsibilities and their inclusion within SLRs and SLAs, together with the investigation and resolution of service and component security breaches.

Access Management This process is responsible for following the security policies when granting and revoking access.

Change Management Information security management (ISM) assesses every change for impact on security and security controls. ISM may also detect and report unauthorized changes that resulted from security breaches.

Incident and Problem Management ISM assists with the resolution of security incidents and investigation of security problems. Incident management must be able to recognize and deal with security incidents.

IT Service Continuity Management Information security management works with ITSCM on the assessment of business impact and risk, and the provision of resilience, fail-over, and recovery mechanisms. Security must also be considered when continuity plans are tested or invoked.

Service Asset and Configuration Management An accurate CMS is a prerequisite for security classification of configuration items.

Availability Management ISM ensures that the integrity of data is protected; without this assurance, the ability of the service to perform its agreed function is compromised.

Capacity Management Security aspects must be considered when selecting and introducing new technology.

Financial Management for IT Services This process should ensure that adequate funds are provided to finance security requirements.

Supplier Management Security management works with supplier management to define contractual terms and conditions and enforce controls over supplier access to services and systems.

Legal and Human Resources Issues As stated earlier, security breaches are often the result of human actions, accidental or deliberate. ISM activity should therefore be integrated with these corporate processes and functions.

Information Management in Information Security

All the information required by information security management should be contained within the security management information system (SMIS). This includes information regarding security controls, risks, breaches, processes, and reports, covering all IT services and components. The SMIS should be integrated and maintained in alignment with all other management information systems, particularly the service portfolio and the CMS. The SMIS provides the input to security audits and reviews and to the continual improvement activities. The SMIS also provides input to the design of new systems and services.

Information Security Process Roles

As you will recall, Chapter 1 explored the generic roles applicable to all processes throughout the service lifecycle. These are relevant to the information security management process and are similar to the capacity and availability management roles we discussed earlier in this chapter, but once again specific additional requirements also apply. Remember that these are not “job titles”; they are guidance on the roles that may be needed to successfully run the process.

Information Security Management Process Owner

The information security management process owner’s responsibilities typically include the following:

  • Carrying out the generic process owner role for the information security management process
  • Working with the business to ensure proper coordination and communication between organizational (business) security management and information security management
  • Working with managers of all functions to ensure acceptance of the information security management process as the single point of coordination for all information security–related issues, regardless of the specific technology involved
  • Working with other process owners to ensure an integrated approach to the design and implementation of information security management, availability management, IT service continuity management, and organizational security management

Information Security Management Process Manager

The information security management process manager’s responsibilities typically include the following:

  • Carrying out the generic process manager role for the information security management process
  • Coordinating interfaces between information security management and other processes, especially service level management, availability management, IT service continuity management, and organizational security management
  • Developing and maintaining the information security policy and supporting specific policies, ensuring appropriate authorization, commitment, and endorsement from senior IT and business management
  • Communicating and publicizing the information security policy to all appropriate parties
  • Ensuring that the information security policy is enforced and adhered to
  • Identifying and classifying IT and information assets (configuration items) and the level of control and protection required
  • Assisting with business impact analyses
  • Performing security risk assessment and risk management in conjunction with availability and IT service continuity management
  • Designing security controls and developing security plans
  • Developing and documenting procedures for operating and maintaining security controls
  • Monitoring and managing all security breaches and handling security incidents, taking remedial action to prevent recurrence wherever possible
  • Reporting, analyzing, and reducing the impact and volumes of all security incidents in conjunction with problem management
  • Promoting education and awareness of security
  • Maintaining a set of security controls and documentation, and regularly reviewing and auditing all security controls and procedures
  • Ensuring that all changes are assessed for impact on all security aspects, including the information security policy and security controls, and attending CAB meetings when appropriate
  • Ensuring that security tests are performed as required
  • Participating in any security reviews arising from security breaches and instigating remedial actions
  • Ensuring that the confidentiality, integrity, and availability of the services are maintained at the levels agreed to in the SLAs and that they conform to all relevant statutory requirements
  • Ensuring that all access to services by external partners and suppliers is subject to contractual agreements and responsibilities
  • Acting as a focal point for all security issues

Critical Success Factors and Key Performance Indicators for Information Security Management

The following list includes some sample critical success factors for information security management.

  • Critical success factor: “The protection of business against security violations”
    • KPI: Decrease (measured as a percentage) in security breaches reported to the service desk
    • KPI: Decrease (measured as a percentage) in the impact of security breaches and incidents
  • Critical success factor: “The determination of a clear policy, integrated with the needs of the business”
    • KPI: Decrease in the number of nonconformances of the information security management process with the business security policy and process
  • Critical success factor: “Effective marketing and education in security requirements, and IT staff awareness of the technology supporting the services”
    • KPI: Increased awareness throughout the organization of the security policy and its contents
    • KPI: Increase (measured as a percentage) in completeness of supporting services against the IT components that make up those services
  • Critical success factor: “Clear ownership and awareness of the security policies among the customer community”
    • KPI: Increase (measured as a percentage) in acceptable scores on security awareness questionnaires completed by customers and users
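Several of the KPIs above are expressed as a percentage increase or decrease between measurement periods. A minimal sketch of that arithmetic follows; the `percentage_change` helper and the figures used with it are hypothetical.

```python
def percentage_change(previous: float, current: float) -> float:
    """Percentage change from the previous period to the current one.

    A negative result is a decrease (e.g. fewer security breaches reported).
    """
    if previous == 0:
        raise ValueError("the previous period's value must be non-zero")
    return (current - previous) * 100 / previous
```

With a hypothetical 40 breaches reported to the service desk last quarter and 30 this quarter, `percentage_change(40, 30)` gives `-25.0`, a 25 percent decrease. Note that a percentage change in a score differs from a change in percentage points: acceptable questionnaire scores rising from 60 to 75 percent is a 15-point rise but a 25 percent increase.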

Challenges for Information Security Management

One of the biggest challenges is to ensure that there is adequate support from the business, business security, and senior management. It is pointless to implement security policies, procedures, and controls in IT if they cannot be enforced throughout the business. The major use of IT services and assets is outside IT, and so are the majority of security threats and risks.

If a business security process is established, then the challenge becomes alignment and integration. Once there is alignment, the challenge becomes keeping the two aligned by managing and controlling changes to business methods and IT systems through strict change management and service asset and configuration management. Again, this requires support and commitment from the business and from senior management.

Risks for Information Security Management

Information systems can generate many direct and indirect benefits—and as many direct and indirect risks. This means that there are new risk areas that could have a significant impact on critical business operations:

  • Increasing requirements for availability and robustness
  • Growing potential for misuse and abuse of information systems affecting privacy and ethical values
  • External dangers from hackers, leading to denial-of-service and virus attacks, extortion, industrial espionage, and leakage of organizational information or private data
  • A lack of commitment from the business
  • A lack of senior management commitment
  • The processes focusing too much on the technology issues and not enough on the IT services and the needs and priorities of the business
  • Risk assessment and management performed in isolation and not in conjunction with availability management and ITSCM
  • Information security management policies, plans, risks, and information becoming out of date and losing alignment with the corresponding relevant information and plans of the business and business security
  • Security policies becoming bureaucratic and/or excessively difficult to follow, discouraging compliance
  • Security policies adding no value to business

Summary

This chapter explored the processes of capacity, availability, and information security management. It covered the purpose and objectives for each process in addition to the scope.

We looked at the value of the processes. Then we reviewed the policies for each process and the activities, methods, and techniques, and the specific roles for each process.

Last, we reviewed triggers, inputs, outputs, and interfaces for each process and the information management associated with it. We also considered the critical success factors and key performance indicators and the challenges and risks for the processes.

We examined how each of these processes supports the others and the importance of these processes to the business and the IT service provider.

Exam Essentials

Understand the purpose and objectives of availability, capacity, and information security management. It is important for you to be able to explain the purpose and objectives of these processes. Availability management should ensure that the required availability is delivered to meet the targets in the service level agreement. Capacity management is concerned with the current and future capacity of services to the business. Information security management is concerned with the protection of information and data according to the security requirements of the business.

Understand the critical success factors and key performance indicators for the processes. Measurement of the processes is an important part of understanding their success. You should be familiar with the CSFs and KPIs for capacity, availability, and information security management.

Understand the definition of availability. ITIL defines availability as the ability of an IT service or other configuration item to perform its agreed function when required. Any unplanned interruption to a service during its agreed service hours (also called the agreed service time, specified in the service level agreement) is defined as downtime. The availability measure is calculated by subtracting the downtime from the agreed service time and converting it to a percentage of the agreed service time.
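The availability calculation described above can be written out as a small Python function. The function name is hypothetical, and the example assumes a service whose agreed service time is 10 hours a day, 5 days a week, for four weeks (200 hours); both arguments must use the same unit.

```python
def availability_percent(agreed_service_time: float, downtime: float) -> float:
    """Availability % = (agreed service time - downtime) / agreed service time * 100."""
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_time - downtime) * 100 / agreed_service_time
```

With 2 hours of unplanned downtime in that period, `availability_percent(200, 2)` returns `99.0`. Downtime outside the agreed service hours is not counted, which is why the agreed service time, rather than elapsed calendar time, is the denominator.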

Explain the different concepts of availability management. You need to be able to differentiate between reliability, maintainability, and serviceability. Reliability is defined by ITIL as “a measure of how long a service, component, or CI can perform its agreed function without interruption.” Maintainability is measured as the mean time to restore service (MTRS). Serviceability is defined as the ability of a third-party supplier to meet the terms of its contract. This contract will include agreed levels of availability, reliability, and/or maintainability for a supporting service or component.
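Maintainability and reliability are usually expressed as averages over the service breaks in a period. The sketch below assumes the formulas given in ITIL's Service Design publication: MTRS is total downtime divided by the number of service breaks, and mean time between failures (MTBF, the usual measure of reliability) is available time divided by the number of breaks. The function names and figures are hypothetical.

```python
def mtrs(outage_durations: list) -> float:
    """Maintainability: mean time to restore service = total downtime / number of breaks."""
    if not outage_durations:
        raise ValueError("no service breaks recorded; MTRS is undefined")
    return sum(outage_durations) / len(outage_durations)


def mtbf(agreed_service_time: float, outage_durations: list) -> float:
    """Reliability: mean time between failures = available time / number of breaks."""
    if not outage_durations:
        raise ValueError("no service breaks recorded; MTBF is undefined")
    # Available time is the agreed service time minus the downtime.
    available_time = agreed_service_time - sum(outage_durations)
    return available_time / len(outage_durations)
```

For two breaks of 1.5 and 0.5 hours in 200 hours of agreed service time, `mtrs([1.5, 0.5])` returns `1.0` and `mtbf(200, [1.5, 0.5])` returns `99.0`.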

Understand and differentiate between the methods and techniques of availability management. A number of different techniques can be used for availability management. Ensure that you are familiar with each of them and can explain the purpose of each.

Explain the role of information management in availability management. Information is key to the service lifecycle, so you need to understand the content of the availability management information system and its use throughout the lifecycle.

Understand the iterative activities of capacity management. Capacity management has both proactive and reactive activities. These include monitoring, tuning, and analysis, which may be carried out as part of a proactive or reactive approach.
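The iterative monitoring and analysis activities can be illustrated with a short Python sketch: a reactive check that flags periods where utilization exceeded a threshold, and a proactive trend analysis that estimates how many periods remain before the threshold is reached. The function names, the thresholds, and the simple linear trend are assumptions for illustration; in practice, trend analysis is only one of several modelling approaches used in capacity planning.

```python
import math
from typing import List, Optional


def threshold_exceptions(samples: List[float], threshold: float) -> List[int]:
    """Reactive: indexes of periods whose utilization exceeded the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def periods_until_threshold(samples: List[float], threshold: float) -> Optional[int]:
    """Proactive: periods until the threshold is reached, assuming a linear trend.

    Returns 0 if the latest sample is already at or above the threshold,
    or None if utilization is flat or falling.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    latest = samples[-1]
    if latest >= threshold:
        return 0
    # Average growth per period between the first and last samples.
    growth = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth <= 0:
        return None
    return math.ceil((threshold - latest) / growth)
```

For hypothetical monthly peak CPU utilization of 50, 55, 60, and 65 percent, `periods_until_threshold([50, 55, 60, 65], 80)` returns `3`, giving roughly three months to tune the component or plan an upgrade before the threshold is breached.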

Understand the subprocesses of capacity management. Business capacity management is concerned with the business requirements and understanding business needs. Service capacity management is concerned with the capacity of services to fulfill the needs of the business. Component capacity management is concerned with the technical aspect of capacity management and the capacity of individual service components.

Understand the approach to security management. Plan, implement, evaluate, maintain, and control—ensure that you can explain how each of these stages supports the approach to the management of information security.

Understand the process of information security management. Ensure that you are able to explain the various steps of the process and their relationship to the information security management system.

Review Questions

You can find the answers to the review questions in the appendix.

  1. Which of the following are responsibilities of capacity management?

    1. Negotiating capacity requirements to be included in the SLA
    2. Monitoring capacity
    3. Forecasting capacity requirements
    4. Dealing with capacity issues
      A. 2, 3, and 4
      B. 1 and 2 only
      C. All of the above
      D. 1, 2, and 4
  2. Capacity management includes three subprocesses. What are they?

    A. Service capacity, business capacity, component capacity
    B. System capacity, business capacity, component capacity
    C. Service capacity, business capacity, configuration capacity
    D. System capacity, business capacity, infrastructure capacity
  3. Which of the following shows the correct description for the business capacity management subprocess?

    A. It considers the capacity of staff resources to support new services.
    B. It provides a view of the detailed information relating to the performance management of technical assets.
    C. It provides a view of the future plans and requirements of the organization.
    D. It provides a view of the service performance achieved in the operational environment.
  4. True or False? Capacity management has both reactive and proactive activities.

    A. True
    B. False
  5. Which of these statements is/are correct?

    1. Risk management is a vital part of both capacity and information security management.
    2. Both capacity management and information security management are cyclic processes.
      A. Statement 1 only
      B. Statement 2 only
      C. Both statements
      D. Neither statement
  6. Which of these is the key purpose of the information security management process?

    A. Create and maintain an information security policy
    B. Deliver guidance to the operational processes on security issues
    C. Support supplier management in maintaining security concerns in contracts
    D. Manage the information security management information system
  7. Which of these statements is/are correct?

    1. Plan is an element of the information security management system.
    2. Maintain is an element of the information security management system.
      A. Statement 1 only
      B. Statement 2 only
      C. Both statements
      D. Neither statement
  8. Where does information security management keep information about security?

    A. ISDB
    B. IMSS
    C. KEDB
    D. SMIS
  9. Which of the following concepts are key to availability management?

    1. Reliability
    2. Resilience
    3. Resistance
    4. Attainability
    5. Serviceability
    6. Maintainability
    7. Detectability
      A. 1, 2, 6, 7
      B. 2, 3, 5, 6
      C. 1, 4, 6, 7
      D. 1, 2, 5, 6
  10. Availability management considers VBFs. What does VBF stand for?

    A. Viable business factors
    B. Vital business function
    C. Visibility, benefits, functionality
    D. Vital business facilities
