Chapter 36
Service Operation Processes: Event Management

THE FOLLOWING ITIL INTERMEDIATE EXAM OBJECTIVES ARE DISCUSSED IN THIS CHAPTER:

✓ Event management is discussed in terms of its
- Purpose
- Objectives
- Scope
- Value
- Policies
- Principles and basic concepts
- Process activities, methods, and techniques
- Triggers, inputs, outputs, and interfaces
- Critical success factors and key performance indicators
- Challenges
- Risks

Modern infrastructure management depends to a large extent on the use of event monitoring tools. These tools are able to monitor large numbers of configuration items simultaneously, identifying any issues as soon as they arise and notifying technical management staff. The process of event management is responsible for managing events throughout their lifecycle. Event management is one of the main activities of IT operations.

Definitions

To begin, let’s consider some definitions from the ITIL Service Operation publication. These should be familiar from your Foundation course.

An event can be defined as any change of state that has significance for the management of a configuration item (CI) or IT service. Remember, an event is not necessarily an indication that something is wrong; it can merely be a confirmation that the system is working correctly. Many events are purely informational. Informational events could include notification of a user logging onto an application (significant because the use of the application may be metered) or a transaction completing successfully (significant because the notification of the successful completion may trigger the start of the next transaction).

An event that notifies staff of a failure or that a threshold has been breached is called an alert. An alert could be, for example, notification that a server has failed or a warning that the memory or disk usage on a device has exceeded 75 percent. If you consider these concepts in a non-IT environment, a car console may issue an event to say that the system has successfully connected to a Bluetooth device, or it might raise an alert (together with a beep or flashing light) to warn that a threshold has been breached and the car is now low on gas.

Effective service operation is dependent on knowing the status of the infrastructure and detecting any deviation from normal or expected operation. Event management monitors services for any occurrences that could affect their performance. It also provides information to other processes, including incident, problem, and change management.

There are two types of event monitoring tools:

Active monitoring tools monitor configuration items or IT services through regular automated checks to discover their current status. The tool sends a message, such as a ping, to a device and expects a positive response within a defined time. This is called polling, and it is done to check that devices are working correctly. Support staff are notified if a device fails to respond. Some tools have automated responses to such situations, such as restarting a device or rerouting data to avoid the faulty CI so that the service is not affected.
Passive monitoring tools do not send out polling messages. They detect events generated by CIs and correlate them; that is, they identify related events. They rely on an alert or notification to discover the current status. Such notifications could include error messages.
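The following is a minimal Python sketch of the two styles. It is not how any particular tool works. The `probe` callable stands in for a ping or similar check, and the CI names and the notification fields `ci` and `message` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical probe: returns the response time in seconds, or None if the CI never replied.
Probe = Callable[[str], Optional[float]]


def active_poll(cis: List[str], probe: Probe, timeout_s: float) -> List[str]:
    """Active monitoring: send a check to every CI and report those that
    did not give a positive response within the defined time."""
    failed = []
    for ci in cis:
        response_time = probe(ci)
        if response_time is None or response_time > timeout_s:
            failed.append(ci)
    return failed


def passive_collect(notifications: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Passive monitoring: nothing is sent. Notifications pushed by CIs are
    received and grouped by the CI that raised them (a very simple way of
    identifying related events)."""
    related: Dict[str, List[str]] = {}
    for note in notifications:
        related.setdefault(note["ci"], []).append(note["message"])
    return related


# Usage with a fake probe standing in for a real ping
replies = {"router-01": 0.02, "switch-07": None, "server-12": 3.5}
print(active_poll(list(replies), replies.get, timeout_s=1.0))
# ['switch-07', 'server-12']
print(passive_collect([{"ci": "ups-02", "message": "on battery"},
                       {"ci": "ups-02", "message": "battery low"}]))
# {'ups-02': ['on battery', 'battery low']}
```

The active check needs a timeout because silence is itself the failure signal. The passive collector, by contrast, learns nothing about a CI that fails without sending a notification.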

Purpose

The purpose of event management is to detect events, understand what they mean, and take action if necessary.

Many devices are designed to communicate their status, and event monitoring will gather these communications and act upon any that need action. Some communications report operational information, such as “backup of file complete,” “print complete,” and so on. These events show that the service is operating correctly. They can be used to automate routine activities such as submitting the next file to be backed up or the next document to be printed. They may also be used to monitor the load across several devices, issuing automated instructions to balance the load depending on the events received. If the event is an alert, such as “backup failed,” “printer jam,” or “disk full,” the necessary corrective steps will be taken. An incident should be logged in the case of a failure.

Objectives

Event management has the following objectives:

It enables all significant changes of state for a CI or service to be detected. Event management should determine the appropriate control action for each event and ensure that events are communicated as necessary.
The process provides the trigger for the automatic execution of many service operation processes and operations management activities. For example, a notification of a failure in the infrastructure would trigger the incident management process. By providing information when thresholds have been breached (for example, when a service has failed to respond within the agreed time), an event enables service level management to compare the actual operating performance against the SLA. The actual performance can also be compared to what was expected and planned for during the design stage.
It triggers automated processes or activities in response to certain events. This may include automatically logging an incident in the service management tool in the event of a failure.
Finally, the data gathered by event management forms the basis for service assurance and reporting and for comparing performance before and after a service improvement has been implemented.

Scope

Event management can be applied to any aspect of service management that needs to be controlled and that could benefit from being automated. For example, the service management toolset automatically logs incidents in response to emails or events being received, escalates incidents when thresholds have been reached, and notifies staff of certain conditions (for example, a priority one incident being logged).

Configuration items can be monitored by event management tools; this monitoring can be for two different reasons:

Some CIs will be monitored to make sure they are constantly available. An example of this is where action needs to be taken as soon as a CI such as a network device fails to respond to a ping.
Other CIs may have a status that needs to be updated frequently. Event management can automate this, and the CMS can be updated automatically to show the new state.

Tracking licenses is another possible use for event management tools; licenses can be tracked to make sure there is no illegal use of an application by checking to see that the number of people using the software does not exceed the licenses held. This may also save money; by showing that there is less demand for concurrent use than was thought, the number of licenses can be reduced.
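The arithmetic behind license tracking is simple: replay the login and logout events and find the peak number of concurrent users. The sketch below assumes the tool can emit timestamped `login`/`logout` events. The event format and function names are illustrative only.

```python
from typing import List, Tuple

# Each usage event is (timestamp, kind), where kind is "login" or "logout".
UsageEvent = Tuple[float, str]


def peak_concurrent_users(events: List[UsageEvent]) -> int:
    """Replay login/logout events in time order and return the highest
    number of simultaneous users seen."""
    current = peak = 0
    # At equal timestamps, process logouts before logins (False sorts before True)
    for _, kind in sorted(events, key=lambda e: (e[0], e[1] == "login")):
        current += 1 if kind == "login" else -1
        peak = max(peak, current)
    return peak


def license_position(events: List[UsageEvent], licenses_held: int) -> str:
    """Compare peak concurrent use with the number of licenses held."""
    peak = peak_concurrent_users(events)
    if peak > licenses_held:
        return f"non-compliant: peak {peak} exceeds {licenses_held} licenses"
    if peak < licenses_held:
        return f"possible saving: peak {peak} of {licenses_held} licenses"
    return "licenses match peak demand"


events = [(1, "login"), (2, "login"), (3, "logout"), (3, "login"), (5, "logout")]
print(license_position(events, licenses_held=5))
# possible saving: peak 2 of 5 licenses
```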

Monitoring for and responding to security events, such as detecting intruders, is another use; the tools can also be used to detect a denial of service attack or similar event.

Another use is the monitoring of environmental conditions. This might be for detecting a sudden increase in temperature in the server room or for other environmental changes.

Using Event Management to Preempt a Major Incident

A large transport organization installed event monitoring across its infrastructure, including monitoring the server room environments. A screen showing current events was installed at the service desk. On the second day after this was implemented, its value was proved. The service desk called the head office 150 miles away to ask the staff there to check the server room, because there were environmental alerts showing on the screen. The head office staff entered the server room to find that the air conditioning had failed and the room was extremely warm. Had the temperature increased much more, the servers would have failed, causing major disruption to the services. The head office staff members were able to avert the incident by using fans to lower the temperature until the air-conditioning engineer arrived to fix the fault.

Value

Event management offers many benefits to a business:

Extensive monitoring can be carried out without requiring a lot of staff. Using staff to watch for errors that occur only occasionally would not make the best use of their skills.
Errors are identified faster, enabling an automated response, which reduces downtime and its resulting costs to the business.
Near-capacity situations are identified early enough to allow time for action.
Event monitoring removes the need for repeated checks to be carried out on devices; it reduces effort by requiring responses only to exceptions.
Event management, like other automation, takes place constantly, whereas staff may have other duties and may therefore miss something. It is also less error prone.
Event management provides historical data to enable the identification of trends and potential problems.

Policies

Next, we consider suggested policies for event management.

The first policy states that event notifications should go to only those who have responsibility for acting on them. This means that a target audience must be identified for every event that we have chosen to handle—it is not acceptable to send a notification to everyone and hope that someone will do something.

The second policy relates to the centralization of event management. This ensures that notifications are handled consistently, that none are missed, and that none are handled by more than one person or team. It implies that a single rules engine will be used to process notifications, and of course that set of rules should be subject to change management.

The third policy provides guidance and constraints for the designers of new applications. There should be a common set of standards for events generated by applications. This will ensure consistency across applications and, of course, reduce the time to engineer event handling in new applications.

The next policy is that the handling of events should be automated as much as possible. The advantages of automation in general are well-known: reduced costs, fewer errors, and so on.

The fifth policy mandates the use of a standard classification scheme for events to ensure that similar types of events are handled in a consistent way.

The last policy states that all recognized events should at the very least be logged. This will provide a source of valuable information that might have a number of uses, for example, in problem investigation. A more sophisticated analysis of logged events might identify patterns of events that can be used to predict failures before they actually occur.
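Several of these policies can be seen together in a small Python sketch. A single central rule set maps each handled event class to one responsible group. Every recognized event is logged, and events with no responsible group are logged but not broadcast. The event classes and team names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical central rule set (itself subject to change management).
# Each handled event class has exactly one responsible target audience.
ROUTING_RULES: Dict[str, str] = {
    "network.device_unreachable": "network-support",
    "storage.disk_nearly_full": "storage-team",
    "security.unknown_device": "security-operations",
}


def process_event(event: Dict[str, str], event_log: List[Dict[str, str]]) -> Optional[str]:
    """Log every recognized event, then return the single group that should be
    notified. None means the event is logged only."""
    event_log.append(event)                     # policy: all recognized events are logged
    return ROUTING_RULES.get(event["class"])    # policy: notify only those who must act


event_log: List[Dict[str, str]] = []
print(process_event({"class": "storage.disk_nearly_full", "ci": "fs-03"}, event_log))
# storage-team
print(process_event({"class": "app.user_login", "ci": "crm"}, event_log))
# None
```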

Principles and Basic Concepts

It is important to understand the difference between monitoring and event management. The two activities are closely related, but each has a different emphasis.

Monitoring and Event Management

We need to monitor events, but monitoring covers more than events. Monitoring can be used, for example, to make sure devices are operating correctly even when no events are being generated. Monitoring also looks for conditions that do not generate events.

Event management is about having useful notifications about the status of the IT infrastructure and services. Event management sets up rules to ensure that events are generated so they can be monitored, captured, and acted upon if necessary. Action is the key to event management.

The particular notifications themselves may be vendor specific. However, they are likely to use Simple Network Management Protocol (SNMP), which is an Internet standard protocol for managing devices on IP networks. Devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks, and more. Because SNMP is an open standard, it makes interaction between different products simpler. Events must generate useful notifications. The time taken to create meaningful descriptions, with suggested actions, will save enormous effort later.

Event management can be enormously useful in managing large and complex infrastructures. However, the full value of these tools is often not realized. This is usually because not enough time has been spent configuring them correctly, so that they notify staff only of events that actually require their attention. For example, if the correct thresholds are not specified, far too many breaches are reported. Because these breaches are seldom significant, staff learn to ignore them, and significant events are then missed. If events are not filtered properly, the service management tool will be flooded with spurious events, and its ability to raise incidents automatically becomes impractical to use.

Another important definition is that of an alert: An alert is a warning that a threshold has been reached, something has changed, or a failure has occurred. Alerts are often created and managed by system management tools and the event management process. Creating an alert when a disk or mailbox is nearly full is one such example.

Informational, Warning, and Exception Events

Some events indicate a failure that must be fixed, while others simply flag that something has happened and should be recorded. These are two types of event: the first is an exception and the second informational. There is a third type—a warning event. A warning event signifies unusual but not necessarily exceptional behavior. Warning events require further analysis to determine whether any action is required. Events will be handled according to their type.

Here are some examples of each type of event:

Informational
- A scheduled workload has completed.
- A user has logged in to use an application.
- An email has reached its intended recipient.
Warning
- A server’s memory utilization is within 5 percent of its highest acceptable performance level.
- The completion time of a transaction is 10 percent longer than normal.
Exception
- A user attempts to log on to an application with an incorrect password.

Notice that not all of these examples relate to a failure. (Failures would be alerts.) Some simply contain information that, for some reason, it is important to record. For example, the business might want to keep a record of who is using an application for audit purposes.

There are no definitive criteria for determining the type of an event; it depends very much on the specific situation of the organization. For example, an event might be that a previously unknown device has been detected on the network. Some organizations allow their staff to attach their own laptops to the corporate network, in which case the event would be informational. In a highly secure organization, it would almost certainly be treated as an exception.
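Because the criteria are organization specific, the following Python sketch simply encodes the examples above as one hypothetical organization might. The field names are assumptions, as is the choice to treat memory use at or above the acceptable limit as an exception. The `allow_personal_devices` flag shows how the same event can be classified differently under different policies.

```python
from enum import Enum
from typing import Any, Dict


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify(event: Dict[str, Any], allow_personal_devices: bool) -> EventType:
    """Classify an event using hypothetical, organization-specific rules."""
    kind = event["kind"]
    if kind == "memory_utilization":
        limit = event["max_acceptable"]
        if event["value"] >= limit:
            return EventType.EXCEPTION        # assumption: at or over the limit
        if event["value"] >= limit * 95 / 100:
            return EventType.WARNING          # within 5 percent of the highest acceptable level
        return EventType.INFORMATIONAL
    if kind == "transaction_time":
        if event["seconds"] >= event["normal_seconds"] * 110 / 100:
            return EventType.WARNING          # 10 percent or more slower than normal
        return EventType.INFORMATIONAL
    if kind == "failed_login":
        return EventType.EXCEPTION
    if kind == "unknown_device":
        # The same event is classified differently depending on local policy
        return EventType.INFORMATIONAL if allow_personal_devices else EventType.EXCEPTION
    return EventType.INFORMATIONAL            # e.g. workload completed, user logged in


print(classify({"kind": "unknown_device"}, allow_personal_devices=True).value)
# informational
print(classify({"kind": "unknown_device"}, allow_personal_devices=False).value)
# exception
```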

Filtering

The next topic we’ll examine is event filtering. We don’t have complete control over the notifications that are generated by the configuration items. The manufacturers of hardware will have decided what notifications will be generated, and they may not have provided their customers with the ability to switch them off. A common experience when first beginning to monitor networks is that the monitoring tool is swamped by unrecognized and therefore unneeded notifications.

Filtering prevents the event management system from being overwhelmed by discarding notifications of events that have no significance to the organization.

There are four possible approaches to the problem:

The first approach is to integrate event management into each service management process. That is, each process identifies the events it is interested in.
The next approach is to include event management requirements in the design of new services.
A third approach is to use trial and error—evaluate notifications on a case-by-case basis and adjust the filtering accordingly.
The final approach is to plan the introduction of event management within a formal project.

These approaches are not mutually exclusive; many organizations will adopt some hybrid of them.
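A filter of this kind can be as simple as a list of notification codes that the organization has decided are significant. The sketch below uses hypothetical codes. It also counts what it discards by code, which supports the trial-and-error approach: a frequently discarded code can be reviewed to confirm that it really is insignificant.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def filter_notifications(
    notifications: Iterable[Dict[str, str]],
    significant_codes: Set[str],
) -> Tuple[List[Dict[str, str]], Counter]:
    """Keep notifications whose code has been declared significant. Count the
    discarded ones by code so that the filter can be reviewed and tuned."""
    kept: List[Dict[str, str]] = []
    discarded: Counter = Counter()
    for note in notifications:
        if note["code"] in significant_codes:
            kept.append(note)
        else:
            discarded[note["code"]] += 1
    return kept, discarded


notes = [
    {"ci": "sw-1", "code": "LINK_DOWN"},
    {"ci": "sw-1", "code": "FAN_SPEED_INFO"},
    {"ci": "sw-2", "code": "FAN_SPEED_INFO"},
]
kept, discarded = filter_notifications(notes, {"LINK_DOWN"})
print(len(kept), discarded.most_common(1))
# 1 [('FAN_SPEED_INFO', 2)]
```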

Designing for Event Management

Successful event management in service operation requires analyzing and planning for what will be required. This should happen in the service design phase, although it will continue to be adjusted in service operation. Many organizations attempt and abandon event management, or they fail to achieve real value from it because this crucial design phase has been neglected.

The following questions should be asked when designing a service or planning the introduction of new technology:

What needs to be monitored?
What type of monitoring is needed?
When should events be generated?
What information should be communicated?
Who are messages intended for?
Who will respond to the event?

When event management is first established, these questions should be asked about the existing services and infrastructure. Stakeholders who must be consulted include the business, process owners, and operations management staff. Each of these groups will have monitoring requirements, and each could be involved in handling events when they occur.

Instrumentation

Instrumentation refers to specific ways to monitor and control the infrastructure and services. A number of practical issues need to be addressed when designing an event management system:

How will events be generated? In the case of bought-in components, of course, this question is really, How are events generated?
How will they be classified? This is not straightforward, and classifications can change over time.
How will they be communicated and escalated? How exactly will the events get to the appropriate function? For example, how will an exception event trigger the incident management process? Ideally, this will be automated by integrating the event and incident management tools so that an incident will be logged automatically. Of course, this can only happen if the two tools have the necessary functionality.
What data must be included in the event notification? What data will be needed for the event management system itself to interpret and make sense of the event, and what data will be needed by the function that will respond to the event? If the event relates to an error, then it should include necessary diagnostic information such as error messages and codes. Again, for bought-in software, this question is really, What data is included?
Another question to be asked relates to the type of monitoring to use. Should it be active or passive?
Where will event data be stored? There is likely to be a significant amount of event data, so the question of storage is important. Associated with this is the question of how long the data should be retained. This decision must be made on a case-by-case basis in consultation with the relevant stakeholders.
How will supplementary data be gathered? In some cases, the event data alone will not be sufficient to evaluate the event. For example, it might be necessary to integrate the event management system with a CMDB.
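The answers to these questions end up as the fields of an event record. The following Python sketch shows one possible shape, using a plain dictionary in place of a real CMDB query. The field names, the error code, and the `enrich` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class EventNotification:
    """Illustrative event record; the field names are assumptions, not a standard."""
    source_ci: str                     # which CI raised the event
    event_code: str                    # used by the rules engine to classify the event
    message: str                       # human-readable description
    timestamp: datetime
    error_code: Optional[str] = None   # diagnostic data for error events
    context: Dict[str, str] = field(default_factory=dict)  # supplementary data


def enrich(event: EventNotification, cmdb: Dict[str, Dict[str, str]]) -> EventNotification:
    """Add supplementary data (for example, the supported service and the
    support group) from a CMDB lookup, represented here by a dictionary."""
    event.context.update(cmdb.get(event.source_ci, {}))
    return event


cmdb = {"db-01": {"service": "Order Entry", "support_group": "DBA team"}}
ev = EventNotification("db-01", "DB_CONN_FAIL", "Connection pool exhausted",
                       datetime.now(timezone.utc), error_code="E1042")
print(enrich(ev, cmdb).context)
# {'service': 'Order Entry', 'support_group': 'DBA team'}
```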

Correlation Engine

As events are detected, the event management system must interpret and make decisions about how to handle them. This is done by software known as a correlation engine. The correlation engine allows the creation of rule sets that it will use to process events.

Using a correlation engine will enable the system to determine the significance of each event and also to determine whether there is any predefined response to an event. Patterns of events are defined and programmed into correlation tools for future recognition. The correlation engine can translate component-level events into service impacts and business impacts, as shown in Figure 36.1.

Diagram shows the logical layer of the engine, which includes business processes and correlated business and service events, and the physical layer, which includes events generated by infrastructure CIs. — **Figure 36.1** Correlation engine

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.
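The layering in Figure 36.1 can be illustrated with a very small rule set. Component-level events are mapped to the services that depend on the failed CIs, and those services are mapped to business processes. Real correlation engines use much richer rule languages. The dependency tables here are hypothetical and would normally come from the CMS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependency data, as might be drawn from a CMS
CI_TO_SERVICES: Dict[str, List[str]] = {
    "db-01": ["Order Entry", "Invoicing"],
    "web-02": ["Order Entry"],
    "mail-01": ["Email"],
}
SERVICE_TO_PROCESSES: Dict[str, List[str]] = {
    "Order Entry": ["Sales"],
    "Invoicing": ["Finance"],
    "Email": ["Sales", "Finance", "HR"],
}


def correlate(failed_cis: List[str]) -> Tuple[Set[str], Set[str]]:
    """Translate component-level events into affected services and
    affected business processes."""
    services = {s for ci in failed_cis for s in CI_TO_SERVICES.get(ci, [])}
    processes = {p for s in services for p in SERVICE_TO_PROCESSES.get(s, [])}
    return services, processes


services, processes = correlate(["db-01", "web-02"])
print(sorted(services), sorted(processes))
# ['Invoicing', 'Order Entry'] ['Finance', 'Sales']
```

Two failing CIs that support the same service produce a single service-level impact. This kind of consolidation keeps staff from chasing two symptoms of one fault.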

Process Activities, Methods, and Techniques

Next, we take a look at the event management process. The process steps are shown in Figure 36.2.

Flow diagram shows event notification, detection, and logging, two levels of correlation and filtering, response selection, problem and change management, incident management, review of actions, an effectiveness check, and so on. — **Figure 36.2** Event management process flow

Copyright © AXELOS Limited 2010. All rights reserved. Material is reproduced under license from AXELOS.

The initial sequence of activities in the event management process is as follows:

An event occurs.
The event notification is sent.
The event is detected by the event management system and logged.
First-level correlation and filtering takes place.

At this point, the event type (exception, warning, or informational) has been identified. No further processing is required for informational events. For exception events, one or more of the service management processes will be triggered. If the event concerns something that has broken and requires restoring to normal service levels, an incident should be raised. A problem record may be updated if another example of a fault under investigation occurs. The automated response to an event may include raising a change. Some events, such as a “toner low” message, may require a service request to be handled by the request fulfilment process.

Warning events then enter second-level correlation, which determines how to proceed. In some cases, the event will be treated as informational or as an exception. In other cases, it will trigger either an automated response or an alert for human intervention, as described later in this chapter.

Event Notification

Let’s look at the initial process activities in a little more detail. Event notification refers to the communication of information about an event. You’ve already seen that some components will generate notifications independently, while others have to be prompted by polling.

Event Detection and Logging

Some events will be detected directly by the event management tool. Other events will be detected by a software agent running on the device being monitored. This agent then generates a notification that can be detected by the event management tool. All events are logged.

Correlation and Filtering

First-level correlation decides whether any further action is required, including whether the event has any significance to the organization. Correlation determines whether an event is informational, a warning, or an exception. We discussed filtering earlier. Filtering is necessary to stop staff from being overwhelmed by events that do not require any action or by multiple reports of the same fault.

Informational events are closed at this point. Exception events will trigger one of the other service management processes. Warning events will go forward to second-level correlation.
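The routing decision at the end of first-level correlation can be sketched as follows. The event type is assumed to have been identified already. The flags used to choose among incident, problem, change, and request fulfilment are hypothetical. Defaulting an unflagged exception to incident management is also an assumption.

```python
from typing import Dict, List


def first_level_correlation(event: Dict[str, object]) -> List[str]:
    """Decide what happens next based on the event type already identified."""
    event_type = event["type"]
    if event_type == "informational":
        return ["log and close"]
    if event_type == "warning":
        return ["second-level correlation"]
    if event_type != "exception":
        raise ValueError(f"unknown event type: {event_type}")
    # An exception can trigger more than one process at once
    actions: List[str] = []
    if event.get("service_broken"):
        actions.append("incident management")
    if event.get("known_problem_id"):
        actions.append("problem management: update record")
    if event.get("needs_change"):
        actions.append("change management")
    if event.get("service_request"):
        actions.append("request fulfilment")      # e.g. a "toner low" message
    return actions or ["incident management"]     # assumption: default to an incident


print(first_level_correlation({"type": "exception", "service_broken": True,
                               "known_problem_id": "PRB042"}))
# ['incident management', 'problem management: update record']
print(first_level_correlation({"type": "exception", "service_request": True}))
# ['request fulfilment']
```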

Next, we consider the criteria that might be used by the second-level correlation engine to interpret an event:

The number of similar events might be significant. For example, an event might notify us of an unsuccessful attempt to access our network. If this is just a single instance, we might treat it as informational, but if there have been 500 such attempts in the last 5 minutes, we might judge that we are the target of an organized attempt to break in.
The number of devices generating similar events might be significant. It might indicate a widespread virus infection, for example.
The data supplied with the event might indicate its significance.
Events related to device utilization might be compared with a defined threshold.

The correlation engine determines whether the event requires some action or whether it can be treated as informational and closed.
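The first two criteria, the number of similar events and the number of devices reporting them, can be expressed as a sliding time window. This Python sketch uses the figure of 500 attempts in 5 minutes from the example. The device limit, the class name, and the assumption that events arrive in timestamp order are illustrative.

```python
from collections import deque
from typing import Deque, Tuple


class SecondLevelCorrelator:
    """Sliding-window correlation of similar events (illustrative thresholds)."""

    def __init__(self, max_events: int, max_devices: int, window_s: float) -> None:
        self.max_events = max_events
        self.max_devices = max_devices
        self.window_s = window_s
        self.recent: Deque[Tuple[float, str]] = deque()  # (timestamp, device)

    def observe(self, timestamp: float, device: str) -> str:
        """Record one similar event and return how it should be treated."""
        self.recent.append((timestamp, device))
        # Drop events that have fallen out of the time window
        while self.recent and self.recent[0][0] <= timestamp - self.window_s:
            self.recent.popleft()
        event_count = len(self.recent)
        device_count = len({d for _, d in self.recent})
        if event_count >= self.max_events or device_count >= self.max_devices:
            return "exception"
        return "informational"


# Failed network access attempts: 500 within 300 seconds is treated as an attack
c = SecondLevelCorrelator(max_events=500, max_devices=50, window_s=300)
print(c.observe(0, "fw-01"))
# informational
results = [c.observe(0.5 * i, "fw-01") for i in range(1, 500)]
print(results[-1])
# exception
```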

Actions Taken

Some events indicate conditions that can be resolved automatically without human intervention. For example, if a file server is detected to be nearly full, then a script could be run that would free up space by archiving old data.
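As a sketch of such a script, the Python below plans which old files to archive to bring usage down to a target level. It also flags the case where archiving alone cannot reach the target, so that a person can be alerted instead. This is a dry run over hypothetical file data; the code that actually moves files is left out.

```python
from typing import Dict, List, Tuple

# file name -> (size in GB, age in days); a stand-in for a real file scan
Files = Dict[str, Tuple[float, int]]


def plan_archive(files: Files, capacity_gb: float, target_pct: float,
                 min_age_days: int) -> Tuple[List[str], float, bool]:
    """Pick the oldest files to archive until usage drops to the target.
    Returns (files to archive, projected usage in GB, needs_human)."""
    used = sum(size for size, _ in files.values())
    target = capacity_gb * target_pct / 100
    to_archive: List[str] = []
    # Oldest files first
    for name, (size, age) in sorted(files.items(), key=lambda kv: kv[1][1], reverse=True):
        if used <= target or age < min_age_days:
            break  # target reached, or the remaining files are too recent to archive
        to_archive.append(name)
        used -= size
    # If archiving cannot reach the target, escalate to a person instead
    return to_archive, used, used > target


files = {"a.log": (40, 400), "b.log": (30, 200), "c.log": (20, 10)}
print(plan_archive(files, capacity_gb=100, target_pct=70, min_age_days=90))
# (['a.log'], 50, False)
```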

Some events will require human intervention—an alert from a smoke detector, for example. It’s important that the alert is directed to the right person and that they know what to do.

Exception events will normally trigger the incident management process. Ideally, the event and incident management systems will be integrated so that an incident record can be raised automatically. A word of warning, however: This should be implemented only when you are happy that the filtering of events is working correctly. If this is not the case, your incident management system will be flooded with thousands of spurious incidents!
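One common safeguard against this flood is sketched here as an assumption, not a feature of any particular tool. The integration raises at most one open incident per CI and event code and attaches repeat events to that incident. The class and the incident numbering are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class OpenIncident:
    incident_id: str
    event_count: int = 1


class IncidentBridge:
    """Hypothetical link between event and incident management that raises
    at most one open incident per (CI, event code)."""

    def __init__(self) -> None:
        self._open: Dict[Tuple[str, str], OpenIncident] = {}
        self._next_id = 1

    def on_exception(self, ci: str, code: str) -> Tuple[str, bool]:
        """Return (incident id, whether a new incident was raised)."""
        key = (ci, code)
        existing = self._open.get(key)
        if existing is not None:
            existing.event_count += 1          # repeat event: attach it, do not raise
            return existing.incident_id, False
        incident = OpenIncident(f"INC{self._next_id:05d}")
        self._next_id += 1
        self._open[key] = incident
        return incident.incident_id, True

    def resolve(self, ci: str, code: str) -> None:
        """After resolution, a new failure raises a new incident."""
        self._open.pop((ci, code), None)


bridge = IncidentBridge()
print(bridge.on_exception("sw-04", "LINK_DOWN"))  # ('INC00001', True)
print(bridge.on_exception("sw-04", "LINK_DOWN"))  # ('INC00001', False)
print(bridge.on_exception("db-01", "DB_DOWN"))    # ('INC00002', True)
```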

The problem management process might be triggered if the organization has a policy of always investigating the root causes of incidents that impact key services. Event management can support such a policy by automatically raising a problem record when it detects such an incident.

Change management can be triggered in two circumstances:

First, if a previously unknown device is detected, an RFC should be raised, which can be progressed as appropriate by change management to have the device either added to the CMS or removed if required.
The second circumstance is if a change is needed. For example, a server might need to be allocated more storage from a SAN.

Remember, sometimes it will be necessary to trigger a combination of these responses.
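Detecting a previously unknown device amounts to comparing what is seen on the network with what is recorded in the CMS. The following minimal Python sketch uses hypothetical device names and a plain list in place of the CMS.

```python
from typing import Iterable, List


def rfcs_for_unknown_devices(discovered: Iterable[str], cms_cis: Iterable[str]) -> List[str]:
    """Compare devices seen on the network with the CIs recorded in the CMS and
    draft one RFC summary for each device that is not recorded."""
    unknown = sorted(set(discovered) - set(cms_cis))
    return [f"RFC: add {device} to the CMS or remove it from the network" for device in unknown]


print(rfcs_for_unknown_devices(["sw-01", "ap-17", "srv-02"], ["sw-01", "srv-02"]))
# ['RFC: add ap-17 to the CMS or remove it from the network']
```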

Review and Closure

There could be thousands of events each day, so it’s unlikely that every one of them could be reviewed. It’s sensible to review only what the service provider considers to be significant events. It is probably unnecessary to review events that have triggered other service management processes except to ensure that the triggers were effective.

Most events are neither opened nor closed but just logged in management systems or system logs. Many others can be closed automatically. For example, when a script is triggered to respond to an issue, the script itself could check that the corrective action has worked and generate an event to that effect.
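The self-verifying script described above can be sketched as a wrapper that runs the fix, checks the result, and emits either a closing event or an alert. The `fix`, `check`, and `emit` callables are hypothetical placeholders for a real remediation script, health check, and event channel.

```python
from typing import Callable, List


def remediate_and_close(fix: Callable[[], None], check: Callable[[], bool],
                        emit: Callable[[str], None]) -> bool:
    """Run a corrective action, verify the result, and emit an event saying
    whether the original event can be closed automatically."""
    fix()
    if check():
        emit("resolved: corrective action verified, event closed automatically")
        return True
    emit("unresolved: corrective action failed, alert for human intervention")
    return False


state = {"disk_pct": 97}
events: List[str] = []
ok = remediate_and_close(
    fix=lambda: state.update(disk_pct=72),   # stand-in for an archiving script
    check=lambda: state["disk_pct"] < 85,
    emit=events.append,
)
print(ok, events[0].split(":")[0])
# True resolved
```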

Triggers, Inputs, Outputs, and Interfaces

We’ll now consider the event management process triggers, inputs, outputs, and interfaces.

Triggers

Any type of change in state can trigger event management, and an organization should define which of these state changes need to be acted upon. Some examples are shown here:

Exceptions to any level of CI performance defined in the design specifications or standard operating procedures.
A breach of a threshold in an OLA could also generate an event.
Any exception to a process, such as a failure to complete a process within the target time. For example, a routine change that has been assigned to a build team may not have been completed in time, or an exception may occur within a business process that is being monitored by event management.
The completion of an automated task or job could trigger the issuing of an event, as could a status change in a server or database CI.
A user accessing a particular application or database could also cause an event to be issued if this is information that the business or the IT department wanted to know about. For example, IT may wish to track how often a service is being used to decide on the number of licenses required, or the business may wish to know how often a customer service representative needs to refer to the knowledge base to answer a query, as this might indicate a training requirement.
More usually, an event is triggered when a device, database, or application reaches a predefined threshold of performance.

Inputs

Inputs to event management usually come from service design and service transition. They include the examples listed here:

Operational and service level requirements associated with events and their actions.
Alarms, alerts, and thresholds for recognizing events.
Event correlation tables, rules, event codes, and automated response solutions that will support event management activities.
Roles and responsibilities for recognizing events and communicating them to those who need to handle them.
Operational procedures for recognizing, logging, escalating, and communicating events.
SLAs, which can be used by the correlation engine to determine the significance of an event or to identify a performance threshold.
Rule sets that are provided by technical or application management staff based on the monitoring and management requirements. For example, the capacity management process would define the capacity thresholds that should generate an event.
The roles and responsibilities of all those involved.
Procedures for logging and escalating events as required.

Outputs

Outputs from event management are usually passed to other service management processes, such as incident management, change management, and request fulfilment. They include the examples listed here:

Events that have been communicated and escalated to those responsible for further action
Incident, problem, change, or request records required as a result of an event
Event logs describing what events took place and any escalation and communication activities taken to support forensic, diagnosis, or further CSI activities
Events that indicate an incident has occurred
Events that indicate the potential breach of an SLA or OLA objective
Events and alerts that indicate completion status of deployment, operational, or other support activities
Populated service knowledge management system (SKMS) with event information and history

The most obvious output of the process is the events themselves. These should have been communicated and escalated to the appropriate people. Another output is a chronological event log describing what events took place and any escalation and communication activities taken. This may be useful information if further investigation is required or to spot possible improvement opportunities.

Some events output by event management will indicate that an incident has occurred, and others will warn of the potential breach of an SLA or OLA objective. Of course, as we have said, not all events show that something is wrong, and many events will just indicate successful completion of deployment or operational activities. The data output from event management can be used to populate the SKMS with the event information and history.

Interfaces between Event Management and the Lifecycle Stages

Finally, let’s consider the interfaces event management has with the other lifecycle stages and their associated processes. Event management can interface with any process that requires monitoring and control, especially those that don’t require real-time monitoring but do require some form of intervention following an event or group of events. First we’ll consider how the process can even help the business directly.

Business Processes

The information provided by event monitoring may be used to help manage unusual occurrences with business processes.

Using Event Management to Preempt a Major Incident

Some years ago, a camera was mistakenly priced on a website at $59.99, when it should have been $599.99. Word spread through social media, and thousands of orders were placed, which the company had to honor to avoid bad publicity. It took some time before anyone noticed, and then only because a staff member received an email from a friend. Event monitoring could have alerted the company to the unusual sales pattern very quickly, limiting the financial damage. A similar example was an ATM that was filled with $20 notes in the $5 note holder. Queues formed as people withdrew cash from the machine, which was delivering four times the amount requested. In that case, event management was in place: the unusual pattern of multiple $5 withdrawals was spotted, and the machine was closed down remotely.

Service Design

Event management interfaces with a number of service design processes. Examples include the following:

Service level management is the first such interface. Event management can be used to detect any potential impact on SLAs early so that action can be taken to resolve the fault to minimize that impact.
Information security management may use event monitoring to detect unusual activity. This may be repeated failed login attempts or unusual activity in a business process, such as unusual spending on a credit card that alerts the bank to a possible stolen card.
Capacity and availability management define what events are significant, what the thresholds should be, and how to respond to them. Event management then responds to these events, improving the performance and availability of services.
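To illustrate how thresholds defined by capacity and availability management drive the handling of events, the following Python sketch maps a measured value to one of the three ITIL event types (informational, warning, exception). The 75 percent warning level echoes the disk-usage example used earlier in this chapter; the exception levels, the metric names, and the `classify` function are assumptions for illustration only.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


# Hypothetical thresholds, as capacity or availability management might
# define them: (warning level, exception level), both in percent.
THRESHOLDS = {
    "disk_usage_pct": (75.0, 90.0),
    "memory_usage_pct": (75.0, 95.0),
}


def classify(metric: str, value: float) -> EventType:
    """Classify a reading against the thresholds defined for its metric.

    A threshold counts as breached only when the value exceeds it.
    """
    warning, exception = THRESHOLDS[metric]
    if value > exception:
        return EventType.EXCEPTION
    if value > warning:
        return EventType.WARNING
    return EventType.INFORMATIONAL


if __name__ == "__main__":
    for reading in (40.0, 80.0, 97.5):
        print(reading, classify("disk_usage_pct", reading).value)
```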

Service Transition

Event management tools may also be used to support service transition processes:

The service asset and configuration management process uses events to determine the current status of any CI. A discrepancy with the authorized baselines in the CMS will highlight an unauthorized change.
Event management can determine the lifecycle status of assets. For example, an event could signal that a new asset has been successfully configured and is now operational.
Knowledge management stores information obtained by event management in knowledge management systems. For example, patterns of performance information correlated with business activity are used as input to future design and strategy decisions.
Event management interfaces with change management to identify conditions that may require a response or action.
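The interface with service asset and configuration management can be illustrated by comparing the attributes reported in events against the authorized baseline held in the CMS. In this minimal Python sketch, the baseline is just a dictionary, and the function name `find_discrepancies` and the CI names are hypothetical. Each discrepancy it returns is only a candidate unauthorized change, to be checked against change records; it also only compares the attributes that an event actually reports.

```python
from typing import Any


def find_discrepancies(baseline: dict[str, dict[str, Any]],
                       observed: dict[str, dict[str, Any]]) -> list[tuple]:
    """Compare CI attributes reported by events with the authorized baseline.

    Returns (ci, attribute, expected, actual) tuples. A CI that is reported
    but absent from the baseline is flagged as unregistered.
    """
    discrepancies = []
    for ci, attrs in observed.items():
        expected_attrs = baseline.get(ci)
        if expected_attrs is None:
            discrepancies.append((ci, "<unregistered CI>", None, attrs))
            continue
        for attr, actual in attrs.items():
            expected = expected_attrs.get(attr)
            if expected != actual:
                discrepancies.append((ci, attr, expected, actual))
    return discrepancies


if __name__ == "__main__":
    baseline = {"srv-web01": {"os_version": "2.4.1", "ram_gb": 32}}
    observed = {
        "srv-web01": {"os_version": "2.5.0", "ram_gb": 32},
        "srv-new99": {"os_version": "1.0"},
    }
    for d in find_discrepancies(baseline, observed):
        print(d)
```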

Service Operation

Event management is a service operation process, and it interfaces with the other processes in that lifecycle stage:

There is an obvious interface with incident and problem management because many alerts result from failures and require an incident to be raised.
By catching and logging each such occurrence, event management provides vital information to problem management about when and where the incidents are occurring.
Finally, event management can be used by access management to detect unauthorized access attempts and security breaches.
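Detecting unauthorized access attempts often comes down to counting failed logins per account within a time window. The following Python sketch assumes failed-login events arrive in time order with timestamps in seconds; the class name `FailedLoginMonitor`, the limit of 5 attempts, and the 300-second window are hypothetical defaults, not values from ITIL.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flag an account when it has `limit` or more failed logins within
    `window_seconds` (a sliding time window)."""

    def __init__(self, limit: int = 5, window_seconds: float = 300) -> None:
        self.limit = limit
        self.window = window_seconds
        # account -> timestamps of its recent failed logins, oldest first
        self.failures = defaultdict(deque)

    def on_failed_login(self, account: str, ts: float) -> bool:
        """Record one failed-login event; return True if it should raise an alert.

        Assumes events for an account arrive in non-decreasing time order.
        """
        recent = self.failures[account]
        recent.append(ts)
        # Discard failures that have slid out of the window.
        while recent and ts - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.limit


if __name__ == "__main__":
    monitor = FailedLoginMonitor(limit=3, window_seconds=60)
    for t in (0, 15, 30):
        if monitor.on_failed_login("jsmith", t):
            print(f"ALERT at t={t}: repeated failed logins for jsmith")
```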

Critical Success Factors and Key Performance Indicators

The next topics for discussion are critical success factors (CSFs) and key performance indicators (KPIs). Before we look at the CSFs and KPIs relevant to event management, we should take a minute to understand what these terms mean.

A critical success factor, or CSF, is a high-level statement of what a process must achieve if it is to be judged a success. Normally, a process would have only three or four CSFs. A CSF cannot be measured directly; that is what key performance indicators are for.
A key performance indicator, or KPI, is a metric that measures some aspect of a CSF. Each CSF will have three or four associated KPIs.

Here are some examples of CSFs and KPIs for event management:

Critical success factor: “Detect all changes of state that have significance for the management of CIs and IT services.”
Possible associated KPIs for this CSF include the following (notice that the first KPI gauges success in detecting faults, while the second measures the scope of the event management implementation):
- Number and ratio of events compared with the number of incidents
- Number and percentage of each type of event per platform or application versus total number of platforms and applications underpinning live IT services
Critical success factor: “Ensure that all events are communicated to the appropriate functions that need to be informed or take further control actions.”
Associated KPIs might be as follows:
- Number and percentage of events that required human intervention and whether this was performed
- Number of incidents that occurred and percentage of them that were triggered without a corresponding event
Critical success factor: “Provide the means to compare actual operating performance and behavior against design standards and SLAs.”
The following KPIs would enable the CSF to be assessed:
- Number and percentage of incidents that were resolved without impact to the business (indicates the overall effectiveness of the event management process and underpinning solutions)
- Number and percentage of events that resulted in incidents or changes
- Number and percentage of events caused by existing problems or known errors (this may result in a change to the priority of work on that problem or known error)
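Several of these KPIs are simple counts and ratios over event and incident records. The following Python sketch computes three of them: events per incident, the percentage of incidents with no corresponding event, and the percentage of events that resulted in incidents or changes. The record layouts and function names are hypothetical; in practice the data would come from the monitoring tool and the service management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_id: str
    outcome: Optional[str] = None  # "incident", "change", or None


@dataclass
class Incident:
    incident_id: str
    event_id: Optional[str] = None  # the event that triggered it, if any


def pct(part: int, whole: int) -> float:
    """Percentage, guarding against an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def event_kpis(events: list[Event], incidents: list[Incident]) -> dict:
    led_to_action = sum(1 for e in events if e.outcome in ("incident", "change"))
    without_event = sum(1 for i in incidents if i.event_id is None)
    return {
        # Ratio of events to incidents (None if there were no incidents).
        "events_per_incident": len(events) / len(incidents) if incidents else None,
        # Incidents that were not preceded by a corresponding event.
        "pct_incidents_without_event": pct(without_event, len(incidents)),
        # Events that resulted in incidents or changes.
        "pct_events_to_incident_or_change": pct(led_to_action, len(events)),
    }


if __name__ == "__main__":
    events = [Event("E1", "incident"), Event("E2"), Event("E3", "change"), Event("E4")]
    incidents = [Incident("I1", event_id="E1"), Incident("I2")]
    print(event_kpis(events, incidents))
```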

Challenges

The following challenges could be encountered in event management:

Lack of funding for tools and the effort needed to implement them successfully
Establishing the correct level of filtering to avoid being flooded by events or having insufficient useful information
Installing monitoring agents across the entire infrastructure
Lack of time and funding for training to acquire the necessary skills to design and interpret events
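The filtering challenge is a trade-off between two settings: which event types are passed on at all, and how aggressively repeats are suppressed. This Python sketch shows one common approach, severity filtering plus de-duplication within a time window. The function name, the tuple layout, and the defaults (a minimum severity of warning and a 600-second window) are assumptions, not recommended settings.

```python
SEVERITY = {"informational": 0, "warning": 1, "exception": 2}


def filter_events(events, min_severity="warning", dedup_seconds=600):
    """Filter (timestamp, ci, event_type, message) tuples, given in time order.

    Keeps events at or above `min_severity` and suppresses repeats of the
    same (ci, event_type, message) within `dedup_seconds` of the last one
    passed on, so a persistent condition is re-reported at most once per window.
    """
    threshold = SEVERITY[min_severity]
    last_passed = {}  # (ci, event_type, message) -> timestamp last passed on
    kept = []
    for ts, ci, etype, msg in events:
        if SEVERITY[etype] < threshold:
            continue  # below the chosen level: not passed on
        key = (ci, etype, msg)
        prev = last_passed.get(key)
        if prev is not None and ts - prev < dedup_seconds:
            continue  # duplicate of a recent notification: suppress
        last_passed[key] = ts
        kept.append((ts, ci, etype, msg))
    return kept


if __name__ == "__main__":
    stream = [
        (0, "srv1", "informational", "user login"),
        (10, "srv1", "warning", "disk usage 80%"),
        (20, "srv1", "warning", "disk usage 80%"),
        (30, "srv2", "exception", "service down"),
        (700, "srv1", "warning", "disk usage 80%"),
    ]
    for e in filter_events(stream):
        print(e)
```

Raising `min_severity` or lengthening `dedup_seconds` reduces flooding but increases the risk of missing useful information, which is the balance the challenge describes.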

Risks

The following risks are associated with event management; in many cases the risks are the result of failing to meet the challenges listed above.

Failure to obtain adequate funding
Failure to ensure the correct level of filtering
Failure to maintain momentum in deploying the necessary monitoring agents across the IT infrastructure

If any of these risks are not addressed, they could adversely impact the success of event management.

Summary

This chapter explored the next process in the service operation stage, event management. It covered the use of event monitoring to manage large numbers of items and how automated responses to particular events may improve the delivery of services. It also explained the role of events in automating processes.

We discussed the key ITIL concepts of events and alerts and how event management can improve availability by preempting failures or reducing the time taken to identify them. Finally, we considered the technical and staff challenges of implementing this process.

Exam Essentials

Understand the purpose, objectives, and scope of event management. Describe events (a change of state that has significance for the management of a CI) and alerts (a failure or breach of a threshold) and the difference between them. Be able to give examples of each.

Understand the role of event management in automation. Describe passive and active monitoring and the difference between them. Be able to give examples of each. Understand the importance of filtering events and explain how effective event management can reduce downtime. Be able to explain automatic responses to certain types of events.

Know how event management benefits the customer and the IT department. Understand the efficiency benefits gained by having a small number of staff monitor huge numbers of CIs and services. Understand how improved availability through reduced downtime benefits the business.

Understand how event management can be used to monitor business events and environmental conditions. Be able to explain how the process of event management can be applied beyond the technical IT environment.

Review Questions

You can find the answers to the review questions in the appendix.

1. For which of these situations would implementing automation by using event management be appropriate?
   1. Hierarchical escalation of incidents
   2. Speeding up the processing of month-end sales figures
   3. Notification of an “intruder detected” to the local police station
   4. Running backups
   A. 3 and 4 only
   B. All of the above
   C. 2 and 3 only
   D. 1, 3, and 4 only
2. Event management can be used to monitor which of the following?
   1. Environmental conditions
   2. System messages
   3. Staff rosters
   4. License use
   A. 1 and 2 only
   B. 2 and 3 only
   C. 1, 2, and 4 only
   D. All of the above
3. Which of the following are types of event monitoring?
   1. Passive
   2. Virtual
   3. Active
   4. Standard
   A. 1 and 2 only
   B. 2 and 3 only
   C. 1 and 3 only
   D. All of the above
4. Which of the following is the best description of an alert?
   A. An unplanned interruption to a service
   B. The unknown, underlying cause of one or more incidents
   C. An event that notifies staff of a failure or that a threshold has been breached
   D. A change of state that has significance for the management of a CI
5. Which of the following describes an active monitoring tool?
   A. A tool that correlates alerts generated from configuration items
   B. A tool that allows the interconnection of configuration items in a single database
   C. A tool that continually polls configuration items about their status
   D. A tool that integrates with the active directory system to identify users
6. What is the correct way to handle an event?
   1. Only the people or team responsible for handling events should be notified of an event.
   2. All support teams should be notified of an event.
   A. Both are true.
   B. Neither is true.
   C. Only 1 is true.
   D. Only 2 is true.
7. Which of the following is NOT a type of event defined in ITIL?
   A. Emergency
   B. Exception
   C. Warning
   D. Informational
8. Which of the following does NOT describe a correlation engine?
   A. Software that uses rule sets to process events
   B. Software that uses rule sets to decide which changes should be approved
   C. Software that uses rules to determine the significance of each event
   D. Software that uses rules to determine the predefined response to each event
9. Which of the following describes the correct sequence of initial activities in the event management process?
   A. Occurrence, detection, correlation, notification
   B. Occurrence, notification, detection, correlation
   C. Notification, occurrence, detection, correlation
   D. Detection, occurrence, notification, correlation
10. Which of the following are valid inputs to the event management process?
   1. OLA and SLA requirements associated with events and their actions
   2. Alarms, alerts, and thresholds for recognizing events
   3. Event correlation tables, rules, event codes, and automated response solutions
   4. Roles and responsibilities for recognizing events and communicating them
   5. SLAs used by the correlation engine to determine the significance of an event or to identify a performance threshold
   6. Rule sets provided by technical or application management staff based on the monitoring and management requirements
   A. 1, 2, 4, and 6 only
   B. All of these are valid inputs
   C. 1, 3, 5, and 6 only
   D. 2, 3, 5, and 6 only
