Event Management

Modern infrastructure management depends heavily on event monitoring tools. These tools can monitor large numbers of configuration items simultaneously, identifying issues as soon as they arise and notifying technical management staff. The event management process is responsible for managing events throughout their lifecycle, and it is one of the main activities of IT operations.

An event can be defined as any change of state that has significance for the management of a configuration item (CI) or IT service. Note that the definition does not require the change of state to be a failure. Many events are purely informational. Examples of informational events include notification of a user logging onto an application (significant because the use of the application may be metered) or a transaction completing successfully (significant because the notification of successful completion may trigger the start of the next transaction). An event that notifies staff of a failure or that a threshold has been breached is called an alert. Examples of alerts include notification that a server has failed or a warning that the memory or disk usage on a device has exceeded 75 percent. To put these concepts in a non-IT setting, a car console may issue an event to say that the system has successfully connected to a Bluetooth device, or it might raise an alert (together with a beep or flashing light) to warn that a threshold has been breached and the car is now low on gas.
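
The distinction between an informational event and an alert can be sketched in a few lines of Python. This is a minimal illustration only, built on the 75 percent disk usage example; the `Event` class, the `disk_usage_event` function, and the field names are hypothetical and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass

# Assumption: the threshold mirrors the 75 percent example in the text.
ALERT_THRESHOLD_PERCENT = 75.0


@dataclass
class Event:
    ci: str           # configuration item whose state changed
    description: str  # human-readable summary of the change of state
    is_alert: bool    # True when a failure or threshold breach is reported


def disk_usage_event(ci: str, used_percent: float) -> Event:
    """Turn a disk usage reading into an informational event or an alert."""
    breached = used_percent > ALERT_THRESHOLD_PERCENT
    kind = "exceeded" if breached else "within"
    return Event(ci, f"disk usage {used_percent:.0f}% {kind} threshold", breached)
```

A reading of 40 percent yields an informational event, while a reading of 80 percent yields an alert. Both are events; only the second calls for attention.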

There are two types of event monitoring tools:

  • Active monitoring tools will poll devices to check that they are working correctly. The tool will send a message and expect a positive response within a defined time, such as sending a “ping” to a device. A failure to respond will be notified to support staff. Some tools will have automated responses to such situations, perhaps automatically restarting a device or rerouting data to avoid the faulty CI so that the service is not affected.
  • Passive monitoring tools do not send out polling messages; they detect events generated by CIs and correlate them (that is, identify related events).
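
The two approaches can be contrasted in a short Python sketch. It rests on several assumptions: a TCP connection attempt stands in for the "ping" (a true ICMP ping usually needs elevated privileges), port 22 is an arbitrary default, and correlation is reduced to grouping events by CI. The function names are hypothetical.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def tcp_probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Active check: True if a TCP connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(hosts: Iterable[str], probe: Callable[[str], bool] = tcp_probe) -> List[str]:
    """Active monitoring: poll every host and return those that did not respond."""
    return [host for host in hosts if not probe(host)]


def correlate(events: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Passive monitoring: group received (ci, message) events by CI."""
    by_ci: Dict[str, List[str]] = defaultdict(list)
    for ci, message in events:
        by_ci[ci].append(message)
    return dict(by_ci)
```

The `probe` parameter is injectable so that the polling logic can be exercised without touching the network. Real tools correlate on much richer criteria than the CI name alone, such as time windows and dependency relationships.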

The Purpose of Event Management

The purpose of event management is to detect events, understand what they mean, and take any necessary action. Many devices are designed to communicate their status, and event monitoring will gather these communications and act upon any that need action. Some communications report operational information, such as “backup of file complete,” “print complete,” and so on. These events show that the service is operating correctly. They can be used to automate routine activities such as submitting the next file to be backed up or the next document to be printed. They may also be used to monitor the load across several devices, issuing automated instructions to balance the load, dependent on the events received. If the event is an alert, such as “backup failed,” “printer jam,” or “disk full,” the necessary corrective steps will be taken. An incident should be logged in the case of a failure.
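
As a rough sketch of how operational events can drive routine automation, the following Python function reacts to the backup messages quoted above. The function name, the queue of pending files, and the plain list used as an incident log are all hypothetical simplifications.

```python
from collections import deque
from typing import Deque, List, Optional


def on_backup_event(message: str, pending: Deque[str], incidents: List[str]) -> Optional[str]:
    """Return the next file to back up, or None if nothing should be submitted."""
    if message == "backup of file complete":
        # Informational event: use it to trigger the next routine activity.
        return pending.popleft() if pending else None
    if message == "backup failed":
        # Alert: a failure should be logged as an incident.
        incidents.append(message)
    return None
```

Here a "backup of file complete" event submits the next queued file, while "backup failed" is recorded for follow-up instead.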

The Objectives of Event Management

The objectives of event management include the following:

  • Detecting all “changes of state that have significance for the management of a CI or IT service” (see the definition of event earlier) and deciding upon the correct response, if any. The response is then communicated to the appropriate staff to carry out.
  • Triggering automated processes or activities in response to certain events. This may include automatically logging an incident in the service management tool in the event of a failure.
  • Providing sufficient information to enable an accurate assessment of the performance of a service against the SLA target. This might include analyzing events that show the start and end of a process to enable the elapsed time for its completion to be calculated and compared to the SLA target.
  • Using such information and analysis as the basis for service reporting, in particular to measure the success or failure of improvement actions.
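
The SLA comparison described in the third objective amounts to simple arithmetic on event timestamps. The sketch below assumes that each start and end event carries a timestamp and that the SLA target is expressed as a duration; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def elapsed(start_event: datetime, end_event: datetime) -> timedelta:
    """Elapsed time between the events marking the start and end of a process."""
    return end_event - start_event


def meets_sla(start_event: datetime, end_event: datetime, target: timedelta) -> bool:
    """True if the process completed within the SLA target duration."""
    return elapsed(start_event, end_event) <= target
```

For example, a process whose start event is stamped 09:00 and whose end event is stamped 09:25 meets a 30-minute target but misses a 20-minute one.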

You do not need to know the process steps in detail for the exam, but an understanding of the key points will help you understand the objectives of the process. The first step is the notification that an event has occurred. This depends on the monitoring tools being configured correctly to filter out notifications that have no significance. Without this filtering, important events can be missed or lost among hundreds of spurious notifications. The event should then be logged; this may be an entry in the event monitoring log, or an automatic link to the incident management tool may raise an incident record. In the latter case, the interface should not be used until the appropriate filtering is in place to prevent spurious incidents from being raised. An analysis of the event should identify its significance: is it informational, a warning, or an exception? Depending on this analysis, any required actions are then taken.
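
The sequence of filtering, logging, analysis, and action can be sketched as a small Python pipeline. Everything here is an illustrative assumption: events are plain dictionaries, the `failed`, `value`, and `threshold` keys are invented for the example, and in-memory lists stand in for the event log and the incident management tool.

```python
from enum import Enum
from typing import List, Optional


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


IGNORED_SOURCES = {"heartbeat"}  # hypothetical filter: sources with no significance
event_log: List[dict] = []       # stands in for the event monitoring log
incidents: List[dict] = []       # stands in for the incident management tool


def classify(event: dict) -> Significance:
    """Analyze an event to decide its significance."""
    if event.get("failed"):
        return Significance.EXCEPTION
    if event.get("value", 0) > event.get("threshold", float("inf")):
        return Significance.WARNING
    return Significance.INFORMATIONAL


def handle(event: dict) -> Optional[Significance]:
    """Filter, log, analyze, and act on a single event notification."""
    if event["source"] in IGNORED_SOURCES:
        return None                  # filtered out before it reaches the log
    event_log.append(event)          # log the event
    significance = classify(event)   # identify its significance
    if significance is Significance.EXCEPTION:
        incidents.append(event)      # act: raise an incident record
    return significance
```

Because the filter runs first, a noisy source never reaches the log or the incident list, which is the safeguard the text calls for before the incident interface is switched on.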

The Scope of Event Management

Event management can be applied to any aspects of service management that need to be controlled and that could benefit from being automated. The service management tool set is an example, including automatically logging incidents in response to emails or events being received, escalating incidents when thresholds have been reached, and notifying staff of certain conditions (for example, a priority one incident being logged).

Configuration items can be monitored by event management tools; this monitoring can be for two different reasons:

  • Some CIs will be monitored to make sure that they are constantly available. An example of this is a network device where action needs to be taken as soon as the CI fails to respond to a ping.
  • Other CIs may need to be updated frequently. This updating can be automated using event management, and the CMS can be automatically updated to show the new state.

Other areas where event management can be used include the monitoring of environmental conditions. This might be for fire and smoke detection or for other environmental changes.


Using Event Management to Preempt a Major Incident

A large transport organization installed event monitoring across its infrastructure, including monitoring the server room environments. A screen showing current events was installed at the service desk. On the second day after implementation, the monitoring proved its value. The service desk called the head office 150 miles away to ask the staff there to check the server room, because there were environmental alerts showing on the screen. The head office staff entered the server room to find that the air conditioning had failed and the room was extremely warm. Had the temperature increased much more, the servers would have failed, causing major disruption to the services. The head office staff were able to avert the incident by using fans to lower the temperature until the air-conditioning engineer arrived to fix the fault.

Tracking license use is another possible use for event management tools; this prevents unlicensed use of an application by confirming that the number of people using the software does not exceed the licenses held. It may also save money: if monitoring shows that there is less demand for concurrent use than was thought, the number of licenses can be reduced. Monitoring for and responding to security events, such as detecting intruders, is another use; the tools can also be used to detect a denial-of-service attack or similar event.
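
License tracking of this kind can be driven by logon and logoff events. The Python sketch below is a simplified illustration that assumes a concurrent-use license model; the `LicenseTracker` class and its methods are hypothetical.

```python
class LicenseTracker:
    """Track concurrent users of an application from logon and logoff events."""

    def __init__(self, licenses_held: int) -> None:
        self.licenses_held = licenses_held
        self.active_users = set()
        self.peak = 0  # highest concurrent use observed so far

    def logon(self, user: str) -> bool:
        """Record a logon event; return False if use now exceeds the licenses held."""
        self.active_users.add(user)
        self.peak = max(self.peak, len(self.active_users))
        return len(self.active_users) <= self.licenses_held

    def logoff(self, user: str) -> None:
        """Record a logoff event."""
        self.active_users.discard(user)

    def spare_licenses(self) -> int:
        """Licenses never needed at peak, which are candidates for reduction."""
        return max(0, self.licenses_held - self.peak)
```

A `False` return from `logon` corresponds to the compliance alert, while `spare_licenses` supports the cost-saving case by showing how far peak demand falls short of the licenses held.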

In addition to these uses, event management can be used for day-to-day management of the service. This might be monitoring performance of hardware or network equipment or tracking the use of a particular application.

Monitoring and Event Management

It is important to understand the difference between monitoring and event management, two closely related but distinct activities.

  • Event management is concerned with generating events and detecting notifications that have been produced. These events are produced so that they can be monitored. They provide useful information regarding the status of the infrastructure.
  • Monitoring detects these notifications but goes further than this. Monitoring includes actively checking CIs to ensure that they are working as they should, whether or not an event has been generated.

As you have seen, event management can be enormously useful in managing large and complex infrastructures. It is often the case, however, that the full value of these tools is not realized. This is usually because too little time has been spent configuring them to notify staff only of those events that need their attention. Failing to specify the correct thresholds, for example, will mean that far too many breaches are reported. Staff then ignore the events, because they are seldom significant. Of course, this means that significant events are missed. It is all too common that technical management teams have impressive plasma screens on the walls with flashing red warnings that everyone ignores. Sometimes the attitude is that the users will call the service desk if there really is an issue, which of course negates one of the major advantages of using such tools, that of being able to detect and respond to incidents before the user is impacted! Failing to filter the events properly also means that the ability to automatically raise incidents cannot be used, because the service management tool would be flooded with spurious incidents.
