CHAPTER 3
EVALUATING RISK
Understanding What Can Go Wrong

Luck: 1a, a force that brings good fortune or adversity; 1b, the events or circumstances that operate for or against an individual; 2, favoring chance.

INTRODUCTION

The heart of building a business continuity plan is a thorough analysis of events from which you may need to recover. This is variously known as a threat analysis or risk assessment. The result is a list of events that could slow your company down or even shut it down. We will use this list to identify those risks your business continuity plan must address.

First, let’s define the terminology we’ll use when discussing risk:

• The potential of a disaster occurring is called its risk. Risk is measured by how likely this is to happen and how badly it will hurt.

• A disaster is any event that disrupts a critical business function. This can be just about anything.

• A business interruption is something that disrupts the normal flow of business operations.

Whether an event is a business interruption or a disaster sometimes depends on your point of view. An interruption could seem like a disaster to the people to whom it happens, but the company keeps rolling along. An example might be a purchasing department that has lost all telephone communication with its suppliers. It is a disaster to the employees because they use telephones and fax machines to issue purchase orders. The facility keeps running because the department’s mitigation plan is to generate POs on paper and use cell phones to issue verbal material orders to suppliers.

Risk is defined as the potential for something to occur. It could involve the possibility of personal injury or death. For example, insurance actuaries work to quantify the likelihood of an event occurring in order to set insurance rates. A risk could be an unexpected failing in the performance of duties by someone you had judged as reliable. It could be a machine failure or a spilled container of toxic material.

Not all risks become realities. There is much potential in our world that does not occur. Driving to work today, I saw clouds that indicate the potential of rain. Dark clouds don’t indicate a certainty of precipitation, but they do indicate a greater potential than a clear sky. I perceive an increased risk that I will get wet on the long walk across the company parking lot, so I carry an umbrella with me. The odds are that it will not rain. The weatherman says the clouds will pass. I can even see patches of blue sky between the massive dark clouds. Still, to reduce my risk of being drenched, I carry an umbrella.

Some risks can be reduced almost to the point of elimination. A hospital can install a backup generator system with the goal of ensuring 100% electrical availability. This will protect patients and staff against the risk of electrical blackout and brownouts. However, it also introduces new risks, such as the generator failing to start automatically when the electricity fails. It also does not protect the hospital against a massive electrical failure internal to the building.

Some risks are unavoidable and steps can only be taken to reduce their impact. If your facility is located on the ocean with a lovely view of the sea, defenses can be built up against a tidal surge or hurricane, but you cannot prevent them. You can only minimize their damage.

Some risks are localized, such as a failure of a key office PC. This event directly affects at most a few people. This is a more common risk that should not be directly addressed in the facility-wide business continuity plan. Rather, localized plans should be developed and maintained at the department level, with a copy in the company-wide master plan. These will be used mainly within a department, whose members address these challenges as they arise. If a problem is more widespread, such as a fire that burns out just that department’s offices, all the combined small reaction plans for the department can be used to more quickly return it to normal.

Other risks can affect your entire company. An example is a blizzard that blocks the roads and keeps employees and material from your door. We all appreciate how this can slow things down, but if you are a just-in-time supplier to a company in a sunnier climate, you still must meet your daily production schedule or close your customer down!

In building the list, we try to be methodical. We will examine elements in your business environment that you take for granted. Roads on which you drive. Hallways through which you walk. Even the air you breathe. In building the plan, a touch of paranoia is useful. As we go along, we will assign a score to each threat and eventually build a plan that deals with the most likely or most damaging events (see Figure 3-1).

FIGURE 3-1: Attributes of risk.

BUILDING A RISK ANALYSIS

At this point we can differentiate among several common terms. We will begin with a risk analysis. A risk analysis is a process that identifies the probable threats to your business. As we progress, this will be used as the basis for a risk assessment. A risk assessment compares the risk analysis to the controls you have in place today to identify areas of vulnerability.

The recommended approach is to assemble your business continuity planning team and perform the Layer 1, 2, and 3 risk analyses (see the section below on The Five Layers of Risk) together. Your collective knowledge will make these reviews move quickly. Such things as the frequency of power or telephone outages in the past, how quickly these were resolved, and the types of severe weather and their impact are all locked in the memories of the team members.

What Is Important to You?

A risk analysis begins with a written statement of the essential functions of your business that will be used to set priorities for addressing these risks. An essential function could be a business activity, such as the availability of telephone service. It could be the flow of information, such as up-to-the-second currency exchange rates. It is anything whose absence would significantly damage the operation of your business.

Most functions of a business are nonessential. You may think of your company as being tightly staffed and the work tuned to drive out waste. But think about the functions whose short-term loss would not stop your essential business from running. One example is payroll. Losing your payroll function for a few days would be inconvenient, but should not shut your business down. Most people can’t delay paying their bills for long, so over a longer period of time, this rises to the level of critical. This illustrates how a short-term noncritical function can rise to be a critical function if it is not resolved in a timely manner.

Another example is a manufacturing site that states its essential functions as building, shipping, and invoicing its products. Anything that disturbs those functions is a critical problem that must be promptly addressed. All other functions that support this are noncritical to the company, although the people involved may consider them critical. On a more local scale, there may be critical functions for a department or a particular person’s job. These are also important to resolve quickly. The difference is one of magnitude. Company-wide problems have company-wide impact and must be resolved immediately.

Another aspect to consider is the loss of irreplaceable assets. Imagine the loss or severe damage to vital records that must be retained for legal, regulatory, or operational reasons. Safeguarding these records must be added to your list of critical functions. Included in this category are all records whose loss would materially damage your company’s ability to conduct business. All other records are those that can be reproduced (although possibly with great effort) or whose loss does not materially affect your business.

With all of this in mind, it is time to identify those few critical functions of your facility. These functions will be broad statements and are the primary purposes toward which this site works. The easiest way to start is for the top management team to identify them. Often the company’s Operations Manager has some idea of what these should be. They would have been identified so that business continuity insurance could be purchased.

Another way to identify critical functions is for your team to select them. Based on your collective knowledge of the company, just what does the company expect your site to provide? Another way to think of this is to ask what the essence of your site’s function is.

Some examples to get you thinking:

• A factory. To build, ship, and invoice products. This implies that the continuous flow of products down the assembly line is critical, along with prompt shipment and invoicing (to maintain cash flow).

• A national motel chain call center. To promptly respond to customer calls, make accurate reservations, and address customer concerns in a timely manner. This implies that telephone system availability and speed of switching are critical, along with accurate databases to reserve rooms.

• A public utility. To provide electrical service to all the customers, all of the time. This implies that no matter what other crises within the company are under way, the delivery of this product is critical.

SCOPE OF RISK

The scope of risk is determined by the potential damage, cost of downtime, or cost of lost opportunity. In general, the wider the disaster, the more costly it is. A stoppage to a manufacturing assembly line can idle hundreds of workers, so of course this is a company-wide critical event. Even a 15-minute stoppage can cost many thousands of dollars in idled labor. Consequently, a problem of this nature takes priority on the company’s resources in all departments to resolve the issue.

On a smaller scale, there may be a spreadsheet in the accounting department that is used to generate reports for top management. If this PC stops working, work has ceased on this one function, but the plant keeps building products for sale. The Accounting Manager can request immediate PC repair support. The problem and support are local issues peripheral to the company’s main function of building, shipping, and invoicing material.

When evaluating the likelihood of risks, keep your planning horizon to 5 years. The longer the planning horizon is, the greater the chance that “something” will happen. Since the purpose of the analysis is to identify areas of concentration for your business continuity plan, 5 years is about as far out as you can plan for building mitigation steps. If the risk analysis is updated annually, then 5 years is a sufficient planning horizon.
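The effect of the planning horizon can be illustrated with a little arithmetic. The following is a minimal sketch, not a method from this book: it assumes that a risk has the same chance of occurring each year and that years are independent of one another. The function name and the 10% annual figure are hypothetical.

```python
def chance_within_horizon(annual_probability: float, years: int) -> float:
    """Chance that an event happens at least once within `years`.

    Assumption (for illustration only): each year is independent and
    carries the same annual probability.
    """
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    # The event is avoided only if it fails to occur in every single year.
    return 1.0 - (1.0 - annual_probability) ** years


if __name__ == "__main__":
    # Hypothetical risk with a 10% chance in any given year.
    for horizon in (1, 5, 20):
        print(horizon, round(chance_within_horizon(0.10, horizon), 3))
```

Under these assumptions, the chance of seeing the event at least once grows steadily as the horizon lengthens, which is why a very long horizon makes almost every risk look likely.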

Cost of Downtime

Calculating the cost of downtime is critical to determining the appropriate investments to be made for disaster recovery. But calculating the costs due to the loss of a critical function is not a simple process. The cost of downtime includes tangible costs, such as lost productivity, lost revenue, legal costs, late fees and penalties, and many others. Intangible costs include things such as a possibly damaged reputation, lost opportunities, and possible employee turnover.

TANGIBLE COSTS The most obvious costs incurred due to a business interruption are lost revenue and lost productivity. If customers cannot purchase and receive your product, they may purchase from a competitor. Electronic commerce is especially vulnerable, because if your system is down, customers can in many cases simply click on a competitor’s Web site. The easiest method to calculate lost sales is to determine your average hourly sales and multiply that value by the number of hours you are down. While this can be a significant value, it is simply the starting point for calculating the total cost of downtime.
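The lost-sales estimate described above can be sketched in a few lines. The function name and the dollar figures are hypothetical, chosen only to show the arithmetic.

```python
def lost_sales(average_hourly_sales: float, hours_down: float) -> float:
    """Starting-point estimate: average hourly sales times hours of downtime."""
    if average_hourly_sales < 0 or hours_down < 0:
        raise ValueError("inputs must not be negative")
    return average_hourly_sales * hours_down


if __name__ == "__main__":
    # Hypothetical example: $12,000 in average hourly sales, down for 6 hours.
    print(lost_sales(12_000, 6))
```

As the text notes, this figure is only the first component of the total; the other tangible and intangible costs still have to be added to it.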

Lost productivity is also a major portion of the total cost of downtime. It is usually not possible to stop paying wages to employees simply because a critical process is unavailable, so their salaries and benefits continue to be paid. Many employees may be idle while the process is unavailable, while others may continue to work at a much-diminished level of productivity. The most common method to calculate employee downtime costs is to multiply the number of employees by their hourly loaded cost by the number of hours of downtime. You may need to do this separately for each department, as their loaded cost and their level of productivity during the outage may vary. You will also need to include the employee cost for those who are assisting with any recovery or remediation processes once the process is back up. These employees may be doing double duty once the system is back up, doing their regular jobs and also entering data that were missed or lost during the downtime.
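The per-department calculation described above might look like the following sketch. The class and field names are hypothetical. The `productivity_lost` fraction is an added assumption, included to reflect employees who keep working at a diminished level; setting it to 1.0 reproduces the plain employees × loaded cost × hours formula.

```python
from dataclasses import dataclass


@dataclass
class DepartmentOutage:
    name: str
    employees: int
    hourly_loaded_cost: float  # wages plus benefits, per employee per hour
    hours_down: float
    productivity_lost: float = 1.0  # assumption: 1.0 = fully idle, 0.5 = half productive


def idle_labor_cost(dept: DepartmentOutage) -> float:
    """Employees x hourly loaded cost x hours of downtime, scaled by lost productivity."""
    return (dept.employees * dept.hourly_loaded_cost
            * dept.hours_down * dept.productivity_lost)


def total_idle_labor_cost(departments: list[DepartmentOutage]) -> float:
    """Sum the estimate across departments, since their costs and idleness vary."""
    return sum(idle_labor_cost(d) for d in departments)


if __name__ == "__main__":
    # Hypothetical figures for a 4-hour outage.
    outage = [
        DepartmentOutage("Assembly", 200, 35.0, 4),
        DepartmentOutage("Accounting", 10, 50.0, 4, productivity_lost=0.5),
    ]
    print(total_idle_labor_cost(outage))
```

The cost of employees who assist with recovery or re-enter lost data afterward would be estimated the same way and added as further entries.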

Other employee-related costs may include the cost of hiring temporary labor, overtime costs, and travel expenses. You may also incur expenses for equipment rental for cleanup or for temporary replacement of critical machinery and extra costs to expedite late shipments to customers.

If the business interruption was due to damage, such as from fire or flood, the direct loss of equipment and inventory must of course be added in. Other tangible costs may include late fees and penalties if the downtime causes you to miss critical shipments to customers. You may also incur penalties if the downtime causes you to miss deadlines for government-mandated filings. Stockholders may sue the company if a business interruption causes a significant drop in share price and they believe that management was negligent in protecting their assets.

INTANGIBLE COSTS Intangible costs include lost opportunities as some customers purchase from your competition while you’re down and may not return as customers. You don’t just lose the immediate sale, but possibly any future business from that customer. You need to calculate the net present value of that customer’s business over the life of the business relationship. If you have repeated problems with systems or processes being unavailable, some employees may become frustrated and leave the company. The cost to replace them and to train new employees should be considered. Employee exit interviews can help determine if this is at least a factor in employee turnover.
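As a rough illustration of the net present value calculation mentioned above, the sketch below discounts a constant yearly profit from one customer. The function name, the assumption of equal year-end cash flows, and all of the figures are hypothetical; a real estimate would use your own profit projections and discount rate.

```python
def customer_net_present_value(annual_profit: float,
                               years: int,
                               discount_rate: float) -> float:
    """Present value of a customer's future business.

    Assumptions (for illustration only): the customer yields the same
    profit at the end of each year for `years` years, discounted at a
    constant annual `discount_rate`.
    """
    if years < 0:
        raise ValueError("years must not be negative")
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    return sum(annual_profit / (1.0 + discount_rate) ** t
               for t in range(1, years + 1))


if __name__ == "__main__":
    # Hypothetical customer: $1,000 profit per year, 5-year relationship, 10% rate.
    print(round(customer_net_present_value(1_000, 5, 0.10), 2))
```

Multiplying such a figure by the number of customers you expect to lose permanently gives a first estimate of the lost-opportunity cost of an outage.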

Other intangible costs can include a damaged reputation with customers, business partners, suppliers, banks, and others who may be less inclined to do business with you. Your marketing costs may increase if customers defect to the competition during an outage and you need to work harder to win back their business. Calculating the true total cost of an outage is not easy, but it is important to know when determining the investment necessary to prevent and/or recover from a disaster.

THE FIVE LAYERS OF RISK

The impact of risks varies widely according to what happens to whom and when. Your reaction to a disaster that shuts down the entire company will be quite different from that which inconveniences a single office or person. When considering risks, it is very helpful to separate them into broad categories (or layers) to properly prioritize their solutions. When evaluating risk, we look at five distinct layers. The layers range from what affects everyone (including your customers) in Layer 1 down to the processes performed by each individual in Layer 5.

The first layer concerns external risks that can close your business both directly and indirectly. These are risks from nature, such as flooding, hurricanes, and severe snowstorms. They can also include risks from manufactured objects, such as railroads or airplanes. Risks of this type usually disrupt your customers and suppliers as well as your own employees.

The second layer examines risks to your local facility. This might involve one or more buildings—everything at this site. Some of these risks are due to the way your offices were constructed; some risks are a result of severe weather, etc. Second-layer risks include those to basic services, such as electrical power and telephone access to your building. We will also look into issues such as bomb threats, hazardous material spills, and medical emergencies.

The third layer is your data systems organization. Everywhere throughout your organization computers are talking through a data network, sharing information, and performing other functions. In addition to operational issues, loss of data can lead to severe legal problems. Most data can be re-created, but the expense of doing so can be quite high. Data systems deserves its own layer, as its disasters can reach across your company. In most companies, if the computers stop working, so do the people.

The fourth layer is the individual department. This will drive the main part of your plan. Layer 4 risks are the periodic crises we all confront on a weekly basis. Each department has critical functions to perform to meet its production goals and weekly assignments. These processes depend on specific tools. Each department needs to identify the risks that might prevent its members from performing their assigned work. These risks may not threaten the company’s primary functions, but over time can degrade the facility’s overall performance.

The fifth and final layer is your own desk or work area. If you can’t do your job in a timely manner, it may not stop the company from shipping its products, but it sure adds a lot of unnecessary stress to your life. Typically the risk assessment you perform on your own job will be more detailed because you know more about it. It will also make it easier for you to take time off, as you will be more organized, and make bouncing back from the crisis of the week look easy.

LAYER 1: EXTERNAL RISKS

Many natural disasters are wide-area risks. That means they not only affect your facilities, but also the surrounding area. Consider, for example, a hurricane. The damaging winds can affect hundreds of square miles before slowly moving up the seacoast. These winds can bring on tidal surges and torrential downpours, spawn tornadoes, and result in downed power lines and other calamities all at the same time.

Now consider your business in the midst of this. All companies are affected by this disaster, including your customers, your suppliers, and your emergency services support. Damage can be widespread. Technicians and machinery you had counted on for prompt support are tied up elsewhere. Bridges may be out, your workers may be unable to leave the facilities, and fresh workers may be unable to come to work. Employees critical to your recovery may not be available due to damage to their homes or injuries to their families. The list of problems could go on and on.

Don’t forget to consider how the disaster may affect your employees’ ability to respond to it. After the terrorist attacks on the World Trade Center, many disaster recovery plans called for surviving employees to be at the recovery site the next day. After watching their friends and coworkers dying around them, those employees did not have getting to the recovery site at the top of their priority list!

Don’t live in a hurricane zone? How different is this from a major snow storm? Power lines snap, which cuts off the electrical heat to your building, which causes sprinkler pipes to freeze and burst, etc. Impassable roads mean that help is slow to move around the area. Extreme temperatures reduce the productivity of power line technicians.

The risk to your site from natural disasters is determined by its topographic, hydrologic, and geologic conditions. These conditions can be determined from maps provided by the United States Geological Survey. The maps show elevations and drainage patterns.

The same goes for critical highways or railroads. Depending on where you live, a blocked highway may be easily bypassed. In some places, it may be the only practical route for tourists to reach your hotel. A damaged bridge on a key road could shut you down for days. A railroad derailment that spills toxic material may force an evacuation of your offices, even if it is quite a distance away.

With all of this “doom and gloom” in mind, let’s break external risks into four categories: natural disasters, manufactured risks, civil risks, and supplier risks.

WHAT TO DO?

Use Form 3-1, the “Risk Assessment Tool for Layer 1.” It is on the CD-ROM included with this book.

Evaluate the risk to your site in each of the categories over the next 5 years.

The columns of the tool are:

LIKELIHOOD is how likely this risk is to happen.

IMPACT is how bad you believe the damage would be.

RESTORATION is the length of time to get your critical functions back into service, not the amount of time for a complete recovery.

See section “Making the Assessment” at the end of this chapter for details on how to score each risk.

The risks listed in Form 3-1 are just a starting point. Add any other risks that you see for your site.
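To make the three columns concrete, here is a minimal sketch of how rows of such a form could be recorded and ranked. The scoring rules themselves are given in “Making the Assessment” and are not reproduced here; the 0-to-10 scale, the multiplication of the three columns, and every value shown are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RiskRow:
    risk: str
    likelihood: int   # assumed scale: 0 (never) to 10 (near certain)
    impact: int       # assumed scale: 0 (none) to 10 (catastrophic)
    restoration: int  # assumed scale: 0 (immediate) to 10 (very long)

    def __post_init__(self) -> None:
        for value in (self.likelihood, self.impact, self.restoration):
            if not 0 <= value <= 10:
                raise ValueError("scores must be between 0 and 10")

    def score(self) -> int:
        # Assumption: combine the columns by multiplying them.
        return self.likelihood * self.impact * self.restoration


def rank(rows: list[RiskRow]) -> list[RiskRow]:
    """Order risks from highest combined score to lowest."""
    return sorted(rows, key=lambda row: row.score(), reverse=True)


if __name__ == "__main__":
    # Hypothetical Layer 1 entries for one site.
    rows = [
        RiskRow("Tornado", likelihood=2, impact=8, restoration=6),
        RiskRow("Snow", likelihood=7, impact=3, restoration=2),
        RiskRow("Flood", likelihood=4, impact=7, restoration=7),
    ]
    for row in rank(rows):
        print(row.risk, row.score())
```

Whatever scale your team adopts, applying it consistently across all rows is what makes the ranking useful.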

Natural Disasters

Natural disasters are the first events that come to mind when writing a disaster plan and are risks that we all live with. They vary greatly according to the part of the country in which you live. The damage from natural disasters usually covers a wide area. This not only affects your building, but also your employees, suppliers, customers, and the time required for a full recovery.

A major problem with wide-area disasters is that the help you are depending on for recovery may not be available or able to reach you. If major electrical lines are down, then your power company may take a long time to rerun the wire from the downed power pole to your building.

How much warning will you typically receive of an impending disaster? For a hurricane, you should know days before it arrives. In the case of an earthquake, you may not know until it is upon you.

TORNADOES Tornadoes are the most violent type of storm. They can occur at any time of the year and can appear with little or no warning. Where you live has a great deal to do with the likelihood of a tornado occurring, with the greatest risk per square mile in Florida and Oklahoma. Tornadoes can do significant damage to facilities as well as to the homes of your employees.

You can obtain information about the likelihood of tornadoes in your area from the Severe Thunderstorm Climatology Web page of the National Severe Storms Laboratory of the National Oceanic and Atmospheric Administration at http://www.nssl.noaa.gov/hazard/hazardmap.html. This U.S. map displays the probability of tornadoes, wind, or hail for broad sections of the country. You can use this map, together with your team’s collective memory, to determine the likelihood of these events happening to you.

PANDEMICS A pandemic is an outbreak of disease that affects a large area. Pandemics in modern times are most often associated with outbreaks of an influenza virus for which there is little or no immunity in the affected population. In recent times severe acute respiratory syndrome (SARS) and H1N1 (the so-called swine flu) have impacted the ability of organizations to do business. A pandemic can have a major impact on the availability of your employees, as they or members of their family are sick from the disease. Many governments are requiring important industries, such as finance, energy, government, banking and transportation, to prepare plans for continuing operations during a pandemic.

EARTHQUAKES Earthquakes occur in all 50 states, and 41 of these states are in the moderate- or high-risk category. Earthquakes can affect both your facilities and the homes of your employees (see Figure 3-2). To see if your area has an earthquake risk, check out http://earthquake.usgs.gov/research/hazmaps/.

THUNDERSTORMS Information about the typical annual threat of severe thunderstorms in the United States can be found at http://www.nssl.noaa.gov/hazard/totalthreat.html. Severe thunderstorms include winds in excess of 58 mph and hailstones greater than 0.75 inch in diameter. These storms can include:

FIGURE 3-2: Seattle, WA, March 2001. Businesses in and around Seattle were damaged by a February 2001 earthquake in Washington State. (FEMA News Photo.)

• High winds that may rip off parts of your roof, exposing your equipment to damaging rain. High winds may also pick up objects and smash them into your windows, or even tip over semitrailers and close mountain passes.

• Hail that can be smaller than a pea or larger than a softball. It can destroy field crops, put a massive number of dents in a car, damage unprotected material you have stored outside, and can be extremely annoying if you own a car lot.

• Deluge and flash flooding that can cause roads to close, which slows the flow of customers, employees, and material in and out of your facility. Your building may change from a hilltop with a view to an island in a sea of muddy water.

• Lightning that can damage electronic equipment without striking it. The charge can run up telecommunication wires to a PC and toast it easily. It can also damage electronics in your office without leaving a mark. Lightning is a danger to your employees, and steps should be taken to protect them from the danger of being struck and from lightning igniting flammable gases.

SNOW Heavy snow or blizzards can close access roads leading into and out of your building, keeping employees in and the next shift at home. Even if your local weather is manageable, you may still close if trucks full of materials cannot drive over snow-blocked roads. Snow storms should be monitored for wind speed and the distribution of snow. Snow piled high against buildings or on roofs can lead to structural problems or failure (see Figure 3-3).

EXTREME TEMPERATURES Extreme temperatures, whether hot or cold, can wreak havoc on your facility, your materials, and your employees. These are also peak energy demand times, which will further throw off your operating budget. Like snow and other risks, your team can decide what an extreme temperature is and the risk it will occur within the next 5 years.

HURRICANES Hurricanes are severe storms that form in tropical waters anywhere in the world. The weather service can predict their occurrence, but it cannot accurately predict where they will make landfall or at what strength. Organizations located in or near coastal areas must have an evacuation plan in place for when hurricanes threaten. Hurricanes can spawn tornadoes, create tidal surges, and cause flooding. Evaluate the risk of just a hurricane occurring. Then evaluate the risk of each of these other hazards separately under its own category.

FLOODS Floods or tidal surges are usually detected by the weather service. Thus, you have some warning that trouble is coming. The Federal Emergency Management Agency (FEMA) reports that more than 90% of natural disasters involve flooding. The tidal surge may be the result of a hurricane or severe storm at sea. Floods can result from melting snow, severe downpours in the areas upriver from your location, and other natural causes. Usually, there will be some warning, but there may not be enough time to evacuate all your vital records and machinery.

FIGURE 3-3: Little Rock, AR, December 29, 2000. Downed power cables were among the damage after an ice storm. (Photo by John Shea/FEMA News Photo.)

Floods damage your property in many ways (see Figure 3-4):

• A flood will damage just about everything by soaking it in water. Office materials, computers, and manufacturing materials all can be seriously damaged by water. When the water finally moves out, mold can move in.

• The flood waters themselves may contain raw sewage or chemicals that will end up inside your building.

• Debris of all sizes is carried in the flood waters and can batter your walls, smash in windows, and be left strewn about when the waters subside.

• Flood waters typically contain mud and sand that will coat the floors and walls as the waters recede. This material will also be contaminated with whatever was in the flood waters.

FIGURE 3-4: Mullens, WV, July 17, 2001. An office supply store was in shambles after flood waters up to 9 feet hit earlier in the month. (Photo by Leif Skoogfors/FEMA News Photo.)

OTHER NATURAL DISASTERS Forest fires or large brush fires may threaten your facility or the access roads to it. Landslides can close roads and damage facilities, depending on your topography. This is more common if your facility is located on or near a hill or your main roads pass along hillsides. Mudslides can result from heavy rainfall. Sinkholes (subsidence) are the result of surface collapse from a lack of support underneath, as might be caused by groundwater dissolving a soft material such as limestone, or from abandoned mine tunnels. Sandstorms resulting from high winds can damage vehicles, seep dust and grit into machine shops, and close access roads.

Manufactured Risks

All around you are potential human-created risks. If you are in a city, this is an even greater problem. These risks are the result of someone else’s disaster or actions that affect your daily operations. Stand outside for a moment and look around. Drive around the nearby roads and make notes of what you see. Look for large outside storage tanks, semitrailers carrying gas, or hazard warning signs.

HOW TO IDENTIFY MANUFACTURED RISKS:

Get a map of your area from FEMA. It will show the routes taken by hazardous material carriers, along with similar information on railroad usage and pipelines. Determine whether a problem on any of these routes would block your only decent road access. Also determine how close a toxic gas leak would have to be, if it were blown your way, to force an evacuation of your facility.

Get a good local road map. Mark any routes whose loss would hinder or prevent access to your facility, such as major bridges and primary highways. Now mark those things whose normal operation would stop or hinder access, such as drawbridges or surface-level railroad tracks. This map will be further used when studying Layer 2 risks.

INDUSTRIAL SITES Note any industrial sites with large outdoor storage tanks. What is in them? Do they contain distilled water or industrial chemicals? A major chemical release could cause a wide area to be evacuated. Your facility or access to your facility could be affected while the chemical spill is being contained.

TRANSPORTATION Major highways may be used to transport toxic materials through your area. If a truck flipped over and there was a major toxic spill, do you have another access road into your facility? (If this occurs close by, your building may need to be evacuated.) Bridges across large bodies of water or intracoastal waterways can be damaged by collisions with barges or boats. If you are on an island, do you have another suitable way in? If the bridge arches high into the air to allow seagoing vessels to pass underneath, is it often closed during high winds or ice storms? Railroads also transport toxic material. Does your building have a railroad siding next to it where someone else’s railcars with potentially hazardous cargo could be temporarily stored? Is your facility located on or near a flight path? This includes flight paths for small dirt strips as well.

PIPELINES Are there any underground pipelines in your area? These often carry fuels. A pipe rupture can force an evacuation lasting several days.

CHEMICAL USERS These are all around, often unknown to their neighbors. For example, many water treatment plants use chlorine to treat water. A chlorine gas leak can force an evacuation of a wide area.

DAMS Dams require regular maintenance. In extreme weather, they may overflow or become damaged; ask about soft spots.

Civil Risks

The risk from civil problems is a tough area that covers a lot of ground. Organizations may be targeted by civil disturbances because of some political agenda, or they might simply be located in an affected area.

RIOTS What is the risk of a riot occurring in your area? Is it higher in an urban area (where the people are) than in a rural area? In general, it would be less likely in an affluent area than in an area with a concentration of less affluent people. It might be less likely in the middle of an industrial park than on a busy street corner.

LABOR DISPUTES Another risk is the potential of a labor dispute turning into a strike. The picket lines that usually accompany a strike might cause material and employee flow problems if truck drivers and employees refuse to or cannot cross the picket lines. Similar to a labor stoppage is the risk of secondary picketing. If your labor relations are sound, but one of your suppliers is in the midst of a labor dispute, their employees may choose to publicize their dispute by picketing companies that continue to use products made by their company. Even though these picket lines tend to be much smaller, you may have union truck drivers who will not drive across them.

TERRORISM The threat from terrorism is unfortunately a growing problem worldwide. It is typically defined as the calculated use or threat of violence against civilians for reasons that are political, religious, or ideological in nature. Acts of terrorism can include bombings, kidnappings, hijackings, hacking, or other forms of violence or intimidation. As the attacks on 9/11 demonstrated, terrorism can have an impact over a wide area both on physical facilities and the ability of employees to do their jobs.

BIOLOGICAL ATTACKS This is the intentional release of germs or other biological agents in an attempt to cause serious illness or death over a wide area. Some agents are contagious and can spread from person to person (e.g., smallpox); others affect only individuals who come into direct contact with the agent (e.g., anthrax). As the many recent anthrax scares have shown, the material does not have to be real to cause a disruption to your business.

Supplier Risks

Another category of risk is how well your suppliers can maintain their flow of goods into your facility. Make a list of your key suppliers and ask yourself, in every case: What is the risk that they cannot manufacture and deliver your required material to your dock on time in the event of any of the aforementioned disasters? This is critical for manufacturers who depend on just-in-time deliveries.

You need to consider the condition of the access roads or rail service between your facility and your key suppliers. This could be interrupted by area-wide disasters, such as blizzards or flooding.

SUPPLIER RISKS

What to Do?

1. Make up a list of key suppliers or service providers whose absence for more than 48 hours would shut you down. (You can change the 48 hours to whatever value you think is appropriate.)

2. Plot their location on a map (down to the road intersection if local, or to the town if distant). Pushpins work well for this.

3. Identify potential problems along their routes. For example, are they in St. Louis and need to cross the Mississippi River to reach your facility? If so, what is the risk they can’t get across in the event of a major flood?

4. For local suppliers, check to see if they have multiple routes to reach you or have their own traffic flow bottlenecks.
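Steps 1 and 4 can be sketched in a few lines of Python if you prefer a spreadsheet-like list to pushpins. This is only an illustration under stated assumptions: the supplier names, the `hours_until_shutdown` estimate, and the `alternate_routes` count are hypothetical fields, and the 48-hour threshold is simply the default suggested in step 1.

```python
from dataclasses import dataclass

# Threshold from step 1; change it to whatever value fits your operation.
SHUTDOWN_THRESHOLD_HOURS = 48


@dataclass
class Supplier:
    name: str                  # hypothetical supplier name
    location: str              # town, or road intersection if local
    hours_until_shutdown: int  # how long you can run without this supplier
    alternate_routes: int      # other routes they could use to reach your dock


def key_suppliers(suppliers, threshold=SHUTDOWN_THRESHOLD_HOURS):
    """Suppliers whose absence beyond the threshold would shut you down."""
    return [s for s in suppliers if s.hours_until_shutdown <= threshold]


def single_route_suppliers(suppliers):
    """Suppliers with no alternate route, a possible traffic bottleneck."""
    return [s for s in suppliers if s.alternate_routes == 0]


suppliers = [
    Supplier("Acme Resin", "St. Louis", 24, 0),
    Supplier("Office Paper Co.", "local", 240, 2),
    Supplier("Steel Fasteners Inc.", "local", 48, 1),
]

critical = key_suppliers(suppliers)
for s in critical:
    note = "NO alternate route" if s.alternate_routes == 0 else "has alternates"
    print(f"{s.name} ({s.location}): {note}")
```

Only the suppliers that pass the threshold test need to be plotted on the map; those with no alternate route deserve the closest look in step 3.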

Sources of Information for Layer 1 Risks:

Earthquakes: http://earthquake.usgs.gov/research/hazmaps/

Tornadoes: http://www.nssl.noaa.gov/hazard/hazardmap.html

Severe storms: http://www.nssl.noaa.gov/hazard/totalthreat.html

Manufactured hazards: Your local Federal Emergency Management Agency (FEMA) office can be found in the county or state sections of your local telephone book or at the FEMA Web site at http://www.fema.gov/about/contact/statedr.shtm. They will be an invaluable source of the risks and mitigation actions for Layer 1 risks in your locale.

Access hazards: A road map and a topographical map.

LAYER 2: FACILITY-WIDE RISK

A facility-wide risk is something that only impacts your local facility. Some companies span many locations and will need to make a separate risk assessment for each location. Each assessment can be for one building or a cluster of buildings. In either event, a facility-wide risk involves multiple departments and would slow or stop the flow of business.

An example might be a facility that takes toll-free calls from around the country for hotel reservations. The loss of their internal telephone switch could idle hundreds of workers. Customers who could not complete their calls would phone a different hotel chain. This costs the company in direct revenue and is compounded by the loss of valuable customer goodwill through the uncompleted calls.

Another example is the loss of electrical power. Unless you sit next to a window on a sunny day, the loss of electrical power will mean all work stops when the lights go out. In addition, all your desktop PCs will “crash” and lose any data in their memories. Just the labor time alone to reboot this equipment can be substantial.

We will begin with the essential utilities, and then move into the important areas of people risks. There are five basic office utilities that we all take for granted, but without them, the doors might close quickly. They are:

Image Electricity

Image Telephones

Image Water

Image Climate Control

Image Data Network

WHAT TO DO?
Use the local map that was marked up in Layer 1 and indicate the location of the local fire department, ambulance service, hospital, and police station. Look for access problems.

Electricity

Electricity gives us lights. It powers our office and manufacturing machines. It is magically there every time we need it—just plug in! Stop and think of the complexity involved in generating electricity and then moving it hundreds of miles to where it is needed. This is truly an engineering marvel. And it is very reliable. So reliable that when it is stopped, people become very annoyed as if something they had a right to expect was taken from them.

To properly determine the risk of an electrical outage, begin with the team’s own experiences with the frequency, timing, and length of outages in this area. Frequency is how many times it might occur within your 5-year planning window. Timing is what time of day or day of the week it usually happens. Length is how long the power usually stays off. In some places, an outage seems most likely to occur during severe thunderstorms. In other locales, it might be most likely during ice storms.
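If your team or the facilities department has kept any record of past outages, the three measures can be tallied with a short script. The sketch below is a hedged illustration only: the outage dates and durations are invented sample data, and the `summarize` function and its field names are assumptions, not part of any standard form. It expects at least one recorded outage.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical outage history: (start time, duration in minutes).
outages = [
    (datetime(2021, 7, 14, 16, 30), 45),
    (datetime(2022, 1, 9, 3, 15), 240),
    (datetime(2022, 7, 2, 17, 5), 20),
    (datetime(2023, 8, 21, 15, 50), 90),
]


def summarize(outages, window_years=5):
    """Summarize frequency, timing, and length over the planning window."""
    durations = [minutes for _, minutes in outages]
    return {
        # Frequency: average number of outages per year in the window.
        "per_year": len(outages) / window_years,
        # Timing: how many outages started in each hour and on each weekday
        # (weekday 0 is Monday, 6 is Sunday).
        "by_hour": Counter(start.hour for start, _ in outages),
        "by_weekday": Counter(start.weekday() for start, _ in outages),
        # Length: typical and worst-case duration in minutes.
        "mean_minutes": mean(durations),
        "longest_minutes": max(durations),
    }


summary = summarize(outages)
print(summary["per_year"], summary["mean_minutes"], summary["longest_minutes"])
```

Even a rough tally like this gives the team something firmer than memory when it assigns a likelihood to this risk.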

The second step is to consult your facilities maintenance department. Find out how many power feeds run into the building and if they enter from opposite ends of the building. It is not uncommon to only have one. If so, then you have just uncovered a potential single point of failure. It is better to have more than one power feed to your building.

One thing to understand is that even if electricity is unavailable across a wide area, the landline telephone system may still work. You might consider maintaining at least one landline connection if your organization moves to other technologies such as voice-over-IP (VoIP) or all cell phones, as a blackout could last longer than your UPS or cell phone batteries. You can use this to notify the power company of the outage, to see how widespread it is, and to ask when they expect to have it operational again.

Telephones

Telephones are your window to the world. In the blink of an eye, you communicate with customers and suppliers in any corner of the world. Telephones also provide a crucial lifeline to emergency services during a disaster. Loss of telephone service hurts some companies more than others, but few companies can function without it for an extended period of time.

A critical aspect of telephone communications is that your external company data network often runs over the same cables. So if a backhoe operator cuts the cable to your building, you could lose both the telephones and the external data lines at the same time.

When evaluating your telephone risk, check out your local telephone service architecture. If the local central office was inoperable, would your telephones still work? If you can reach multiple central offices, then the answer is yes. If you are only connected to one central office, then its loss is your loss.

Most companies have their own Private Branch Exchange (PBX) system. Damage to this room could very effectively shut down your internal telephone system. How do you rate the risk or likelihood of this happening?

Water

One thing we can look forward to every winter is the breaking of water mains. As the ground is saturated with fall or winter moisture and then freezes, it expands and contracts, stressing older water main lines. Eventually, one will give way and a section of the town will be without fresh water until it is fixed.

If you are operating a restaurant, you use a lot of water for sanitation and for customers. So, of course, if a water main broke you could be closed for several hours. If this occurred during a particularly profitable time of day or day of the week, you could lose a lot of money. If it happened very often, you could lose customer goodwill.

Office buildings are also major water users. Many computer and PBX rooms are cooled by “chilled water” systems. If these units lose water pressure, they can no longer cool the air and the central computer equipment could overheat. If this occurred on a weekend, you might find out when everyone streams in on Monday. By then, the heat has damaged expensive electronic components and your systems are useless.

Office buildings also use water for sanitation. If you have 500 people in a building, you have a lot of flushes in one day. If your neighborhood water main was broken, how long would your building be habitable?

Climate Control

Loss of heating or air conditioning might be an inconvenience depending on the time of the year. In the depth of winter or the height of summer, this could make for very uncomfortable working conditions and be very damaging to your manufacturing materials and electronic systems.

Loss of heat in the depths of winter:

Image Can cause your building to cool to the point of freezing. This could lead to frozen sprinkler pipes that could rupture and leak upon melting.

Image Can affect integrated circuits in electronic equipment that are not designed for extreme cold and may malfunction.

Image Can, in a manufacturing environment, stop production as the viscosity of paint, lubricants, and fluids used in normal production is increased. Water-based products may be ruined if frozen.

Loss of air conditioning in the heat of summer:

Image Can result in office closures because the high heat could lead to heat stroke or heat exhaustion. Remember to consult the heat index for your area, as humidity can make the air temperature feel much warmer and can impact people sooner.

Image Can, in a factory, lead to the overheating of moving machinery much faster and potentially beyond its rated operating temperature.

Image Requires that you monitor the temperatures of your computer and PBX rooms and shut the equipment down if the temperature exceeds the manufacturer’s rated limits, or risk losing warranty claims.

Image Can result in a loss of humidity control that may add moisture to your vital records storage room, leading to the potential for mildew growth.

Data Network

Most companies depend heavily on their data communication network to conduct daily business. It is the tool that allows desktop workstations to share data, send e-mail confirmations, and receive faxed orders into e-mail, as well as providing a wealth of other benefits. In many companies, losing the data network is as severe a problem as losing electricity. We’ll discuss data communications issues more thoroughly below in Layer 3, Data Systems Risks.

Other facility-wide risks to review are those that endanger the people in the facility. These people risks include:

Image Fire

Image Structural Problems

Image Security Issues

Image Medical Concerns

FIRE What do you think the risk is of a fire occurring in your facility? This can be a fire of any size depending on what you see in place today to deal with it. There may be fire extinguishers in every corner, but that does not mean there is a low risk of fire. This risk should take into account the local conditions (does it get very dry in summer), the amount of combustibles stacked around the facility, and the construction of the building itself (wood, cement, etc.).

Another risk factor to add is the reaction time for fire crews to reach your site. If it is rural, it may take additional time to collect volunteer firefighters at the stationhouse before they can respond (see Figure 3-5).

STRUCTURAL PROBLEMS Structural problems may be caused by design flaws, poor materials, or even human mistakes. In any event, consider the risks of damage from the very building you are sitting in.

Image Weather-related structural failure might arise from a heavy snowfall weighing on the roof or even from high winds.

Image

FIGURE 3-5: NOAA news photo. (From Frankel et al., U.S. Geological Survey, 1997.)

Image A fire on one floor of a building may be quickly contained, but the water used to extinguish it will seep through the floor and damage equipment and vital records stored below. Any large fire, no matter how quickly it is contained, has the capability to weaken an entire structure.

Image Water pipe breakage can occur when part of the building freezes because the heat was shut off over a holiday, or when a worker snaps off a sprinkler head with a ladder while walking down a hall.

Image Lightning does not have to hit your building to damage sensitive electronic components. However, if it does, you could lose valuable data and equipment in a very, very short time. Buildings must have proper grounding and lightning protection.

SECURITY ISSUES The quality of security surrounding a workplace has gained widespread attention in recent years. Historically, the facility’s security force was used to prevent theft of company property and to keep the curious away from company secrets. In more recent years, the threat of workplace violence, often from outsiders, has led to a resurgence of interest in having someone screen anyone entering your facility. Issues that your security people must be trained to deal with include:

Image Workplace Violence. What is the risk of someone in your facility losing his or her temper to the point of a violent confrontation with another person?

Image Bomb Threats. Every occurrence of a bomb threat must be taken seriously. A bomb threat can disrupt critical processes while police investigators determine if there is a valid threat to public safety or if it is just a crank call. This risk can vary according to the public profile of your company, the type of products you produce, or even the level of labor tension in your offices.

Image Trespassing. Employee and visitor entrance screening is critical. What is the likelihood of someone bypassing or walking through security screening at your entrance? You might wish to break this down further, from the risk of a deranged nonemployee out to avenge some imagined wrong by an employee to the risk of a thief looking to rummage through unattended purses. These things can tragically occur anywhere, but you can set this risk according to the team’s experience at this facility.

Image Physical Security of Property. This involves theft, either by employees or outsiders. The thief can steal from employees or from the company. It is expensive for a company to have a laptop PC stolen. It is even more expensive if that PC has company confidential data in it. Physical security involves employee identification badges, a key control program, and electronic security access to sensitive areas.

Image Sabotage. Sabotage is the intentional destruction of company property. This can be done by an employee or by an outsider. There are some parts of your facility that are only open to authorized people. Examples are the PBX room, the computer room, and the vital records storage. What is the risk that someone will bypass the security measures and tamper with or destroy something in a sensitive area? Also check whether all of your sensitive areas are in fact secured against sabotage.

Image Intellectual Property or Theft of Confidential Company Information. What is the risk that valuable company information will miss a shredder and end up in a dumpster outside? This could be customer lists, orders with credit card numbers, or even old employee records.

WHAT TO DO?

Obtain copies of your company policies for security and safety. The security team often has emergency procedures for fire and police support. Add them to your plan.

Examine your security policy for a date that it was last reviewed or published.

Compare the written policy to how security is actually implemented at your facility.

MEDICAL CONCERNS The standard answer you hear to evaluating medical risks usually involves calling for an ambulance. This is a good answer. But when evaluating the likelihood of these risks, you might add to your disaster plan equipment and personnel who could provide aid while waiting for the ambulance to arrive. Examples are hanging emergency medical kits or defibrillators around the facility. Some companies register all employees who are certified Emergency Medical Technicians (EMTs) and pay them extra to carry a pager. In the event of a medical emergency, they are dispatched to the location to assist until proper medical support arrives. It may even make sense to staff an industrial nurse during production hours. Medical issues might include these:

Image Sickness. What is the risk of someone coming down with a serious sickness while at work? Some serious illnesses can come on suddenly.

Image Sudden Death. What is the risk of someone falling over dead? This risk should factor in the age of the workforce and the types of materials used in your facility.

Image Serious Accident. Do you use heavy machinery or high voltages in your processes? Are serious accidents a real risk in your line of business?

Image Fatal Accident. Along the lines of the serious accident, is there a risk of a fatal accident at your site?

What other Layer 2 Risks can you or your team identify? Add them to Form 3-2 on the CD-ROM.

WHAT TO DO?

Find out about local fire/ambulance service. What hours is it staffed? Is it full time or run by volunteers?

What is the distance from the stationhouse to your door?

Are there obstacles that might delay an ambulance, such as a drawbridge or surface-level railroad tracks?

What is the distance to a hospital?

LAYER 3: DATA SYSTEMS RISKS

Data systems risks are important because one problem can adversely affect multiple departments. Data systems typically share expensive hardware, such as networks, central computer systems, file servers, and even Internet access. A complete study of data system risk would fill its own book, so this chapter examines these risks from an end-user perspective.

Your data systems architecture will to a great degree determine your overall risks. Its design will reflect the technology costs and benefits of centralized/decentralized software and data. A more common company-wide risk is a loss of the internal computer network. With a heavy dependence on shared applications and data files, many companies are at a standstill without this essential resource. Even a short interruption will lose valuable employee time as they reconnect to the central service.

A major goal in examining data systems risks is to locate your single points of failure. These are the bottlenecks where a problem would have wide-reaching impact. In later chapters, we will review our single points of failure for opportunities to install redundant devices.

Some of the hidden risks in data systems are processes that have always been there and have worked fine for a long period of time. It is possible that they are running on obsolete machines that could not be repaired if damaged in a disaster, and their software likely could not be transferred quickly to another processor. Your only choice is to try to make your old program function on the new hardware. As anyone who has tried to use an old program while leaping generations of hardware technology can tell you, this can be a time-consuming process. Due to the sudden change to new equipment and operating software, your programs may require substantial fine-tuning to run. This “forced upgrade” will delay your full recovery.

Computer programs exist in two forms. The “English-like” source code is what the programmer writes. The computer executes a processed version of the program called “machine code.” A typical data processing problem is finding the original source code. Without this, programs cannot be easily moved to a different computer. This leads to processes relying on obsolete languages or programs to work.

The risk analysis at this level is from the end-user perspective, as the data department should already have a current plan. If so, these items may be lifted from their plan.

WHAT TO DO?

Use the Critical Process Impact Matrix (Form 3-3) found on your CD. We will also use this matrix for Layers 4 and 5.

The Critical Process Impact Matrix will become a very valuable part of your disaster recovery plan. Whenever the IS department wants to restart the AS/400 over lunchtime to address an important error, you can sort the matrix by the platform column and see which systems will stop working during this time and thereby quickly see the impact of this action. You would also know which customer contacts to notify.

The matrix has the following columns:

Image System. Enter the name commonly used to refer to this overall computer system, such as Accounts Payable, Materials Management System, Traffic Control System, etc. However, this does not have to be a computer-based system as it can apply to any important process.

Image Platform. Enter the computer system this runs on, such as AS/400 #3, a VAX named Alvin, etc.

Image Normal Operating Days/Times. What times and days do you normally need this? Use the first one or two letters for the days of the week and enter 24 hours if it must always be up.

Image Critical Operating Days/Times. Use the same notation as for normal times and days. Some systems have critical times when they must be up for 24 hours, such as when Accounting closes the books at the end of the month, end of quarter, etc. Use as many critical days/time entries as you need.

Image Support Primary/Backup. Who in the IS department writes changes or answers questions about this system? Each entry must be a person’s name and not a faceless entity like “Help Desk.”

Image Customer Contacts Primary/Backup. Who should the IS department call to inform them of current or upcoming system problems? Often this is a department manager.

Fill in the matrix. This will take quite a while. Every system on this list must have at least a basic disaster recovery plan written for it—but more on that later.
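If you keep the matrix in a spreadsheet or small database rather than on paper, the lunchtime-restart question becomes a simple filter. The sketch below is a minimal illustration, assuming one record per system with the columns described above; the people, titles, and operating times are hypothetical sample entries, and the function names are our own invention.

```python
from dataclasses import dataclass


@dataclass
class ImpactRow:
    system: str
    platform: str
    normal_times: str
    critical_times: str
    support: tuple            # (primary, backup), named people
    customer_contacts: tuple  # (primary, backup)


# Hypothetical sample rows; your own matrix will be much longer.
matrix = [
    ImpactRow("Accounts Payable", "AS/400 #3", "Mo-Fr 8-17",
              "Month end, 24 hours",
              ("P. Alvarez", "D. Chen"), ("Accounting Manager", "Controller")),
    ImpactRow("Traffic Control System", "VAX Alvin", "24 hours", "24 hours",
              ("S. Patel", "P. Alvarez"), ("Shipping Manager", "Dock Supervisor")),
    ImpactRow("Materials Management System", "AS/400 #3", "Mo-Sa 6-22",
              "Quarter end, 24 hours",
              ("D. Chen", "S. Patel"), ("Materials Manager", "Controller")),
]


def systems_on(rows, platform):
    """Systems that stop working if this platform is restarted."""
    return sorted((r for r in rows if r.platform == platform),
                  key=lambda r: r.system)


def contacts_to_notify(rows, platform):
    """Customer contacts for every system on the platform, without repeats."""
    names = {name for r in systems_on(rows, platform)
             for name in r.customer_contacts}
    return sorted(names)


for row in systems_on(matrix, "AS/400 #3"):
    print(row.system, "| primary support:", row.support[0])
print("Notify:", ", ".join(contacts_to_notify(matrix, "AS/400 #3")))
```

The same filter works on any column, so you could just as easily list every system a particular support person is responsible for.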

Now that we have identified the critical processes, we need to break each process down into its main components. Remember, this is only necessary for your critical processes. Use the Critical Process Breakdown matrix (Form 3-4 found on your CD). This matrix helps to identify the critical components for each system. By focusing on the critical components, we can keep this sheet manageable. If your facility is ISO compliant, then much of this is already in your process work instructions.

Image System. This name ties the Breakdown matrix to the Critical Process Impact Matrix. Be sure to use the same system names on both matrixes.

Image Platform. Enter the computer system this runs on, such as AS/400 #3, a VAX named Alvin, etc.

Image Key Components. There may be more than one of each item per category for each critical process.

Image Hardware. List specialized things here such as barcode printers, check printers, RF scanners, etc.

Image Software. What major software components does this use? This is usually multiple items.

Image Materials. List unique materials needed, such as preprinted forms or special labels.

Image Users. If this is widely used, list the departments that use it. If its use is confined to a few key people, then list them by name or title.

Image Suppliers. Who supplies the key material? If the materials required are highly specialized, then list supplier information. Ensure this is included on the key supplier list. If the material is commonly available, then we can skip this.
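Because the System name is what ties the two matrices together, a quick cross-check can catch names that were typed differently on each form. The following is a minimal sketch, assuming you can export the System column of each matrix as a plain list; the function name and the sample entries are hypothetical.

```python
def compare_system_names(impact_names, breakdown_names):
    """Report system names that appear on one matrix but not the other."""
    impact = {name.strip() for name in impact_names}
    breakdown = {name.strip() for name in breakdown_names}
    return {
        # Critical systems that have not been broken down yet.
        "no_breakdown_yet": sorted(impact - breakdown),
        # Breakdown entries that match nothing on the Impact Matrix,
        # often a sign of a misspelled or abbreviated name.
        "not_in_impact_matrix": sorted(breakdown - impact),
    }


impact = ["Accounts Payable", "Materials Management System",
          "Traffic Control System"]
breakdown = ["Accounts Payable", "Materials Mgmt System",
             "Traffic Control System"]

result = compare_system_names(impact, breakdown)
print(result)
```

In this sample, the abbreviated “Materials Mgmt System” shows up in both lists of the report, which is the usual symptom of one system entered under two names.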

Data Communications Network

The data communications network is the glue that ties all the PCs to the shared servers and to shared printers. Without the data network, the Accounting department cannot exchange spreadsheets, the call center cannot check its databases, and the Shipping department cannot issue bills of lading.

A data network is a complex collection of components, so the loss of network functionality may be localized within a department due to the failure of a single hub card.

Based on the collective knowledge of your team, what do you believe is the likelihood of a failure of your data network? Ask the same question of your network manager. Based on these two answers, plug a value into the risk assessment for this category.

Telecommunications System

Modern Private Branch Exchanges (PBXs) are special-purpose computers, optimized for switching telephone calls. They may also include voice mail and long-distance call tracking.

Your facility’s telephone system is your connection to the outside world. If your company deals directly with its customers, special care must be taken because a dead telephone system can make them very uneasy. Telephones are used constantly internally to coordinate between departments and, in an emergency, to call outside for help.

Based on the collective knowledge of your team, what do you believe the likelihood is of a failure of your company’s telephone system? Ask the same question of your Telecommunications manager. Based on these two answers, plug a value into the risk assessment for this category.

Shared Computers and LANs

There are many types of shared computers used by companies. They usually are grouped under the old name of “mainframe,” but the term here refers to shared computers of all sizes. This category also includes the LAN (Local Area Network). These computers typically support a wide range of programs and data. When evaluating the risks here, you have two questions:

Image What is the risk of losing a specific shared application (such as inventory control, payroll, etc.)? You should list each critical application separately.

Image What is the risk of losing use of the machine itself? This could be due to damage to the machine or more likely through a hardware failure.

These risks should be based on the collective knowledge of your team. Ask the same question of your computer operations manager. Based on these two answers, plug a value into the risk assessment for this category. If desired, list each of the network servers individually.

Viruses

What do you think the likelihood is of a computer in your facility contracting a software “virus”? How severely would this interrupt business? What would your customers think of your company if, before it was detected, you passed the virus on to them? What if it struck a key machine at a critical time? What if its mischievous function was to e-mail out, to anyone in your address book, anything that had the words “budget,” “payroll,” or “plan” in the file name?

Most companies have an Internet firewall and virus scanning software installed. When evaluating this risk, ask your data manager’s opinion of the quality of this software. Ask how often the catalog of known viruses is updated.

Viruses can also enter your company through many other sources. Often they come in through steps people take to bypass the firewall or virus scanning, both of which take place only on files coming into your facility from the outside over your external data network.

Image Does your company allow employees to take their laptop computers out of the office, for example, to their homes? Are their children loading virus-laden programs? Are the employees downloading files from their home Internet connection that would be filtered out by their desk-side connection?

Image Does your antivirus software automatically update its catalog of known viruses, or must each person request this periodically?

Image Do consultants, vendors, or customers bring laptop PCs into your facility and plug into your network to retrieve e-mail or to communicate orders?

Image Is there virus-checking software to validate the attachments to your e-mail?

Data Systems

Theft of hardware (with critical data) can be a double financial whammy. You must pay to replace the hardware and then try to recreate valuable data. This risk spans your local site (do PCs disappear over the weekend?) all the way through laptop PCs taken on business trips.

Theft of software can be a major issue if someone steals a PC program and then distributes illegal copies of it. You may find yourself assumed guilty and facing a large civil suit. This can also happen if well-meaning employees load illegal copies of software around the company.

Theft of data can occur, and you will never realize it. This could be engineering data, customer lists, payroll information, security access codes, and any number of things. What do you believe your risk is of this?

Data backups are the key to rapid systems recovery. But what if you reach for the backup tapes and they are not readable? What is the risk that these tapes are not written, handled, transported, and stored correctly?
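One inexpensive check on readability is to record a checksum when a backup is written and compare it after the backup has been handled, transported, or stored. The sketch below is an illustration under stated assumptions, not a description of any particular backup product: it treats the backup as an ordinary file, uses the SHA-256 routine from Python’s standard `hashlib` module, and the function and file names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, recorded_digest):
    """True only if the file can be read and still matches its digest."""
    try:
        return sha256_of(path) == recorded_digest
    except OSError:
        # Missing or unreadable media counts as a failed backup.
        return False


# Demonstration on a throwaway file standing in for a backup.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payroll_backup.dat")
    with open(path, "wb") as f:
        f.write(b"payroll records")
    recorded = sha256_of(path)  # digest noted when the backup was written

    intact = backup_is_intact(path, recorded)
    with open(path, "ab") as f:
        f.write(b"x")           # simulate damage in storage
    damaged = backup_is_intact(path, recorded)
    missing = backup_is_intact(os.path.join(tmp, "no_such_file"), recorded)

print(intact, damaged, missing)  # prints: True False False
```

A matching checksum shows only that the file can be read and has not changed. It does not prove that the data can actually be restored, so a periodic trial restore is still the stronger test.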

Hacker Security Break-In

One aspect of connecting your internal network to the Internet is that it is a potential portal for uninvited guests to access your network. Even well-built defenses can be circumvented through careless setup or newly publicized gaps in your security firewall software. In some cases, intruders invade your system only to mask their identity when they attack a different company. This way, all indications are that you originated the attack!

Hackers generally fall into several categories, none of them good for you:

Image Curious hackers just want to see if they can do it. You never know when this person will advance to the malicious level, and they should not be in your system.

Image Malicious or criminal hacking involves invading your site to steal or to damage something.

Image In extreme cases, a hacker may conduct a denial of service attack and shut you down by bombarding you with network traffic, which overwhelms your network’s ability to answer all the messages.

What other Layer 3 risks can you and your team identify? Add them to the list in Form 3-5, Risk Assessment Form Layer 3, on the CD-ROM.

LAYER 4: DEPARTMENTAL RISKS

Departmental risks are the disasters you deal with in your own department on a daily basis. They range from the absence of a key employee to the loss of an important computer file. Most of these obstacles are overcome through the collective knowledge of the people in the department who either have experienced this problem before or know of ways to work around it.

At this stage of the risk analysis, we are looking at disastrous local problems. Consider for a moment what would happen if a worker changing light bulbs were to knock the head off a fire sprinkler. You know the ones I mean. A fire sprinkler nozzle typically protrudes from the ceiling into your office.

Losing a sprinkler head will put a lot of water all over that office very quickly. Papers will be destroyed, PCs possibly sizzled, and all work stopped for hours. The carpets will be soaked, and water will seep through the floor to the offices below. What a mess!

A small fire is another localized disaster. It may spread smoke over a large area, making an office difficult to work in. Depending on how it was started and the extent of the damage, that area might be inaccessible for several days, especially if the Fire Marshal declares an arson investigation and no one is allowed near the “crime scene”!

Departmental risks also include the situation referred to in the data systems section where a unique device is used that is not easily or economically repairable. If this device is also a single point of failure, then you had better treat it like gold.

To build a departmental risk assessment, assemble a department-wide team to identify your critical functions, risks unique to your department, and risks to other departments that will cause problems in your group. Draft a fresh list of the critical functions that apply to your department. You can omit those functions already listed in the first three layers unless you are particularly vulnerable to something.

If a risk from an earlier layer will cause you to take particular action in your department, then include it here also. For example, if the loss of telephone service for your facility can be charged back against your telephone bill (based on your service agreement), then the Accounting department would need to time the outage and make the proper adjustment to their monthly bill. Another example is if you run the company cafeteria and an electrical outage threatens the food in your refrigerators.

Some examples of critical functions might include:

Image Payroll

Image To provide correct pay to all employees on time.

Image To maintain accurate payroll records for every employee.

Image To deduct and report to the appropriate government agency all payroll taxes that apply to every employee.

Image Materials

Image To maintain an accurate accounting of all material and its location in all storage locations.

Image To maintain an accurate accounting of all materials issued.

Image To ensure that material constantly flows to the manufacturing floor with minimal stock-outs, and with minimal inventory on hand.

Image Building Security

Image To provide immediate first aid to stricken employees until proper medical assistance arrives.

Image To maintain the integrity of the building security cordon at all times, even in the face of disaster.

Image To detect and notify appropriate authorities of any emergencies observed by security personnel.

Image To monitor all personnel on the premises after normal business hours and during weekends and holidays.

WHAT TO DO?

Make a list of critical processes for your department.

Take a copy of the Critical Process Impact list and pull off those processes unique to each department. Now expand it to include the critical processes in your department. Not all critical processes involve computers.

Break down the newly added critical processes into their components.

Key Operating Equipment

After identifying your department’s critical functions, make a list of your processes and equipment. This list will drive your department’s recovery plan. A process would be something like “Materials Management.” That process requires (within the department) access to the materials database, materials receiving docks, order processing, etc.

Is there a piece of equipment in your department whose absence would hinder your ability to perform your critical tasks? Is there an important printer directly tied to a far-off office or company? Is your only fax machine busy all the time? Does your payroll department have a dedicated time clock data collection and reporting system whose absence might prevent accurate recording?

Make a list of all your critical equipment. Be sure to include unique items not readily borrowed from a nearby department.

Lack of Data Systems

Begin with a list of all the data systems you use in your department. Add a column of who uses each system and for what function (some people may perform updates, some people may only write reports from it). You will find this list very useful later.

Most data systems have a manual process for recording data or working around the system when it is not available. But set that aside and examine the risk that each system on your list might not be available. Here the team’s collective experience can indicate how often each system tends to be unavailable.
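
If your team prefers to keep the data systems list electronically, a few lines of Python can hold it and answer the question "who is affected if this system is down?" This is only a sketch; the system names, job titles, and the affected_by helper are invented for illustration and are not part of the forms on the CD-ROM.

```python
from collections import defaultdict

# Each row: (data system, who uses it, for what function).
# All names below are made up for illustration.
usage = [
    ("Materials database", "Receiving clerk", "updates"),
    ("Materials database", "Planner", "reports"),
    ("Time clock system", "Payroll clerk", "updates"),
]

# Group the rows by data system.
by_system = defaultdict(list)
for system, user, function in usage:
    by_system[system].append((user, function))


def affected_by(system):
    """Return the (user, function) pairs that depend on a system."""
    return by_system.get(system, [])


for user, function in affected_by("Materials database"):
    print(f"{user}: {function}")
```

A spreadsheet with the same three columns does the job just as well. The point is that each system is tied to the people and functions that rely on it.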

Vital Records

What are the vital records originated, used, or stored by your department? List each category of records and where they are stored. Identify the risk (or damage) to the company if these records were lost or destroyed. Vital records are paper or electronic documents retained to meet business, regulatory, legal, or government requirements.

What other Layer 4 risks can you and your team identify? Add them to Form 3-6, Risk Assessment Form Layer 4, on the CD-ROM.

LAYER 5: YOUR DESK’S RISKS

This means more than avoiding paper cuts. You must examine every process (manual and automated), tool, piece of incoming information, and required output that makes up your job. Since you are so familiar with your daily work, this will be faster than you think. You are also familiar with your office priorities and can focus on the most critical functions.

Performing a Layer 5 risk analysis may seem like overkill, but it closely resembles what was done at the department level. It ensures that everything you need to do your job is accounted for in some manner, including items that may appear in your department’s disaster recovery plan as nice to have but not essential. And if you want to go on vacation sometime, this documentation will make slipping out of the office a bit easier.

Layer 5 risks are a bit different because this layer really includes all of the risks from Layers 1 through 4. You should be able to start figuring out your critical functions from your job description. Next, add in what you actually do, and you will have your critical functions list.

Make a list of the tools and data systems that you use every day. All of these should be in the departmental risk assessment. What is the likelihood that one of these tools will be missing when you need it? At this layer, assume that the tool is missing only from your desk and that everyone else in the department can do their job. Therefore, if your job is the same as that of the person next to you, the risk that you could not complete your work is quite low, since you could borrow the necessary equipment.

If you had confidential files on your PC and it crashed, that would be a risk. If you had a unique device that you used for your job, such as a specialized PC for credit card authorizations, then that is also a unique risk (but it is probably in your departmental plan if it impacts one of the department’s critical functions).

Another area to consider is vital records. Do you build or store vital records on or around your desk? Could there be a localized fire, water pipe breakage, etc., in your area that would soak these papers? This could be backed-up personal computer files, engineering specifications of old parts, employee evaluations, etc.

What other Layer 5 risks can you or your team identify? Add them to Form 3-7, Risk Assessment Form Layer 5, on the CD-ROM.

WHAT TO DO?

Make a list of critical processes for your job.

Take a copy of your department’s Critical Process Impact list and pull off those processes unique to your job. Now expand it to include all the critical processes for your position. Not all critical processes involve computers.

Break down the newly added critical processes into their components.

SEVERITY OF A RISK

As you consider such things as fire, you quickly notice that, except in the total loss of the structure, the severity depends on where and when the fire occurs, including the time of day and the day of the week.

Time of Day

Imagine a large factory. It’s 7:00 AM and the assembly line has begun moving. Off to one side of the assembly line is a 300-gallon “tote” of paint, waiting for a forklift to carry it to another part of the facility. When the forklift approaches, the operator is distracted and hits the tote at a high rate of speed, puncturing it near the bottom with both of his forks. The punctured tote begins spewing hundreds of gallons of potentially toxic paint across the floor, into the assembly line area, etc. Of course, the assembly operation is shut down while a long and thorough cleanup process begins.

If this same forklift and the same operator were to hit the same tote after normal working hours, we would have the same mess and the same cleanup expense, but we could possibly have avoided shutting down the assembly line. With hard work, the assembly line could be ready for use by the next day. Therefore, the time of day that a disaster event occurs can have a major impact on its severity.

Day of Week

Along the same lines as the time of day, the day of the week (or for that matter, the day of the year) also determines the severity of a problem. If this same factory were working at its peak level with many temporary workers in an effort to deliver toys to stores in time for the Christmas season, this situation would be much worse than if it occurred during their low-demand season. If it happened on a Saturday instead of on a Monday, the severity would also be less as you have the remainder of the weekend to address it.

Location of the Risk

In terms of where this theoretical toxic material spill occurred, you can also quickly see that its location, near the assembly line, had an impact on how damaging it was. Some risks, like paint containers, float around a manufacturing facility. In an office, a similar situation exists. A small fire in an outside trash dumpster might singe the building and be promptly extinguished. The damage would be annoying, but your office productivity would not miss a beat.

The same small fire in your vital records storage room would be a disaster. Water used to put out the fire would cause papers to stick together and cartons to weaken and collapse, and a general smoky smell would linger for a long time. There is also a potential long-term problem with mold damaging the records.

SOURCES OF RISK ASSESSMENT INFORMATION

The Federal Emergency Management Agency (formerly known as Civil Defense) can provide you with a wealth of local information about your Layer 1 risks. It has already mapped the approved hazardous materials routes and knows the likelihood of local natural disasters. FEMA is listed in your telephone directory and can also be found at http://www.fema.gov. Figure 3-6 shows a sample of the type of maps available from the government that show the likelihood of various hazards; this map shows the probability of an earthquake occurring.

Local fire and police departments are also likely sources for information on anticipated arrival times for help. If you have a volunteer fire department, you will want to know its average response time for your area and what you might expect for timely ambulance support. The longer the delay in responding, the more mitigation steps your company should plan for. Some volunteer departments staff a few full-time members to provide an immediate response, and the rest of the volunteers join them at the accident site.

The local law enforcement authorities can also provide insight into crime activity patterns for determining your risk of theft or civil disorder.

Image

FIGURE 3-6: U.S. Geological Survey National Seismic Hazard Mapping Project.

MAKING THE ASSESSMENT

Wow! Now that we see that risks are all around us, that they vary in time, magnitude, and business impact, let’s make some sense of all of this. This is a good time to bring your Disaster Planning Project team together. The more “institutional knowledge” you can tap for this list, the better tool it becomes.

Scoring

OK, the risk analysis sheets have been filled out and the scores calculated. Now it is time to identify the more likely risks and build plans for them.

Scoring the list involves your judgment of several factors. First, how likely is it that this will occur? If you think about it, given an infinite amount of time, you could predict that just about everything will occur at least once. So for this scoring exercise, let’s use a 5-year horizon. Of course, you can use any timeframe you wish. Just be consistent.

We will use the electrical power outage as an example as we examine the column headings:

Image Grouping. These are the overall categories provided to keep similar issues together.

Image Risk. This is where you list the various risks to your business.

Image Likelihood. 0 through 10, with 0 being no likelihood at all, 1 to 3 if there is little chance of this type of disaster occurring, 4 to 6 if there is a nominal chance of occurrence, 7 to 9 if the disaster is very likely to occur, and 10 if it is a sure thing that the disaster will occur. Remember your planning horizon. If it is 5 years, be sure to keep that in the forefront of everyone’s mind. So over the next 5 years, what is the likelihood that the facility will lose electrical power at any time of the day, or any day of the week?

Image Impact. 0 through 10, with 0 being no impact at all, 1 to 3 if there is an inconvenience to some people or departments, 4 to 6 if there is a significant loss of service to some people or departments, 7 to 9 if there is a loss of a mission critical service, and 10 as a death sentence for the company. How badly would this disaster hurt us? To judge this, consider the problem occurring at the busiest time of the day, on the busiest day of the year.

Image Cost of Mitigation. 1 through 10, with 10 if there is little to no cost to mitigate the risk, 7 to 9 if the cost to mitigate can be approved by a supervisor, 4 to 6 if the cost to mitigate requires a department head to approve, and 1 to 3 if senior management approval is required to cover the cost of mitigation. This scale runs the opposite of the other two columns, as we assign high values to risks that are easier to mitigate. Carrying forward the electrical service example, what would it cost to mitigate the risk of losing power (which would probably require the installation of a standby generator)?
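
If you keep your scores somewhere other than the spreadsheet, it helps to reject values that fall outside the three scales described above. The Python below is a minimal sketch of one way to do that. The RiskEntry class and its field names are hypothetical, and the scores shown for the power outage are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of a risk assessment sheet (hypothetical field names)."""
    grouping: str
    risk: str
    likelihood: int          # 0-10, higher = more likely within the horizon
    impact: int              # 0-10, higher = more damaging
    cost_of_mitigation: int  # 1-10, higher = cheaper or easier to mitigate

    def __post_init__(self):
        # Enforce the ranges given in the column headings.
        if not 0 <= self.likelihood <= 10:
            raise ValueError("likelihood must be 0 through 10")
        if not 0 <= self.impact <= 10:
            raise ValueError("impact must be 0 through 10")
        if not 1 <= self.cost_of_mitigation <= 10:
            raise ValueError("cost_of_mitigation must be 1 through 10")


# Illustrative scores only; your team supplies the real ones.
power_outage = RiskEntry("Utilities", "Electrical power outage", 7, 8, 2)
```

Note that the cost column starts at 1 rather than 0, so an expensive fix lowers a risk’s ranking without erasing it.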

Sorting

The spreadsheet multiplies the Likelihood times the Impact times the Cost of Mitigation to get a rough risk analysis score. As you can see, a zero value in the Likelihood or Impact columns makes the risk score a zero.

You should sort the spreadsheet on the “score” column in descending order. This will bring your biggest risks to the top. These will be the risks that are the most likely, have the biggest impact on your operations, and are the easiest to mitigate. As you start your disaster recovery and mitigation plans, these risks deserve the most attention.
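
For readers who would rather script this than use the spreadsheet, the multiply-and-sort step can be sketched in a few lines of Python. The risk names and scores below are invented for illustration, and risk_score is a hypothetical helper name.

```python
def risk_score(likelihood, impact, cost_of_mitigation):
    """Rough risk analysis score: the product of the three columns."""
    return likelihood * impact * cost_of_mitigation


# (risk, likelihood, impact, cost of mitigation); illustrative values only.
risks = [
    ("Electrical power outage", 7, 8, 2),
    ("Telephone service loss", 5, 6, 8),
    ("Sun quits shining", 0, 10, 1),
]

# Compute a score for every risk, then sort in descending order.
scored = [(name, risk_score(l, i, c)) for name, l, i, c in risks]
scored.sort(key=lambda row: row[1], reverse=True)

for name, score in scored:
    print(f"{score:4d}  {name}")
```

In this made-up example the telephone risk outranks the power outage even though the outage is more likely and more damaging, because the outage is far more expensive to mitigate. The zero-likelihood risk drops to the bottom with a score of zero.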

Setting Aside the Low Scores

It is true that there is a risk that the sun may quit shining within the next 5 years, but it is very low. So along with the risk of being run over by an iceberg, we will discard any of the extremely low likelihood risks. We will be fully occupied addressing the more likely ones.

Pick a point on each list and draw a line across it. All critical systems above the line will have plans written for them, and plans for those below the line will come at some later time.
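
Drawing the line is a judgment call, but once you have picked the position, splitting the list is mechanical. Here is a small Python sketch, assuming the list is already sorted by score. The draw_line function, the risk names, and the scores are all hypothetical.

```python
# Scores already sorted in descending order (illustrative values only).
sorted_risks = [
    ("Telephone service loss", 240),
    ("Electrical power outage", 112),
    ("Small office fire", 60),
    ("Sun quits shining", 0),
]


def draw_line(risks, position):
    """Split a sorted risk list at the chosen position.

    Entries above the line get plans now; the rest are deferred.
    """
    return risks[:position], risks[position:]


plan_now, plan_later = draw_line(sorted_risks, 2)
print("Plan now:  ", [name for name, _ in plan_now])
print("Plan later:", [name for name, _ in plan_later])
```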

CONCLUSION

Your assessment of the risks faced by your operation is a critical piece of the business continuity puzzle. The steps in identifying the major risks to your operation as discussed in this chapter are:

1. First, determine the cost of downtime. This is critical when evaluating the potential avoidance and mitigation options.

2. Identify the potential risks at each of the five layers. Use a 5-year time horizon to keep things manageable.

3. For each risk, determine the impact based on the time of day, the day of the week, and the location where the disaster occurred. Each of these factors has an impact on the severity of the risk.

4. Identify and use outside sources of risk information, such as emergency response operations at the local and state level.

5. Prioritize the risks based on the severity of the possible damage, the probability of the risk occurring, and the difficulty of available avoidance and mitigation options. You’ll want to start with the risks that do the most damage, are the most likely, and are the easiest to avoid or mitigate.

Now that you’ve identified the risks that can affect your business, you are much better prepared to recover from any disaster. The steps required to identify risks are time consuming but are critical in building a foundation for your business continuity plans.
