CHAPTER 6
WRITING THE PLAN
Getting It Down on Paper

No one plans to fail; they just simply fail to plan.
Disaster Recovery Journal

INTRODUCTION

Writing a plan is not difficult. It is as simple as telling a story to someone: the story of what to do. It addresses the who, what, where, when, why, and how of a process. Although you cannot predict exactly what will happen or where, you can, upon reflection, identify the basic steps that must be done in any emergency.

Throughout your plan writing process, keep in mind that emergencies affect people in different ways. Some will panic, others will sit and wait for the expert (but many are really waiting for someone else to take responsibility for any recovery errors), and some will make excuses and leave. The goal of your plan is to minimize this chaos by providing some direction to the people onsite so they can get started on containment and recovery. Once team members are in motion, the chaos lessens and their professional training will kick in.

It is impossible to write a specific recovery plan for every possible situation. Instead, the plans provide a set of guidelines to reduce the chaos at the point of incident and to position the company for a recovery once adequate facts become available. Whether you are rebuilding a data center due to a fire or due to a roof collapse, it is the same set of steps.

Business continuity plans come in many forms according to local requirements and the preferences of the person writing them. The CD contains four separate plans. Each plan is executed by a different team, based on the circumstances of the incident. They are:

• Administrative Plan. Contains reference information common to all plans, such as vendor call lists, recovery strategy, risk assessment, etc.

• Technical Recovery Plans. Many independent plans that contain the step-by-step actions to recover a specific process or IT system from scratch. These plans assume that the process must be rebuilt from nothing. These plans are often referred to when addressing local emergencies.

• Work Area Recovery Plan. The details for relocating the critical company office workers to another site.

• Pandemic Management Plan. Actions the company will take to minimize the impact of a pandemic. Unlike a data center disaster whose recovery can be completed in a few hours or days, a pandemic can easily run for 18 months or more.

The essential elements of a business continuity plan are that it is:

• Flexible to accommodate a variety of challenges.

• Understandable to whoever may read it (assuming they know the technology).

• Testable to ensure that it completely addresses interfaces to other processes (in and out).

Writing your plan is simply documenting before the fact what should be done when a disaster strikes. The basic steps to follow are:

1. Lay the Groundwork. Here the basic decisions are made about who will execute the plan, what processes need a plan, the format of the plan, etc.

2. Develop Departmental Plans. Departments are the basic structure around which organizations are built; they are a good place to start developing your plans.

3. Combine your Departmental Plans into an Overall Corporate Plan. Here you check to ensure that departmental recovery activities do not conflict with one another and that any interdependencies are considered.

LAY THE GROUNDWORK

Your first step in developing continuity plans is to establish a standard format. This will give at least the first few pages of each plan the same “look and feel.” When drafting your plan, consider the following:

• Who will execute it? If you are the local expert on that process, then why do you need a plan? The odds are you don’t. In a crisis, you would know what to do. But if you like to take days off, occasionally get sick, or even take a vacation, then whoever is on the spot when the emergency occurs must be able to stand in for you and address the problem. A plan must consider who may be called on in an emergency if the expert is not available. Another consideration is that if you are the manager over an area, and you want to be able to recover a process in case the “expert” is promoted, transferred, quits, or is discharged, a written plan is essential. You should especially look for highly stable processes that never break and that no one has experience working on. They must have a plan on file since whoever worked on them may have already left the company.

• How obvious is the problem? Some problems, like magnetic damage to backup tapes, are invisible until you try to read them. Other problems, such as the entire building shaking in a massive earthquake, are easier to recognize. If a problem is hard to detect, then step-by-step troubleshooting instructions are necessary.

• How much warning will there be? Is a severe thunderstorm in your area often a prelude to a power outage? Will the weather forecast indicate a blizzard is imminent? However, if a local building contractor cuts your connection to the telephone company’s central office, there is no warning of an impending problem at all. Emergencies that provide a warning, such as a weather bulletin, often trigger automatic containment actions. These might be to purchase extra flashlight batteries, install sandbags, or have essential technical personnel pitch camp within the building in case they are needed.

• How long must they continue running with this plan before help arrives? Should they have enough information to contain the problem for 10 minutes or 2 hours?

• How soon must the process be restored before the company suffers serious damage? This is called the recovery time objective for this process.

• Are there any manual workaround actions that can be used until the process is restored? For example, if your payroll computer system dies at the very worst moment, can you write 40-hour paychecks for everyone? This makes a mess for the Accounting department to clean up later, but in an organized labor facility, the workers’ contract may allow them to walk off the job if their paychecks are late.

What Needs Its Own Plan?

Is the answer anything that could break? Some processes are like links in a chain, where the failure of any single item brings the entire process to a stop, such as a data network. In this case, any number of items along a chain of equipment could be at fault. The plan would step you through the basic fault-location steps and tell you what to do to address the problem you find. Some problems are isolated to one or a few devices, such as a Web server failure. In this case, you would focus all efforts on the server and its connections to the network.

You must have a plan for every critical business function identified by the Business Impact Analysis (BIA). This includes manual processes and every piece of critical equipment that supports the facility. For each critical business function, explain the steps necessary to restore the minimal acceptable level of service. This level of service might be achieved by performing machine functions with manual labor. It might be achieved by shifting the work to another company site or even paying a competitor to machine parts for you. The goal is to keep your company going. Optional plans may be written to support those functions essential to your own department (and peace of mind), but that are not essential to the facility’s critical business functions.

Consider how a plan will be used when you write it. Your goal is not a single large soup-to-nuts document. Usually a department has an overall plan for recovering its main processes or machinery and then specific action plans for individual problems. For example, Vital Records may have detailed plans for recovering documents based on the media on which they are stored. This information should be readily available to the department. But some specific actions should be kept on laminated cards and provided to the security guards (for after-hours emergency action) and posted on the walls of the rooms they affect. Examples of these laminated pages might be immediate actions to take for a water leak, for an electrical outage, for a fire, etc. See Form 6-1, Sample Business Continuity Action Plan (on the CD), for an example.

Another example would be an electrical outage in the computer room. The overall plan will contain information on calling the power company, whom to call for emergency generators, etc. But a notice on the wall of the computer room will provide specific power shedding instructions and indicate immediate steps to take to monitor and potentially reduce the load on the UPS.

Word Processing Guidelines

Your company may have some specific guidelines in place for important documents like this. If not, consider these guidelines for the plans.

PAGE LAYOUT

• Set your word processor to default to 12 point, Arial font (don’t make me search for a pair of glasses in the midst of a crisis!).

• Set the page footers to include a page number in the center and the current date in the lower left-hand corner. This date will help to indicate which copy of the plan is the latest. The footers should also include the phrase “Company Confidential” on every page.

• Each document should read from major topic to minor topic, or from broad view to narrow view. The beginning of the document deals with actions that would affect the entire process and, as you move further into the document, more specific issues are addressed.

DOCUMENT FORMAT

• On the first page, include a brief narrative (one paragraph) of the business function of the equipment that this particular plan supports.

• The name of the primary support person.

• The name of the secondary support person.

• The name of the primary customer for this process (Accounting, Manufacturing, Sales, etc.). It is better that you tell them what is wrong than they find out there is a problem the hard way.

• Immediate action steps to contain the problem.

• Known manual workaround steps to maintain minimal service.

• In the case of telecommunications, data networks, or data processing services outages, include the names of other technical employees in sister companies with expertise in this area who can be called onsite in a crisis.
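If your plans are also tracked electronically, the first-page items above can be captured as a simple record so that an incomplete plan is easy to spot. The following is only a minimal sketch in Python; the class name, field names, and the choice of which items count as required are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class PlanFirstPage:
    """First-page summary of a recovery plan (all field names are illustrative)."""
    business_function: str   # one-paragraph narrative of what the plan supports
    primary_support: str     # name of the primary support person
    secondary_support: str   # name of the secondary support person
    primary_customer: str    # e.g. Accounting, Manufacturing, Sales
    immediate_actions: list[str] = field(default_factory=list)
    manual_workarounds: list[str] = field(default_factory=list)
    outside_experts: list[str] = field(default_factory=list)  # sister-company contacts

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still empty."""
        # Assumption: these five items are treated as mandatory for every plan.
        required = (
            "business_function",
            "primary_support",
            "secondary_support",
            "primary_customer",
            "immediate_actions",
        )
        return [name for name in required if not getattr(self, name)]
```

A plan administrator could call `missing_items()` on each submitted plan and return any plan whose result is not empty to its author.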

DEPARTMENTAL PLANS

A departmental recovery plan has several components. The main component is the plan itself, a narrative that explains the assets involved, the threats being addressed, the mitigation steps taken, and what to do in the event of a disaster. This sounds simple enough, but such a plan could easily fill notebooks. Instead, base your plans on a primary scenario with specific threats addressed in attached appendices. In addition, more abbreviated instructions for security guards, computer operators, etc., should be included as part of the departmental recovery plan.

The main part of the plan has three major components:

1. Immediate Actions. Steps that anyone can take to contain the damage (similar to applying first aid to an injured person). This involves simple tasks, such as shutting off the water main to stop a leak, evacuating people if there is a toxic spill, or opening the computer room doors if the air conditioning fails. Once people are safe, an early action in “Immediate Action” is to alert the appropriate people for help. It takes time for them to drive to the disaster, so the earlier you call, the sooner they will arrive.

2. Detailed Containment Actions. Steps to reduce the spread or depth of damage. What else can be done until the “experts” get there? What actions should the “experts” take after they arrive to stop the damage from spreading?

3. Recovery Actions. Steps to return the process to a minimal level of service. This is the part that most people think about when considering disaster recovery planning.

There are four inputs into building your plan. First, begin with the Critical Process Impact Matrix you developed in your BIA (Form 3-3). This lists the critical processes and the time of day that they are essential. This list was further broken down in the Critical Process Breakdown matrix (Form 3-4). These two tools can provide the essential information for building your plans. Add to these lists your risk assessment and your process restoration priority list. With these items, you have everything necessary to write your plans. Write your primary plan for the worst case scenario—complete replacement of the process.

In many cases, the damage is caused by multiple threats, but their associated recovery steps are the same. Therefore, a plan that details what to do in one disaster situation is probably applicable to most other situations. For example, the loss of a critical computer server due to a fire, physical sabotage, or a broken water pipe would have essentially the same recovery steps. Separate plans are not necessary, although the mitigation steps for each threat in the example would be quite different.

Begin by drafting your plan to address this central situation. Add to the central plan an appendix for any other specific threats or recovery actions you think are appropriate. All together, this is your department’s (or critical processes’) disaster recovery plan and should be available in your office, with a printed copy at your home and your assistant’s home. In addition, the plan administrator must maintain both a printed copy and an electronic copy. (Recovery plans contain information useful to people with bad intentions, so keep them in a secure location.)

Looking at your department’s main plan, you still have a document that is too unwieldy to use in the first few moments of the crisis. Remember, emergencies are characterized by chaos. Some people are prone to act, and others are prone to run in circles. You need to have something quick and easy to follow in the hands of those who will act. These terse instructions must detail basic disaster steps to safeguard people and to contain the damage. They are usually laminated and posted on the wall. Include them as an appendix to your plan identified with their own tab.

As you write your plan, consider the following:

• Who Will Execute This Plan? A minimum of three people must be able to execute a plan: the primary support person, the backup support person, and the supervisor. Usually, the weak link is the supervisor. If that person cannot understand the plan, then it is not sufficiently detailed or it lacks clarity.

Most facilities operate during extended first-shift hours, from Monday through Friday. However, if this plan is for a major grocery store, it might be open 24 hours a day, 7 days a week. Problems occur in their own good time. If they occur during normal working hours, and your key people are already onsite, then the emergency plan is to summon these key people to resolve the problem. Referring to the written plan will also speed recovery, since time is not wasted identifying initial actions.

However, if the problem arises at 3:00 AM on a Sunday and is discovered by the security guard, he needs to know the few essential containment actions to take until help arrives. Because this is the worst case scenario—someone unfamiliar with an area tasked to contain a problem—this is the level of detail to which you must write. One of the first action steps is always to notify the appropriate person of the problem. This gets help in motion. Then, the person on the spot works on containment until that help arrives.

This approach works well with crises that are common knowledge or are basically understood by the general population, such as the sounding of fire alarms, burst water pipes, or power outages. But for some of the technical areas, such as data processing, writing such a level of detail would make a volume of instructions so thick that the computer room would have long since burned down while the containment team struggled through the text. In those cases, the level of detail should be sufficient for someone familiar with the technology, but unfamiliar with this particular piece of equipment, to work through the steps. In addition, specific containment actions should be posted on the wall so that the vital first few minutes are not wasted looking for a misfiled disaster plan book.

• How Obvious Is the Problem? Standing in an office with water lapping over your shoe tops is a sure sign of a problem. Smoke pouring out of a room is likewise a sign that immediate action is needed. When drafting your plan, consider how obvious the problem might be to the typical person. Obvious problems are usually of the on/off type, such as electrical service, air conditioning, or any situation where a machine either works or it doesn’t.

Problems that are difficult to pinpoint require step-by-step troubleshooting instructions. In these cases, something stops functioning, but the cause isn’t obvious. In these instances, the call for help goes out first, but if there is anything that the person on the spot can do, then he or she should have detailed instructions on how to do it. For example, if a critical piece of shop floor machinery stops working, yet everything else in the factory is working fine, your immediate action troubleshooting steps would include tracing the data communications line back to the controller and back to the computer room to look for a break in the line. The plan should identify all the system interdependencies so they can be checked.

• How Much Warning Will They Have Before the Problem Erupts? Most weather-related problems are forecast by local news services. Flood warnings, severe thunderstorm warnings, and tornado watches are all forewarnings of problems. If your facility is susceptible to problems from these causes, then you can prepare for the problem before it strikes. However, the first indication of many problems does not appear until the problem hits, such as a vital machine that stops working or the loss of electrical power.

• How Long Must They Continue Running with This Plan Before Help Arrives? Begin with immediate action steps, much like first aid. There are always some basic actions that can be taken to contain the damage and prepare for the recovery once the “experts” appear. Detail these in your plan.

Some plans have a short duration. For example, in the case of a computer room power outage, only so much electrical power is available in the UPS before the batteries run dry. By turning off nonessential equipment, this battery time can be extended in the hopes that power will be restored soon. This assumes the person standing in the computer room knows which equipment is not essential or has a way to identify these devices. In this case, the time horizon for the containment plan is the maximum time that battery power remains available, or until the computer operations manager arrives to begin shutting down noncritical servers.
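To put a rough number on that time horizon, the arithmetic can be sketched as follows. This is an illustration only: the equipment names, wattages, and battery capacity are hypothetical, and the calculation assumes runtime scales linearly with load, which real batteries only approximate. Use the figures published by your UPS vendor for actual planning.

```python
def ups_runtime_minutes(usable_battery_wh: float, load_watts: float) -> float:
    """Rough runtime estimate: usable stored energy divided by the load.

    Assumption: a linear relationship between load and runtime. Treat the
    result as a planning figure, not a guarantee.
    """
    if load_watts <= 0:
        raise ValueError("load must be positive")
    return usable_battery_wh / load_watts * 60


# Hypothetical equipment list: (name, watts drawn, essential?)
equipment = [
    ("database server", 600, True),
    ("e-mail server", 400, True),
    ("test server", 500, False),
    ("print server", 500, False),
]

BATTERY_WH = 1000  # hypothetical usable battery capacity in watt-hours

full_load = sum(watts for _, watts, _ in equipment)
essential_load = sum(watts for _, watts, essential in equipment if essential)

# Devices the person in the computer room may switch off to extend battery time.
shed_list = [name for name, _, essential in equipment if not essential]

print("Runtime at full load (min):", ups_runtime_minutes(BATTERY_WH, full_load))
print("Runtime after shedding (min):", ups_runtime_minutes(BATTERY_WH, essential_load))
print("Switch off:", ", ".join(shed_list))
```

The useful by-product is the shed list itself: it is exactly the information that belongs on the notice posted on the computer room wall.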

A different example is the case of a broken water pipe. Shutting the water main to that portion of the building is the immediate action to stop the damage, at which time you switch over to containment efforts to prevent the water from spreading and mold from growing. Your immediate action steps would list the facility maintenance emergency telephone number or tell the person the location of the water shutoff valve.

In any case, if people in the affected room or adjacent rooms are in danger, the first step is always to notify and evacuate them. Safeguarding human life is always the number one immediate action step!

• Manual Workaround. Most automated processes have a manual workaround plan. Unfortunately, this plan is rarely written down. If you know that one exists, put it on paper immediately. If you don’t know about this, ask the process owner. Manual workaround processes may not have the same quality, they may require many more workers, and they may require substantial overtime work just to keep up, but they may quickly restore your process to a minimal level of operation (the least that a disaster plan should provide). Manual workarounds may allow you to go directly to the recovery phase with minimal containment actions.

Some manual workaround processes for computer systems will require a data resynchronization action when the computer system returns to service. In those cases, work logs must be maintained of the items processed manually so that the data files can return to accuracy.
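As an illustration of why the work log matters, the resynchronization step can be sketched as replaying the log against the restored data. This is a minimal sketch under stated assumptions: the record layout (an item identifier and a quantity change) and the function name are hypothetical, and a real system would need its own rules for validation and conflict handling.

```python
from dataclasses import dataclass


@dataclass
class ManualEntry:
    """One item processed by hand during the outage (fields are illustrative)."""
    item_id: str
    quantity_change: int
    note: str = ""


def resynchronize(balances: dict[str, int], work_log: list[ManualEntry]) -> dict[str, int]:
    """Replay the manual work log, in order, against a copy of the system data."""
    updated = dict(balances)  # leave the restored data untouched until verified
    for entry in work_log:
        updated[entry.item_id] = updated.get(entry.item_id, 0) + entry.quantity_change
    return updated
```

Working on a copy means the replayed result can be reviewed against the paper log before it replaces the restored files.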

I Still Don’t Know What to Write!

Write your plan in the same way as if you were explaining it to someone standing in front of you. Overall, you start with the overview and then drill down to the details. For example, if you were writing a plan to recover the e-mail server, you would state what the system does, its major components, and any information about them. Then you would have a section explaining each major component in detail.

Imagine that you are standing in a room when an emergency occurs. Also imagine several other people in the room who work for you and will follow your directions. Now imagine that you can speak, but cannot move or point. What would you tell them to do? Where are your emergency containment materials? Whom should they call, and what should they say? Write your plan in the same conversational tone that you use when telling someone what to do.

Include pictures and drawings in your plan (for example, floor plans showing the location of critical devices in a building). Digital cameras can be used to create pictures that can be easily imported into a word-processing program.

It is also very important to include references to the names of the service companies that have support contracts for your equipment. In the back of the notebook, include a copy of the vendor contact list, so they know whom to call with what information (such as the contract identification number).

So, the plan for your department will include:

1. Immediate Actions.

a. Whom to call right away.

b. Appendices: specific threats.

• Loss of electricity.

• Loss of telephone.

• Loss of heating, air conditioning, and humidity control.

• Severe weather and low employee attendance (perhaps due to a blizzard or flood; how can you maintain minimal production?).

2. Detailed Containment Actions.

a. What to do to reduce further damage.

b. First things the recovery team does once onsite.

3. Recovery Actions.

a. Basic actions.

b. Critical functions.

c. Restoration priorities.

4. Foundation Documents.

a. Asset List.

b. Risk Assessment.

c. Critical Process Impact Matrix.

d. Critical Process Breakdown Matrix.

5. Employee Recall List.

6. Vendor List.

7. Manual Workaround Processes.

8. Relocating Operations.

How Do I Know When to Stop Writing?

Your primary plan only needs to contain enough explanation for someone to restore service to minimal acceptable levels. Once you have established that, your normal approach for handling projects can kick in. Some plans only cover the first 48 hours. As an alternative to setting a time guideline, link it to the function the plan is intended to protect and then it takes however long it takes.

Provide as much detail as necessary to explain to someone what they need to do. For the Immediate Action pages, assume they are unfamiliar with the details of the function and keep your instructions simple and to the point. For your primary plan, assume they are familiar with the function and understand basically how it works.

To be useful, your plan must be clear to others and include all pertinent details. The best way to know if your plan is sufficient is to ask someone to read it. Hand it to someone and then leave the room. See if they can understand and would be able to act on it. What is clear as day to you may be clear as mud to someone else. Then test it again without the involvement of your key staff members.

RECOVERY PLANNING CONSIDERATIONS

Prompt recovery is important to a company. It is also important to you because if the company has a hard time recovering, the owners may simply close your office and absorb the loss. For the sake of yourself and your fellow employees, include recovery considerations in your plan.

1. Planning. Each of these steps can provide valuable information for your plan development.

• Before an emergency arises, contact disaster recovery organizations that support your type of department. For example, if you are in charge of the company’s Vital Records department, you might meet with and negotiate an on-demand contract for document preservation and recovery. Then you would know whom to contact and what to expect from them. They might offer some free advice for inclusion in your plan.

• Every department must have a plan for relocating its operations within the facility. A classic example of this is an office fire where the rest of the facility is intact. Your offices would be moved into another part of the facility until the damage is repaired, but the company’s business can continue.

• Meet with your insurance carriers to discuss their requirements for damage documentation, their response time, and any limitations on your policies. This is a good time to review the company’s business disruption insurance policy to see what it does and does not cover. Different parts of the facility may have different insurance specific to their type of work.

• Meet with vendors of your key equipment to understand how they can help in an emergency. Some equipment suppliers will, in the case of a serious emergency, provide you with the next device off their assembly line. (Of course, you must pay full retail price and take it however it is configured.) If this is something you wish to take advantage of, then you must clearly understand any preconditions.

• Meet with the local fire, police, and ambulance services. Determine what sort of response time you should expect in an emergency from each. Identify any specific information they want to know from you in an emergency. Understanding how long it will take for the civil authorities to arrive may indicate how long the containment effort must allow for, such as for a fire or for first aid in a medical emergency.

• Consider shifting business functions, such as specific data processing systems, the sales call center, and customer billing, to other sites in case of an emergency. The effort is not trivial and may require considerable expense in travel and accommodations, but again, the goal is to promptly restore service.

2. Continuity of Leadership. When time is short, there is no time for introductions and turf battles. Plan for the worst case and hope for the best. Assume that many key people will not be available in the early hours of an emergency.

• Ensure that your employees know who their managers are, and who their managers’ managers are. A good way to approach this is to schedule luncheons with the staff and these managers to discuss portions of the plan.

• If you plan to use employees from a different company site in your recovery operations, bring them around to tour the site and meet with the people. Although an introduction is a good start, the longer the visit, the better the visual recognition later during an emergency.

• When exercising your plan, include scenarios where key people are not available.

3. Insurance. Cash to get back on your feet again. Evaluating your current insurance and selecting additional coverage should involve insurance professionals to sift through the details. In light of that, consider:

• What sort of documentation does the insurance company require to pay a claim? Does it need copies of receipts for major equipment? If I show a burned-out lump of metal, will the insurer believe me that it used to be an expensive server?

• If the structure is damaged, will the insurer pay to repair the damage? What about any additional expense (beyond the damage repair) required for mandatory structural upgrades to meet new building codes?

• In the event of a loss, exactly what do my policies require me to do?

• What do my policies cover? How does this compare to my risk assessment?

• Am I covered if my facility is closed by order of civil authority?

• If attacked by terrorists, does the company still have a claim or is that excluded under an “acts-of-war” clause?

• Can I begin salvage operations before an adjuster arrives? How long will it take them to get here? What about a wide-area emergency? How long must I wait for an adjuster then?

4. Recovery Operations.

• Establish and maintain security at the site at all times. Prevent looting and stop people from reentering the structure before it is declared to be safe.

• During recovery operations, keep detailed records of decisions, expenses, damage, areas of destruction, and where damaged materials were sent. Use video and still cameras to photograph major damage areas from multiple angles.

• Plan for a separate damage containment team and a disaster recovery team. The containment team focuses on limiting the damage and is very much “today” focused. The recovery team starts from the present and focuses on restarting operations. Its goal is to restore the minimal acceptable level of service.

• Keep employees informed about your recovery operations. They have a lot at stake in a recovery (their continuing employment) and are your staunchest allies.

• Protect undamaged materials from such things as water, smoke, or the weather by closing up building openings.

• Keep damaged materials onsite until the insurance adjuster releases them.

PREPARE A DOCUMENT REPOSITORY

A business continuity program generates a lot of documents. Recovery plans, Business Impact Analyses, risk assessments, and test results are just a few examples of the many things that must be kept handy. Further, many people contribute to and maintain these documents. A central place is necessary to store everything so that it can be found when needed. There are several popular options:

• Establish a file share with subdirectories to separate the technical plans from the public areas. This is inexpensive and access permissions are controlled by the Business Continuity Manager.

• Use a document management product, such as Microsoft’s SharePoint™. This also tracks who has which document checked out for updates.

• Another alternative is to purchase a purpose-built product such as Strohl’s™ LDRPS (Living Disaster Recovery Planning System), which can be used to build an automated DR plan.

The challenge is to control access to plans so that the Business Continuity Manager ensures the quality and accuracy of anything accepted for storage. Some people will write little and call it enough. They will want to store it and declare the job complete. Other well-meaning people may want to use their own unique recovery plan format, which will also cause confusion. Whatever tool you use, set aside a submissions area to receive proposed plans that will be reviewed.

To be useful in a crisis, the repository must be available at the recovery site. This may mean that it runs on a server at a third-party site or at the recovery site. This introduces other issues such as ensuring the network connection to the server is secured.
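For the file share option, the folder skeleton, including a submissions area, could be set up with a short script. The following is a minimal sketch only: the folder names are hypothetical and loosely follow the four plans described at the start of this chapter, and the script does not set access permissions, which must still be configured with whatever tools your file server provides.

```python
from pathlib import Path

# Hypothetical folder names; adapt them to your own repository.
FOLDERS = [
    "public/administrative_plan",
    "technical_plans",
    "work_area_recovery",
    "pandemic_management",
    "submissions",  # proposed plans wait here until they are reviewed
]


def create_repository(root: Path) -> list[Path]:
    """Create the folder skeleton under root and return the folders."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


def pending_submissions(root: Path) -> list[str]:
    """List file names in the submissions area that still await review."""
    return sorted(p.name for p in (root / "submissions").iterdir() if p.is_file())
```

Running `pending_submissions()` on a schedule gives the Business Continuity Manager a simple list of proposed plans that have not yet been reviewed and accepted.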

CONCLUSION

Writing a business continuity plan seems like a big project. As with any big project, break it down into a series of smaller projects that are not quite so intimidating. Starting at the department level, work up the organization, combining department plans as you build toward an organization-wide business continuity plan.

Developing the plan is an iterative process, and you won’t get everything right the first time. Testing the plan, discussed in a later chapter, will help to verify what you’ve written and point out gaps in the plan. Your plan should become a living document, never finally done, but changing as the organization grows and changes.
