Chapter 5

A Typology of Resilience Situations

Ron Westrum

‘Resilience’ covers many different matters under a single word. Nonetheless, I believe we can see some commonalities in the threats that resilience protects against and the modes in which it operates. The aim of this paper is to sort out these threats and modes, which I believe carry different implications for developing resilient systems.

Resilience against What?

The first question to examine is the nature of the threats to the integrity of the system that require it to be resilient. There are basically three aspects to threats:

•  The predictability of the threat. Predictability does not mean that we can predict when the event will take place, but only that it takes place fairly often.

•  The threat’s potential to disrupt the system.

•  The origin of the threat (internal vs. external). If the origin is internal, we think of internal checks, safeguards or quality controls that would repair internal errors or pathogens and keep them from spreading. An external origin requires a response on the system’s part. (Protection against threats is usually covered under the rubric of ‘defenses’, but this word does not distinguish between the two origins.)

With these basic aspects in mind, let us see what we can do to develop a classification of situations. The situations described here are merely illustrative of the variety of scenarios likely to be met. Clearly, for practical purposes there might be reasons to develop a more exhaustive set, and a far more detailed tableau could be created.

Situation I. The Regular Threat

Regular threats are those that occur often enough for the system to develop a standard response. Trouble comes in one of a number of standard configurations, for which an algorithm of response can be formulated. Clearly, the least scary threat is the predictable one from inside the system, with low potential to disrupt the system as a whole. Medication errors, of which Patterson et al. (2004a) have given an excellent example, would fall into this category. This does not mean that such errors cannot be fatal. As the case study shows, they often have that potential. Nonetheless, they occur often, usually involve only a single patient in the hospital, and can potentially be brought under control.

A similar example would be the following story by chemistry Professor Edgar F. Westrum, Jr, my father:

About once a year, a chemistry student would go into one of the laboratories, start mixing chemicals at random, and eventually there would be a ka-whump! when one of the chemicals proved combustible when added to the others. People would dash to the lab to see what happened. The student would be taken to the hospital for repair, and the laboratory would be cleaned up.

At the next level we have the predictable external threat. This is more disturbing. In Israel bus bombings occur regularly enough to have allowed one of the hospitals to provide a standard response. Richard Cook, during the conference, described how the Kiryat Menachem hospital was able to shift resources between services to provide an effective response when a bus bombing occurred (Chapter 13). In countries where earthquakes are common, programmed responses, such as going outside, are also drummed into the local population.

Situation II. The Irregular Threat

The more challenging situation is the one-off event, for which it is virtually impossible to provide an algorithm, because there are so many similar low-probability but devastating events that might take place, and one cannot prepare for all of them. This kind of threat presents an understood, but still challenging, problem. The Apollo 13 moon mission accident would be a good example here. When the event took place, it was entirely unexpected, though hardly seen as impossible. The spacecraft’s spontaneous explosion and venting, though it did not destroy the whole structure, required the most drastic measures so that the astronauts could survive and return to earth. Fortunately, the Americans had a gifted team at NASA flight control, and so even though the mission was abandoned, the astronauts were brought safely back (Kranz, 2000).

While Israeli bus bombings typically fall into Situation I, other bombings, with far more casualties, belong in Situation II. One such case was the suicide bombing of an American mess tent at Forward Operating Base Marez, in Iraq, on December 21, 2004. In 11 hours the 67th Combat Support Hospital took in 91 casualties. There were 22 deaths. The halls were crowded, the parking lot overflowed with the dead and the dying, and the conference room was turned into a blood donation center. Armored combat vehicles served as ambulances ferrying patients. Nine surgeries were carried out in a small operating theatre inside, while ten more had to be performed outside the operating room. At one point, a doctor began marking the time of death on the chests of the deceased with a felt-tip pen. They could barely get the dead into body bags fast enough. Finally, the hospital compound itself came under fire. The emergency was handled, but only just. This kind of emergency clearly tests the organization’s ability to self-organize and respond effectively to crisis (Hauser, 2004).

Situation III. The Unexampled Event

Situation III is marked by events that are so awesome or so unexpected that they require more than the improvisation of Situation II. They require a shift in mental framework. It may appear impossible that something like the event could happen. Whereas Situation II is basically a scale-up of Situation I, Situation III pushes the responders outside of their collective experience envelope. The 9/11 attack on the World Trade Center is a prime example of such an event. The explosion at the Chernobyl nuclear plant is a similar example. An extreme example would be an invasion from outer space (see Dwyer, 2002).

Again, unlike Situations I and II, this level of threat cannot be anticipated neatly enough to permit construction of a response algorithm. Instead the basic qualities of the organization, its abilities to self-organize, monitor and formulate a series of responses, will determine whether it can react effectively. The response of both developed and developing countries to the AIDS pandemic in the 1980s shows the great differences in coping styles across countries (Institute of Medicine, 1995; Kenis & Marin, 1997).

Especially in the developing world, Situation III events can be devastating, because the level of resilience is so low. The third world seems both to take greater risks and to have fewer resources to respond when things go wrong. All this was horribly illustrated when the overloaded ferry Le Joola sank off Senegal on the night of 26 October 2002; 1,863 people died, a larger death toll than the sinking of the Titanic, which took more than 1,500 lives. The ferry, which rolled over in a storm, had been loaded to triple its design capacity, the result of several unsafe bureaucratic practices, including a suicidal inability to limit entry to the ship. Fishermen saved 64 people after the accident, but there was no further rescue. Many of those who drowned were the victims of the incredibly slow response to the disaster, and the saving of more survivors was imperilled by a corrupt bureaucracy. The radio operators on shore were not at their stations overnight, so the sinking was not noticed until the next day, when it was too late to rescue most of those who might have been saved by an early response. And even then help was further delayed (Fields, 2002). The approach of many third world countries to safety could well be described by the phrase ‘risk-stacking,’ since several types of vulnerability are taken on at once. Similarly, after the gas leak at the Bhopal pesticide plant in India, the government’s response was so ineffective that many more people died than would otherwise have been the case (Shrivastava, 1987).

Time: Foresight, Coping, and Recovery

A second issue is: Where does the event lie on the organization’s time horizon? Protecting the organization from trouble can occur proactively, concurrently, or as a response to something that has already happened. These are all part of resilience, but they are not the same. Resilience thus has three major meanings.

•  Resilience is the ability to prevent something bad from happening,

•  Or the ability to prevent something bad from becoming worse,

•  Or the ability to recover from something bad once it has happened.

It is not clear at this point that possession of one of these capabilities automatically implies the others, or even that they form a Guttman scale, but either of these might be possible.

Foresee and Avoid

The ability to anticipate when and how calamity might strike has been called ‘requisite imagination’ (Adamski & Westrum, 2003). This ability, often important for the designer of technology, is also important for the operating organization. One might want to call this quality ‘the wily organization.’ The kind of critical thinking involved here has been studied in part by Irving Janis, famous for his analysis of the opposite trait, groupthink (Janis, 1982).

For the sake of analysis, let us examine two types of foresight. The first comes from properly learning the lessons of experience. In this case predictable threats are handled by programming the organization to remember the lessons learned. A recent study of Veterans’ Administration physicians showed that they frequently forgot to use known effective methods of dealing with cardiac and pneumonia problems. The results were devastating: thousands of unnecessary deaths. After many of the forgotten simple methods were reinstated, hospitals saw the relevant mortality rates drop by as much as 40% (Kolata, 2004).

The second type of foresight is that associated with the processing of ‘faint signals.’ These can include symptomatic events, suspected trends, gut feelings, and intelligent speculation. In another paper, I have suggested that ‘latent pathogens’ are likely to be removed in organizational cultures with high alignment, awareness, and empowerment. These boost the organization’s capability to detect, compile and integrate diverse information. This capability helps in detecting ‘hidden events’ but also encourages proactive response to dangers that have not yet materialized.

Another striking example is the surreptitious repair of the Citicorp Building in New York City. Its structural engineering consultant, William LeMessurier, suspected and then confirmed that the structure had been built with changes to his original plan, using bolted joints instead of welds. A student’s question stimulated LeMessurier to check his building’s resistance to ‘quartering winds,’ those that attack a building at the corners instead of the sides. When it was clear that the changes were fateful, the engineer, the contractor, and the police conspired to fix the building without public knowledge, in the end creating a sound structure without inciting public panic or causing a serious financial loss for Citicorp (LeMessurier, 2005).

One of the key decisions affecting the Battle of Britain was taken by a private individual, Sydney Camm, head of the Hawker firm of aircraft builders. Even before the war with the Axis had begun, Camm realized that Britain would need fighters in the event of a German air campaign against England. He expected soon to get an order to build more Hurricane fighters. But rather than wait until the government ordered them, Camm commenced building them without a contract, the first production aircraft appearing in October 1937. In the Battle of Britain every single aircraft proved indispensable, and Hurricanes were the majority of those Britain used. Camm was knighted for this and other services to England (Deighton, 1996; orig. 1978).

By contrast, organizations that indulge in groupthink and a ‘fortress mentality’ often resist getting signals about real or potential anomalies. This may take place in the context of faulty group discussion, as Irving Janis has forcefully argued (Janis, 1982), or through the broader organizational processes involving suppression, encapsulation, and public relations that I have sketched in several papers (e.g., Westrum & Adamski, 1999, pp. 97-98). During the time when the space shuttle Columbia was flying on its last mission, structural engineers at the Johnson Space Center tried to convince the Mission Management Team that they could not assure the shuttle’s airworthiness. They wanted the Air Force to take photographs of the shuttle in orbit that would allow them to make a sound decision. In spite of several attempts to get the photographs (which the Air Force was willing to provide), the Mission Management Team and other parties slighted the engineers’ concerns and refused to seek further images. Interestingly enough, the reason for this refusal was often that ‘even if we knew that there is damage, there is nothing we can do.’ This assertion may have been justified, but one would certainly have liked to see more effort to save the crew (Cabbage & Harwood, 2004). One wonders what might have happened if a NASA flight director such as Gene Kranz had been in charge of this process.

It is tempting to contrast the striking individual actions of LeMessurier and Camm with the unsatisfactory group processes of the Columbia teams. However, the key point is the climate of operation, no matter how large the group is. There are individuals who can personally serve as major bottlenecks to decision processes, and groups that become the agents of rapid or thoughtful action. It all depends on leadership, which shapes the climate and thus sets the priorities.

Coping with Ongoing Trouble

Coping with ongoing trouble immediately raises the questions of defences and capabilities. A tank or a battleship is resilient because it has armor. A football team is resilient because its players are tough and its moves are well coordinated. But often organizations are resilient because they can respond quickly or even redesign themselves in the midst of trouble. They may use ‘slack resources’ or other devices that help them cope with the struggle. The organization’s flexibility is often a key factor in organizing to fight the problem. They are thus ‘adaptive’ rather than ‘tough.’ This is true, for instance, of learning during conflict or protracted crisis. Many examples of such learning are apparent from the history of warfare. One aspect of it is learning from experience.

Such learning is the inverse of preconception. In the American Civil War, Union cavalry with repeating rifles typically won the battles in which they were engaged. In spite of this experience, the repeating rifles were confiscated from cavalry units after the war and the cavalry were issued single-shot weapons. This was thought necessary to save ammunition. It did not serve American soldiers very well (Hallahan, 1994, pp. 198-199). Preconception can survive many severe lessons.

In World War II, British bureaucracy often got in the way of carrying on the war. This apparently was the case with preconceptions about the use of cannon. Gordon Welchman, in a book well worth reading, comments on the British inability to use weapons in World War II for purposes other than their original ones:

In weaponry, too, the British thought at that time suffered from rigidity and departmentalization. The German 88-mm gun, designed as an anti-aircraft weapon, was so dangerous that four of them could stop an armored brigade. The British had a magnificent 3.7 inch anti-aircraft gun of even greater penetrative power, but it was not used as an antitank weapon in any of the desert battles. It was intended to shoot at aircraft. The army had been supplied with the two-pounder to shoot at tanks. And that was that!

This is an example of the slow response of the British military authorities to new ideas. The Germans had used the 88-mm anti-aircraft gun as an extremely effective weapon against the excellent British Matilda tanks during the battle of France, as I well remember from German messages decoded by Hut 6 at the time. (Welchman, 1982, p. 229)

Another instance of inflexibility in World War II involved aircraft parts. During the Battle of Britain the supply of aircraft lagged disastrously. Some parts factories were slow in providing the Royal Air Force with new parts; other facilities were slow in recycling old ones. Lord Beaverbrook, as Minister of Aircraft Production, was assigned by Winston Churchill to address the problem. Beaverbrook’s aggressive approach operated on many fronts. He rounded up used engines, staged illegal raids on parts factories, chloroformed guards, and got the necessary parts. He corralled parts into special depots and guarded them with a ‘private army’ that used special vehicles called ‘Beaverettes’ and ‘Beaverbugs’ to protect them. When Churchill received complaints about Beaverbrook’s ruthless approach, he said he could do nothing. Air power was crucial to England, whose survival hung in the balance (Young, 1966, p. 143).

The ability to monitor what is happening in the organization is crucial to success in coping, whether this monitoring comes from above or from within the immediate group involved. The Report on 9/11 showed that coping with the attacks on the Pentagon and World Trade Center was often defective because of poor organizational discipline, previous unresolved police/fire department conflicts, and so forth (National Commission on Terrorist Attacks, 2004). For instance, New York’s inability to create a unified incident command system, to resolve conflicts between police and fire department radio systems, and to ensure discipline in similar emergencies meant a large number of unnecessary deaths. Fire-fighters who heard the distress calls often went to the scene without integrating themselves into the command structure. This led to on-the-scene overload and confusion, and an inability of the supervising officials to determine how many fire-fighters were still in the structure. Similarly, after the collapse of the south tower, the police department failed to inform the fire-fighters inside the north tower that they knew that structure was also about to collapse (Baker, 2002).

Another aspect of coping is rooting out underlying problems that exist in the system. James Reason has called these problems ‘latent pathogens’ (Reason, 1990). While the origins of these latent pathogens are well known, less is known about the processes that lead to their removal. In an earlier paper I have speculated about the forces that lead to pathogens being removed (Westrum, 2003). I have suggested that organizations that create alignment, awareness, and empowerment among the work force are likely to be better at removing latent pathogens, and are therefore likely to have a lighter ‘pathogen load.’ Organizations with fewer underlying problems will be better at coping because their defences are less likely to be compromised.

It is also worth noting that medical mix-ups such as the drug error described by Patterson et al. (2004a) would probably be reduced by a ‘culture of conscious inquiry,’ where all involved recognize the potential dangers of a situation and compensate for them by added vigilance and additional checks (Westrum, 1991).

Repairing after Catastrophe

Resilience often seems to mean the ability to put things back together once they have fallen apart. The ability to rebound after receiving a surprise attack in war is a good measure of such resilience. When the Yom Kippur War started on 6 October 1973, the Israelis had expected attack, but were still unprepared for the ferocity of the joint Syrian and Egyptian assault that took place. The Israeli Air Force, used to unopposed dominance of the skies, suddenly found itself threatened by hundreds of Arab surface-to-air missiles, large and small. The war would reveal that the Israeli Air Force, often viewed as the best in the world, had three unresolved problems: 1) it was poor at communicating with the ground forces, 2) it had inadequate battlefield intelligence, and 3) it had not properly addressed the issue of ground support. Ehud Yonay comments that ‘So elemental were these crises that, to pull its share in the war, the IAF not only had to switch priorities but literally reinvent itself in mid-fighting to perform missions it had ignored if not resisted for years’ (Yonay, 1993, p. 345). Fortunately for the Israelis, they were able to do just that. Initiatives for ground support that had stalled were suddenly put into place, and the Air Force was able to coordinate its actions successfully with the Israeli ground forces. Israel was able to repel the attacks against it. The ability to turn crisis into victory is seldom so clearly shown.

Recovery after natural disasters is often a function of the scale of the damage and the frequency of the type of disaster. Florida regularly experiences hurricanes, so there are routines used for coping. On the other hand, the tsunami that recently struck South East Asia was on such a colossal scale that even assistance from other countries came slowly and unevenly.

Recovery is easier for a society or an organization if the decision-making centers do not themselves come under attack. The 9/11 attack on the World Trade Center, on the other hand, removed a local decision-making center in the United States. The damage would have been even greater if the attacks on the Pentagon and the White House had been more successful.

Conclusion

I have found that putting together this typology has helped me see some of the differences in the variety of situations and capabilities that we lump together under the label ‘resilience.’ No question that resilience is important. But what is it? Resilience is a family of related ideas, not a single thing. The various situations that we have sketched offer different levels of challenge, and may well be met by different organizational mechanisms. A resilient organization under Situation I will not necessarily be resilient under Situation III. Similarly, an organization that is good at recovery is not necessarily good at foresight. These capabilities all deserve systematic study. Previous disaster studies have presumably given coping and recovery much attention; I regret that time did not permit me to consult this literature.

Organizational resilience obviously is supported by internal processes. I have been forced here to concentrate on the organization’s behaviour and its outcomes. But many of the other papers in this volume (e.g., Chapter 14) look into the internal processes that allow resilience. There is little doubt that we need to know these processes better. In the past, the safety field as I know it has concentrated on weaknesses. Now perhaps we need to concentrate on strengths.

Acknowledgement

Many of the ideas for this chapter were suggested by other participants in the conference, and especially by John Wreathall.

Resilient Systems

Yushi Fujita

Engineers endow artifacts with abilities to cope with expected anomalies. These abilities may make the system robust. They are, however, a designed feature, which by definition cannot make the system ‘resilient.’ Humans at the front end (e.g., operators, maintenance people) are inherently adaptive and proactive; this allows them to achieve better performance and sometimes even to exhibit astonishing abilities in unexpected anomalies. However, this admirable human characteristic is a double-edged sword. Normally it works well, but sometimes it may lead to a disastrous end. Hence, a system relying on such human characteristics in an uncontrolled manner should not be called ‘resilient.’ A system should only be called ‘resilient’ when it is tuned in such a way that it can utilize its potential abilities, whether engineered features or acquired adaptive abilities, to the utmost extent and in a controlled manner, in both expected and unexpected situations.
