Chapter 7

Resilience Must Be Managed: a Proposal for a Safety Management Process that Includes a Resilience Approach

Akinori Komatsubara

Irregularities in socio-technical systems caused by various threats can lead to serious disturbances or accidents in our society. To avoid such disturbances, it is necessary to eliminate or diminish the threats, or to establish barriers against them. That is the traditional safety approach. However, because traditional safety approaches may not be sufficient to give stability to socio-technical systems, a resilience approach may be needed. Yet in some of the cases studied and described in this chapter, resilience has had no effect, or has even caused accidents through functional resonance. Based on these case studies, a safety management process that includes a resilience approach is discussed and proposed.

Introduction

Our modern society comprises various socio-technical systems. Socio-technical systems may vary in size and may vary in longevity – from temporary to permanent. When some irregularity happens in these systems, our society is significantly affected and disrupted.

Consider a delay in the schedule of a local train. We may miss the flight that we had planned to take and lose good business opportunities, which may have serious consequences for us. Our modern society is, after all, a fragile thing, like a glass sculpture made of various socio-technical systems. Therefore, specific management is needed to maintain the stability of these systems.

Factors that disturb the stability of the system are threats. All socio-technical systems are constantly faced with various threats. They can hardly escape from them. The kinds of threats can be classified into five categories:

a.  Natural threats: these are natural disasters such as typhoons, earthquakes and heavy snowfall. Small animals or insects can also pose serious threats, as seen in bird strikes on airplanes. Viruses and other infective agents can be natural threats as well.

b.  Social threats: these are pranks and malicious acts. Terrorism is the worst one. Children who put stones mischievously on railroad tracks are examples of social threats to railways. Recently, cyber-attacks have also become a serious threat.

c.  Technical threats: these include equipment failure. It is well known that equipment failure is especially common when new technology is introduced. The troubles that occurred on B787 aircraft in 2013 are one example. Old equipment often poses a threat as well, because however robustly equipment is developed, it cannot escape deterioration.

d.  Service target threats: these are threats that arise when the demand that the socio-technical system serves outstrips supply. Large influxes of passengers on railways or patients at medical institutions can pose threats to the overall provision of reliable services of the socio-technical system.

e.  Human threats: these are the so-called human errors and violations committed by staff. A decline in the level of safety culture will accelerate the occurrence of human threats.

Specific countermeasures must be taken to prevent instability or accidents caused by these threats.

Eliminating or Diminishing Threats

Though this list is not necessarily complete, technical threats, human threats and service target threats are manageable. For technical threats, technical risk assessment and reliability engineering will be helpful. For service target threats, demand control will be effective; medical triage in an emergency room and air traffic management (ATM) for flow control are examples. For human threats, a traditional human error prevention approach is effective.

Establishing Barriers Against Threats

It is almost impossible to eliminate natural and social threats. Disaster prevention is therefore needed for natural threats, as is sanitary management. Security measures are needed for social threats. A firewall against cyber-attacks is one example.

These countermeasures are the traditional safety approach, and they are vital in keeping socio-technical systems steady and robust against threats. However, it may be impossible to escape every disturbance caused by these threats with the traditional safety approach. We can point out the following reasons.

•  There are many situations at the sharp-end where various threats are always emerging and disappearing without any repeatable combination. An emergency room is a typical example. In such situations, for technical, economic and social reasons, it is impossible in practice to devise ideal and precise countermeasures for every threat.

•  Unknown threats may exist that are beyond our imagination and anticipation.

For these reasons, another kind of countermeasure is needed; we must take flexible countermeasures against the threats that emerge. That is the Resilience Approach.

There are two types of resilience approach: technical resilience and human factors resilience. A typical example of technical resilience is a seismically isolated structure for buildings. In this chapter, however, we focus on human factors resilience. This is one aspect of human factors for safety; the other is traditional human error prevention. In terms of the Safety-I and Safety-II idea of Hollnagel (2012a), human factors resilience corresponds to Safety-II and traditional human error prevention corresponds to Safety-I.

Resilience May Fail to Prevent Accidents or May Cause Accidents

Though we expect safety to result from resilient behaviour of people involved, there may be some cases in which resilience has no effect in preventing accidents. Moreover, resilience may actually cause some accidents. The following are potential cases that should be considered.

Poor Resilience Capability

If the resilience capabilities of those who are expected to behave resiliently at the sharp-end are poor, safety cannot be achieved. They cannot keep up with the situation or resolve it. Komatsubara (2008a, 2011) points out four components required of an individual who is expected to perform with resilience.

1.  Technical skills: their importance is obvious when we remember the Miracle on the Hudson in 2009. Without excellent technical skill in operating an aircraft that had lost all power, the captain could not have landed on the Hudson River. If the technical skill of the people involved is poor, a slight disturbance of the system may easily push them into the opportunistic or scrambled control mode of COCOM (Contextual Control Model; Hollnagel, 1993), which may easily lead to unwanted results. Good technical skills, including professional knowledge, are therefore the primary premise of resilience.

2.  Non-technical skills: NTSs are indispensable for resilience. For example, to behave resiliently, we must anticipate and monitor the emergence of threats. Skills such as situational awareness and monitoring are therefore indispensable. Communication skills are also necessary to obtain the information needed to respond well.

3.  Mental and physical health: mental and physical health are the basis for positive behaviour. Imagine that we have a cold. In that case, we cannot make good decisions, and we may not have a positive attitude towards acting resiliently, either. Fatigue and healthcare management is therefore needed.

4.  Attitude: vocational responsibility, social ethics and courage are indispensable. In the Costa Concordia accident off the Italian coast in 2012, it is reported that the captain, who was obliged to behave resiliently to save the passengers, escaped from the ship first. We cannot help thinking that his sense of vocational responsibility was insufficient.

Those who occupy a vocational position that requires resilience should make their own efforts to enhance their resilience capability through these four components. At the same time, organizations must clarify the kinds and contents of the capabilities that staff members need, and support their development.

Lack of Resilience Resources

It is easy to understand that resilient safety cannot be performed without appropriate and sufficient resources.

In the Fukushima Nuclear Power Plant (NPP) accident in Japan in 2011, staff members of the NPP tried to perform resiliently, but resources were insufficient. They tried to recover the power supply to the control panels, but emergency measures for the power supply had not been prepared. They therefore had to hastily gather car batteries from cars parked in the plant yard and from nearby car shops. This can be called resilient behaviour, but the power could have been restored more smoothly and effectively if emergency batteries had been prepared as a resource.

Organizations must anticipate and prepare the resources needed for the resilience by the staff members. Resources mean not only hardware such as tools and facilities, but also soft factors such as financial resources. The time allowed for resilience may also be important on some occasions.

Missing Philosophy of Resilience

All organizations have several purposes that they must achieve at the same time. The QCDS model of quality, cost, delivery and safety is a typical example. However, it is usually almost impossible to satisfy all of these purposes at the same time. As the ETTO principle (Efficiency-Thoroughness Trade-off; Hollnagel, 2009) says, efficiency, which relates to cost and delivery, and thoroughness, which relates to quality and safety, often oppose each other. In that case, people tend to choose efficiency over thoroughness. Moreover, especially when an organization promotes cost reduction, the people involved tend to behave resiliently to achieve efficiency. The JCO criticality accident in Japan in 1999 is one example (Komatsubara, 2006). JCO was a small nuclear fuel processing company. Under a strong cost-reduction campaign at the company, the workers conceived the idea of producing liquid uranium fuel very efficiently, and worked resiliently to change the authorized production manual in pursuit of efficiency. A criticality accident then occurred, killing two workers. In this accident, resilient behaviour served not safety but efficiency. This means that if we regard resilience as an element of safety measures, we must establish and share a philosophy of safety before initiating resilient behaviours. Without a strong awareness of safety supported by a good safety culture in the organization, resilience may start to drift towards accidents.

Possibility of Functional Resonance Type Accidents

In situations where several people are involved, functional resonance type accidents (Hollnagel, 2012b) may occur when the combination of each resilient behaviour is inappropriate. The following are some such cases.

Case 1) When the people involved do not have the same context

This is a case that I encountered myself. It is a private one, but if an accident had occurred, it might have caused a large disturbance to road traffic as a socio-technical system.

The situation is shown in Figure 7.1. I was driving a car. In Japan, as in the UK, the driving lane is on the left. I intended to turn to the right at crossing A, and turned on my right turn signal and slowed down. At the same time, a public bus was coming from the opposite side, and the driver turned on the right turn signal, too. I understood that this bus would turn to the right at the same crossing A, and I started to turn to the right. This bus continued to come straight on, however, without slowing down, and we very nearly collided. I understood later that this bus had intended to turn right up ahead, at crossing B.

Figure 7.1    The near-miss situation that I encountered

The traffic laws of Japan stipulate that drivers who wish to turn right or left should switch on the turn signal 30 metres ahead of the crossing. The behaviour of the bus driver was therefore legally appropriate. Probably, because I slowed down before crossing A, he assumed that I understood his intention and would wait until his bus had passed through, and he therefore continued straight on. I thought, however, that the bus would turn right at crossing A itself, because it had switched on its right turn signal before crossing A. My behaviour of starting to turn right before the bus had passed through the crossing was therefore very natural. The FRAM analysis of this case is shown in Figure 7.2.

Figure 7.2    The cars’ near-miss case expressed by FRAM

In this case, the bus driver and I each behaved resiliently on the basis of a different understanding of the other’s intention; the combination of our preconditions was inappropriate. That is, resilient behaviours based on contexts or preconditions that differ from one another may lead to functional resonance accidents.

Case 2) When the people involved behave under different controls

The near-miss incident of flights JAL907 and JAL958 in Japan in 2001 is a typical example (JTSB, 2002). In this incident, two aircraft were approaching from opposite directions at the same altitude. To avoid a collision, one aircraft obeyed the Traffic alert and Collision Avoidance System (TCAS) instruction and the other obeyed air traffic control (ATC) advice; both descended, and a near-miss occurred. Figure 7.3 illustrates the simplified situation expressed with FRAM.

In this case, both captains had the same understanding that a collision would occur unless they changed their altitude or course. However, the controls they followed were different, even though both were behaving resiliently to avoid the collision.

Figure 7.3    The aircraft near-miss case expressed by FRAM

A functional resonance accident may occur when the several people involved perform resilient behaviour with different preconditions, different controls or inappropriate timing. To avoid this type of accident, we must consider how they can share the same preconditions, the same controls and appropriate timing when acting resiliently.
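The mismatch of controls in Case 2 can be illustrated with a very small sketch. It is only a toy model under stated assumptions, not a reconstruction of the incident: the aircraft labels "A" and "B", the advice tables and the function names are all hypothetical. It assumes that TCAS gives the two aircraft complementary instructions, and that the ATC advice to aircraft A happens to conflict with the TCAS instruction.

```python
from enum import Enum


class Maneuver(Enum):
    CLIMB = "climb"
    DESCEND = "descend"


# Hypothetical advice for two aircraft approaching at the same altitude.
# Assumption: TCAS issues complementary instructions to the two aircraft.
TCAS_ADVICE = {"A": Maneuver.CLIMB, "B": Maneuver.DESCEND}
# Assumption: ATC advises only aircraft A, and its advice conflicts with TCAS.
ATC_ADVICE = {"A": Maneuver.DESCEND}


def chosen_maneuver(aircraft: str, control: str) -> Maneuver:
    """Return the maneuver an aircraft flies under the control it obeys."""
    if control == "ATC" and aircraft in ATC_ADVICE:
        return ATC_ADVICE[aircraft]
    return TCAS_ADVICE[aircraft]


def is_near_miss(control_a: str, control_b: str) -> bool:
    """Both aircraft moving in the same vertical direction do not separate."""
    return chosen_maneuver("A", control_a) == chosen_maneuver("B", control_b)


# Compare a shared control with mixed controls.
for controls in [("TCAS", "TCAS"), ("ATC", "TCAS")]:
    outcome = "near miss" if is_near_miss(*controls) else "separated"
    print(controls, outcome)
```

In this sketch each choice is reasonable on its own, and the unwanted outcome appears only in the combination, which is the point made in the text about different controls.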

No Blame Culture is Needed for Resilience

Based on Komatsubara (2008b), the position of those who perform resilience can be classified into a 2 x 2 matrix.

One aspect is the position of resilience: whether the resilience is vocational or civil.

Vocational resilience is expected of the people who serve the public as their job. Medical doctors, airline pilots and fire fighters are examples of such jobs. These professionals are expected to have resilience naturally and essentially.

Civil resilience is expected of people who happen to encounter a situation that needs resilience. Hollnagel (2011) points out four essential factors for resilience: responding, monitoring, anticipating and learning. Strictly speaking, therefore, civil resilience may not be resilience in the sense of resilience engineering, because it usually lacks the factors of anticipating and monitoring, but this chapter includes it to deepen the discussion. Consider taking a seat next to an elderly person on a train, when the elderly person suddenly suffers a heart attack and loses consciousness. What should we do? We could take the attitude of a bystander and escape from the situation, but we would most likely do something resiliently to save the elderly person.

The other aspect is whether those involved would themselves suffer direct damage, such as injury or death, if they did not act resiliently. In other words, the resilience is either only for others or for people including oneself. In the latter case, they cannot take the attitude of a bystander, because failing to act resiliently would result in their own injury or death.

Table 7.1 shows examples of these positions of resilience.

Table 7.1    Classification of the position of Resilience Behaviour

Columns: Direct damage is imposed if they take the stance of a bystander?

Position of resilience | Yes; Direct damage is imposed on him/her if he/she does not conduct resilience. | No; Direct damage is not imposed if he/she does not conduct resilience. The resilience is basically for others’ happiness.
Vocational | Airline pilot | Medical doctor
Civil | Passengers who are taking a bus when the bus driver suddenly loses consciousness. If the passengers do not do something, they themselves may be injured or killed. | People who have just encountered a situation that calls for someone’s resilient help, but they can take the attitude of a bystander.

A blameless culture is needed for both vocational and civil resilience, especially when the resilience serves only others. Resilient behaviour does not always lead to a good result. Slight human errors may be inevitable in resilience. Moreover, in civil resilience, those involved may have few technical skills with which to act resiliently. In those cases, even though the people involved do their best to act resiliently in that very situation at that very moment, unwanted results may occur. If any voice of blame then arises with hindsight, they, or we, may lose the motivation to act resiliently, because they, or we, do not want to take responsibility for the unwanted result. Therefore, a blameless culture based on the idea of the Good Samaritan law, which derives from Chapter 10, verses 29–37 of the Gospel of Luke, is required.

Resilience Must Be Managed

Let us consider soccer. Players play resiliently to score goals, and victory depends on their efforts. We must notice, however, that the head coach also has an important role in the victory. Of course, coaches must respect the players’ decisions and ensure flexibility for them, because threats essentially occur at the very sharp-end. They must nevertheless behave resiliently themselves to keep the players and the game in hand. We must also notice that the coach’s role starts before the game. Coaches must investigate the opponent to make a game strategy, and develop the training of each player so that they perform at their best in the game. They may also pay attention to equipment such as the players’ spiked shoes. It is no exaggeration to say that the performance of the players in the game depends on prior management by the coach.

It is exactly the same in safety.

When the sharp-end is fighting resiliently against threats, the blunt-end must control the sharp-end resiliently, for example to avoid functional resonance accidents. However, it is also very important for the blunt-end to make steady efforts to let the sharp-end prevail over the threats. This must be done before the sharp-end actually starts its resilient behaviour.

After all, the blunt-end organizations must develop some daily management and preparation for resilience of the sharp-end.

How Should We Manage?

Based on the discussion above, we should start the management from the traditional safety approach. We must understand at the outset that the resilience approach does not exist independently.

First of all, as far as possible, we must anticipate and identify threats that the sharp-end may encounter, and we must try to eliminate or diminish the threat or we must create barriers against that threat.

By doing so, we can avoid entangling the sharp-end in needless resilience. Consider a medication dose error, for example. If the right medicine is not given to the right patient, doctors must act resiliently to treat the patient for the medication error. That resilience, however, could essentially be avoided if the right dosing were performed, which can be accomplished with the traditional human factors approach.

However, we may not be able to establish countermeasures that are sufficient to overcome the threats. Human errors will happen because ‘to err is human’. In that case, resilience is needed. Resilience is also needed against threats that are beyond our forecasting and imagination. Such threats are, of course, impossible to anticipate. But it is possible to determine the unwanted events for the system. For example, we know how serious the loss of all power supply at an NPP would be. Therefore, we can prepare resilience against the loss of all power supply as one of the unwanted events.

After that, we should enhance the resilience capabilities of individuals at the sharp-end. To do so, we must identify the kinds of resilience capabilities that the individuals need, considering the kinds of expected threats and unwanted events. We must also monitor whether the sharp-end has attained the identified capabilities. Then, we should create and develop training programs. In addition, in some cases, we must prepare the resources needed for the resilience activity. Moreover, when several people are involved in the situation, some management is needed to avoid functional resonance accidents caused by a mismatch between their resilient behaviours. As for the car near-miss case that I encountered, a signboard that simply says ‘Beware of oncoming vehicles!’ might have been enough to correct my precondition for turning right. As for the near-miss case of the two aircraft, it is now regulated that pilots should respond only to the TCAS resolution advisory (RA). This regulation helps pilots avoid following different controls when TCAS issues an RA.

No matter how carefully we prepare, however, some situations caused by unexpected threats beyond forecasting may occur. In that case, prepared countermeasures may not be effective, but we expect that the people involved will act resiliently to settle and recover from the situation. This might bring unwanted results. Even if the results are not favourable, we should not blame those involved with hindsight. This means that the blunt-end must cultivate an understanding of no-blame culture in society.

We must understand that all the activities above should have been performed and well prepared by the blunt-end before the sharp-end encounters the very situation that needs resilience. Moreover, after the sharp-end has encountered a situation that required resilience, the blunt-end must review and evaluate whether those prior activities were sufficient. Through the review, it must correct the prior activities if needed. Such reviews will also bring learning for the management.

To sum up, this chapter proposes a safety management system (SMS) model that includes both the robust and the resilience approach, as shown in Figure 7.4. The PDCA model of plan, do, check and act is adopted in this study because it is very common in organizational management. With the PDCA model, we can expect the process in Figure 7.4 to be readily accepted by organizational managers.

If the blunt-end promotes and repeats this PDCA cycle, we can expect to establish robust but resilient safety of the system.

Figure 7.4    Safety Management System Model including Robust and Resilience Approach

Conclusion

In this chapter, a safety management model including the resilience approach has been proposed to maintain the stability, and to increase the safety, of socio-technical systems. The resilience approach does not exist independently of the traditional safety approach, including human factors. Moreover, the resilient safety approach may result in failure without good prior management. If the sharp-end succeeded in achieving safety without any prior management, we can of course learn from it for the next occasion, but we must say that it was just luck. To increase the safety of socio-technical systems with the resilience approach, we need to understand the position of the resilience approach in the whole strategy of safety.

Without prior management, resilience would not win.

Commentary

This chapter puts forward the argument that we should be careful not to look at resilience – or resilience engineering – as a panacea. The resilience engineering perspective, and more specifically the Safety-II perspective, is not intended as and should not be used as a replacement for safety management. It is rather a complement to or a new but crucial facet of safety management. A resilience-based approach should not be relied on independently from a traditional safety approach – including a focus on human factors. To increase the safety of today’s intractable socio-technical systems it is necessary to understand the position of the resilience approach within the whole strategy of safety.
