Chapter 2

Meeting IEC 61508 Part 1

Abstract

The maximum tolerable failure rate that we set for each hazard will lead us to an integrity target for each piece of equipment, depending on its relative contribution to the hazard in question. As well as providing a numerical target to meet, these integrity targets are also expressed as “safety-integrity levels” according to the severity of the numerical target. This chapter addresses the need for setting integrity targets, the ALARP concept, functional safety, establishing competency, and the hierarchy of documents, together with the steps involved in accomplishing all of these.

Keywords

ALARP; Competency; Disproportionality; Failure rates; Fatalities; Risk assessment; Safety-integrity level (SIL) targets
 
Part 1 of the Standard addresses the need for:
• Setting integrity (SIL) targets
• The ALARP concept (by inference)
• Capability to design, operate, and maintain for functional safety
• Establishing competency
• Hierarchy of documents
The following sections summarize the main requirements:

2.1. Establishing Integrity Targets

Assessing quantified integrity targets is an essential part of the design process (including retrospective safety studies). This leads to:
• A quantified target against which one predicts the rate of random hardware failures and establishes ALARP (as low as reasonably practicable).
• A SIL band for mandating the appropriate rigor of life cycle activities.
The following paragraphs describe how a SIL target is established.

2.1.1. The Quantitative Approach

(a). Maximum Tolerable Risk

In order to set a quantified safety integrity target, a target Maximum Tolerable Risk is needed. It is therefore useful to be aware of the following rates:
| Risk | Rate |
|---|---|
| All accidents (per individual) | 5 × 10⁻⁴ pa |
| Natural disasters (per individual) | 2 × 10⁻⁶ pa |
| Accident in the home | 4 × 10⁻⁴ pa |
| Worst-case Maximum Tolerable Risk in HSE R2P2 document | 10⁻³ pa |
| “Very low risk” as described in HSE R2P2 document (i.e., boundary between Tolerable and Broadly Acceptable) | 10⁻⁶ pa |

“Individual Risk” is the frequency of fatality for a hypothetical person with respect to a specific hazard. This is different from “Societal Risk,” which takes account of multiple fatalities. Society has a greater aversion to multiple fatalities than single ones in that killing 10 people in a single incident is perceived as worse than 10 separate single fatalities.
Table 2.1 shows the limits of tolerability for “Individual Risk” and is based on a review of HSE's “Reducing risk, protecting people, 2001 (R2P2)” and HSG87. The former indicates a Maximum Tolerable Risk to an employee of 10⁻³ per annum for all risks combined. The actual risk of accidents at work per annum is well below this. Generally, guidance documents recommend a target of 10⁻⁴ per annum for all process-related risks combined, leaving a margin to allow for other types of risk.
At the lower end of the risk scale, a Broadly Acceptable Risk is nearly always defined. This is the risk below which one would not, normally, seek further risk reduction. It is approximately two or three orders of magnitude less than the total of random risks to which one is exposed in everyday life.

Table 2.1

Target individual risks.

| | HSE R2P2 | Generally used for functional safety |
|---|---|---|
| Maximum Tolerable Individual Risk (per annum) | | |
| Employee | 10⁻³ | 10⁻⁴ |
| Public | 10⁻⁴ | 10⁻⁵ |
| Broadly Acceptable Risk (per annum) | | |
| Employee and public | 10⁻⁶ | 10⁻⁶ |


It is important to note that the Individual Risk and the Societal Risk calculations are fundamentally different. Thus the starting points for Maximum Tolerable Risk, in the case of a single fatality, do not immediately coincide, which will be elaborated in Section 2.4.
Scenarios such as sites usually imply a risk to (more or less) the same groups of individuals, on site or off site, at any time. “Distributed” risks, for example pipelines across wide areas, rail journeys, and tunnels, where the identities of the individuals exposed change rapidly, are the scenarios for which the Individual Risk approach becomes limited. An individual may be exposed for 2 min per annum (traveling through a tunnel) whereas, at any moment, there may be 100 people at risk. The Societal Risk approach (Section 2.4) is then more appropriate.
There is a body of opinion that multiple fatalities should also affect the choice of maximum tolerable Individual Risk. The targets in Table 2.2 reflect an attempt to take account of these concerns in a relatively simple way by adjusting the Individual Risk targets from Table 2.1.
More complex calculations for Societal Risk (involving F–N curves) are sometimes addressed by specialists as are adjustments for particularly vulnerable sections of the community (disabled, children etc.).

Table 2.2

Target fatality risks.

| | 1–2 fatalities | 3–5 fatalities | 6 or more fatalities |
|---|---|---|---|
| Maximum Tolerable Individual Risk (per annum) | | | |
| Employee (Voluntary) | 10⁻⁴ | 3 × 10⁻⁵ | 10⁻⁵ |
| Public (Involuntary) | 10⁻⁵ | 3 × 10⁻⁶ | 10⁻⁶ |
| Broadly Acceptable Risk (per annum) | | | |
| Employee and public | 10⁻⁶ | 3 × 10⁻⁷ | 10⁻⁷ |


The location (i.e., site or part of a site) for which a risk is being addressed may be exposed to multiple potential sources of risk. The question arises as to how many separate potential hazards an individual (or group) in any one place and time is exposed to. Where several hazards apply at one time, one should allow for this by specifying a more stringent target for each hazard. For example, a study addressing a multirisk installation might need to take account of the order of ten sources of risk. On the other hand, an assessment of a simple district pressure regulator valve for the local distribution of natural gas implies a limited number of sources of risk (perhaps only one).
A typical assessment confined to employees on a site might use the recommended 10⁻⁴ pa Maximum Tolerable Risk (for 1–2 fatalities) but may address 10 sources of risk to an individual in a particular place. Thus, an average of 10⁻⁵ pa would be used as the Maximum Tolerable Risk across the 10 hazards and, therefore, for each of the 10 safety functions involved. By the same token, the Broadly Acceptable Risk would be factored from 10⁻⁶ pa to 10⁻⁷ pa.
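The equal apportionment described above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the text does) an equal share of the combined target per hazard; the function name `per_hazard_target` is hypothetical, and a real study might justify an unequal split.

```python
def per_hazard_target(total_target_pa: float, n_hazards: int) -> float:
    """Share a combined risk target equally across hazards that threaten
    the same individual (or group) in the same place at the same time.
    Hypothetical helper for illustration only."""
    if n_hazards < 1:
        raise ValueError("n_hazards must be at least 1")
    return total_target_pa / n_hazards


# Employee example from the text: 10 sources of risk in one place.
max_tolerable = per_hazard_target(1e-4, 10)       # per safety function, pa
broadly_acceptable = per_hazard_target(1e-6, 10)  # pa
print(f"Maximum Tolerable Risk per hazard: {max_tolerable:.0e} pa")
print(f"Broadly Acceptable Risk per hazard: {broadly_acceptable:.0e} pa")
```

Run as written, this reproduces the 10⁻⁵ pa and 10⁻⁷ pa figures quoted above.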
The question also arises of how long an individual is exposed to a risk. Earlier practice was to factor the maximum tolerable failure rate by the proportion of time for which the hazard presents the risk (for example, an enclosure which is visited for only 2 hrs per week). However, that approach would only be valid if persons on site suffered no other risk outside those 2 hrs of their week. Off site, the argument might be different, in that persons may well be at risk for only a proportion of the time. Thus, for on-site personnel, the proportion of employee exposure time should be taken as the total working proportion of the week.
Despite the widely published figures for Maximum Tolerable Risk (e.g., Table 2.2), the UK HSE sometimes press for a Maximum Tolerable Risk to be targeted at a lower level, nearer to the Broadly Acceptable level (e.g., an order of magnitude lower). This, however, is a controversial area. In the authors' opinion, whatever the starting point, the ALARP calculation will in any case cause the risk to be reduced to an appropriate level.
Table 2.3 caters for the lesser consequence of injury. Targets are set, and integrity assessment is carried out, in the same manner as for fatality. In general, rates an order of magnitude larger are used for the targets.

Table 2.3

Target individual risks for injury.

| Maximum Tolerable Risk (per annum) | |
|---|---|
| Employee (Voluntary) | 10⁻³ |
| Public (Involuntary) | 10⁻⁴ |
| Broadly Acceptable Risk (per annum) | |
| Employee and public | 10⁻⁵ |


In any event, the final choice of Maximum Tolerable Risk (in any scenario) forms part of the “safety argument” put forward by a system user. There are no absolute rules but the foregoing provides an overview of current practice.

(b). Maximum tolerable failure rate

This involves factoring the Maximum Tolerable Risk according to totally external levels of protection and to factors which limit the propagation to fatality of the event. Table 2.4 gives examples of the elements which might be considered. These are not necessarily limited to the items described below and the analyst(s) must be open ended in identifying and assessing the factors involved.
The maximum tolerable failure rate is then targeted by taking the Maximum Tolerable Risk and factoring it according to the items assessed. Thus, for the examples given in Table 2.4 (assuming a 10⁻⁵ pa involuntary risk):

$$\text{Maximum Tolerable Failure Rate} = \frac{10^{-5}\ \text{pa}}{0.6 \times 0.2 \times 0.7 \times 0.25 \times 0.9 \times 0.25} = 2.1 \times 10^{-3}\ \text{pa}$$


Table 2.4

Factors leading to the maximum tolerable failure rate.

| Factor involving the propagation of the incident or describing an independent level of protection | Probability (example) | Arguments, justifications, references, etc. to support the probability used |
|---|---|---|
| The profile of time at risk | 60% | Quantifies whether the scenario can develop. This may be <100% if, for example, flow, temperature, pressure, etc. profiles are sufficient for the risk to apply only at specific times, or the process is in use only for specific periods. |
| Unavailability of separate mitigation (i.e., another level of protection) | 20% | Mitigation outside the scope of this study and not included in the subsequent modeling which assesses whether the system meets the risk target. Examples are a downstream temperature, pressure, etc. measurement leading to manual intervention, or a physical item of protection (for example, a vessel or bund) not included in the study. |
| Probability of the scenario developing | 70% | Examples are that the vessel/line will succumb to the over-temperature, overpressure, etc., or that the release has an impact on the passing vehicle. |
| Person(s) exposed (i.e., being at risk) | 25% | Proportion of time during which some person or persons are close enough to be at risk should the event propagate. Since a person may be exposed to a range of risks during the working week, this factor should not be erroneously reduced to the proportion of time exposed to the risk in question. If that were repeated across the spectrum of risks, then each would be assigned an artificially optimistic target. The working week is approximately 25% of the time and thus that is the factor which would be anticipated for an on-site risk. In the same way, an off-site risk may only apply to a given individual for a short time. |
| Probability of subsequent ignition | 90% | Quantifies whether the released material ignites/explodes. |
| Fatality ensues | 25% | The likelihood that the event, having developed, actually leads to fatality. |
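The factoring described above can be sketched in Python using the example probabilities of Table 2.4. The sketch assumes, as the equation does, that the factors are independent probabilities that simply multiply; the function name and the dictionary keys are hypothetical labels for the Table 2.4 rows.

```python
from math import prod


def max_tolerable_failure_rate(max_tolerable_risk_pa, factors):
    """Divide the Maximum Tolerable Risk by the product of the propagation
    and independent-protection probabilities (each must lie in (0, 1])."""
    for name, p in factors.items():
        if not 0 < p <= 1:
            raise ValueError(f"{name}: probability must be in (0, 1], got {p}")
    return max_tolerable_risk_pa / prod(factors.values())


# Example probabilities from Table 2.4, with a 1e-5 pa involuntary risk target.
table_2_4 = {
    "profile of time at risk": 0.6,
    "unavailability of separate mitigation": 0.2,
    "probability of the scenario developing": 0.7,
    "person(s) exposed": 0.25,
    "probability of subsequent ignition": 0.9,
    "fatality ensues": 0.25,
}
rate = max_tolerable_failure_rate(1e-5, table_2_4)
print(f"{rate:.1e} pa")  # 2.1e-03 pa
```

Keeping the factors in a named mapping mirrors the worksheet layout, so each probability stays next to the row whose justification supports it.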


Example

A gas release (e.g., a natural gas holder overfill) is judged to be a scenario leading to a single on-site fatality and three off-site fatalities. Both on and off site, person(s) are believed to be exposed to that one risk from the installation.
On site

| Factor | Value | Basis |
|---|---|---|
| Proportion of time system can offer the risk | 75% | 40 weeks pa |
| Probability of ignition | 5% | Judgment |
| Person at risk | 25% | Working week, i.e., 42 hrs/168 hrs |
| Probability of fatality | 75% | Judgment |

From Table 2.2, the Maximum Tolerable Risk is 10⁻⁴ pa. Thus, the maximum tolerable failure rate (leading to the event) is calculated as:

$$\frac{10^{-4}\ \text{pa}}{0.75 \times 0.05 \times 0.25 \times 0.75} = 1.4 \times 10^{-2}\ \text{pa}$$


Off site

| Factor | Value | Basis |
|---|---|---|
| Proportion of time system can offer the risk | 75% | 40 weeks pa |
| Probability of ignition | 5% | Judgment |
| Person(s) at risk | 33% | Commercial premises adjoin |
| Probability of three fatalities | 10% | Offices well protected by embankments |

From Table 2.2, the Maximum Tolerable Risk is 3 × 10⁻⁶ pa. Thus, the maximum tolerable failure rate (leading to the event) is calculated as:

$$\frac{3 \times 10^{-6}\ \text{pa}}{0.75 \times 0.05 \times 0.33 \times 0.1} = 2.4 \times 10^{-3}\ \text{pa}$$


Thus, 2.4 × 10⁻³ pa, being the more stringent of the two, is taken as the maximum tolerable failure rate target.
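The on-site/off-site comparison can be checked with a short Python sketch: each case divides its Table 2.2 target by the product of its factors, and the smaller (more stringent) result is taken as the target. The helper name is hypothetical.

```python
from math import prod


def max_tolerable_failure_rate(max_tolerable_risk_pa, *probabilities):
    """Maximum Tolerable Risk divided by the product of the factors."""
    return max_tolerable_risk_pa / prod(probabilities)


# On site: one employee fatality, target 1e-4 pa (Table 2.2).
on_site = max_tolerable_failure_rate(1e-4, 0.75, 0.05, 0.25, 0.75)
# Off site: three public fatalities, target 3e-6 pa (Table 2.2).
off_site = max_tolerable_failure_rate(3e-6, 0.75, 0.05, 0.33, 0.1)

target = min(on_site, off_site)  # the more stringent case governs
print(f"On site:  {on_site:.1e} pa")   # 1.4e-02 pa
print(f"Off site: {off_site:.1e} pa")  # 2.4e-03 pa
print(f"Target:   {target:.1e} pa")    # 2.4e-03 pa
```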

(c). Safety integrity levels (SILs)

Notice that only now is the SIL concept introduced. The foregoing is about risk targeting but the practice of jumping immediately to a SIL target is a dangerous approach.
Furthermore, it is necessary to understand why there is any need for a SIL concept when we have numerical risk targets against which to assess the design. If the assessment were to involve only traditional reliability prediction, wherein the predicted hardware reliability is compared with a target, there would be no need for the concept of discrete SILs. However, because the rigor of adherence to design/quality assurance activities cannot be quantified, a number of discrete levels of “rigor,” which cover the credible range of integrity, are described. The practice is to divide the spectrum of integrity targets into four levels (see Chapter 1).
Consider the following examples:

Simple example (low demand)

As a simple example of selecting an appropriate SIL, assume that the maximum tolerable frequency for an involuntary risk scenario (e.g., customer killed by explosion) is 10⁻⁵ pa (A) (see Table 2.1). Assume that 10⁻² (B) of the hazardous events in question lead to fatality. Thus the maximum tolerable failure rate for the hazardous event will be C = A/B = 10⁻³ pa. Assume that a fault tree analysis predicts that the unprotected process is only likely to achieve a failure rate of 2 × 10⁻¹ pa (D) (i.e., 1/5 years). The probability of failure on demand (PFD) of the safety system would need to be E = C/D = 10⁻³/(2 × 10⁻¹) = 5 × 10⁻³. Consulting the right-hand column of Table 1.1, SIL 2 is applicable.
This is an example of a low-demand safety-related system in that it is only called upon to operate at a frequency determined by the frequency of failure of the equipment under control (EUC), in this case 2 × 10⁻¹ pa. Note, also, that the target “E” in the above paragraph is dimensionless by virtue of dividing a rate by a rate. Again, this is consistent with the right-hand column of Table 1.1 in Chapter 1.
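The arithmetic of this low-demand example, and the look-up in the right-hand column of Table 1.1, can be sketched in Python. The sketch assumes the IEC 61508 low-demand bands (SIL 1 from 10⁻² up to 10⁻¹, through to SIL 4 from 10⁻⁵ up to 10⁻⁴, lower bound inclusive) and returns 0 for a target less onerous than SIL 1; the function name `low_demand_sil` is hypothetical.

```python
def low_demand_sil(pfd: float) -> int:
    """Map a target PFD to a low-demand SIL band (assumed IEC 61508 bands).

    Returns 0 when the target is less onerous than SIL 1.
    """
    if not 0 < pfd < 1:
        raise ValueError("PFD must lie between 0 and 1")
    bands = [(1e-1, 0), (1e-2, 1), (1e-3, 2), (1e-4, 3), (1e-5, 4)]
    for lower_bound, sil in bands:
        if pfd >= lower_bound:
            return sil
    raise ValueError("target lies beyond SIL 4")


A = 1e-5   # maximum tolerable risk, pa
B = 1e-2   # fraction of hazardous events leading to fatality
D = 2e-1   # predicted failure rate of the unprotected process, pa
C = A / B  # maximum tolerable failure rate for the hazardous event, pa
E = C / D  # target PFD: a rate divided by a rate, so dimensionless
print(f"C = {C:.0e} pa, E = {E:.0e}, SIL {low_demand_sil(E)}")
```

The printed line reads `C = 1e-03 pa, E = 5e-03, SIL 2`, matching the worked figures.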

Simple example (high demand)

Now consider an example where a failure in a domestic appliance leads to overheating and subsequent fire. Assume, again, that the target risk of fatality is said to be 10⁻⁵ pa. Assume that a study suggests that 1 in 400 incidents leads to fatality.
It follows that the target maximum tolerable failure rate for the hazardous event can be calculated as 10⁻⁵ × 400 = 4 × 10⁻³ pa (i.e., 1/250 years). This is 4.6 × 10⁻⁷ per hour when expressed in units of “per hour” for the purpose of Table 1.1.
Consulting the middle column of Table 1.1, SIL 2 is applicable. This is an example of a high-demand safety-related system in that it is “at risk” continuously. Note, also, that the target in the above paragraph has the dimension of rate by virtue of multiplying a rate by a dimensionless number. Again, this is consistent with the middle column of Table 1.1.
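Similarly, the high-demand example can be sketched in Python. The annual rate is converted to a per-hour rate, assuming 8,760 hours per year (which reproduces the 4.6 × 10⁻⁷ figure), and compared with the IEC 61508 high-demand/continuous bands (SIL 1 from 10⁻⁶ up to 10⁻⁵ per hour, through to SIL 4 from 10⁻⁹ up to 10⁻⁸ per hour). The function name `high_demand_sil` is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours


def high_demand_sil(rate_per_hour: float) -> int:
    """Map a target failure rate (per hour) to a high-demand SIL band.

    Returns 0 when the target is less onerous than SIL 1.
    """
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    bands = [(1e-5, 0), (1e-6, 1), (1e-7, 2), (1e-8, 3), (1e-9, 4)]
    for lower_bound, sil in bands:
        if rate_per_hour >= lower_bound:
            return sil
    raise ValueError("target lies beyond SIL 4")


target_pa = 1e-5 * 400  # 1 in 400 incidents leads to fatality
per_hour = target_pa / HOURS_PER_YEAR
print(f"{target_pa:.0e} pa = {per_hour:.1e} per hour: SIL {high_demand_sil(per_hour)}")
```

Here the target keeps the dimension of a rate throughout, which is why the middle (per hour) column applies rather than the PFD column.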
It is worth noting that for a low-demand system the Standard, in general, is being applied to an “add-on” safety system which is separate from the normal control of the EUC (i.e., plant). On the other hand for a continuous system the Standard, in general, is being applied to the actual control element because its failure will lead directly to the potential hazard even though the control element may require additional features to meet the required integrity. Remember (as mentioned in Chapter 1) that a safety-related system with a demand rate of greater than one per annum should be treated as “high demand.”

More complex example

In the fault tree (Figure 2.1), gate G1 describes the causes of some hazardous event. It would be quantified using the rate parameter. Dividing the target maximum tolerable failure rate associated with the top gate (GTOP) by the rate for gate G1 provides a target PFD (probability of failure on demand) for the protection.
Figure 2.1 Fault tree.
Independent levels of protection are then modeled as shown by gates G21 and G22 in Figure 2.1. It is important to remember that the use of an AND gate (e.g., gate G2) implies that the events below that gate are totally independent of each other.
A greater number of levels of protection (i.e., gates below G2) leads to larger PFDs being allocated to each and, thus, lower integrity requirements apply to each.
A maximum tolerable failure rate of 5.3 × 10⁻⁴ pa is taken as an example. Assume that the frequency of causes (i.e., gate G1) is 10⁻¹ pa. Thus the target PFD associated with gate G2 becomes:

$$\frac{5.3 \times 10^{-4}\ \text{pa}}{10^{-1}\ \text{pa}} = 5.3 \times 10^{-3}$$


(Note that the result is dimensionally correct, i.e., a rate/rate becomes a PFD.)
A common mistake is to describe the scenario as “a SIL 2 safety system.” This would ONLY be the case if the mitigation were to be a single element and not decomposed into separate independent layers.
In Figure 2.1 there are two levels of protection for which the product of the two PFDs needs to be less than 5.3 × 10⁻³.
Depending on the equipment in question this could involve a number of possibilities. Examples are shown in Table 2.5, which assume independent levels of protection.

Table 2.5

Possible SIL outcomes.

| | Level 1 PFD | Level 1 SIL | Level 2 PFD | Level 2 SIL |
|---|---|---|---|---|
| Option 1 | 2 × 10⁻¹ | <1 | 2.65 × 10⁻² | 1 |
| Option 2 | 7.3 × 10⁻² | 1 | 7.3 × 10⁻² | 1 |
| Option 3 | 7 × 10⁻¹ | <1 | 7.57 × 10⁻³ | 2 |


As can be seen, the SIL is inferred only once the PFD associated with each level of protection has been assigned/assessed.
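The way the options in Table 2.5 are built up can be sketched in Python: choose a PFD for one level of protection, divide the gate G2 target by it to obtain the PFD still required of the other level, and only then infer each SIL. This assumes the two levels are fully independent (so their PFDs multiply) and the IEC 61508 low-demand bands; the function names are hypothetical. Computed this way, Option 2 needs about 7.26 × 10⁻² for the second level, which Table 2.5 rounds to 7.3 × 10⁻².

```python
def low_demand_sil(pfd: float) -> int:
    """Assumed IEC 61508 low-demand band for a PFD; 0 means less than SIL 1."""
    if not 0 < pfd < 1:
        raise ValueError("PFD must lie between 0 and 1")
    for lower_bound, sil in [(1e-1, 0), (1e-2, 1), (1e-3, 2), (1e-4, 3), (1e-5, 4)]:
        if pfd >= lower_bound:
            return sil
    raise ValueError("target lies beyond SIL 4")


def sil_label(sil: int) -> str:
    return "<1" if sil == 0 else str(sil)


target_pfd = 5.3e-4 / 1e-1  # GTOP target rate divided by the G1 cause rate
results = []
for level1 in (2e-1, 7.3e-2, 7e-1):  # Level 1 PFDs from Table 2.5
    level2 = target_pfd / level1     # PFD still required of Level 2
    results.append((level1, low_demand_sil(level1), level2, low_demand_sil(level2)))
    print(f"Level 1 PFD {level1:.2g} (SIL {sil_label(low_demand_sil(level1))}), "
          f"Level 2 PFD {level2:.3g} (SIL {sil_label(low_demand_sil(level2))})")
```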

(d). Exercises

Now try the following exercises (answers in Appendix 5), which involve establishing SIL targets:
Exercise 1:
Assume a Maximum Tolerable Risk target of 10⁻⁵ pa (public fatality).
Assume one in two incidents leads to an explosion.
Assume one in five explosions leads to a fatality.
Assume that a fault tree indicates that the process will suffer a failure rate of 0.05 pa.
It is proposed to implement an add-on safety system involving instrumentation and shut-down measures.
Which type of SIL (high/low) is indicated and why?
What is the target and what SIL is inferred?
Exercise 2:
2.1
Assume a Maximum Tolerable Risk fatality target of 10⁻⁵ pa.
Assume that there are nine other similar toxic spill hazards to be assessed from the plant which will threaten the same group of people at the same time.
Assume that toxic spillage causes fatality 1 in 10 times.
Assume that a fault tree indicates that each of the processes will suffer an incident once in 50 years.
It is proposed to implement an add-on safety system with instrumentation and shut-down measures.
Which type of SIL is indicated and why?
What is the target and what SIL is inferred?
2.2
If additional fire-fighting equipment were made available to reduce the likelihood of a fatality from 1 in 10 to 1 in 30, what effect, if any, would there be on the target SIL?
Exercises 1 and 2 involved the low-demand table in which the risk criteria were expressed as a PFD. Now try Exercise 3.
Exercise 3:
Target Maximum Tolerable Risk = 10⁻⁵ pa.
Assume that 1 in 200 failures, whereby an interruptible gas meter spuriously closes and then opens, leads to fatality.
Which type of SIL is indicated and why?
What is the target and what SIL is inferred?
A point worth pondering is that when a high-demand SR system fails, continued use is usually impossible, whereas, for the low demand system, limited operation may still be feasible after the risk reduction system has failed, albeit with additional care.

2.1.2. Layer of Protection Analysis

A methodology, specifically mentioned in Part 3 of IEC 61511 (Annex F), is known as Layer of Protection Analysis (LOPA). LOPA provides a structured risk analysis that can follow on from a qualitative technique such as HAZOP.
ESC's SILComp® software provides a step-by-step interactive guide through the LOPA process and allows importing from HAZOP worksheets. The use of software packages such as SILComp® can help reduce project man-hours, ensure consistency in approach and methodology, and give confidence that the workshop is performed in accordance with IEC 61508/61511.
In general, formalized LOPA procedures tend to use order-of-magnitude estimates and are thus referred to as semi-quantitative methods. They are also tailored to low-demand safety functions.
Nevertheless, many practitioners, despite using the term LOPA, actually carry out the analysis to the level of refinement described in Section 2.1.1. This is commonly referred to as a quantitative approach.
LOPA estimates the probability/frequency of the undesired consequence of failure by multiplying the frequency of initiating events by the product of the probabilities of failure for the applicable protection layers. The severity of the consequences and the likelihood of occurrence are then assigned a probability (often by reference to a standard table usually specified in the user's procedure).
The result is called a “mitigated consequence frequency” and is often compared with a company's tolerable risk criteria (e.g., personnel, environment, asset loss). As a result, any requirement for additional risk reduction is identified. The output of the LOPA is the target PFD for the safety instrumented function.
For the LOPA to be valid there must be independence between initiating events and layers of protection, and between the layers of protection themselves. Where there are common causes, either a dependent layer should not be credited at all or reduced credit (a higher PFD) should be used.
It should also be noted that the Maximum Tolerable Risk frequencies used are usually for ALL hazards. Thus where personnel are exposed to multiple simultaneous hazards, the Maximum Tolerable Risk frequency needs to be divided by the number of hazards.
The input information required for a LOPA includes:
• Process plant and equipment design specifications
• Impact event descriptions and consequence of failure (assuming no protection)
• Severity level category (defined in the company's procedure)
• All potential demands (i.e., initiating causes) on the function; and corresponding initiation likelihood
• Vulnerability (e.g., probability of a leakage leading to ignition)
• Description of the safety instrumented protection function (i.e., layer of protection)
• Independent protection layers (e.g., mechanical devices, physical bunds).
LOPA worksheets are then prepared as shown in the example given in Section 13.6 of Chapter 13 and are not unlike Table 2.4 and its associated examples. Elements in the worksheet include:
Consequence: describes the consequence of the hazard corresponding to the descriptions given in the user's procedure.
Maximum Tolerable Risk (/year): as specified in the user's procedure.
Initiating Cause: Lists the identified causes of the hazard.
Initiating Likelihood (/year): quantifies the expected rate of occurrence of the initiating cause. This rate is based on the experience of the attendees and any historical information available.
Vulnerability: this represents the probability of being affected by the hazard once it has been initiated.
Independent protection layers (IPLs): the level of protection provided by each IPL is quantified by the probability that it will fail to perform its function on demand. The smaller the value of the PFD, the larger the risk reduction factor that is applied to the calculated initiating likelihood. Hence, where no IPL is claimed, a “1” is inserted into the LOPA worksheet.
The outputs from a LOPA include:
• Intermediate event likelihoods (assuming no additional instrumented protection);
• Additional protection instrumentation requirements (if any);
• The mitigated event likelihood.
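A single LOPA worksheet row can be sketched in Python as below. This is a simplified, hypothetical illustration rather than any particular tool's or procedure's method: it multiplies the initiating likelihood by the vulnerability and the PFDs of any credited IPLs to give the intermediate event likelihood, then divides the (per-hazard) Maximum Tolerable Risk by that to give the target PFD for the safety instrumented function (SIF). It assumes full independence between all layers. The example reuses the low-demand figures from Section 2.1.1(c), treating the 10⁻² fraction of events leading to fatality as the vulnerability; the extra IPL PFD of 0.1 is purely hypothetical.

```python
from math import prod


def lopa_row(max_tolerable_pa, initiating_pa, vulnerability=1.0,
             ipl_pfds=(), n_hazards=1):
    """Return (intermediate event likelihood in pa, target PFD for the SIF).

    Hypothetical helper. An empty ipl_pfds means no IPL is claimed:
    prod(()) == 1, the same as entering "1" in the worksheet. A target PFD
    of 1 or more would indicate that no additional instrumented protection
    is required.
    """
    tolerable = max_tolerable_pa / n_hazards  # share across simultaneous hazards
    intermediate = initiating_pa * vulnerability * prod(ipl_pfds)
    return intermediate, tolerable / intermediate


def mitigated_likelihood(intermediate_pa, sif_pfd):
    """Event likelihood once a SIF with the given PFD is added."""
    return intermediate_pa * sif_pfd


inter, pfd = lopa_row(1e-5, initiating_pa=2e-1, vulnerability=1e-2)
print(f"Intermediate likelihood {inter:.0e} pa, target SIF PFD {pfd:.0e}")

# Hypothetical: crediting one further IPL (PFD 0.1) relaxes the SIF target.
_, pfd_with_ipl = lopa_row(1e-5, 2e-1, 1e-2, ipl_pfds=(0.1,))
print(f"With extra IPL: target SIF PFD {pfd_with_ipl:.0e}")
print(f"Mitigated likelihood: {mitigated_likelihood(inter, pfd):.0e} pa")
```

The first line reproduces the 5 × 10⁻³ target of the earlier low-demand example; the `n_hazards` argument reflects the earlier point that the Maximum Tolerable Risk must be divided where personnel face several simultaneous hazards.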
One of the authors (DJS) has some reservations about the LOPA approach, particularly when used by nonexperts. These can be summarized as follows:
• Overlooking common cause failure (bridging two layers of protection).
• Estimation of the demand rate on a layer of protection (so-called “loop”) without full modeling of the combination of causative events.
• Failing to apportion risk targets between layers of protection as a result of addressing only one at a time.
• Lapsing into order of magnitude estimates (a failing common to the risk graph approach).
• LOPA can lead to repetition in order to address more than one safety function per hazard where a slavish “bottom-up” approach addresses each instrument in turn.
• Encourages semiskilled participation by virtue of appearing to be a simpler approach.

2.1.3. The Risk Graph Approach

In general the methods described in Sections 2.1.1 and 2.1.2 should be followed. However, the Standard acknowledges that a fully quantified approach to setting SIL targets is not always possible and that an alternative approach might sometimes be appropriate. This avoids quantifying the Maximum Tolerable Risk of fatality and uses semiquantitative judgments. Figure 2.2 gives an example of a risk graph.
Figure 2.2 Example risk graph.
The example shown is somewhat more complete than many in use. It has the additional granularity of offering three (rather than two) branches in some places and attempts to combine demand rate with exposure. Any such approach requires a detailed description of the decision points in the algorithm in order to establish some conformity of use. Table 2.6 shows a small part of that process.
 

Table 2.6

Key to Figure 2.2 (part only).

Risk graphs should only be used for general guidance in view of the wide risk ranges of the parameters in the tables. Successive cascading decisions involving only “order of magnitude” choices carry the potential for gross inaccuracy. Figure 2.2 offers greater granularity than simple risk graphs do; nevertheless, this does not eliminate the problem.
The risk graph does not readily admit multiple levels of protection; this has been dealt with in earlier sections. Furthermore, due to the nature of the rule-based algorithm, which culminates in the request for a demand rate, the risk graph is only applicable to low-demand SIL targets. It should be used only as a screening tool when addressing large numbers of safety functions, and any target of SIL 2 or greater should then be subject to the quantified approach.

2.1.4. Safety Functions

IMPORTANT: It should be clear from the foregoing sections that SILs are ONLY appropriate to specifically defined safety functions. A safety function might consist of a flow transmitter, logic element, and a solenoid valve to protect against high flow. The flow transmitter, on its own, does not have a SIL and to suggest such is nearly meaningless. Its target SIL may vary from one application to another. The only way in which it can claim any SIL status in its own right is with respect to safe failure fraction and to the life-cycle activities during its design, and this will be dealt with in Chapters 3 and 4.

2.1.5. “Not Safety-Related”

It may well be the case that the SIL assessment indicates a probability or rate of failure less than that indicated for SIL 1. In this case the system may be described as “not safety-related” in the sense of the Standard. However, since the qualitative requirements of SIL 1 are little more than established engineering practice they should be regarded as a “good practice” target.
The following example shows how a piece of control equipment might be justified to be “not safety-related.” Assume that this programmable distributed control system (say a DCS for a process plant) causes various process shutdown functions to occur. In addition, let there be a hardwired emergency shutdown (presumably safety-related) system which can also independently bring about these shutdown conditions.
Assume that the target Maximum Tolerable Risk leads us to calculate that the failure rate for the DCS/ESD combined should be better than 10⁻³ pa. Assessment of the emergency shutdown system shows that it will fail with a PFD of 5 × 10⁻³. Thus, the target maximum tolerable failure rate of the DCS becomes 10⁻³ pa/(5 × 10⁻³) = 2 × 10⁻¹ pa. This being less onerous than the target for SIL 1, the target for the DCS is less than SIL 1. This is ambiguously referred to as “not safety-related.” An alternative term used in some guidance documents is “no special safety requirement.”
We would therefore say that the DCS is not safety-related. If, on the other hand, the target was only met by a combination of the DCS and ESD, then each might be safety-related with a SIL appropriate to its target PFD or failure rate. Paragraph 7.5.2.5 of Part 1 states that the EUC must be <SIL 1 or else it must be treated as safety-related.
For less than SIL 1 targets, the term SIL 0 (although not used in the Standard) is in common use and is considered appropriate.
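The DCS argument can be sketched in Python: the combined target rate is divided by the ESD's PFD, converted to a per-hour rate (assuming 8,760 hours per year), and compared with the high-demand bands. The function name is hypothetical, and the bands are assumed to be the IEC 61508 high-demand/continuous ones, under which anything above 10⁻⁵ per hour is less onerous than SIL 1.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours


def high_demand_sil(rate_per_hour: float) -> int:
    """Assumed IEC 61508 high-demand band for a rate per hour;
    0 means less than SIL 1 (informally "SIL 0")."""
    for lower_bound, sil in [(1e-5, 0), (1e-6, 1), (1e-7, 2), (1e-8, 3), (1e-9, 4)]:
        if rate_per_hour >= lower_bound:
            return sil
    raise ValueError("target lies beyond SIL 4")


combined_target_pa = 1e-3  # DCS/ESD combination must be better than this
esd_pfd = 5e-3             # assessed PFD of the emergency shutdown system
dcs_target_pa = combined_target_pa / esd_pfd
sil = high_demand_sil(dcs_target_pa / HOURS_PER_YEAR)
label = "SIL 0 (not safety-related)" if sil == 0 else f"SIL {sil}"
print(f"DCS target {dcs_target_pa:.0e} pa: {label}")
```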

2.1.6. SIL 4

There is a considerable body of opinion that SIL 4 safety functions should be avoided (as achieving it requires very significant levels of design effort and analysis) and that additional levels of risk reduction need to be introduced such that lower SIL targets are required for each element of the system.
In any case, a system with a SIL 4 target would imply a scenario with a high probability of the hazard leading to fatality and only one level of control (i.e., no separate mitigation). It is hard to imagine such a scenario as being acceptable.

2.1.7. Environment and Loss of Production

So far the implication has been that safety integrity is in respect of failures leading to death or injury. IEC 61508 (and some other guidance documents) also refers to severe environmental damage. Furthermore, although not directly relevant here, the SIL targeting approach can also be applied to loss of production. Figure 2.3 is one example of some suggested target criteria.
Figure 2.3 Environmental risk targets.

2.1.8. Malevolence and Misuse

Paragraph 7.4.2.3 of Part 1 of the Standard

The 2010 version of IEC 61508 draws attention to the need to address all foreseeable causes of a hazard. Thus human factors (already commonly addressed) should be extended to include vandalism, deliberate misuse, criminal interference, and so on. The frequency of such events can be assessed (anecdotally or from records) enabling them to be included in fault tree models.

2.2. “As Low as Reasonably Practicable”

Having established a SIL target it is insufficient merely to assess that the design will meet the Maximum Tolerable Risk target. It is necessary to establish whether further improvements are justified and thus the principle of ALARP (as low as reasonably practicable) is called for as “good practice.” In the UK this is also arguably necessary in order to meet safety legislation (“all that is reasonably practicable” is called for in the Health & Safety at Work Act 1974).
Figure 2.4 shows the so-called ALARP triangle which also makes use of the idea of a Maximum Tolerable Risk.
Figure 2.4 ALARP triangle.
In this context “acceptable” is generally taken to mean that we accept the probability of fatality as being reasonably low, having regard to the circumstances, and would not usually seek to expend more resources in reducing it further.
“Tolerable,” on the other hand, implies that whilst we are prepared to live with the particular risk level we would continue to review its causes and the defenses we might take with a view to reducing it further. Cost comes into the picture in that any potential reduction in risk would be compared with the cost needed to achieve it.
“Unacceptable” means that we would not normally tolerate that level of risk and would not participate in the activity in question nor permit others to operate a process that exhibited it except, perhaps, in exceptional circumstances.
The principle of ALARP describes the way in which risk is treated legally and by the HSE in the UK, and also applied in some other countries. The concept is that all reasonable measures will be taken in respect of risks which lie in the tolerable (ALARP) zone to reduce them further until the cost of further risk reduction is grossly disproportionate to the benefit.
It is at this point that the concept of “cost per life saved (CPL)” arises. Industries and organizations are reluctant to state-specific levels of “CPL” which they would regard as being disproportionate to a reduction in risk. However, criteria in the range £1,000,000 to £15,000,000 are not infrequently quoted.
The HSE recommend the use of a gross disproportionality factor. The CPL is multiplied by a gross disproportionality factor depending upon how close the predicted risk is to the target. For predicted risks, just approaching the Maximum Tolerable Risk, a factor of 10 is applied. This falls (on a logarithmic scale) as the risk moves towards the Broadly Acceptable region. This will be illustrated in the example which follows.
Perception of risk is certainly influenced by the circumstances. A far higher risk is tolerated from voluntary activities than from involuntary ones (people feel that they are more in control of the situation on roads than on a railway). This explains the use of different targets for employees (voluntary) and the public (involuntary) in Tables 2.1–2.3.
A typical ALARP calculation might be as follows:
A £1,000,000 cost per life saved target is used in a particular industry.
A Maximum Tolerable Risk target of 10⁻⁴ pa has been set for a particular hazard which is likely to cause two fatalities.
The proposed system has been assessed and a predicted risk of 8 × 10⁻⁵ pa obtained. Given that the negligible risk is taken as 10⁻⁶ pa, the predicted risk lies in the tolerable region and the application of ALARP is required.
For a cost of £3,000, additional instrumentation and redundancy will reduce the risk to just above the negligible region (say 2 × 10⁻⁶ pa).
The plant life is 30 years.
The predicted risk of 8 × 10⁻⁵ pa leads to a gross disproportionality factor of 9.3, as shown in Figure 2.5, which in practice could be obtained using a spreadsheet that carries out the log-scale calculation mentioned above. The cost per life saved criterion thus becomes 9.3 × £1,000,000 = £9,300,000.
The “cost per life saved” in this example is given by the cost of the proposal divided by the number of lives saved over the plant life, as follows:

$$\frac{\pounds 3{,}000}{\left(8\times10^{-5} - 2\times10^{-6}\right) \times 2 \times 30} \approx \pounds 640{,}000$$

Figure 2.5 Calculation of GDF.
Since this is less than the £9,300,000 cost per life saved criterion (which has been adjusted by the GDF), the proposal should be adopted. It should be noted that all the financial benefits of the proposed risk reduction measures should be included in the cost–benefit calculation (e.g., saving plant damage, loss of production, business interruption, etc.). Furthermore, following “good practice” is also important, although not in itself sufficient to demonstrate ALARP. However, cost–benefit arguments should not be used to justify circumventing established good practice.
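The cost per life saved test just applied can be sketched in Python as follows. The function names are illustrative; the GDF of 9.3 is taken from Figure 2.5 as an input rather than computed, and, as in the example, only the reduction in fatal-event frequency is counted (the other financial benefits mentioned above are not included).

```python
def cost_per_life_saved(cost, risk_before, risk_after, fatalities, years):
    """Cost of a measure divided by the lives saved over the plant life.

    risk_before and risk_after are frequencies of the fatal event (pa);
    fatalities is the number of deaths expected per event.
    """
    lives_saved = (risk_before - risk_after) * fatalities * years
    return cost / lives_saved

def alarp_adopt(cost, risk_before, risk_after, fatalities, years, cpl_target, gdf):
    """True if the measure's CPL is below the GDF-adjusted criterion."""
    cpl = cost_per_life_saved(cost, risk_before, risk_after, fatalities, years)
    return cpl < cpl_target * gdf

# Worked example: £3,000 reduces 8e-5 pa to 2e-6 pa; 2 fatalities; 30 years
cpl = cost_per_life_saved(3_000, 8e-5, 2e-6, fatalities=2, years=30)
print(f"£{cpl:,.0f}")  # £641,026 (quoted above as about £640,000)
print(alarp_adopt(3_000, 8e-5, 2e-6, 2, 30, cpl_target=1_000_000, gdf=9.3))  # True
```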
Exercise 4:
A £2,000,000 cost per life saved target is used in a particular industry.
A Maximum Tolerable Risk target of 10⁻⁵ pa has been set for a particular hazard which is likely to cause three fatalities. The Broadly Acceptable Risk is 10⁻⁶ pa.
The proposed system has been assessed and a predicted risk of 8 × 10⁻⁶ pa obtained.
How much could justifiably be spent on additional instrumentation and redundancy to reduce the risk from 8 × 10⁻⁶ pa to 2 × 10⁻⁶ pa (just above the negligible region)?
The plant life is 25 years.
(Note: the GDF for this example would be calculated as 8.6.)
ESC SILComp® and Technis LOPAPLUS software both calculate the justified cost of further risk reduction (as part of an ALARP demonstration) automatically, based on data from the LOPA study.

2.3. Functional Safety Management and Competence

2.3.1. Functional Safety Capability Assessment

In claiming conformance (irrespective of the target SIL) it is necessary to show that the management of the design, operations and maintenance activities, and the system implementation is itself appropriate and that there is adequate competence for carrying out each task.
This involves two basic types of assessment. The first is the assessment of management procedures (similar to but more rigorous than an ISO 9001 audit). Appendix 1 of this book provides a Functional Safety Capability template procedure which should be adequate as an addition to an ISO 9001 quality management system. The second is an assessment of the implementation of these procedures. Thus, the life-cycle activities described in Chapters 2, 3, and 4 would be audited, for one or more projects, to establish that the procedures are being put into practice.
Appendix 2 contains a checklist schedule to assist in the rigor of assessment, particularly for self-assessment (see also Section 7.3 of Chapter 7).

2.3.2. Competency

Part 1 of IEC 61508 (Paragraphs 6.2.13–15) calls for adequate competency. It is open-ended in that it only requires the training, knowledge, experience, and qualifications to be “relevant.” Factors listed for consideration are:
• Responsibilities and level of supervision
• Link between severity of consequences and degree of competence
• Link between target SIL and degree of competence
• Link between design novelty and rigor of competence
• Relevance of previous experience
• Engineering application knowledge
• Technology knowledge
• Safety engineering knowledge
• Legal/regulatory knowledge
• Relevance of qualifications
• The need for training to be documented.

(a). IET/BCS “Competency guidelines for safety-related systems practitioners”

This was an early guidance document in this area. It listed 12 safety-related job functions, each broken down into specific tasks. Guidance is then provided on setting up a review process and on assessing capability (having regard to applications relevance) against the interpretations given in the document. The 12 functions are:
Corporate Functional Safety Management: This concerns the competency required to develop and administer functional safety within an organization.
Project Safety Assurance Management: This extends the previous task into implementing the functional safety requirements in a project.
Safety-Related System Maintenance: This involves maintaining a system and controlling modifications so as to maintain the safety-integrity targets.
Safety-Related System Procurement: This covers the technical aspects of controlling procurement and subcontracts (not just administration).
Independent Safety Assessment: This is supervising and/or carrying out the assessments.
Safety Hazard and Risk Analysis: That is to say HAZOP (HAZard and OPerability study), LOPA, risk analysis, prediction, etc.
Safety Requirements Specification: Being able to specify all the safety requirements for a system.
Safety Validation: Defining a test/validation plan and executing and assessing the results of tests.
Safety-Related System Architectural Design: Being able to partition requirements into subsystems so that the overall system meets the safety targets.
Safety-Related System Hardware Realization: Specifying hardware and its tests.
Safety-Related System Software Realization: Specifying software, developing code, and testing the software.
Human Factors Safety Engineering: Assessing human error and engineering the interrelationships of the design with the human factors (Section 5.4 of Chapter 5).
The three levels of competence described in the document are:
The Supervised Practitioner, who can carry out one of the above jobs but whose work requires review.
The Practitioner, who can work unsupervised and can manage and check the work of a Supervised Practitioner.
The Expert, who will be keeping abreast of the state of the art and will be able to tackle novel scenarios.
This IET/BCS document provided a solid basis for the development of competence. It probably goes beyond what is actually called for in IEC 61508. Due to its complexity it is generally difficult to put into practice in full and therefore might discourage some people from starting a scheme. Hence a simpler approach might be more practical. However, this is a steadily developing field and the requirements of “good practice” are moving forward.

(b). HSE document (2007) “Managing competence for safety-related systems”

More recently, this document was produced in cooperation with the IET and the BCS. In outline its structure is:
Phase One: Plan
• Define purpose and scope
Phase Two: Design
• Competence criteria
• Processes and methods
Phase Three: Operate
• Select and recruit
• Assess competence
• Develop competence
• Assign responsibilities
• Monitor
• Deal with failure
• Manage assessors' and managers' competence
• Manage supplier competence
• Manage information
• Manage change
Phase Four: Audit and Review
• Audit
• Review

(c). Annex D of “Guide to the application of IEC 61511”

This is a fairly succinct summary of a competency management system which lists competency criteria for each of the life-cycle phases described in Section 1.4 of Chapter 1 of this book.

(d). Competency register

Experience and training should be logged so that individuals can be assessed for their suitability to carry out tasks as defined in the company's procedure (Appendix 1 of this book).
Figure 2.6 shows a typical format for an Assessment Document for each person. These would form the competency register within the organization.
Figure 2.6 Competency register entry.

2.3.3. Independence of the Assessment

This is addressed in Part 1, Paragraph 8.2.18, which recommends the level of independence to be applied when carrying out assessments. According to the target SIL, this can be summarized as:
SIL    Assessed by
4      Independent organization
3      Independent department
2      Independent person
1      Independent person
For SILs 2 and 3 add one level of independence if there is lack of experience, unusual complexity, or novelty of design. Clearly, these terms are open to interpretation, and words such as “department” and “organization” will depend on the size and type of company. For example, in a large multiproject design company there might be a separate safety assessment department sufficient to meet the requirements of SIL 3. A smaller single-project company might, on the other hand, need to engage an independent organization or consultant in order to meet the SIL 3 requirement.
The level of independence to be applied when establishing SIL targets is recommended, according to consequence, as:
Consequence                    Assessed by
Multiple fatality (say, >5)    Independent organization
Multiple fatality              Independent department
Single fatality                Independent person
Injury                         Independent person
For scenarios involving fatality, add one level of independence if there is lack of experience, unusual complexity, or novelty of design. Clearly, these terms are open to interpretation, and words such as “department” and “organization” will depend on the size and type of company.
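The two independence recommendations, including the rule for adding one level, can be captured as a simple lookup. This is a minimal Python sketch; the dictionary keys, consequence labels, and function names are hypothetical, and capping at “Independent organization” when a level is added to the top category is an assumption, as the text does not cover that case.

```python
LEVELS = ["Independent person", "Independent department", "Independent organization"]

# Baseline level (index into LEVELS) for carrying out assessments, by target SIL
ASSESSMENT_BY_SIL = {1: 0, 2: 0, 3: 1, 4: 2}

# Baseline level for establishing SIL targets, by consequence (labels are illustrative)
TARGETING_BY_CONSEQUENCE = {
    "injury": 0,
    "single fatality": 0,
    "multiple fatality": 1,
    "multiple fatality (>5)": 2,
}

def _raise_one_level(level):
    # Assumed cap: the text does not say what lies above "Independent organization"
    return min(level + 1, len(LEVELS) - 1)

def assessment_independence(sil, novel_or_complex=False):
    """novel_or_complex: lack of experience, unusual complexity, or novelty of design."""
    level = ASSESSMENT_BY_SIL[sil]
    if novel_or_complex and sil in (2, 3):  # extra level applies to SILs 2 and 3 only
        level = _raise_one_level(level)
    return LEVELS[level]

def target_setting_independence(consequence, novel_or_complex=False):
    level = TARGETING_BY_CONSEQUENCE[consequence]
    if novel_or_complex and "fatality" in consequence:  # any scenario involving fatality
        level = _raise_one_level(level)
    return LEVELS[level]

print(assessment_independence(3, novel_or_complex=True))                   # Independent organization
print(target_setting_independence("single fatality", novel_or_complex=True))  # Independent department
```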

2.3.4. Hierarchy of Documents

This will vary according to the nature of the product or project and the life-cycle activities involved. The following brief outline provides an overview from which some (or all) of the relevant documents can be taken.
Annex A of Part 1 addresses these lists. The following is an interpretation of how they might be implemented. It should be stressed that document titles (in themselves) need not be rigidly adhered to and that some might be incorporated into other existing documents. An example is the “safety requirements,” which might in some cases sit within the “functional specification,” provided that they are clearly identified as a coherent section.
• Functional safety requirements
• Functional safety plan (See Appendix 7 of this book)
• Validation plan (and report)
• Functional safety design specification (Hardware)
• Functional safety design specification (Software)
• Review plans (and reports)
• Test plans (and reports)
• Test strategy and procedures
• Safety Manual (maybe part of Users' Manual).
These are dealt with, as they occur, in Chapters 3 and 4.

2.3.5. Conformance Demonstration Template

In order to justify adequate functional safety management to satisfy Part 1 of the standard, it is necessary to provide a documented assessment.
The following Conformance Demonstration Template is suggested as a possible format.

IEC 61508 Part 1

Under “Evidence” enter a reference to the project document (e.g., spec, test report, review, calculation) which satisfies that requirement. Under “Feature” read the text in conjunction with the fuller text in this chapter.
Feature    Evidence
Adequate functional safety capability is demonstrated by the organization. To include a top level policy, clear safety life cycle describing the activities undertaken, procedures, functional safety audits and arrangements for independent assessment.
FS management system regularly reviewed and audited.
An adequate competency register that maps to projects and the requirement for named individuals for each FS role. Register to describe training and application area experience of individuals. Safety-related tasks to be defined. Review and training to be covered.
Evidence that contract and project reviews are mandatory to establish functional safety requirements.
The need for a clear documentation hierarchy describing the relationship of Q&S plan, functional spec, design docs, review strategy, integration, test and validation plans etc.
Existence of hardware and software design standards and defined hardware and software life-cycle models.
The recording and follow-up of hazardous incidents. Adequate corrective action.
Hazardous incidents addressed and handled.
Operations and maintenance adequately addressed where relevant.
Modifications and impact analysis addressed and appropriate change documentation.
Adequate document and configuration control.
FS assessment carried out.

FS, functional safety.

It is anticipated that the foregoing items will be adequately dealt with by the organization's quality management systems and by the additional functional safety procedure exemplified in Appendix 1 of this book.

2.4. Societal Risk

As mentioned in Section 2.1.1 there is the separate issue of Societal Risk. Here, instead of addressing the risk to an individual, we are concerned with the tolerability of multiple fatality events (irrespective of the individual identities of the victims). One proceeds as follows:

2.4.1. Assess the Number of Potential Fatalities

This may not be a single number at all times of the day. The following example shows how a weighted average can be arrived at when overlapping groups of people are at risk over different periods of time:
For 4 hrs per day, 60 persons are at risk
For 17 hrs per week, 10 persons are at risk
For 24 hrs per day, 1 person is at risk
Weighted average of exposure is:
$$\frac{4}{24}\times 60 + \frac{17}{168}\times 10 + \frac{24}{24}\times 1 \approx 12\ \text{fatalities}$$
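The weighted average above is easy to script when there are more overlapping groups. In this minimal Python sketch each group is a (hours at risk, length of the period in hours, persons) tuple; that representation and the function name are illustrative choices, not the book's.

```python
def weighted_fatalities(groups):
    """Sum of (fraction of time at risk) x (persons at risk) over all groups."""
    return sum(hours / period * persons for hours, period, persons in groups)

groups = [
    (4, 24, 60),    # 4 hrs per day, 60 persons at risk
    (17, 168, 10),  # 17 hrs per week, 10 persons at risk
    (24, 24, 1),    # 24 hrs per day, 1 person at risk
]
print(round(weighted_fatalities(groups), 2))  # 12.01
```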

2.4.2. It Is Now Necessary to Address the Maximum Tolerable Risk

Unlike the Individual Risk criteria (Table 2.2), which address the probability of harm to an individual, the Societal Risk criterion becomes the frequency of a fatal event (irrespective of the individuals concerned). As already mentioned in Section 2.1.1, individual risk addresses a specific person(s), whereas Societal Risk addresses the risk to a potentially changing group irrespective of their identity (e.g., the continuously changing occupants of a rail tunnel). Hence the criteria are expressed as frequencies for the event rather than as risk to an individual. Figure 2.7 suggests criteria based on the number of potential fatalities. It has no specific provenance but can be related to HSE document R2P2 by virtue of a 2 × 10⁻⁴ pa maximum target for 50 fatalities. Thus, for the 12-fatality scenario above, a maximum tolerable failure rate for the event of about 10⁻³ pa is suggested.
Figure 2.7 Criteria per number of fatalities.
Although Figure 2.7 is plotted on log-log axes, the relationship can be summarized (where N is the number of potential fatalities) as:

$$\text{Maximum Tolerable Frequency (societal)} = \frac{10^{-2}}{N}\ \text{pa}$$
$$\text{Broadly Acceptable Frequency (societal)} = \frac{10^{-4}}{N}\ \text{pa}$$
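The two lines of the Figure 2.7 relationship can be evaluated for any number of potential fatalities. A minimal Python sketch, with a hypothetical function name:

```python
def societal_criteria(n_fatalities):
    """Return (maximum tolerable, broadly acceptable) event frequencies in pa,
    using the 1e-2/N and 1e-4/N relationship summarized from Figure 2.7."""
    if n_fatalities <= 0:
        raise ValueError("number of potential fatalities must be positive")
    return 1e-2 / n_fatalities, 1e-4 / n_fatalities

for n in (12, 50, 100):
    mtf, baf = societal_criteria(n)
    print(f"N={n:>3}: maximum tolerable {mtf:.1e} pa, broadly acceptable {baf:.1e} pa")
```

For N = 50 this returns the 2 × 10⁻⁴ pa R2P2 figure; for N = 12 it gives about 8.3 × 10⁻⁴ pa, which the text rounds to 10⁻³ pa.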


2.4.3. The Propagation to Fatality

The propagation to fatality of an event is calculated as for Involuntary Risk, BUT ignoring the element which addresses what proportion of the time any one person is at risk, since that has already been accounted for in the Societal Risk concept. This was illustrated earlier in Section 2.1.1 by reference to the fact that 100% of the time there is someone at risk, irrespective of the 2-min exposure of a named individual.

2.4.4. Scenarios with Both Societal and Individual Implications

This raises the question as to which approach (individual or societal) should prevail in any given scenario. Examples quoted above (a site with specific people at risk versus a pipeline to which numerous ever changing identities of persons are exposed) are fairly straightforward.
However, some scenarios might need the application of BOTH individual and societal calculations, with ALARP satisfied in both cases.

2.5. Example Involving Both Individual and Societal Risk

A pipeline passes through a tunnel which is used 24 hrs per day such that, at any time, 100 randomly selected persons are at risk from pipeline rupture. It is assessed that there would be potentially 100 fatalities, given that an incident has a 75% chance of propagating to fatality. However, there are also three specific maintenance personnel at any time, each being present for 35 hrs per week (20%). It is assessed that all three might be simultaneous fatalities, given that an incident has a 50% chance of propagating to their fatality (note that this is not necessarily the same as the 75% above, since the three maintenance personnel might be in a slightly different situation from the 100 travelers). There are no other simultaneous risks perceived. A reliability/integrity study has established a predicted frequency of pipeline breach of 5 × 10⁻⁵ pa. The pipeline will remain in situ for 25 years.

2.5.1. Individual Risk Argument

From Table 2.2, a voluntary (3 fatality) Maximum Tolerable Risk of 3 × 10⁻⁵ pa is chosen.
The Broadly Acceptable Risk is 3 × 10⁻⁷ pa.
The maximum tolerable failure rate for the pipeline is thus 3 × 10⁻⁵ pa/(50% × 20%) = 3 × 10⁻⁴ pa.
The predicted failure rate for the pipeline is 5 × 10⁻⁵ pa (from above).
Thus the predicted Individual Risk is 3 × 10⁻⁵ pa × (5 × 10⁻⁵ pa)/(3 × 10⁻⁴ pa) = 5 × 10⁻⁶ pa.
Applying the Figure 2.8 spreadsheet a GDF of 5.35 is obtained as shown below.
Figure 2.8 GDF diagram.
The cost per life saved criterion (Section 2.2), typically £2,000,000, therefore becomes £10.7 million. ALARP is tested as follows:

$$\pounds 10{,}700{,}000 = \frac{\pounds_{\text{proposed}}}{\left(5\times10^{-6} - 3\times10^{-7}\right) \times 3\ \text{fatalities} \times 25\ \text{years}}$$

Thus any expenditure within a budget of about £3,800 which might reduce the risk to the Broadly Acceptable level should be considered. If no realistic risk reduction can be obtained within this sum, it might be argued that ALARP is satisfied.
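The individual-risk argument can be scripted end to end. This Python sketch takes the GDF of 5.35 as an input read from Figure 2.8 (it does not compute it), and computes the predicted risk directly as breach frequency × propagation × exposure, which is algebraically the same as the scaling used above. The function and variable names are illustrative.

```python
def justified_budget(cpl_criterion, gdf, predicted, broadly_acceptable,
                     fatalities, years):
    """Maximum spend justified to bring the predicted risk (pa) down to the
    Broadly Acceptable level, using the GDF-adjusted cost per life saved."""
    lives_saved = (predicted - broadly_acceptable) * fatalities * years
    return cpl_criterion * gdf * lives_saved

breach_rate = 5e-5   # predicted pipeline breach frequency, pa
propagation = 0.50   # chance a breach kills the maintenance personnel
exposure = 0.20      # fraction of time each person is present (35 of 168 hrs, rounded)

# Individual risk includes the exposure term
predicted_individual = breach_rate * propagation * exposure

budget = justified_budget(2_000_000, 5.35, predicted_individual,
                          broadly_acceptable=3e-7, fatalities=3, years=25)
print(f"Predicted individual risk {predicted_individual:.1e} pa")
print(f"Justified budget about £{budget:,.0f}")
```

This gives £3,772, which the text rounds to £3,800.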

2.5.2. Societal Risk Argument

From Figure 2.7, the maximum tolerable frequency is 10⁻⁴ pa (i.e., 10⁻²/100).
The Broadly Acceptable frequency is, by the same token, 10⁻⁶ pa (i.e., 10⁻⁴/100).
The maximum tolerable failure rate for the pipeline is thus 10⁻⁴ pa/75% = 1.3 × 10⁻⁴ pa.
The predicted failure rate for the pipeline is 5 × 10⁻⁵ pa (from above).
Thus the predicted risk frequency is 10⁻⁴ pa × (5 × 10⁻⁵ pa)/(1.3 × 10⁻⁴ pa) = 3.8 × 10⁻⁵ pa.
Applying the Figure 2.9 spreadsheet a GDF of 7.13 is obtained as shown below.
Figure 2.9 GDF diagram.
The cost per life saved criterion (Section 2.2), typically £2,000,000, therefore becomes £14.3 million. ALARP is tested as follows:

$$\pounds 14{,}300{,}000 = \frac{\pounds_{\text{proposed}}}{\left(3.8\times10^{-5} - 10^{-6}\right) \times 100\ \text{fatalities} \times 25\ \text{years}}$$

Thus any expenditure within a budget of £1.32 million which might reduce the risk to the Broadly Acceptable level should be considered and a number of options could be available within that sum.
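The societal-risk argument can be sketched the same way. Here the criteria come from the Figure 2.7 relationship, the predicted frequency is breach frequency × propagation with no exposure term (as explained in Section 2.4.3), and the GDF of 7.13 is an input read from Figure 2.9. The names are illustrative, and the budget function is repeated so that the block runs on its own.

```python
def justified_budget(cpl_criterion, gdf, predicted, broadly_acceptable,
                     fatalities, years):
    """Spend justified to reduce the predicted frequency (pa) to the Broadly Acceptable level."""
    return cpl_criterion * gdf * (predicted - broadly_acceptable) * fatalities * years

n_fatalities = 100
max_tolerable = 1e-2 / n_fatalities        # Figure 2.7 relationship: 1e-4 pa
broadly_acceptable = 1e-4 / n_fatalities   # 1e-6 pa

# Societal risk: breach frequency x propagation, with no exposure term
predicted_frequency = 5e-5 * 0.75

assert broadly_acceptable < predicted_frequency < max_tolerable  # lies in the ALARP region
budget = justified_budget(2_000_000, 7.13, predicted_frequency,
                          broadly_acceptable, n_fatalities, years=25)
print(f"Predicted frequency {predicted_frequency:.2e} pa, budget about £{budget:,.0f}")
```

Carrying the unrounded 3.75 × 10⁻⁵ pa and £2,000,000 × 7.13 = £14.26 million through gives about £1.30 million, slightly below the £1.32 million obtained above with rounded intermediate values; the conclusion is unchanged.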

2.5.3. Conclusion

From the Individual Risk standpoint ALARP is argued to be satisfied by virtue of the negligible budget.
From the Societal Risk standpoint ALARP is not satisfied and risk reduction should be studied within the budget indicated.