Chapter 12

Burner Control Assessment (Example)

Abstract

This chapter consists of a possible report of an integrity study on a proposed replacement burner control system. The requirement, herein, involves the high-demand table and the target is expressed as a failure rate.

Keywords

Assessment; Design overview; Failures; Fault tree logic; Integrity requirements; MTBF; Safety integrity study
 
This chapter consists of a possible report of an integrity study on a proposed replacement burner control system. Unlike Chapter 11, the requirement involves the high-demand table and the target is expressed as a failure rate.
This is not intended as a MODEL report but an example of a typical approach. The reader may care to study it in the light of this book and attempt to list omissions and to suggest improvements.

Safety Integrity Study of a Proposed Replacement Boiler Controller

Executive Summary and Recommendations

Objectives

To establish a Safety-Integrity Level target, vis-à-vis IEC 61508, for a Boiler Control System which is regarded as safety-related.
To address the following failure mode: Pilots are extinguished but nevertheless burner gas continues to be released with subsequent explosion of the unignited gas.
To assess the design against the above target and to make recommendations.

Targets

A Maximum Tolerable Risk target of 10⁻⁴ pa, which leads to a MAXIMUM TOLERABLE TARGET FAILURE RATE of 3 × 10⁻³ pa (see Section 12.2).
This implies a SIL 2 target.

Results

The frequency of the top event is 2 × 10⁻⁴ pa and the target is met. This result remains within the ALARP region but it was shown that further risk reduction is unlikely to be justified.
Figure 12.1 Fault tree (suppressing below Gates G1 and G2).

Recommendations

Review all the assumptions in Sections 12.2, 12.3 and 12.4.3. Review the failure rates and down times in Section 12.5 and the fault tree logic, in Figures 12.1–12.3, for a future version of this study.
Continue to address ALARP.
Place a SIL 2 requirement on the system vendor, in respect of the requirements of Parts 2 and 3 of IEC 61508.
Because very coarse assumptions have had to be made, concerning the programmable logic controller (PLC) and safety monitor (SAM) design, carry out a more detailed analysis with the chosen vendor.
Address the following design considerations with the vendor:
• Effect of loss of power supply, particularly where it is to only some of the equipment.
• Examine the detail of the PLC/SAM interconnections to the I/O and ensure that the fault tree logic is not compromised.
• Establish if the effect of failure of the valve limit switches needs to be included in the fault tree logic.

12.1. Objectives

(a) To establish a Safety-Integrity Level target, vis-à-vis IEC 61508, for a Boiler Control System which is regarded as safety related.
Figure 12.2 Fault tree (Gate G1).
Figure 12.3 Fault tree (Gate G2).
(b) To address the following failure mode: Pilots are extinguished but nevertheless burner gas continues to be released with subsequent explosion of the unignited gas.
(c) To assess the design against the above target.
(d) To make recommendations.

12.2. Integrity Requirements

IGEM SR/15 suggests target maximum tolerable risk criteria. These are, for individual risk:
1–2 Fatalities (Employee): 10⁻⁴ pa
Broadly Acceptable: 10⁻⁶ pa
Assume that there is a 0.9 probability of ignition of the unburnt gases.
Assume that there is a 0.1 probability of the explosion leading to fatality.
Assume that there is a 0.5 probability that the oil burners are not active.
Assume that there is a 0.75 probability of there being a person at risk.
Hence the TARGET MAXIMUM TOLERABLE FAILURE RATE = 10⁻⁴ pa divided by (0.9 × 0.1 × 0.5 × 0.75) = 3 × 10⁻³ pa.
This invokes a SIL 2 target.
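
The arithmetic above can be reproduced with a short Python sketch. The function and variable names are illustrative and not part of the study. The SIL bands are those of the IEC 61508 high-demand table, expressed as dangerous failures per hour, and a year is taken as 8760 hours.

```python
import math

HOURS_PER_YEAR = 8760

# Figures taken from Section 12.2
max_tolerable_risk_pa = 1e-4      # individual risk target, per annum
p_ignition = 0.9                  # ignition of the unburnt gases
p_fatality = 0.1                  # explosion leads to fatality
p_oil_burners_inactive = 0.5      # oil burners not active
p_person_at_risk = 0.75           # a person is present


def target_failure_rate(risk_pa, *probabilities):
    """Divide the risk target by the product of the conditional probabilities."""
    return risk_pa / math.prod(probabilities)


def high_demand_sil(rate_per_hour):
    """Return the SIL whose high-demand band contains the rate, or None."""
    bands = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]
    for sil, low, high in bands:
        if low <= rate_per_hour < high:
            return sil
    return None


target_pa = target_failure_rate(
    max_tolerable_risk_pa,
    p_ignition,
    p_fatality,
    p_oil_burners_inactive,
    p_person_at_risk,
)
target_per_hour = target_pa / HOURS_PER_YEAR
sil = high_demand_sil(target_per_hour)
print(f"{target_pa:.2e} pa = {target_per_hour:.2e} per hr, SIL {sil}")
```

The result is approximately 3 × 10⁻³ pa, or about 3.4 × 10⁻⁷ per hour, which falls in the SIL 2 band.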

12.3. Assumptions

12.3.1. Specific

(a) Proof test is carried out annually. Thus the mean down time of unrevealed failures, being half the proof-test interval, is approximately 4000 hrs.
(b) The system is in operation 365 days per annum.
(c) The burner control system comprises a combination of four “XYZ Ltd” PLCs and a number of safety monitors (known as SAMs).

12.3.2. General

(a) Reliability assessment is a statistical process for applying historical failure data to proposed designs and configurations. It therefore provides a credible target/estimate of the likely reliability of equipment assuming manufacturing, design, and operating conditions identical to those under which the data were collected. It is a valuable design review technique for comparing alternative designs, establishing order of magnitude performance targets and evaluating the potential effects of design changes.
(b) Failure rates (symbol λ), for the purpose of this prediction, are assumed to be constant with time. Both early and wearout-related failures would decrease the reliability but are assumed to be removed by burn-in and preventive replacement, respectively.
(c) Each single component failure which causes system failure is described as a SERIES ELEMENT. This is represented, in fault tree notation, as an OR gate whereby any failure causes the top event. The system failure rate contribution from this source is obtained from the sum of the individual failure rates.
(d) Where coincident failures are needed for the relevant system failure mode to occur, this is represented, in fault tree notation, as an AND gate, whereby more than one failure is needed to cause the top event.
(e) The failure rates used, and thus the predicted MTBFs (mean time between failure) and availabilities, are those credibly associated with a well proven design after a suitable period of reliability growth. They might therefore be considered optimistic as far as field trial or early build states are concerned.
(f) Calendar-based failure rates have been used in this study.
(g) Software failures are systematic and, as such, are not random. They are not quantified in this study.
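
Assumptions (c) and (d) can be illustrated with a minimal Python sketch. The OR gate sums the rates, as stated in (c). The study does not give a formula for the AND gate; the sketch assumes the commonly used approximation for two independent repairable items, λ₁λ₂(MDT₁ + MDT₂), which holds when each λ × MDT is much less than 1. It is not claimed to be the algorithm used by the fault tree package. The function names are illustrative, and the example values are a fan (10 PMH, 24 hrs) and an unrevealed PLC failure (1 PMH, 4000 hrs) from Section 12.5.

```python
def or_gate_rate(rates):
    """Series elements: any single failure causes the event, so rates add."""
    return sum(rates)


def and_gate_rate(rate_a, mdt_a, rate_b, mdt_b):
    """Two coincident failures (assumed approximation, valid for rate x MDT << 1).

    Either item fails first and the other then fails within the first
    item's down time: rate_a * rate_b * (mdt_a + mdt_b).
    """
    return rate_a * rate_b * (mdt_a + mdt_b)


# Rates in failures per hour, down times in hours
fan_rate, fan_mdt = 10e-6, 24
plc_rate, plc_mdt = 1e-6, 4000

series_rate = or_gate_rate([fan_rate, plc_rate])
coincident_rate = and_gate_rate(fan_rate, fan_mdt, plc_rate, plc_mdt)
print(f"OR gate:  {series_rate:.3e} per hr")
print(f"AND gate: {coincident_rate:.3e} per hr")
```

The AND gate result is several orders of magnitude below either individual rate, which is why the requirement for at least three coincident events matters so much to the overall figure.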

12.4. Results

12.4.1. Random Hardware Failures

The fault tree logic was constructed from a discussion of the failure scenarios at the meeting on 8 January 2001 involving Messrs “Q” and “Z.” The fault tree was analyzed using the TECHNIS fault tree package TTREE.
The frequency of the top event (Figure 12.1) is 2 × 10⁻⁴ pa (see Annex 1) which is well within the target.
Annex 1 shows the combinations of failures (cut sets) which lead to the failure mode in question. It is useful to note that at least three coincident events are required to lead to the top event. An “Importance” measure is provided for each cut set and it can be seen that no cut set contributes more than 1.4% of the total. There is therefore no suggestion of a critical component.
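
The 1.4% figure can be cross-checked from the Annex 1 output. The sketch below assumes that the importance of a cut set is its frequency (the reciprocal of its MTBF) divided by the top event frequency, which is consistent with the note in Annex 1 that cut sets are ranked by frequency. The variable names are illustrative.

```python
top_event_frequency = 0.222e-7   # per hour, from Annex 1
cut_set_mtbf_hours = 0.313e10    # Rank 1 cut set, from Annex 1

# Frequency of the cut set is the reciprocal of its MTBF
cut_set_frequency = 1 / cut_set_mtbf_hours

# Fraction of the top event frequency contributed by this cut set
importance = cut_set_frequency / top_event_frequency
print(f"Importance: {importance:.2%}")
```

This reproduces the quoted importance of about 0.0144.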

12.4.2. Qualitative Requirements

The qualitative measures required to limit software failures are listed, for each SIL, in the IGEM SR/15 and IEC 61508 documents. Although the IGEM guidance harmonizes closely with IEC 61508, compliance with SR/15 does not automatically imply compliance with IEC 61508.
It has to be stressed that this type of qualitative assessment merely establishes a measure of “adherence to a process” and does not signify that the quantitative SIL is automatically achieved by those activities. It addresses, however, a set of measures deemed to be appropriate (at the SIL) by the above documents.
It should also be kept in mind that an assessment is in respect of the specific failure mode. The assessment of these qualitative measures should therefore, ideally, be in respect of their application to those failure modes rather than in a general sense.

The purpose of the following is to provide an aide-memoire whereby features of the design cycle can be assessed in greater detail for inclusion in a later assessment. This list is based on safety integrity level (SIL 2).

1. Requirements

(a) Requirements Definition: This needs to be identified. It needs to be under configuration control with adequate document identification. It should also refer to the safety integrity requirements of the failure mode addressed in this report. Subject to this, the requirement will be met. A tender document, in response to the Requirements Specification, might well have been produced by the supplier and might well be identified.
(b) The Functional Specification needs to address the safety integrity requirement and to be specific about the failure modes. It will be desirable to state to the client that it is understood that the integrity issue is “loss of pilot followed by …” etc. Subject to this, the requirement will be met.
(c) The design may not utilize a CAD specification tool or formal method in delineating the requirement. However, the safety-related system might comprise simple control loops and therefore not involve parameter calculation, branching decision algorithms, or complex data manipulation. Thus, a formal specification language may not be applicable. The documentation might be controlled by ISO 9001 configuration control and appropriate software management. The need for an additional CAD specification tool may not be considered necessary. Subject to this, the requirement will be met.

2. Design and language

(a) There should be evidence of a “structured” design method. Examples include:
Logic diagrams
Data dictionary
Data flow diagrams
Truth tables
Subject to this, the requirement will be met.
(b) There should be a company-specific, or better still, project-specific coding/design standard which addresses, for example:
Use of a suitable language
Compiler requirements
Hygienic use of the language set
Use of templates (i.e., field proven) modules
No dynamic objects
No dynamic variables or online checking thereof
Limited interrupts, pointers, and recursion
No unconditional jumps
Fully defined module interfaces. Subject to this, the requirement will be met.
(c) Ascertain if the compiler/translator is certified or internally validated by long use. Subject to this, the requirement will be met.
(d) Demonstrate a modular approach to the structure of the code and rules for modules (i.e., single entry/exit). Subject to this, the requirement will be met.

3. Fault tolerance

(a) Assuming Type B components, and a nonredundant configuration, at least 90% safe failure fraction is required for SIL 2. It will be necessary to establish that 90% of PLC failures are either detected by the watchdog or result in failures not invoking the failure mode addressed in this study. Subject to a review, the requirement will be met.
(b) Desirable features (not necessarily essential) would be error detection/correction codes and failure assertion programming. Subject to this, the requirement will be met.
(c) Demonstrate graceful degradation in the design philosophy. Subject to this, the requirement will be met.
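
The safe failure fraction test in item (a) can be sketched as follows. The split of failure rates used here is hypothetical and is not taken from this study; it only shows the form of the calculation. The sketch assumes the usual definition, in which safe failures and detected dangerous failures count towards the fraction, and the 90% to 99% band for SIL 2 with Type B components and no redundancy.

```python
def safe_failure_fraction(safe, dangerous_detected, dangerous_undetected):
    """SFF = (safe + dangerous detected) / total failure rate."""
    total = safe + dangerous_detected + dangerous_undetected
    return (safe + dangerous_detected) / total


# Hypothetical PLC failure rates in PMH (illustrative only, not from the study)
sff = safe_failure_fraction(
    safe=2.6,                  # failures not invoking the failure mode
    dangerous_detected=2.0,    # dangerous failures caught by the watchdog
    dangerous_undetected=0.4,  # dangerous failures left unrevealed
)

# SIL 2 band for Type B components in a nonredundant configuration
meets_sil2_type_b = 0.90 <= sff < 0.99
print(f"SFF = {sff:.0%}, SIL 2 constraint met: {meets_sil2_type_b}")
```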

4. Documentation and change control

(a) A description is needed here to cover: Rigour of configuration control (i.e., document master index, change control register, change notes, change procedure, requirements matrix (customer spec/FDS/FAT mapping)). Subject to this, the requirement will be met.
(b) The change/modification process should be fairly rigorous; key words are:
Impact analysis of each change
Re-verification of changed and affected modules (the full test, not just the perceived change)
Re-verification of the whole system for each change
Data recording during these re-tests
Subject to this, the requirement will be met.

5. Design review

(a) Formal design review procedure. Evidence that design reviews are:
Specifically planned in a Quality Plan document
Which items in the design cycle are to be reviewed (i.e., FDS, acceptance test results, etc.)
Described in terms of who is participating, what is being reviewed, what documents, etc.
Followed by remedial action
Specifically addressing the above failure mode
Code review (see (b))
Subject to this, the requirement will be met.
(b) Code: Specific code review at pseudo code or ladder or language level which addresses the above failure mode. Subject to this, the requirement will be met.
(c) There needs to be justification that the language is not suitable for static analysis and that the code walkthrough is sufficiently rigorous for a simple PLC language set in that it is a form of “low-level static analysis.” Subject to this, the requirement will be met.

6. Test (applies to both hardware and software)

(a) There should be a comprehensive set of functional and interface test procedures which address the above failure mode. The test procedures need to evidence some sort of formal test case development for the software (i.e., formally addressing the execution possibilities, boundary values, and extremes). Subject to this, the requirement will be met.
(b) There should be misuse testing in the context of failing due to some scenario of I/O or operator interface. Subject to this, the requirement will be met.
(c) There should be evidence of formal recording and review of all test results including remedial action (probably via the configuration and change procedures). Subject to this, the requirement will be met.
(d) There should be a specific final validation test plan for proving the safety-related feature. This could be during commissioning. Subject to this, the requirement will be met.

7. Integrity assessment

Reliability modelling has been used in the integrity assessment.

8. Quality, safety, and management

(a) In respect of the safety integrity issues (i.e., for the above failure mode) some evidence of specific competency mapping is necessary to show that individuals have been chosen for tasks with the requirements in view (e.g., safety testing, integrity assessment). The competency requirements of IEC 61508 imply that appropriate job descriptions and training records for operating and maintenance staff are in place. Subject to this, the requirement will be met.
(b) Show that an ISO 9001 quality system is in operation, if not actually certified. Subject to this, the requirement will be met.
(c) Show evidence of safety management in the sense of ascertaining safety engineering requirements in a project as is the case in this project. This study needs to address the safety management system (known as functional safety capability) of the equipment designer and operator. Conformance with IEC 61508 involves this aspect of the safety-related equipment. Subject to this, the requirement will be met.
(d) Failure recording, particularly where long term evidence of a component (e.g., the compiler or the PLC hardware) can be demonstrated is beneficial. Subject to this, the requirement will be met.

9. Installation and commissioning

There needs to be a full commissioning test. Also, modifications will need to be subject to control and records will need to be kept. Subject to this, the requirement will be met.

12.4.3. ALARP

The ALARP (as low as reasonably practicable) principle involves deciding if the cost and time of any proposed risk reduction is, or is not, grossly disproportionate to the safety benefit gained.
The demonstration of ALARP is supported by calculating the Cost per Life Saved of the proposal. The process is described in Chapter 2. Successive improvements are considered in this fashion until the cost becomes disproportionate. The target of 3 × 10⁻³ pa corresponded to a maximum tolerable risk target of 10⁻⁴ pa. The resulting 2 × 10⁻⁴ pa corresponds to a risk of 6.6 × 10⁻⁶ pa. This individual risk is not as small as the Broadly Acceptable level (10⁻⁶ pa) and ALARP should be considered.
Assuming, for the sake of argument, that the scenario is sufficiently serious as to involve two fatalities, then any proposed further risk reduction would need to be assessed against the ALARP principle. Assuming a cost per life saved criterion of £2,000,000 and a 30-year plant life, the following would apply to a proposed risk reduction from 6.6 × 10⁻⁶ pa to the Broadly Acceptable level of 10⁻⁶ pa:

\[ \pounds 2{,}000{,}000 = \frac{\text{Proposed expenditure}}{\left(6.6 \times 10^{-6} - 10^{-6}\right) \times 30 \times 2} \]

Thus: proposed expenditure = £672.
It seems unlikely that the degree of further risk reduction referred to could be achieved within £672 and thus it might be argued that ALARP is satisfied.
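
The expenditure figure can be checked with a few lines of Python. All values are taken from this section; the variable names are illustrative.

```python
COST_PER_LIFE_SAVED = 2_000_000   # pounds, criterion assumed in the text

risk_now = 6.6e-6                 # individual risk per annum after assessment
risk_broadly_acceptable = 1e-6    # per annum
plant_life_years = 30
fatalities = 2

# Expected number of lives saved over the plant life by the risk reduction
lives_saved = (risk_now - risk_broadly_acceptable) * plant_life_years * fatalities

# Largest expenditure that still meets the cost per life saved criterion
max_expenditure = COST_PER_LIFE_SAVED * lives_saved
print(f"Maximum justifiable expenditure: £{max_expenditure:.0f}")
```

Any proposal costing more than this would exceed the £2,000,000 per life saved criterion for the stated risk reduction.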

12.5. Failure Rate Data

In this study the FARADIP.THREE Version 9.0 data ranges have been used for some of the items. The data are expressed as ranges. In general the lower figure in the range, used in a prediction, is likely to yield an assessment of the credible design objective reliability: that is, the reliability which might reasonably be targeted after some field experience and a realistic reliability growth programme. The initial (field trial or prototype) reliability might well be an order of magnitude less than this figure. The centre column figure (in the FARADIP software package) indicates a failure rate which is more frequently indicated by the various sources. It has been used where available. The higher figure will probably include a high proportion of maintenance revealed defects and failures. F3 refers to FARADIP.THREE, Judge refers to judgement.
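| Code (Description) | Mode | Failure rate PMH (or fixed per hr probability) | Mode rate (10⁻⁶ per hr) | MDT (hrs) | Reference |
|---|---|---|---|---|---|
| CCF1 (common cause failures) | Any | 0.1 | 0.1 | 24 | JUDGE |
| CCF2/3 (common cause failures) | Any | 0.1 | 0.1 | 4000 | JUDGE |
| ESDOC (ESD button) | o/c | 0.1 | 0.1 | 24 | F3 |
| UV (UV detector) | Fail | 5 | 2 | 24 | F3 |
| MAINS (UV separate supply) | Fail | 5 | 5 | 24 | JUDGE |
| PLC… (revealed failures) | | 5 | 1 | 24 | JUDGE |
| PLC… (unrevealed failures) | | 5 | 1 | 4000 | JUDGE |
| FAN (any fan) | Fail | 10 | 10 | 24 | F3 |
| PSWL (pressure switch) | Low | 2 | 1 | 24 | F3 |
| PSWH (pressure switch) | High | 2 | 1 | 24 | F3 |
| CG10CL (Pilot diaphragm vlv) | Closed | 2 | 1 | 24 | F3 |
| CG9CL (slamshut) | Sp close | | 1 | 24 | F3 |
| CG11… (slamshuts) | Sp close | | 4 | 24 | F3 |
| COG5… (butterfly vlv) | Fail to close | | 2 | 4000 | F3 |
| CG4OP… (butterfly vlv) | Fail to close | | 2 | 4000 | F3 |
| CG5OP (diaphragm vlv) | Fail to close | | 2 | 4000 | F3 |
| BFG… (blast gas vlvs) | | | 2 | 4000 | F3 |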


MDT, mean down time.
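
A rate quoted in PMH (per million hours) converts to a per-hour rate on multiplying by 10⁻⁶. The product of that rate and the MDT then approximates the fraction of time the item spends in the failed state. The sketch below applies this to three rows of the table; the steady-state approximation (valid when rate × MDT is much less than 1) is an assumption of the sketch, and the names are illustrative.

```python
PER_MILLION_HOURS = 1e-6

# (mode rate in PMH, MDT in hours) for three rows of the failure rate table
items = {
    "FAN": (10, 24),
    "UV": (2, 24),
    "PLC (unrevealed)": (1, 4000),
}


def unavailability(mode_rate_pmh, mdt_hours):
    """Approximate fraction of time in the failed state: rate x MDT."""
    return mode_rate_pmh * PER_MILLION_HOURS * mdt_hours


results = {name: unavailability(rate, mdt) for name, (rate, mdt) in items.items()}
for name, q in results.items():
    print(f"{name}: {q:.1e}")
```

The unrevealed PLC failure, despite its low rate, has the largest unavailability of the three because of the 4000-hour down time implied by annual proof testing.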

12.6. References

A reference section would normally be included.

Annex I Fault Tree Details

File name: Burner.TRO.

Results of fault tree quantification for top event: GTOP.

Top event frequency = 0.222E-07 per hr = 0.194E-03 per year
Top event MTBF = 0.451E+08 hr = 0.515E+04 years
Top event probability = 0.526E-06
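
The relationship between these figures can be verified with a short sketch, taking a year as 8760 hours. Small differences from the printed values arise because the sketch starts from the rounded per-hour frequency. The variable names are illustrative.

```python
HOURS_PER_YEAR = 8760

frequency_per_hour = 0.222e-7                 # top event frequency from the output above

frequency_per_year = frequency_per_hour * HOURS_PER_YEAR
mtbf_hours = 1 / frequency_per_hour           # MTBF is the reciprocal of the frequency
mtbf_years = mtbf_hours / HOURS_PER_YEAR

print(f"Frequency: {frequency_per_year:.3e} per year")
print(f"MTBF: {mtbf_hours:.3e} hr = {mtbf_years:.3e} years")
```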

Basic event reliability data

Basic eventTypeFailure rateMean fault duration
CCF1I/E0.100E  0624.0
CG10CLI/E0.100E  0524.0
ESDOCI/E0.100E  0624.0
PSW1LI/E0.100E  0524.0
CG9CLI/E0.100E  0524.0
PLCSM1I/E0.100E  0524.0
FANIDI/E0.100E  0424.0
FANFDI/E0.100E  0424.0
PSW4HI/E0.100E  0524.0
PSW5HI/E0.100E  0524.0
CG11ACI/E0.400E  0524.0
PLCSM2I/E0.100E  0524.0
CG11BCI/E0.400E  0524.0
PLCSM3I/E0.100E  0524.0
CG11CCI/E0.400E  0524.0
PLCSM4I/E0.100E  0524.0
CG11DCI/E0.400E  0524.0
PLCSM5I/E0.100E  0524.0
MAINSI/E0.500E  0524.0
UV1I/E0.200E  0524.0
UV2I/E0.200E  0524.0
UV3I/E0.200E  0524.0
UV4I/E0.200E  0524.0
Table Continued

image

Basic eventTypeFailure rateMean fault duration
PLCSM6I/E0.100E  050.400E + 04
CCF3I/E0.100E  060.400E + 04
COG5AOI/E0.200E  050.400E + 04
PLCSM7I/E0.100E  050.400E + 04
COG5BOI/E0.200E  050.400E + 04
PLCSM8I/E0.100E  050.400E + 04
COG5COI/E0.200E  050.400E + 04
PLCSM9I/E0.100E  050.400E + 04
COG5DOI/E0.200E  050.400E + 04
PLCS10I/E0.100E  050.400E + 04
CG4OPI/E0.200E  050.400E + 04
CG5OPI/E0.200E  050.400E + 04
BFG1OPI/E0.100E  050.400E + 04
PLCS11I/E0.100E  050.400E + 04
CCF2I/E0.100E  060.400E + 04
BFG5AOI/E0.100E  050.400E + 04
PLCS12I/E0.100E  050.400E + 04
BFG5BOI/E0.100E  050.400E + 04
PLCS13I/E0.100E  050.400E + 04
BFG5COI/E0.100E  050.400E + 04
PLCS14I/E0.100E  050.400E + 04
BFG5DOI/E0.100E  050.400E + 04
PLCS15I/E0.100E  050.400E + 04
BFG5EOI/E0.100E  050.400E + 04
PLCS16I/E0.100E  050.400E + 04
BFG5FOI/E0.100E  050.400E + 04
PLCS17I/E0.100E  050.400E + 04
BFG5GOI/E0.100E  050.400E + 04
PLCS18I/E0.100E  050.400E + 04
BFG5HOI/E0.100E  050.400E + 04
PLCS19I/E0.100E  050.400E + 04


Barlow–Proschan measure of cut set importance (Note: This is the name given to the practice of ranking cut sets by frequency)

Rank 1 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANID | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5AO | I/E | 0.200E-05 | 0.400E+04 |

Rank 2 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANID | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5BO | I/E | 0.200E-05 | 0.400E+04 |

Rank 3 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANID | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5CO | I/E | 0.200E-05 | 0.400E+04 |

Rank 4 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANID | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5DO | I/E | 0.200E-05 | 0.400E+04 |

Rank 5 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANFD | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5AO | I/E | 0.200E-05 | 0.400E+04 |

Rank 6 Importance 0.144E-01 MTBF hrs 0.313E+10 MTBF years 0.357E+06.

| Basic event | Type | Failure rate (per hr) | Mean fault duration (hrs) |
|---|---|---|---|
| FANFD | I/E | 0.100E-04 | 24.0 |
| PLCSM6 | I/E | 0.100E-05 | 0.400E+04 |
| COG5BO | I/E | 0.200E-05 | 0.400E+04 |
