Chapter 3

Meeting IEC 61508 Part 2

Abstract

This chapter covers Part 2 of IEC 61508, addressing the safety system hardware and overall system design. The authors have attempted, in this chapter, to simplify the highly complex set of requirements into a credible representation of the distinct design requirements for safety, including those for design and development and those involving the specification.

Keywords

ASIC; Demonstration template; FMEA; Proven in use; Redundant units; Safe failure fraction (SFF); Safety requirement specification (SRS)
 
IEC 61508 Part 2 covers the safety system hardware and overall system design, whereas software design is covered by Part 3 (see next chapter). This chapter summarizes the main requirements. However, the following points should be noted first.

The appropriateness of each technique, and the degree of refinement (e.g., high, medium, low), represent the opinions of the individuals involved in drafting the Standard.

The combination of text (e.g., paragraphs 7.1–7.9) and tables (both A and B series) and the use of modifying terms (such as high, medium, and low) to describe the intensity of each technique have led to a highly complex set of requirements. Their interpretation requires the simultaneous reading of textual paragraphs, A tables, B tables, and Table B6—all on different pages of the standard. The A Tables are described as referring to measures for controlling (i.e., revealing) failures and the B Tables to avoidance measures.

The authors of this book have, therefore, attempted to simplify this “algorithm of requirements” and this chapter is offered as a credible representation of requirements.

At the end of this chapter a “conformance demonstration template” is suggested which, when completed for a specific product or system assessment, will offer evidence of conformance to the safety integrity level (SIL) in question.

The approach to the assessment will differ substantially between:
COMPONENT (e.g., Transducer) DESIGN
and
APPLICATIONS SYSTEM DESIGN
The demonstration template tables at the end of this chapter cater for the latter case. Chapter 8, which covers the restricted subset of IEC 61511, also caters for application software.

3.1. Organizing and Managing the Life Cycle

Section 7.1 of the Standard: Table ‘1’

The idea of a design life cycle has already been introduced to embrace all the activities during design, manufacture, installation, and so on. The exact nature of the design-cycle model will depend on the complexity and the type of system being designed. The IEC 61508 model (in Part 1 of the Standard) may well be suitable and was fully described in Chapter 1 of this book. Table “1” of IEC 61508 Part 2 describes the life-cycle activities again and is, more or less, a repeat of Part 1.
A major point worth making is that the life-cycle activities should all be documented. Unless this is done, there is no visibility of the design process, and an assessor cannot verify that the Standard has been followed. This should be a familiar discipline inasmuch as most readers will be operating within an ISO 9001 management system. The design should be conducted under a project management regime and adequately documented to provide traceability. These requirements can be met by following a quality system such as that specified in ISO 9001. The level and depth of the required project management and documentation will depend on the SIL. The use of checklists is desirable at all stages.
The need for Functional Safety Capability (more recently called Functional Safety Management) has been described in Chapter 2, Section 2.3 and also in Appendix 1. IEC 61508 Part 2 (Hardware) and Part 3 (Software) expect this to have been addressed.
Irrespective of the target SIL, there needs to be a project management structure which defines all the required actions and responsibilities, along with the required competency of the persons responsible for each task. There needs to be a “Quality and Safety” Plan which heads the documentation hierarchy and describes the overall functional safety targets and plans. All documentation and procedures need to be well structured, for each design phase, and be sufficiently clear that the recipient for the next phase can easily understand the inputs to that task. This is sufficiently important that Appendix 7 of this book provides more detail.
SIL 3 and SIL 4 require, also, that the project management identifies the additional procedures and activities required at these levels and that there is a robust reporting mechanism to confirm both the completion and correctness of each activity. The documentation used for these higher SIL systems should be generated based on standards which give guidance on consistency and layout and include checklists. In addition, for SIL 4 systems, computer-aided configuration control and computer-aided design documentation should be used. Table B6 of the Standard elaborates on what constitutes a higher rigor of techniques and measures. Project Management, for example, requires validation independent from design and using a formalized procedure, computer-aided engineering, etc., in order to attract the description “high effectiveness.”
Much of the above “good practice” (e.g., references to Project Management) tends to be repeated, throughout the Standard, for each of the life-cycle activities, in both text and tables. We have attempted to avoid such repetition in this book. There are many other aspects of the Standard's guidance which are repetitious and we have tended to refer to each item once and in the most appropriate section.
The need for validation planning is stressed in the Standard and this should be visible in the project Quality/Safety Plan which will include reference to the Functional Safety Audits.
In general this whole section should be met by implementing the template Functional Safety Procedure provided in Appendix 1.

3.2. Requirements Involving the Specification

Section 7.2 of the Standard: Table B1 (avoidance)

(a) The safety requirements specification

This is an important document because it is crucial to specify the requirements of a safety system correctly and completely. Irrespective of the SIL target it should be clear, precise, unambiguous, testable, and well structured, and cover:
• Description of the hazards
• Integrity level requirements plus type of operation, i.e., low demand or high demand for each function
• Response times
• Safety function requirements, definition of the safe state and how it is achieved
• System documents (e.g., P&IDs, cause and effect matrices, logic diagrams, process data sheets, equipment layouts)
• System architecture
• Operational performance and modes of operation
• Behavior under fault conditions
• Start-up and reset requirements
• Input ranges and trip values, outputs, overrides
• Manual shutdown details
• Behavior under power loss
• Interfaces with other systems and operators
• Environmental design requirements for the safety system equipment
• Electromagnetic compatibility
• Requirements for periodic tests and/or replacements
• Separation of functions (see below)
• Deliverables at each life-cycle stage (e.g., test procedures, results).
Structured design should be used at all SIL levels. At the system application level the functional requirements (i.e., logic) can be expressed by using semiformal methods such as cause and effect diagrams or logic/function block diagrams; these can be suitable up to SIL 3. In the case of new product design, rather than applications engineering (i.e., design of executive software), structured methods should be progressively considered from SIL 2 upwards. These include Yourdon, MASCOT, SADT, and several other techniques referenced in Part 7 of the Standard. For SIL 4 applications structured methods should be used.
ESC's SILComp® software generates the safety requirement specification (SRS) automatically, based on data from SIL Targeting and Assessment (Verification). This is of particular use when managing a large number of safety instrumented functions.

(b) Separation of functions

In order to reduce the likelihood of common cause failures the specification should also cover the degree of separation required, both physically and electrically, between the EUC and the safety system(s). Any necessary data interchange between the two systems should also be tightly specified and only data flow from the EUC to the safety system permitted.
These requirements need to be applied to any redundant elements of the safety-related system(s).
Achieving this separation may not always be possible since parts of the EUC may include a safety function that cannot be dissociated from the control of the equipment. This is more likely for the continuous mode of operation in which case the whole control system should be treated as safety-related pending target SIL calculations (Chapter 2, Section 2.1).
If the safety-related and non-safety-related system elements cannot be shown to be sufficiently independent then the complete system should be treated as safety-related.
For SIL 1 and SIL 2 there should be a clear specification of the separation between the EUC and the safety system, and electrical/data interfaces should be well defined. Physical separation should be considered.
For SIL 3 there should be physical separation between the EUC and the safety system and, also, the electrical/data interfaces should be clearly specified. Physical separation of redundant parts of the safety system should be considered.
For SIL 4 there should be total physical/electrical/data separation between the safety system and the EUC and between the redundant parts of the safety system.

3.3. Requirements for Design and Development

Section 7.4 of the Standard: Table B2 (avoidance)

3.3.1. Features of the Design

Sections 7.4.1–7.4.11 excluding 7.4.4 and 7.4.5
(a) Use of “in-house” design standards and work practices needs to be evident. These will address proven components and parts, preferred designs and configurations, etc.
(b) On manual or auto-detection of a failure the design should ensure system behavior which maintains the overall safety targets. In general, this requires that failure in a safety system having redundant paths should be repaired within the mean time to repair that is assumed in the hardware reliability calculations. If this is not possible, then the procedure should be the same as for nonredundant paths as follows. On failure of a safety system with no redundant paths, either additional process monitoring should be provided to maintain adequate safety or the EUC should be shut down.
(c) Sector specific requirements need to be observed. Many of these are contained in the documents described in Chapters 8–10.
(d) The system design should be structured and modular and should use well-tried modules/components. Structured, in this context, implies clear partitioning of functions and a visible hierarchy of modules and their interconnection. For SIL 1 and SIL 2 the modularity should be kept to a “limited size” and each module/component should have had previously documented field experience for at least one year with 10 devices. If previous experience does not exist, or is insufficiently documented, then this can be replaced with additional modular/component testing. Such use of subjective descriptions (e.g., the “limited size”) adds further weight to the desirability of “in-house” checklists, which can be developed in the light of experience.
In addition for SIL 3 systems, previous experience is needed in a relevant application and for a period of at least 2 years with 10 devices or, alternatively, some third-party certification.
SIL 4 systems should be both proven in use, as mentioned above, and have third-party certification.
It is worth mentioning that the “years” of operation referred to above assume full-time use (i.e., 8760 hrs per annum).
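As an illustration, the field-experience arithmetic implied by (d) can be expressed in a few lines of Python. This is a sketch of ours, not part of the Standard; the thresholds are the 10-device figures quoted above:

```python
# Device-hours implied by the "proven components" evidence thresholds,
# assuming full-time use at 8760 hours per annum (as noted above).
HOURS_PER_YEAR = 8760

def device_hours(devices: int, years: float) -> float:
    """Accumulated field experience for a fleet of identical devices."""
    return devices * years * HOURS_PER_YEAR

print(device_hours(10, 1))  # SIL 1/2 threshold: 87,600 device-hours
print(device_hours(10, 2))  # SIL 3 threshold: 175,200 device-hours
```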
(e) Systematic failures caused by the design (this refers to Tables A15 and A18): the primary technique is to use monitoring circuitry to check the functionality of the system. The degree of complexity required for this monitoring ranges from “low” for SIL 1 and SIL 2, through “medium” for SIL 3 to “high” for SIL 4.
For example, a PLC-based safety system with a SIL 1 or SIL 2 target would require, as a minimum, a watchdog function on the PLC CPU, this being the most complex element of such a “lower” integrity safety system.
These checks would be extended in order to meet SIL 3 and would include additional testing on the CPU (i.e., memory checks) along with basic checking of the I/O modules, sensors, and actuators.
The coverage of these tests would need to be significantly increased for SIL 4 systems. Thus the degree of testing of input and output modules, sensors, and actuators would be substantially increased. Again, however, these are subjective statements and standards such as IEC 61508 do not and cannot give totally prescriptive guidance. Nevertheless some guidance is given concerning diagnostic coverage.

It should be noted that the minimum configuration table given in Section 3.3.2a of this chapter permits higher SIL claims, despite lower levels of diagnosis, by virtue of either more redundancy or a higher proportion of “fail safe” type failures. The 2010 version allows a proven-in-use alternative (see Section 3.3.2b).

(f) Systematic failures caused by environmental stress (this refers to Table A16): this requirement applies to all SILs and states that all components (indeed the overall system) should be designed and tested as suitable for the environment in question. This includes temperature and temperature cycling, emc (electromagnetic compatibility), vibration, electro-static, etc. Components and systems that meet the appropriate IEC component standards, or CE marking, UL (Underwriters Laboratories Inc) or FM (Factory Mutual) approval would generally be expected to meet this requirement.
(g) Systematic operation failures (this refers to Table A17): for all SILs the system should have protection against online modifications of either software or hardware.
There needs to be feedback on operator actions, particularly when these involve keyboards, in order to assist the operator in detecting mistakes.
As an example of this, for SIL 1 and SIL 2, all input operator actions should be repeated back whereas, for SIL 3 and SIL 4, significant and consistent validation checks should be made on the operator action before acceptance of the commands.
The design should take into account human capabilities and limitations of operators and maintenance staff. Human factors are addressed in Section 5.4 of this book.
(h) Tables A1 to A15 of the Standard list techniques considered suitable for achieving improvements in diagnostic capability. The following Section 3.3.2 discusses diagnostic capability and Safe Failure Fraction (SFF). Carrying out a detailed Failure Mode Effects Analysis (FMEA) (Appendix 4) will generally provide a claim of diagnostic capability which overrides these tables. However, they can be used as a guide to techniques.
(i) Communications: Paragraph 7.4.11 of the Standard requires one to address the failure rate of the communications process. Channels are described in two ways:
• White Box: where the communications executive software has already been designed and certified to provide the appropriate integrity (e.g., use of self-test etc.)
• Black Box: where the integrity is designed in at the applications software level because the white box claim cannot be made.
(j) Synthesis of elements: Paragraph 7.4.3 allows a configuration involving parallel elements, each demonstrating a particular SIL in respect of systematic failures, to claim an increment of one SIL. This requires that a common cause analysis has been carried out in order to demonstrate independence by use of appropriate techniques (e.g., functional diversity). Figure 3.1 illustrates the idea. In other words, diverse designs are needed as well as mere redundancy.
Figure 3.1 Showing two SIL 2 elements achieving a SIL 3 result.

3.3.2. Architectures (i.e., SFF)

Section 7.4.4: Tables ‘2’ and ‘3’

(a) Claim via SFF (known, in the Standard, as Route 1H)
Regardless of the hardware reliability calculated for the design, the Standard specifies minimum levels of redundancy coupled with given levels of fault tolerance (described by the SFF). This can be estimated as shown in Appendix 4.
The term Safe Failure Fraction (SFF) is coined in IEC 61508. It is defined as the sum of the potentially dangerous failures revealed by auto-test together with those which result in a safe state, expressed as a fraction of the TOTAL number of failures:

SFF = (Total revealed hazardous failures + Total safe failures) / Total failures
(Thus the bottom line is the top line PLUS the unrevealed hazardous failures)
There is a significant change in the 2010 version of IEC 61508: previously, “safe” failures included all failures which have no adverse effect on the safety function. This has now been narrowed to admit only those which result in forcing the so-called “safe” state, which therefore implies a spurious triggering of the safety function (e.g., shutdown or trip). The net result is to reduce the quantity of failures defined as “safe” which, being on the top and bottom of the equation, effectively reduces the SFF which can be claimed.
An example might be a slamshut valve where 80% of the failures are “spurious closure” and 20% “fail to close”. In that case, an 80% “SFF” would be claimed without further need to demonstrate automatic diagnosis. On the other hand, a combined example might be a control system whereby 50% of failures are “fail-safe” and the remaining 50% enjoy a 60% automatic diagnosis. In this latter case the overall SFF becomes 80% (i.e., 50% + 0.6 × 50%).
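The arithmetic of the combined example can be captured in a short Python sketch (ours, for illustration; the failure-rate split is the 50%/60% example above):

```python
# SFF from the three failure categories defined above. Rates may be in any
# consistent unit, since only the ratio matters.
def safe_failure_fraction(safe: float, dangerous_detected: float,
                          dangerous_undetected: float) -> float:
    """(Safe + revealed dangerous) failures as a fraction of all failures."""
    total = safe + dangerous_detected + dangerous_undetected
    return (safe + dangerous_detected) / total

# Combined example: 50% fail-safe; the dangerous 50% enjoys 60% diagnosis
print(safe_failure_fraction(0.5, 0.5 * 0.6, 0.5 * 0.4))  # -> 0.8 (i.e., 80%)
```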
There are two Tables which cover the so-called “Type A” components (Failure modes well defined PLUS behavior under fault conditions well defined PLUS failure data available) and the “Type B” components (likely to be more complex and whereby any of the above are not satisfied).
In the following tables “m” refers to the number of failures which lead to system failure. The tables provide the maximum SIL which can be claimed for each SFF case. The expression “m + 1” implies redundancy whereby there are (m + 1) elements and m failures are sufficient to cause system failure. The term Hardware Fault Tolerance (HFT) is commonly used. An HFT of 0 implies simplex (i.e., no failures tolerated). An HFT of one implies m out of (m + 1) (i.e., one failure tolerated) and so on.

Requirements for SFF

Type A

SFF    | Simplex (HFT 0) | (m + 1) (HFT 1) | (m + 2) (HFT 2)
<60%   | SIL 1           | SIL 2           | SIL 3
60–90% | SIL 2           | SIL 3           | SIL 4
90–99% | SIL 3           | SIL 4           | SIL 4
>99%   | SIL 3           | SIL 4           | SIL 4

Type B

SFF    | Simplex (HFT 0) | (m + 1) (HFT 1) | (m + 2) (HFT 2)
<60%   | Not allowed*    | SIL 1           | SIL 2
60–90% | SIL 1           | SIL 2           | SIL 3
90–99% | SIL 2           | SIL 3           | SIL 4
>99%   | SIL 3           | SIL 4           | SIL 4

Simplex implies no redundancy

(m + 1) implies 1 out of 2, 2 out of 3 etc

(m + 2) implies 1 out of 3, 2 out of 4 etc

*This configuration (simplex Type B with SFF < 60%) is not allowed.
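These tables lend themselves to a simple lookup. The following Python sketch (ours; the band edges and entries are taken from the tables above, with None for the disallowed case) returns the maximum SIL claimable under Route 1H:

```python
# Maximum claimable SIL under Route 1H, from the Type A/B tables above.
def max_sil_route_1h(sff: float, hft: int, component_type: str):
    """sff as a fraction (0-1); hft = hardware fault tolerance (0, 1, 2);
    component_type 'A' or 'B'. Returns the max SIL, or None if not allowed."""
    tables = {
        "A": [(1, 2, 3), (2, 3, 4), (3, 4, 4), (3, 4, 4)],
        "B": [(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4)],
    }
    band = 0 if sff < 0.60 else 1 if sff < 0.90 else 2 if sff < 0.99 else 3
    return tables[component_type][band][min(hft, 2)]

print(max_sil_route_1h(0.80, 0, "A"))  # slamshut example, simplex -> 2 (SIL 2)
print(max_sil_route_1h(0.55, 0, "B"))  # -> None (configuration not allowed)
```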

The above tables refer to 60%, 90%, and 99%. At first sight this might seem a realistic range of safe failure fractions, from simple to comprehensive. However, it is worth considering how the diagnostic part of each of these coverage levels might be established. There are two ways in which diagnostic coverage and SFF ratios can be assessed:
• By test: where failures are simulated and the number of diagnosed failures, or those leading to a safe condition, are counted.
• By FMEA: where the circuit is examined to ascertain, for each potential component failure mode, whether it would be revealed by the diagnostic program or lead to a safe condition.
Clearly a 60% SFF could be demonstrated fairly easily by either method. Test would require a sample of only a few failures to reveal 60%.
Turning to 90% coverage, the test sample would now need to exceed 20 failures (for reasonable statistical significance) and the FMEA would require a more detailed approach. In both cases the cost and time become more significant. An FMEA as illustrated in Appendix 4 is needed and might well involve three to four man-days.
For 99% coverage a reasonable sample size would now exceed 200 failures and the test demonstration is likely to be impracticable.
The foregoing should be considered carefully to ensure that there is adequate evidence to claim 90% and an even more careful examination before accepting the credibility of a 99% claim.
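A simple rule of thumb which reproduces the sample sizes quoted above (a few failures for 60%, more than 20 for 90%, more than 200 for 99%) is that the sample should exercise the undetected fraction at least twice over. The factor of 2 here is our assumption for illustration, not a formula from the Standard:

```python
# Rough minimum number of simulated failures before a coverage claim has
# reasonable statistical weight; the assumed factor of 2 reproduces the
# figures in the text (5 for 60%, 20 for 90%, 200 for 99%).
def min_failure_sample(coverage: float, factor: float = 2.0) -> int:
    return round(factor / (1.0 - coverage))

for dc in (0.60, 0.90, 0.99):
    print(f"{dc:.0%}: at least {min_failure_sample(dc)} failures")
```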
In order to take credit for diagnostic coverage, as described in the Standard (i.e., the above Architectural Constraint Tables), the time interval between repeated tests should at least be an order of magnitude less than the expected demand interval. For the case of a continuous system then the auto-test interval plus the time to put the system into a safe state should be within the time it takes for a failure to propagate to the hazard.
Furthermore, it is important to remember that auto-test means just that. Failures discovered by however frequent manual proof tests are not credited as revealed for the purpose of an SFF claim.
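The timing rule above can be stated as a one-line check (a sketch of ours, for the low-demand case):

```python
# Diagnostic (SFF) credit requires the auto-test interval to be at least
# an order of magnitude shorter than the expected demand interval.
def autotest_credit_allowed(test_interval_h: float,
                            demand_interval_h: float) -> bool:
    return test_interval_h <= demand_interval_h / 10.0

print(autotest_credit_allowed(24, 8760))    # daily test, annual demand: True
print(autotest_credit_allowed(4380, 8760))  # six-monthly test: False
```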
(b) Claim via field failure data (7.4.4.2 of Part 2) (known, in the Standard, as Route 2H)
The 2010 version of the Standard permits an alternative route to the above “architectures” rules. This applies if well-documented and verified FIELD failure rate data (not warranty/returns data) is available for the device in question and is demonstrated at 90% statistical confidence (see Section 3.10). In that case the following redundancy rules (7.4.4.3.1 of Part 2) apply:
SIL 4—HFT of 2 (i.e., 1 out of 3, 2 out of 4, etc.)
SIL 3—HFT of 1 (i.e., 1 out of 2, 2 out of 3, etc.)
SIL 2—HFT of 0 (i.e., simplex but low demand only)
SIL 1—HFT of 0 (i.e., simplex low or high demand)
However, the majority of so-called data tends to be based on manufacturers' warranty statistics or even FMEAs and does NOT qualify as field data. The authors therefore believe that invoking this rule is unlikely to be frequently justified.
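The Route 2H redundancy rules reduce to a small lookup, sketched below (our encoding; the low-demand restriction on simplex SIL 2 is per the list above):

```python
# Minimum hardware fault tolerance under Route 2H, per the rules above.
ROUTE_2H_MIN_HFT = {4: 2, 3: 1, 2: 0, 1: 0}

def route_2h_hft_ok(sil: int, hft: int, low_demand: bool = True) -> bool:
    if hft < ROUTE_2H_MIN_HFT[sil]:
        return False
    if sil == 2 and hft == 0 and not low_demand:
        return False  # simplex SIL 2 is permitted for low demand only
    return True

print(route_2h_hft_ok(3, 1))                    # 1oo2 at SIL 3: True
print(route_2h_hft_ok(2, 0, low_demand=False))  # simplex SIL 2, high demand: False
```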

3.3.3. Random Hardware Failures

Section 7.4.5

This is traditionally known as “reliability prediction” which, in the past, has dominated risk assessment work. It involves specifying the reliability model, the failure rates to be assumed, the component down times, diagnostic intervals, and coverage. It is, of course, only a part of the picture since systematic failures must be addressed qualitatively via the rigor of life-cycle activities.
Techniques such as FMEA, reliability block diagrams, and fault tree analysis are involved, and Chapters 5 and 6 together with Appendix 4 briefly describe how to carry these out. The Standard refers to confidence levels in respect of failure rates and this will be dealt with later.
In Chapter 1 we mentioned the anomaly concerning the allocation of the quantitative failure probability target to the random hardware failures alone. There is yet another anomaly concerning judgment of whether the target is met. If the fully quantified approach (described in Chapter 2) has been adopted then the failure target will be a PFD (probability of failure on demand) or a failure rate. The reliability prediction might suggest that the target is not met although still remaining within the PFD/rate limits of the SIL in question. The rule here is that, since we have chosen to adopt a fully quantitative approach, we should meet the target set (paragraph 7.4.5.1 of Part 2 of the Standard confirms this view). For example, a PFD of 2 × 10⁻³ might have been targeted for a safety-related risk reduction system. This is, of course, SIL 2. The assessment might suggest that it will achieve 5 × 10⁻³, which is indeed SIL 2. However, since a target of 2 × 10⁻³ was set, that target has NOT been met.
The question might then be asked “What if we had opted for a simpler risk graph approach and stated the requirement merely as a SIL—then would we not have met the requirement?” Indeed we have and this appears to be inconsistent. Once again there is no right or wrong answer to the dilemma. The Standard does not address it and, as in all such matters, the judgment of the responsible engineer is needed. Both approaches are admissible and, in any case, the accuracy of quantification is not very high (see Chapter 5).
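The arithmetic behind this dilemma is easily illustrated (a sketch of ours; the low-demand PFD bands are those of the Standard):

```python
# Low-demand PFD bands: SIL n corresponds to 10^-(n+1) <= PFD < 10^-n.
def pfd_to_sil(pfd: float):
    for sil in (4, 3, 2, 1):
        if 10.0 ** -(sil + 1) <= pfd < 10.0 ** -sil:
            return sil
    return None

target, assessed = 2e-3, 5e-3
print(pfd_to_sil(target), pfd_to_sil(assessed))  # both -> 2 (SIL 2)
print(assessed <= target)  # False: the quantified target is NOT met
```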

3.4. Integration and Test (Referred to as Verification)

Sections 7.5 and 7.9 of the Standard: Table B3 (avoidance)

Based on the intended functionality the system should be tested, and the results recorded, to ensure that it fully meets the requirements. This is the type of testing which, for example, looks at the output responses to various combinations of inputs. This applies to all SILs.
Furthermore, a degree of additional testing, such as the response to unusual and “not specified” input conditions should be carried out. For SIL 1 and SIL 2 this should include system partitioning testing and boundary value testing. For SIL 3 and SIL 4 the tests should be extended to include test cases that combine critical logic requirements at operation boundaries.

3.5. Operations and Maintenance

Section 7.6: Table B4 (avoidance)

(a) The system should have clear and concise operating and maintenance procedures. These procedures, and the safety system interface with personnel, should be designed to be user, and maintenance, friendly. This applies to all SIL levels.
(b) Documentation needs to be kept of audits and of any proof testing that is called for. There need to be records of the demand rate on the safety-related equipment, and failures also need to be recorded. These records should be periodically reviewed to verify that the target safety integrity level was indeed appropriate and that it has been achieved. This applies to all SILs.
(c) For SIL 1 and SIL 2 systems, the operator input commands should be protected by key switches/passwords and all personnel should receive basic training. In addition, for SIL 3 and SIL 4 systems operating/maintenance procedures should be highly robust and personnel should have a high degree of experience and undertake annual training. This should include a study of the relationship between the safety-related system and the EUC.

3.6. Validation (Meaning Overall Acceptance Test and the Close Out of Actions)

Sections 7.3 and 7.7: Table B5

The object is to ensure that all the requirements of the safety system have been met and that all the procedures have been followed (albeit this should follow as a result of a company's functional safety capability).
A validation plan is needed which cross-references all the functional safety requirements to the various calculations, reviews, and tests which verify the individual features. The completed cross-referencing of the results/reports provides the verification report. A spreadsheet is often effective for this purpose.
It is also necessary to ensure that any remedial action or additional testing arising from earlier tests has been carried out. In other words there is:
• a description of the problem (symptoms)
• a description of the causes
• the solution
• evidence of re-testing to clear the problem
This requirement applies to all SIL levels.
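Both the cross-referencing spreadsheet and the problem close-out records can be mocked up simply. In the sketch below (field names are ours, purely illustrative) validation is complete only when every requirement has evidence and every problem has been re-tested:

```python
# Toy traceability records in the spirit of the validation spreadsheet.
requirements = [
    {"req": "Trip on high level", "evidence": "Test report TR-012"},
    {"req": "Response time < 2 s", "evidence": None},  # not yet verified
]
problems = [
    {"symptom": "Spurious trip at start-up", "cause": "Filter constant",
     "solution": "Filter retuned", "retest": "Test report TR-015"},
]

unverified = [r["req"] for r in requirements if not r["evidence"]]
open_problems = [p["symptom"] for p in problems if not p["retest"]]
print("Validation complete:", not unverified and not open_problems)
print("Outstanding:", unverified + open_problems)
```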

3.7. Safety Manuals

Sections 7.4.9.3–7.4.9.7 and Annex D

For specific hardware or software items a safety manual is called for. Thus, instrumentation, PLCs, and field devices will each need to be marketed with a safety manual. Reusable items of code and software packages will also require a safety manual. Contents should include, for hardware (software is dealt with in the next chapter):
• a detailed specification of the functions
• the hardware and/or software configuration
• failure modes of the item
• for every failure mode an estimated failure rate
• failure modes that are detected by internal diagnostics
• failure modes of the diagnostics
• the hardware fault tolerance
• proof test intervals (if relevant)

3.8. Modifications

Section 7.8

For all modifications and changes there should be:
• revision control
• a record of the reason for the design change
• an impact analysis
• re-testing of the changed and any other affected modules.
The methods and procedures should be exactly the same as those applied at the original design phase. This paragraph applies to all SILs.
The Standard requires that, for SIL 1, changed modules are re-verified, for SIL 2 all affected modules are re-verified. For software (see Chapter 4) at SIL 3 the whole system is re-validated.

3.9. Acquired Subsystems

For any subsystem which is to be used as part of the safety system, and is acquired as a complete item by the integrator of the safety system, the following parameters will need to be established, in addition to any other engineering considerations.
• Random hardware failure rates, categorized as:
fail safe failures,
dangerous failures detected by auto-test, and
dangerous failures detected by proof test;
• Procedures/methods for adequate proof testing;
• The hardware fault tolerance of the subsystem;
• The highest SIL that can be claimed as a consequence of the measures and procedures used during the design and implementation of the hardware and software; or
• A SIL derived by a claim of “proven in use” (see Section 3.10 below).

3.10. “Proven in Use” (Referred to as Route 2s in the Standard)

The Standard calls the use of the systematic techniques described in this chapter route 1s. Proven in use is referred to as route 2s. It also refers to route 3s but this is, in fact, a matter for Part 3.
As an alternative to all the systematic requirements summarized in this chapter, adequate statistical data from field use may be used to satisfy the Standard. The random hardware failures prediction and SFF demonstrations are, however, still required. The previous field experience should be in an application and environment which is very similar to the intended use. All failures experienced, whether due to hardware failures or systematic faults, should be recorded, along with total running hours. The Standard asks that the calculated failure rates should be claimed using a confidence limit of at least 70% (note that the 2H rule asks for 90%).
Paragraph 7.4.10 of Part 2 allows for statistical demonstration that a SIL has been met in use. In Part 7 Annex D there are a number of pieces of statistical theory which purport to be appropriate to establishing confidence for software failures. However, the same theory applies to hardware failures and for the purposes of the single-sided 70% requirement can be summarized as follows.
For zero failures, the following “number of operations/demands” or “equipment hours” are necessary to infer that the lower limit of each SIL has been exceeded. Note that the operations and years should be field experience and not test hours or test demands.
SIL 1 (1:10¹ or 10⁻¹ per annum): 12 operations or 12 years
SIL 2 (1:10² or 10⁻² per annum): 120 operations or 120 years
SIL 3 (1:10³ or 10⁻³ per annum): 1200 operations or 1200 years
SIL 4 (1:10⁴ or 10⁻⁴ per annum): 12,000 operations or 12,000 years
For one failure, the following table applies. The times for larger numbers of failures can be calculated accordingly (i.e., from chi-square methods).
SIL 1 (1:10¹ or 10⁻¹ per annum): 24 operations or 24 years
SIL 2 (1:10² or 10⁻² per annum): 240 operations or 240 years
SIL 3 (1:10³ or 10⁻³ per annum): 2400 operations or 2400 years
SIL 4 (1:10⁴ or 10⁻⁴ per annum): 24,000 operations or 24,000 years
The 90% confidence requirement would approximately double the experience requirement. The theory is dealt with in Smith DJ, Reliability, Maintainability and Risk.
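These tables can be reproduced with the chi-squared method mentioned above. The following Python sketch (ours, using scipy) gives the operating experience needed for a given number of observed failures and a given single-sided confidence level:

```python
# Operating experience needed to claim a failure rate has not been
# exceeded, at a single-sided confidence level, given k observed failures:
# T = chi-squared(confidence, 2(k + 1)) / (2 * rate).
from scipy.stats import chi2

def required_years(rate_pa: float, failures: int = 0,
                   confidence: float = 0.70) -> float:
    return chi2.ppf(confidence, 2 * (failures + 1)) / (2 * rate_pa)

print(round(required_years(0.1, 0)))        # SIL 1 boundary, zero failures: 12
print(round(required_years(0.1, 1)))        # one failure: 24
print(round(required_years(0.1, 0, 0.90)))  # 90% confidence: 23 (roughly double)
```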

3.11. ASICs and CPU Chips

(a). Digital ASICs and User Programmable ICs

Section 7.4.6.7 and Annex F of the Standard

All design activities are to be documented and all tools, libraries, and production procedures should be proven in use. In the case of common or widely used tools, information about possible bugs and restrictions is required.
All activities and their results should be verified, for example by simulation, equivalence checks, timing analysis, or checking the technology constraints.
For third-party soft cores and hard cores, only validated macro blocks should be used and these should comply with all constraints and procedures defined by the macro core provider, if practicable. Unless already proven in use, each macro block should be treated as newly written code; for example, it should be fully validated.
For the design, a problem-oriented and abstract high-level design methodology and design description language should be used. There should be adequate testability (for production test). Gate and interconnection (wire) delays should be considered.
Internal gates with tristate outputs should be avoided. If internal tristate outputs are used these outputs should be equipped with pull-ups/downs or bus-holders.
Before production, an adequate verification of the complete ASIC (i.e., including each verification step carried out during design and implementation to ensure correct module and chip functionality) should be carried out.
There are two tables in Annex F to cover digital ASICs and programmable ICs. They are very similar and are briefly summarized in one of the tables at the end of this chapter.

(b). Digital ICs with On-Chip Redundancy (up to SIL 3)

Annex E of the Standard

A single IC semiconductor substrate may contain on-chip redundancy subject to conservative constraints and given that there is a Safety Manual.
Establish separate physical blocks on the substratum of the IC for each channel and each monitoring element such as a watchdog. The blocks shall include bond wires and pin-out. Each channel shall have its own separated inputs and outputs which shall not be routed through another channel/block.
Take appropriate measures to avoid dangerous failure caused by faults of the power supply including common cause failures.
The minimum distance between boundaries of different physical blocks shall be sufficient to avoid short circuit and cross talk.
The substratum shall be connected to ground, independently of the IC design process used (n-well or p-well).
The detection of a fault (by diagnostic tests, proof tests) in an IC with on-chip redundancy shall result in a safe state.
The minimum diagnostic coverage of each channel shall be at least 60%.
If it is necessary to implement a watchdog, for example for program sequence monitoring and/or to guarantee the required diagnostic coverage or SFF, one channel shall not be used as a watchdog of another channel, except the use of functional diverse channels.
When testing for electromagnetic compatibility without additional safety margins, the function carried out by the IC shall not be interfered with.
Avoid unsymmetrical wiring.
Beware of circuit faults leading to over-temperature.
For SIL 3, documented evidence shall be provided that all application-specific environmental conditions are in accordance with those taken into account during specification, analysis, verification, and validation. External measures are required that can achieve or maintain a safe state of the E/E/PE system. These measures require medium effectiveness as a minimum. All measures implemented inside the IC to monitor for effects of systematic and/or common cause failures shall use these external measures to achieve or maintain a safe state.
The Standard provides a CCF (Partial Beta type) Model. Partial Beta modeling is dealt with in Section 5.2.2. A Beta of 33% is taken as the starting point. Numbers are added or subtracted from this according to features which either compromise or defend against CCF (common cause failure). It is necessary to achieve a Beta of no greater than 25%. The scoring is provided in Annex E of the Standard and summarized in the last table at the end of this chapter.

3.12. Conformance Demonstration Template

In order to justify that the requirements have been satisfied, it is necessary to provide a documented demonstration.
The following Conformance Demonstration Template is suggested as a possible format. The authors (as do many guidance documents) counsel against SIL 4 targets. In the event of such a case, more rigorous detail from the Standard would need to be addressed.

IEC 61508 Part 2

For new hardware designs with embedded software, the demonstration might involve a reprint of all the tables from the Standard. The evidence for each item would then be entered in the right-hand column as in the simple tables below.
However, the following tables might be considered adequate for relatively straightforward designs.
Under “Evidence” enter a reference to the project document (e.g., spec, test report, review, calculation) which satisfies that requirement. Under “Feature” take the text in conjunction with the fuller text in this chapter and/or the text in the IEC 61508 Standard. Note that a “Not Applicable” entry is acceptable if it can be justified.
The majority of the tables address “Procedures during the life cycle.” Towards the end there are tables which summarize “Techniques during the life cycle.”

General/life cycle (Paragraphs 7.1, 7.3) (Table 1)

Feature (all SILs) | Evidence
Existence of a Quality and Safety Plan (see Appendix 1), including document hierarchy, roles and competency, validation plan, etc.
Description of overall novelty, complexity, reason for SIL targets, rigor needed, etc.
Clear documentation hierarchy (Quality and Safety Plan, functional spec, design docs, review strategy, integration and test plans etc)
Adequately cross-referenced documents which identify the FS requirements.
Adequate project management as per company's FSM procedure
The project plan should include adequate plans to validate the overall requirements. It should state the tools and techniques to be used.
Feature (SIL 3) | Evidence
Enhanced rigor of project management and appropriate independence

SIL, safety integrity level; FS, functional safety; FSM, functional safety management.

Specification (Para 7.2) (Table B1)

Feature (all SILs) | Evidence
Clear text and some graphics, use of checklist or structured method, precise, unambiguous. Describes SR functions and separation of EUC/SRS, responses, performance requirements, well defined interfaces, modes of operation.
SIL for each SR function, high/low demand
Emc addressed
Either: Inspection of the spec, semiformal methods, checklists, CAS tool or formal method
Feature (SIL 2 and above) | Evidence
Inspection/review of the specification
Feature (SIL 3) | Evidence
Use of a semiformal method
Physical separation of EUC/SRS

EUC, equipment under control; SRS, safety-related system; CAS, computer-aided specification; SR, safety related.

Design and development (Para 7.4) (Tables B2, A15–A18)

Feature (all SILs) | Evidence
Use of in-house design standards and work instructions
Sector specific guidance addressed as required
Visible and adequate design documentation
Structured design in evidence
Proven components and subsystems (justified by 10 devices for 1 year)
Modular approach with SR elements independent of non-SR and interfaces well defined.
SR SIL = highest of mode SILs
Adequate component de-rating (in-house or other standards)
Non-SR failures independent of SRS
Safe state achieved on detection of failure
Data-communications errors addressed
No access by user to change hardware or software
Operator interfaces considered
Fault tolerant technique (minimum of a watchdog)
Appropriate emc measures
Feature (SIL 2 and above) | Evidence
Checklist or walkthrough or design tools
Higher degree of fault tolerance
Appropriate emc measures as per Table A17
Feature (SIL 3) | Evidence
Use of semiformal methods
Proven components and subsystems (certified or justified by 10 devices for 2 years)
Higher degree of fault tolerance and monitoring (e.g., memory checks)

SR, safety related; SRS, safety-related system.

Random hardware failures and architectures (Paragraphs 7.4.4, 7.4.5)

Feature (all SILs) | Evidence
SFF and architectural conformance is to be demonstrated OR alternative route (proven in use)
Random hardware failures are to be predicted and compared with the SIL or other quantified target
Random hardware failures assessment contains all the items suggested in Appendix 2 of this book. Include reliability model, CCF model, justification of choice of failure rate data, coverage of all the hazardous failure modes
Feature (SFF ≥ 90%) | Evidence
SFF assessed by a documented FMEA (adequate rigor Appendix 4)
Appropriate choice of type A or type B SFF Table
Feature (SIL 3) | Evidence
Fault insertion (sample) in the FMEA process

SFF, safe failure fraction; FMEA, failure mode effects analysis; CCF, common cause failure.

Integration and test (Paragraphs 7.5, 7.9) (Table B3)

Feature (all SILs) | Evidence
Overall review and test strategy in Quality and Safety Plan
Test specs, logs of results and discrepancies, records of versions, acceptance criteria, tools
Evidence of remedial action
Functional test including input partitioning, boundary values, unintended functions and nonspecified input values
Feature (SIL 2 and above) | Evidence
As for SIL 1
Feature (SIL 3) | Evidence
Include tests of critical logic functions at operational boundaries
Standardized procedures

Operations and maintenance (Paragraph 7.6) (Table B4)

Feature (all SILs) | Evidence
Safety manual in place - if applicable
Component wear out life accounted for by preventive replacement
Proof tests specified
Procedures validated by Ops and Mtce staff
Commissioning successful
Failures (and Actual Demands) reporting Procedures in place
Start-up, shutdown and fault scenarios covered
User friendly interfaces
Lockable switch or password access
Operator inputs to be acknowledged
Basic training specified
Feature (SIL 2 and above) | Evidence
Protect against operator errors OR specify operator skill
Feature (SIL 3) | Evidence
Protect against operator errors AND specify operator skill
At least annual training

Validation (Paragraph 7.7) (Table B5)

Feature (all SILs) | Evidence
Validation plan actually implemented. To include:
• Functional tests
• Environmental tests
• Interference tests
• Fault insertion
• Calibration of equipment
• Records and close out report
• Discrepancies positively handled
Feature (SIL 2 and above) | Evidence
Check all SR functions OK in presence of faulty operating conditions
Feature (SIL 3) | Evidence
Fault insertion at unit level
Some static or dynamic analysis or simulation

Modifications (Paragraph 7.8)

Feature (all SILs) | Evidence
Change control with adequate competence
Impact analysis carried out
Re-verify changed modules
Feature (SIL 2 and above) | Evidence
Re-verify affected modules

Acquired subsystems

Feature (at the appropriate SIL) | Evidence
SIL requirements reflected onto suppliers
Compliance demonstrated

Proven in use (Paragraph 7.10)

Feature (at the appropriate SIL) | Evidence
Application appropriate and restricted functionality
Any differences to application addressed and conformance demonstrated
Statistical data available at 70% confidence to verify random hardware failures target
Failure data validated

Techniques (ASICs & ICs) (Annex F) (Summary): In general, the following summary can be assumed to apply for all SILs. The Standard provides some graduation in the degrees of effectiveness.

Design phase | Technique/measure | Evidence

Design entry:
• Structured description in (V)HDL with proven simulators
• Functional test on module and top level
• Restricted use of asynchronous constructs
• Synchronization of primary inputs and control of metastability
• Coding guidelines with defensive programming
• Modularization
• Design for testability
• Use of Boolean if programmable ICs

Synthesis:
• Simulation of the gate netlist, to check timing constraints, or static analysis of the propagation delay (STA)
• Internal consistency checks
• Verification of the gate netlist
• Application of proven in use synthesis tools and libraries

Test insertion and test pattern generation:
• Implementation of test structures and estimation of the test coverage by simulation (ATPG tool)
• Simulation of the gate netlist, to check timing constraints, or verification against reference model

Placement, routing, layout generation:
• Proven in use or validated hard cores with online testing
• Simulation or verification of the gate netlist, to check timing constraints, or static analysis of the propagation delay (STA)

Chip production:
• Proven in use process technology with QA


(V)HDL: very high speed integrated circuit hardware description language.

Assessment of CCF (CPUs) (Annex E) (see Section 3.11)

Technique/measure decreasing β | β-factor (%)
Diverse measures or functions in different channels | 4–6
Testing for emc with additional safety margin | 5
Providing each block with its own power supply pins | 6
Isolate and decouple physical locations | 2–4
Ground pin between pin-out of different blocks | 2
High diagnostic coverage (≥99%) of each channel | 7–9

Technique/measure increasing β | β-factor (%)
Watchdog on-chip used as monitoring element | 5
Monitoring elements on-chip other than watchdog, for example clock | 5–10
Internal connections between blocks by wiring between output and input cells of different blocks without cross-over | 2
Internal connections between blocks by wiring between output and input cells of different blocks with cross-over | 4

Beta = 33% plus items increasing beta and minus items decreasing beta. The target beta must be set at less than 25%.
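The scoring can be sketched in a few lines of Python (ours; the percentage values are taken from the table above, and the 25% ceiling from the text):

```python
# Annex E style Partial Beta scoring for on-chip redundancy: start at 33%,
# subtract the claimed "decreasing" items, add the applicable "increasing"
# items, and check against the 25% ceiling.
def on_chip_beta(decreases: list[float], increases: list[float]) -> float:
    return 33.0 - sum(decreases) + sum(increases)

# Example: diverse channels (6), separate supply pins (6), ground pins (2),
# but an on-chip watchdog is used as the monitoring element (+5).
beta = on_chip_beta([6, 6, 2], [5])
print(beta, "acceptable" if beta < 25 else "NOT acceptable")  # 24.0 acceptable
```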
