Chapter 6. Delay Testing

Duncan M. (Hank) Walker, Texas A&M University, College Station, Texas

Michael S. Hsiao, Virginia Tech, Blacksburg, Virginia

About This Chapter

Delay testing is used to verify that a circuit meets its timing specifications. This chapter introduces delay testing of digital logic in synchronous circuits and then focuses on delay tests that target the circuit structure. Approaches for applying delay tests in scan designs are first discussed, including issues in clocking. Delay fault models are then described along with their sensitization criteria. The chapter includes a discussion of recent research in defect-based delay fault models.

The second half of the chapter discusses delay fault simulation and automatic test pattern generation. Delay fault simulation is used to grade the coverage of functional test patterns, as well as to drop faults that are fortuitously detected while targeting other faults.

Through this chapter, the reader will learn about the major delay fault modeling, simulation, and test generation techniques and their application to modern digital circuits. This background will be valuable for selecting the delay test methodology that best meets the design needs.

Introduction

Timing is one of the specifications that must be verified during digital circuit testing. This type of test is referred to as a delay test. Sometimes delay testing is referred to as AC testing to distinguish it from the DC test conditions of stuck-at faults. Delay testing is becoming increasingly important, as timing margins are reduced to maximize performance while minimizing power dissipation.

There are three basic approaches to delay testing: random, functional, and structural. Random patterns applied at the rated clock speed will detect most delay defects that cause circuit failure. The advantages of random pattern testing are that test generation is not required; the patterns can be generated on-chip with built-in self-test (BIST) hardware (as discussed in Chapter 2); the patterns are applied one after the other, as in normal operation; and the same hardware targets all types of defects, not just delay defects. The disadvantages of random pattern testing are that it provides poor coverage of the longest circuit paths and may produce circuit activity much higher than normal. This higher activity generates higher-than-normal chip temperatures and power supply noise, causing the circuit to operate slower than normal. This may cause chips to be incorrectly rejected.

Functional tests applied at the rated clock speed provide the most accurate delay testing, because the chip is tested in the same way it is used in normal operation. Instruction set processors such as microprocessors, digital signal processors, and microcontrollers can be tested at low cost by loading instruction sequences into on-chip cache or memory and executing them. The primary disadvantages of functional testing are that it is difficult to write high-coverage test sequences, and designs without large, on-chip instruction memories require an expensive, full-featured tester to apply the functional patterns.

Structural testing uses knowledge of the circuit structure and the corresponding delay fault models to generate delay test patterns. The advantages of structural testing are that the test pattern generation process is automated, and it can achieve high fault coverage. It also makes it considerably easier to diagnose delay faults. The main disadvantages are that simplifying assumptions must be made in the delay fault model to make the problem tractable, the design must include design for testability (DFT) features to achieve high coverage, and the delay test application is very different from normal operation. A structurally testable path may not be functionally feasible. Therefore, care must be taken to avoid over-testing the false long paths.

This chapter focuses on structural delay testing of synchronous digital circuits. We consider circuits represented by gate-level primitives, including storage elements, such as flip-flops and latches. The circuit may also contain black boxes — that is, undefined modules. These most commonly represent embedded memory arrays or analog blocks.

The Huffman model of a synchronous sequential circuit is shown in Figure 6.1. The circuit consists of combinational logic and flip-flops synchronized by a common clock. The inputs to the combinational logic consist of primary inputs (PI), x1, x2, ..., xn, and the flip-flop outputs y1, y2, ..., yl (also called pseudo primary inputs [PPI]). Its outputs consist of primary outputs (PO), z1, z2, ..., zm, and the flip-flop inputs Y1, Y2, ..., Yl (also called pseudo primary outputs [PPO]).

Huffman model of sequential digital circuits.

Figure 6.1. Huffman model of sequential digital circuits.

Some delay fault models require knowledge of circuit delays in order to test them. Most delay test approaches use no more than the rising and falling delays from each gate input to output, with the interconnect delay lumped into the gate delay. This can be split into the gate transport delay and interconnect propagation delay to compute the delay to each fanout separately. To analyze glitch behavior during testing, the inertial delay must also be considered. The inertial delay defines the minimum pulse width that propagates through a gate. Process variation can be modeled using the min-max delay model, but the correlation between gate delays is usually not available.
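The delay data described above can be sketched as a small data structure. This is an illustrative model only, with invented names and numbers: per-pin min-max rising and falling delays (interconnect lumped into the gate delay) plus an inertial delay used to decide whether a glitch propagates.

```python
from dataclasses import dataclass

# Hypothetical per-gate timing record: separate min-max rising and falling
# delays from each input pin to the output, plus an inertial delay that
# defines the minimum pulse width able to propagate through the gate.
@dataclass
class PinDelay:
    rise_min: float
    rise_max: float
    fall_min: float
    fall_max: float

@dataclass
class GateTiming:
    pin_delays: dict       # input pin name -> PinDelay
    inertial_delay: float  # minimum propagating pulse width

    def propagates_pulse(self, width: float) -> bool:
        """A pulse narrower than the inertial delay is filtered out."""
        return width >= self.inertial_delay

# Example: a 2-input gate with asymmetric rise/fall delays (invented units).
gate = GateTiming(
    pin_delays={"a": PinDelay(0.9, 1.2, 0.6, 0.8),
                "b": PinDelay(1.0, 1.3, 0.7, 0.9)},
    inertial_delay=0.5,
)
print(gate.propagates_pulse(0.3))  # narrow glitch is filtered
print(gate.propagates_pulse(0.7))  # wide enough pulse propagates
```

A min-max analysis under process variation would draw each delay from the [min, max] interval; as the text notes, the correlation between gate delays is usually not available.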

Delay testing requires that transitions be launched into the circuit from the PIs and PPIs, and the outputs captured at the POs and PPOs at the specified time. The outputs are compared to the correct results to detect any delay faults. Launching transitions into the circuit requires a two-pattern test. The first vector initializes the circuit and is termed the initialization vector. The second vector launches the transitions and is termed the test vector.
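The two-pattern test above can be illustrated with a short sketch: the initialization vector sets the starting state, and every bit position where the test vector differs launches a rising or falling transition. The function name and bit-string encoding are illustrative.

```python
# A two-pattern delay test launches a transition on every input where the
# initialization vector V1 and test vector V2 differ.
def launched_transitions(v1: str, v2: str):
    """Return (bit position, transition direction) for each launched edge."""
    edges = []
    for i, (a, b) in enumerate(zip(v1, v2)):
        if a != b:
            edges.append((i, "rising" if b == "1" else "falling"))
    return edges

# V1 initializes the inputs; V2 launches a rising edge on bit 0 and a
# falling edge on bit 2.
print(launched_transitions("0110", "1100"))  # [(0, 'rising'), (2, 'falling')]
```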

Delay Test Application

In this section, we discuss how delay tests are applied to synchronous sequential circuits. As with testing for stuck-at faults, automatic generation of delay tests for large sequential circuits is impractical. DFT circuitry must be inserted to enhance the controllability and observability of the circuit. As discussed in Chapter 2, this is primarily done by connecting most latches and flip-flops together into scan chains. These scan chains provide direct access to the PPIs and PPOs. They are used to initialize the circuit for the delay test and to read out the results. The difference from stuck-at testing is that delay testing requires a two-pattern test. This presents a problem, because in normal operation, the flip-flops or latches only store one value. There are two common approaches to applying the two patterns: enhanced scan and muxed-D scan.

Enhanced Scan

In enhanced scan, each storage element contains the corresponding bits of both the initialization and test vector. This can be implemented in several ways. In a flip-flop design, the flip-flops can be connected in a scan chain, and their outputs connected to holding latches. After the initialization vector has been scanned into the flip-flops, the vector is transferred to the holding latches, initializing the circuit under test. The test vector is then scanned in and transferred to the holding latches to apply the test to the circuit. This approach has the advantage that the initialization and test vectors are independent of one another. A second advantage is that the circuit PPIs are fixed while the test vector is scanned in, drastically reducing test power. Test power is discussed in Chapter 7. Because the initialization and test vectors are independent, a third advantage of enhanced scan is increased delay fault coverage. The disadvantage of enhanced scan is the extra area, power, and delay associated with the holding latch. The area and delay overhead of the holding latch preclude the use of enhanced scan in most designs.

Muxed-D Scan

Most designs implement scan by using muxed-D flip-flops, as discussed in Chapter 2. That is, the D (data) input of a D-type flip-flop is fed by a 2:1 multiplexer, with one multiplexer input fed by the circuit logic (the PPO) and the other fed by the previous flip-flop in the scan chain. The scan enable (SE) signal controls the multiplexer. The advantage of the muxed-D scan design is that the system clock is used to clock the flip-flops in both normal and test modes, and the design has low area and delay overhead. The disadvantage of the muxed-D design approach is that PPIs change with each shift of the scan chain, resulting in high power dissipation during scan-in or scan-out. This is discussed further in Chapter 7.

Scan Clocking

Delay test pattern application in scan designs uses a variable clock approach. The test pattern is slowly shifted into the scan chains, fast system clock cycles are used to apply the test, and then slow shift cycles are used to shift out the result (and shift in the next test pattern). This approach makes each test pattern independent. The slow shift speed avoids critical timing along the serial data path of the scan chain and reduces the power dissipated during scan shifting.

For enhanced scan, the initialization vector is scanned in and then loaded into the hold latches. The test vector is then scanned in and launched when the hold latches are loaded, and then it is captured with the system clock. The disadvantage of this approach is that the hold latch enable must be able to operate at system speed, so that the latch enable to system clock timing is accurate. In contrast, scan testing using muxed-D flip-flops uses the system clock to launch and capture patterns, so there are no timing correlation issues related to the clock tree.

The muxed-D scan design has only one bit stored in each PPI. Because the flip-flop input is only connected to the PPOs of the circuit under test and the neighboring scan cell, this constrains the relationship between the initialization and test vectors. The initialization vector is scanned in, so it does not have any constraints. The test vector, however, can only be derived in two ways. The first is to have the test vector be a 1-bit shift from the initialization vector. This is generated by shifting the scan chain by 1 bit. This 1-bit shift to launch the transition is known as launch-on-shift (LOS), launch-off-shift, or skewed load [Patel 1992] [Savir 1992]. The procedure for applying a LOS test is as follows:

  1. The circuit is set to scan mode. The initialization vector is scanned in, and values are set on PIs.

  2. The test vector is obtained by shifting the scan chain by 1 bit (applying one system clock cycle). Usually the PIs do not change values because of the timing constraints of low-cost testers.

  3. The circuit is set to normal operation by flipping the scan enable and pulsing the system clock at the rated speed to capture the circuit PPO values in the flip-flops. The values on POs are captured if necessary.

  4. The circuit is set to scan mode and the results are shifted out. This step is overlapped with step 1.
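The LOS constraint in the steps above can be made concrete with a sketch: the test vector is forced to be a 1-bit shift of the initialization vector, with the newly shifted-in bit coming from the scan input. The shift direction and names are illustrative assumptions.

```python
# Sketch of the launch-on-shift (LOS) constraint: the test vector V2 is
# obtained by shifting the scan chain one more bit, so V2 is fully
# determined by the initialization vector V1 and one scan-in bit.
def los_test_vector(init_vector: str, scan_in_bit: str) -> str:
    # Model the chain as shifting toward higher indices: each cell takes its
    # predecessor's value, and the first cell takes the scan input.
    return scan_in_bit + init_vector[:-1]

v1 = "10110"
v2 = los_test_vector(v1, "0")
print(v2)  # "01011": a 1-bit shifted version of V1
```

This is why a combinational ATPG can enforce the LOS pattern constraint cheaply: V2 is not a free variable, only V1 and the scan-in bit are.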

The advantage of LOS is that a combinational ATPG can enforce the pattern constraint with only minor modifications; fast test generation methodologies for combinational circuits can be applied without many modifications; and the constraint can be met for most faults, minimizing test patterns and test generation time. The primary disadvantage of LOS is that in a muxed-D scan implementation, after the shift has occurred to launch the transition, the scan enable must be flipped at the rated system clock speed, so that the PPOs can capture the test result on the next system clock cycle. This requires designing the scan enable to have the same timing performance as the system clock. This requires too much power, area, and design effort for most designs. An alternative is to have separate scan enables for the launch and capture flip-flops, so that the launch SE can remain in scan mode while the capture SE is already in capture mode. Multiple scan enables are sometimes used to reduce test power dissipation (see Chapter 7). This places constraints on the scan design, because the inputs and outputs of a combinational logic block must be controlled by separate scan enables. Level sensitive scan design (LSSD) based scan can use LOS because it uses separate clocks for the L1 and L2 latches [Savir 1992]. Another disadvantage of LOS is that some false long paths may be exercised. Because the test vector is the shifted version of the initialization vector, such a state transition may not be possible in the functional mode, thereby allowing nonfunctional behavior.

If LOS cannot be used, the alternative is to derive the test vector from the PPOs feeding the flip-flops. The PPO values in turn are generated by the initialization vector. This is known as launch-on-capture (LOC), launch-off-capture, functional justification, double-capture [Wang 2006a], or broad-side test [Savir 1994]. The procedure for applying a LOC test is as follows:

  1. Same as LOS step 1.

  2. The scan enable is set to normal operation. Dummy cycles are inserted as needed to give the SE time to flip. Figure 6.2a shows the clock waveform with one dummy cycle. Figure 6.2b shows the clock waveform if no dummy cycle is needed.

    Launch-on-capture clock waveforms.

    Figure 6.2. Launch-on-capture clock waveforms.

  3. The system clock is pulsed twice at the rated clock speed. At the first clock pulse, the test vector is derived from the initialization vector. At the second clock pulse, the results are captured. The values on POs are captured if necessary.

  4. Same as LOS step 4.
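The LOC constraint in the steps above can likewise be sketched: the test vector is produced by the circuit itself as the next-state function of the initialization vector, so it is always a reachable state. The toy 3-bit counter below stands in for the real combinational logic; all names are illustrative.

```python
# Sketch of launch-on-capture (LOC): V2 = next_state(V1), derived by the
# first capture clock rather than scanned in.
def next_state(ppis: int) -> int:
    return (ppis + 1) % 8  # toy 3-bit counter standing in for the PPO logic

def loc_test_pair(init_state: int):
    """First rated-speed clock derives V2 from V1; second clock captures."""
    v1 = init_state
    v2 = next_state(v1)  # launch: flip-flops capture the PPO values
    return v1, v2

print(loc_test_pair(5))  # V1 = 5, V2 = 6 for the toy counter
```

Because V2 is constrained to be a successor state of V1, the ATPG must justify the test vector back one time frame, which is exactly the disadvantage discussed below in Section 6.2.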

The advantages of the LOC approach are that the SE has no timing constraint, and the test vector is a legal state of the circuit, assuming the initialization vector is a legal state. This reduces the testing of false paths, which can lead to yield loss. This is discussed further in Section 6.7. The primary disadvantage of LOC is the need to justify the test vector back one time frame to determine the initialization vector, increasing test generation time, and pattern count. This constraint also makes some faults untestable. This was listed earlier as an advantage, but a sequentially redundant fault may present a reliability hazard. If such redundant faults occur in online error-checking hardware, they must be tested to verify the correct function of this hardware.

Faster-Than-At-Speed Testing

The preceding discussion assumes that the timing between the launch and capture of a test pattern uses the rated system clock period. The drawback is that some test generation methods, particularly for transition faults, typically use a breadth-first search algorithm that tends to select short paths through each fault site. As a result, when tested at rated speed, many paths have considerable timing slack, and so only relatively large delay defects can be detected. One solution is to generate one or more longest paths through each fault site [Majhi 2000] [Sharma 2002] [Qiu 2003a], as discussed in Section 6.6. However, the increased path length increases test data volume [Qiu 2004a]. Rather than maximizing the length of the tested paths, the alternative is to shrink the capture clock timing to minimize the slack for each pattern [Mao 1990]. However, separate timing for each pattern drastically increases test data volume. An alternative is to group patterns into sets of almost equal-length paths [Kruseman 2004]. The test engineer must trade off between the number of groups and test data volume. Because the chip is being tested at faster than its rated speed, logic transitions may still be occurring when the capture clock is applied. Those flip-flops fed by paths that exceed the cycle time or containing hazards must be masked off. The faults that are not detected because of masking are targeted by patterns run at the next lower clock speed [Barnhart 2004]. Applying transition fault patterns at faster than the rated speed has been shown to catch small delay defects that escape traditional transition fault tests [Kruseman 2004] [Amodeo 2005].
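The pattern-grouping idea attributed to [Kruseman 2004] above can be sketched as follows: patterns are binned into groups of nearly equal tested-path length, and each group shares one faster-than-at-speed capture period. The path lengths, group count, and period choice (the longest path in each group, so no tested path has negative slack) are illustrative assumptions.

```python
# Sketch: bin test patterns by tested-path length and assign each group a
# shared capture period equal to its longest member, minimizing slack
# without per-pattern clock timing.
def group_patterns(path_lengths, num_groups):
    """Return {pattern index: capture period} for each pattern."""
    ordered = sorted(range(len(path_lengths)),
                     key=lambda i: path_lengths[i], reverse=True)
    size = -(-len(ordered) // num_groups)  # ceiling division
    periods = {}
    for g in range(num_groups):
        members = ordered[g * size:(g + 1) * size]
        if not members:
            continue
        period = max(path_lengths[i] for i in members)
        for i in members:
            periods[i] = period
    return periods

lengths = [9.8, 4.1, 7.5, 3.9, 8.8, 5.0]  # tested path delay per pattern, ns
print(group_patterns(lengths, 2))
```

More groups shrink the slack of each pattern but increase test data volume, which is the trade-off the test engineer must make.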

Faster-than-at-speed testing can be applied whenever test patterns propagate on paths with timing slack. There are two primary concerns when using this test approach. The first is the accuracy of the timing models. If paths are slower than predicted, a faster-than-at-speed test can reject chips that pass at the rated speed. Timing margin must be added statically or dynamically, reducing the effectiveness at screening small delay defects. The second problem is the additional power supply noise introduced by operating at faster than rated speed. The power grid has less time to recover from the launch, so the circuit may operate more slowly than it would if clocked at the rated speed.

Delay Fault Models

A delay defect is a defect that causes an extra delay in the circuit. An example is a spot defect that causes a resistive short or open. To make the delay testing problem tractable, delay defect behavior must be abstracted to a delay fault. A large number of delay fault models have been proposed. In the following sections, we discuss some popular delay fault models.

Transition Fault Model

The most commonly used delay fault model is the transition fault (TF) model [Barzilai 1983]. In this model, a gate input or output is assumed to have a slow-to-rise (STR) or slow-to-fall (STF) delay fault. For every stuck-at fault in a circuit, there is a corresponding transition fault (TF). The delay of the TF is assumed large enough that any path through it will be detected as slow—that is, the delay increase because of the TF must exceed the slack of the shortest path through that line.

The primary advantage of the TF model is that test generation does not need to consider circuit timing. A stuck-at fault test generator can be modified to meet the additional requirements for generating TF tests [Waicukauski 1987]. Because a stuck-at fault can be considered a very slow TF, a TF test set will detect all the corresponding stuck-at faults. The TF model has more constraints than the stuck-at fault model, so the TF coverage is normally lower than stuck-at fault coverage. Top-off vectors can be generated to test the stuck-at faults not detected by the TF test set.

The primary disadvantage of the TF model is that its resolution is limited by the difference in delay between the longest and shortest path through the fault site. Stuck-at and TF test generators normally select the easiest path to test, which is the shortest path, because this has the fewest necessary assignments. This problem can be avoided by propagating along low-slack paths [Lin 2006a]. Another major problem with the TF model is that TF tests often only propagate a glitch from the fault site, which may not be detected unless the delay fault is large [Lin 2005a].

Inline-Delay Fault Model

Production experience shows that many delay defects are due to resistive interconnect vias, which cause both the rising and falling transitions through that line to be slow [Benware 2003]. This can be modeled by the inline-delay fault. This fault is analogous to the TF, except that only one of the STR or STF faults must be detected at each fault site. A test set for inline-delay faults is smaller than a test set for TFs.

Even though resistive vias are more common in newer technologies, misshapen transistors, resistive contacts, and resistive shorts can still cause STR or STF faults without the opposite slow transition. Therefore, high delay fault coverage still requires a TF test set. If a technology contains many inline-delay faults, test generation can first be done for them, and then top-off TF patterns can be generated. If testing stops on the first failure, this combination of patterns will reduce average test time without loss of TF coverage.

Gate-Delay Fault Model

A spot defect that causes a small delay fault for a particular transition on a gate input or output can be modeled as a gate-delay fault, also termed a local delay fault [Carter 1987]. Here “small” is a delay that is larger than the minimum slack but smaller than the maximum slack for that line. Testing such faults requires testing a long or longest path with the appropriate transition through the fault site. The quality of a test set is defined by how close the minimum detected delay fault sizes are to the minimum detectable fault sizes [Iyengar 1988].
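The size-resolution argument above reduces to simple arithmetic: a gate-delay fault of size d on a tested path is detected only if d exceeds that path's slack, so the minimum detected size equals the clock period minus the tested path delay. The numbers below are illustrative.

```python
# A fault is detected by a tested path only if its extra delay exceeds the
# path's slack, so the minimum detected fault size is the slack itself.
def min_detected_size(t_clk, tested_path_delay):
    return t_clk - tested_path_delay

T = 10.0  # rated clock period (illustrative units)
short_path, longest_path = 4.0, 9.5
print(min_detected_size(T, short_path))    # 6.0: only gross defects caught
print(min_detected_size(T, longest_path))  # 0.5: small defects also caught
```

This is why testing the longest sensitizable path through a fault site brings the minimum detected size close to the minimum detectable size.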

The line delay fault model [Majhi 2000] extends the gate-delay fault model to test a rising or falling delay fault on each line in the circuit. Detecting the smallest delay defect that can cause failure requires propagating along the longest sensitizable path through that line.

Path-Delay Fault Model

The path-delay fault model [Smith 1985] models the distributed delay on a path. A path is a sequence of gates from PIs to POs, with a transition on each gate output along the path. The gate input on the path is referred to as the on-input, or on-path input, whereas the other inputs are side inputs or off-inputs. If the circuit contains a path that is slow for a rising or falling transition, then it contains a path-delay fault. Unless explicitly mentioned, we refer to the transition direction at the input to the path. The path-delay fault model assumes that any path can have any delay, so fault coverage is defined as the fraction of all paths (or all sensitizable paths) tested. The path-delay fault model is more general than the fault models discussed above, because a path-delay fault test will detect the corresponding transition, inline, or gate-delay faults, as well as distributed delay resulting from process variation (global delay faults) or supply noise.

The primary drawback of the path-delay fault model is that the number of paths can be exponential in the size of the circuit, so computing and achieving high coverage is difficult. For example, ISCAS-1985 benchmark circuit c6288, a 16-bit multiplier, has close to 10^20 paths. Almost all of the long paths in this circuit are false paths [Qiu 2003c]. Coverage estimates can be improved by eliminating untestable paths [Cheng 1993] [Lam 1993], but these methods are expensive. Path-delay testing is primarily used to test a set of longest (or critical) paths provided by static timing analysis.

A fault model intermediate between gate and path delay is the segment delay fault model. It assumes that a segment along a path is slow [Heragu 1996]. Because the segment length is bounded, the number of segment faults is linear in the circuit size. The propagation delay fault model combines the transition fault and path-delay fault models [Lin 2005a]. It assumes that the sum of the local delay and the distributed delay of the fault propagation path causes a failure, with at least one robust propagation path.

Defect-Based Delay Fault Models

The fault models described thus far do not adequately capture the delay defects that are more common in advanced technologies. As a result, such models cannot achieve the desired product quality. In practice, delay faults can be a combination of local and global disturbances. Delay faults caused by local disturbances are termed local delay faults, and those caused by global disturbances are termed global delay faults [Luong 1996] or distributed path-delay faults [Sivaraman 1996]. Resistive shorts and opens and capacitive crosstalk cause local delay faults, whereas power supply noise, intradie temperature distributions, and process variation cause global delay faults. The combined delay fault (CDF) model was developed to incorporate all of these effects [Qiu 2003b, 2004b]. Resistive short, open, and capacitive crosstalk locations can be extracted from the layout design, and are roughly linear in circuit size [Stanojevic 2001]. The circuit-level delay behavior of shorts, opens, and crosstalk can be abstracted to rules for test generation [Chen 1999] [Krstic 2001] [Li 2003]. Linear approximations [Lu 2005] can be used for nonlinear process correlation effects [Luong 1996]. A Monte Carlo approach can identify potentially longest paths under process variation [Liou 2002a] [Krstic 2003]. The statistical delay quality model incorporates process variation and defect characteristics into the coverage model [Sato 2005].

Rather than directly targeting a delay fault model, defect-based delay tests can attempt to maximize observations of each fault site. For example, the N-detect stuck-at test [Ma 1995] can be extended to target transition faults [Pomeranz 1999]. A related approach is to generate transition tests to each reachable output of a transition fault site, that is, transition fault propagation to all reachable outputs (TARO). The TARO test was shown to achieve high defect coverage [Tseng 2001] [McCluskey 2004] [Park 2005]. Similarly, the DOREME [Dworak 2004] test approach that increases the excitation and observation of fault sites increases the detection of delay defects.

Technology scaling has led to increased power supply noise, which can cause the chip to incorrectly fail at rated speed during testing [Saxena 2003]. Test generation must take power supply noise into account [Krstic 1999] [Liou 2003]. The primary challenge is accurate supply noise (delay) estimation at low cost [Wang 2006b]. An alternative is to use on-chip test structures to calibrate the supply noise [Rearick 2005]. See Section 7.3 for a further discussion of noise during testing.

The CDF coverage metric [Qiu 2004b] was developed to accurately reflect test quality, considering realistic delay defects. In the CDF coverage metric, the correlation in path delays is used to substantially reduce the pessimism of the path-delay fault model while avoiding the optimism of the transition and gate-delay fault models. The CDF coverage metric exploits structural and spatial correlation between paths [Luong 1996] [Lu 2005]. Two paths have structural correlation when they share a common path segment. For example, in Figure 6.3, paths a-d-e and b-d-e are structurally correlated because they share segment d-e. Two paths can also have spatial correlation because the path delays are functions of the manufacturing process parameters, which are spatially correlated. For two paths that are physically close to each other, the delay correlation is high because the paths have similar process parameters.

Example of structurally correlated paths.

Figure 6.3. Example of structurally correlated paths.

Figure 6.4 shows the delay space [Luong 1996] for two paths, assuming the path delay is a one-dimensional function of process parameters. The delay space is the bounded region in which the probable delay value combinations are represented. It is assumed that each path has min-max delays. If the two paths have no correlation, the delay value combination can be anywhere within the rectangle. If they are perfectly correlated, the delay space shrinks to a line, which means if path 1 has max (min) delay under a combination of certain process parameters, path 2 also reaches its max (min) delay under the same combination of process parameters. In reality, the correlation is somewhere in between, and the realistic delay space is the shaded area. Using correlation information, the delays on the untested paths can be predicted by the delays on the tested paths [Brockman 1989], reducing the paths that must be tested.

Delay spaces for different path correlations.

Figure 6.4. Delay spaces for different path correlations.

In terms of probability, the coverage for test set t is the probability that a defective circuit fails the test set:

Equation 6.1. FC(t) = Prob(circuit fails test set t | circuit has a delay fault)

Under the CDF model, a single local delay fault on a line is assumed, termed the fault site. Any path through the site is subject to process variation. A delay fault may be caused by a local delay fault, a global delay fault, or a combination of the two.

CDF detection is probabilistic, instead of deterministic. For example, suppose there are two paths, P1 and P2, through a fault site, and the local extra delay is not large enough for either path to be definitely slow. Figure 6.5 shows the delay space [Sivaraman 1996] for this fault. t_max is the maximum specified delay of the circuit. The circuit has some probability that path 1 or 2 is slow. If test set t1 tests path 1 only and test set t2 tests path 2 only, then both t1 and t2 are required to guarantee detection of the fault.

Delay space of a fault.

Figure 6.5. Delay space of a fault.

Using this probability model, Formula (6.1) can be translated into Formula (6.2) to compute the detection probability (DP) for fault site i with local extra delay Δ (the size of the local delay fault):

Equation 6.2. DP_i(Δ, t) = Prob(circuit fails test set t | fault site i has local extra delay Δ)

In the example shown in Figure 6.5, if test set t tests path 1 only, the DP is area(B∪C)/area(A∪B∪C); if t tests path 2 only, the DP is area(B∪A)/area(A∪B∪C); and if t tests both paths, the DP is 100%.

The preceding analysis is for a fixed local extra delay Δ. For fault site i with an arbitrary Δ, the DP for site i is computed as:

Equation 6.3. DP_i(t) = [∫_{Δ > Δ0,i} DP_i(Δ, t) · p_i(Δ) dΔ] / [∫_{Δ > Δ0,i} p_i(Δ) dΔ]

where Δ0,i is the minimum slack at the fault site and p_i(Δ) is the probability density function (PDF) of Δ at fault site i, based on the PDF of resistive shorts and opens. The overall fault coverage for test set t is:

Equation 6.4. FC(t) = Σ_i w_i · DP_i(t)

where w_i is the weight for fault site i (Σ_i w_i = 1). Fault sites might be weighted based on the probability of a delay fault occurrence, such as the critical area.

Based on Formula (6.2), if the path delays are correlated, the DP computation is dependent on the delay space. For example, in Figure 6.5, the areas of A, B, and C change if the delay space changes. Because accurate spatial delay correlations are usually not known, coverage can be computed assuming no correlation and 100% correlation. This provides lower and upper bounds on the coverage.
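The no-correlation and full-correlation bounds above can be sketched with a small Monte Carlo experiment: two paths through one fault site each get a nominal delay, a local delay fault of size delta, and a process-variation term whose correlation is controlled by rho. All delay values, distributions, and the correlation mix are invented for illustration.

```python
import random

# Monte Carlo sketch of the CDF detection probability: a fault of size
# `delta` sits on two paths; `rho` blends a shared (correlated) process
# variation term with an independent one. rho = 0 and rho = 1 correspond
# to the lower- and upper-bound delay spaces discussed in the text.
def detection_probability(delta, tested_paths, rho, t_max=10.0, n=20000):
    random.seed(1)                        # deterministic sketch
    nominal = {1: 8.0, 2: 7.5}            # nominal delays of paths 1 and 2
    detected = 0
    for _ in range(n):
        shared = random.gauss(0.0, 0.5)   # correlated process variation
        delays = {}
        for p, nom in nominal.items():
            local = random.gauss(0.0, 0.5)          # independent variation
            variation = rho * shared + (1.0 - rho) * local
            delays[p] = nom + variation + delta
        if any(delays[p] > t_max for p in tested_paths):
            detected += 1
    return detected / n

# Testing both paths can only raise the detection probability.
dp_one = detection_probability(1.0, [1], rho=0.0)
dp_both = detection_probability(1.0, [1, 2], rho=0.0)
print(dp_one <= dp_both)  # True
```

Running the same experiment at rho = 0 and rho = 1 gives the lower and upper coverage bounds when the actual spatial correlation is unknown.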

The CDF coverage metric (Formulae 6.2 through 6.4) is inexpensive to compute, because only a small subset of paths must be considered. For example, Figure 6.6 shows the delays of four paths through a fault site, each having a delay distribution because of process variation. Suppose path P1 is tested by t, and the longest testable path P0 is not tested. When Δ0 < Δ < Δ1, DP_i(t) is 0; when Δ > Δ2, DP_i(t) is 100%, because the tested path P1 is definitely slow; when Δ1 < Δ < Δ2, DP_i(t) increases from 0 to 100% as Δ increases. Thus, the fault coverage computation is required only in this interval. The main cost is computing the fault efficiency, which is the number of tested faults divided by the number of testable faults. This requires checking the sensitization of all paths whose length falls within the interval where DP_i(t) rises from 0 to 100%. A lower bound on coverage can be quickly computed by assuming all structural paths are testable. Experiments on ISCAS-1985 circuits show the error of this approximation is less than 4% [Qiu 2004b].

Fault coverage computation.

Figure 6.6. Fault coverage computation.

The CDF fault coverage metric suggests a test strategy:

  1. Apply transition fault tests to detect large local delay faults. In practice, most local delay faults are large.

  2. Test one of the longest paths through each gate or line at the rated clock speed to eliminate or reduce the 0-DP area between Δ0 and Δ1 in Figure 6.6, because this is the second largest source of coverage loss.

  3. Test more potentially longest paths (such as P2 in Figure 6.6, if P0 does not exist) to increase the DP between Δ1 and Δ2.

Figure 6.7 shows the conceptual relationship between fault coverage and the percentage of tested paths, sorted in decreasing order of length. If there is no local delay fault, the fault coverage increases quickly and reaches 100% after all potentially critical paths are tested. The curves for the CDF coverage metric have "jumps" where the first path through a fault site is tested. Traditional path-delay fault coverage, in contrast, simply reports the percentage of testable paths that have been tested.

Figure 6.7. Fault coverage versus percentage of tested paths.

Simulation experiments with the CDF coverage metric on ISCAS-1985 benchmark circuits [Qiu 2004b] show that a TF test set achieves reasonably high coverage (98.96%), but lower than what can be achieved by a K longest paths per gate (KLPG) test set (see Section 6.6.4), or a combination of TF and a set of critical paths. Figure 6.8 shows the fault efficiency for c7552, assuming the number of longest rising and falling paths per line increases from 1 to 5. As the figure shows, only a small number of longest paths are needed through each fault site to achieve high fault efficiency. In addition, the benefit of testing one longest rising and one longest falling path (the fault efficiency increase from the transition fault test to the K = 1 test) is more significant than testing more long paths (increasing K from 1 to 5).

Figure 6.8. Fault efficiency for c7552, assuming the K longest rising and falling paths through each gate are tested (K = 1 to K = 5). UB is the upper bound (no intradie process variation), and LB is the lower bound (random intradie process variation).

Delay Test Sensitization

A path is said to be testable if a rising/falling transition can propagate from the primary input to the primary output associated with the path, under certain sensitization criteria [Lin 1987] [McGeer 1989] [Benkoski 1990] [Chang 1993]. If a path is not testable, it is called an untestable or false path [Liou 2002a, 2002b]. For example, in Figure 6.9, path a-c-d is a false path under the single-path sensitization criterion, because to propagate a transition through the AND gate requires line b to be logic 1 and to propagate the transition through the OR gate requires line b to be logic 0. We use the terms “untestable” and “false” interchangeably.

Figure 6.9. A circuit with a false path a-c-d.

Tests that target paths, including gate-delay, line-delay, path-delay, and KLPG tests, have differing quality, depending on the path sensitization criteria that they meet. These tests are usually classified as robust or nonrobust [Smith 1985] [Lin 1987] [Bushnell 2000] or functionally sensitizable [Cheng 1996a]. Robust and nonrobust tests propagate a transition along the target path, while functionally sensitizable tests propagate on multiple paths. An overview of the classification schemes in the literature can be found in [Krstic 1998] [Jha 2003].

A robust test is one that will detect the delay fault independent of the delays in the rest of the circuit. Detection by a nonrobust test depends on circuit delays. A test along a functionally sensitizable path requires increased delay on several paths for detection of the delay fault. Consider, for example, a path with a rising transition into an OR gate. If the side input of the OR gate is a stable 0, then the transition will pass through the OR gate, independent of the delays on the path. This is the robust test condition. If the side input has a falling transition that arrives before the on-path transition, then the OR gate will glitch low (have a transient pulse), with a wider glitch for increasing delay at the on-input to the OR gate. If the side input transition is late, the glitch will shrink or disappear. This is the nonrobust condition. Reconvergence from multiple paths of different lengths can cause multiple glitches or hazards. In both the robust and nonrobust cases, the path is statically sensitized, because the side inputs along the path have noncontrolling values in the test vector. In functionally sensitizable paths, the test vector has at least one controlling value on a side input along the path. Detection of a delay fault on such paths requires propagation of the delay fault on multiple paths. For example, if the side input of the OR gate has a rising transition, then both the on-input and side input must be slow for the rising transition on the output to be slow.
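The three OR-gate cases just described can be tabulated directly. The sketch below assumes a rising on-path transition; the waveform labels ('S0', 'S1', 'R', 'F') are hypothetical shorthand, not the chapter's notation:

```python
def or_gate_sensitization(side_input):
    """Classify the sensitization of a rising on-path transition through a
    2-input OR gate, given the side-input waveform:
    'S0' = stable 0, 'S1' = stable 1, 'R' = rising, 'F' = falling."""
    if side_input == 'S0':
        return 'robust'            # passes regardless of circuit delays
    if side_input == 'F':
        return 'nonrobust'         # glitch width depends on relative delays
    if side_input == 'R':
        return 'functionally sensitizable'  # both inputs must be slow
    return 'not sensitized'        # stable 1 holds the output at 1
```

The first two cases statically sensitize the path (noncontrolling final value on the side input); only the third requires slow propagation on multiple paths for detection.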

One challenge in determining path sensitization is the presence of uncontrollable unknown (X) values in the circuit. These are produced by uninitialized flip-flops or “black boxes,” most commonly embedded memory arrays. For example, suppose a 2:1 multiplexer has an X on its control line and transitions on both of its inputs. Then the path that propagates through the multiplexer is unknown. All that is known is that a path at least as long as the shorter of the two paths through the multiplexer will propagate.

Delay Fault Simulation

Delay fault simulation is used to fault grade existing test sets, such as functional tests, and to drop faults that are fortuitously detected during ATPG.

Transition Fault Simulation

Transition fault simulation is similar to stuck-at fault simulation, except for the additional criterion that the fault is sensitized by the appropriate transition [Waicukauski 1987]. Good machine simulation determines the transitions at each fault site. For example, a rising transition sensitizes an STR fault. Each sensitized fault is then propagated to see if the fault effect reaches a PO or PPO.

Gate/Line Delay Fault Simulation

Gate or line delay fault simulation must account for circuit delays, in order to determine the smallest detected delay fault [Carter 1987] [Pramanick 1988]. The method in [Carter 1987] assumes that an earliest arrival time and latest stabilization time is known for each signal waveform when transitioning from its initial value to its final value. The signal has an unknown value during this time interval. First good machine simulation is performed to compute these values for all signals. Propagating the earliest arrival times and latest stabilization times is based on the controlling values at each gate. For example, the latest stabilization time for a signal transitioning from 0 to 1 on an AND gate input determines the latest stabilization time for the output to transition from 0 to 1. For each pair of input vectors, fault simulation consists of injecting an extra delay at each fault site, determining if this extra delay propagates in terms of new arrival and stabilization times, and then whether it is detected at the POs and PPOs at the observation time. This process is repeated for all fault sites and patterns. Fault detection may be uncertain, if the observation time is between the earliest arrival and latest stabilization times.
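As a concrete sketch of this bookkeeping — assuming a unit gate delay and our own minimal signal encoding, not the full waveform model of [Carter 1987] — the latest stabilization time of an AND output follows from its inputs' latest stabilization times and final values:

```python
def and_latest_stabilization(inputs, gate_delay=1):
    """inputs: list of (latest_stabilization_time, final_value) pairs.
    Returns (latest_stabilization_time, final_value) for the AND output."""
    finals = [v for _, v in inputs]
    out_final = all(finals)
    if out_final:
        # the output settles at 1 only after the last input settles at 1
        ls = max(t for t, _ in inputs)
    else:
        # the first input settling at the controlling 0 fixes the output
        ls = min(t for t, v in inputs if v == 0)
    return ls + gate_delay, out_final
```

Earliest arrival times propagate dually (the earliest possible change at any input that can affect the output); a fault is detected when the injected extra delay pushes the latest stabilization at a PO/PPO past the observation time.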

Path-Delay Fault Simulation

A large number of path-delay fault simulators have been developed, for example [Smith 1985] [Schulz 1989] [Fink 1992] [Kagaris 1997]. These can be classified into enumerative and nonenumerative methods. Enumerative methods use a list of paths and compare the paths sensitized by a given vector pair to the list. The value algebra used for determining signal propagation must be sufficient to handle the sensitization criteria. For example, [Smith 1985] used a six-valued algebra to handle hazards when computing robust test coverage. Because the number of paths in most circuits is very large, nonenumerative methods were developed to compute fault coverage without an explicit list of paths. Such techniques allow for a compact graph representation of all path-delay faults in the circuit, called the path status graph (PSG). The PSG is modified dynamically during test generation or fault simulation to account for the set of faults that remain to be targeted.

Defect-Based Delay Fault Model Simulation

CodSim is a CDF fault simulator for combinational circuits [Qiu 2003b]. It incorporates resistive shorts and opens, capacitive coupling, and process variation, and considers robust and nonrobust propagation on paths through each fault site. CodSim computes the detection probability for each fault site, for test set t. If the DP is above the specified threshold, the fault site is dropped. The DPs for all fault sites are then used to compute the overall fault coverage. The three phases of the fault simulation algorithm are as follows:

  1. For each test pattern, run good machine timing simulation and identify the robust/nonrobust propagation paths from each line to primary outputs.

  2. Check the validation of the nonrobustly sensitized paths through a line by introducing a local delay fault at that line and running fault simulation.

  3. Run fault simulation considering capacitive coupling to the selected long paths for each fault site.

After good machine simulation, the initial and final logic values and the nominal transition time of the last event for each line are known, as shown in Figure 6.10. The numbers next to the transition symbols indicate the transition time, assuming a unit gate delay model. S1 or S0 indicates a stable 1 or 0.

Figure 6.10. Robust propagation path identification.

In phase 1, the robust/nonrobust propagation paths through each line are identified. A line’s robust propagation paths can be computed from the robust propagation paths of its immediate fanout lines. In Figure 6.10, suppose line d has a robust propagation path P1 with length 6, and line e has path P2 with length 7. The robust propagation paths for line b are computed by checking the final logic values on the side inputs of gates G1 and G2. Two paths are then identified: b-P1 with length 7 and b-P2 with length 8. Because the propagation paths are robust, an extra delay Δ on a line must be detected if t_trans + Δ + l_prop > t_max, where t_trans and l_prop are the transition time and propagation path length associated with that line, respectively. Because t_trans and l_prop are statistical values, the computed Δ is also statistical. For resistive shorts, the opposite logic value on the other shorted line must also be checked.
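Treating t_trans and l_prop as point values for the moment (in the chapter they are statistical), the detection condition is a one-line inequality; the helper names below are ours:

```python
def robustly_detected(delta, t_trans, l_prop, t_max):
    """Extra delay `delta` at a line is detected through a robust
    propagation path iff t_trans + delta + l_prop > t_max."""
    return t_trans + delta + l_prop > t_max

def escape_slack(t_trans, l_prop, t_max):
    """Largest extra delay that still escapes detection on this path."""
    return t_max - t_trans - l_prop
```

For line b in Figure 6.10, the longer path (b-P2) yields the smaller escape slack, which is why testing the longest propagation path detects the smallest local delay fault.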

The nonrobust propagation paths can be identified in a similar way. The difference is that if there is no transition on a line, the nonrobust propagation paths from that line must also be computed, such as line g in Figure 6.11. The reason is that a local delay fault on line d may generate a glitch on g, and the computation of the nonrobust propagation paths from d uses g’s propagation paths. The time complexity of phase 1 is O(V · C), for V vectors and C lines in the circuit.

Figure 6.11. Nonrobust propagation path identification.

Fault detection through a nonrobust propagation path depends on the delays on the side inputs to the path. In Figure 6.11, an extra delay on line b does not affect the transition time on line h, even though line b has a nonrobust propagation path. Therefore, the validation of these paths must be checked (phase 2). After phase 1, each line has a few nonrobust propagation paths. The validation check can be performed by introducing an extra delay Δ on the line, where Δ = t_max − t_trans − l_prop, and running fault simulation for the test pattern that sensitizes this path to check whether the slow signal can be detected at any PO or PPO. This procedure starts with the smallest Δ. If a small Δ can be detected, the validation check for the paths that can only detect a large Δ is not necessary. Experiments show that normally only a few paths must be checked for each line.

Checking only robust and nonrobust paths may miss some functionally sensitizable paths [Cheng 1996b]. However, because these paths always appear in pairs and the shorter path determines the delay, in most cases they do not contribute to the fault coverage. Thus, these paths are not checked unless there is no long robust or nonrobust propagation path through the fault site.

In phase 3, the extra delay due to coupling is computed. Similar to phase 2, phase 3 introduces an extra delay Δ at the fault site, where Δ can cause one propagation path (either robust or nonrobust) to be slow, and runs the fault simulation considering coupling for the vector sensitizing the path. The delay Δ is introduced because the coupling alignment is dependent on Δ. Because of the interdependence of aggressor and victim signal timing, iterative simulation is used. The phase 3 analysis is only performed for long paths.

Simulation experiments on ISCAS-1985 circuits with resistive opens show that for 10,000 random patterns applied at rated speed, the CDF and transition fault coverages are within 1%, as each fault site has a high probability of having many long paths tested. The fault efficiency of a KLPG-5 test (see Section 6.6.4) is 99.9%+. Simulation of resistive shorts shows fault coverage of approximately 90%. The reduced coverage is because resistive shorts on fault sites with large slack cannot be detected until the short resistance is so low that it nearly causes a stuck-at fault [Li 2003]. Simulation of KLPG-5 tests shows much higher resistive short coverage, because the longest paths are tested.

Delay Fault Test Generation

Transition/Inline Fault ATPG

Transition fault ATPG is based on modified stuck-at fault ATPG [Waicukauski 1987]. The initialization vector sets the initial value, and the test vector launches the transition and propagates it to a PO or PPO. For example, for a STR fault on line g, a logic 0 is set on g by the initialization vector, and in the test vector the value on g is set to 1, with a propagation path sensitized. These conditions can result in test invalidation due to hazards [Lin 2005a]. Such hazards can be reduced by attempting to generate single input change (SIC) patterns, in which the test vector differs from the initialization vector by only a single bit.

A simple ATPG model is created by introducing a seven-valued algebra, with logic values representing two time frames of a sequential circuit. The seven values are S1, S0, R1, F0, U1, U0, and X, denoting steady 1 across both time frames, steady 0 across both time frames, a rising transition across the two frames, a falling transition across the two frames, don’t care in the first frame and logic 1 in the second frame, don’t care in the first frame and logic 0 in the second frame, and don’t care in both frames, respectively. Boolean operators can operate on this seven-valued logic, and an ATPG can be developed to generate test vectors for each target transition fault.

One may also employ a stuck-at ATPG to generate tests for transition faults. By noting that a transition fault can be mapped to two stuck-at faults, vectors from a stuck-at suite can be used to construct the transition test set. For example, consider a fault x STR. To detect this fault, signal x must be set to logic 0 in the first time frame, and in the second vector, launch a transition by setting x = 1 and simultaneously propagating its fault effect to a PO or PPO. In other words, the first vector excites x stuck-at 1, and the second vector detects x stuck-at 0. As the two vectors may be uncorrelated, this technique works well under the enhanced-scan architecture. A test set can thus be constructed by matching stuck-at vectors. A graphical representation of all the test vectors and stuck-at faults can permit finding a minimum length test set. A concept of transition test chains was proposed in [Liu 2002, 2005] [Wang 2006a] to compute short chains that maximize detection of transition faults.
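The construction can be stated as a two-line mapping; the tuple encoding is ours, but the excite/detect pairing follows the text:

```python
def transition_to_stuck_at(line, transition):
    """Map a transition fault to its stuck-at pair: the first vector
    excites the opposite-polarity stuck-at fault (setting the initial
    value), and the second vector detects the matching stuck-at fault
    (launching the transition and propagating its effect)."""
    if transition == 'STR':                    # slow-to-rise: set 0, then detect 1
        return (line, 'stuck-at-1'), (line, 'stuck-at-0')
    return (line, 'stuck-at-0'), (line, 'stuck-at-1')  # slow-to-fall
```

Chaining stuck-at vectors so that each consecutive pair realizes such a mapping is exactly the transition test chain idea of [Liu 2002, 2005].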

Despite the work that has gone into transition fault test generation, a complete transition fault test set may not detect all critical paths, because the ATPG decision process often favors those easier paths to launch or propagate the fault. The easier paths are generally the shorter paths. Conversely, a test set that exercises the longest paths in the circuit may not detect all transition faults, because a transition fault may not be robustly or nonrobustly launched or propagated [Gupta 2004]. Therefore, there has been effort to generate test sets that also cover long paths, such as [Sharma 2002], [Qiu 2003a], and [Gupta 2004]. Among these approaches, some try to generate a set of longest paths such that every gate is covered. Subsequently, a test set that tests these paths would also achieve full transition fault coverage. Alternatively, in [Gupta 2004], the aim was to launch the transition as late as possible, while at the same time propagating it through the longest propagation path. This technique thus elevates the quality of generated transition tests so that long paths are exercised.

A transition fault that is testable under one application scheme (e.g., LOC) may become untestable under another test application. Consider the circuit fragment shown in Figure 6.12. Suppose a, b, and c are outputs of scan flip-flops. The STF transition on line d would be untestable under the LOS test application. This is because to detect this STF, ab = 00 and c = 0 in the test (second) vector are needed for the transition to propagate through the AND gate. In other words, the test vector must be abc = 000. However, under LOS, this vector is a shifted version of the initialization vector. For the test vector to be abc = 000, the initialization vector must be abc = 00X. However, this initialization vector cannot set d = 1. Consequently, this STF fault is untestable under LOS.

Figure 6.12. Launch-on-shift untestable transition fault.
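The shift dependence that makes this fault untestable can be checked mechanically. The sketch below assumes the scan chain is ordered so that cell i shifts into cell i+1, and uses None for don't-care bits; both conventions are ours:

```python
def los_compatible(init_vec, test_vec):
    """Under launch-on-shift, the test vector is the initialization vector
    shifted by one cell (a new bit enters at the chain input), so
    test_vec[i+1] must agree with init_vec[i] wherever both are assigned.
    None represents a don't-care bit."""
    return all(a is None or b is None or a == b
               for a, b in zip(init_vec, test_vec[1:]))
```

In the Figure 6.12 example, the required test vector abc = 000 forces the initialization vector to be 00X, which conflicts with the need to set d = 1 — a conflict that circuit-level implications, not this chain-level check, must discover.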

Now, as the previous example illustrated, a LOS-untestable fault may be testable under other methods of test application. Interestingly, however, if a transition fault is LOC untestable, it is guaranteed to be functionally untestable. This is because under LOC the two vectors are functionally correlated: the second vector is derived from the first by clocking the circuit. Because the starting state of a LOC test is fully controllable, any fault that is LOC untestable is also functionally untestable, as no initial state can both launch the transition and observe its effect. Note, however, that a functionally untestable transition fault may still be detected under LOS, because there the two vectors are not functionally correlated [Liu 2003].

Gate-Delay Fault ATPG

Gate delay fault ATPG attempts to generate a long path through every gate input and output for both rising and falling transitions. This overlaps with the goal of many path delay subset ATPG approaches [Sharma 2002], described later. Here we focus on extensions to transition fault ATPG to improve the detection of small delay defects. One approach is to make random decisions when selecting the propagation path from the fault site. This should result in a longer path length than the normal approach of using breadth-first search, which tends to produce the shortest path. However, this approach had little benefit in a recent experiment [Qiu 2006].

A second approach is to use static timing analysis (STA) to compute the timing slack of each line in the circuit. The ATPG then attempts to propagate through the fault site on one or more low-slack paths [Lin 2006a]. Because the least slack path may be false, the ATPG must consider other low-slack paths. When the paths through a fault site are of similar length, it does not matter which path is chosen, particularly if none of the paths is critical [Qiu 2004b]. When there is a significant difference in path length, the ATPG should attempt to propagate on the longest path. A low-cost approach to achieve this goal is to take the difference between the maximum and minimum structural path lengths in the fanin and fanout cones of a fault site [Kajihara 2006]. The cone with the larger difference is given search priority. On average, this results in longer path lengths than a standard TF ATPG algorithm.

Path-Delay Fault ATPG

Test generation for path-delay faults can be categorized into enumerative and nonenumerative approaches. Enumerative methods generate a list of long structural paths and check their testability [Li 1989] [Majhi 2000] [Murakami 2000] [Shao 2002]. This approach works well when most paths are testable, but it is inefficient if many long paths are untestable. Rather than testing all paths, the globally longest paths can be targeted [Lin 1987].

To increase the efficiency, NEST [Pomeranz 1995] generates paths in a nonenumerative way, but it is only effective in highly testable circuits. DYNAMITE [Fuchs 1991] improved the efficiency for poorly testable circuits, but it has excessive memory consumption for highly testable circuits. RESIST [Fuchs 1994] exploits the fact that many paths in a circuit have common subpaths and sensitizes those subpaths only once, which reduces repeated work and identifies large sets of untestable paths. RESIST can handle circuits such as ISCAS-1985 benchmark c6288 with an exponential number of paths. However, both the enumerative and nonenumerative algorithms are too slow to handle industrial circuits.

The RESIST algorithm was extended to find a set of longest testable paths that cover every gate [Sharma 2002]. This test set targets line delay faults. This method takes advantage of the relations between the longest paths through different gates and guarantees their testability. However, this work assumes a unit delay model and fails on c6288.

K Longest Paths per Gate (KLPG) ATPG

The CDF fault coverage metric suggests that testing multiple paths through each fault site will increase fault coverage. The K longest paths per gate (KLPG) test generation algorithm [Qiu 2003a, 2004a] was developed to test the K longest sensitizable paths through each gate or line for both STR and STF faults at the fault site. Figure 6.13 illustrates the KLPG path generation algorithm. A launch point (of a path) is a PI or PPI, and a capture point is a PO or PPO. In the preprocessing phase, STA computes the maximum delay from each gate to capture points, without considering any logic constraint. This value is termed the PERT delay or STA delay. In the path generation phase, partial paths are initialized from launch points. A partial path is a path that originates from a launch point but has not reached any capture point. A value called esperance [Benkoski 1990] is associated with a partial path. The esperance is the sum of the delay of the partial path and the PERT delay from its last node to a capture point. In other words, the esperance of a partial path is the upper bound of its delay when it becomes a complete path that reaches a capture point.

Figure 6.13. KLPG path generation algorithm.

In each iteration of the path generation algorithm, the partial path with the maximum esperance value is extended by adding one gate. If the last gate of the partial path has more than one fanout, the partial path splits. Then the constraints to propagate the transition on the added gate, such as noncontrolling side input values required under the robust or nonrobust sensitization criterion, are applied. Direct implications are then used to propagate the constraints throughout the circuit. If there are any conflicts, the search space containing the partial path is trimmed off. If the partial path does not reach a capture point, some false path elimination techniques [Qiu 2003a] are applied to prevent it from growing to a false path. Then, its esperance is updated and the partial path is inserted back into the partial path store. If a partial path becomes a complete path, final justification is performed to find a vector. This process repeats until enough longest testable paths are generated. Because the longest path through a fault site is likely the longest path through other fault sites along the path, fault dropping is performed when a new path is generated.
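The loop just described can be sketched with a priority queue keyed on esperance. The sketch below omits the sensitization constraints, direct implications, false-path elimination, and final justification, so it finds the k structurally longest launch-to-capture paths; the interface and data-structure choices are ours:

```python
import heapq

def klpg_longest_paths(launch, fanouts, delay, pert, is_capture, k=1):
    """Grow partial paths in decreasing order of esperance = (delay of the
    partial path so far) + (PERT/STA delay from its last gate to a capture
    point), popping the maximum-esperance partial path each iteration."""
    heap = []  # entries: (-esperance, path delay, path as a gate list)
    for g in launch:
        heapq.heappush(heap, (-(delay[g] + pert[g]), delay[g], [g]))
    found = []
    while heap and len(found) < k:
        _, d, path = heapq.heappop(heap)
        last = path[-1]
        if is_capture(last):
            found.append((d, path))    # complete path: justify a vector here
            continue
        for nxt in fanouts[last]:      # a multi-fanout gate splits the path
            nd = d + delay[nxt]
            heapq.heappush(heap, (-(nd + pert[nxt]), nd, path + [nxt]))
    return found
```

Because esperance is an upper bound on a partial path's final delay, the first complete path popped is guaranteed longest, the second is second longest, and so on — the property the real algorithm relies on when it interleaves constraint checking with this search.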

An iteration of the path generation begins by popping the maximum-esperance partial path from the path store. The partial path is extended by adding a fanout gate along the maximum-esperance path. For example, in Figure 6.14, the partial path g0 ... gi is extended by adding gate gj, because extending to gj could potentially preserve the maximum esperance. If the partial path has more than one extendable fanout, it must be copied, its esperance updated, and the copy pushed into the path store. For example (Figure 6.14), because gate gi has two fanouts, and extending the partial path to gj may result in false paths later, the partial path g0 ... gi must be saved, because extending it to gate gk may yield a longer testable path. Because fanout gj has been tried, the min-max esperance in the copy becomes 11/11.

Figure 6.14. Extending a partial path: (a) before extension and (b) after extension.

After the partial path is extended (g0 ... gigj in Figure 6.14) the constraints to propagate the transition on the added gate (gj) are applied. Under the nonrobust sensitization criterion, noncontrolling final values on the side inputs are required. Under the robust sensitization criterion, in addition to noncontrolling final values, the side inputs must remain noncontrolling if the on-path input has a transition to the controlling value. Then direct implications are used to propagate the constraints throughout the circuit and discard false partial paths. For example (Figure 6.14), if extending partial path g0 ... gi to gate gj results in a conflict (see Figure 6.15), both path g0 ... gr and g0 ... gs are false, and the partial path is deleted from the path store. Experimental results show that most false paths can be eliminated by direct implications [Benkoski 1990] [Qiu 2003a].

Figure 6.15. Conflict after applying direct implications.

If the launch-on-shift approach is used, the logic values on neighboring scan flip-flops are dependent on each other. For example, in Figure 6.16, the logic value of cell A in the initialization vector is the same as that of cell B in the test vector. The relation between cell B and C is the same. Therefore, if there is a rising transition assigned to cell B, direct implications will try to assign logic 1 to cell A in the initialization vector and logic 0 to cell C in the test vector and propagate the new assignments throughout the circuit. If there are any conflicts, the partial path is a sequential false path under the launch-on-shift constraints. This algorithm does not consider modification of the scan chain design to reduce the dependence, such as inserting dummy cells between the scan flip-flops.

Figure 6.16. Implications on scan flip-flops.

If LOC is used, dependence exists between the two vectors. Even if the circuit has a pipeline structure, in which the two vectors are independent, that structure can be converted to the general structure shown in Figure 6.16; the conversion is shown in Figure 6.17. Thus, the test vector is the output of the combinational circuit applied to the initialization vector, excluding the PIs and POs. In other words, V2 = C(V1), where V1 and V2 are the two vectors and C is the logic function of the combinational circuit. For example, if a testable path has a rising transition launched from cell A and a rising transition captured on cell B in Figure 6.16, then in the initialization vector, output a′ must be logic 1 (it then becomes the value of input a in the second vector), and in the test vector, input b must be logic 0, because it is derived from the initialization vector. More direct implications can then be performed from a′ and b.

Figure 6.17. A pipeline circuit structure.

Most designs include non-scan flip-flops. Even after simulation of the test setup sequence and scan procedure, some flip-flops remain uninitialized. In addition, embedded memories and “black boxes” are assumed uncontrollable, so their outputs have unknown values. A seven-valued algebra is used to distinguish between controllable and uncontrollable unknown values. The seven values are logic 0/1, x (unknown/unassigned), u (uncontrollable), 0/u (0 or uncontrollable), 1/u (1 or uncontrollable), and x/u (unknown or uncontrollable). At the beginning of test generation, the lines from the uncontrollable memories are u and all the other lines are x. The x values may be assigned a known value during test generation. In a two-input AND gate, if one input is x and the other is u, the output is 0/u because if the input with x is assigned a logic 0, the output becomes 0, but if this input is assigned a logic 1, the output becomes uncontrollable. Figure 6.18 shows two examples, assuming M1 is a non-scan memory cell and M2 is a scan flip-flop. The logic values assigned by simulation are shown. If the conventional three-value algebra is used, all the lines are assigned x’s.

In Figure 6.18a, line n3 can never be logic 1, so all paths through n4 are false. Using a three-value algebra, test generation may have to check all paths through n4. Moreover, it can be learned that n5 can never be logic 1, so its faults are untestable. Because all the paths through n4 contain n5, the faults on n4 are also untestable. Figure 6.18b shows how this algebra permits additional searching. When a partial path reaches gate g2, the value of the test vector on side input n3 is set to logic 0 and direct implications performed. The value on n2 is set to 0 and further direct implications can be performed from M2. In the conventional three-value logic, direct implications stop at gate g1.

Figure 6.18. Application of seven-value algebra in path-delay test: (a) trimming the search space and (b) enabling additional search.
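The AND-gate example above can be captured by giving each of the seven values a set-of-possible-outcomes semantics over the atoms 0, 1, and u; this set encoding is our own reading of the algebra:

```python
# Each value maps to the set of outcomes it may resolve to, where 'x' bits
# are assignable by the test generator and 'u' means uncontrollable.
ATOMS = {'0': {'0'}, '1': {'1'}, 'u': {'u'},
         'x': {'0', '1'}, '0/u': {'0', 'u'},
         '1/u': {'1', 'u'}, 'x/u': {'0', '1', 'u'}}
NAME = {frozenset(s): v for v, s in ATOMS.items()}

def atom_and(a, b):
    # 0 is controlling; 1 AND u remains uncontrollable
    if a == '0' or b == '0':
        return '0'
    if a == 'u' or b == 'u':
        return 'u'
    return '1'

def and7(a, b):
    """Two-input AND over the seven-valued algebra, e.g.
    and7('x', 'u') == '0/u': assigning the x input 0 forces the output
    to 0, while assigning it 1 leaves the output uncontrollable."""
    out = {atom_and(p, q) for p in ATOMS[a] for q in ATOMS[b]}
    return NAME[frozenset(out)]
```

Other gate types follow the same pattern with their own atom-level truth tables; the conventional three-value algebra collapses 'u', '0/u', '1/u', and 'x/u' into x, which is exactly the information loss Figure 6.18 illustrates.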

Final justification is used to find a delay test pattern when a complete path is found, using the corresponding LOS or LOC constraints. For LOC, one time frame expansion is used, as shown in Figure 6.19. The initialization vector V1 can be generated within one time frame, but because the test vector V2 = C(V′1), the goal is to find a satisfying V′1. Because V1 and V′1 are identical except for the “don’t care” bits, there must be no conflicts between V1 and V′1 during the justification process.

Figure 6.19. Time frame expansion for final justification using launch-on-capture.

A KLPG-1 test set is constructed as follows. For each fault, a test is first generated using the longest robustly testable path (as this is the most reliable test, even though it may not be the longest path). If there is no robust test, the longest restricted nonrobustly testable path is selected. The restrictions are that the test must have the required transition at the fault site, and the local delay fault must be detected at a capture point, if there are no other delay faults. In other words, the test is also a TF test.

For example, path n1-n2-n3-n5 is a nonrobustly testable path in Figure 6.20. It is a valid nonrobust test for lines n2 and n3. However, it is not a valid nonrobust test for line n5 because the glitch (or transition) may not happen if the delay of path n1-n4 is greater than the delay of path n1-n2-n3. Similarly, it is not a valid nonrobust test for line n1 because the slow-to-rise fault may not be detected even if there are no other delay faults.

Figure 6.20. Restricted nonrobust test.

If a fault has no robust or nonrobust test, a transition fault test that can detect a small local delay fault is generated. In KLPG test generation, this case usually happens when the local delay fault can only be activated or propagated through multiple (functionally sensitizable) paths, such as the slow-to-fall fault on line n2 in Figure 6.21. The length of the shortest paths in the activating or propagating path set determines the test quality. The longer the shortest path, the smaller the local delay fault that can be detected. A similar situation occurs when an uncontrollable value selects between two propagation paths. Only the shorter of the two paths can be assumed.


Figure 6.21. Transition fault test.

Experimental results with KLPG test patterns show that the execution time and number of paths rise slowly with K, since many gates share the same paths. More faults are detected with LOS than LOC, with each having unique detects [Qiu 2004a]. The LOS robust path lengths are close to those of enhanced scan, while LOC path lengths are often shorter. This indicates that LOS and enhanced scan sensitize false paths. Silicon results with KLPG-1 test patterns have shown that they detect delay faults that escape TF tests [Qiu 2006].

Pseudo-Functional Testing to Avoid Over-Testing

Design for testability (DFT) is a popular technique used to increase the testability of a design, and it has been effective for structural testing, as discussed in the earlier sections of this chapter. It also allows the underlying test generator to focus on generating a two-vector delay test, assuming the initial state can be loaded via the scan structure. However, unlike stuck-at testing, delay testing must account for functionally infeasible paths. Incidental activation of functionally infeasible paths may over-test the chip. In addition, if a chip fails only a test that exercises such paths, it may be considered bad when in reality it is fully functional.

Clearly, a delay fault that cannot be activated in the combinational portion of the circuit is functionally infeasible. On the other hand, some delay faults that can be activated under scan may not be functionally feasible. This is because the scanned-in state of the two-vector pattern may not be functionally reachable. In LOC, if the scanned-in starting state is unreachable, the LOC test may inadvertently exercise some functional false paths. Likewise, in LOS, the transition that is launched and propagated may not map to a functionally realizable state transition.

As the preceding discussion suggests, the main hurdle in using scan tests for at-speed testing is yield loss [Rearick 2001] [Krstic 2003] [Lin 2005b]. That is, a timing failure caused by a functionally infeasible path may not be a true failure and will result in throwing good parts away. To address this problem, pseudo-functional tests have been proposed. In pseudo-functional scan testing, any test that is scanned in should conform “closely” to a functionally reachable state. The intuition is that if the scanned state is known to be a functionally reachable state, then the chip will operate close to functional mode during the application of the target test. Subsequently, yield loss can be considerably reduced.

The following discussion gives a broad overview of various techniques that have been proposed to generate such pseudo-functional tests. Pseudo-functional tests can be computed by first extracting a set of constraints that specifies the illegal states [Lin 2005b] [Zhang 2005] [Lin 2006b] [Syal 2006]. With these constraints fed to the underlying ATPG, the test generator generates suitable tests that attempt to avoid the illegal states the constraints specify. Alternatively, one may start with an initial test set generated by an ATPG without any constraints. This unconstrained test set is then transformed into pseudo-functional scan tests by incorporating knowledge of a subset of reachable states [Pomeranz 2004]. For instance, one may first compute a subset of reachable states via random simulation from a known valid state and collect all the states reached during the simulation. Any valid state identified is guaranteed to be reachable. Consequently, patterns generated by adhering to this information are guaranteed to start from functionally valid states. When a generated pattern cannot strictly adhere to the valid-state information, either a validity check of the modified state is invoked, or the initial-state portion of the pregenerated pattern is modified to match one of the known valid states.
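The random-simulation approach to collecting a subset of reachable states can be sketched as follows (a toy example; the 3-bit next-state function is an assumption standing in for the design's actual state-transition logic):

```python
import random

def next_state(state, inp):
    # Toy next-state function (assumption): a mod-8 counter that
    # increments when inp = 1 and holds otherwise.
    return (state + inp) % 8

def collect_reachable(reset_state, n_runs=20, run_len=50, seed=0):
    """Collect states reached by random simulation from a valid state."""
    rng = random.Random(seed)
    seen = {reset_state}
    for _ in range(n_runs):
        s = reset_state
        for _ in range(run_len):
            s = next_state(s, rng.randint(0, 1))
            seen.add(s)
    return seen
```

Every state collected this way is guaranteed reachable; a state never visited is merely unclassified, not necessarily illegal.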

Therefore, identification of unreachable/illegal states (the converse of the reachable states) is a fundamental problem in the context of pseudo-functional testing. Sequential ATPG has been dealing with this problem since its conception. Given a state, it tries to determine if the state can be justified from an all-unknown state. When the target state is found to be unjustifiable, DUST [Gouders 1993] and DAT [Konijnenburg 1999] take this known illegal state and attempt to eliminate one specified assignment at a time, try to justify it, and end up with as large an illegal state space as possible. In other words, suppose the initial illegal state is 0X01X1. If after this process of elimination, state 0XX1XX is also illegal, we have obtained a larger illegal state space. Methods to merge these cubes to form even larger cubes are discussed in [Konijnenburg 1999].
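The DUST/DAT-style cube-enlargement loop can be sketched as follows (illustrative; the justification oracle is a toy stand-in for a sequential justification engine):

```python
def enlarge_illegal_cube(cube, is_justifiable):
    """Relax one specified bit at a time, keeping any relaxation that
    leaves the cube unjustifiable (i.e., still entirely illegal)."""
    cube = list(cube)
    for i, bit in enumerate(cube):
        if bit == 'X':
            continue
        trial = cube[:]
        trial[i] = 'X'                 # drop one care bit
        if not is_justifiable(''.join(trial)):
            cube = trial               # larger cube is still illegal
    return ''.join(cube)

def toy_is_justifiable(cube):
    # Toy oracle (assumption): a cube is unjustifiable exactly when it
    # forces s1 = 0 and s4 = 1; a real engine would run justification.
    return not (cube[0] == '0' and cube[3] == '1')

# Starting from the illegal cube 0X01X1, the loop yields 0XX1XX,
# matching the example in the text.
```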


Figure 6.22. Circuit fragment for constraint expression.

In addition, succinct representation of such illegal states is essential to avoid memory explosion and enhance search efficiency. One representation expresses constraints as Boolean expressions over nets in the design [Liu 2003]. All such constraints are referred to as functional constraints. Another is the use of binary decision diagrams (BDDs). The constraints can also be explicitly represented as a set of all computed illegal states. Consider three illegal states in a circuit with six flip-flops s1 through s6. The first illegal state is s1s2s3s4s5s6 = 111XXX. This illegal state subspace can be represented simply as s1 · s2 · s3. Negating this conjunction gives us the clause w = ¬s1 + ¬s2 + ¬s3. This clause restricts any solution to fall within the subspace expressed by w. For example, when s1s2s3 = 111, w evaluates to 0, indicating that an illegal state space has been entered. The clauses for the other two illegal states can be obtained in a similar manner. Finally, the constrained formula is formed simply by the conjunction of all the constraint clauses. Thus, at all times, no constraint clause may evaluate to 0, ensuring that the search remains outside of all the illegal state spaces.
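The translation from illegal-state cubes to constraint clauses can be sketched as follows (an illustrative encoding; a clause is stored as the set of (flip-flop index, value) literals that satisfy it):

```python
def cube_to_clause(cube):
    """111XXX -> clause (not s1 + not s2 + not s3), stored as the set of
    satisfying literals {(flip-flop index, required value)}."""
    return {(i, 1 - int(v)) for i, v in enumerate(cube) if v != 'X'}

def state_ok(state, clauses):
    """A state stays outside every illegal subspace exactly when each
    clause has at least one satisfied literal."""
    return all(any(state[i] == v for i, v in cl) for cl in clauses)

clauses = [cube_to_clause("111XXX")]
# state_ok([1, 1, 1, 0, 0, 0], clauses) is False: the clause w evaluates
# to 0, so the state lies inside the illegal subspace.
```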

Alternatively, the illegal state spaces can be represented as Boolean expressions of illegal internal signal combinations [Syal 2006]. For example, suppose we have a fragment of the circuit shown in Figure 6.22. A single constraint (¬x + y) can represent several illegal states, namely abcd = {1000, 0100, 1100, 1001, 0101, 1101, 1010, 0110, 1110}. When the signals abcd are assigned any of these values, the constraint (¬x + y) will be violated.
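This correspondence can be checked exhaustively, assuming the fragment of Figure 6.22 computes x = a OR b and y = c AND d (an assumption on our part, consistent with the nine listed states):

```python
from itertools import product

def violates(a, b, c, d):
    x = a | b          # assumed gate function for x
    y = c & d          # assumed gate function for y
    return ((1 - x) | y) == 0      # constraint (not x + y) evaluates to 0

# Enumerate every abcd assignment that violates the single constraint.
illegal = {s for s in product((0, 1), repeat=4) if violates(*s)}
# illegal holds the nine abcd combinations listed in the text.
```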

Computing Constraints

Logic implications capture Boolean dependencies between various nets in a circuit. Logic implications have been used in ATPG engines in the past. For details of computing static implications, refer to [Zhao 1997] [Wang 2006a]. In the rest of the discussion, the following notations will be used:

Impl[g, v, t]: Set of implications resulting from assigning logic v to gate g in time frame t.

[g, v, t] → [h, w, t1]: The assignment g = v in time frame t implies h = w in time frame t1.

There are relationships between nets that are valid in the same time frame but can only be learned through “sequential learning” (i.e., Boolean learning across time frames). Such relationships are missed if only one time frame is analyzed. Furthermore, not all sequential relationships can be learned by time frame expansion [Syal 2006]. During ATPG, all combinational constraints will be automatically factored in the process and need not be stored. On the other hand, sequential relations are of great help to the ATPG to avoid those undesirable state spaces.

Pair-Wise Constraints

The first mechanism to identify functional constraints is by using the logic implications directly. For each gate g in the circuit, the sets Impl[g, 1, 0] and Impl[g, 0, 0] over the time window –t to t (t is user specified) are first computed and stored. All relationships for [g, 1, 0] and [g, 0, 0] that exist in time frame 0 are recorded. Let these relationships be denoted by sets Sseqg1 and Sseqg0. Note that some of these relationships are learned through sequential reasoning, whereas some relationships may be purely combinational.

To identify useful pair-wise constraints that denote illegal states (i.e., functional constraints), combinational relationships must be filtered out. This can be performed using a single time frame analysis. For each gate g in this single-frame circuit, Impl[g, 1, 0] and Impl[g, 0, 0] are also computed. These combinational implications are recorded as Scombg1 and Scombg0. Next, set difference operations are performed for each gate g: Sseqg1 – Scombg1 and Sseqg0 – Scombg0. This operation ensures that all combinational relationships, which do not represent any illegal states, are pruned from the list of functional constraints. Note that the remaining sequential constraints are those that still reside in the same time frame. For instance, suppose x = 1 implies y = 1 sequentially but not combinationally. Then there must exist some scan-loadable state in which x = 1 and y = 0, and any such state is functionally unreachable. By adding the constraint (¬x + y), the set of states that violate this implication is captured [Syal 2006].
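The filtering step reduces to a set difference, sketched here with illustrative implication sets (the gate names and implications are hypothetical):

```python
def functional_constraints(seq_impl, comb_impl):
    """Keep only implications learned sequentially but not combinationally;
    these are the ones that denote illegal states."""
    return seq_impl - comb_impl

# Hypothetical implication sets for some gate g = 1 in time frame 0,
# stored as (gate, implied value) pairs:
seq_g1 = {('y', 1), ('z', 0), ('w', 1)}    # learned across time frames
comb_g1 = {('z', 0)}                       # learned in a single frame
```

Here ('z', 0) is purely combinational and is pruned; the surviving pairs become functional constraints.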

Multiliteral Constraints

Although the pair-wise sequential constraints may help to capture a large set of illegal states, they capture relationships only between a pair of gates. These may be insufficient to capture illegal states that require an analysis on more than two gates. To cover such illegal states, an approach to identify sequential relationships between multiple nets (multinode constraints) was proposed [Syal 2006].

The concept of multinode implications is rather simple. Starting with (A = 0, B = 1, C = 0), if via simulation an implication (D = 1) is obtained, then the multinode implication would simply be [(A = 0, B = 1, C = 0) → (D = 1)].

Let Ai represent an assignment of the form [gi, vi, t], where gi is a gate in the circuit, vi ∈ {0, 1}, and t is the time frame of this assignment. In other words, each Ai denotes a value assignment to a variable in the net-list. The form of a pair-wise implication is Ai → Aj, whereas the form of a multinode implication is (A1 ∧ A2 ∧ ... ∧ Am) → Am+1. The element on the left-hand side of the arrow is called the implicate. The element on the right-hand side of the arrow is called the implicant.

Because the implicate of a multinode constraint is a conjunction of terms, the probability of the implicate evaluating to true decreases with the number of terms. The usefulness of a multinode constraint therefore decreases with its size, so multinode constraints are often restricted to a small size. Similar to pair-wise constraints, multinode constraints are also restricted to reside within the same time frame.

Consider the circuit fragment illustrated in Figure 6.23. Two important implications can be computed for gate D, namely (1) [D, 1, 0] → [F, 0, 1] and (2) ([A, 1, 0] ∧ [B, 1, 0]) → [D, 1, 0]. By the transitive property of implications, a third relationship can be learned in a straightforward manner: (3) ([A, 1, 0] ∧ [B, 1, 0]) → [F, 0, 1]. Note that implication (1) is a pair-wise relation, while relationships (2) and (3) are multinode constraints. Such relationships can be derived in linear time simply by analyzing every two-input gate.


Figure 6.23. Fragment of sequential circuit.

However, deriving constraints this way is not entirely desirable. Specifically, constraint (2) is a combinational relationship and, as stated previously, is useless as a functional constraint because it does not block any states that could violate this constraint. Constraint (3) involves gates across time frames and is thus not considered for representing multinode constraints.

As a result, a different approach is needed. Assume that the following sequential relationships have been identified: [gi, vi, 1] → [A, 1, 0] and [gj, vj, 1] → [B, 1, 0]. Then a new multinode constraint can be deduced: ([gi, vi, 1] ∧ [gj, vj, 1]) → [F, 0, 1]. Note that this multinode constraint has all the desirable characteristics: it is not a combinational constraint, and all the involved gates reside in the same time frame. In addition, the constraint can be learned by analyzing relationships around every single gate, retaining the linear complexity.

The preceding discussion raises the following question: how can one efficiently discover the gates gi and gj that hold the relations [gi, vi, 1] → [A, 1, 0] and [gj, vj, 1] → [B, 1, 0]? In essence, the task is to identify the implicates of [A, 1, 0] and of [B, 1, 0]. At first glance, this task seems daunting. However, the contrapositive law allows for an efficient computation of such implicates. Recall that the contrapositive law states that if x → y, then ¬y → ¬x. In other words, it is sufficient to compute what [A, 0, 0] implies; then, by the contrapositive law, the implicates are obtained. For example, if [A, 0, 0] → [gi, ¬vi, 1] and [B, 0, 0] → [gj, ¬vj, 1], then by the contrapositive law, [gi, vi, 1] → [A, 1, 0] and [gj, vj, 1] → [B, 1, 0]. These implications are exactly what are needed to form the desired multinode constraint discussed earlier. Consider the circuit of Figure 6.23 again. Because it is known that [A, 0, 0] → [C, 1, 1] and [B, 0, 0] → [E, 0, 1], by the contrapositive law, [C, 0, 1] → [A, 1, 0] and [E, 1, 1] → [B, 1, 0] are obtained. So the multinode constraint can be constructed as ([C, 0, 1] ∧ [E, 1, 1]) → [F, 0, 1]. This constraint generalizes to ([C, 0, t] ∧ [E, 1, t]) → [F, 0, t] for all values of t, and it can be used as a constraint during constrained ATPG.

Finally, note that for the AND gate D, the value D = 1 is used to search for the multinode constraint. If D were an OR gate, the opposing value would have been chosen. In essence, the particular value assignment is chosen to form the conjunction of values necessary for a multinode constraint; in the AND gate case, this is ([A, 1, 0] ∧ [B, 1, 0]) → [D, 1, 0].

The complexity of computing the prescribed multinode constraints is thus linear in the number of gates in the circuit. Both the pair-wise and multinode constraints are computed and stored, and they are then used to constrain the ATPG process.
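The contrapositive computation in the example above can be sketched as follows (an illustrative implication store; assignments are (gate, value, frame) triples):

```python
def contrapositive(impl_db):
    """For every stored implication a -> b, record (not b) -> (not a)."""
    out = {}
    for (g, v, t), consequents in impl_db.items():
        for (h, w, t1) in consequents:
            out.setdefault((h, 1 - w, t1), set()).add((g, 1 - v, t))
    return out

# Implications from the example: [A,0,0] -> [C,1,1] and [B,0,0] -> [E,0,1].
impl_db = {('A', 0, 0): {('C', 1, 1)},
           ('B', 0, 0): {('E', 0, 1)}}
contra = contrapositive(impl_db)
# contra now records [C,0,1] -> [A,1,0] and [E,1,1] -> [B,1,0], the
# implicates needed to build ([C,0,1] and [E,1,1]) -> [F,0,1].
```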

Constrained ATPG

Suppose we are given a set of functionally untestable delay faults, Fu, and we want to make sure that the generated vectors will not incidentally detect any fault in Fu. A naive approach is to fault simulate the faults in Fu whenever a vector is obtained; the ATPG engine would backtrack if some faults in Fu are incidentally detected and continue to search for a different pattern. However, this naive approach can be computationally expensive. To reduce the expense, instead of focusing on Fu, each fault in Fu is projected onto the state space to identify the subspaces that detect it. The projected state space S constitutes a portion of the illegal states: any state s that detects a fault f in Fu must be unreachable, because f would otherwise be functionally detectable. Subsequently, the ATPG uses the projected state space to avoid searching in the identified illegal subspaces. The projected state space S can be represented as a constraint formula, as explained earlier in Section 6.7.

Given a constraint formula that represents the set of illegal states, during the ATPG process we must make sure that no clause in the constraint formula ever evaluates to false (i.e., that the literals in a clause never all evaluate to false simultaneously). The set of constraints also speeds up the ATPG by avoiding futile search spaces. Whenever a decision is made on a state variable si, we apply this decision assignment to all the constraint clauses in the formula that contain si. Application of this assignment may result in some unit clauses (a unit clause is an unsatisfied clause with exactly one unassigned literal left). This remaining literal is called an implication. The implied variable automatically becomes the next decision variable. For example, consider a constraint clause (¬s1 + ¬s2 + ¬s3). Suppose s1 has been decided or implied to be 1, and the current decision makes s2 = 1. At this point, the clause yields an implication, s3 = 0: with s1 = s2 = 1, the only way to avoid a conflict is to have s3 = 0. We also check whether there is a conflict (a clause that evaluates to false). If there is a conflict, the ATPG backtracks immediately [Liu 2003].
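The unit-clause propagation just described can be sketched as follows (a minimal illustration, not the referenced implementation):

```python
def propagate(clauses, assignment):
    """Apply unit-clause propagation; return the extended assignment,
    or None when some clause evaluates to false (a conflict)."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for var, pol in clause:
                if var not in assignment:
                    unassigned.append((var, pol))
                elif assignment[var] == pol:
                    satisfied = True
            if satisfied:
                continue
            if not unassigned:
                return None                  # conflict: ATPG must backtrack
            if len(unassigned) == 1:
                var, pol = unassigned[0]     # unit clause -> implication
                assignment[var] = pol
                changed = True
    return assignment

# Clause (not s1 + not s2 + not s3) as (variable, satisfying value) pairs:
clauses = [[('s1', 0), ('s2', 0), ('s3', 0)]]
```

With s1 = s2 = 1 decided, propagation implies s3 = 0; if s3 had already been assigned 1, the clause would evaluate to false and trigger a backtrack.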

Concluding Remarks

Delay testing is becoming increasingly critical with rising clock frequencies and reduced timing margins. Structural delay testing is applied primarily through scan chains with slow-fast-slow clocking, using the transition fault model. Increasing quality demands have made the transition fault model inadequate. The transition fault model has been extended to target small delay defects by incorporating timing information. The path-delay fault model has also been made practical for small delay defects by generating path-delay tests that cover all transition fault sites. These tests run the risk of over-testing chips by detecting sequentially untestable faults. Pseudo-functional tests can be used to avoid such over-testing.

In this chapter we described the common delay test approaches, with a focus on structural delay testing using two-pattern tests, where transitions are launched into the circuit and then the results captured within the rated clock cycle time. The use of design-for-test features such as scan chains provides the controllability and observability to make structural testing practical. This is combined with a slow scan in of the initialization vector, then application of the test vector via launch-on-shift (LOS) or launch-on-capture (LOC), and then capture of the results. With knowledge of the expected path delays, faster-than-at-speed testing can be used to increase detection of small delay defects.

We described the traditional delay fault models, including the transition fault, gate-delay fault, and path-delay fault, as well as newer models, including inline-delay fault, propagation delay fault, segment delay fault, and defect-based delay fault models. The quality of tests, in terms of test sensitization, was also discussed. Sensitization is particularly important when testing for small delay defects.

We discussed fault simulation and test generation for transition, gate/inline-delay, path-delay, and defect-based delay faults. Transition fault simulation and test generation is widely used, because it is a minor modification of stuck-at fault simulation and test generation. Defect-based delay fault simulation and test generation combines aspects of both gate-delay and path-delay fault simulation and test generation. We discussed in more detail KLPG test generation, which illustrates the issues in both gate-delay and path-delay test generation. A common problem in structural test is that the initialization vector may not be a legal state of the circuit. As a result, functionally infeasible paths may be tested. If these tests fail, a chip that passes functional test will be rejected. The solution is to generate pseudo-functional tests, which have a high probability of starting from a functionally reachable state.

An increasing challenge in delay testing is to achieve high coverage of small delay defects and good correlation to functional test results. A major part of this challenge is to account for supply noise during delay testing, as discussed in Chapter 7. A further challenge is achieving this coverage in the presence of process variation, which increases the number of potentially longest paths in the circuit. The greatest challenge is incorporating these new goals into testing while avoiding an increase in test time.

Exercises

6.1

(Untestable Transition Faults) Consider the following circuit fragment. Assume the three inputs to the circuit are outputs of flip-flops.

  1. Compute the set of untestable transition faults with launch-on-capture.

  2. Compute the set of untestable transition faults with launch-on-shift.


6.2

(Pseudo-functional Testing) Consider the following circuit fragment, and assume the three inputs to the circuit are outputs of flip-flops. Compute the set of unreachable states represented by the constraint (gj = 0, gk = 1).


6.3

(Pseudo-functional Testing) Consider the following circuit fragment, where a, b, c, and d are outputs of flip-flops. Let the set of unreachable states be abcd = {0000, 0001, 0010, 0100, 0101, 0110}. Derive a succinct constraint/expression for representing this set of unreachable states in terms of a, b, c, d, x, and y.


6.4

(Transition Fault Testing) For the circuit shown here, find the following tests, if they exist.

  1. Find a test for H s@1.

  2. Find a test for transition fault D slow to rise.

  3. Find a test for transition fault F slow to fall.

  4. To ensure test quality in part c, we want to make A a static 0. Please explain why.

  5. For path C-F-J-L, find all its static, robust, and nonrobust path delay tests.


6.5

(Circuit Delays) For the circuit shown here and the given input transitions on inputs A and B, draw the waveforms for C, D, and E, for the specified delays.

  1. Zero delay for each gate.

  2. Unit delay for each gate.

  3. Rising delay of 5, and falling delay of 7 for each gate.

  4. Rising/falling delay of 5 for every gate, with inertial delay of 3.

  5. Rising/falling delay (min, max) = (5, 7) for every gate.


6.6

(Fault Count) For the two circuits shown here, determine the following.

  1. The total number of transition delay faults.

  2. The total number of path delay faults.

  3. Longest path (assume unit gate delay).


6.7

(Scan Test Time) Given a combinational circuit with eight scan inputs and three scan outputs, determine the total time to apply one LOS pattern and one LOC pattern. Assume the scan clock frequency is 20 MHz, and system clock frequency is 100 MHz.

6.8

(Robust Path Delay Test) For the circuit shown here, verify that all path-delay faults are robustly testable.


6.9

(Path Delay Test) Consider path P1: B-D-F-G-I and path P2: B-D-E-H-I in the following circuit.

  1. Derive a robust test for path-delay fault ↑P2, where ↑ means a rising transition on the input to the path.

  2. Can you derive a robust test for path-delay fault ↑P1? Why or why not?

  3. Derive a nonrobust test for path-delay fault ↑P1.


6.10

(LOS and LOC) In the timing diagram that follows, the launch clock cycle is labeled L and the capture clock cycle is labeled C. Draw the corresponding waveform of the scan enable signal for both the LOS and LOC test application methods.


6.11

(Path Sensitization) What is the difference between a robust test and a nonrobust test?

6.12

(Off-Path Signals) Specify the off-path signal states for delay testing of a two-input XNOR gate.

6.13

(Path Delay Test) Consider the following circuit. Is the path delay fault {↓, b-c-e-g-h-i} robustly testable, nonrobustly testable, or only functionally sensitizable? How about path delay fault {↑, a-c-f-h-i}? Count the total number of robustly testable path delay faults in the circuit.


6.14

(Sequential Path-Delay Fault Test) In the following circuit, determine whether there are robust path-delay fault tests for faults {↑, h-f -g} and {↓, h-f -g}.


6.15

(A Design Practice) Use the ATPG programs and user’s manuals contained on the Companion Web site to generate transition fault test patterns for the combinational (full-scan) versions of the three largest ISCAS89 circuits and record the number of test patterns needed for each circuit.

6.16

(A Design Practice) Follow the instructions for Exercise 6.15, but generate transition fault test patterns when the SDF file for the combinational circuit is available. Compare the delay size generated for each pattern.

Acknowledgments

The authors wish to thank Shrirang Yardi and Karthik Channakeshava of Virginia Tech for reviewing the text. The authors would also like to thank Jing Wang, Zheng Wang, and Lei Wu of Texas A&M University for reviewing the text and providing exercises.

References

Books

Delay Test Application

Delay Fault Models

Delay Test Sensitization

Delay Fault Simulation

Delay Fault Test Generation

Pseudo-Functional Testing to Avoid Over-Testing
