Chapter 1. Introduction

Laung-Terng (L.-T.) Wang, SynTest Technologies, Inc., Sunnyvale, California

Charles E. Stroud, Auburn University, Auburn, Alabama

Nur A. Touba, University of Texas, Austin, Texas

About This Chapter

Over the past three decades, we have seen the semiconductor manufacturing technology advance from 4 microns to 45 nanometers. This shrinkage of feature size has made a dramatic impact on design and test. Now we find system-on-chip (SOC) and system-in-package (SIP) designs that embed more than 100 million transistors running at operating frequencies in the gigahertz range. Within this decade, there will be designs containing more than a billion transistors. These designs can include all varieties of digital, analog, mixed-signal, memory, optical, microelectromechanical systems (MEMS), field programmable gate array (FPGA), and radiofrequency (RF) circuits. Testing designs of this complexity is a significant challenge, if not a serious problem. Data have shown it is beginning to require more than 20% of the development time to generate production test patterns of sufficient fault coverage to detect manufacturing defects.

Additionally, when the SOC design is operated in a system, soft errors induced by alpha-particle radiation can adversely force certain memory cells or storage elements to change their states. These soft errors can cause the system to malfunction. As complementary metal oxide semiconductor (CMOS) scaling continues, the combined manufacturing defects and soft errors start to threaten the practicality of these nanometer SOC designs.

In this chapter, we first describe the importance of SOC testing and review the design and test challenges reported in the International Technology Roadmap for Semiconductors (ITRS). Next, we outline the Institute of Electrical and Electronics Engineers (IEEE) standards used for testing SOC designs. These include the 1149.1 and 1149.6 boundary-scan standards, the 1500 core-based test and maintenance standard, and the 1149.4 analog boundary-scan standard. Some SOC design examples, including a network-on-chip (NOC) design, are then illustrated. Finally, we provide an overview of the book in terms of the chapters that discuss how to test various components and aspects of these highly complex nanometer SOC designs. The book concludes with an invited survey chapter on testing aspects of nanotechnology trends, which covers four of the most promising nanotechnologies: resonant tunneling diodes (RTDs), quantum-dot cellular automata (QCA), hybrid CMOS/nanowires/nanodevices, and carbon nanotubes (CNTs).

Importance of System-on-Chip Testing

In 1965, Gordon Moore, Intel’s cofounder, predicted that the number of transistors integrated per square inch on a die would double every year [Moore 1965]. In subsequent years, the pace slowed, but the number of transistors has continued to double approximately every 18 months for the past two decades; this has become the current definition of Moore’s law. Most experts expect Moore’s law to hold for at least two more decades. Die size will continue to grow, but, at the same time, minimum feature size will continue to shrink. Although a smaller transistor can result in smaller circuit delay, a smaller feature size for interconnects does not reduce the signal propagation delay; thus, the signal propagation delay in interconnects has become the dominant factor in determining the delay of a circuit [Dally 1998]. To alleviate this problem, interconnects are made thicker to reduce the sheet resistance. Unfortunately, this induces crosstalk noise between adjacent interconnects because of capacitive and inductive coupling. This is referred to as a signal integrity problem, and it is extremely difficult to detect [Chen 2002]. As the clock frequency is pushed into the gigahertz range and the supply voltage is scaled down along with the devices, the power supply voltage drop caused by L(di/dt) can no longer be ignored. This causes a power integrity problem that is also extremely difficult to address, because finding test patterns that induce maximum current change is itself quite difficult [Saxena 2003].

As manufacturing technology continues to advance, precise control of the silicon process is becoming more challenging. For example, the effective channel length of a transistor is difficult to control, so circuit performance, including power and delay, exhibits much larger variability. This is a process variation problem, and it can make delay testing extremely complex [Wang 2004]. To reduce leakage power dissipation, many low-power design techniques have come into wide use. Unfortunately, low-power circuits can give rise to new fault models that increase the difficulty of fault detection. For example, a drowsy cache that is supplied with a low voltage (e.g., 0.36 V) when it is idle has recently been proposed to reduce the leakage current [Kim 2004]. Although the leakage current can be reduced by several orders of magnitude, a new fault, called a drowsy fault, can occur that causes a memory cell to fall asleep permanently. Unfortunately, testing for drowsy faults requires excessively long test application times, because it is necessary to put the memory cells to sleep and then wake them up. As we move into the nanometer age, keeping up with Moore’s law requires that many new nanotechnologies and circuit design techniques be developed and adopted, all of which pose new test challenges that must be addressed concurrently. Otherwise, the cost of test will eventually surpass the cost of silicon manufacturing, as illustrated in Figure 1.1, according to roadmap data given in [SIA 1997] and [SIA 1999].

Figure 1.1. Fabrication capital versus test capital. (Courtesy of [Cheng 2006].)

In 2004, the Semiconductor Industry Association (SIA) published an International Technology Roadmap for Semiconductors (ITRS), which includes an update to the test and test equipment trends for nanometer designs through the year 2010 and beyond [SIA 2004]. The ITRS is an assessment of the semiconductor technology requirements with the objective of ensuring advancements in the performance of integrated circuits. This assessment, also known as a roadmap, is a cooperative effort of the global industry manufacturers and suppliers, government organizations, consortia, and universities.

The ITRS identifies the technological challenges and needs facing the semiconductor industry through the end of the next decade. Difficult near-term and long-term test and test equipment challenges were reported in [SIA 2004] and are listed in Tables 12.1 and 12.2 of [Wang 2006]. The near-term challenges through 2010 for nanometer designs with feature size ≥45 nm include high-speed device interfaces, highly integrated designs, reliability screens, manufacturing test cost, as well as modeling and simulation. The long-term challenges beyond 2010 for nanometer designs with feature size <45 nm include the device under test (DUT) to automatic test equipment (ATE) interface, test methodologies, defect analysis, failure analysis, and disruptive device technologies. These difficult challenges encompass a full spectrum of test technology trends imperative for nanometer designs, including (1) developing new design for testability (DFT) and design for manufacturability (DFM) methods for digital circuits, analog circuits (including RF and audio circuits as well as high-speed serial interfaces), MEMS, and sensors; (2) developing the means to reduce manufacturing test costs as well as enhance device reliability and yield; and (3) developing techniques to facilitate defect analysis and failure analysis. The ITRS [SIA 2004] further summarizes the design test challenges, as shown in Table 12.3 of [Wang 2006]. These include (1) effective speed testing with increasing core frequencies and widespread proliferation of multi-GHz serial input/output (I/O) protocols; (2) capacity gap between design complexity and DFT, test generation, and fault grading tools; (3) quality and yield impact resulting from test process diagnostic limitations; (4) signal integrity testability and new fault models; (5) system-on-chip (SOC) and system-in-package (SIP) test including integrated self-test for heterogeneous SOCs and SIPs; (6) diagnosis, reliability screens, and yield improvement; and (7) fault-tolerance and online testing.

In [SIA 2005] and [SIA 2006], these difficult challenges were further refined and split into key drivers and difficult challenges. A future opportunities section was also added. The key drivers (not in any particular order) include (1) device trends such as increasing device integration (SOC, SIP, multichip packaging [MCP], and three-dimensional [3D] packaging) and integration of emerging and nondigital CMOS technologies (RF, analog, optical, and MEMS); (2) increasing test process complexity such as “distributed test” to maintain cost scaling; and (3) continued economic scaling of test such as managing (logic) test data volume. Difficult challenges (in order of priority) include (1) test for yield learning that is critically essential for fabrication process and device learning below optical device dimensions; (2) screening for reliability (e.g., causing erratic, nondeterministic, and intermittent device behavior); (3) increasing systemic defects such as detecting symptoms and effects of line width variations, finite dopant distributions, and systemic process defects; and (4) potential yield losses caused by tester inaccuracies (e.g., timing, voltage, current, temperature control), over-testing (e.g., delay faults on nonfunctional paths), etc. Future opportunities (not in any particular order) include (1) test program automation (not automatic test pattern generation [ATPG]); (2) simulation and modeling of test interface hardware and instrumentation seamlessly integrated to the device design process; and (3) convergence of test and system reliability solutions between test (DFT), device, and system reliability (error detection, reporting, and correction).

A circuit defect may lead to a fault, a fault can cause a circuit error, and a circuit error can result in a system failure. Two major defect mechanisms can cause the SOC design to malfunction: manufacturing defects and soft errors. Manufacturing defects are physical (circuit) defects introduced during manufacturing that cause the design to fail to function properly in the device, on the printed circuit board (PCB), or in the system or field. These manufacturing defects can result in static faults (such as stuck-at faults) or timing faults (such as delay faults). There is general consensus with the rule of ten, which says that the cost of detecting a faulty device increases by an order of magnitude as we move through each stage of manufacturing, from device level, to board level, to system level, and finally to system operation in the field [Wang 2006]. Soft errors, also referred to as single event upsets (SEUs), are transient faults induced by environmental conditions, such as alpha-particle radiation, which cause a fault-free design to malfunction when deployed on-board, in-system, or in-field [May 1979] [Baumann 2005]. The probability of the occurrence of soft errors increases with decreasing feature size. For example, the probability of SEUs increased by a factor of more than 21 when moving from a feature size of 0.6 microns to 0.35 microns [Ohlsson 1998]. Transient faults are nonrepeatable temporary faults and thus cannot be detected during manufacturing. However, these defect mechanisms must be screened during manufacturing or tolerated in the design in order to enhance device reliability and yield, reduce defect level and test costs, and improve system reliability and system availability.

Yield and Reject Rate

Some percentage of the manufactured devices are expected to be faulty because of manufacturing defects. The yield of a manufacturing process is defined as the percentage of acceptable parts among all parts that are fabricated:

Yield = (number of acceptable parts / total number of parts fabricated) × 100%

There are two types of yield loss: catastrophic and parametric. Catastrophic yield loss is due to random defects, and parametric yield loss is due to process variations. Automation of and improvements in an integrated circuit (IC) fabrication process line drastically reduce the particle density that creates random defects over time; consequently, parametric variations resulting from process fluctuations become the dominant reason for yield loss.

Methods to reduce process variations during fabrication are generally referred to as design for yield (DFY). The circuit implementation methods to avoid random defects are generally referred to as design for manufacturability (DFM). Broadly speaking, any DFM method helps to increase manufacturing yield and thus can be considered as a DFY method. Manufacturing yield relates to the failure rate λ. The bathtub curve shown in Figure 1.2 is a typical device (or system) failure chart indicating how early failures, wearout failures, and random failures contribute to the overall device (or system) failures.

Figure 1.2. Bathtub curve.

The infant mortality period (with decreasing failure rate) occurs when a product is in its early production stage. Failures in this period are mostly due to a poor process or design that leads to poor product quality; to avoid massive field returns, the product should not be shipped during this period. The working life period (with constant failure rate) represents the product’s working life, during which failures tend to occur randomly. The wearout period (with increasing failure rate) indicates the end of life of the product. Failures during this period are due to aging defects, such as metal fatigue, hot carriers, electromigration, and dielectric breakdown. For electronic products, this period is of less concern because, owing to technology advances and obsolescence, they are usually retired before entering this region.

When ICs are tested, the following two undesirable situations may occur:

  1. A faulty device appears to be a good part passing the test.

  2. A good device fails the test and appears as faulty.

These two outcomes are often due to a poorly designed test or the lack of design for testability (DFT). As a result of the first case, even if all products pass acceptance test, some faulty devices will still be found in the manufactured electronic system. When these faulty devices are returned to the IC manufacturer, they undergo failure mode analysis (FMA) for possible improvements to the IC development and manufacturing processes [Amerasekera 1987]. The ratio of field-rejected parts to all parts passing quality assurance testing is referred to as the reject rate, also called the defect level:

Reject rate = (number of field-rejected parts) / (total number of parts passing quality assurance testing)

For a given device, the authors in [McCluskey 1988] showed that defect level DL is a function of process yield Y and fault coverage FC:

DL = 1 – Y^(1 – FC)

The defect level provides an indication of the overall quality of the testing process [Williams 1981] [Bushnell 2000] [Jha 2003]. Generally speaking, a defect level of 500 parts per million (PPM) may be considered to be acceptable, whereas 100 PPM or lower represents high quality. The goal of six sigma manufacturing, also referred to as zero defects, is 3.4 PPM or less.

Example 1.1

Assume the process yield is 50% and the fault coverage for a device is 90% for the given test sets. By the preceding equation, we obtain DL = 1 – 0.5^(1 – 0.9) = 1 – 0.5^0.1 = 0.067. This means that 6.7% of shipped parts will be defective, or the defect level of the product is 67,000 PPM. On the other hand, if a DL of 100 PPM is required for the same process yield of 50%, then the fault coverage required to achieve that PPM level is FC = 1 – (log(1 – DL)/log(Y)) = 0.99986. Because it could be extremely difficult, if not impossible, to generate tests with 99.986% fault coverage, improvements in process yield might become mandatory in order to meet the stringent PPM goal.
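This arithmetic is easy to reproduce in a few lines of code. The sketch below is illustrative only (the function names are ours, not from the text); it evaluates the defect-level formula and inverts it to obtain the fault coverage required for a target defect level.

```python
import math

def defect_level(process_yield, fault_coverage):
    """Defect level DL = 1 - Y**(1 - FC)."""
    return 1.0 - process_yield ** (1.0 - fault_coverage)

def required_fault_coverage(process_yield, target_dl):
    """Invert the formula: FC = 1 - log(1 - DL) / log(Y)."""
    return 1.0 - math.log(1.0 - target_dl) / math.log(process_yield)

# Example 1.1: Y = 50%, FC = 90%
dl = defect_level(0.5, 0.9)
print(f"DL = {dl:.4f} (~{dl * 1e6:,.0f} PPM)")   # ~0.067, i.e., ~67,000 PPM

# Fault coverage needed to reach 100 PPM at the same 50% yield
fc = required_fault_coverage(0.5, 100e-6)
print(f"Required FC = {fc:.5f}")                  # ~0.99986
```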

Reliability and System Availability

Traditionally, component reliability, also called device reliability, is measured by acceptable defect level (defective PPM), failure rate per 1000 hours, noise characteristics, etc. Because process yield and fault coverage affect the defect level, reliability screen methods, such as stress testing (through burn-in) and IDDQ testing (measuring the device leakage currents), are often used with pre-selected test sets to accelerate failures. These reliability screens are mainly developed to weed out weak devices before mass production so as to reduce test escapes that would cause field returns. Once weak devices are found, FMA is performed to analyze, debug, locate, and correct the failures so that the process yield can be increased later.

When a manufactured electronic system is shipped to the field, it may also undergo testing as part of the installation process to ensure that the system is fault-free before placing the system into operation. During system operation, a number of events can result in a system failure; these events include single-event upsets (SEUs), electromigration, and material aging. Suppose the state of system operation is represented as S, where S = 0 means the system operates normally and S = 1 represents a system failure. Then S is a function of time t, as shown in Figure 1.3.

Figure 1.3. System operation and repair.

Suppose the system is in normal operation at t = 0, it fails at t1, and normal system operation is recovered at t2 by some software modification, reset, or hardware replacement. Similar failure and repair events happen at t3 and t4. The duration of normal system operation (Tn), for intervals such as t1 – t0 and t3 – t2, is generally assumed to be a random number that is exponentially distributed. This is known as the exponential failure law. Hence, the probability that a system will operate normally until time t, referred to as system reliability, is given by:

P(Tn > t) = e^(–λt)

where λ is the failure rate. Because a system is composed of a number of components, the overall failure rate for the system is the sum of the individual failure rates (λi) for each of the k components:

λ = λ1 + λ2 + … + λk

The mean time between failures (MTBF) is given by:

MTBF = 1/λ

Similarly, the repair time (R) is also assumed to obey an exponential distribution and is given by:

P(R > t) = e^(–μt)

where μ is the repair rate. Hence, the mean time to repair (MTTR) is given by:

MTTR = 1/μ

The fraction of time that a system is operating normally (failure-free) is called the system availability and is given by:

System availability = MTBF / (MTBF + MTTR)

This formula is widely used in reliability engineering; for example, telephone systems are required to have system availability of 0.9999 (simply called four nines), whereas high-reliability systems may require seven nines or more.
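To make the "number of nines" concrete, the short sketch below (an illustrative calculation, not from the original text) converts an availability figure into the downtime it permits per year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def downtime_per_year(availability):
    """Seconds of downtime allowed per year at a given availability."""
    return SECONDS_PER_YEAR * (1.0 - availability)

for nines in (4, 5, 7):
    availability = 1.0 - 10.0 ** (-nines)
    print(f"{nines} nines: {downtime_per_year(availability):>10,.1f} s/year")
# 4 nines -> ~3,153.6 s (~53 minutes) per year; 7 nines -> ~3.2 s per year
```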

In general, system reliability requires the use of spare components or units (made of components) in a fault-tolerant system to increase system availability [Siewiorek 1998]. Device reliability, on the other hand, depends on reliability screens applied to very-large-scale integration (VLSI) or SOC devices to improve process yield and hence reduce the device defect level. Unfortunately, existing reliability screens are becoming either quite expensive (as in the case of burn-in) or ineffective (as in the case of IDDQ testing) for designs manufactured at 90 nm or below. Fundamentally new long-term solutions must be developed for reliability screens and may include significant on-die hardware for stressing or special reliability measurements, as indicated in [SIA 2004].

Today, there is little evidence that higher device reliability can prolong the working life of a device, but in theory it should extend the availability of a system built from such devices. Fault-tolerant architectures commonly found in high-reliability systems are now applied to robust SOC designs to tolerate soft errors and make them error-resilient. These error resilience (or defect tolerance) schemes are now referred to as design for reliability (DFR).

Example 1.2

The number of failures in 10^9 hours is a unit (abbreviated FITS) that is often used in reliability calculations. For a system with 500 components where each component has a failure rate (λi) of 1000 FITS, the system failure rate (λ) becomes (1000/10^9) × 500 = 5 × 10^–4. Because MTBF = 1/λ, the mean time between failures of the system is 2000 hours. Suppose that the availability of the system must be at least 99.999% each year. Then, the repair time allocated for system repair each year should be less than t = T × (1 – system availability) = 1 × 365 × 24 × 60 × 60 × (1 – 0.99999) = 315 seconds, which is about 5 minutes. This implies that only fault tolerance with built-in self-repair (BISR) capability can meet the system availability requirement.
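The calculation in Example 1.2 can be checked with the following sketch; the component count, FIT value, and availability target are the ones assumed in the example.

```python
FIT = 1e-9                       # one FIT = one failure per 10**9 device-hours
num_components = 500
component_rate = 1000 * FIT      # each component: 1000 FITS

system_rate = num_components * component_rate    # lambda = 5e-4 failures/hour
mtbf_hours = 1.0 / system_rate                   # MTBF = 2000 hours

availability_target = 0.99999                    # "five nines"
seconds_per_year = 365 * 24 * 60 * 60
repair_budget = seconds_per_year * (1.0 - availability_target)

print(f"system failure rate = {system_rate:.1e} per hour")
print(f"MTBF = {mtbf_hours:.0f} hours")
print(f"repair budget = {repair_budget:.0f} seconds per year (~5 minutes)")
```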

Basics of SOC Testing

The various DFM, DFY, and DFR methods that have been proposed in academia and industry are mainly used to improve the manufactured device quality and to extend the availability of the system once the manufactured devices are used in the field. When an SOC design fails as a chip, on a board, or in the system, the ability to find the root cause of the failure in a timely manner becomes critical. In this section, we briefly discuss several IEEE standards (including 1149.1, 1149.4, 1149.6, and 1500) and other techniques that ease silicon test and debug as well as system-level test and diagnosis. For detailed descriptions, the reader is referred to key references cited.

Boundary Scan (IEEE 1149.1 Standard)

The success of scan-based DFT techniques from the mid-1970s through the mid-1980s led to their adaptation for testing interconnect and solder joints on surface mount PCBs. This technique, known as boundary scan, eventually became the IEEE 1149.1 standard [IEEE 1149.1-2001] and paved the way for floating vias, microvias, and mounting components on both sides of PCBs to reduce the physical size of electronic systems. Boundary scan provides a generic test interface not only for interconnect testing between ICs but also for access to DFT features and capabilities within the core of an IC as illustrated in Figure 1.4 [IEEE 1149.1-2001] [Parker 2001]. The boundary-scan interface includes four mandatory input/output (I/O) pins for Test Clock (TCK), Test Mode Select (TMS), Test Data Input (TDI), and Test Data Output (TDO). A test access port (TAP) controller is included to access the boundary-scan chain and any other internal features designed into the device, such as access to internal scan chains, built-in self-test (BIST) circuits, or, in the case of field programmable gate arrays (FPGAs), access to the configuration memory. The TAP controller is a 16-state finite state machine (FSM) with standardized state diagram illustrated in Figure 1.4b where all state transitions occur on the rising edge of TCK based on the value of TMS shown for each edge in the state diagram. Instructions for access to a given feature are shifted into the instruction register (IR) and subsequent data are written to or read from the data register (DR) specified by the instruction (note that the IR and DR portions of the state diagram are identical in terms of state transitions and TMS values). An optional Test Reset (TRST*) input can be incorporated to asynchronously force the TAP controller to the Test Logic Reset state for application of the appropriate values to prevent back driving of bidirectional pins on the PCB during power up. However, this input was frequently excluded because the Test Logic Reset state can easily be reached from any state by setting TMS = 1 and applying five TCK cycles.

Figure 1.4. Boundary-scan interface: (a) boundary-scan implementation and (b) TAP controller state diagram.
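Because the TAP controller is fully specified by its 16-state transition table, it can be modeled in a few lines. The sketch below is an illustrative model (not code from the standard); it encodes the TMS-driven transitions and checks the property noted above, namely that five TCK cycles with TMS = 1 reach the Test-Logic-Reset state from any starting state.

```python
# TAP controller next-state table: state -> (next if TMS=0, next if TMS=1)
TAP_FSM = {
    "Test-Logic-Reset": ("Run-Test/Idle",  "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",     "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",       "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",       "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",       "Update-DR"),
    "Pause-DR":         ("Pause-DR",       "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",       "Update-DR"),
    "Update-DR":        ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",     "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",       "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",       "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",       "Update-IR"),
    "Pause-IR":         ("Pause-IR",       "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",       "Update-IR"),
    "Update-IR":        ("Run-Test/Idle",  "Select-DR-Scan"),
}

def step(state, tms):
    """Advance one rising edge of TCK with the given TMS value."""
    return TAP_FSM[state][tms]

# Every state returns to Test-Logic-Reset after five TCK cycles with TMS = 1.
for start in TAP_FSM:
    state = start
    for _ in range(5):
        state = step(state, 1)
    assert state == "Test-Logic-Reset", start
print("TMS=1 for five TCK cycles resets the TAP from every state")
```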

The basic boundary-scan cell (BSC) used for testing interconnect on a PCB is illustrated in Figure 1.5a along with its functional operation. A more complex construction, the triple-BSC bidirectional I/O buffer, is illustrated in Figure 1.5b. The bidirectional buffer illustrates the need for the double-latched scan chain design of the basic BSC to prevent back driving of other bidirectional buffers on a PCB while shifting in test patterns and shifting out test responses. The Update latch holds all values (including the tri-state control value) stable at the pads during the shifting process. Once the test pattern has been shifted into the boundary-scan chain via the Capture flip-flops, the Update-DR state of the TAP controller (see Figure 1.4b) transfers the test pattern to the Update latches for external application to the PCB interconnect under test or for access to DFT features and capabilities within the core of an IC.

Figure 1.5. Boundary-scan cells: (a) boundary-scan cell (BSC) and operation modes and (b) bidirectional buffer with BSCs.

The design of the basic BSC shown in Figure 1.5a also facilitates the application of test patterns at the input buffers to the internal core of a device, as well as the capture of output responses from the internal logic at the output buffers. This internal test, referred to as INTEST in the IEEE 1149.1 standard, is an optional but valuable instruction, although it is not always implemented, in an effort to reduce the area overhead and performance penalty associated with boundary scan. The external test, referred to as EXTEST, is a mandatory instruction that performs the PCB interconnect testing for which boundary scan was intended. Another mandatory feature is the BYPASS instruction, which uses the bypass register (shown in Figure 1.4a and consisting of a single flip-flop) to allow the entire boundary-scan chain of a device to be bypassed, providing faster access to other devices on the board through the daisy-chained boundary-scan interface.
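The double-latched Capture/Update behavior described above can be made concrete with a small behavioral model. The class below is an illustrative sketch (names and structure are ours, not from the standard): test data shift through the Capture stage while the Update latches keep the pins stable, and a single update step then drives the new pattern onto the pins.

```python
class BoundaryScanChain:
    """Simplified behavioral model of an 1149.1 boundary-scan register."""

    def __init__(self, length):
        self.capture_ffs = [0] * length      # Capture/Shift stage (shift path)
        self.update_latches = [0] * length   # Update stage driving the pins

    def capture(self, pin_values):
        """Capture-DR: load pin values in parallel into the Capture flip-flops."""
        self.capture_ffs = list(pin_values)

    def shift(self, tdi_bit):
        """Shift-DR: shift one bit in from TDI and return the bit leaving on TDO."""
        tdo_bit = self.capture_ffs[-1]
        self.capture_ffs = [tdi_bit] + self.capture_ffs[:-1]
        return tdo_bit

    def update(self):
        """Update-DR: transfer the shifted-in pattern to the pins, all at once."""
        self.update_latches = list(self.capture_ffs)

# Shift in a new test pattern while the previously captured response shifts out;
# the pins change only on update(), so nothing is back-driven during shifting.
chain = BoundaryScanChain(4)
chain.capture([1, 0, 1, 1])                          # response captured at the pins
response = [chain.shift(b) for b in (0, 1, 1, 0)]    # pattern in, response out
chain.update()
print("shifted-out response:", response)
print("pattern now driven on pins:", chain.update_latches)
```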

The boundary-scan interface is significant for several reasons. Obviously, it provides an approach to testing interconnects on digital PCBs and overcomes the test problems encountered with surface mount technology emerging in the late 1980s. Although boundary scan does not directly address test logic internal to the devices on a PCB, it does provide a standardized interface that can be used to access internal test mechanisms, such as scan chains and BIST circuits, designed specifically for the internal logic testing. Perhaps more important, it provides a proven solution to the test problems that would later be encountered with SOC and SIP implementations. Because the basic idea is to incorporate a large number of complex cores in a single chip (in the case of an SOC) or a single package (in the case of an SIP), testing interconnects between these modules can be performed in a manner similar to boundary scan. Hence, the IEEE 1149.1 standard can be, and is, extended and adapted to test the interconnections between the cores.

Boundary Scan Extension (IEEE 1149.6 Standard)

Although the IEEE 1149.1 boundary-scan standard has been the mainstay for board-level interconnect testing since the early 1990s, it is becoming ineffective for testing modern high-speed serial networks that are AC-coupled (a “blocking” capacitor is placed in the net between the driver and receiver). Also, most high-speed serial networks use differential I/O pads.

The IEEE 1149.6 standard [IEEE 1149.6-2003] was introduced to address these problems with multigigabit I/Os. It is an extension of the IEEE 1149.1 standard to deal with this differential, AC-coupled interface as illustrated in Figure 1.6. When testing the two nets using the IEEE 1149.1 standard, the presence of the coupling capacitor in AC-coupled networks “blocks” DC signals. As a result, the DC level that is applied to the net during a boundary-scan EXTEST instruction decays over time to an undefined logic level. This places a minimum frequency requirement on TCK that the 1149.1 standard cannot support. The IEEE 1149.6 standard addresses this problem by capturing the edges of data transitions instead of capturing data levels; hence, the minimum TCK frequency requirement is removed.

Figure 1.6. Differential, AC-coupled network.

IEEE 1149.6 addresses the problem of differential signaling by adding a single boundary-scan cell internal to the driver and a boundary-scan cell on each input of the functional differential receiver. The single boundary-scan cell on the driver side minimizes the loading and performance impact, whereas the boundary-scan cells on each input of the receiver provide better coverage than a single-cell implementation.

The IEEE 1149.6 circuit is composed of four parts: (1) the analog test receiver, (2) the digital driver logic, (3) the digital receiver logic, and (4) the 1149.6 test access port (TAP). Each part is discussed in the subsequent paragraphs.

The analog test receiver is the most critical part of the 1149.6 implementation because it is the test receiver that is able to capture transition edges. The test receiver uses a “self-referenced” comparator, along with voltage and delay hysteresis to capture a valid edge and filter any unwanted noise. The test receiver uses a low-pass filter to create a delayed reference signal.

The digital driver logic is a simple extension to the IEEE 1149.1 driver. Unlike the 1149.1 driver, the 1149.6 driver is required to drive a pulse when it is executing the (1149.6) EXTEST_PULSE (or EXTEST_TRAIN) instruction. The EXTEST_PULSE instruction is used to drive the output signal to the opposite state, wait for the signal to fully decay, and then drive the signal to the correct value (this is the value that gets captured). By allowing the signal to fully decay, the maximum voltage swing is generated on the next driven edge, allowing for better capture by the analog test receiver. In rare cases a continuous waveform may be required for some high-speed logic. In this case, the EXTEST_TRAIN instruction is used instead of the EXTEST_PULSE to generate a continuous waveform based on TCK. The digital driver logic must also support the 1149.1 EXTEST instruction. It simply extends the 1149.1 logic by multiplexing the 1149.6 signal into the 1149.1 shift/update circuit, after the update flip-flop.

The digital receiver logic takes the output of the analog test receiver and sets a capture flip-flop to a corresponding logical zero or one. The digital test receiver logic also ensures that a valid transition has been captured on every test vector by initializing the state of the capture memory before the transition is driven onto the net. Without this initialization, it would be impossible to determine if two sequential transitions in the same direction (positive or negative) occurred, or if only one transition occurred (i.e., if a positive transition occurs and is captured in the memory, and the subsequent test vector also generates a positive transition, there is no way to determine if the second transition occurred without clearing the contents of the capture memory before the second transition occurs).

Changes were made to the 1149.1 TAP to allow the 1149.6 driver logic to generate pulses. It was determined that the 1149.6 TAP would require an excursion through the Run-test/Idle state to allow for the generation of the pulse or pulses required by the EXTEST_PULSE and EXTEST_TRAIN instructions. Entry into the Run-test/Idle state during the execution of either EXTEST_PULSE or EXTEST_TRAIN would generate an “AC Test” signal. This would in turn cause the data which were driven onto the net during the Update-DR state to be inverted upon entry into the Run-test/Idle state (on the first falling edge of TCK) and to be inverted again on the exit from Run-test/Idle (on the first falling edge of TCK in the Select-DR state). As previously mentioned, the data signal is inverted, and then allowed to fully decay, in order to guarantee the maximum transition from the driver.

In summary, the IEEE 1149.6 standard is an extension of the IEEE 1149.1 standard; as a result, the 1149.6 standard must comply with all 1149.1 rules. The 1149.6 logic allows for testing of AC-coupled networks by capturing edges of pulses that are generated by 1149.6 drivers. A special, analog test receiver is used to capture these edges. The 1149.6 receiver logic is placed on both inputs of the differential receiver logic. Special hysteresis logic filters out noise and captures only valid transitions. These extensions allow for an equivalent level of testing (to 1149.1) for high-speed digital interconnects.
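The edge-capture idea can be illustrated with a crude discrete-time model of an AC-coupled net and a self-referenced comparator. All parameters below (time constants, hysteresis, step size) are hypothetical values chosen only for the demonstration, not figures from the standard; the point is simply that a driven DC level decays away while transitions are still detected.

```python
import math

# Illustrative model of an AC-coupled net and a self-referenced test receiver.
TAU_NET = 200e-9   # decay time constant of the AC-coupled net (s)
TAU_REF = 1e-6     # low-pass filter time constant of the reference (s)
DT = 10e-9         # simulation time step (s)
HYST = 0.2         # comparator hysteresis (V)

def captured_edges(drive_edges, steps=400):
    """Return the edges (+1/-1) detected by the self-referenced comparator."""
    v_net, v_ref, last, edges = 0.0, 0.0, 0, []
    for n in range(steps):
        t = n * DT
        for t_edge, amplitude in drive_edges:      # a driver edge couples through
            if abs(t - t_edge) < DT / 2:           # the capacitor as a step ...
                v_net += amplitude
        v_net *= math.exp(-DT / TAU_NET)           # ... then decays toward bias
        v_ref += (v_net - v_ref) * (DT / TAU_REF)  # reference lags the net slowly
        if v_net > v_ref + HYST and last <= 0:     # only transitions are seen,
            edges.append(+1); last = +1            # not DC levels
        elif v_net < v_ref - HYST and last >= 0:
            edges.append(-1); last = -1
    return edges

# One EXTEST_PULSE-like excursion: drive high, let the net decay, then drive low.
print(captured_edges([(0.2e-6, +1.0), (2.0e-6, -1.0)]))   # -> [1, -1]
```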

Boundary-Scan Accessible Embedded Instruments (IEEE P1687)

The use of BIST circuits and other embedded instruments is becoming more prevalent as devices become larger, faster, and more complex. These instruments can increase the device’s test coverage, decrease test and debug time, and provide correlation between the system and ATE environments. Examples of some of these instruments are logic BIST, memory BIST (for both internal and external memory), built-in pseudo-random bit sequence (PRBS) or bit-error-rate testing for serializer/deserializer (SerDes), power management and clock control logic, and scan register dump capabilities. Currently, many of these instruments are not documented or are documented in an ad hoc manner. This makes access to the instruments difficult and time consuming at best, and oftentimes can make access impossible. The lack of a standard interface (through the IEEE 1149.1 TAP) to these instruments makes automation virtually impossible as well.

The proposed standard P1687 (IJTAG) will develop a standard methodology to access embedded test and debug features via the IEEE 1149.1 TAP [IEEE P1687-2007]. The proposed standard does not try to define the instruments themselves but instead tries to standardize the description of the embedded features and the protocols required to communicate with the embedded instrument. The proposed standard may also define requirements for the interface to the embedded features. This proposed standard is an extension to IEEE 1149.1 and uses the 1149.1 TAP to manage configuration, operation, and collection of data from the embedded instrument. More information can be found on the IEEE P1687 Web site (http://grouper.ieee.org/groups/1687).

Core-Based Testing (IEEE 1500 Standard)

There are a number of important differences between PCB and SOC (or SIP) implementations that must be considered when testing the cores of an SOC [Wang 2006]. Cores can be deeply or hierarchically embedded in the SOC, requiring a test access mechanism (TAM) for each core. The number and types of cores can be diverse and provided by different vendors with different types of tests and test requirements; in the case of intellectual property (IP) cores, there may be little, if any, detailed information about the internal structure of the core. Although the clock rate of information transfer between internal cores is typically much higher than that of the I/O pins of the SOC, additional ports to a core are much less costly than additional pins on a package. As a result, boundary scan alone does not provide a complete solution to the problem of testing the cores in an SOC. Therefore, the IEEE 1500 standard was introduced to address the problems associated with testing SOCs [Seghal 2004] [IEEE 1500-2005].

The most important feature of the IEEE 1500 standard is the provision of a “wrapper” on the boundary (I/O terminals) of each core to standardize the test interface of the core. An overall architecture of an SOC with N cores, each wrapped by an IEEE 1500 wrapper, is shown in Figure 1.7, and the structure of a 1500 wrapped core is given in Figure 1.8. The wrapper serial port (WSP) is a set of I/O terminals of the wrapper for serial operations, which consists of the wrapper serial input (WSI), the wrapper serial output (WSO), and several wrapper serial control (WSC) terminals. Each wrapper has a wrapper instruction register (WIR) to store the instruction to be executed in the corresponding core, which also controls operations in the wrapper including accessing the wrapper boundary register (WBR), the wrapper bypass register (WBY), or other user-defined function registers. The WBR consists of wrapper boundary cells (WBCs) that can be as simple as a single storage element (flip-flop for observation only), similar to the BSC shown in Figure 1.5a, or a complex cell with multiple storage elements on its shift path.

Figure 1.7. Overall architecture of a system per the IEEE 1500 standard.

Figure 1.8. A core with the IEEE 1500 wrapper.

The WSP supports a serial test mode similar to that in the boundary-scan architecture, but without using a TAP controller. This implies that the 1500 serial control signals can be applied directly to the cores and hence provide more test flexibility. For example, delay testing that requires a sequence of test patterns to be applied consecutively to a core can be supported by the 1500 standard. As shown in Table 1.1, in addition to the serial I/O data signals WSI and WSO, the WSC consists of six mandatory terminals (WRCK, WRSTN, SelectWIR, CaptureWR, UpdateWR, and ShiftWR), one optional terminal (TransferDR), and a set of optional clock terminals (AUXCKn). The functions of CaptureWR, UpdateWR, and ShiftWR are similar to those of CaptureDR, UpdateDR, and ShiftDR of boundary scan, respectively (see Figure 1.5a). The SelectWIR signal determines whether the WIR is selected; this is required because no TAP controller is available in the 1500 standard. The TransferDR signal is used to transfer test data to the correct positions on the shift path of a WBC when the shift path contains multiple storage elements. This enables multiple test data to be stored in consecutive positions of the shift path and hence allows delay testing to be carried out. Finally, the AUXCKn terminals can be used to test cores with multiple clock domains.

Table 1.1. IEEE 1500 Wrapper Interface Signals

Signal: Function
Wrapper serial input (WSI): Serial data input to wrapper
Wrapper serial output (WSO): Serial data output from wrapper
Wrapper clock (WRCK): Clock for wrapper functions
Wrapper reset (WRSTN): Resets wrapper to normal system mode
SelectWIR: Determines whether to select WIR
CaptureWR: Enables capture operation of selected register
ShiftWR: Enables shift operation of selected register
UpdateWR: Enables update operation of selected register
TransferDR: Optional transfer operation of selected register
AUXCKn: Up to n optional clocks for wrapper functions
Wrapper parallel in (WPI): Optional parallel test data from TAM
Wrapper parallel control (WPC): Optional parallel control from TAM
Wrapper parallel out (WPO): Optional parallel test response to TAM

In addition to the serial test mode, the 1500 standard also provides an optional parallel test mode with a user-defined, parallel test access mechanism (TAM). Each core can have its own wrapper parallel control (WPC), wrapper parallel input (WPI), and wrapper parallel output (WPO) signals. A user-defined parallel TAM, as summarized in the final three entries of Table 1.1 [IEEE 1500–2005], can transport test signals from the TAM-source (either inside or outside the chip) to the cores through WPC and WPI, and from the cores to the TAM-sink through WPO in a parallel manner; hence, it can greatly reduce the test time.

A variety of architectures can be implemented in the TAM for providing parallel access to control and test signals (both input and output) via the wrapper parallel port (WPP) [Wang 2006]. Some of these architectures are illustrated in Figure 1.9, including (1) multiplexed access where the cores time-share the test control and data ports, (2) daisy-chained access where the output of one core is connected to the input of the next core, and (3) direct access to each core.

Figure 1.9. Example of user-defined parallel TAM architectures: (a) multiplexed, (b) daisy-chain, and (c) direct access.

Although it is not required or suggested in the 1500 standard, a chip with 1500-wrapped cores may use the same four mandatory pins as in the IEEE 1149.1 standard for chip interface so that the primary access to the IEEE 1500 architecture is via the boundary scan. An on-chip test controller with the capability of the TAP controller in the boundary-scan standard can be used to generate the WSC for each core. This on-chip test controller concept can also be used to deal with the testing of hierarchical cores in a complex system [Wang 2006].

An additional problem encountered with the increasing complexity of SOCs is the ability to verify or debug a design once it has been implemented in its final medium, for example silicon. The IEEE 1500 standard can be used for core-level testing with some capability for debug [Marinissen 2002] [Zorian 2005]. However, the process of post-silicon qualification or debug is time consuming and expensive, requiring as much as 35% to 70% of the total time-to-market interval. This is due in part to limited internal observability but primarily to the fact that internal clock frequencies are generally much higher than can be accessed through IEEE 1500 circuitry. One solution to this problem of design for debug and diagnosis (DFD) is the insertion of a reconfigurable infrastructure used initially for silicon validation or debug, with later application to system-level test and performance measurement. The infrastructure includes a wrapper fabric similar in some ways to the 1500 wrapper but reconfigurable in a manner similar to the programmable logic and routing resources of an FPGA. In addition, a signal monitoring network is incorporated along with functions analogous to an embedded logic analyzer for in-system, at-speed triggering and capture of signals internal to the SOC [Abramovici 2006].

Analog Boundary Scan (IEEE 1149.4 Standard)

There are also mixed-signal SOC implementations that include analog circuitry not addressed by either the IEEE 1149.1 or 1500 standard. Boundary scan for PCBs was extended to include mixed-signal systems with both digital and analog components in the IEEE 1149.4 standard [IEEE 1149.4-1999]. The purpose of the IEEE 1149.4 standard is “to define, document, and promote the use of a standard mixed-signal test bus that can be used at the device and assembly levels to improve the controllability and observability of mixed-signal designs and to support mixed-signal built-in test structures in order to reduce both test development time and test cost, and to improve test quality.” Figure 1.10 shows a typical 1149.4 equipped IC and its PCB-level environment. At the chip level, an 1149.4 IC has two internal analog buses (AB1/AB2) connected to its analog pins. At the board level, AB1 and AB2 are connected to two board-level analog buses through two analog testability ports (AT1/AT2). Hence, one is able to access analog pins through the testability construct.

Figure 1.10. IEEE 1149.4 internal test configuration.

The IEEE 1149.4 standard defines test features to provide standardized approaches to (1) interconnect testing that tests the open/short of simple interconnects, (2) parametric testing that measures electrical parameters of extended interconnects with passive devices, and (3) internal testing that tests the functionality of on-chip analog cores.

An 1149.4-compliant design has a typical circuit architecture shown in Figure 1.11. In addition to boundary-scan circuitry for the digital domain, the standard includes an analog test access port (ATAP) which consists of a minimum of one analog test input and one analog test output connection. Connections between the two internal analog test buses (AB1 and AB2) and the ATAP are controlled by the test bus interface circuit (TBIC). The analog test buses provide connections to the analog boundary modules (ABMs), which are analogous to the BSCs in the digital domain. Each ABM (see Figure 1.12) includes switches that allow connection to AB1, AB2, a high voltage (VH), a low voltage (VL), a reference voltage (VG), or an I/O pin of an analog core. This allows not only interconnect testing of analog signal nets on the PCB for opens and shorts, but perhaps more important, it allows for the test and measurement of passive components such as resistors and capacitors commonly used for filtering and coupling analog signal nets on a PCB [Wang 2006].

Figure 1.11. Typical IEEE 1149.4-compliant chip architecture.

Figure 1.12. IEEE 1149.4 analog boundary module (ABM).

The IEEE 1149.4 standard is a superset of the IEEE 1149.1 standard, so each component responds to the mandatory instructions defined in 1149.1 [IEEE 1149.1-2001]. Herein, three types of instruction are defined: (1) mandatory instructions, (2) optional instructions, and (3) user-defined instructions.

Mandatory instructions include (1) BYPASS that isolates the chip from the entire DFT construct; (2) SAMPLE/PRELOAD that captures a digitized snapshot of the analog signal on the pin or loads a digital data pattern to specify the operation of the ABM; (3) EXTEST that disconnects analog pins from the core to perform open/short and parametric testing of the interconnects; and (4) PROBE that connects analog pins to internal buses for allowing access to the pins from edge connectors of the board.

Optional instructions recommended in the standard include (1) INTEST for internal core testing based on the PROBE instruction; (2) RUNBIST for the application of BIST circuitry; (3) CLAMP for properly conditioning the pin to VH, VL, or VG; (4) HIGHZ for isolating the pin; and (5) device identification register such as IDCODE and the USERCODE as defined in the IEEE 1149.1 standard. In addition, users can define their own instructions, which will be treated as the extension of optional instructions.

Based on the preceding instructions, open/short interconnect testing, extended interconnect measurement, and internal analog core testing can be executed. Let us use interconnect testing as an example to show how 1149.4 works. Without external instruments, the ABMs can perform interconnect testing in general and open/short testing in particular, as shown in Figure 1.13. The tested wire (shown in bold in the figure) connects analog Pin1 of Chip1 to Pin2 of Chip2. The three-step test procedure is as follows:

  1. Switch VH to Pin1 to detect a 1 at the comparator of Pin2.

  2. Switch VL to Pin1 to detect a 0 at the comparator of Pin2.

  3. Switch VH to Pin1 to detect a 1 at the comparator of Pin2.

Figure 1.13. Example for interconnect testing using the IEEE 1149.4 bus.

Normally, VH and VL are set to VDD and VSS, and VTH is set to 0.5VDD. The procedure detects static short faults as well as open faults, via the 1-to-0 and 0-to-1 transitions it applies.
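A highly simplified behavioral model shows how the three-step sequence distinguishes a good net from opens and shorts. The net models below are illustrative assumptions (an open is modeled as a node floating low), not part of the standard.

```python
VDD, VSS = 1.8, 0.0        # hypothetical supply levels used as VH and VL
VTH = 0.5 * VDD            # comparator threshold at the receiving pin

def run_interconnect_test(net):
    """Apply the VH, VL, VH sequence at Pin1 and read Pin2's comparator."""
    drive_sequence = [VDD, VSS, VDD]          # steps 1, 2, and 3 above
    expected = [1, 0, 1]
    observed = [1 if net(v) > VTH else 0 for v in drive_sequence]
    return "pass" if observed == expected else f"FAIL, observed {observed}"

# Behavioral net models: the voltage Pin2 sees for a voltage driven at Pin1.
good_net     = lambda v: v       # wire intact
open_net     = lambda v: VSS     # broken wire, Pin2 modeled as floating low
short_to_vdd = lambda v: VDD     # Pin2 shorted to the supply
short_to_gnd = lambda v: VSS     # Pin2 shorted to ground

for name, net in [("good", good_net), ("open", open_net),
                  ("short to VDD", short_to_vdd), ("short to GND", short_to_gnd)]:
    print(f"{name:13s}: {run_interconnect_test(net)}")
```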

The most serious drawback of the 1149.4 standard is the parasitic effect associated with the long buses. A typical pin has a stray capacitance of 2~4 pF, a via has 0.5~1 pF, and a 1-cm wire has 0.25~0.5 pF. Therefore, the bandwidth is severely limited by the stray capacitance of the bus and the stray resistance of the switches (in the kΩ range). To mitigate this, the standard recommends replacing the passive switches with active current buffers and voltage buffers.

Basics of Memory Testing

Advances in deep-submicron process technology have driven VLSI designs toward system-on-chip (SOC) applications that integrate intellectual property (IP) cores from various sources. Memory is one of the most universal cores in that almost all SOC devices contain some type of embedded memories. Nowadays embedded static random-access memories (SRAMs) are widely used, because by merging memory with logic, data bandwidth is increased and hardware cost can be reduced. For pad-limited, multimillion-gate designs, embedded dynamic random-access memories (DRAMs) (and pseudo SRAMs or 1T SRAMs which contain only one transistor in each memory cell [Leung 2000]) are also becoming an attractive solution because of their compact memory density. However, with the rapid increase in capacity and density of these memory cores, the ability to detect, diagnose, and even repair all defective memories has quickly become a difficult and challenging problem, resulting in an increase in test cost and yield loss.

Fundamental topics in memory testing have been extensively covered in [van de Goor 1991]. In [Wang 2006], the authors first discuss the industry-wide use of memory fault models and (March) test algorithms (patterns), with special emphasis on memory fault simulation and test algorithm generation. A great deal of discussion is then centered on memory BIST and built-in self-repair (BISR), as well as memory diagnosis and failure analysis. The introduction of nanotechnologies and SOC devices brings forth new problems in semiconductor memory testing. Both the number of embedded memory cores and the area occupied by memories are rapidly increasing in SOC devices. In addition, memories have been widely used as the technology driver—that is, they are often designed with a density that is at the extremes of the process technology. Therefore, the yield of on-chip memories usually determines chip yield. Go/no-go testing is no longer enough for embedded memories in the SOC era—memory diagnosis and failure analysis are quickly becoming critical issues as far as manufacturing yield and time-to-volume of SOC devices are concerned.
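As an illustration of the March tests mentioned above, the sketch below runs the widely used March C- algorithm against a simple behavioral memory model with one injected stuck-at-0 cell; the memory model and fault injection are illustrative assumptions, not material from [Wang 2006].

```python
class FaultyMemory:
    """Bit-wide memory model with an optional stuck-at-0 cell for fault injection."""

    def __init__(self, size, stuck_addr=None):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr

    def write(self, addr, value):
        if addr != self.stuck_addr:          # writes to the stuck cell are lost
            self.cells[addr] = value

    def read(self, addr):
        return 0 if addr == self.stuck_addr else self.cells[addr]

def march_c_minus(mem, size):
    """March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); down(r0)}."""
    up, down = list(range(size)), list(range(size - 1, -1, -1))
    elements = [
        (up,   [("w", 0)]),
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (down, [("r", 0)]),
    ]
    failures = []
    for order, ops in elements:
        for addr in order:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:     # read mismatch -> fault detected
                    failures.append((addr, value, mem.read(addr)))
    return failures

print(march_c_minus(FaultyMemory(16), 16))                 # []  (fault-free memory)
print(march_c_minus(FaultyMemory(16, stuck_addr=5), 16))   # stuck-at-0 cell detected
```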

When memory cores (possibly hundreds or even thousands of them) are embedded in an SOC and surrounded by logic blocks, a proper DFT methodology must be provided for core isolation and tester access, and a price has to be paid in hardware overhead, performance penalty, and noise and parasitic effects. Even if these are manageable, memory testers for full qualification and testing of the embedded memories will be much more expensive because of their increased speed and I/O data bandwidth, and if we also consider engineering changes, the overall investment in test equipment will be even higher. Fortunately, BIST has become a widely accepted practical solution to this dilemma. With BIST, the external tester requirement can be minimized, and tester time can be greatly reduced because parallel testing at the memory bank and chip levels is feasible. Another advantage of BIST is that it is a good approach to protecting IP—that is, the IP (memory cores, in this case) provider needs only deliver the BIST activation and response sequences for testing and diagnosis purposes, without disclosing the design details.

There are, however, some important issues that a pure BIST scheme does not solve, such as diagnosis and repair. High density, high operating clock rates, and deep-submicron technology are introducing new failure modes and faults in memory circuits. Conventional memory testers designed for mass-production testing provide only limited information for failure analysis, which usually is insufficient for fast debugging. Extensive test data logging with real-time analysis, screening, compression, and diagnosis is still prohibited by its high cost. Designers need a diagnosis-support mechanism within the BIST circuit and even a BISR scheme to increase product quality, reliability, and yield. The mechanism should be easy to deploy and use—tools for automatic generation and insertion of the BIST/BISR circuits, as well as accompanying software and scripts, are expected.

Memory repair has long been an important technique that is used to avoid yield loss. Memory repair requires redundant elements or spare elements such as spare rows, columns, or blocks of storage cells. The redundancy is added so that most faulty cells can be repaired or, more specifically, replaced by spare cells [Cenker 1979] [Smith 1981] [Benevit 1982] [Wang 2006]. Redundancy, however, adds to cost in another form. Analysis of redundancies to maximize yield (after repair) and minimize cost is a key process during manufacturing [Huang 2003]. Redundancy analysis using expensive memory testers is becoming inefficient (and therefore not cost-effective) as chip density continues to grow. Therefore, built-in redundancy analysis (BIRA) and BISR are now among the top items to be integrated with memory cores.

We illustrate the combined operation of BIST, BIRA, and BISR with the following example. Figure 1.14, taken from [Wang 2006], depicts the block diagram of a BISR scheme, including the BIST module, BIRA module, and test wrapper for the memory. The BIST circuit detects the faults in the main memory and spare memory and is programmable at the March element level [Huang 1999] [Wang 2006]. The BIRA circuit performs redundancy allocation. The test wrapper switches the memory between test/repair mode and normal mode. In test/repair mode, the memory is accessed by the BIST module, whereas in normal mode the wrapper selects the data outputs either from the main memory or the spare memory (replacing the faulty memory cells) depending on the control signals from the BIRA module. This BISR is a soft repair scheme; therefore, the BISR module will perform testing, analysis, and repair upon every power up. As Figure 1.14 indicates, the BIST circuit is activated by the power-on reset (POR) signal. When we turn on the power, the BIST module starts to test the spare memory.

Figure 1.14. Block diagram of a BISR scheme [Wang 2006].

Once a fault is detected, the BIRA module is informed to mark the defective spare row or column as faulty through the error (ERR) and fault syndrome (FS) signals. After finishing the spare memory test, the BIST circuit tests the main memory. If a fault is detected (ERR outputs a pulse), the test process pauses and the BIST module exports FS to the BIRA module, which then performs the redundancy analysis procedure. When the procedure is completed and the memory testing is not yet finished, the BIRA module issues a continue (CNT) signal to resume the test process. During the redundancy analysis procedure, if a spare row is requested but there are no more spare rows, the BIRA module exports the faulty row address through the export mask address (EMA) and mask address output (MAO) signals. The memory will then be operated in a downgraded mode (i.e., with smaller usable capacity) by software-based address remapping. If downgrade mode is not allowed, MAO is removed and EMA indicates whether the memory is repairable. When the main memory test and redundancy analysis are finished, the repair end flag (REF) signal goes high and the BIRA module switches to the normal mode. The BIRA module then serves as the address remapper, and the memory can be accessed using the original address bus (ADDR). When the memory is accessed, ADDR is compared with the fault addresses stored in the BIRA module. If ADDR is the same as any of the fault addresses, the BIRA module controls the wrapper to remap the access to spare memory.
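The redundancy-analysis step performed by the BIRA module can be illustrated with a deliberately simplified allocator that repeatedly repairs the row or column covering the most remaining faulty cells. This greedy sketch is a teaching aid under simplified assumptions; production BIRA logic, such as that referenced in [Wang 2006], uses must-repair analysis and more sophisticated search to maximize the repair rate.

```python
from collections import Counter

def analyze_redundancy(faults, spare_rows, spare_cols):
    """Greedily allocate spare rows/columns to cover faulty (row, col) cells.

    Returns (repair_rows, repair_cols), or None if the spares are insufficient.
    """
    faults = set(faults)
    repair_rows, repair_cols = [], []
    while faults:
        row_counts = Counter(r for r, _ in faults)
        col_counts = Counter(c for _, c in faults)
        best_row = row_counts.most_common(1)[0] if spare_rows else (None, 0)
        best_col = col_counts.most_common(1)[0] if spare_cols else (None, 0)
        if best_row[1] == 0 and best_col[1] == 0:
            return None                           # faults remain but no spares left
        if best_row[1] >= best_col[1]:            # repair whichever covers more faults
            repair_rows.append(best_row[0]); spare_rows -= 1
            faults = {f for f in faults if f[0] != best_row[0]}
        else:
            repair_cols.append(best_col[0]); spare_cols -= 1
            faults = {f for f in faults if f[1] != best_col[0]}
    return repair_rows, repair_cols

# Three faulty cells, one spare row and one spare column available:
print(analyze_redundancy({(2, 3), (2, 7), (5, 3)}, spare_rows=1, spare_cols=1))
# -> ([2], [3]): row 2 is replaced by the spare row, column 3 by the spare column
```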

Although BIST schemes, such as the one in the previous example, are promising, a number of challenges in memory testing must be considered. For example, BIST cannot replace external memory testers entirely if the BIST schemes used are only for functional testing. Even BIST with diagnosis support is insufficient because of the large amount of diagnosis data that must be transferred to an external tester, typically through a channel with limited bandwidth. Furthermore, memory devices normally require burn-in to reduce field failure rate. For logic devices IDDQ is frequently used during burn-in to detect the failing devices, but IDDQ for memories is difficult. What, then, should be done to achieve the same reliability requirement when we merge memory with logic? The combination of built-in current sensors and BIST is one possible approach, and the memory burn-in by BIST logic is another.

Yet another challenge is timing qualification or AC testing. With the shrinkage of feature size in process nodes, an increasing number of parametric failures caused by process variation are becoming critical as far as yield is concerned. In memories, these subtle defect types may cause additional circuit delays and even a system failure. The current practice of screening and diagnosing parametric defects and failures in the industry is AC testing to test the timing-critical parameters, but this is a time-consuming process as many parameters need to be tested. The lack of AC test capability is one of the reasons that BIST circuits have yet to replace traditional memory testers. For example, consider testing an asynchronous memory with synchronous BIST logic. The BIST timing resolution would not be able to compete with that of a typical external memory tester. It is hoped that this problem can be solved by proper delay fault models that may result from a systematic investigation of the relationship between delay faults and memory failures.

The success of BIST in SRAM does not guarantee its success in DRAM, flash memory, content addressable memory (CAM), etc. For embedded DRAM, as an example, the need for an external memory tester cannot be eliminated unless redundancy analysis and repair can be done on-chip, in addition to AC testing by BIST. The problems increase when merging logic with memory such as DRAM and flash memory [Wu 1998]. In addition to process technology issues, there are problems of guaranteeing the performance, quality, and reliability of the embedded memory cores in a cost-effective manner. Once again, new failure modes or fault models have to be tested because March algorithms, such as those used in SRAM BIST schemes, are considered insufficient for DRAM or flash memory.
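For readers unfamiliar with March algorithms, the sketch below shows the classic March C- sequence in Python. It is only a software illustration of what an SRAM BIST engine implements in hardware; the read/write helper signatures are our own assumption, not part of any particular BIST design.

    def march_c_minus(read, write, n):
        """Apply March C- to an n-word, 1-bit-wide memory; return True if it passes.
        Elements: (w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); (r0)."""
        def element(addresses, ops):
            for a in addresses:
                for op, expected, value in ops:
                    if op == "r" and read(a) != expected:
                        return False        # observed value differs from the expected value
                    if op == "w":
                        write(a, value)
            return True

        up, down = range(n), range(n - 1, -1, -1)
        elements = [
            (up,   [("w", None, 0)]),
            (up,   [("r", 0, None), ("w", None, 1)]),
            (up,   [("r", 1, None), ("w", None, 0)]),
            (down, [("r", 0, None), ("w", None, 1)]),
            (down, [("r", 1, None), ("w", None, 0)]),
            (down, [("r", 0, None)]),
        ]
        return all(element(addrs, ops) for addrs, ops in elements)

    # Fault-free example on a 16-word memory modeled as a Python list:
    mem = [0] * 16
    assert march_c_minus(lambda a: mem[a], lambda a, v: mem.__setitem__(a, v), 16)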

Flash memory is by far the most popular nonvolatile memory, and it is widely used in portable devices such as PDAs, cell phones, MP3 players, and digital cameras. The advent of deep-submicron IC manufacturing technology and SOC design methodology has greatly increased the use of embedded flash memory in these applications. For commodity flash memory, it is customary to test and repair the memories using a probe station during wafer-level test. For embedded flash memory, however, BIST/BISR is considered the most cost-effective solution for test and repair, though more research is needed to make it practical. It should also be noted that flash memory exhibits special faults and uses redundancy architectures that are more complicated than those of SRAM [Yeh 2007].

Although flash memory is widely used today, it suffers from problems such as the need for high voltages during program and erase operations, as well as reliability issues. The industry has been trying to find a new nonvolatile memory to replace flash memory. Possible candidates include magnetic random-access memory (MRAM), ferroelectric random-access memory (FeRAM), and ovonic universal memory (OUM). Among them, MRAM has the advantages of high speed, high density, low power consumption, and almost unlimited read/write endurance [Su 2004, 2006]. The data storage and switching mechanism of MRAM is based on the resistance change of the magnetic tunnel junction (MTJ) device in each cell [Su 2004, 2006]. In [Su 2006], several MRAM testing issues are addressed, including fault modeling, testing, and diagnosis. Particular emphasis is placed on the write disturbance fault (WDF) model, a fault in which the excessive magnetic field generated during a write operation disturbs the data stored in an MRAM cell. The faulty behavior manifests mainly as a shift in the MTJ cell's operating region.

In summary, although there have been many advances in memory testing, there are also many new challenges for the future. The basics of memory testing were covered extensively in [van de Goor 1991] and [Wang 2006], and some of the new memory test challenges have been summarized here. Memory testing is an important topic and will continue to be discussed throughout various chapters of this book in terms of how it relates to that particular topic area.

SOC Design Examples

Numerous SOC designs are developed and manufactured each year. Today, designing a complex SOC circuit is no longer a mission-impossible task. Designers can often pick commercially available IP cores and integrate them to meet the design requirements. These IP cores range from digital to memory to analog and mixed-signal cores. Table 1.2 lists a few popular IP cores and some of the major IP suppliers.

Table 1.2. Popular IP Cores and Major IP Suppliers

IP Core                               Major IP Suppliers
Central processing unit (CPU)         ARM, MIPS, Tensilica, ARC
Digital signal processor (DSP)        Texas Instruments
Static random-access memory (SRAM)    Virage Logic, ARM, MoSys
Dynamic random-access memory (DRAM)   Virage Logic, ARM
Structured ASIC (FPGA)
Digital-to-analog converter (DAC)     Analog Devices
Analog-to-digital converter (ADC)     Analog Devices
Universal serial bus (USB)
Phase-locked loop (PLL)               ARM, Virage Logic

In this section, we illustrate a few representative SOC designs. More SOC designs and architectures can be found in [Dally 2004] and [De Micheli 2006]. The designs selected here span the spectrum of IP cores and represent significant test challenges in the nanometer design era. Our objective is to convey to the reader that unless these SOC design-related test issues are solved in a timely manner, few billion-transistor SOC designs and safety-critical nanoscale devices will come to fruition to benefit society.

BioMEMS Sensor

The wireless bioMEMS sensor developed at National Taiwan University is the first wearable prototype to detect C-reactive protein (CRP) based on nanomechanics [Chen 2006]. The CRP concentration in human serum is below 1 μg/mL in a healthy person but may rise by a factor of 100 or even 500 in response to infection in the body. In [Blake 2003], the authors showed that high levels of CRP in the bloodstream raise the risk of a heart attack. One of the greatest benefits of this biosensor is that cardiovascular-event-related proteins, such as CRP, can be dissociated from anti-CRP by applying a low-frequency AC electric field (a 1-V signal at 0.2 Hz) to the sensor. This allows the design of a wireless, label-free detection system for disease-related CRP using a microcantilever [Arntz 2003] with a safe “reusable” feature.

The CRP sensing mechanism is shown in Figure 1.15 [Chen 2006]. First, CMOS-compatible silicon nitride is deposited on the silicon substrate. Next, a MEMS cantilever is fabricated by photolithography followed by micromachining. The optimum sensor dimensions, a length of 200 μm with each leg 40 μm wide, are determined by balancing the spring constant, the bio-induced stress, and stability in the flow field. On the top side of the cantilever, chromium (Cr), gold (Au), a bio-linker, and anti-CRP are deposited. Chromium is adopted to improve the adhesion of gold to silicon nitride, whereas the gold layer is used to immobilize the bio-linker. The anti-CRP is bonded to the bio-linker for probing CRP. Specific biomolecular interactions between CRP and anti-CRP alter the intermolecular nanomechanical interactions within the bio-linker layer, and as a result the cantilever bends. The bending of the cantilever can be measured by optical beam deflection or piezoresistive techniques. The authors adopted the former approach because of its excellent minimum detection limit of 0.1 nm.

Figure 1.15. CRP sensing mechanism using microcantilever.

To demonstrate how the prototype works, the authors in [Chen 2006] further set up an experiment, as shown in Figure 1.15. Placing a polydimethylsiloxane (PDMS) “cap” above the functionalized cantilever forms two microfluidic channels and one liquid chamber for the reaction. The reagents are injected into the channels and chamber using a syringe pump. A laser beam passing through the optically transparent PDMS cap is focused onto the tip of the cantilever with the help of a charge-coupled device (CCD) camera, thereby ensuring alignment among the laser beam, the cantilever, and the position-sensitive detector (PSD). The function blocks of the wireless CRP measurement system are shown in Figure 1.16 [Chen 2006]. Custom commands from a personal computer (PC) are received by a 0.45-V amplitude-shift keying (ASK) receiver consisting of a common-source amplifier, gain stages, and a diode-RC demodulator. The 0.45-V operation is achieved by using low-threshold-voltage (0.2-V) transistors. After decoding the commands, the microcontroller unit (MCU) activates the 8-bit charge-redistribution analog-to-digital converter (ADC) and the preamplifier to convert the analog bio-signals measured by the commercially available PSD into digital values. Finally, an ASK transmitter comprising a ring oscillator and a common-source class-C power amplifier transmits the digital values to the PC for analysis.
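As a rough behavioral sketch of this command/measure/transmit flow (our own simplification; the function names below are hypothetical stand-ins for the blocks in Figure 1.16, not an actual firmware API):

    def measurement_loop(ask_receive, adc_sample_8bit, ask_transmit, samples_per_reading=16):
        """Hypothetical MCU control loop mirroring the blocks of Figure 1.16."""
        while True:
            command = ask_receive()              # command decoded by the 0.45-V ASK receiver
            if command == "MEASURE":
                # Enable the preamplifier/ADC path and digitize the PSD output.
                reading = bytes(adc_sample_8bit() for _ in range(samples_per_reading))
                ask_transmit(reading)            # digital values returned to the PC for analysis
            elif command == "STOP":
                break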

Figure 1.16. Function blocks of the wireless CRP measurement system.

Fabricated at a 180-nm process node, the wireless ASK chip operates at 500 kHz and has a die size of 3.4 mm². Although this chip is small, the wireless CRP measurement system presents many test challenges because it includes a microfluidic bioMEMS sensor, a PSD module, an RF transceiver (transmitter and receiver), an analog preamplifier, an ADC, and an MCU with embedded memory and state-control logic for the analog circuits. Today, semiconductor lasers [Arntz 2003] and the microlenses used in a compact disc (CD) player can be placed on top of the PDMS cap to replace the He-Ne laser, the reflecting mirror, and the lens. A standard CMOS poly-resistor layer can also be used as the cantilever to detect the piezoresistive change caused by bending. Thus, the authors anticipate that the bioMEMS sensor, the semiconductor laser, and the microlenses can also be integrated with the PSD module and the wireless ASK circuit on the same die. This will make testing of this SOC device even more challenging. Self-diagnosis and self-calibration might be needed to ensure correct operation of the device at all times. When the device becomes implantable, it will require BISR.

Network-on-Chip Processor

The Cell processor [Pham 2005, 2006], co-designed by Sony, Toshiba, and IBM, is one of the earliest network-on-chip (NOC) processors developed to address high-performance distributed computing. Built as a first-generation multiprocessor with a vision of bringing supercomputer power to everyday life, the Cell processor supports multiple operating systems, including Linux, and is designed for natural human interaction, with photorealistic effects, predictable real-time response, and virtualized resources for concurrent activities. The processor architecture includes one power processor element (PPE), eight synergistic processor elements (SPEs), an element interconnection bus (EIB), a memory interface controller (MIC), a bus interface controller (BIC), a pervasive unit (PU) that supports extensive test, monitoring, and debug functions, a power management unit (PMU), and a thermal management unit (TMU). The chip was fabricated at a 90-nm process node and can operate at more than 4 GHz at a nominal voltage of 1.1 V or higher. The high-level chip diagram is shown in Figure 1.17.

Figure 1.17. The EIB in the Cell processor.

The PPE is a 64-bit dual-threaded processor based on the Power Architecture [Rohrer 2004]. The processor contains a power execution unit (PXU), 32-KB instruction and data caches (L1), and a 512-KB cache (L2). Each SPE contains a synergistic execution unit (SXU), which is an independent processor, and a 256-KB local store (LS). The EIB connects the PPE, the eight SPEs, the MIC, and the BIC. The EIB is central to the Cell processor. This coherent bus transfers up to 96 bytes per processor cycle. The bus is organized as four 16-byte-wide (half-rate) rings, each of which supports up to three simultaneous data transfers. A separate address and command network manages bus requests and the coherence protocol. With a dual-threaded PPE and eight SPEs, this Cell processor is capable of handling 10 simultaneous threads and over 128 outstanding memory requests.
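As a rough consistency check (our arithmetic based on the figures above, not an equation taken from [Pham 2005, 2006]), the quoted peak of 96 bytes per processor cycle follows from four rings, up to three concurrent 16-byte transfers per ring, and the half-rate ring clock:

\[
4 \ \text{rings} \times 3 \ \frac{\text{transfers}}{\text{ring}} \times 16 \ \text{bytes} \times \frac{1}{2} = 96 \ \text{bytes per processor cycle}.
\]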

The MIC supports two Rambus XDR memory banks. For added reliability, the MIC supports error correcting code (ECC) and a periodic ECC scrub of memory. For debug and diagnosis, the MIC also allows the processor to be stopped and its system state scanned out. The BIC provides two Rambus RRAC flexible I/O (FlexIO) interfaces with varying protocols and bandwidth capabilities to address differing system requirements. The interfaces can be configured as either two I/O interfaces (IOIF 0/1) or as an I/O and a coherent SMP interface (IOIF and BIF). Both MIC and BIC are operated asynchronously to the processor; hence, each bus contains speed-matching SRAM buffers and logic. The processor side operates at half the global processor clock rate, whereas the XDR side operates at one half the rate of the XDR interface and the FlexIO side operates at 1/2 or 1/3 the rate of the RRAC transceivers.

The PU (not shown in Figure 1.17) contains all of the global logic needed for basic functional operation of the chip, lab debug, and manufacturing test. To support basic functional operation, the PU contains a serial peripheral interface (SPI) to communicate with an external controller during normal operation, clock generation and distribution logic to enable the correct phase-locked loop (PLL) functions on the chip, and a power-on-reset (POR) engine that systematically initializes all the units of the processor. The POR engine has a debug mode that allows its embedded 32 instructions to be single-stepped, skipped, or performed out of order. For lab debug and bring-up, the PU contains chip-level global fault isolation registers, which allow the operating system to quickly determine which unit generated an error condition, and a performance monitor (PFM), a centralized unit connected to all functional units on the chip via a trace/debug bus to assist with performance analysis. An on-board programmable trace logic analyzer (TLA) captures and stores internal signals while the chip is running at full speed to assist with debug. An on-chip control processor, with an IEEE 1149.1 boundary-scan interface, is also available to help with debug. For manufacturing test, the PU supports 11 different test modes, including the array built-in self-test (ABIST), commonly referred to as memory BIST (MBIST), which tests all memories in parallel to reduce test time, and the logic BIST (LBIST), which includes a centralized controller in the PU and 15 LBIST satellites elsewhere in the design. At-speed BIST is provided for both ABIST and LBIST, and at-speed scan is implemented on the internal scan chains. The PU also provides the logic used for programming and testing electronic fuses (eFuses), which are used for array repair and chip customization during manufacturing test.

The PMU and TMU (not shown in Figure 1.17) work in tandem to manage chip power and avoid permanent damage to the chip from overheating. For power reduction, the PMU provides a mechanism that allows software to reduce chip power when the full processing capabilities are not needed. The PMU also allows the operating system to throttle (by adjusting the instruction issue rate), pause, or stop single or multiple elements to manage chip power. The processor embeds a linear sensor (a linear diode) to monitor the global chip temperature for controlling external cooling mechanisms. Ten digital thermal sensors are distributed on the chip to monitor temperatures in critical local regions (hot spots); one digital thermal sensor (DTS) is located in each element, and one is adjacent to the linear sensor. The TMU continuously monitors each DTS and can be programmed to dynamically control the temperature of each element and to interrupt the PPE when a specified temperature is observed for an element. Software controls the TMU by setting four temperature values and the amount of throttling for each sensor. In order of increasing temperature, the first value specifies when throttling of an element stops, the second when throttling starts, the third when the element is stopped completely, and the fourth when the chip's clocks are shut down.
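The per-sensor policy can be pictured as a small state machine. The sketch below is our interpretation of the four-threshold scheme described above, with illustrative threshold and action names rather than the actual Cell TMU interface.

    def tmu_action(temp, t_throttle_stop, t_throttle_start, t_element_stop, t_clock_shutdown,
                   currently_throttling):
        """Decide the action for one digital thermal sensor (DTS).
        Thresholds are in increasing order: t_throttle_stop < t_throttle_start
        < t_element_stop < t_clock_shutdown."""
        if temp >= t_clock_shutdown:
            return "SHUT_DOWN_CHIP_CLOCKS"
        if temp >= t_element_stop:
            return "STOP_ELEMENT"
        if temp >= t_throttle_start:
            return "THROTTLE"                    # reduce the element's instruction issue rate
        if temp <= t_throttle_stop:
            return "RUN_NORMALLY"                # below the first threshold, throttling stops
        # Between the first and second thresholds, keep the previous state (hysteresis).
        return "THROTTLE" if currently_throttling else "RUN_NORMALLY"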

In contrast to the bus interconnect technology commonly found in SOC designs, NOC-platform-based SOC designs tend to contain two or more processors and a programmable crossbar switch (for dynamic routing) to improve on-chip communication efficiency [Dally 2004] [Jerraya 2004] [McNairy 2005] [Naffziger 2005, 2006] [De Micheli 2006]. The multiprocessor architecture and crossbar switch present additional test challenges and have spurred urgent DFT needs for at-speed BIST [Wang 2006], silicon debug and diagnosis [Abramovici 2006] [Wang 2006], online self-checking and fault tolerance [Siewiorek 1998], software-based self-testing [Cheng 2006], FPGA testing [Stroud 2002], and high-speed I/O interface testing [Gizopoulos 2006].

About This Book

The subsequent chapters present promising techniques to address critical ITRS needs and challenges for testing nanometer SOC designs [SIA 2005, 2006]. These techniques include specific DFT architectures for testing digital, memory, and analog and mixed-signal circuits. Because the test needs and challenges facing the semiconductor industry in the nanometer age are so broad and difficult, it is evident that further research must be conducted and better solutions found for all of the subjects mentioned here.

DFT Architectures

SOC designs contain a variety of components, including digital, memory, and analog and mixed-signal circuits, all of which need to be tested. DFT is essential for reducing test costs and improving test quality.

Scan is widely used for digital logic testing. However, one of the challenges in the nanometer design era is dealing with the rapidly growing amount of scan data required for testing complex SOC designs. The bandwidth between an external ATE and the device under test is limited, creating a bottleneck on how fast the chip can be tested. This has led to the development of test compression and logic BIST architectures, which reduce the amount of data that needs to be transferred between the ATE and device under test. Another key issue for scan testing is applying at-speed tests, which are crucial for detecting delay faults. Chapter 2 reviews the basics of scan testing and presents test compression and logic BIST architectures as well as architectures for applying at-speed tests.

SOC designs contain numerous embedded cores, each of which has a set of tests that must be applied. This presents challenges in terms of providing test access to the embedded cores and dealing with high test data volumes. Chapter 4 presents DFT architectures that facilitate low-cost modular testing of SOCs. Optimization techniques are described for wrapper designs and test access mechanisms to facilitate efficient test scheduling and reduce test data volume. Networks-on-chip (NOCs), which provide a packet-based mechanism for transferring data between cores, are becoming increasingly attractive as nanometer design complexity increases. Chapter 4 also discusses DFT architectures for testing NOCs, including the interconnect, router, and network interface.

A system-in-package (SIP) is a combination of semiconductors, passives, and interconnect integrated into a single package. The use of SIPs is growing rapidly. A SIP contains multiple dies in the same package, which presents additional test issues beyond those of an SOC. The assembly of the SIP needs to be tested, and the SIP may include MEMS and RF components. Chapter 5 provides background on SIPs and discusses the test challenges along with some solutions.

Field programmable gate arrays (FPGAs) are widely used and have grown in complexity to where they are able to implement very large systems. The reprogrammability of FPGAs makes them an attractive choice, not only for rapid prototyping of digital systems but also for low-to-moderate volume or fast time-to-market systems. Testing FPGAs requires loading numerous test configurations, which can be time consuming. Chapter 12 provides background on the architecture and operation of FPGAs and presents approaches for testing FPGAs as well as RAM cores embedded in FPGAs.

High-speed I/O interfaces involve using source-synchronous serial links that can have speeds exceeding 1 Gb/s. These interfaces are often used on digital systems and, for economic reasons, it is necessary to test them using a digital tester. Chapter 14 describes high-speed I/O architectures and techniques for testing them.

Analog ICs and the analog portions of mixed-signal circuits require different test techniques than those applied to digital components. Analog testing involves specification-based approaches as opposed to the fault-model-based and defect-based approaches used for digital testing. Chapter 15 reviews analog testing and then focuses on test architectures that can be used for on-chip testing and measurement of the analog portion(s) of SOC designs.

New Fault Models and Advanced Techniques

Nanometer technology is more susceptible to process variations and smaller defects, which often manifest themselves as timing-related problems. Conventional stuck-at fault testing becomes increasingly less effective, and new fault models need to be considered. Moreover, the lower voltage levels and smaller noise margins in nanometer technology make it increasingly susceptible to transient errors, thereby creating new challenges in terms of dealing with soft errors and reliability issues.

Delay faults are prevalent in nanometer technology and must be detected to achieve high quality. Delay faults can arise because of global process variations or local defects. Testing for delay faults typically requires two-pattern tests, which add an additional complication for test application in scan circuits. Chapter 6 describes delay test approaches including new defect-based delay fault models along with simulation and test generation techniques.

Power dissipation is typically much higher during testing than during normal operation because there is much greater switching activity. Excessive average power during testing can result in overheating and hot spots on the chip. Excessive peak power during testing can cause ground bounce and power supply droop, which may cause a good part to fail unnecessarily. Chapter 7 presents techniques for reducing power dissipation during testing, including low-power scan and BIST techniques.

New defect mechanisms are evolving in nanometer technology, such as copper-related defects, optical defects, and design-related defects caused by threshold voltage variations and the use of variable power supply voltages in low-power designs. Defects do not necessarily manifest themselves as a single isolated problem such as an open or a short; a circuit parameter out of specification can increase susceptibility to other problems (temperature effects, crosstalk, etc.). Moreover, the lower voltage levels and smaller noise margins in nanometer technology result in increased susceptibility to radiation-induced soft errors. Chapter 8 describes techniques for coping with the physical failures, soft errors, and reliability issues that arise in nanometer technology.

SOCs typically contain one or more processors. Software running on a processor can be used to perform a self-test of the SOC. This offers a number of advantages including reducing the amount of DFT circuitry required, providing functional at-speed tests, and avoiding the problem of excessive power consumption that arises in structural tests. Chapter 11 describes software-based self-testing techniques that can target the different components of an SOC, including the processor itself, global interconnect, nonprogrammable cores, and analog and mixed-signal (AMS) circuits.

As wireless devices are becoming increasingly prevalent, RF testing has been gaining importance. RF testing has different considerations than conventional analog and mixed-signal testing and requires the use of a different set of instruments to perform the measurements. Noise is a particularly important factor in RF measurements. Chapter 16 provides background on RF devices and RF testing methods. Advanced techniques for testing RF components along with future directions in RF testing are also described.

Yield and Reliability Enhancement

Nanometer technology is increasingly less robust because of less precise lithography, greater process variations, and greater sensitivity to noise. To cope with these challenges, features must be added to the design to enhance yield and reliability. How to do this in a systematic and effective manner is a topic of great interest and research.

One way to improve reliability is to have a fault-tolerant design that can continue operation in the presence of a fault. Fault tolerance requires the use of redundancy and hence adds area, performance, and power overhead to a design. In the past, fault-tolerant design has mainly been used for mission-critical applications (medical, aviation, banking, etc.) where the cost of failure is very high; most low-cost systems have not incorporated much fault tolerance. However, as failure rates continue to rise in nanometer SOC designs, which are increasingly susceptible to noise, it is becoming necessary to incorporate fault tolerance even in mainstream low-cost systems. Chapter 3 reviews the fundamentals of fault-tolerant design and describes many of the commonly used techniques.
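As a generic illustration of redundancy (a textbook example of triple modular redundancy, not a technique singled out by Chapter 3), a majority voter masks any single faulty module output:

    def tmr_vote(a, b, c):
        """Bitwise majority vote over three replicated module outputs."""
        return (a & b) | (b & c) | (a & c)

    # Two of the three copies agree, so the single corrupted output is outvoted.
    assert tmr_vote(0b1010, 0b1010, 0b0010) == 0b1010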

Design for manufacturability (DFM) involves making layout design changes to improve any aspect of manufacturability, from mask making through lithography and chemical-mechanical processing. DFM is increasingly important in nanometer technology. Design for yield (DFY) involves techniques that are specifically targeted toward improving yield. Coming up with metrics for yield that could be optimized during design is a major challenge because the manufacturing process is not static; hence the relative importance of different factors impacting yield is constantly changing as process improvements are made. Chapter 9 discusses DFM and DFY, gives examples, and highlights the issues and challenges.

An important step for ramping up yield is to be able to debug and diagnose silicon parts that fail. By identifying the root cause of a failure, it is possible to correct any design bugs that may exist and improve the manufacturing process. Debug and diagnosis can be time consuming. Chapter 10 describes design for debug and diagnosis (DFD) techniques that involve adding features to a design to help speed up the process. These techniques permit engineers to more rapidly obtain information to aid in the process of finding the root cause of a failure.

Nanotechnology Testing Aspects

Although CMOS technology is not predicted to reach fundamental scaling limits for another decade, alternative emerging technologies are being researched in hopes of launching a new era in nanoelectronics. Future nanoelectronic devices are expected to have extremely high defect rates, making test and fault tolerance key factors for obtaining working devices. Another emerging technology is microelectromechanical systems (MEMS), miniature electromechanical sensors and actuators fabricated using VLSI processing techniques. Testing MEMS is an interesting topic because MEMS devices interact physically with the environment rather than only through electrical interfaces as traditional ICs do, as illustrated by the bioMEMS sensor example presented in this chapter.

MEMS devices have many application arenas, including accelerometers, pressure sensors, micro-optics, inkjet nozzles, optical scanners, and fluid pumps, and have the potential to be used in many other applications. Chapter 13 describes MEMS devices and discusses the techniques used to test and characterize them. In addition, DFT and BIST techniques that have been proposed, as well as those implemented on commercially available MEMS devices, are presented.

Several promising nanoelectronic technologies are emerging, including resonant tunneling diodes (RTDs), quantum-dot cellular automata (QCA), silicon nanowires, single-electron transistors, and carbon nanotubes (CNTs). These are all described in Chapter 17, along with the research work that has begun to develop test techniques for them.

Exercises

1.1

(Defect Level) Assuming the process yield (Y) is 90% and the device’s fault coverage (FC) is 80%, 90%, or 99%, calculate their respective defect levels in terms of defective parts per million (PPM).

1.2

(Defect Level) Consider a system that contains five printed circuit boards (PCBs) each embedded with 40 ASIC devices. Assume that the cost to repair a device on a PCB is $1000. If each ASIC device is manufactured with a process yield of 80% and tested with fault coverage of 90%, what is the manufacturing rework cost to repair the system?

1.3

(Fault Coverage) What’s the fault coverage (FC) required to achieve 500 PPM for a given process yield (Y) of 90%?

1.4

(Process Yield) What’s the process yield (Y) for a given device to arrive at 100 PPM when the device’s fault coverage (FC) is 95%?

1.5

(Mean Time between Failures) Suppose a computer system has 10,000 components, each with a failure rate of 0.01% per 1000 hours. For what period of time is the system reliability at least 99%?

1.6

(Mean Time between Failures) For a system containing N identical subsystems each with a constant failure rate λ, what is the overall system reliability? If the system contains three identical subsystems, what is the mean time between failures (MTBF) of the system?

1.7

(System Availability) Repeat Exercise 1.6. Suppose each of the three subsystems anticipates 20 failures per 1000 hours and we need a system availability of at least 99%. How much time can be spent repairing the system each year?

1.8

(Boundary Scan) How many mandatory and optional boundary-scan pins are required according to the IEEE 1149.1 and 1149.6 standards? Can they be shared with existing primary input or output pins? List three reasons for implementing boundary scan in an ASIC device.

1.9

(Boundary-Scan Extension) Explain why the IEEE 1149.1 standard cannot test high-speed, AC-coupled, differential networks and how the IEEE 1149.6 standard solves this problem. Use timing diagrams to show the problem with AC-coupled nets and the 1149.6 solution. In particular, draw a timing diagram showing how an 1149.6 Update/Capture cycle might look (including all relevant TAP actions), and show the data driven and captured by the 1149.6 logic.

1.10

(Boundary-Scan Accessible Embedded Instruments) Give an example in which the proposed IEEE P1687 boundary-scan accessible embedded instruments standard (IJTAG) may aid in automating design and test reuse for an embedded test instrument.

1.11

(Core-Based Testing) How many mandatory and optional core-based test pins are required according to the IEEE 1500 standard? Can they be shared with existing primary input or output pins? List three reasons for implementing core-based wrappers in an ASIC device.

1.12

(Analog Boundary Scan) How many mandatory and optional analog boundary-scan test pins are required according to the IEEE 1149.4 standard? Can they be shared with existing primary input or output pins? List three reasons for implementing analog boundary scan in an ASIC device.

1.13

(Analog Boundary Scan) Figure 1.18 shows a typical two-port network connecting (P1, P2) of chip A and (P3, P4) of chip B. Assume that P2 and P4 are grounded. The hybrid (H) parameters are defined by the following equations, where V1 and I1 are the voltage and current at port 1 (P1) and V2 and I2 are the voltage and current at port 2 (P3):

\[ V_1 = h_{11} I_1 + h_{12} V_2, \qquad I_2 = h_{21} I_1 + h_{22} V_2 \]

  1. Fill the following configuration table for the H parameter measurement.

  2. Repeat (a) for the measurements of Y, Z, and G parameters.

Figure 1.18. Testing two-port network.

Note: Vs: voltage source; Vm: voltage meter; Is: current source; Im: current meter.

H      P1         P3
h11    Is, Vm     GND        h11 = Vm/Is
h12                          h12 =
h21                          h21 =
h22                          h22 =

  • Each entry in the table shall be filled with one of the following: Vs, Is, Vm, Im, GND, or Open.

  • The example in h11 indicates that P1 is connected to the current source and the voltage meter; P3 is connected to ground (GND); h11 is obtained by the equation in the last column.

1.14

(Memory Testing) Define the following terms about memory testing: BIST, BIRA, and BISR. What are the advantages and disadvantages of soft repair over hard repair?

Acknowledgments

The authors wish to thank Professor Kuen-Jong Lee of National Cheng Kung University for contributing a portion of the IEEE 1149.1 and 1500 sections; William Eklow of Cisco Systems for contributing the IEEE 1149.6 and P1687 (IJTAG) sections; Professor Chauchin Su of National Chiao Tung University for contributing the IEEE 1149.4 section; Professor Cheng-Wen Wu of National Tsing Hua University for contributing the Basics of Memory Testing section; Professor Shey-Shi Lu of National Taiwan University for providing the bioMEMS sensor material; Professor Xinghao Chen of The City College and Graduate Center of The City University of New York for providing valuable comments; and Teresa Chang of SynTest Technologies for drawing a portion of the figures.

References

Books

Importance of System-on-Chip Testing

Basics of SOC Testing

Basics of Memory Testing

SOC Design Examples

About This Book
