Chapter 12. Field Programmable Gate Array Testing

Charles E. Stroud, Auburn University, Auburn, Alabama

About This Chapter

Since the mid-1980s, field programmable gate arrays (FPGAs) have become a dominant implementation medium for digital systems. During that time, FPGAs have grown in complexity as their ability to implement system functions has increased from a few thousand logic gates to tens of millions of logic gates. The largest FPGAs currently available exceed a billion transistors. The ability to program, or configure, FPGAs to perform virtually any digital logic function provides an attractive choice, not only for rapid prototyping of digital systems but also for low-to-moderate volume or fast time-to-market systems. The ability to reconfigure FPGAs to perform different digital logic functions without modifying the physical system facilitates implementation of advanced system applications such as adaptive computing and fault tolerance.

The chapter begins with an overview of the typical architecture and configuration features of FPGAs as well as the testing challenges posed by these complex devices. We follow this discussion with a review of the various approaches to testing FPGAs that have been proposed and implemented. The remaining sections discuss testing and diagnosis of various resources incorporated in FPGAs, including programmable logic and routing resources as well as specialized cores such as memories and digital signal processors (DSPs). The chapter concludes with a discussion of one of the new frontiers in FPGA testing in which embedded processor cores within the FPGA are used to test and diagnose the FPGA itself.

Overview of FPGAs

FPGAs come in a variety of sizes and features, which continue to change with advances in integrated circuit fabrication technology and system applications. As a result, it is difficult to write a comprehensive treatise on FPGAs because the material tends to become obsolete shortly after publication. Although a number of books ([Trimberger 1994] [Oldfield 1995] [Salcic 1997]) and papers ([Brown 1996a, 1996b]) have been written about FPGA architectures and operation, a wealth of current information can be found in the data sheets and application notes provided by the FPGA manufacturers such as [Altera 2005], [Atmel 2005], [Cypress 2005], [Lattice 2006], [Xilinx 2006a], and [Xilinx 2006b]. The following subsections provide an overview of FPGAs based on the typical architectures, features, and capabilities that are currently available.

Architecture

FPGAs generally consist of a two-dimensional array of programmable logic blocks (PLBs) interconnected by a programmable routing network with programmable input/output (I/O) cells at the periphery of the device, as illustrated in Figure 12.1. Each PLB usually consists of one or more lookup tables (LUTs) and flip-flops, as illustrated in Figure 12.2. A LUT typically has four inputs and is used to implement combinational logic by storing the truth table for any combinational logic equation of up to four input variables. It should be noted, however, that some FPGAs have LUTs with as few as three inputs or as many as six inputs. The LUT in some FPGAs can also be configured to function as a small random access memory (RAM) or shift register. The flip-flops in the PLB are used to implement sequential logic and often include programmable clock enable and set/reset capabilities. The flip-flops in some FPGAs can also be configured to operate as level sensitive latches. Additional logic besides LUTs and flip-flops is usually incorporated in the PLB; some examples include multiplexers for combining LUTs to construct large combinational logic functions via the Shannon expansion theorem, fast carry logic to implement adders/subtractors or counters, and logic to implement special functions like array multipliers.

Figure 12.1. Typical FPGA architecture.

Figure 12.2. Basic PLB architecture.
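The LUT behavior described above can be illustrated with a minimal Python sketch. It models a four-input LUT as a stored truth table and then combines two such LUTs with a 2:1 multiplexer, following the Shannon expansion, to realize a five-input function. The function names (`make_lut`, `read_lut`, `five_input_via_shannon`) are hypothetical and chosen only for illustration; they do not correspond to any vendor tool or device primitive.

```python
from itertools import product


def make_lut(func, num_inputs=4):
    """Return the truth table (a list of 0/1 values) a LUT would store for func.

    Entry k holds the output for the input combination whose binary
    encoding is k, with the first input as the most significant bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def read_lut(table, *inputs):
    """Evaluate the LUT: the inputs form the address into the truth table."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def five_input_via_shannon(func5):
    """Build a 5-input function from two 4-input LUTs and a 2:1 multiplexer.

    Shannon expansion on the first variable e:
        f(e, a, b, c, d) = e'.f(0, a, b, c, d) + e.f(1, a, b, c, d)
    """
    lut_e0 = make_lut(lambda a, b, c, d: func5(0, a, b, c, d))
    lut_e1 = make_lut(lambda a, b, c, d: func5(1, a, b, c, d))

    def evaluate(e, a, b, c, d):
        # The multiplexer selects between the two LUT outputs using e.
        return read_lut(lut_e1 if e else lut_e0, a, b, c, d)

    return evaluate
```

For example, `make_lut(lambda a, b, c, d: a and b and c and d)` yields a table of fifteen 0s followed by a single 1, which is the truth table of a four-input AND gate.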

The programmable interconnect network consists of horizontal and vertical wire segments of varying lengths along with programmable switches, referred to as programmable interconnect points (PIPs) or configurable interconnect points (CIPs). The lengths of the wire segments are usually referred to by the number of PLBs they span where typical lengths include one, two, four, six, and eight PLBs. There are also long lines that span one quarter, one half, and the full array of PLBs. The wire segments and their associated PIPs can be classified as local and global routing resources. Local routing resources are used to interconnect internal portions of the PLB, directly adjacent PLBs, or to connect the PLB to the global routing resources. Global routing resources interconnect PLBs with other nonadjacent PLBs, I/O cells, and specialized cores.

The PIPs are used to connect or disconnect the wire segments to form the signal nets for the system application. The basic PIP consists of a transmission gate controlled by a configuration memory bit as illustrated in Figure 12.3a. When the PIP is activated (turned on), the two wire segments are connected to provide a continuous signal path. When the PIP is deactivated (turned off), the two wire segments are isolated and can provide paths for two independent signals. There are several types of PIP structures with the simplest being the break-point PIP (see Figure 12.3b), which connects/disconnects two vertical or two horizontal wire segments. The cross-point PIP (see Figure 12.3c) connects a vertical wire segment and a horizontal wire segment such that the signal path can turn a corner and fanout on both vertical and horizontal routing resources. Multiple PIPs and their associated wire segments can be arranged to form more complicated and sophisticated programmable routing structures. For example, six break-point PIPs can be arranged as illustrated in Figure 12.3d to form a compound cross-point PIP where two independent signal paths can be routed straight through (one vertical and the other horizontal) or turn corners. The programmable routing structure most frequently encountered in current FPGAs uses N break-point PIPs to form a multiplexer PIP (see Figure 12.3e) where one of the N inputs can be selected and routed to the multiplexer output. The output of most multiplexer PIPs is buffered to prevent signal degradation on long and/or heavily loaded signal nets.

Figure 12.3. Programmable interconnect points: (a) basic PIP structure, (b) break-point PIP, (c) cross-point PIP, (d) compound cross-point PIP, and (e) multiplexer PIP.

A simplified programmable I/O cell is illustrated in Figure 12.4 and consists of a bidirectional buffer with tristate control to facilitate user specification of input, output, and bidirectional ports for any given system application. Multiplexers controlled by configuration memory bits are used to select registered or nonregistered signals to or from the pad. These registers can be flip-flops, latches, or flip-flops that can also be configured as a latch. They are used to meet critical system timing specifications including setup time, hold time, and clock-to-output delay. Multiple flip-flops and latches are incorporated for the input and the output portion of the I/O cell to support double-data-rate (DDR) and/or serial-to-parallel and parallel-to-serial conversions for high-speed data interfaces (SERDES). Programmable clock enable and set/reset signals, under the control of configuration memory bits, are often associated with each register. In addition, the bidirectional buffer typically has programmable current drive and voltage level capabilities for various I/O voltage standards along with programmable pullup/pulldown features and programmable delay characteristics. In some FPGAs, two I/O cells can be combined to support differential pair I/O standards.

Figure 12.4. Typical I/O cell structure.

A trend in FPGAs is to include cores for specialized functions such as single-port and dual-port RAMs, first-in first-out (FIFO) memories, multipliers, and DSPs. Within any given FPGA, all memory cores are usually of the same size in terms of the total number of memory bits, but each memory core in the array is individually programmable. For example, RAM cores can be individually configured to select the mode of operation (single-port, dual-port, FIFO, etc.), architecture in terms of number of address locations versus number of data bits per address location, active edge of clocks, and active level of control signals (clock enables, set/reset, etc.), as well as options for registered input and output signals. The types and quantities of programmable resources are summarized in Table 12.1 and represent the typical small and large FPGAs available.

Table 12.1. Typical Quantities of Programmable Resources in Current FPGAs

Category	FPGA Resource	Small FPGA	Large FPGA
Logic	PLBs per FPGA	256	25,920
Logic	LUTs and flip-flops per PLB	1	8
Routing	Wire segments per PLB	45	406
Routing	PIPs per PLB	140	4,100
Specialized Cores	Bits per memory core	128	36,864
Specialized Cores	Memory cores per FPGA	16	576
Specialized Cores	DSP cores	0	512
Other	Input/output cells	62	1,200
Other	Configuration memory bits	42,104	79,704,832

Another trend is the incorporation of one or more embedded processor cores. These processor cores range in size from 8-bit to 64-bit architectures and can operate at clock frequencies as high as 450 MHz. In some devices, the program and data memories are dedicated to the processor core, whereas in other devices the RAM cores within the FPGA are used to construct the program and data memories. In the latter case, programmable logic and routing resources are also required to implement the interface between the processor and the RAM cores used for program and data memories. An alternative to the dedicated embedded processor core, referred to as a hard core, is the synthesis of soft core processors into the programmable logic, routing, and memory resources of the FPGA. In the case of a soft core, the processor architecture is described in a hardware description language, such as Verilog or VHDL, and the normal system design flow is used for integration and synthesis of the processor core with the system application. Soft core processors typically run at lower maximum clock frequencies than hard core processors with dedicated and optimized resources. However, greater versatility and flexibility are associated with the soft core in terms of the number of processor cores that can be incorporated in a single FPGA and the ability to incorporate the soft processor core in almost any FPGA.

With the incorporation of embedded cores such as memories, DSPs, and microprocessors, FPGAs more closely resemble system-on-chip (SOC) implementations. At the same time, some SOCs incorporate embedded FPGA cores; these devices are often referred to as configurable SOCs. Complex programmable logic devices (CPLDs) are similar to FPGAs in that they contain programmable logic and routing resources as well as specialized cores such as RAMs and FIFOs. The primary difference is that CPLDs typically use programmable logic arrays (PLAs) for implementing combinational logic functions instead of the LUTs typically found in FPGAs. As a result, a PLB in a CPLD tends to have more flip-flops and combinational logic resources than a PLB in an FPGA. However, FPGA has become the most frequently used term to refer to all programmable logic devices including CPLDs. Therefore, the term FPGA will be used in this chapter, but it should be noted the techniques presented are also applicable to CPLDs and embedded FPGA cores in SOC implementations.

Configuration

The system function performed by an FPGA is controlled by an underlying programming mechanism. Early programming technologies included fuses, which were later replaced by antifuses, as well as floating-gate technologies typically used in erasable programmable read-only memories (EPROMs) and, more recently, in flash memories [Trimberger 1994]. The advantage of these programming technologies is their nonvolatility: the configuration data for the system function are not lost when the power to the device is turned off. The disadvantage of fuse/antifuse technologies is that they are one-time-programmable (OTP). The disadvantage of floating-gate technologies is that, with the exception of flash memory, they are not in-system reconfigurable (ISR) and in some cases are not in-system programmable (ISP).

The most frequently used programming technology in current FPGAs is a static RAM. Although RAM-based configuration memories are ISR, they are also volatile. For a RAM-based FPGA to operate in a system, the configuration memory must first be written with the data that will define the operation of that FPGA in the system. This process is frequently referred to as downloading the configuration data to the FPGA, but is also called configuring or programming the FPGA. The configuration data include the contents of the LUTs, which specify the combinational logic functions performed as well as the various bits used to control the mode of operation of flip-flops/latches for sequential logic functions. A large portion of the configuration memory is used to control the PIPs that are used to route signals for interconnection of the system logic functions. Other configuration memory bits control the modes of operation of I/O cells and specialized cores such as RAMs, FIFOs, and DSPs. The system function can be changed at any time by rewriting the configuration memory with new data, referred to as reconfiguration. The time required to completely configure or reconfigure an FPGA can be long, particularly for FPGAs with large configuration memories. Alternatively, partial reconfiguration can be used to reconfigure only the portions of the FPGA that change from one system function to the next. Another feature supported by many current FPGAs is dynamic partial reconfiguration, also referred to as run-time reconfiguration, where a portion of the FPGA can be reconfigured while the remainder of the FPGA continues to perform normal system operation. Dynamic partial reconfiguration also prevents loss of data contained in system function memory elements (flip-flops, RAMs, etc.) during the reconfiguration process.

There are two basic modes for configuration of FPGAs, typically called master and slave modes. The configuration mode is controlled by dedicated mode input pins to the FPGA that can be hardwired on a printed circuit board (PCB). In master mode, the FPGA configures itself by reading the configuration data from a PROM residing on the PCB with the FPGA. Configuration using master mode is usually performed during power up of the PCB. In slave mode, configuration of the FPGA is controlled and performed by an external device, such as a processor. As a result, the slave mode is used for partial and dynamic partial reconfiguration of the FPGA. Most FPGAs support the ability to configure multiple FPGAs from a single PROM by daisy-chaining the FPGAs, as illustrated in Figure 12.5. Here the first FPGA in the chain operates in master mode to supply the configuration clock (CCLK) to read the configuration data from the PROM, while the remaining FPGAs in the chain operate in slave mode. As each FPGA in the daisy chain completes its configuration process, subsequent configuration data from the PROM are passed on to the next FPGA in the daisy chain by connecting the configuration data input (Din) to the data output (Dout). Many FPGAs provide serial and parallel configuration interfaces for both master and slave configuration modes. The parallel interface is used to speed up the configuration process by reducing the download time via parallel data transfer. The most common parallel interfaces use 8-bit or 32-bit configuration data buses. This facilitates configuration of FPGAs operating in slave mode by a processor, which can also perform reconfiguration or dynamic partial reconfiguration of the FPGAs once the system is operational. To avoid a large number of pins dedicated to a parallel configuration interface, these parallel data bus pins can be optionally reused for normal system function signals once the device is programmed. 
The IEEE 1149.1 boundary scan interface, in conjunction with instructions that access the configuration memory, can also be used for slave serial configuration in many FPGAs.

Figure 12.5. Master and slave mode FPGA configuration in a daisy chain.
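The benefit of a parallel configuration interface can be estimated with simple arithmetic. The sketch below assumes an idealized interface that transfers one bus-width word per CCLK cycle with no command or protocol overhead; the function name and the 50 MHz clock frequency used in the example are assumptions made for illustration and are not taken from any data sheet.

```python
def download_time_seconds(config_bits, bus_width_bits, cclk_hz):
    """Idealized download time for a full configuration.

    Assumes one transfer of bus_width_bits per CCLK cycle and ignores
    command words, padding, and start-up overhead.
    """
    cycles = -(-config_bits // bus_width_bits)  # ceiling division
    return cycles / cclk_hz


if __name__ == "__main__":
    LARGE_FPGA_BITS = 79_704_832  # configuration memory bits, Table 12.1
    ASSUMED_CCLK_HZ = 50_000_000  # hypothetical configuration clock
    for width in (1, 8, 32):
        t = download_time_seconds(LARGE_FPGA_BITS, width, ASSUMED_CCLK_HZ)
        print(f"{width:2d}-bit interface: {t:.3f} s")
```

Under these assumptions, the large FPGA of Table 12.1 would require roughly 1.6 seconds over a serial interface, about 0.2 seconds over an 8-bit bus, and about 0.05 seconds over a 32-bit bus.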

Partial reconfiguration data are generated by comparing the new configuration data to be downloaded with the reference configuration data for the implementation residing in the FPGA. The partial reconfiguration data contain the configuration data for only those portions of the FPGA that change between these two configurations along with the appropriate commands to the configuration registers to perform partial reconfiguration. The smallest portion of an FPGA that can be reconfigured is a function of the organization of the configuration memory design. In some FPGAs, for example, a single PLB, or a portion thereof, can be reconfigured without affecting any other PLB in the array; this is referred to as a PLB addressable configuration memory. However, most FPGAs have a frame addressable configuration memory where a single frame spans multiple PLBs in the array, usually an entire column of PLBs. As a result, the frame size is often a function of the array size and increases for larger FPGAs. Multiple frames, typically 10 to 20, are needed to completely specify the configuration data for a PLB or column of PLBs and the associated routing resources. Therefore, the size of the partial reconfiguration data to be downloaded is a function of the frame size and the number of frames different between the configuration of the reference design residing in the FPGA and the new design to be downloaded.
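The comparison described above can be sketched in a few lines of Python. This is a simplified model, assuming a frame addressable configuration memory represented as a list of equally sized `bytes` objects indexed by frame address; real partial reconfiguration files also carry device-specific commands, which are omitted here. The function name `partial_reconfig_frames` is hypothetical.

```python
def partial_reconfig_frames(reference, new):
    """Return (frame_address, frame_data) pairs for the frames that differ.

    reference: frames of the design currently residing in the FPGA
    new:       frames of the design to be downloaded
    Both are sequences of bytes objects indexed by frame address.
    """
    if len(reference) != len(new):
        raise ValueError("both configurations must target the same device")
    return [
        (address, new_frame)
        for address, (old_frame, new_frame) in enumerate(zip(reference, new))
        if old_frame != new_frame
    ]


def partial_download_bits(reference, new):
    """Size of the frame data to download: frame size times differing frames."""
    changed = partial_reconfig_frames(reference, new)
    return sum(len(frame) * 8 for _, frame in changed)
```

The second function makes the closing observation of the paragraph concrete: the amount of frame data downloaded grows with both the frame size and the number of differing frames.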

The ability to perform partial reconfiguration requires an increase in complexity of the configuration interface circuitry compared to a full reconfiguration. For example, the user must have the ability to specify the address of the frame to be written as well as the configuration data to be written into that frame address. As a result, the configuration interface circuitry must support addressing and accessing the frame address register (FAR) and frame data register (FDR) as well as other registers such as command and control registers. The command and control registers are provided to facilitate specialized features for not only the partial reconfiguration process but also for the operation of the FPGA once it is configured. An example feature is multiple frame write capability where the configuration data contained in the FDR can be written to multiple frames to reduce configuration time, because the size of the frame address is typically much smaller than the frame data. This is particularly useful in the case of very regular designs with repeated identically configured circuit functions. Another example feature is security for configuration memory readback where the configuration memory can be read to verify proper download of the configuration data. Once verified, configuration memory readback access can be disabled for intellectual property protection of the design configured in the FPGA. As a result of this type of security feature, different boundary scan instructions are needed for write access and read access of the configuration memory. Some FPGAs allow encryption of the configuration data as an additional security feature.
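The saving offered by the multiple frame write feature can be illustrated by grouping frames with identical contents so that each distinct frame is loaded into the FDR once and then written to every matching frame address. The sketch below is an abstract model only; the function names are hypothetical, and the cost model (frame data bits plus a fixed number of address bits per write) is an assumption rather than the format of any particular device.

```python
from collections import defaultdict


def plan_multiple_frame_writes(frames):
    """Group frame addresses by identical frame data.

    frames: dict mapping frame address -> bytes of frame data
    Returns a list of (frame_data, [addresses]) so that each distinct
    frame_data is loaded into the FDR once and written to all addresses.
    """
    groups = defaultdict(list)
    for address in sorted(frames):
        groups[frames[address]].append(address)
    return list(groups.items())


def transfer_bits(plan, address_bits):
    """Bits transferred under the assumed cost model:
    one FDR load per group plus one frame address per write."""
    total = 0
    for data, addresses in plan:
        total += len(data) * 8 + address_bits * len(addresses)
    return total
```

For a very regular design in which many frames are identical, the plan contains few groups, so the transfer is dominated by the short frame addresses rather than by repeated copies of the frame data.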

In addition to configuration memory readback for verifying proper download, many FPGAs support partial configuration memory readback. The FAR is used to specify the address of a specific frame, or the starting address for a set of consecutive frames, to be read. In some FPGAs, the contents of memory elements, such as flip-flops or RAMs, can be read during partial configuration memory readback in addition to the configuration data contained in the frame.

A recent and exciting trend in FPGAs is the ability of an embedded processor core, either hard core or soft core, to read or write the FPGA configuration memory. As a result, the embedded processor core is capable of on-chip dynamic partial reconfiguration of portions of the FPGA. Hence, configuration data can be algorithmically generated from within the FPGA itself instead of traditional downloading. This presents opportunities for many new types of system applications in FPGAs, as well as new types of testing solutions, as will be discussed later.

The Testing Problem

The programmability of an FPGA poses a number of challenges when it comes to complete and comprehensive testing of the FPGA itself. First, a large number of configurations must be downloaded into the FPGA to test the programmable resources in their various modes of operation. The size of the configuration memory is an important factor in testing FPGAs, because the total test time is usually dominated by the time required to download configuration data. Fortunately, partial reconfiguration can reduce the total time associated with downloading these test configurations by writing only the portions of configuration memory that change from one test configuration to the next. The FPGA testing problem is further complicated by the growing size of FPGAs in terms of the PLB array, frequently changing architectures, and the introduction of specialized embedded cores such as memories and DSPs. If the FPGA can be completely tested and determined to be fault-free, the intended system function can be programmed onto the FPGA with a high probability of proper operation. When faults are detected, the system function can be reconfigured to avoid the faulty resources if they can be diagnosed (identified and located). During manufacturing testing, the ability to locate and identify faults is also important for yield enhancement. In addition, the programmability of FPGAs has resulted in the ability to use faulty devices for system-specific configurations that do not use the faulty resources [Abramovici 2004] [Toth 2004]. Therefore, diagnosis of the faulty resources is an important aspect of FPGA testing in order to take full advantage of the fault- or defect-tolerant potential of these devices.

FPGA testing must consider a variety of fault models. In addition to classical stuck-at faults in the programmable logic resources, the configuration memory bits that define the logic functions performed by these resources must also be tested for stuck-at-0 and stuck-at-1 faults [Abramovici 2001]. For complete testing, the logic resources must be tested in different modes of operation, which in turn requires multiple reconfigurations of the logic resources under test in the FPGA. The number of configuration memory bits associated with the programmable routing resources is typically three to four times the number of configuration memory bits associated with the programmable logic resources. As a result, the programmable interconnect network poses a bigger testing challenge than the logic resources. The fault models used for testing the routing resources include shorts (bridging faults) and opens in the wire segments, wire segments stuck-at-1 and stuck-at-0, as well as PIPs stuck-on and stuck-off along with their controlling configuration memory bits stuck-at-1 and stuck-at-0. Whereas the PIP stuck-off fault can be detected by a simple continuity test, stuck-on faults are similar to bridging faults and require opposite logic values be applied to the wire segments on both sides of the PIP while monitoring both wire segments in order to detect the stuck-on fault [Stroud 2002b]. More recent issues in FPGA testing include delay fault testing [Chmelar 2003]. Testing for delay faults in FPGAs is important because the transmission gates used to construct the PIPs in the interconnect network are particularly susceptible to defects that affect the delay.
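The difference between detecting PIP stuck-off and stuck-on faults can be illustrated with a small behavioral model of two wire segments joined by a single PIP. The model is a sketch under stated assumptions: an undriven segment is represented by `None`, and contention between two drivers on shorted segments is resolved as a wired-AND, which is an illustrative choice and not a claim about the electrical behavior of any device. The function name `observe` is hypothetical.

```python
def observe(pip_configured_on, drive_a, drive_b, fault=None):
    """Return the values observed on wire segments A and B.

    pip_configured_on: value of the configuration memory bit
    drive_a, drive_b:  0, 1, or None (segment not driven)
    fault:             None, "stuck_on", or "stuck_off"
    """
    connected = pip_configured_on
    if fault == "stuck_on":
        connected = True
    elif fault == "stuck_off":
        connected = False

    if not connected:
        # Isolated segments carry independent signals.
        return drive_a, drive_b

    drivers = [v for v in (drive_a, drive_b) if v is not None]
    # Assumption: contention on connected segments behaves as a wired-AND.
    value = min(drivers) if drivers else None
    return value, value


# Continuity test for stuck-off: activate the PIP and drive one side only.
stuck_off_detected = observe(True, 1, None, "stuck_off") != observe(True, 1, None)

# Stuck-on test: deactivate the PIP, drive opposite values, monitor both sides.
stuck_on_detected = observe(False, 0, 1, "stuck_on") != observe(False, 0, 1)

# Equal values on both sides cannot expose the stuck-on fault.
stuck_on_missed = observe(False, 1, 1, "stuck_on") == observe(False, 1, 1)
```

In this model, all three flags evaluate to true, which mirrors the requirement that opposite logic values be applied on both sides of the PIP and that both segments be monitored.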

FPGA testing is further complicated by the capabilities, or lack thereof, of the computer-aided design (CAD) tools associated with FPGAs. The primary goal of these CAD tools is the synthesis and implementation of system applications into the FPGA. Because the system functions are usually described in VHDL or Verilog, the synthesis tools attempt to optimize the implementation to obtain the most efficient implementation in the FPGA in terms of area and performance. As a result, CAD tools attempt, during the optimization process, to eliminate unused logic and interconnections in the final implementation. Unfortunately, this optimization is often at odds with the goals of testing. For example, consider the multiplexer in Figure 12.6a and its associated gate-level implementation. To detect the stuck-at-1 (sa1) fault denoted by the “X” in the figure, we must apply the logic values shown next to each input; in other words, we must select the A input while applying a logic 1 to the unselected B input. This is not a problem in a normal circuit, but when the select signal S is sourced by a configuration memory bit, the logic sourcing signals on the unselected inputs is considered to be unused by FPGA CAD tools and will be eliminated, not permitting sourcing of the required test logic values. A more frequent problem exists at the inputs to the multiplexer PIPs that are used extensively in recent FPGAs to construct the programmable interconnect network. Each PIP in the multiplexer shown in Figure 12.6b must be tested for both stuck-on and stuck-off faults. The stuck-off fault can be detected by activating each PIP in turn and passing both logic 0 and logic 1 values through the multiplexer (as shown for input A). 
To test the stuck-on fault, however, we must apply the opposite logic values, with respect to the logic values being passed through the activated PIP, to one or more of the deactivated PIPs (as shown for input B) to detect the short circuit caused by the PIP stuck-on fault. Therefore, we need a minimum of one test configuration for each input to the multiplexer and, for each of these configurations, we must apply opposite logic values to the activated PIP and at least one of the deactivated PIPs. However, synthesis tools typically require a signal to have both source and sink in order for the signal path to be routed through the programmable interconnect network, which prevents routing a signal to an unselected input of a multiplexer, for example. This problem has been referred to as the “invisible logic problem” and must be overcome to completely test an FPGA [Stroud 1996]. The solution requires using FPGA vendor-specific design tools, intermediate netlist languages, and techniques that are not used in a normal design flow but nevertheless provide the control needed to establish proper testing conditions.

Figure 12.6. Programmable resource testing problem example: (a) gate-level multiplexer implementation and (b) multiplexer PIP.
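The test conditions for a multiplexer PIP can be enumerated and checked with a simple fault simulation. The following sketch generates one test configuration per multiplexer input, applies both logic values through the activated PIP with the opposite value on every deactivated input, and confirms that each stuck-on and stuck-off fault changes the observed output. As before, contention caused by a stuck-on PIP is modeled as a wired-AND and a floating output as `None`; both are modeling assumptions, and all names are hypothetical.

```python
def mux_output(selected, inputs, fault=None):
    """Output of an N-input multiplexer PIP.

    selected: index of the PIP activated by the configuration bits
    inputs:   list of 0/1 values applied to the multiplexer inputs
    fault:    None, ("stuck_on", i), or ("stuck_off", i)
    """
    active = {selected}
    if fault is not None:
        kind, index = fault
        if kind == "stuck_on":
            active.add(index)
        else:
            active.discard(index)
    if not active:
        return None  # no path to the output: modeled as floating
    # Assumption: several active PIPs short their inputs as a wired-AND.
    return min(inputs[i] for i in active)


def build_test_set(n):
    """One configuration per input; each passes 0 and 1 through the
    activated PIP with the opposite value on all deactivated inputs."""
    tests = []
    for selected in range(n):
        for value in (0, 1):
            inputs = [1 - value] * n
            inputs[selected] = value
            tests.append((selected, inputs))
    return tests


def undetected_faults(n):
    """Return the PIP faults that no test in the set distinguishes."""
    tests = build_test_set(n)
    faults = [(kind, i) for kind in ("stuck_on", "stuck_off") for i in range(n)]
    return [
        f for f in faults
        if all(mux_output(s, ins, f) == mux_output(s, ins) for s, ins in tests)
    ]
```

For a four-input multiplexer, `build_test_set(4)` produces eight vectors spread over four configurations, and `undetected_faults(4)` returns an empty list. Note that under the wired-AND assumption only the vector that passes a logic 1 exposes a stuck-on fault, which is one reason both logic values are applied in each configuration.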

Testing Approaches

A taxonomy of FPGA testing techniques is summarized in Table 12.2 and includes a number of general types and categories. For example, the generation and application of input test stimuli as well as the analysis of output responses can be performed external or internal to the FPGA. In built-in self-test (BIST) approaches, both test pattern generation and output response analysis are performed internal to the FPGA, whereas either one or both functions are performed external to the FPGA in external testing approaches. For system-level testing, another classification is based on whether the testing is online while the FPGA is also performing its normal system functions, or offline where the system is not operational during testing of the FPGA. A testing approach can be further classified based on whether its goal is to test only those resources to be used by the intended system function, referred to as application-dependent testing, or all resources regardless of the system function to be implemented in the FPGA, called application-independent testing. FPGA manufacturing testing is usually external and, by definition, offline and application independent. On the other hand, a BIST approach could be internal, online, and application dependent.

Table 12.2. FPGA Testing Taxonomy

Test Approach Attribute	Classification
Test pattern application and output response analysis	Internal (BIST)	External
System-level testing	Offline	Online
System application	Independent	Dependent
Target programmable resources	Logic: PLBs, I/O cells, cores	Routing: local, global

Different techniques are typically used to test programmable logic and routing resources as a result of the different architectures and fault models associated with the targeted resources. However, the routing resources must be used to test the logic resources and vice versa. Therefore, although this final classification may not necessarily be mutually exclusive, it does provide a general overall classification for test approaches. Programmable logic resources can be subdivided into PLBs, I/O cells, and other specialized cores such as memories and DSPs. Programmable interconnect resources can be subdivided into local and global routing resources, where local routing resources are dedicated to a given logic resource, or pair of adjacent resources, with global routing resources providing the interconnection of nonadjacent resources. The following subsections discuss the first three classifications, and the subsequent section provides a more detailed discussion of testing logic and routing resources in the context of BIST as a representative approach to testing these resources. An earlier classification and survey of FPGA test and diagnostic techniques can be found in [Doumar 2003].

External Testing and Built-In Self-Test

In external testing techniques, the FPGA is programmed for a given test configuration with the application of input test stimuli and the monitoring of output responses performed by external sources such as a test machine [Huang 1998] [Renovell 1998]. Once the FPGA has been tested for a given test configuration, the process is repeated for subsequent test configurations until the targeted resources within the FPGA have been completely tested. As a result, external test approaches are typically used for manufacturing testing only. For FPGAs with boundary scan that support INTEST capabilities, the input test stimuli can be applied, and output responses can be monitored, via the boundary scan interface. As a result, internal testing of the FPGA via boundary scan INTEST could be performed during system-level testing. Otherwise, the FPGA I/O pins must be used, resulting in package-dependent testing. It should be noted, however, that few FPGA manufacturers support the INTEST feature.

The basic idea of BIST for FPGAs is to configure some of the programmable logic and routing resources as test pattern generators (TPGs) and output response analyzers (ORAs). These TPGs and ORAs constitute the BIST circuitry used to detect faults in PLBs, routing resources, and specialized cores such as RAMs and DSPs. Because the BIST circuitry resides entirely within the FPGA, the same BIST configurations can be used for both manufacturing and system-level testing. Different approaches are typically used for testing PLBs, routing resources, and embedded cores, as will be discussed later in the chapter.
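The roles of the TPG and ORA can be sketched behaviorally. The example below assumes one possible arrangement, in which a TPG applies exhaustive patterns to two identically configured blocks under test and a comparison-based ORA latches any mismatch between their outputs. The exhaustive counter TPG, the comparison-based ORA, and the name `run_bist` are illustrative assumptions; the text above does not prescribe a particular TPG or ORA structure.

```python
from itertools import product


def run_bist(block_a, block_b, num_inputs):
    """Apply every input pattern to two identically configured blocks.

    TPG: exhaustive binary count over num_inputs bits (assumed).
    ORA: compares the two outputs and latches a 1 on any mismatch (assumed).
    Returns the final ORA value: 0 = pass, 1 = fail.
    """
    ora_fail = 0
    for pattern in product((0, 1), repeat=num_inputs):
        if block_a(*pattern) != block_b(*pattern):
            ora_fail = 1  # sticky failure indication
    return ora_fail


if __name__ == "__main__":
    good_and = lambda a, b: a & b
    output_stuck_at_0 = lambda a, b: 0  # faulty copy of the same function
    print(run_bist(good_and, good_and, 2))
    print(run_bist(good_and, output_stuck_at_0, 2))
```

A limitation of this particular arrangement is that equivalent faults in both blocks produce identical outputs and therefore escape detection, so a practical scheme would need to compare each block against more than one neighbor or otherwise vary the pairing.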

Online and Offline Testing

BIST approaches for FPGAs have been used at the system level for both online and offline testing. Online testing uses dynamic partial reconfiguration to detect and diagnose faults concurrently with normal system operation [Abramovici 2004] [Verma 2004] [Emmert 2007]. This is accomplished by designating portions of the FPGA as self-test areas while the remaining portions of the FPGA implement the working system function. All of the logic resources within the self-test area can be tested without affecting the operation of the system function because no system function logic is located in the self-test area. Although some of the routing resources within the self-test area must be reserved in order to interconnect the portions of the system function residing on either side of the self-test area, the remaining routing resources can be tested. For example, the self-test area illustrated in Figure 12.7 runs vertically through the array such that the vertical routing resources can be tested while the horizontal routing resources are used to interconnect the system function, illustrated by the horizontal lines crossing the self-test area. To test the horizontal routing resources, a horizontal self-test area must be included in which the vertical routing resources are used to interconnect the system function.

Figure 12.7. Online testing: (a) before relocation and (b) after relocation.

When the logic and routing resources in the self-test area have been tested and found to be fault-free, a portion of the system function is relocated to the self-test area, as illustrated in Figure 12.7a, such that the vacated region now functions as the new self-test area, as illustrated in Figure 12.7b. As a result, the vertical and horizontal self-test areas repeatedly move through the FPGA to test all programmable resources. When faults are detected, the detected faults are diagnosed (located and identified) so that the portion of the system function to be relocated into the self-test area can be reconfigured to use spare resources and avoid the identified faulty resources. One goal of online testing approaches is to minimize the latency between the occurrence of a fault and its subsequent detection and diagnosis. Another goal is to minimize the size of the self-test area as well as the number of resources reserved for spares in order to maximize the resources available for the system function.
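The relocation sequence described above can be sketched in a few lines. This is a minimal model, assuming a single one-column self-test area that sweeps in one direction; the function name `rove_self_test_area` and the `test_column` callback are hypothetical, and reserved routing, spare allocation, and fault avoidance are not modelled.

```python
def rove_self_test_area(layout, test_column):
    """Sweep a one-column self-test area across an array of columns.

    layout      -- list of column contents; exactly one entry is None and
                   marks the current self-test area
    test_column -- hypothetical callback: runs BIST on a column index and
                   returns True when the column is fault-free
    Returns the (column, result) pairs in test order and the final layout.
    """
    layout = list(layout)
    star = layout.index(None)          # column currently vacated for test
    results = []
    for step in range(len(layout)):
        results.append((star, test_column(star)))
        if step < len(layout) - 1:
            # Relocate the neighbouring system logic into the tested column;
            # the column it vacates becomes the new self-test area.
            neighbor = (star + 1) % len(layout)
            layout[star], layout[neighbor] = layout[neighbor], None
            star = neighbor
    return results, layout
```

After one full sweep every physical column has served as the self-test area once, while each piece of system logic has always occupied some column.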

Offline testing, on the other hand, requires that the system, or a portion of the system, be taken out of service in order to perform the test. As a result, offline testing is typically performed periodically during low-demand periods of system operation or on demand when an error is detected during system operation via concurrent error detection circuits, such as parity, checksum, or cyclic redundancy code checks. One goal of offline testing is to maximize the quantity of resources under test during each test configuration to minimize the test time and the associated time that the system is out of service. Once the FPGA has been tested and determined to be fault-free, the system function can be configured in the FPGA. Otherwise, when faults are detected, diagnostic configurations or procedures can be applied such that the system function can be reconfigured to avoid the identified faults before returning the system to service. An advantage of offline testing is that there is no penalty in terms of the resources available for system function implementation because the FPGA is completely reconfigured before and after testing.

Application Dependent and Independent Testing

Because test time is usually dominated by the number of times the FPGA must be reconfigured and the associated download time, application-dependent testing approaches seek to test only those resources the intended system function will use [Tahoori 2004] [Tahoori 2007]. One approach to application-dependent testing is to include design for testability (DFT) circuitry in the system function to be implemented by the FPGA. These DFT approaches include scan design or BIST techniques typically used in application-specific integrated circuits (ASICs) such that only the logic and routing resources used by the system function are tested. The primary disadvantage of this approach is the additional area overhead and performance penalties traditionally associated with DFT circuitry. Although a few FPGA architectures have incorporated scan chains, or dynamically controlled multiplexers at the inputs to flip-flops that can be used for scan chains, this is not the case in the majority of commercially available FPGAs. As a result, there can be a significant increase in the number of PLBs required to implement the system function when DFT circuitry is incorporated in the design, as much as 50% in some examples [Renovell 2001]. For example, adding a multiplexer to the input of a flip-flop for implementing scan design requires two LUT inputs (one input for the scan input data and one input for the scan mode control) in addition to the inputs needed for the system function. For an FPGA with four-input LUTs, this doubles the number of LUTs that drive flip-flops and that implement system functions requiring more than two inputs.

There is an alternative application-dependent testing approach for FPGAs that support reading the contents of memory elements during partial configuration memory readback as well as the ability to set logic values in memory elements during partial reconfiguration. For FPGAs that support both of these features, test patterns specific to the system function application programmed in the FPGA can be written into the memory elements via the partial reconfiguration process. The FPGA is then clocked once in its system mode of operation. Partial configuration memory readback is then used to retrieve the output responses of the combinational logic from the memory elements. This process is repeated until all necessary test patterns have been applied and output responses have been retrieved. Because the configuration memory bits for the memory elements may be located in frames with other configuration data for the system function, such as LUT contents and routing, a read-modify-write operation on the configuration memory can be used to modify the contents of the memory elements without disturbing (or needing to know and store) the contents of those other configuration memory bits. As a result, this approach is similar to scan design but without any additional area overhead or performance penalties to the system function.
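The read-modify-write step can be illustrated with plain bit masking. This is only a sketch: a configuration frame is modelled as a Python integer, and `ff_mask` (a hypothetical name) marks which bit positions hold memory-element contents. Real frame layouts are device specific and are not taken from the text.

```python
def write_memory_elements(frame, ff_mask, pattern):
    """Return the frame with only the memory-element bits replaced.

    frame   -- frame contents read back from the configuration memory
    ff_mask -- 1s at the bit positions that hold memory-element values
    pattern -- test pattern, aligned to the same bit positions
    Bits outside ff_mask (LUT contents, routing, ...) are preserved.
    """
    return (frame & ~ff_mask) | (pattern & ff_mask)


def read_memory_elements(frame, ff_mask):
    """Extract the memory-element bits (the captured responses) from a frame."""
    return frame & ff_mask
```

One test step would then be: read the frame, call `write_memory_elements`, write the frame back, clock the system once, read the frame again, and call `read_memory_elements` on the result.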

Application-independent testing approaches, on the other hand, seek to test all programmable resources in the FPGA, independent of the system application to be configured in the FPGA. Once the programmable resources have been completely tested, the FPGA can be programmed with the intended system function without any area overhead or performance penalties, because no DFT circuitry is added to the design. Because application-independent techniques seek to test all resources in the FPGA to ensure that they are fault-free, regardless of any given system function, the FPGA must be reconfigured many times to test all of the resources. Both external testing and BIST approaches have been developed for application-independent testing. Because of the regularity of the FPGA structure, iterative logic array (ILA) testing approaches have been applied to facilitate scaling of the test architecture and to achieve a constant test time independent of array size [Huang 1998] [Toutounchi 2004]. Application-independent external testing [Chmelar 2003] and BIST approaches [Abramovici 2003] have also been developed to detect delay faults in FPGAs.

BIST of Programmable Resources

This section examines various BIST approaches for testing logic and routing resources. Other than the incorporation of TPGs and ORAs, the test configurations for the resources under test are, for the most part, independent of whether the test approach is application-independent external testing or BIST. The testing process consists of (1) configuring the FPGA to test some specific resource (either logic or routing), (2) applying test patterns and analyzing the output responses, (3) reconfiguring to test the target resource in a different mode of operation, and (4) repeating these steps until the target resource is completely tested. There is a significant difference in the number of test configurations needed to completely test logic resources versus routing resources, as can be seen in Table 12.3 for some commercially available FPGAs. The number of test configurations needed to completely test routing resources ranges from 3 to more than 23 times the number needed to completely test PLBs. This is primarily because routing resources constitute approximately 80% of the configuration memory bits in most FPGAs, but there are other architectural characteristics that affect the total number of test configurations required [Stroud 2003].

Table 12.3. FPGA Test Configurations

FPGA		Number of Test Configurations
Vendor	Series	PLBs	Routing	Cores	Reference
Lattice	ORCA2C	9	27	0	[Abramovici 2001]
Lattice	ORCA2CA	14	41	0	[Stroud 2002b]
Atmel	AT40K/AT94K	4	56	3	[Sunwoo 2005]
Cypress	Delta39K	20	419	11	[Stroud 2000]
Xilinx	4000E/Spartan	12	128	0	[Stroud 2003]
	4000XL/XLA	12	206	0	[Stroud 2003]
	Virtex/Spartan-II	12	283	11	[Dhingra 2005]
	Virtex-4	15	?	15	[Milton 2006]
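The routing-to-PLB ratio quoted in the text can be checked directly against the table. The short script below copies the PLB and routing counts from Table 12.3; Virtex-4 is left out because its routing count is not given. The dictionary and function names are illustrative only.

```python
# (PLB configurations, routing configurations) copied from Table 12.3.
# Virtex-4 is omitted because its routing count is not listed.
TABLE_12_3 = {
    "ORCA2C": (9, 27),
    "ORCA2CA": (14, 41),
    "AT40K/AT94K": (4, 56),
    "Delta39K": (20, 419),
    "4000E/Spartan": (12, 128),
    "4000XL/XLA": (12, 206),
    "Virtex/Spartan-II": (12, 283),
}


def routing_to_plb_ratio(plb_configs, routing_configs):
    """Number of routing test configurations per PLB test configuration."""
    return routing_configs / plb_configs


RATIOS = {name: routing_to_plb_ratio(*counts)
          for name, counts in TABLE_12_3.items()}
```

With these figures the smallest ratio is about 2.9 (ORCA2CA) and the largest is about 23.6 (Virtex/Spartan-II), consistent with the range of roughly 3 to more than 23 stated above.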

Logic Resources

Early FPGAs consisted only of an array of PLBs and programmable routing resources. As a result, the first testing approaches, including BIST, for logic resources were specifically for PLBs. Over time, however, I/O cells became more complex, and specialized cores with various programmable options were incorporated in the array. Therefore, BIST approaches originally developed for PLBs have been modified and extended to testing I/O cells and specialized cores.

Three BIST approaches for testing logic resources are illustrated in Figure 12.8. In general, multiple identically configured TPGs are implemented to supply test patterns to alternate programmable logic resources under test, which will be referred to as blocks under test (BUTs). The outputs of the identically configured BUTs are then monitored by one or more comparison-based ORAs. The BUTs are repeatedly reconfigured in their various modes of operation until they are completely tested. A set of test configurations that completely test a set of BUTs in all of their modes of operation is referred to as a test session.

Figure 12.8. FPGA logic resource BIST architectures: (a) basic comparison, (b) circular comparison, and (c) expected results.

Multiple TPGs are used to prevent faults from escaping detection because of a faulty TPG or faulty routing resources interconnecting the TPG and BUTs. With multiple TPGs, a faulty TPG would produce different test patterns from those of the fault-free TPG; therefore, the output responses of the BUTs driven by the faulty TPG would also be different and would be detected by the ORAs. Another important use of multiple TPGs is to control the loading on the TPG outputs, because the number of resources under test can be quite large, as the data in Table 12.1 show. The loading can best be controlled by creating the complete BIST architecture in a small portion of the array and repeating this structure to fill the array [Abramovici 2001]. The use of multiple TPGs also requires a mechanism for synchronizing the TPGs, such as a common reset signal, so that the TPGs apply the same test patterns to their respective BUTs.

In the basic comparison architecture shown in Figure 12.8a, the BUTs in the middle of the array are monitored by two ORAs and compared with the outputs of two different BUTs. Along the edges of the array, the BUTs are monitored by only one ORA. In the circular comparison architecture of Figure 12.8b, all BUTs are monitored by two ORAs and compared to two different BUTs. It has been shown that there are a few pathological cases of multiple faults that can escape detection using the basic comparison architecture [Abramovici 2001], and there are fewer pathological cases with the circular comparison architecture. A final architecture that has been used with success in some cases is the expected results architecture, illustrated in Figure 12.8c, where the ORAs compare the outputs of the BUTs with expected results generated by the TPGs. This architecture is most effective when the expected results can be efficiently generated by the TPG; for example, when testing RAM cores, the expected results are known as part of the test algorithm. However, it is important to have different TPGs generating the test patterns and the expected results to avoid faulty BUTs escaping detection as a result of a faulty TPG. A solution is to have multiple identical TPGs that generate both test patterns and expected results, but to connect the expected results produced by one TPG to ORAs that compare the output responses from BUTs driven by test patterns from a different TPG.
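A small behavioral model may help make the circular comparison concrete. The sketch below assumes that ORA i compares BUT i with BUT i+1, wrapping around at the end of the ring, and that each BUT's output responses are given as a list. The diagnosis rule in `suspect_buts` is a simplification that is only meaningful for a single faulty BUT; both function names are hypothetical.

```python
def circular_comparison(but_outputs):
    """Return one pass/fail flag per ORA (True means a mismatch was seen).

    but_outputs -- list of output-response sequences, one per BUT.
    ORA i compares BUT i with BUT (i + 1) mod n, so every BUT is
    observed by two ORAs and compared with two different BUTs.
    """
    n = len(but_outputs)
    return [but_outputs[i] != but_outputs[(i + 1) % n] for i in range(n)]


def suspect_buts(ora_fail):
    """Single-fault diagnosis: a BUT is suspect when both of its ORAs failed."""
    n = len(ora_fail)
    # ORA i-1 and ORA i are the two ORAs observing BUT i
    # (index -1 wraps to the last ORA for BUT 0).
    return [i for i in range(n) if ora_fail[i - 1] and ora_fail[i]]
```

For example, if five identically configured BUTs produce the same responses except BUT 2, only the two ORAs adjacent to BUT 2 report a mismatch, and `suspect_buts` returns `[2]`.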

All of these architectures can be oriented along the columns of the array as shown in Figure 12.8 or can be oriented along the rows of the array by rotating the architectures by 90 degrees. Control of the physical layout is necessary to achieve the desired orientation of the BIST architecture. In most cases, this control cannot be achieved through the normal synthesis design flow and requires the use of FPGA vendor-specific design tools and techniques. The orientation of the architecture is important when using partial reconfiguration to reconfigure the BUTs in their various modes of operation. For example, when the frames of the configuration memory are aligned along the columns of the array, the column oriented architectures of Figure 12.8 give optimum results because the configurations of the TPGs, ORAs, TPG-to-BUT routing, and BUT-to-ORA routing remain unchanged and only the frames containing the BUTs must be written to reconfigure the BUTs for the next test configuration. This minimizes the amount of configuration data that must be downloaded to the FPGA for each test configuration and the total test time.

The programmable resources used to implement the TPGs and ORAs for BIST of the various logic resources are summarized in Table 12.4. The actual TPG design is a function of the logic resources being tested. When implemented in PLBs, a lower bound on the number of PLBs required to implement a TPG, T_PLB, is given by:

Equation 12.1.

Table 12.4. Programmable Resources Used for BIST of Logic Resources

Resource Under Test	TPGs	ORAs
PLBs	PLBs or DSP cores	PLBs
LUT RAMs	PLBs or DSP and RAM cores	PLBs
I/O cells	PLBs or DSP and RAM cores	PLBs
Cores (memories, DSPs, etc.)	PLBs	PLBs

where B_IN is the number of inputs to a BUT and N_FF is the number of flip-flops in a PLB. However, a TPG can also be implemented by storing test patterns (and expected results) in a RAM core that is then used in a ROM mode of operation with a counter to sequence through the addresses of the ROM; this counter can be implemented in PLBs or a DSP core. Another technique for implementing a TPG is to use a DSP core by itself. Although the accumulator of a DSP can easily be used to implement a counter, another approach is to add a large prime number constant to the accumulator to produce exhaustive test patterns with more transitions in the most significant bits than is the case with a counter [Rajski 1998].
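The accumulator technique can be sketched as follows. Adding any odd constant modulo 2^width steps through all 2^width values before repeating, and a prime larger than 2 is necessarily odd. The width of 6 bits and the constant 37 used in the usage note are arbitrary choices for illustration, not values from the text.

```python
def accumulator_tpg(width, constant):
    """Yield 2**width test patterns from an accumulator.

    Starting from 0, the accumulator adds `constant` modulo 2**width
    on every clock. An odd constant visits every value exactly once;
    a constant of 1 reduces to a plain binary counter.
    """
    mask = (1 << width) - 1
    acc = 0
    for _ in range(1 << width):
        yield acc
        acc = (acc + constant) & mask


def msb_transitions(patterns, width):
    """Count how often the most significant bit changes between patterns."""
    msb = [(p >> (width - 1)) & 1 for p in patterns]
    return sum(a != b for a, b in zip(msb, msb[1:]))
```

Comparing `msb_transitions(list(accumulator_tpg(6, 37)), 6)` with the same call for a constant of 1 shows the point made above: both sequences are exhaustive, but the counter toggles its most significant bit only once.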

ORAs are most efficiently implemented using PLBs because of the large number of ORAs typically required. The total number of ORAs, N_ORA, needed is given by:

Equation 12.2.

where N_BUT is the total number of BUTs and B_OUT is the number of outputs per BUT. The ORA can be one of several types of implementations, some of which are summarized in Figure 12.9. In all cases, a comparator is used to detect any mismatches in the output responses of the BUTs caused by faults. Any mismatch is then latched via the flip-flop in conjunction with the feedback and OR gate and held until the end of the BIST sequence where a logic 1 indicates that a mismatch was encountered. The primary ORA design issue is how the contents of the ORAs will be retrieved at the end of the BIST sequence. In the ORA design shown in Figure 12.9a, a shift register mode of operation is incorporated to allow the contents of the ORAs to be shifted out at the end of the BIST sequence. This requires the additional multiplexer functionality, which adds two inputs to each ORA for a total of five inputs including the feedback. Because most FPGAs provide only three-input or four-input LUTs, the ORA with shift register retrieval requires two LUTs and one flip-flop. By using configuration memory readback to retrieve the ORA contents (in those FPGAs that support reading contents of memory elements during configuration readback), the shift register mode can be eliminated as illustrated in the ORA design shown in Figure 12.9b, which requires only one three-input LUT and one flip-flop. As a result, the number of PLBs required to implement ORAs, O_PLB, is given by:

Equation 12.3.

Figure 12.9. Comparison-based ORA designs: (a) ORA with shift register retrieval, (b) ORA for configuration readback, and (c) ORA for comparing multiple outputs.
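The behavior of the ORA with shift register retrieval can be modelled in software. This is a behavioral sketch of the description above (comparator, OR gate with feedback, flip-flop, and a shift mode), not a netlist of Figure 12.9a; the class name, signal names, and the direction of the shift chain are assumptions.

```python
class ShiftORA:
    """Comparison-based ORA with a sticky fail bit and a shift mode."""

    def __init__(self):
        self.ff = 0  # 0 = no mismatch seen so far, 1 = mismatch latched

    def clock(self, out_a, out_b, shift=0, scan_in=0):
        """One clock cycle. The five inputs are the two BUT outputs,
        the shift control, the scan input, and the flip-flop feedback."""
        if shift:
            self.ff = scan_in                 # shift register mode
        else:
            self.ff |= (out_a ^ out_b)        # latch and hold any mismatch


def shift_out(oras):
    """Shift the pass/fail bits out of a chain of ORAs, last ORA first."""
    results = []
    for _ in range(len(oras)):
        results.append(oras[-1].ff)
        previous = [o.ff for o in oras]       # all flip-flops clock together
        for i, ora in enumerate(oras):
            ora.clock(0, 0, shift=1, scan_in=previous[i - 1] if i else 0)
    return results
```

Dropping the `shift` and `scan_in` inputs leaves a three-input function (two BUT outputs plus feedback), which corresponds to the readback-based ORA described in the text.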

Reading back the full configuration memory to obtain the contents of the ORA flip-flops is inefficient in terms of test time. On the other hand, the technique is very effective in FPGAs that support partial configuration memory readback and concentrate the configuration memory bits that contain the contents of the flip-flops in a minimum number of frames. Aligning the orientation of the BIST architecture (row or column) with the frames of the configuration memory helps to minimize the time spent retrieving BIST results because only those frames that contain ORA flip-flop contents must be read. BIST results can be retrieved at the end of each BIST sequence or, using dynamic partial reconfiguration, at the end of a test session. Whereas the latter approach reduces overall test time, the tradeoff is some minor loss in diagnostic resolution; the faulty resource(s) can still be identified, but the failing mode of operation of the faulty resource cannot. A final ORA design is illustrated in Figure 12.9c, which compares multiple sets of outputs from the BUTs. This design is effective in FPGAs with LUTs having a large number of inputs, but is also useful for BUTs with a large number of outputs.

Programmable Logic Blocks

One of the first BIST architectures for testing PLBs is illustrated in Figure 12.10a where the TPGs are constructed from PLBs [Abramovici 2001]. Both counters and linear feedback shift registers (LFSRs) have been used as TPG functions. A more recent approach uses DSP cores in an accumulator mode of operation with the addition of a large prime constant as the TPG [Milton 2006]. This allows the column of PLBs normally used for the TPGs to implement ORAs instead. Hence, the circular comparison architecture of Figure 12.10b can be implemented. In both PLB BIST architectures, only half of the PLBs can be BUTs at any given time because the other PLBs function as ORAs (and, in the basic comparison architecture, as TPGs). As a result, two test sessions are required to completely test all PLBs in the array. Note that the sets of PLB test configurations in Table 12.3 would be applied twice, once for each test session.

Figure 12.10. PLB BIST architecture: (a) basic comparison and (b) circular comparison.
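As an illustration of an LFSR used as a TPG, the sketch below models a 4-bit LFSR. The width and the feedback polynomial x^4 + x^3 + 1 are assumptions chosen to keep the example small; the text does not specify either. Note that a plain LFSR never produces the all-zeros pattern, so exhaustive testing needs either a counter or an LFSR modified to include that state.

```python
from itertools import islice


def lfsr_patterns(seed=0b0001):
    """Yield successive states of a 4-bit LFSR (assumed x^4 + x^3 + 1).

    The feedback bit is the XOR of the two most significant state bits;
    the register shifts left and the feedback enters at bit 0.
    The seed must be nonzero, otherwise the register stays at zero.
    """
    state = seed
    while True:
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF


# Usage: the first 15 patterns are all distinct and nonzero.
first_cycle = list(islice(lfsr_patterns(), 15))
```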

After the initial download for a given test session, the BUTs are repeatedly reconfigured in their various modes of operation until they are completely tested. Therefore, it is important to align the BUTs with the orientation of the frames in the configuration memory to minimize the number of frames that must be written to reconfigure the BUTs for subsequent test configurations. This also aligns the ORAs with the frames for retrieving the pass/fail contents of the ORAs after the completion of each BIST sequence, or at the end of each test session, via partial configuration memory readback. The resultant test time speedup is illustrated in Table 12.5 for Xilinx Virtex and Virtex-4 FPGAs [Dhingra 2006]. The time required for full reconfiguration and full configuration memory readback at the end of each test configuration is used to normalize the data. The test time speedup obtained for Virtex-4 is better than that obtained for Virtex, by a factor of about 1.9 to 2.5 depending on the method, as a result of better frame organization and multiple frame write capabilities. The table also gives the reduction in the memory storage requirements needed to store the BIST configuration data files when using partial reconfiguration.

Table 12.5. Test Time Speed-up and Memory Reduction

Download and Readback Method	Test Time Speedup		Memory Reduction
	Virtex	Virtex-4	Virtex	Virtex-4
Full reconfiguration with full configuration memory readback	1	1	1	1
Partial reconfiguration with partial configuration memory readback	4.6	8.9	3.2	5.3
Dynamic partial reconfiguration with partial configuration memory readback at end of session	5.1	12.9	3.2	5.3

The actual test configurations are developed based on the specific PLB architecture. As an example of this process, consider the simple PLB architecture of Figure 12.11, which consists of two three-input LUTs with common inputs, a flip-flop, and four multiplexers. The four multiplexers respectively provide the ability to combine the two LUTs to form a single logic function of four inputs, synchronous clock enable, synchronous programmable set or reset, and the ability to bypass the flip-flop for a combinational logic function only. Sixteen configuration memory bits define the contents of the two LUTs (C₇–C₀ for LUT C and S₇–S₀ for LUT S) to establish the truth tables for their respective combinational logic functions. For example, LUT C can implement the carry function of a full adder while LUT S implements the sum function. An additional six configuration memory bits (CB₅–CB₀) control functionality of the rest of the PLB. CB₄ controls the output multiplexer to select or bypass the flip-flop for sequential or combinational logic functions, respectively. CB₃ configures the flip-flop to be either set or reset when the set/reset input is active. A logic 1 in CB₅ disables the D3 input and forces Smux to select the output of LUT S, as would be the case for the full adder implementation example. The final three bits (CB₂–CB₀) drive exclusive-OR functions to define the active levels for the clock enable and set/reset inputs as well as the active edge of the clock, where logic 0’s would result in active high control signals and a rising edge clock while logic 1’s would invert the control signals and active clock edge.

Figure 12.11. Example PLB architecture for test configuration development.

The primary goal in test configuration development is to minimize the number of test configurations to be downloaded. In this example, a minimum of three test configurations is needed to completely test the PLB for all stuck-at faults. Three possible test configurations are summarized in Table 12.6 in terms of the logic values of the configuration bits as well as the individual and cumulative fault coverage (for 174 total collapsed stuck-at gate-level faults) obtained for each configuration. The input test patterns include all 2⁶ possible input values produced either from a counter or an LFSR. Two configurations (configuration 1 and configuration 2) are needed to test the LUTs, with each LUT loaded with opposite logic values in the two configurations. In this example, the LUT configuration data implement the truth tables for exclusive-OR (XOR) and exclusive-NOR (XNOR) functions, which are a good choice for testing LUTs. Within each configuration, opposite truth tables are also loaded in the two LUTs in order to apply opposite logic values to the inputs of the multiplexer controlled by the D3 input. The same two configurations, in which the output multiplexer selects the flip-flop, are also needed to test the active levels of the control signals to the flip-flop, the active edge of the clock, and the programmable set/reset. The last configuration tests the flip-flop bypass.

Table 12.6. Test Configurations for Example PLB Architecture

Configuration Bits	Configuration 1	Configuration 2	Configuration 3
C₇–C₀ (LUT C)	01101001 = XNOR	10010110 = XOR	10010110 = XOR
S₇–S₀ (LUT S)	10010110 = XOR	01101001 = XNOR	01101001 = XNOR
CB₅–CB₀	010000	011111	100000
Individual FC	149/174 = 85.6%	149/174 = 85.6%	108/174 = 62.1%
Cumulative FC	149/174 = 85.6%	170/174 = 97.7%	174/174 = 100%
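The LUT contents in Table 12.6 can be reproduced from the XOR and XNOR truth tables. The sketch below assumes that configuration bit i holds the LUT output for input address i, so that the string is written from bit 7 down to bit 0; this bit ordering is an assumption, although it is consistent with the values in the table.

```python
def lut_contents(func, num_inputs=3):
    """Return a LUT truth table as a bit string, highest address first.

    func receives a list of input bits (index 0 = least significant
    address bit) and returns the LUT output for that input combination.
    """
    bits = []
    for address in range(2 ** num_inputs - 1, -1, -1):
        inputs = [(address >> i) & 1 for i in range(num_inputs)]
        bits.append(str(func(inputs) & 1))
    return "".join(bits)


def xor3(inputs):
    """Three-input exclusive-OR."""
    return inputs[0] ^ inputs[1] ^ inputs[2]


def xnor3(inputs):
    """Three-input exclusive-NOR (complement of xor3)."""
    return 1 ^ xor3(inputs)
```

Because the XNOR table is the bitwise complement of the XOR table, swapping the two between configurations loads every LUT bit with both logic values.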

Input/Output Cells

The I/O cells have a number of programmable features that not only facilitate user control and specification of the input, output, and bidirectional ports for any given system application but also meet critical system attributes such as timing requirements, voltage standards, and drive capabilities. In newer FPGAs, an increasing amount of programmable logic resources has been incorporated in I/O cells to support system application requirements such as high-speed data transfer. The programmable resources typically range from 2 flip-flops and 4 multiplexers with 3 I/O standards and 4 delay values in simpler FPGAs to 10 flip-flops and 33 multiplexers with 69 I/O standards and 64 delay values in some of the more complex FPGAs. Hence, the number of test configurations for all of the modes of operation can be large.

Boundary scan, and more specifically the EXTEST feature, has traditionally been used to test input and output buffers, tristate control, and pads along with interconnections between devices on a PCB. However, the EXTEST capability can only test the bidirectional buffer portion of the programmable I/O cells of an FPGA; it cannot test other programmable logic resources in the I/O cells, such as the flip-flops used for registered inputs and outputs or the programmable routing resources connecting I/O cells to the FPGA core. Although the boundary scan INTEST feature could be used to test these programmable logic and interconnect resources, few FPGAs provide INTEST capability as it is not a required feature of the IEEE boundary scan standard.

Using the bidirectional capabilities of the I/O cell, the logic BIST architecture can be applied to the I/O cells as illustrated in Figure 12.12 [Vemula 2006]. Test patterns from two identically configured TPGs are applied to the output portion of the I/O cells with the return path through the input portion of the I/O cell to the ORAs. The output responses of identically configured I/O cells under test are compared by ORAs implemented in the PLBs of the FPGA core. The I/O cells under test and the ORAs are arranged to provide a circular comparison such that every I/O cell is observed by two ORAs and compared with two different I/O cells under test. The basic comparison and expected results architectures can also be used for testing I/O cells.

Figure 12.12. I/O cell BIST architecture.

The I/O cells under test are repeatedly reconfigured in their various modes of operation. Because there are more PLBs than I/O cells, all I/O cells can usually be tested in a single test session. This BIST approach also facilitates testing programmable interconnect resources associated with the I/O cells. However, there are some testing limitations because the BIST approach requires that the I/O cells be configured in the bidirectional mode of operation. For example, not all I/O voltage standards can be used in the bidirectional mode and, as a result, cannot be tested with this BIST approach. Furthermore, the potential effects of the bidirectional mode must be considered for the system-level application of BIST for I/O cells as other components may be driving the nets. Differential I/O can be tested using this approach in those FPGAs where the pair of I/O cells can be configured as bidirectional while in the differential mode.

Specialized Cores

Current FPGAs typically incorporate multiple regular structure cores such as memories (RAMs and FIFOs), multipliers, and DSPs. To maximize the efficiency of these devices in a large variety of system applications, these regular structure cores are also programmable. For example, RAM cores can be programmed for a range of different address and data bus widths, synchronous and asynchronous write and read modes, as well as single and dual port operation. These cores must be tested in their various modes of operation. All three BIST architectures for programmable logic resources illustrated in Figure 12.8 have been used successfully to test specialized cores embedded in FPGAs. However, the TPGs for testing these cores are often more complex than those for testing PLBs or I/O cells because specific, complex test algorithms are needed.

One of the most efficient RAM test algorithms currently in use, in terms of test time and fault detection capability, is the March LR algorithm [Van de Goor 1996] summarized in Table 12.7. This algorithm has a test time on the order of 16N, where N is the number of address locations. In addition to classical stuck-at faults, this algorithm is capable of detecting pattern sensitivity faults, intraword coupling faults, and bridging faults in the RAM. For word-oriented memories, a background data sequence (BDS) must be added to detect these faults within each word of the memory [Hamdioui 2004]. The March LR with BDS given in Table 12.7 is for a RAM with 2-bit words, but in general the number of background data sequences, N_BDS, is given by:

Equation 12.4.

Table 12.7. RAM Test Algorithms

Test Algorithm	March Test Sequence
March DPR
MATS+
March X
March Y
March LR w/o BDS
March LR with BDS
March S2pf-
March D2pf
Notation:	w0 = write 0 (or all 0’s); r1 = read 1 (or all 1’s)
portA:portB	↑ = address up; ↓ = address down; ↕ = address either way

where K is the number of bits per word. When the RAM core can be configured in a 1-bit/word mode of operation, the March LR without BDS will test for pattern sensitivity and coupling faults. However, applying March LR with BDS to the largest data width access to the RAM will also detect bridging faults in the parallel data bus into and out of the RAM. Furthermore, some FPGA RAM cores include additional space for parity that cannot be accessed in the 1-bit/word mode such that the March LR with BDS must be applied to those modes that give access to the additional memory storage for parity. As a result, it is most efficient in terms of test time to apply March LR with BDS to the largest data width mode of the RAM. Once the memory core has been tested for pattern sensitivity, transition, and coupling faults, a simpler test algorithm—such as MATS+ (a 5N test), March X (a 6N test), or March Y (an 8N test) from Table 12.7—can be applied for the other address and data width modes of operation to test the programmable address decoding circuitry. Usually, the number of test configurations needed to test the different address and data width options is sufficient to also test all other programmable features such as initialization values, active level for control signals, active edge of clocks, and synchronous/asynchronous operation.
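The operation counts quoted above for the simpler algorithms can be checked with a small march-test engine. The element sequences below are the commonly published forms of MATS+, March X, and March Y and should be checked against Table 12.7; the engine, the `Ram` class, and its stuck-at fault injection are hypothetical and model a 1-bit/word memory only. Elements marked "either way" are run in ascending order here.

```python
UP, DOWN = +1, -1

# Each march element is (address order, list of operations per address).
MATS_PLUS = [(UP, ["w0"]), (UP, ["r0", "w1"]), (DOWN, ["r1", "w0"])]
MARCH_X = MATS_PLUS + [(UP, ["r0"])]
MARCH_Y = [(UP, ["w0"]), (UP, ["r0", "w1", "r1"]),
           (DOWN, ["r1", "w0", "r0"]), (UP, ["r0"])]


class Ram:
    """1-bit/word RAM model with an optional stuck-at cell."""

    def __init__(self, size, stuck_addr=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_addr, self.stuck_value = stuck_addr, stuck_value

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        if addr == self.stuck_addr:
            return self.stuck_value       # the faulty cell ignores writes
        return self.cells[addr]


def run_march(test, ram, size):
    """Apply a march test; return (passed, number of operations)."""
    passed, operations = True, 0
    for direction, element in test:
        addresses = range(size) if direction == UP else range(size - 1, -1, -1)
        for addr in addresses:
            for op in element:
                operations += 1
                value = int(op[1])
                if op[0] == "w":
                    ram.write(addr, value)
                elif ram.read(addr) != value:
                    passed = False
    return passed, operations
```

For a memory with N locations the engine performs 5N, 6N, and 8N operations for the three algorithms, matching the test lengths given in the text.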

For RAM cores that support true dual-port operation, with two separate address and data ports that can access any location in the memory, the March LR test algorithm is applied to each port in turn to test the single port operation followed by the March S2pf- and D2pf algorithms, summarized in Table 12.7, to test the dual port operation [Hamdioui 2002]. Some RAMs are said to have a “dual-port” mode of operation that is not truly dual port; these RAMs simply provide one port for write operations and another port for read operations. This is typical in LUTs that can function as a small RAM. For this case, a simple test algorithm developed for testing these so-called dual-port RAMs is given in Table 12.7 and denoted March DPR [Stroud 2003].

Some FPGAs also incorporate FIFO memories or, more often, RAMs that can also be configured in FIFO modes of operation. In the latter case, it is best to initially test the memory in RAM mode of operation using March LR with BDS to detect pattern sensitivity and coupling faults in the memory core. Subsequent configurations can test the FIFO modes of operation as well as the full and empty flags associated with FIFO operation. Many FPGA FIFOs support programmable “almost full” and “almost empty” flags, and these must be tested as well. The main problem with programmable “almost” full and empty flags is that the FIFO must be reconfigured many times just to set the values at which these flags will go active. The following March X algorithm for testing the FIFO addressing logic and associated flags is based on the algorithm in [Atmel 1999].

March X FIFO Test Algorithm (assuming N address locations):

1. Reset the FIFO, check that Empty flag is active.
2. Repeat N times: write FIFO with all 0’s, check that Empty flag goes inactive after the first write cycle, Full flag goes active after last write cycle, and that Almost Empty flag goes inactive and Almost Full flag goes active at the appropriate points in the sequence. (Partial reconfiguration can be used at this point to set the next value of the “almost” full flag before proceeding with additional write cycles; repeat this process for each “almost” full flag value to be tested while proceeding through the write sequence.) Perform one additional write if the FIFO has a Write Error signal to indicate an attempted write when the FIFO is full.
3. Repeat N times: read FIFO expecting all 0’s and write FIFO with all 1’s, check that Full flag toggles after each read and write cycle.
4. Repeat N times: read FIFO expecting all 1’s and write FIFO with all 0’s, check that Full flag toggles after each read and write cycle.
5. Repeat N times: read FIFO expecting all 0’s, check that Full flag goes inactive after first read cycle, Empty flag goes active after last read cycle, and that Almost Empty flag goes active and Almost Full flag goes inactive at the appropriate points in the read sequence. (Partial reconfiguration can be used to set the next value of the “almost” empty flag before proceeding with additional read cycles; repeat this process for each “almost” empty flag value to be tested while proceeding through the read sequence.) Perform one additional read if the FIFO has a Read Error signal to indicate an attempted read when the FIFO is empty.
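The March X FIFO sequence above can be sketched in software against a behavioral FIFO model. This is only an illustration under stated assumptions: the `Fifo` class, its flag thresholds (`almost_empty`, `almost_full`), and the `StuckFullFifo` fault model are hypothetical and do not correspond to any particular FPGA primitive, and the partial reconfiguration of the "almost" flag values is omitted (a single fixed pair of thresholds is checked).

```python
class Fifo:
    """Hypothetical behavioral FIFO model with N = depth locations."""

    def __init__(self, depth, almost_empty=2, almost_full=None, width=8):
        self.depth = depth
        self.ae = almost_empty                      # Almost Empty active when count <= ae
        self.af = depth - 2 if almost_full is None else almost_full  # active when count >= af
        self.ones = (1 << width) - 1                # the all 1's data word
        self.reset()

    def reset(self):
        self.mem = []
        self.write_error = False
        self.read_error = False

    @property
    def empty(self):
        return len(self.mem) == 0

    @property
    def full(self):
        return len(self.mem) >= self.depth

    @property
    def almost_empty(self):
        return len(self.mem) <= self.ae

    @property
    def almost_full(self):
        return len(self.mem) >= self.af

    def write(self, word):
        self.write_error = self.full                # attempted write when full
        if not self.write_error:
            self.mem.append(word)

    def read(self):
        self.read_error = self.empty                # attempted read when empty
        return None if self.read_error else self.mem.pop(0)


class StuckFullFifo(Fifo):
    """Same model with the Full flag stuck inactive (illustrative fault)."""

    @property
    def full(self):
        return False


def march_x_fifo(fifo):
    """Apply the five steps and return a list of failure messages."""
    n, zeros, ones = fifo.depth, 0, fifo.ones
    errors = []

    def check(condition, message):
        if not condition:
            errors.append(message)

    fifo.reset()                                    # step 1
    check(fifo.empty, "Empty not active after reset")

    for count in range(1, n + 1):                   # step 2: N writes of all 0's
        fifo.write(zeros)
        check(not fifo.empty, "Empty still active during writes")
        check(fifo.full == (count == n), "Full flag wrong during writes")
        check(fifo.almost_empty == (count <= fifo.ae), "Almost Empty wrong during writes")
        check(fifo.almost_full == (count >= fifo.af), "Almost Full wrong during writes")
    fifo.write(zeros)                               # one additional write
    check(fifo.write_error, "Write Error not raised on write when full")

    for expect, new in ((zeros, ones), (ones, zeros)):   # steps 3 and 4
        for _ in range(n):
            check(fifo.read() == expect, "unexpected read data")
            check(not fifo.full, "Full did not go inactive after read")
            fifo.write(new)
            check(fifo.full, "Full did not go active after write")

    for count in range(n - 1, -1, -1):              # step 5: N reads of all 0's
        check(fifo.read() == zeros, "unexpected read data in final pass")
        check(not fifo.full, "Full still active during reads")
        check(fifo.empty == (count == 0), "Empty flag wrong during reads")
        check(fifo.almost_empty == (count <= fifo.ae), "Almost Empty wrong during reads")
        check(fifo.almost_full == (count >= fifo.af), "Almost Full wrong during reads")
    fifo.read()                                     # one additional read
    check(fifo.read_error, "Read Error not raised on read when empty")
    return errors
```

Calling `march_x_fifo(Fifo(16))` on the fault-free model returns an empty list, whereas the model with the Full flag stuck inactive produces failure messages in steps 2 through 4.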

More recently, FPGAs have incorporated error correcting code (ECC) RAMs to tolerate errors, including single event upsets (SEUs), by adding Hamming code generation circuitry at the input to the RAM along with detection and correction circuitry at the output of the RAM as discussed in Chapter 3. Hamming code bits are generated over the incoming data and then stored in the RAM with the data bits. At the output of the RAM, the Hamming code is regenerated for the data bits as discussed in Chapter 3 (Section 3.4.3.2) and illustrated in Figure 3.25 and compared to the stored Hamming code to identify single bit errors for single-error-correction (SEC). A parity bit across the entire code word (data bits plus Hamming bits) is included to provide double error detection (DED) as summarized in Table 3.4. ECC RAMs pose the interesting problem of detecting faults in a circuit designed to be fault-tolerant.
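As a rough illustration of the SEC-DED scheme just described, the following sketch uses the textbook Hamming construction (check bits at power-of-two positions, plus an overall parity bit stored at index 0). The bit ordering and the function names `encode` and `decode` are assumptions made for this example; the actual parity-tree connections inside a given FPGA's ECC RAM are generally known only to the manufacturer.

```python
def encode(data_bits):
    """Return a SEC-DED code word; index 0 holds the overall parity bit."""
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:          # number of Hamming bits needed for SEC
        r += 1
    n = k + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(data)
    for i in range(r):                   # each Hamming bit is an even-parity check
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2              # overall parity across data + Hamming bits
    return code


def decode(code):
    """Return (status, data_bits) with status 'ok', 'corrected', 'double',
    or 'uncorrectable' (more than two errors; data is None)."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):          # regenerate the checks and compare
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if overall == 1:                     # odd number of errors: assume a single error
        if syndrome > n:
            return "uncorrectable", None
        code[syndrome] ^= 1              # syndrome 0 means the parity bit itself
        status = "corrected"
    elif syndrome:                       # even parity but nonzero syndrome
        status = "double"
    else:
        status = "ok"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


# 64 data bits need 7 Hamming bits plus 1 overall parity bit: 72 bits total.
data = [(i * 7 // 3) % 2 for i in range(64)]
word = encode(data)
```

Flipping any single bit of `word` before calling `decode` yields the status `"corrected"` with the original data, and flipping two bits yields `"double"`, which is the behavior the test configurations for the detect/correct circuit need to exercise.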

Any parity tree can be completely tested for all gate-level stuck-at faults in only four vectors if one knows the connections of the exclusive-OR gates that form the tree [Mourad 1989]. Unfortunately, only the manufacturer may know the connections for the various parity trees in the generate and detect/correct circuits. Regardless, complete testing can be obtained as shown in Table 12.8 for different sets of test vectors that target the generate and detect/correct circuits in an ECC RAM with 64 data bits, seven Hamming bits for SEC, and an overall parity bit for DED. Pin fault coverage considers only the inputs and output of the exclusive-OR gate, whereas gate-level fault coverage considers the internal gate structure (two NOR gates and an AND gate) of the exclusive-OR. The cumulative gate-level fault coverage (denoted “Cum.” in the table) is given for the progression through the various sets of test vectors.

Table 12.8. Fault Coverage for Hamming Code Circuits in 64-bit ECC RAM

Circuit	Vectors	# Vectors	Pin Fault Coverage	Gate Fault Coverage	Cum. Gate Fault Coverage
ECC Generate	All 0’s; walk 1-thru-0’s	65	100%	87.7%	87.7%
	All 1’s	1	50%	26.5%	93.9%
	Walk two 1’s-thru-0’s	2016	99.9%	99.6%	100%
ECC Detect and Correct	Output of ECC generate vectors	2082	56%	58.4%	58.4%
	Init: Walk 1-thru-0’s; all 1’s; all Hamming values w/ data = 0’s	321	100%	95.2%	98.1%
	Init: Walk two 1’s-thru-0’s	2556	73.5%	71.9%	100%
Note: Init indicates that vectors are initialized in ECC RAM during download.

The first set of vectors for the ECC generate circuit includes the all 0’s test pattern and walking a logic 1 through a field of 0’s for a total of 65 vectors. This small set of vectors will detect 100% of the stuck-at pin faults for exclusive-OR gates regardless of the connections in the actual parity trees [Jone 1994]. This is because a logic 1 is propagated through every path from the inputs to the output (this detects all single stuck-at-0’s), whereas the all 0’s pattern detects all single stuck-at-1 faults. However, when testing pin faults of an exclusive-OR gate, only three combinations {01, 10, and either 00 or 11} are needed to detect each input and the output stuck-at-0 and stuck-at-1 [Mourad 1989]. In actual gate-level implementations of an exclusive-OR function, all four vectors are needed for complete testing; this can be seen in Table 12.8, where only 87.7% of the gate-level stuck-at faults were detected with the same set of 65 vectors. The all 1’s vector can be easily applied to the exclusive-OR gates at the inputs to the parity tree, which increases the fault coverage to 93.9% with a total of 66 vectors, but the remaining gates in the parity tree see all 0’s from the outputs of the first level exclusive-OR gates. One way to apply the {11} vector to the input of all exclusive-OR gates independent of the internal connections is to generate all possible combinations of two logic 1’s in a field of 0’s, which, when combined with the other 66 vectors, gives 100% cumulative gate-level stuck-at fault coverage [Jone 1994], as Table 12.8 shows. This combined set of test patterns also detects all multiple stuck-at faults and bridging faults in the parity-based ECC generate circuit.
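The argument in the preceding paragraph can be checked on a small model. The sketch below builds one arbitrary tree of two-input exclusive-OR gates over 64 inputs (the random tree shape and the names `build_tree`, `evaluate`, and `apply_vectors` are assumptions for illustration, since the real connections are unknown) and records which input combinations each gate sees under the three vector sets of Table 12.8.

```python
import itertools
import random

N = 64  # data bits feeding a single, hypothetical parity tree


def build_tree(n, rng):
    """Randomly pair nodes until one root remains.
    A node is an input index (int) or a (left, right) XOR gate."""
    nodes = list(range(n))
    while len(nodes) > 1:
        rng.shuffle(nodes)
        nodes.append((nodes.pop(), nodes.pop()))
    return nodes[0]


def evaluate(node, vector, seen):
    """Evaluate the tree and record the (a, b) input pair seen by every gate."""
    if isinstance(node, int):
        return vector[node]
    a = evaluate(node[0], vector, seen)
    b = evaluate(node[1], vector, seen)
    seen.setdefault(node, set()).add((a, b))
    return a ^ b


def apply_vectors(tree, vectors, seen):
    for vector in vectors:
        evaluate(tree, vector, seen)


# Vector sets: all 0's plus walking 1 (65), all 1's (1), every pair of 1's (2016).
set1 = [[0] * N] + [[int(i == j) for i in range(N)] for j in range(N)]
set2 = [[1] * N]
set3 = [[int(i in pair) for i in range(N)]
        for pair in itertools.combinations(range(N), 2)]

tree = build_tree(N, random.Random(1))
seen = {}
apply_vectors(tree, set1, seen)
after_set1 = {gate: set(combos) for gate, combos in seen.items()}
apply_vectors(tree, set2 + set3, seen)
after_all = seen
```

In this model, the first 65 vectors apply exactly {00, 01, 10} to every gate, which is enough for pin faults but never supplies the {11} combination; after the two-1's vectors are added, every one of the 63 gates has seen all four combinations regardless of how the tree was wired.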

To detect faults in the ECC detect/correct circuit, error conditions must be created to decode and correct the errors. The input values to this circuit are the data stored in the RAM, which include Hamming bits produced by the generate circuit. As a result, these data values do not include error conditions unless there is a fault in the RAM or in the ECC generate circuit. This result can be seen from the data in Table 12.8, where applying the outputs that the generate circuit produces for its complete set of vectors obtains only 56% pin fault coverage and 58.4% gate-level fault coverage in the detect/correct circuit. The only way to create error conditions is to initialize the ECC RAM (when downloading the configuration) with data and Hamming values that apply error conditions to the ECC detect/correct circuit. By including the all 0’s, all 1’s, and walking a logic 1 through a field of 0’s in the initialization to the ECC RAM during download, 100% pin fault coverage and 98.1% gate-level fault coverage can be obtained. During test application, the ECC RAM is first read to apply these initialized values to the ECC detect/correct circuit. Next, the input vectors are applied to the ECC generate circuit, whose outputs are stored in the RAM and then read for application to the ECC detect/correct circuit. To detect the remaining faults in the ECC detect/correct circuit, independent of the internal connections in the parity trees, we can once again generate every combination of two logic 1’s in a field of logic 0’s. Unfortunately, these vectors can only be applied by initializing the ECC RAM during configuration of the FPGA such that, depending on the size of the RAM, additional downloads may be required for the complete set of required initialization values. Fortunately, because the TPGs, ORAs, and routing remain constant, a partial reconfiguration will reduce the test time.

Similarly, DSP and multiplier cores are tested algorithmically because the number of inputs makes exhaustive testing impractical. However, a number of test algorithms have been developed for multipliers [Gizopoulos 1998] and multiplier/accumulators [Rajski 1998] that are fundamental components of most DSP cores in current FPGAs. Multiple test configurations are needed to test the associated programmable modes for active edges of clocks, active levels of clock enables and resets, registered inputs and outputs, number of pipelining stages, and cascaded modes of interconnection [Stroud 2005a].

Diagnosis

The three BIST architectures for testing programmable logic resources in Figure 12.8 also provide good diagnostic resolution for the identification of faulty resources based on the BIST results. Diagnosis is trivial for the expected results architecture of Figure 12.8c because the failing ORA indications correlate directly to the faulty resources, without any additional processing of the BIST results. The basic comparison and circular comparison architectures, on the other hand, require a diagnostic procedure to process the BIST results and identify the faulty resources. A relatively straightforward, tabular diagnostic procedure was developed for the basic comparison architecture of Figure 12.8a [Abramovici 2001]. In that approach, only one ORA monitors the PLBs along the edge of the array. As a result, the diagnostic resolution is lower along the edges of the array, making unique diagnosis of a set of faulty PLBs difficult and sometimes requiring additional BIST configurations. The diagnostic procedure was later extended for use in online BIST of PLBs where a sequence of separate test configurations with two BUTs and one ORA resulted in a small circular array of four BUTs and four ORAs when superimposed [Abramovici 2004]. When the diagnostic procedure was later extended for use when testing the RAMs and multipliers in FPGAs, the circular comparison of Figure 12.8b was found to improve diagnostic resolution over that of the comparison-based BIST architecture of Figure 12.8a as there are no edges where BUTs are observed by only one ORA [Stroud 2005a] [Stroud 2005b].

An important assumption in the original diagnostic procedure was that there are no more than two consecutive BUTs with equivalent faults, where consecutive BUTs in this context means that they are monitored by a common set of ORAs. However, the diagnostic procedure was extended to indicate when this situation may have occurred in the circular comparison architecture in order to reconfigure the BIST architecture to obtain a unique diagnosis. This enhanced tabular diagnostic procedure can be applied to both the basic comparison and circular comparison architectures and is applied as follows (refer to example A in Table 12.9 while reading the procedure to observe the application of steps 1 to 3):

Logic Resource Diagnostic Procedure

1. Record the ORA results and set the faulty/fault-free status of all BUTs to unknown, as indicated by an empty entry in the table.
2. For every set of two or more consecutive ORAs with 0’s (i.e., column 1, Basic example A), enter a 0 for all BUTs observed by these ORAs to indicate that those BUTs are fault-free (i.e., column 2, Basic example A).
3. For every adjacent 0 and 1 followed by an empty entry (i.e., column 2, Basic example A), enter a 1 to indicate that the BUT is faulty (i.e., column 3, Basic example A). This step is recursively applied while such entries exist.
4. If an ORA indicates a failure but both BUTs monitored by the ORA are determined to be fault-free, this is referred to as an ORA inconsistency [Abramovici 2001]. In this case, one of the following three conditions exists: (a) there is a fault in the routing resources between one of the BUTs and the ORA, (b) the ORA is faulty, or (c) there are more than two consecutive BUTs with equivalent faults (for circular comparison only). Condition (a) or condition (b) exists if there is only one ORA inconsistency. However, if there are multiple ORA inconsistencies in the circular comparison architecture, then condition (c) may exist. In the latter case, reconfigure the circular comparison order, and repeat the test and diagnostic procedure.
5. If all BUTs have been marked as faulty or fault-free, then a unique diagnosis has been obtained; otherwise, any BUT that remains marked as unknown may be faulty. In the latter case, reconfigure the circular comparison order to compare different BUTs, or rotate the basic comparison architecture by 90°; then repeat the test and diagnostic procedure.
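The tabular procedure above lends itself to a compact software sketch. The version below is one possible reading, not the published implementation: it assumes a single row of BUTs in which ORA i compares BUT i with BUT i+1 (wrapping around for circular comparison), it interprets step 3 as "a fault-free BUT next to a failing ORA implies the other BUT is faulty", and the function name `diagnose` and the example ORA results are invented for illustration.

```python
def diagnose(ora, circular=False):
    """ora[i] is 0 (match) or 1 (mismatch) for the ORA comparing BUT i and BUT i+1.
    Returns (status, inconsistent): status[i] is 0 (fault-free), 1 (faulty),
    or None (unknown); inconsistent lists the ORAs showing an ORA inconsistency."""
    n_oras = len(ora)
    n_buts = n_oras if circular else n_oras + 1
    pairs = [(i, (i + 1) % n_buts) for i in range(n_oras)]
    status = [None] * n_buts                      # step 1: everything unknown

    def neighbor(i, step):
        j = i + step
        if circular:
            return ora[j % n_oras]
        return ora[j] if 0 <= j < n_oras else None

    # Step 2: two or more consecutive passing ORAs mark their BUTs fault-free.
    for i, (a, b) in enumerate(pairs):
        if ora[i] == 0 and 0 in (neighbor(i, -1), neighbor(i, 1)):
            status[a] = status[b] = 0

    # Step 3: a fault-free BUT and a failing ORA imply the other BUT is faulty.
    changed = True
    while changed:
        changed = False
        for i, (a, b) in enumerate(pairs):
            if ora[i] != 1:
                continue
            for good, other in ((a, b), (b, a)):
                if status[good] == 0 and status[other] is None:
                    status[other] = 1
                    changed = True

    # Step 4: a failing ORA whose BUTs were both marked fault-free.
    inconsistent = [i for i, (a, b) in enumerate(pairs)
                    if ora[i] == 1 and status[a] == 0 and status[b] == 0]
    return status, inconsistent


def unique_diagnosis(status, inconsistent):
    """Step 5: unique only if nothing is unknown and no inconsistency remains."""
    return None not in status and not inconsistent


# Hypothetical basic-comparison results: the 4th and 5th of eight BUTs have equivalent faults.
status_a, bad_a = diagnose([0, 0, 1, 0, 1, 0, 0])
# Hypothetical basic-comparison results: the 2nd of six BUTs is faulty; the edge BUT stays unknown.
status_b, bad_b = diagnose([1, 1, 0, 0, 0])
# Hypothetical circular results: three consecutive BUTs with equivalent faults.
status_c, bad_c = diagnose([0, 1, 0, 0, 1, 0, 0, 0], circular=True)
```

In the third case every BUT ends up marked fault-free while two ORAs still report failures, which is the multiple-inconsistency signature that step 4 uses to call for a different circular comparison order.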

Table 12.9. Diagnostic Procedure Examples

Three examples of the application of this diagnostic procedure are illustrated in Table 12.9, where O_ij denotes an ORA comparing the outputs of BUTs B_i and B_j. As the examples for the basic comparison architecture show, the BUTs with unknown status (indicated by a “?” in the table) are located near the edges of the array where diagnostic resolution is lower because the BUTs are observed by only one ORA. In example A, it was determined that BUTs B₄ and B₅ are faulty and have equivalent faults as a result of their common ORA indicating no mismatch. In example B, it can be determined that at least one of the two BUTs, B₅ and B₆, is faulty. These ambiguities in the diagnosis can be removed by rotating the BIST architecture by 90° where rows of ORAs are comparing rows of BUTs, such that the sets of BUTs being compared are orthogonal, and reapplying the diagnostic procedure to the new BIST results [Abramovici 2001]. This improves diagnostic resolution at the cost of doubling the test time, but a unique diagnosis can be obtained for almost any combination of faulty BUTs. The circular comparison architecture, on the other hand, results in fewer ambiguities leading to a unique diagnosis in more cases. Example C illustrates the reason for the assumption of no more than two consecutive BUTs with equivalent faults when applying this diagnostic procedure to the basic comparison architecture; otherwise, the diagnosis is incorrect. The two ORA inconsistencies observed in the circular comparison architecture, on the other hand, indicate that there may be more than two consecutive BUTs with equivalent faults and that the circular comparison order should be modified and the test and diagnosis repeated.

The simplicity of this tabular diagnostic procedure facilitates straightforward implementation in, and execution by, an embedded processor core [Stroud 2005b]. Although the new enhancements to the diagnostic procedure in steps 4 and 5 remove the previous assumption of no more than two consecutive BUTs with equivalent faults for the circular comparison architecture, it should be noted that finding a minimum set of circular comparison configurations that guarantee unique diagnosis remains an open problem. A unique diagnosis of the faulty BUT(s) provides sufficient information for reconfiguration of the system function to avoid the fault in coarse-grain fault-tolerant applications. However, finer diagnostic resolution can be obtained with additional test configurations to determine if the fault will not affect operation so that the system configuration can continue to use the faulty resource [Abramovici 2004].

Interconnect Resources

Because the programmable routing network in the FPGA is used to interconnect the TPGs, BUTs, and ORAs when testing logic resources, many faults in routing resources used by those test configurations are detected. In some cases, an interconnect fault can result in an ORA inconsistency during the diagnostic procedure. However, because the programmable interconnect accounts for a large area of the FPGA and a large portion of the configuration memory, it is difficult to determine the thoroughness and quality of interconnect fault detection during BIST for programmable logic resources. Therefore, dedicated programmable interconnect network test configurations are used to ensure effective testing of the routing resources as well as to target the fault models specific to these resources. Similar to BIST for logic resources, BIST for routing resources generally consists of configuring some of the logic resources as TPGs and ORAs while configuring the routing resources (wire segments and PIPs) as wires under test. However, unlike the arrays of identical programmable logic resources, the programmable interconnect network in an FPGA consists of different types of PIPs and wire segments of various lengths associated with global and local, as well as vertical and horizontal, routing resources. Because of the large number of wire segments and PIPs, only a small portion of the routing resources can be under test in any given test configuration. As a result, the sets of wires under test are repeatedly reconfigured to test the various routing resources in the FPGA. The total number of test configurations required to completely test the routing resources depends on the complexity of the interconnect network as well as the PLB architecture used to construct the TPGs and ORAs. 
Depending on the complexity of the interconnect network, the number of test configurations for routing resources is generally an order of magnitude larger than the number needed to test the logic resources, as illustrated in Table 12.3.

Two general BIST approaches have proven to be effective in testing the programmable interconnect resources in FPGAs including the wire segments, PIPs, and configuration memory bits that control the PIPs. The first BIST approach for routing resources was comparison based, as illustrated in Figure 12.13a [Stroud 1998]. In this approach, one or more TPGs source exhaustive test patterns over two sets of N wires under test that are monitored at the destination end by comparison-based ORAs to detect any mismatches caused by faults [Stroud 2002b]. Once again, the use of multiple TPGs reduces the probability that a fault in the PLBs used to construct the TPGs will allow faults in the interconnect network to escape detection. One potential problem with this approach is that equivalent faults in the two sets of wires under test can escape detection. This problem can be overcome if each set of wires under test is tested more than once and compared to different sets of wires under test. However, this increases the number of test configurations that must be downloaded and applied.

Figure 12.13. FPGA routing BIST architectures: (a) comparison based and (b) parity based.

The other BIST approach is parity based as illustrated in Figure 12.13b. In this approach, the TPG sources exhaustive test patterns over one set of N wires under test and produces a parity bit that is also sent to the ORA. The ORA performs a parity check function to detect faults [Sun 2000]. One problem with the original approach was that the parity bit was assumed to be routed over fault-free interconnect resources. However, the faulty/fault-free status of the routing resources is unknown at the start of testing. This approach was later modified to send the parity bit over a wire under test for a total of N + 1 wires under test and to incorporate multiple types of TPGs and ORAs (for example, countup with even parity and countdown with odd parity) to produce opposite logic values needed to detect PIP stuck-on faults and bridging faults [Sunwoo 2005].
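A minimal behavioral sketch of the parity-based approach follows, assuming a counter-style TPG, a parity bit routed as wire N of the N + 1 wires under test, and two simple fault models (stuck-at, and a bridge modeled as wired-AND). The function names and fault models are assumptions for illustration; the sketch covers faults within a single set of wires only and does not model PIP stuck-on faults between neighboring sets driven by opposite-polarity TPGs.

```python
def parity(bits):
    return sum(bits) % 2


def ora_fails(n, fault=None, count_up=True, even=True):
    """Apply exhaustive patterns over n data wires plus one parity wire.
    Returns True if the parity-checking ORA observes a failure."""
    values = range(2 ** n) if count_up else range(2 ** n - 1, -1, -1)
    expected = 0 if even else 1                # parity across all n + 1 wires
    for value in values:
        data = [(value >> i) & 1 for i in range(n)]
        wires = data + [parity(data) ^ expected]   # parity bit travels on wire n
        if fault is not None:
            wires = fault(wires)
        if parity(wires) != expected:
            return True
    return False


def stuck_at(index, value):
    """Fault model: wire `index` is stuck at `value`."""
    def fault(wires):
        wires = list(wires)
        wires[index] = value
        return wires
    return fault


def bridge_and(i, j):
    """Fault model (assumed wired-AND): wires i and j are shorted."""
    def fault(wires):
        wires = list(wires)
        wires[i] = wires[j] = wires[i] & wires[j]
        return wires
    return fault
```

With n = 4, the fault-free case passes for both the count-up/even-parity and count-down/odd-parity variants, and every single stuck-at or wired-AND bridge among the five wires is flagged, because some exhaustive pattern always changes exactly one received bit.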

The detailed implementation of these routing BIST approaches varies considerably depending on the types of PIPs, wire segments, and faults targeted for testing. For example, when testing global routing resources, multiple ORAs can be located along the set of wires under test, as illustrated in Figure 12.14a, to test the global-to-local routing resource connections along with the global interconnect. When testing local routing resources, on the other hand, wires under test may be routed through PLBs (through the LUT bypassing the flip-flop) or to adjacent PLBs, as illustrated in Figure 12.14b and Figure 12.14c, respectively. These basic implementations must be reconfigured a number of times to test all of the routing resources along the potential paths. In addition, because of the directional nature of multiplexer PIPs used in most current FPGA interconnect networks, these routing BIST architectures must be flipped about the vertical axis to test routing resources with signal flow in the opposite direction. Finally, these routing BIST architectures must also be rotated to test the vertical routing resources. Hence, the total number of routing BIST configurations tends to be large. Although partial reconfiguration can reduce the total time required to download the complete set of test configurations, the total time to test the programmable interconnect resources remains a dominant factor in FPGA testing.

Figure 12.14. Example routing BIST implementations: (a) global routing, (b) local routing, and (c) local routing adjacent PLBs.

By filling the FPGA with many small routing BIST circuits consisting of independent TPGs, ORAs, and sets of wires under test in the FPGA, the diagnostic resolution based on the BIST results can be improved because an ORA indicating the presence of a fault also identifies the self-test area containing the fault. For example, consider the local routing BIST implementations illustrated in Figure 12.14b and Figure 12.14c. A failing ORA indication in the PLB feed-through implementation isolates the fault within the horizontal routing resources being tested along that row of PLBs. In the adjacent PLBs implementation of Figure 12.14c, a failing ORA indication isolates the fault to the neighboring PLBs. For manufacturing yield enhancement, this resolution may be sufficient for failure mode analysis or for reconfiguring around the faulty area in coarse-grain fault-tolerant system applications. However, for fine-grain fault-tolerant system applications, better diagnostic resolution requires additional test configurations to subdivide the set of wires under test for identification of the faulty wire under test [Stroud 2002b]. Additional test configurations can then be applied to reroute the suspected faulty wire under test for identification of the specific faulty wire segment or PIP.

To illustrate the development of BIST configurations for the programmable routing resources, assume that each example PLB of Figure 12.11 has an associated set of multiplexer PIPs, as illustrated in Figure 12.15. The PLB and its routing resources form a unit cell, which is repeated in an N × M array to form the core of an FPGA. In this example, there are two input signals sourced from the unit cell above (or north) denoted NI0 and NI1. Similarly there are two input signals from each of the other three directions (east, west, and south), denoted EI0, EI1, WI0, WI1, SI0, and SI1, respectively. As a result, there are a total of 8 input signals from adjacent PLBs to the set of multiplexer PIPs. When combined with the two outputs (Sout and Cout) of the PLB, 10 possible signals can be selected for inputs (D0 through D3) to the PLB, as illustrated in Figure 12.15a, or selected for outputs to be routed to adjacent unit cells, as illustrated in Figure 12.15b. Each multiplexer PIP has 8 inputs and is constructed as a nondecoded multiplexer with eight associated configuration bits as illustrated in Figure 12.15c. The configuration bits for each of the 12 multiplexer PIPs that form the programmable routing resources in a unit cell are summarized in Table 12.10 in terms of the input that would be selected for a given multiplexer when the associated configuration bit is activated (set to a logic 1); at most, 1 configuration bit will be activated for a given multiplexer PIP. Note that the configuration bit values specified in Table 12.10 are with respect to the specified ordering of the configuration bits given in the table; for example, CB7 is the most significant bit (MSB) and CB0 is the least significant bit (LSB) for the multiplexer producing D0. The routing resources for a unit cell require 96 configuration bits compared to only 22 for the PLB, which conforms with the fact that routing resources account for about 80% of the total configuration bits in a typical FPGA.

Figure 12.15. Programmable routing resources for simple FPGA example: (a) routing resources for PLB inputs, (b) routing resources for PLB outputs, and (c) multiplexer PIP for PLB routing resources.

Table 12.10. Configuration Bits for Multiplexer PIPs of Simple FPGA

MUX Out	Config Bits	MUX PIP Inputs (input selected by each activated configuration bit, MSB to LSB)
D0	CB7-0	NI0	NI1	EI0	EI1	WI0	WI1	SI0	Sout
D1	CB15-8	NI0	NI1	EI0	EI1	WI0	Sout	SI0	SI1
D2	CB23-16	NI0	NI1	EI0	Sout	WI0	WI1	SI0	SI1
D3	CB31-24	NI0	Sout	EI0	EI1	WI0	WI1	SI0	SI1
NO0	CB39-32	Cout	Sout	EI0	EI1	WI0	WI1	SI0	SI1
NO1	CB47-40	Cout	Sout	EI0	EI1	WI0	WI1	SI0	SI1
EO0	CB55-48	NI0	NI1	Cout	Sout	WI0	WI1	SI0	SI1
EO1	CB63-56	NI0	NI1	Cout	Sout	WI0	WI1	SI0	SI1
WO0	CB71-64	NI0	NI1	EI0	EI1	Cout	Sout	SI0	SI1
WO1	CB79-72	NI0	NI1	EI0	EI1	Cout	Sout	SI0	SI1
SO0	CB87-80	NI0	NI1	EI0	EI1	WI0	WI1	Cout	Sout
SO1	CB95-88	NI0	NI1	EI0	EI1	WI0	WI1	Cout	Sout

To provide feedback for sequential logic functions such as counters, the Sout output of the PLB is also an input to the input multiplexer PIPs D0 through D3. The output multiplexer PIPs provide the selection of signals to be sent to each of the four adjacent unit cells such that there are two output signals per side of the unit cell, denoted NO0 and NO1 for the north side, EO0 and EO1 for the east side, WO0 and WO1 for the west side, and SO0 and SO1 for the south side of the PLB. Each output multiplexer can select either of the two outputs of the PLB (Cout or Sout) or six of the eight signals coming into the unit cell. In the latter case, those signals would pass through this unit cell on to an adjacent unit cell, which provides the primary mechanism for routing between nonadjacent cells in our simple FPGA.
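To make the encoding of Table 12.10 concrete, the table can be transcribed into a small lookup that returns the configuration bit selecting a given input. The helper names `config_bit` and `configuration_word` are hypothetical, and the mapping assumes the reading stated in the text: the leftmost input in each row corresponds to the most significant configuration bit of that multiplexer.

```python
# Rows of Table 12.10, inputs listed from MSB to LSB of each 8-bit field.
MUX_INPUTS = {
    "D0":  "NI0 NI1 EI0 EI1 WI0 WI1 SI0 Sout",
    "D1":  "NI0 NI1 EI0 EI1 WI0 Sout SI0 SI1",
    "D2":  "NI0 NI1 EI0 Sout WI0 WI1 SI0 SI1",
    "D3":  "NI0 Sout EI0 EI1 WI0 WI1 SI0 SI1",
    "NO0": "Cout Sout EI0 EI1 WI0 WI1 SI0 SI1",
    "NO1": "Cout Sout EI0 EI1 WI0 WI1 SI0 SI1",
    "EO0": "NI0 NI1 Cout Sout WI0 WI1 SI0 SI1",
    "EO1": "NI0 NI1 Cout Sout WI0 WI1 SI0 SI1",
    "WO0": "NI0 NI1 EI0 EI1 Cout Sout SI0 SI1",
    "WO1": "NI0 NI1 EI0 EI1 Cout Sout SI0 SI1",
    "SO0": "NI0 NI1 EI0 EI1 WI0 WI1 Cout Sout",
    "SO1": "NI0 NI1 EI0 EI1 WI0 WI1 Cout Sout",
}
MUX_ORDER = list(MUX_INPUTS)          # D0 owns CB7-0, D1 owns CB15-8, and so on
ROUTING_BITS = 8 * len(MUX_ORDER)     # 12 multiplexers x 8 bits
PLB_BITS = 22                         # PLB configuration bits, from the text


def config_bit(mux, source):
    """Index of the configuration bit that selects `source` on multiplexer `mux`."""
    inputs = MUX_INPUTS[mux].split()
    base = 8 * MUX_ORDER.index(mux)
    return base + 7 - inputs.index(source)    # leftmost input is the MSB


def configuration_word(selections):
    """Build the routing configuration (as an int) for {mux: source} selections.
    At most one bit per multiplexer is set, since each mux appears once."""
    word = 0
    for mux, source in selections.items():
        word |= 1 << config_bit(mux, source)
    return word


routing_share = ROUTING_BITS / (ROUTING_BITS + PLB_BITS)
```

For instance, selecting the feedback signal on D0 activates CB0, and `routing_share` evaluates to roughly 0.81, consistent with the statement that routing accounts for about 80% of the configuration bits.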

Any given multiplexer PIP in this example architecture will require a minimum of eight test configurations. For complete testing with eight configurations, each configuration must test a different PIP for a stuck-off fault and at least one other PIP for a stuck-on fault. The PIP stuck-off fault is tested by a simple continuity test in which both logic 0 and logic 1 are passed through the activated PIP in the signal path from TPG to ORA. To test for an unselected input PIP stuck-on, we must apply opposite logic values to those being passed through the activated PIP for that multiplexer such that if the PIP were stuck-on, the opposite logic value would affect the logic value at the input to the buffer. If only one PIP is tested for the stuck-on fault during each configuration, then a different PIP must be tested during each of the eight test configurations. One difficulty in developing these test configurations is routing the appropriate opposite logic values to the unselected multiplexer input whose PIP is to be tested for the stuck-on fault.
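The stuck-off and stuck-on tests described above can be illustrated with a simple model of the nondecoded multiplexer. The model rests on assumptions that are not in the text: two conducting PIPs with opposite values resolve as wired-AND, an undriven multiplexer node reads a fixed `floating` value, and the unselected inputs other than the stuck-on target carry the same value as the selected input. The function names are hypothetical.

```python
def mux_output(inputs, config, stuck_on=None, stuck_off=None, floating=0):
    """Nondecoded multiplexer: PIP i conducts when configuration bit i is 1.
    Assumed models: conflicting drivers resolve as wired-AND, and an undriven
    node reads `floating`."""
    conducting = [bool(bit) for bit in config]
    if stuck_on is not None:
        conducting[stuck_on] = True
    if stuck_off is not None:
        conducting[stuck_off] = False
    driven = [value for value, on in zip(inputs, conducting) if on]
    return min(driven) if driven else floating


def detects(selected, target, n=8, **fault):
    """One test configuration: pass both 0 and 1 through PIP `selected` while
    driving the opposite value on unselected input `target`.
    Returns True if the observed output ever differs from the expected value."""
    config = [int(i == selected) for i in range(n)]
    for value in (0, 1):
        inputs = [value] * n
        inputs[target] = 1 - value
        if mux_output(inputs, config, **fault) != value:
            return True
    return False


# Eight configurations: configuration k tests PIP k for stuck-off and
# PIP (k + 1) mod 8 for stuck-on.
schedule = [(k, (k + 1) % 8) for k in range(8)]
stuck_off_covered = all(detects(k, t, stuck_off=k) for k, t in schedule)
stuck_on_covered = all(detects(k, t, stuck_on=t) for k, t in schedule)
```

Under these assumptions the eight configurations cover all eight stuck-off and all eight stuck-on faults, while a stuck-on PIP that is not the target of a given configuration goes unnoticed in that configuration, which is why a different PIP must be targeted each time.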

Theoretically, a minimum of eight test configurations would be required to test all of the multiplexer PIPs concurrently, but this is not possible in practice. For example, because any given PLB can function as TPG or ORA, separate sets of test configurations must be applied to test the multiplexers at the inputs to the PLB while functioning as an ORA and to test the output multiplexers when the PLB functions as a TPG. Furthermore, the interconnection pattern (as summarized in Table 12.10 for our simple FPGA) prevents testing all of the multiplexer PIPs concurrently. For example, assume we implement the comparison-based routing BIST architecture illustrated in Figure 12.13a using the test configuration implementation in Figure 12.14c. In this case, only two of the input multiplexers will be tested during any given configuration. The PLBs functioning as ORAs could implement the logic equation (D0 ⊕ D3) + D1 in the combined four-input LUT. As a result, only input multiplexers for D0 and D3 are tested, while the multiplexer for D1 selects the feedback from Sout and the multiplexer for D2 is unused. Assuming the TPGs to the left and right of the ORA are producing opposite logic values on their two outputs, then four test configurations can be implemented in this arrangement with the following sequence of inputs selected for the two multiplexer PIPs under test: D0 = {EI0, EI1, WI0, WI1} and D3 = {WI0, WI1, EI0, EI1}. In this case, we will have tested four of the eight total inputs for both stuck-on and stuck-off faults in these two multiplexers. By rotating the architecture, we can test the north and south inputs to these multiplexers in an additional four configurations. A similar set of eight configurations would be needed to test the multiplexers for inputs D1 and D2. 
Note that the Sout input to these four multiplexers cannot be tested when the PLB is an ORA, but instead it can be tested by passing signals through the LUTs using the PLB feed-through implementation illustrated in Figure 12.14b.

Embedded Processor-Based Testing

The ability of embedded processor cores (either hard or soft cores) to write and read the configuration memory via an internal configuration access port (ICAP) facilitates reconfiguration from within the FPGA. If the processor and program memory are dedicated resources with access to the configuration memory that does not require any additional programmable resources, then the programmable resources of the FPGA can be tested in one test session [Sunwoo 2005]. Otherwise, two test sessions must be applied to the FPGA, as illustrated in Figure 12.16. In each test session, the resources in half of the array are under test, while the other half of the array implements the processor core, TPGs, and interface circuitry to the ICAP [Milton 2006]. Once half of the array is tested, the positions are swapped through a full reconfiguration of the FPGA from external sources, and the other half of the array is tested. Although this doubles the number of test configurations, the overall test time is reduced because internal reconfiguration by the processor core is performed at a much higher clock rate than downloading (usually by a factor of 5 to 25). In addition, the embedded processor core has parallel access to the configuration memory (typically 8, 16, or 32 bits).

Figure 12.16. Soft core embedded processor-based BIST for FPGAs: (a) session 1 and (b) session 2.

Locating the TPGs in the half of the array with the processor core facilitates implementation of the circular comparison BIST architecture for improved diagnostic resolution. The processor core can also perform the TPG function in some cases, but this introduces the problems associated with a single TPG such as excessive loading and faulty resources under test escaping detection when the TPG is faulty. The latter problem raises the issue of what happens when the processor core or the programmable resources used to construct the processor core are faulty. Although this remains an open problem in embedded processor-based BIST of FPGAs, one approach is to emulate faults by manipulating configuration memory bits in the BUTs to create a set of expected failures. This provides a sanity check that the embedded processor is working.

Two approaches have been developed for implementing BIST using embedded processors. In one approach, the BIST configurations are algorithmically generated from within the embedded processor. In this approach, a relatively small program is stored in the program memory of the embedded processor core, which is then used to generate the complete initial BIST configuration as well as to reconfigure the resources under test for the subsequent test configurations [Sunwoo 2005]. In the other approach, the initial BIST configuration for the resources to be tested is downloaded into the FPGA along with the program for reconfiguration of the resources under test for subsequent BIST configurations [Milton 2006]. In this second approach, there is a download for each test session associated with the resources to be tested. Although the first approach gives a better test time speedup, its development effort and the size of its reconfiguration program are larger than those of the second approach. Both of these approaches can be used to test the programmable logic and routing resources as well as other embedded cores such as memories and DSPs. In both approaches, the embedded processor controls and executes the BIST sequence, retrieves the BIST results from the ORAs for each BIST configuration, and can also perform on-chip diagnosis in addition to performing reconfiguration of the FPGA for subsequent BIST configurations in the test session [Stroud 2005b].

Experimental results from an actual implementation of an embedded processor-based BIST approach are summarized in Table 12.11. In this BIST approach, the processor generates the initial BIST configurations internally in addition to executing the BIST sequence, retrieving BIST results, and reconfiguring the FPGA for subsequent BIST configurations. In this implementation, the processor was a hard core with dedicated program memory such that all resources in the FPGA could be tested in parallel. Table 12.11 compares the time for external download and execution of a complete set of BIST configurations (see the “External” column) with embedded processor-generated BIST configurations (see the “Processor” column) [Sunwoo 2005]. It should be noted that the download time for the processor-based BIST approach is the time required to load the program memory and that the execution time includes the time required to reconfigure the FPGA and execute the BIST sequence.

Table 12.11. Test Time Speedup with Embedded Processor-Based BIST

Resource	Function	External (sec)	Processor (sec)	Speedup
PLB BIST	Download	7.680	0.101	76.0
PLB BIST	Execution	0.016	0.085	0.2
PLB BIST	Total time	7.696	0.186	41.4
Routing BIST	Download	20.064	0.110	182.4
Routing BIST	Execution	0.026	0.343	0.075
Routing BIST	Total time	20.090	0.453	44.3
Total test time		27.786	0.639	43.5
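The speedup figures in Table 12.11 are the ratio of external time to processor time for each row. The short sketch below recomputes them from the download and execution times transcribed from the table; the dictionary layout and function names are illustrative assumptions rather than anything taken from the original experiments.

```python
# Times in seconds as (external, processor), transcribed from Table 12.11.
TIMES = {
    "PLB BIST": {"download": (7.680, 0.101), "execution": (0.016, 0.085)},
    "Routing BIST": {"download": (20.064, 0.110), "execution": (0.026, 0.343)},
}


def speedup(external: float, processor: float) -> float:
    """Speedup of the processor-based approach over external download."""
    return external / processor


def resource_totals(phases: dict) -> tuple:
    """Sum download and execution time for one resource."""
    external = sum(ext for ext, _ in phases.values())
    processor = sum(proc for _, proc in phases.values())
    return external, processor


def overall_totals(times: dict) -> tuple:
    """Sum the per-resource totals over all resources."""
    totals = [resource_totals(phases) for phases in times.values()]
    return sum(t[0] for t in totals), sum(t[1] for t in totals)


for name, phases in TIMES.items():
    for phase, (ext, proc) in phases.items():
        print(f"{name} {phase}: {speedup(ext, proc):.3f}")
    ext, proc = resource_totals(phases)
    print(f"{name} total: {speedup(ext, proc):.1f}")

ext, proc = overall_totals(TIMES)
print(f"Total test time: {ext:.3f} s vs {proc:.3f} s, speedup {speedup(ext, proc):.1f}")
```

The execution rows have a speedup below 1 because the processor-based execution time includes reconfiguring the FPGA, but the download time dominates the external approach so heavily that the total time still improves by a factor of more than 40.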

To reconfigure the FPGA resources under test algorithmically and efficiently from the embedded processor core, the BIST architecture, including the TPG and ORA routing to the resources under test, should be constant for all BIST configurations associated with that particular test session. This not only reduces the time required to reconfigure but, more importantly, reduces the size of the program memory needed to store the BIST reconfiguration program. This is critical because the program memory is usually limited to either the size of the dedicated program memory or half of the RAM cores in the FPGA. Otherwise, the BIST architecture must be confined to a smaller portion of the FPGA (one fourth of the array, for example), which increases the total number of BIST sessions and the resultant test time.

Concluding Remarks

With the incorporation of embedded cores including RAMs, DSPs, and processors, FPGAs more closely resemble SOC implementations. At the same time, more SOCs are incorporating embedded FPGA cores. The programmability of FPGAs facilitates the implementation of a wide range of applications and, as a result, presents a number of testing solutions as well as a number of testing challenges. For example, FPGAs can be reprogrammed during system-level offline testing to test other components and functions on a PCB [Stroud 2002a]. Similarly, the PLBs and routing resources of an FPGA core can be reprogrammed to test the other embedded cores within SOCs such as RAM and DSP cores [Abramovici 2002]. With algorithmic generation, execution, and diagnosis of BIST configurations from an embedded processor core, a single program can be stored and used for manufacturing testing or incorporated into the system for on-demand BIST and diagnosis of the FPGA core for fault-tolerant applications. Therefore, FPGA testing techniques are becoming increasingly important for a broader range of system applications. FPGA testing challenges continue to increase with the introduction of new cores and architectures. On the other hand, these testing challenges in conjunction with the programmability of FPGAs provide an excellent platform for research and development of new SOC test architectures, strategies, and methodologies, such as silicon debug and diagnosis [Abramovici 2006].

Exercises

12.1

(Configuration) Determine the configuration bits needed to implement the comparison-based ORA shown in Figure 12.9b into the PLB shown in Figure 12.11.

12.2

(Test Pattern Generation) What is the minimum number of PLBs needed to implement one TPG to test the PLB of Figure 12.11?

12.3

(Test Pattern Generation) What is the number of loads on each output of each TPG used to test PLBs assuming the N × N array of PLBs shown in Figure 12.11 and the basic comparison BIST architecture shown in Figure 12.8a with two TPGs constructed from PLBs?

12.4

(Output Response Analysis) How many PLBs are needed to implement a complete set of circular comparison ORAs to test a total of N BUTs using the example PLB shown in Figure 12.11?

12.5

(Programmable Logic Blocks) How many test configurations are needed to completely test the example PLB shown here? Specify the test configurations in terms of the values for 25 configuration bits CB₀–CB₂₄ for each test configuration. Assume that exhaustive test patterns are applied to inputs A–D and reset during each test configuration.

12.6

(Diagnosis) Given the following set of ORA results from a circular comparison of 10 BUTs (B₀–B₉), determine the faulty BUTs:

ORA	O₀₁	O₁₂	O₂₃	O₃₄	O₄₅	O₅₆	O₆₇	O₇₈	O₈₉	O₉₀
Results	1	0	0	1	1	0	1	0	0	1

12.7

(Routing Resources) Determine the number of test configurations in terms of the configuration bits C₀–C₇ and associated input test vectors to test the two stages of multiplexer PIPs shown here:

Acknowledgments

The author wishes to acknowledge the assistance of Mustafa Ali, Bobby Dixon, Lee Lerner, and Daniel Milton of the Auburn University Department of Electrical and Computer Engineering Built-In Self-Test Laboratory during the preparation of this chapter. In addition, the author would like to acknowledge Professor Wen-Ben Jone of the Department of Electrical and Computer Engineering of the University of Cincinnati, Cincinnati, Ohio, for his excellent comments and ideas for the enhancement of this chapter.

References

Books

Overview of FPGAs

Testing Approaches

BIST of Programmable Resources

Embedded Processor-Based Testing

Concluding Remarks
