
10   Programmable and Configurable Analog Neuromorphic ICs


A fundamental capability of any system, whether conventional or neuromorphic, is the ability to store long-term memories, whether for program control, parameter storage, or general configurability. This chapter provides an overview of a programmable analog technology, based on floating-gate circuits, for reconfigurable platforms that can be used to implement such systems. It covers the basic concepts of floating-gate devices, capacitor-based circuits, and the charge modification mechanisms that underlie this configurable analog technology. It also discusses the extension of these techniques to programming large arrays of floating-gate devices. The analog programmability afforded by this technology opens up a wide range of programmable signal processing approaches (e.g., image processing) enabled through configurable analog platforms, such as large-scale field programmable analog arrays (FPAAs).

10.1   Introduction

Over the last decade, floating-gate circuit approaches have progressed from a few foundational academic results (Hasler et al. 1995; Shibata and Ohmi 1992) to a stable circuit and system technology with both academic and industrial applications. This programmable analog technology empowers analog signal processing approaches by making analog components nearly as user-friendly as configurable digital options. The approach allows power-efficient analog signal processing that is 1000–10,000 times more efficient than custom digital computation, making possible today a range of portable applications that have been out of reach for over a decade (Mead 1990).

The goal of this chapter is to develop a general understanding of these programmable analog techniques. The next few sections detail the basic concepts of programmable analog technology. Section 10.2 discusses the basic concepts of floating-gate devices. Section 10.3 describes capacitor-based circuits, which are the basis of floating-gate circuit approaches. Section 10.4 describes the basic mechanisms for modifying the charge on a floating-gate device, which make this analog technology programmable. Section 10.5 describes how to extend these techniques to program an array of floating-gate devices, where each device could be performing a different computation.

In particular, many of these techniques are valuable given the development and availability of large-scale field programmable analog arrays (FPAAs), which enable applications for nonintegrated-circuit designers.

10.2   Floating-Gate Circuit Basics

Figure 10.1 shows the layout, cross section, and circuit symbol for the floating-gate p-channel FET (pFET) device (Hasler and Lande 2001). A floating gate is a polysilicon gate surrounded by silicon dioxide. Charge on the floating gate is stored permanently, providing a long-term memory, because it is completely surrounded by a high-quality insulator. The layout shows that the floating gate is a polysilicon layer that has no contacts to other layers. This floating gate can be the gate of a metal oxide semiconductor field effect transistor (MOSFET) and can be capacitively connected to other layers. In circuit terms, a floating gate occurs when there is no DC path to a fixed potential. No DC path implies only capacitive connections to the floating node, as seen in Figure 10.1.

The floating-gate voltage, determined by the charge stored on the floating gate, can modulate a channel between a source and drain, and therefore, can be used in computation. As a result, the floating-gate device can be viewed as a single transistor with one or several control gates where the designer controls the coupling into the surface potential. Floating-gate devices can compute a wide range of static and dynamic translinear functions by the particular choice of capacitive couplings into floating-gate devices (Minch et al. 2001).

10.3   Floating-Gate Circuits Enabling Capacitive Circuits

Floating-gate circuits provide IC designers with a practical, capacitor-based technology, since capacitors, rather than resistors, are a natural result of a MOS process. Programmable floating-gate circuit techniques, along with long-time retention characterization, have shown improved performance for operational transconductance amplifiers (OTAs) (Srinivasan et al. 2007), voltage references (Srinivasan et al. 2008), filters (Graham et al. 2007), and data converters (Ozalevli et al. 2008b), as well as a range of other analog circuit approaches. Figure 10.2 shows key basic capacitor circuit elements. Figure 10.2a shows the capacitive equivalent of a resistive voltage divider. The resulting expression is as expected from a capacitive divider, except that we have an additional voltage, Vcharge, set by the charge (Q) at the output node, as Vcharge = Q/(C1 + C2). Figure 10.2b shows the capacitor circuit for feedback around an amplifier. As expected, the closed-loop gain of this amplifier is −C1/C2; the output of this amplifier also has a voltage term due to the charge (Q) stored at the ‘−’ input terminal, where Vcharge = Q/C2.
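To make these two expressions concrete, the following minimal Python sketch evaluates both circuits with an idealized (infinite-gain) amplifier; all component and charge values are illustrative assumptions, not values from the text.

```python
def divider_vout(v_in, c1, c2, q_stored=0.0):
    """Capacitive divider (Figure 10.2a): Vout = Vin*C1/(C1+C2) + Q/(C1+C2)."""
    return v_in * c1 / (c1 + c2) + q_stored / (c1 + c2)

def feedback_vout(v_in, c1, c2, q_stored=0.0):
    """Capacitive feedback around an ideal amplifier (Figure 10.2b):
    closed-loop gain -C1/C2 plus the offset Vcharge = Q/C2 set by the
    charge stored on the '-' input node."""
    return -v_in * c1 / c2 + q_stored / c2

fF = 1e-15
print(divider_vout(1.0, 100 * fF, 100 * fF))   # 0.5 V: equal capacitor split
print(feedback_vout(0.1, 200 * fF, 100 * fF))  # -0.2 V: closed-loop gain of -2
print(feedback_vout(0.0, 200 * fF, 100 * fF,
                    q_stored=50e-15))          # 0.5 V offset from 50 fC stored
```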


Figure 10.1 Layout, cross section, and circuit diagram of the floating-gate pFET in a standard double-poly, n-well process: The cross section corresponds to the horizontal line slicing through the layout view. The pFET transistor is the standard pFET transistor in the n-well process. The gate input capacitively couples to the floating gate by either a poly–poly capacitor, a diffused linear capacitor, or a MOS capacitor, as seen in the circuit diagram (not explicitly shown in the other two figures). Between Vtun and the floating gate is our symbol for a tunneling junction – a capacitor with an added arrow designating the direction of charge flow. © 2005 IEEE. Reprinted, with permission, from Hasler (2005)

Figure 10.2c shows a more realistic circuit model for the amplifier circuit in Figure 10.2b. This circuit description, which includes parasitic capacitances, is a single-pole, single-zero system, assuming the amplifier itself sets a single pole (i.e., the amplifier has a frequency-independent transconductance). In general, we can describe the amplifier as a transconductance amplifier, assuming the gain set by the capacitors is lower than the amplifier gain. Increasing Cw increases the input linear range; typically, Cw is larger than C1 and C2, where Cw models the input capacitance of the amplifier, explicitly drawn capacitance, and parasitic capacitances. Increasing the quantity CwCL/C2 proportionally increases the signal-to-noise ratio (SNR) (in signal power); therefore, unlike many filters, where output noise and SNR are set by the load capacitance (kT/C thermal noise), this approach allows lower drawn capacitances for a given noise floor. When we improve the linear range of this amplifier, we simultaneously improve its SNR. These circuit approaches extend transconductance-C filter approaches to allow some parameters to be set by the ratio of capacitors (Graham et al. 2004), such as bandpass gain, linear range, noise level, and bandwidth. These approaches, coupled with the array programming techniques discussed in Section 10.4, result in an accurate low-power filter technology.


Figure 10.2 Basic capacitor circuits. (a) Capacitive divider circuit. (b) Capacitive feedback around an amplifier. The assumption is that the amplifier has MOS inputs; therefore the inputs are practically only capacitive. (c) Assuming the amplifier has a finite Gm, we identify the key parameters (bandwidth, signal-to-noise ratio, and input linear range) for this amplifier. (d) Circuit modification for direct measurement of capacitive sensors. © 2005 IEEE. Reprinted, with permission, from Hasler (2005)

Figure 10.2d shows a slight extension of the other circuits toward measuring changes in capacitive sensors (e.g., microelectromechanical systems (MEMS) sensors). Analyzing this circuit with an ideal amplifier of gain Av, we get

$$V_{out} = -\frac{\Delta C_{sensor}}{C_2}\,V_1\cdot\frac{1}{1 + \dfrac{C_{sensor} + C_w + C_2}{A_v\,C_2}}$$

where Cw is the capacitance at the ‘−’ amplifier input, including the amplifier input capacitance. This circuit configuration attenuates the effect of the sensor capacitance and Cw by the amplifier gain, an effect that is only a DC term, assuming V1 remains constant. For example, for Csensor = 1 pF, maximum ΔCsensor = 2 fF (typical of some MEMS sensors), and Av = 1000, we choose C2 = 20 fF for a maximum Vout change of 1 V, resulting in an output offset voltage of 0.25 V. The constant part of Csensor, as well as Cw, increases the linear range of the closed-loop circuit.

The circuit is still a first-order system (assuming a frequency-independent amplifier with transconductance Gm) with its transfer function in the Laplace domain described by

$$\frac{V_{out}}{V_1}(s) = -\frac{\Delta C_{sensor}}{C_2}\cdot\frac{1 - s/\omega_z}{1 + s\tau}$$

It has the same time constant (τ) as the amplifier in Figure 10.2c, and the zero (ωz), due to capacitive feedthrough, is typically at a much higher frequency than the amplifier bandwidth. Typically, Csensor and CL (the output load capacitance, as in Figure 10.2c) are roughly equal in size (CL might be larger), and both are larger than C2. For the example above, with CL = Csensor = 1 pF, the resulting bandwidth will be as shown in the following table:

[Table of resulting bandwidth values not reproduced.]

The resulting output noise (entire band) is

$$\hat{V}_{out}^{2} = \frac{n\,q}{V_{IC}}\cdot\frac{C_{2}}{C_{w}C_{L}}$$

where n is the equivalent number of devices contributing to the amplifier noise, and VIC is the ratio of the transconductance (Gm) to the differential-pair transistor’s bias current. For the example above, a typical low-noise amplifier stage with input transistors operating at subthreshold currents results in 0.5 mV total noise, giving an SNR of roughly 66 dB between the maximum capacitor deflection and the minimum detectable deflection. For our example circuit, the maximum capacitance change of 2 fF gives an output of 1 V, and a 1 aF change sits at the 0 dB SNR level; by restricting the bandwidth of interest, or by making the amplifier bandwidth larger than the bandwidth of interest, the resulting sensitivity will increase. In practice, a bank of capacitors that can be switched into the circuit can be used to alter C2, and therefore the dynamic range and noise of these signals. Figure 10.2d shows the switching between gain levels; the switch is not at the charge storage node because a MOS switch on the ‘−’ terminal increases the leakage current at this node, decreasing hold time. These results have been experimentally verified through the use of variable MEMS capacitor devices. In one particular system, a 100 aF capacitor change was observed to produce a 37.5 mV output change for an amplifier with noise significantly less than 1 mV; therefore, a 3 fF change resulted in a 1.13 V output swing, and a 3 aF change resulted in a 1 mV output swing (the 0 dB SNR point). The bandwidth of the amplifier was greater than 1 MHz.
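The SNR arithmetic in this example can be checked directly; the short sketch below uses only the numbers quoted above (a 1 V full-scale output for a 2 fF deflection and 0.5 mV of total output noise).

```python
import math

full_scale = 1.0      # output swing for the maximum 2 fF deflection, volts
noise      = 0.5e-3   # total amplifier output noise, volts
print(20 * math.log10(full_scale / noise))       # ~66 dB, as stated

# 1 aF out of the 2 fF full scale maps to 0.5 mV, right at the noise floor:
print(full_scale * (1e-18 / 2e-15) * 1e3, "mV")  # 0.5 mV -> the 0 dB SNR point
```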

10.4   Modifying Floating-Gate Charge

The floating-gate charge is modified by applying large voltages across a silicon–oxide capacitor to tunnel electrons through the oxide, or by adding electrons using hot-electron injection. Although the physics of floating-gate circuits is discussed extensively elsewhere (Hasler et al. 1995, 1999; Kucic et al. 2001), it is briefly reviewed here.

10.4.1   Electron Tunneling

Charge is added to the floating gate (i.e., the stored charge is made more positive) by removing electrons using electron tunneling. Increasing the voltage across the tunneling capacitor, either by increasing the tunneling voltage (Vtun) or by decreasing the floating-gate voltage, increases the effective electric field across the oxide, thereby increasing the probability of an electron tunneling through the barrier (Figure 10.3d). Starting from the classic model of electron tunneling, given as

$$I_{tun} = I_0 \exp\!\left(-\frac{t_{ox}\,E_0}{V_{ox}}\right)$$

where E0 is a fundamental parameter derived from a Wentzel–Kramers–Brillouin (WKB) solution of Schrödinger’s equation, tox is the thickness of the oxide dielectric, and Vox is the voltage across the dielectric, we can derive an approximate model for the electron tunneling current around a given voltage across the oxide (tunneling voltage minus floating-gate voltage) as (Hasler et al. 1995, 1999)

$$I_{tun} = I_{tun0} \exp\!\left(\frac{\Delta V_{ox}}{V_x}\right)$$

where Vx is a tunneling device-dependent parameter that is a function of the bias voltage across the oxide.
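A numerical sketch of these two expressions is given below. I0, E0, Itun0, and Vx are device-dependent fitting parameters; the values used here are illustrative placeholders (e.g., an E0 of roughly 25.6 V/nm for the Si–SiO2 system is an assumed textbook-scale value), not measured data.

```python
import math

def tunneling_current(v_ox, i0=1e-6, e0=25.6e9, t_ox=10e-9):
    """Classic tunneling model: Itun = I0 * exp(-tox * E0 / Vox).
    e0 is in V/m (~25.6 V/nm, an assumed value); t_ox is a 10 nm oxide."""
    return i0 * math.exp(-t_ox * e0 / v_ox)

def tunneling_current_approx(dv_ox, itun0=1e-15, vx=0.5):
    """Model linearized around a bias point: Itun = Itun0 * exp(dVox / Vx)."""
    return itun0 * math.exp(dv_ox / vx)

print(tunneling_current(8.0))   # tunneling current rises steeply with Vox
for dv in (0.0, 0.25, 0.5):     # small oxide-voltage changes ->
    print(dv, tunneling_current_approx(dv))   # exponential current changes
```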

10.4.2   pFET Hot-Electron Injection

pFET hot-electron injection is used to add electrons (remove positive charge) to the floating gate. It is used because it cannot be eliminated from a complementary metal oxide semiconductor (CMOS) process without adversely affecting basic transistor operation, and therefore is available in all commercial CMOS processes. One might wonder how pFETs, in which the current carriers are holes, inject hot electrons onto the floating gate. Figure 10.3e shows the band diagram of a pFET operating under bias conditions favorable for hot-electron injection. Hot-hole impact ionization creates electrons at the drain edge of the drain-to-channel depletion region, due to the high electric fields there. These electrons travel back toward the channel region, gaining energy as they go. When their kinetic energy exceeds that of the silicon–silicon-dioxide barrier, they can be injected into the oxide and transported to the floating gate. To inject an electron onto a floating gate, the MOSFET must have a high electric field region (>10 V/μm) to accelerate channel electrons to energies above the silicon–silicon-dioxide barrier, and in that region the oxide electric field must transport the electrons that surmount the barrier to the floating gate. As a result, the subthreshold MOSFET injection current is proportional to the source current and is the exponential of a smooth function of the drain-to-channel potential (Φdc); the product of these two circuit variables is the key property needed to build outer-product learning rules. A first-principles model, derived from basic physics and showing the resulting exponential functions, is given in Duffy and Hasler (2003).


Figure 10.3 Approach to modifying floating-gate charge. (a) Basic circuit representation for a floating-gate device. (b) A programmable floating-gate differential pair. Both the input transistors and the current source transistors are programmed. Practically, it may be desirable to develop approaches to program the offset voltage for the amplifier, as well as the value of the current source. (c) Current–voltage curves from a programmed pFET transistor. We modify charge by a complementary combination of electron tunneling (weaker pFET transistor) and hot-electron injection (stronger pFET transistor). (d) Basic picture of electron tunneling in the Si–SiO2 system. (e) Basic picture of pFET hot-electron injection. Some holes moving through the channel gain sufficient energy to create an impact ionization event, generating electrons that gain energy moving toward the channel. Some of these electrons will have sufficient energy to surmount the Si–SiO2 barrier and arrive at the gate terminal. © 2005 IEEE. Reprinted, with permission, from Hasler (2005)

A simplified model for pFET injection that is useful for hand calculations relates the hot-electron injection current for a channel current (Is) and drain-to-source (ΔVds) voltage as

$$I_{inj} = I_{inj0} \left(\frac{I_s}{I_{s0}}\right)^{\alpha} \exp\!\left(-\frac{\Delta V_{ds}}{V_{inj}}\right)$$

where Iinj0 is the injection current when the pFET is operating at the reference channel current Is0 (i.e., Is = Is0) and the reference drain-to-source voltage; Vinj is a device- and bias-dependent parameter; and α = 1 − UT/Vinj, where UT is the thermal voltage. Typical values for Vinj in a 0.5 μm CMOS process are 100–250 mV.
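The following sketch evaluates this hand-calculation model. Iinj0, Is0, and Vinj are device- and bias-dependent fitting parameters; the particular values chosen here (e.g., Vinj = 200 mV, inside the 100–250 mV range quoted above) are assumptions for illustration.

```python
import math

UT = 0.0258  # thermal voltage at room temperature, volts

def injection_current(i_s, dv_ds, i_inj0=1e-12, i_s0=1e-9, v_inj=0.2):
    """Iinj = Iinj0 * (Is/Is0)**alpha * exp(-dVds/Vinj), alpha = 1 - UT/Vinj."""
    alpha = 1.0 - UT / v_inj
    return i_inj0 * (i_s / i_s0) ** alpha * math.exp(-dv_ds / v_inj)

# Injection grows exponentially as the pFET drain-to-source voltage moves
# away from the reference bias (dVds more negative -> more injection):
for dv in (0.0, -0.1, -0.2):
    print(dv, injection_current(1e-9, dv))
```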

Choosing the appropriate model for simulation is critical for these devices. For example, when simulating a set of floating-gate devices that will be programmed, one typically does not need to implement the tunneling and injection currents, but rather to make sure at the beginning of the simulation that the floating-gate voltages/bias currents are set correctly based upon the behavior of the particular programming scheme. In this mode of operation, one can set the floating-gate voltage through a very large resistor; for a total capacitance at a floating-gate node of 100 fF (a small device), a resistor of 10²⁶ Ω is consistent with the typical room temperature voltage drop of 4 μV over a 10-year period for a 10 nm oxide (Srinivasan et al. 2005). In some cases, transistor equivalent circuits can be used to simulate adaptive floating-gate elements, such as the capacitively coupled current conveyor (C4) second-order section circuit and the synapse element (Graham et al. 2004; Srinivasan et al. 2005); these equivalent circuits tend to be useful in their own right for applications requiring fast adaptation rates.
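The equivalent-resistor figure can be sanity-checked from the numbers quoted above (100 fF total capacitance, 4 μV of droop over 10 years); the 2 V assumed across the node in this estimate is an illustrative assumption.

```python
# Leakage current implied by a 4 uV droop on 100 fF over ten years,
# and the equivalent resistance for an assumed ~2 V across the node:
C_T     = 100e-15                   # total floating-gate capacitance, F
dV      = 4e-6                      # allowed voltage droop, V
seconds = 10 * 365.25 * 24 * 3600   # ten years, s
I_leak  = C_T * dV / seconds        # ~1.3e-27 A
R_equiv = 2.0 / I_leak              # ~1.6e27 ohms: of order 10^26-10^27
print(I_leak, R_equiv)
```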

10.5   Accurate Programming of Programmable Analog Devices

The charge modification schemes, along with their detailed modeling, open the door to accurate programming of a large number of floating-gate devices used by a diverse set of circuits. Figure 10.4a shows the starting point for automatically programming a large array of floating-gate elements. It illustrates how programmable devices are accessed, defined here as Prog or program mode, and how computation is performed using these elements, defined as Run mode. Going from Run mode to Prog mode means electrically reconfiguring all circuits such that each floating-gate device becomes part of a two-dimensional mesh array of devices with the drain and gate lines running in orthogonal directions. Individual elements are isolated (access to an individual gate and drain line) in a large matrix using peripheral control circuitry (Hasler and Lande 2001; Kucic et al. 2001). A standard technique like this is necessary when working with thousands to millions of floating-gate elements on a single die.

This programming scheme minimizes interaction between floating-gate devices in an array during the programming operation. Unselected elements are switched to a separate voltage to ensure that those devices do not inject. A device is programmed by increasing its output current using hot-electron injection, and erased by decreasing its output current using electron tunneling. Because of its poorer selectivity, tunneling is used primarily for erasing and for rough programming steps. The scheme performs injection over a fixed time window (from 1 μs to 10 s and longer) using a drain-to-source voltage chosen based on the actual and target currents. Most rapid programming techniques use pulse widths in the 10–100 μs range, which potentially enables the programming of large arrays of floating-gate elements in mass production. Developing an efficient algorithm for pFET programming requires understanding the dependence of the gate currents on the drain-to-source voltage. The scheme also measures results at the circuit’s operating condition for optimal tuning of the operating circuit (no compensation circuitry needed). Once programmed, the floating-gate devices retain their channel current in a nonvolatile manner. A custom programming board (PCB) programs large floating-gate arrays built around this scheme (Kucic et al. 2001; Serrano et al. 2004), and ongoing work continues to move all of these blocks on-chip using row-parallel programming techniques (Kucic 2004). These approaches have been used by over 40 researchers on hundreds of IC projects. The limiting factor for rapid programming of these devices is the speed and accuracy of the current measurements, rather than the hot-electron injection physics.
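A minimal sketch of this measure-then-pulse programming loop is shown below. measure_current() and apply_injection_pulse() are hypothetical stand-ins for the peripheral programming circuitry, and the mapping from current error to drain-to-source voltage is an illustrative assumption, not the published algorithm.

```python
def program_device(target, measure_current, apply_injection_pulse,
                   tolerance=0.005, pulse_width=100e-6):
    """Step a device's channel current up toward `target` with injection pulses.

    Injection only increases the current, so the loop approaches the target
    from below; an overshoot would require a tunneling erase and a restart,
    which is why tunneling is reserved for erasing and rough steps."""
    while True:
        actual = measure_current()
        error = (target - actual) / target
        if error <= tolerance:
            return actual
        # Larger remaining error -> more aggressive drain-to-source voltage
        # (an illustrative mapping):
        v_ds = 4.5 + 0.5 * min(error, 1.0)
        apply_injection_pulse(v_ds, pulse_width)

# Toy stand-in device to show the loop terminating:
state = {"i": 1e-9}
def measure(): return state["i"]
def pulse(v_ds, width): state["i"] *= 1.05  # each pulse nudges the current up
print(program_device(1e-8, measure, pulse)) # ~1e-8 A after ~48 pulses
```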


Figure 10.4 Programming of large number of floating-gate elements. (a) Infrastructure takes any number of floating-gate elements on a chip during Run mode, and reconfigures these devices into a regular array of floating-gate elements. (b) Hot electron injection requires both channel current (subthreshold) and high electric field; therefore in an array of devices a single element can be accessed using an ANDing scheme, both for measurement and for programming. (c) Experimental measurements for programming a sequence of functions of different amplitudes. The corresponding percentage error is plotted below the data; the experimental error (for this example bounded between 0.3% and 0.7%) does not correlate with the experimental waveform. © 2005 IEEE. Reprinted, with permission, from Hasler (2005)

How accurately can these devices be programmed? In the end, the accuracy is limited by the change in voltage (ΔV) due to a single electron charge (q) moving off the total capacitance (CT) at the floating node, expressed as

$$\Delta V = \frac{q}{C_T}$$

For a CT of 16 fF (a small device), the voltage change from one electron is 10 μV. For a voltage source based on the floating-gate voltage with a 2 V swing, the accuracy is roughly 0.0005%, or 17–18 bits. For a subthreshold current source, the accuracy is roughly 0.05% (11 bits) over the entire subthreshold current range (typically 6–8 orders of magnitude). The error decreases inversely with increasing CT.
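These resolution figures follow directly from ΔV = q/CT; the short computation below reproduces them.

```python
import math

q   = 1.602e-19   # electron charge, coulombs
C_T = 16e-15      # total floating-gate capacitance, farads
dV  = q / C_T
print(dV)                     # ~1.0e-05 V: 10 uV per electron
print(math.log2(2.0 / dV))    # ~17.6 bits of resolution on a 2 V swing
print(math.log2(1 / 0.0005))  # ~11 bits for the 0.05% current accuracy
```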

Therefore, the real question is how closely a programming algorithm can approach this theoretical limit. The first limitation is the accuracy of measuring the quantity to be programmed; for example, with only 8-bit accuracy in measuring the output current, one cannot expect significantly higher programming performance through that measurement. The second limitation is the programming algorithm and its associated circuitry, including parasitic elements; the approaches described here are constructed to minimize the effect of parasitic elements. In addition, due to factors like capacitor mismatch, a fine-tuning programming step is often used to reduce the effects of these mismatches (Bandyopadhyay et al. 2005; Graham et al. 2004). Finally, CT can set the thermal noise (kT/C noise) for the previous stage; a 16 fF capacitor sets the total noise for a single subthreshold device at 0.25 mV, an error which, if not addressed in the resulting circuit, will be 25 times greater than the programming accuracy. Figure 10.4c shows measurements of programming accuracy from an array of floating-gate devices (CT ≈ 100 fF). Typical experimental results show that devices can be programmed to within 0.1–0.5% accuracy over 3.5 decades of target currents, and to at least 1% accuracy over six decades of target currents, in CMOS processes ranging from 0.5 μm to 0.25 μm (Bandyopadhyay et al. 2006). More recently, the circuitry for programming arbitrary-size floating-gate arrays has been fully integrated on chip throughout the pA to μA range (Basu and Hasler 2011).

10.6   Scaling of Programmable Analog Approaches

The scaling down of oxide thicknesses with CMOS processes raises questions about the long-term potential of these analog approaches. From the tunneling model of Section 10.4.1, we notice that for identical I0 parameters, the tunneling leakage current is roughly constant under constant-field scaling, that is, when the power supply and oxide thickness scale at the same rate; therefore, these processes should all see minimal floating-gate charge loss at room temperature over a 10-year period. Unfortunately, these expressions are derived for a triangular barrier; at low oxide voltages, smaller than the equivalent 3.04 eV Si–SiO2 barrier, the barrier is trapezoidal and the scaling no longer holds, an issue that becomes important for processes at 0.18 μm and below. For typical 0.18 μm processes, the SiO2 oxide thickness is roughly 5 nm, which is seen as the smallest oxide thickness that satisfies the typical 10-year lifetime requirement; most current electrically erasable programmable read-only memory (EEPROM) processes use oxide thicknesses of 5 nm or larger.

The logical initial conclusion is that floating-gate circuits are not practical below 0.18 μm CMOS. Fortunately, there are multiple options for this technology to scale to smaller processes. First, starting with the 0.35 μm and 0.25 μm generations, modern CMOS processes offer transistors with multiple oxide thicknesses; one application is circuits that interface to larger external voltages (i.e., 5 V, 3.3 V, 2.5 V). These devices (both transistors and capacitors) provide at least one oxide layer at least 5 nm thick, and therefore allow storage devices. Such devices will be somewhat larger than the scaled-down devices, but only in the active channel area, because the lithography improves the other transistor dimensions and parasitics as well. Because the overlap capacitance of the floating-gate field-effect transistor (FET) effectively reduces its maximum gain, these devices do not need long channel lengths; designing for the minimum channel length or smaller is reasonable for these devices, even for precision analog circuits.

Further, oxide thicknesses in SiO2 will be limited to roughly 1.3 nm before gate currents become too large for basic CMOS transistor operation. Therefore, the maximum improvement in capacitance will be a factor of three, and the maximum improvement in area a factor of nine. In practice, the area improvement will only be a factor of two less than it would be if all devices scaled exactly with the process. Given that capacitance typically sets the SNR for the device, this capacitance factor corresponds to less than a 1-bit loss of resolution compared to ideal scaling. Movement toward higher-dielectric materials with low leakage in a given CMOS process will equally improve both standard digital CMOS and programmable analog CMOS. (‘Leakage’ here refers to transistor source–drain current when the transistor is off, not the substrate leakage; see Section 11.2.8.)

Second, small-oxide devices naturally enable adaptive floating-gate properties even within the typical process voltages. They can reach an equilibrium between tunneling junctions and injection elements, as well as use a tunneling junction on the drain side, and directly obtain properties that have been demonstrated in autozeroing floating-gate amplifier circuits (Hasler et al. 1996) and adaptive filter circuits (Kucic et al. 2001). The resulting gate leakage currents become a useful circuit property, not an effect to work around. Therefore, not only will these programmable analog devices remain available as CMOS processes scale to smaller dimensions, but a wider range of devices will be available to choose from.

10.7   Low-Power Analog Signal Processing

The last three decades have seen the impact of rapid improvement in silicon IC processing and the accompanying digital processing built on this technology, resulting in significant increases in processing capability and/or significant decreases in power consumption. With the need for low-power portable and/or autonomous sensor systems, getting as much computation as possible from a fixed power budget becomes more and more critical. Gene’s law, named for Gene Frantz (Frantz 2000), postulates that the computational efficiency (operations per unit power) of digital signal processing microprocessors (Figure 10.5a) doubles roughly every 18 months thanks to decreased feature size. Even with these remarkable improvements in digital computation, many digital signal processing algorithms cannot run in real time within the power budgets required for many portable applications. Further, recent results (Marr et al. 2012) show that digital multiply-accumulate operations (MACs) are no longer improving in computational efficiency with improved IC processes, further indicating the need for other approaches and improvements.


Figure 10.5 Motivation for using programmable analog technology. (a) Plot of computational efficiency (power consumption/millions of multiply-accumulate operations (MMAC)) for digital signal processing (DSP) microprocessors, the extrapolated fit to these data points (Gene’s law, Frantz 2000), and the resulting efficiency for a programmable analog system. The typical factor of 10,000 in efficiency improvement between analog and digital signal processing systems enables low-power computational systems today that might otherwise become available only in 20 years. © 2005 IEEE. Reprinted, with permission, from Hall et al. (2005). (b) Illustration of the trade-offs in cooperative analog–digital signal processing. Where to put the boundary line between analog signal processing (ASP) and DSP strongly depends upon the particular requirements of an application

Mead (1990) hypothesized that analog computation could be a factor of 10,000 times more efficient than custom digital processing for low to moderate (i.e., 6–12 bit SNR) resolution signals. Sarpeshkar later quantitatively showed that the cost of analog systems is lower than the cost of digital systems for low to moderate SNR levels (Sarpeshkar 1997), SNR levels typical of most analog input (e.g., sensory) systems. Thus, the potential of analog signal processing has been recognized for a while, but has only recently become practical to implement.

This section presents a high-level overview of the range of available programmable analog signal processing technologies, many of which have borne out Mead’s original hypothesis. These approaches are enabled through a programmable analog technology in standard CMOS (Hasler and Lande 2001). These dense analog programmable and reconfigurable elements enable power-efficient analog signal processing that is typically a factor of 1000–10,000 more efficient than custom digital approaches, as well as a precision analog design approach that scales with digital technology scaling. The result of this technology, shown graphically in Figure 10.5a, enables, in current technology, low-power computational systems that might become available through custom digital technology only in 20 years, assuming digital performance scales at the same rate as over the past 20 years. This increase in efficiency can be used to reduce the power requirements of a given problem or to address computational problems considered intractable by the current digital road map. The advance possible through programmable analog signal processing would be greater than the advance in DSP chips from the first marketed DSP chip to today’s ICs.

The range of available analog signal processing functions creates many opportunities to combine analog signal processing systems with digital signal processing systems for improved overall system performance. Figure 10.5b illustrates one trade-off between analog and digital signal processing. This model assumes the typical situation of signals coming from real-world sensors, which are analog in nature, that need to be utilized by digital computers. One approach is to put an analog-to-digital converter (ADC) as close to the analog sensor signals as possible, to take advantage of the computational flexibility available in digital numeric processors. An alternate approach is to perform some of the computation using analog signal processing, requiring a simpler ADC and reducing the computational load on the digital processors that follow. This approach is called cooperative analog–digital signal processing (CADSP); it promotes a new vision for the way signal processing systems are designed and implemented, promising substantially reduced power consumption in signal processing systems (Hasler and Anderson 2002).

10.8   Low-Power Comparisons to Digital Approaches: Analog Computing in Memory

Our starting point is a programmable analog transistor that allows for simultaneous nonvolatile storage (i.e., an analog weight), computation (i.e., computing a product between this stored weight and inputs), and analog programming that does not affect the computation; it can adapt due to correlations of input signals, and it is fabricated in standard CMOS processes. Several such circuits have been discussed in detail elsewhere (Hasler and Lande 2001). Accurate and efficient programming of these programmable transistors enables the analog signal processing approaches. For example, in a 0.5 μm CMOS process, programming a target current to within 0.2% is achievable over a current range from 150 pA to 1.5 μA (Bandyopadhyay et al. 2005). Hundreds to thousands of elements can be programmed to this accuracy in a second. The resulting programmed currents show less than 0.1% change over 10 years at room temperature (300 K). Many circuit topologies work well over a wide temperature range.


Figure 10.6 Computational efficiency for large-scale analog signal processing. (a) Comparison of the computing-in-memory approach for analog computing arrays with standard digital computation requiring a memory and processing units. For the complexity and power of accessing two columns of digital data for processing, we perform a full matrix computation on an incoming vector of data. (b) Illustration of the computational efficiency through a vector-matrix multiplication (VMM) computation. Using vector–vector multiplication (one programmable transistor per box), we compare experimental and simulation results throughout the region of subthreshold bias currents. From this experimental data, this analog VMM has a computational efficiency of 4 millions of multiply-accumulate operations (MMAC)/μW (set primarily by the node capacitance), which compares favorably to the best DSP IC values of between 4 and 10 MMAC/mW. Figure (b) © 2004 IEEE. Reprinted, with permission, from Chawla et al. (2004)

Figure 10.6 shows a comparison of programmable analog computation with traditional digital computation. The concept of analog computing arrays was introduced for parallel analog signal processing performed through programmable analog transistors; the cells appear similar to slightly modified EEPROM cells, and the arrays therefore correspond closely to EEPROM densities. Figure 10.6a shows a general block diagram of the analog computing array, with a comparison to traditional digital computation. Unlike digital memory, each cell acts as a multiplier that multiplies the analog input signal to that cell by the stored analog value. Performing the computation in the memory cells themselves avoids the throughput bottlenecks found in most signal processing systems. This computational approach can be extended to many other signal processing operations and algorithms in a straightforward manner. Therefore, a full parallel computation is possible with the same circuit complexity and power dissipation as the digital memory needed to store this array of digital coefficients at 4-bit accuracy. Further, for the complexity required simply to read out two columns of data to be sent to the digital processor, an entire matrix–vector multiplication, as well as potentially more complex computations, can be performed. The trade-off is the possibly reduced precision of the computation and the requirement for floating-gate programming.

Figure 10.6b shows experimental data from a 128 × 32 vector-matrix multiplier (VMM), built using these analog computing array approaches in 0.5 μm CMOS in 0.83 mm² area (Chawla et al. 2004). The resulting analog computational efficiency, defined as the ratio of bandwidth to power dissipation (i.e., MHz/μW), was measured at 4 MMAC/μW; the power efficiency is tied to the node capacitance, and therefore values greater than 20 MMAC/μW can be achieved in this process (Chawla et al. 2004; Hasler and Dugger 2005). For comparison, the most power-efficient DSP processors (i.e., the TI 54C series or DSP factory series) have power efficiencies of 1–8 MMAC/mW (best cases); therefore, the analog approach is roughly 1000 times more computationally efficient than the digital approach. Other analog signal processing techniques have shown further computational power efficiency improvements of 300–1000-fold over digital approaches (Graham et al. 2007; Hasler et al. 2005; Ozalevli et al. 2008a; Peng et al. 2007). Further, such approaches have resulted in significant power efficiencies in commercial end products (Dugger et al. 2012). Recent results show even greater improvements in power efficiency based on neural-inspired approaches, enabling wordspotting computation through a dendritic-rich neuron network (George and Hasler 2011; Ramakrishnan et al. 2012).
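The quoted efficiency gap can be checked by normalizing units; the sketch below uses only the numbers cited in this section.

```python
# Normalizing the quoted efficiency numbers to common units (MMAC/mW):
analog = 4.0 * 1000       # 4 MMAC/uW for the analog VMM -> 4000 MMAC/mW
for dsp in (1.0, 8.0):    # best-case DSP range quoted above, MMAC/mW
    print(f"analog advantage: {analog / dsp:.0f}x")
# -> 500x to 4000x, consistent with the ~1000x figure cited in the text
```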

10.9   Analog Programming at Digital Complexity: Large-Scale Field Programmable Analog Arrays

The last decade has seen the development of large-scale FPAA technology (Hall et al. 2002, 2004). FPAAs have historically had few programmable elements and limited interconnect capabilities, and were primarily targeted as a replacement for a few op-amp ICs on a board (Anadigm 2003a, 2003b; Fas 1999; Lee and Gulak 1991; Quan et al. 1998). Similar to the evolution in configurable logic from programmable logic devices (PLDs) to FPGAs, FPAA chips (Figure 10.7) are moving toward a large number of reconfigurable analog blocks, providing a reconfigurable platform that can be used to implement a number of different applications. The benefits of rapid prototyping for analog circuits are significant in the design and testing of analog systems.

FPAAs, like FPGAs, allow rapid prototyping of hardware solutions. A typical fabric is composed of several computational analog blocks (CABs); currently the largest FPAAs utilize 100 CABs with over 100,000 programmable analog parameters, and thousands of CABs are possible using state-of-the-art CMOS technologies. In the FPAA block, the switching device is an analog programmable element that can act as an ideal switch, variable resistor, current source, and/or configurable computational element in a single analog programmable memory element. The resulting chip can be programmed using a high-level digital control interface, typical of FPGA devices, and offers input/output (I/O) pins that connect to the array of CAB elements. When using these analog signal processing techniques in a system design, signal compression is essential not only for efficient transmission from the sensor, but also for efficient digital transmission between ICs for low-power operation.

Several generations of FPAA devices have been developed over the last few years, including a family of large-scale FPAA devices with CABs and floating-gate routing useful for computation (Basu et al. 2010a, 2010b; Schlottmann et al. 2012a), as well as further architecture revisions enabling partial rapid reconfigurability (Schlottmann et al. 2012b) and integration with low-power configurable digital blocks (Wunderlich et al. 2013). These efforts also produced a USB-powered hardware platform (Koziol et al. 2010) simple enough to be used in class development (Hasler et al. 2011), as well as a wide range of software tools enabling high-level Simulink system design that compiles directly to working hardware (Schlottmann and Hasler 2012). FPAAs have been built for a range of signal processing applications (Ramakrishnan and Hasler 2013; Schlottmann and Hasler 2011; Schlottmann et al. 2012b), as well as neural interfacing applications (Zbrzeski et al. 2010).


Figure 10.7 Illustration of large-scale field programmable analog arrays (FPAAs). (a) Most previous FPAA ICs are closer to programmable logic devices (PLDs). (b) Large-scale FPAAs have a computational complexity of analog arrays similar to field-programmable gate arrays (FPGAs), including multiple levels of routing. (c) The top figure shows a die photo of one large-scale FPAA composed of 64 computational analog blocks (CABs) and 150,000 analog programmable elements. The components in one example CAB are shown in the bottom figure of (c). It contains a 4 × 4 matrix multiplier, three wide-linear-range OTAs (WR_OTA), three fixed-value capacitors, a capacitively coupled current conveyor (C4) second-order section (SOS) filter, a peak detector, and two pFET transistors. (d) A typical subband system. The top figures show how the input signal is first bandpass filtered through a C4 SOS filter before the SOS output is buffered by the WR_OTA; the magnitude of the subband is then output from the peak detector. This sequence is analogous to taking a discrete Fourier transform of the signal. The experimental data are taken from an FPAA system. The input waveform is an amplitude-modulated signal with 1.8 kHz and 10.0 kHz components. The output of the peak detector is shown with (dotted line connecting capacitor to output in top figure of (d)) and without an integrating capacitor added to the output stage. Figure (d) © 2005 IEEE. Reprinted, with permission, from Hall et al. (2005)

10.10   Applications of Complex Analog Signal Processing

This section looks at examples of analog signal processing applications where the IC system utilizes a few milliwatts of power. In particular, we will discuss analog transform imagers, CMOS imagers allowing for programmable analog signal processing on the image plane, and continuous-time analog adaptive filters and classifiers. Other interesting applications have been in enhancing speech in noise (Anderson et al. 2002; Yoo et al. 2002), analog front-end systems for speech recognition (Smith et al. 2002; Smith and Hasler 2002), and arbitrary waveform generators and modulators (Chawla et al. 2005). Applications have been proposed in beam-forming, adaptive equalization, radar and ultrasound imaging, software radio, and image recognition.

10.10.1   Analog Transform Imagers

Figure 10.8 shows one example of using programmable analog techniques for an improved combination of CMOS imaging and programmable signal processing with a relatively high fill factor (Hasler et al. 2003; Lee et al. 2005). The transform imager chip is capable of computing a range of programmable matrix transforms on an incoming image. This approach allows retina-like and higher-level bio-inspired computation in a programmable architecture that still possesses high fill-factor pixels similar to those of active pixel sensor (APS) imagers. Each pixel is composed of a photodiode sensor element and a multiplier. The resulting data-flow architecture directly allows computation of spatial transforms, motion computations, and stereo computations. Figures 10.8b and 10.8c show a comparison of using this imager as a single-chip solution against a standard digital implementation for JPEG compression, critical for picture- and video-enabled handsets; the imager gives orders of magnitude improvement in power dissipation, and is therefore capable of enabling video streaming on handheld devices.

10.10.2   Adaptive Filters and Classifiers

When analog transistors are continuously programmed while input signals are applied, the transistor strength adapts according to input signal correlations (Hasler and Lande 2001). Such adaptive devices enable the development of adaptive signal processing algorithms, like adaptive filters and neural networks, in a dense, low-power, analog signal processing architecture. Weight adaptation is implemented by continuously operating the programming mechanisms, such that the low-frequency circuit adapts to a steady state dependent upon a correlation of input and error signals. Hasler and Dugger (2005) reported on a least mean squares (LMS) node based upon an LMS synapse built from continuously adapting programmable transistors; Figure 10.9 summarizes their results. Most other outer-product learning algorithms (supervised or unsupervised) are straightforward extensions of the LMS algorithm. This approach enables large-scale, on-chip learning networks supporting a range of signal processing algorithms. Figure 10.9b shows the test setup used to characterize the weight adaptation behavior of a two-input LMS network over a range of input signal correlations; the experimental data (Figure 10.9c, 0.5 μm process) closely agree with the ideal analytic expressions, validating the performance of this system. Using this adaptive floating-gate LMS node, arrays are being built on the order of 128 × 128 synapses in a 2 mm × 2 mm area (0.35 μm process) that operate at under 1 mW of power at bandwidths over 1 MHz; a custom digital processor or bank of DSP processors performing similar computations on signals of comparable bandwidth would consume 3–10 W. These techniques can be extended to other programmable and adaptive classifiers, like vector quantization or Gaussian mixtures (Hasler et al. 2002a; Peng et al. 2005).
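The LMS experiment of Figure 10.9 can be mimicked in a few lines; the sketch below is a discretized, idealized version of the continuous-time adaptation (the learning rate, frequencies, and durations are arbitrary choices), and the weights settle to (cos θ, sin θ) as in the measured data.

```python
import numpy as np

# Two orthogonal harmonic sinusoids mixed by a rotation through theta;
# the fundamental is the supervised target (as in Figure 10.9b).
theta = 0.6
t = np.arange(0, 200, 1e-3)
basis = np.vstack([np.sin(2 * np.pi * 1.0 * t),    # fundamental (the target)
                   np.sin(2 * np.pi * 2.0 * t)])   # second harmonic
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = R @ basis            # the two mixed inputs to the node
target = basis[0]

w, eta = np.zeros(2), 1e-3
for n in range(t.size):  # discretized continuous-time LMS adaptation
    y = w @ x[:, n]
    w += eta * (target[n] - y) * x[:, n]
print(w)  # -> approximately [cos(theta), sin(theta)] = [0.825, 0.565]
```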


Figure 10.8 Analog technology applied to imaging signal processing applications. (a) Floorplan of the matrix transform imager. Each pixel processor multiplies the incoming input with the measured image sensor result, and outputs a current proportional to this result. The image output rate will be the same as the time to scan a given image. This approach allows arbitrary separable matrix image transforms that are programmable. © 2002 IEEE. Reprinted, with permission, from Hasler et al. (2002a). (b) In a standard joint photographic experts group (JPEG) implementation, the image acquired by a standard complementary metal oxide semiconductor/charge-coupled device (CMOS/CCD) imager is converted to digital values using an analog-to-digital converter (ADC) before the block discrete cosine transform (DCT) is computed on each image using a digital signal processor (DSP). Each of these blocks burns anywhere from less than 0.1 W to 1 W. (c) A comparison of a standard image processing system with the transform imager for JPEG computations. The JPEG compression is one representative example for the transform imager, where the matrices are programmed for block DCT transforms. The analog computing array (ACA) ADCs are used to read off the resulting transformed image, but these ADCs could be controlled with a variable bit rate based on the information in each channel, resulting in significant power savings. The transform imager would require roughly 1 mW as a single-chip solution


Figure 10.9 An adaptive floating-gate node. (a) Block diagram of a single adaptive floating-gate node. Either supervised algorithms or unsupervised one-layer networks can be built in this architecture. (b) Experimental setup for examining least mean squares (LMS) behavior in a two-input node. A scaling operation followed by application of a rotation matrix to an orthogonal signal-space basis of harmonically related sinusoids yields the system input signals; the fundamental sinusoid is chosen as the target. (c) Measured data show steady-state weight dependence on the parameter, θ, of the two-dimensional input mixing-matrix. As expected theoretically, a cosine curve is obtained for the first weight, and a sine curve for the second weight. Figures (a) and (c) © 2005 IEEE. Reprinted, with permission, from Hasler and Dugger (2005)

10.11   Discussion

Floating-gate technology provides a compact device, fabricated in standard CMOS processes, that simultaneously offers long-term (nonvolatile) storage, computation, automated system programmability in a common architecture independent of functionality, and adaptation, all in a single device. Such devices allow for digital routing as well as programmable computational elements, sometimes in the same structure. This approach has been used both in custom ICs and in reconfigurable architectures to build complex signal processing systems, including programmable and adaptive filters, multipliers, amplification, matrix and array signal operations, and Fourier processing. The alternatives to floating-gate technology for dense storage in analog computing systems are a DAC per parameter or complex dynamic storage; when the number of parameters becomes significant (e.g., more than 30), these approaches have huge system-level impacts (e.g., Schlottmann and Hasler 2012).

Floating-gate transistors are the foundation of most commercial nonvolatile memories, including EEPROM and flash memories. Current EEPROM devices already store 4 bits (i.e., 16 levels) in a single transistor occupying 100 nm × 100 nm in area in a 32 nm process (Li et al. 2009; Marotta et al. 2010), with smaller technology nodes reaching production as of this writing. As a result, floating-gate circuit approaches can be expected to also scale to smaller process nodes, at similar array densities to EEPROM devices; current research efforts are actively working in such modern IC processes.

Floating-gate devices and circuits are used by multiple groups for multineuron neuromorphic implementations (Brink et al. 2013; Schemmel et al. 2004), including parameter storage, capacitive-based circuits, and adaptive learning networks; the use of these approaches results in the densest synaptic array ICs currently available (Brink et al. 2013).

One critical aspect of floating-gate circuits is the accurate and rapid programming of large numbers of floating-gate devices (Basu and Hasler 2011). The approach presented in this chapter enables, by construction, methods that minimize the effect of capacitive mismatch for programming accurate systems. A limiting factor in programming floating-gate devices is accurate current measurement over a wide range (i.e., pA to μA), which has been improved dramatically through the use of fully on-chip circuitry with all-digital interfaces for IC programming (Basu and Hasler 2011). Such infrastructure development has also served to facilitate the adoption of this technology by application specialists in the neuromorphic and signal processing communities.

Turning away from nonvolatile storage, Chapter 11 describes the digital configuration of analog currents and voltages, as a way of including a small number of configurable but volatile chip parameters using standard CMOS technology.

References

Anadigm. 2003a. Anadigm Company Fact Sheet Anadigm. http://www.anadigm.com/Prs_15.asp/.

Anadigm. 2003b. Anadigm FPAA Family Overview Anadigm. http://www.anadigm.com/Supp_05.asp/.

Anderson D, Hasler P, Ellis R, Yoo H, Graham D, and Hans M. 2002. A low-power system for audio noise suppression: a cooperative analog-digital signal processing approach. Proc. 2002 IEEE 10th Digital Signal Processing Workshop, and the 2nd Signal Processing Education Workshop, pp. 327–332.

Bandyopadhyay A, Serrano G, and Hasler P. 2005. Programming analog computational memory elements to 0.2% accuracy over 3.5 decades using a predictive method. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) 3, pp. 2148–2151.

Bandyopadhyay A, Serrano G, and Hasler P. 2006. Adaptive algorithm using hot-electron injection for programming analog computational memory elements within 0.2 percent of accuracy over 3.5 decades. IEEE J. Solid-State Circuits 41(9), 2107–2114.

Basu A and Hasler PE. 2011. A fully integrated architecture for fast and accurate programming of floating gates over six decades of current. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19(6), 953–959.

Basu A, Brink S, Schlottmann C, Ramakrishnan S, Petre C, Koziol S, Baskaya F, Twigg CM and Hasler P. 2010a. A floating-gate based field programmable analog array. IEEE J. Solid-State Circuits 45(9), 1781–1794.

Basu A, Ramakrishnan S, Petre C, Koziol S, Brink S, and Hasler PE. 2010b. Neural dynamics in reconfigurable silicon. IEEE Trans. Biomed. Circuits Syst. 4(5), 311–319.

Brink S, Nease S, Hasler P, Ramakrishnan S, Wunderlich R, Basu A, and Degnan B. 2013. A learning-enabled neuron array IC based upon transistor channel models of biological phenomena. IEEE Trans. Biomed. Circuits Syst. 7(1), 71–81.

Chawla R, Bandyopadhyay A, Srinivasan V, and Hasler P. 2004. A 531 nW/MHz, 128 × 32 current-mode programmable analog vector-matrix multiplier with over two decades of linearity. Proc. IEEE Custom Integrated Circuits Conf. (CICC), pp. 651–654.

Chawla R, Twigg CM, and Hasler P. 2005. An analog modulator/demodulator using a programmable arbitrary waveform generator. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 6106–6109.

Duffy C and Hasler P. 2003. Modeling hot-electron injection in pFETs. J. Comput. Electronics 2, 317–322.

Dugger J, Smith PD, Kucic M, and Hasler P. 2012. An analog adaptive beamforming circuit for audio noise reduction. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 5293–5296.

Fast Analog Solutions Ltd. 1999. TRAC020LH Datasheet: Totally Re-configurable Analog Circuit - TRAC® issue 2, Oldham, UK.

Frantz G. 2000. Digital signal processor trends. IEEE Micro 20(6), 52–59.

George S and Hasler P. 2011. HMM classifier using biophysically based CMOS dendrites for wordspotting. Proc. IEEE Biomed. Circuits Syst. Conf. (BIOCAS), pp. 281–284.

Graham DW, Smith PD, Ellis R, Chawla R, and Hasler PE. 2004. A programmable bandpass array using floating-gate elements. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) I, pp. 97–100.

Graham DW, Hasler P, Chawla R, and Smith PD. 2007. A low-power, programmable bandpass filter section for higher-order filter applications. IEEE Trans. Circuits Syst. I: Regular Papers 54(6), 1165–1176.

Hall T, Hasler P, and Anderson DV. 2002. Field-programmable analog arrays: a floating-gate approach. In: Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream (eds. Glesner M, Zipf P, and Renovell M). Lectures Notes in Computer Science, vol. 2438. Springer Berlin, Heidelberg. pp. 424–433.

Hall T, Twigg C, Hasler P, and Anderson DV. 2004. Application performance of elements in a floating-gate FPAA. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) 2, pp. 589–592.

Hall T, Twigg C, Gray JD, Hasler P, and Anderson DV. 2005. Large-scale field-programmable analog arrays for analog signal processing. IEEE Trans. Circuits and Syst. I 52(11), 2298–2307.

Hasler P. 2005. Floating-gate devices, circuits, and systems. Proc. Fifth Int. Workshop on System-on-Chip for Real-Time Applications, pp. 482–487.

Hasler P and Anderson DV. 2002. Cooperative analog-digital signal processing. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) IV, pp. 3972–3975.

Hasler P and Dugger J. 2005. An analog floating-gate node for supervised learning. IEEE Trans. Circuits Syst. I: Regular Papers 52(5), 835–845.

Hasler P and Lande TS. 2001. Overview of floating-gate devices, circuits, and systems. IEEE Trans. Circuits Syst. II 48(1), 1–3.

Hasler P, Diorio C, Minch BA, and Mead CA. 1995. Single transistor learning synapses. In: Advances in Neural Information Processing Systems 7 (NIPS) (eds. Tesauro G, Touretzky D, and Leen T). MIT Press, Cambridge, MA. pp. 817–824.

Hasler P, Minch BA, Diorio C, and Mead CA. 1996. An autozeroing amplifier using pFET hot-electron injection. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) III, pp. 325–328.

Hasler P, Minch BA, and Diorio C. 1999. Adaptive circuits using pFET floating-gate devices. Proc. 20th Anniversary Conf. Adv. Res. VLSI, pp. 215–229, Atlanta, GA.

Hasler P, Smith P, Duffy C, Gordon C, Dugger J, and Anderson DV. 2002a. A floating-gate vector-quantizer. Proc. 45th Int. Midwest Symp. Circuits Syst. 1, Tulsa, OK. pp. 196–199.

Hasler P, Bandyopadhyay A, and Smith P. 2002b. A matrix transform imager allowing high fill factor. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) III, pp. 337–340.

Hasler P, Bandyopadhyay A, and Anderson DV. 2003. High-fill factor imagers for neuromorphic processing enabled by floating-gate circuits. EURASIP J. Adv. Signal Process. 2003, 676–689.

Hasler P, Smith PD, Graham D, Ellis R, and Anderson DV. 2005. Analog floating–gate, on–chip auditory sensing system interfaces. IEEE Sensors J. 5(5), 1027–1034.

Hasler P, Schlottmann C, and Koziol S. 2011. FPAA chips and tools as the center of a design-based analog systems education. Proc. IEEE Int. Conf. Microelectronic Syst. Education, pp. 47–51.

Koziol S, Schlottmann C, Basu A, Brink S, Petre C, Degnan B, Ramakrishnan S, Hasler P, and Balavoine A. 2010. Hardware and software infrastructure for a family of floating-gate based FPAAs. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 2794–2797.

Kucic M. 2004. Analog Computing Arrays. PhD thesis Georgia Institute of Technology Atlanta, GA.

Kucic M, Hasler P, Dugger J, and Anderson DV. 2001. Programmable and adaptive analog filters using arrays of floating-gate circuits. In: Proceedings of 2001 Conference on Advanced Research in VLSI (eds. Brunvand E and Myers C). IEEE. pp. 148–162.

Lee J, Bandyopadhyay A, Baskaya IF, Robucci R, and Hasler P. 2005. Image processing system using a programmable transform imager. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) 5, pp. 101–104.

Lee KFE and Gulak PG. 1991. A CMOS field-programmable analog array. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 186–188.

Li Y, Lee S, Fong Y, Pan F, Kuo TC, Park J, Samaddar T, Nguyen HT, Mui ML, Htoo K, Kamei T, Higashitani M, Yero E, Kwon G, Kliza P, Wan J, Kaneko T, Maejima H, Shiga H, Hamada M, Fujita N, Kanebako K, Tam E, Koh A, Lu I, Kuo CCH, Pham T, Huynh J, Nguyen Q, Chibvongodze H, Watanabe M, Oowada K, Shah G, Woo B, Gao R, Chan J, Lan J, Hong P, Peng L, Das D, Ghosh D, Kalluru V, Kulkarni S, Cernea RA, Huynh S, Pantelakis D, Wang CM, and Quader K. 2009. A 16 Gb 3-bit per cell (X3) NAND flash memory on 56 nm technology with 8 MB/s write rate. IEEE J. Solid-State Circuits 44(1), 195–207.

Marotta GG, Macerola A, D’Alessandro A, Torsi A, Cerafogli C, Lattaro C, Musilli C, Rivers D, Sirizotti E, Paolini F, Imondi G, Naso G, Santin G, Botticchio L, De Santis L, Pilolli L, Gallese ML, Incarnati M, Tiburzi M, Conenna P, Perugini S, Moschiano V, Di Francesco W, Goldman M, Haid C, Di Cicco D, Orlandi D, Rori F, Rossini M, Vali T, Ghodsi R, and Roohparvar F. 2010. A 3 bit/cell 32 Gb NAND flash memory at 34 nm with 6 MB/s program throughput and with dynamic 2 b/cell blocks configuration mode for a program throughput increase up to 13 MB/s. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 444–445.

Marr B, Degnan B, Hasler P, and Anderson D. 2012. Scaling energy per operation via an asynchronous pipeline. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(1), 147–151.

Mead CA. 1990. Neuromorphic electronic systems. Proc. IEEE 78(10), 1629–1636.

Minch BA, Hasler P and Diorio C. 2001. Multiple-input translinear element networks. IEEE Trans. Circuits Syst. II 48(1), 20–28.

Ozalevli E, Huang W, Hasler PE, and Anderson DV. 2008a. A reconfigurable mixed-signal VLSI implementation of distributed arithmetic used for finite impulse response filtering. IEEE Trans. Circuits Syst. I: Regular Papers 55(2), 510–521.

Ozalevli E, Lo HJ, and Hasler PE. 2008b. Binary-weighted digital-to-analog converter design using floating- gate voltage references. IEEE Trans. Circuits Syst. I: Regular Papers 55(4), 990–998.

Peng SY, Minch B, and Hasler P. 2005. Programmable floating-gate bump circuit with variable width. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) 5, pp. 4341–4344.

Peng SY, Hasler P, and Anderson DV. 2007. An analog programmable multi-dimensional radial basis function based classifier. IEEE Trans. Circuits Syst. I: Regular Papers 54(10), 2148–2158.

Quan X, Embabi SHK, and Sanchez-Sinencio E. 1998. A current-mode based field programmable analog array architecture for signal processing applications. Proc. IEEE Custom Integrated Circuits Conf. (CICC), Santa Clara, CA, pp. 277–280.

Ramakrishnan S and Hasler J. 2013. A compact programmable analog classifier using a VMM + WTA network. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 2538–2542.

Ramakrishnan S, Wunderlich R, and Hasler PE. 2012. Neuron array with plastic synapses and programmable dendrites. Proc. IEEE Biomed. Circuits Syst. Conf. (BIOCAS), pp. 400–403.

Sarpeshkar R. 1997. Efficient Precise Computation with Noisy Components: Extrapolating From an Electronic Cochlea to the Brain. PhD thesis. California Institute of Technology, Pasadena, CA.

Schemmel J, Meier K, and Mueller E. 2004. A new VLSI model of neural microcircuits including spike time dependent plasticity. Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN), 3, pp. 1711–1716.

Schlottmann CR and Hasler P. 2012. FPAA empowering cooperative analog-digital signal processing. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 5301–5304.

Schlottmann CR and Hasler P. 2011. A highly dense, low power, programmable analog vector-matrix multiplier: the FPAA implementation. IEEE J. Emerg. Select. Top. Circuits Syst. 1(3), 403–411.

Schlottmann CR, Abramson D, and Hasler P. 2012a. A MITE-based translinear FPAA. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2(1), 1–9.

Schlottmann CR, Shapero S, Nease S, and Hasler P. 2012b. A digitally enhanced dynamically reconfigurable analog platform for low-power signal processing. IEEE J. Solid-State Circuits 47(9), 2174–2184.

Serrano G, Smith P, Lo HJ, Chawla R, Hall T, Twigg C, and Hasler P. 2004. Automatic rapid programming of large arrays of floating-gate elements. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) I, pp. 373–376.

Shibata T and Ohmi T. 1992. A functional MOS transistor featuring gate-level weighted sum and threshold operations. IEEE Trans. Elect. Devices 39(6), 1444–1455.

Smith PD and Hasler P. 2002. Analog speech recognition project. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) 4, pp. 3988–3991.

Smith PD, Kucic M, Ellis R, Hasler P, and Anderson DV. 2002. Cepstrum frequency encoding in analog floating-gate circuitry. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) IV, pp. 671–674.

Srinivasan V, Dugger J, and Hasler P. 2005. An adaptive analog synapse circuit that implements the least-mean-square learning algorithm. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) 5, pp. 4441–4444.

Srinivasan V, Serrano GJ, Gray J, and Hasler P. 2007. A precision CMOS amplifier using floating-gate transistors for offset cancellation. IEEE J. Solid-State Circuits 42(2), 280–291.

Srinivasan V, Serrano GJ, Twigg CM, and Hasler P. 2008. A floating-gate-based programmable CMOS reference. IEEE Trans. Circuits Syst. I: Regular Papers 55(11), 3448–3456.

Wunderlich RB, Adil F, and Hasler P. 2013. Floating gate-based field programmable mixed-signal array. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(8), 1496–1504.

Yoo H, Anderson DV, and Hasler P. 2002. Continuous-time audio noise suppression and real-time implementation. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) IV, pp. 3980–3983.

Zbrzeski A, Hasler P, Kölbl F, Syed E, Lewis N, and Renaud S. 2010. A programmable bioamplifier on FPAA for in vivo neural recording. Proc. IEEE Biomed. Circuits Syst. Conf. (BIOCAS), pp. 114–117.

__________

Parts of the text were taken from Hasler (2005). Reprinted with permission from IEEE.
