Ken Takeuchi¹ and An Chen²
¹Chuo University, Japan
²GLOBALFOUNDRIES Inc., USA
Ferroelectric materials have been utilized in commercial memories, for example, ferroelectric random access memory (FRAM), where a transistor-accessed ferroelectric capacitor stores information in its polarization direction [1–3]. Ferroelectric dielectrics can also be integrated in the gate stack of a field effect transistor (FET) whose channel conductance is switched by modulating the ferroelectric polarization; this device is known as a “ferroelectric FET” (FeFET) [3–9]. Recently, the ferroelectric tunnel junction (FTJ) was proposed as a memory element based on a very thin ferroelectric tunnel barrier between two metal electrodes [10,11]. The tunneling current is switched by reversing the polarization of the thin tunnel barrier. Figure 6.1 illustrates the three types of ferroelectric memories. Both FRAM and FTJ memory devices combine an access transistor with a storage node, that is, 1-transistor-1-capacitor (1T1C) for FRAM and 1-transistor-1-resistor (1T1R) for FTJ. FeFET, in contrast, is a 1T memory, similar to Flash memory devices. FTJ will be discussed in detail in Chapter 9; this chapter focuses on FeFET.
FeFET has attracted considerable attention for both memory and logic applications. The nonvolatile memory applications of FeFET rely on the remanent polarization in the absence of an external field and on the reversibility of the polarization direction under an applied field. By coupling the ferroelectric polarization directly to the channel of a FET, FeFET enables a capacitor-less ferroelectric memory with simplified device design and nondestructive readout. However, analysis has also shown that FeFET retention can be degraded by the depolarization field and gate leakage [7]. The performance of FeFET is strongly affected by the interface between the ferroelectric gate and the semiconductor channel. Inserting a buffer layer (e.g., a high-κ dielectric) to form a metal/ferroelectric/insulator/semiconductor (MFIS) stack may improve the interface properties.
For stand-alone memory applications, a ferroelectric NAND (Fe-NAND) flash memory has been developed to achieve low power consumption, high reliability, and high scalability [17–21]. The Fe-NAND flash memory is composed of serially connected MFIS (Metal Ferroelectric Insulator Semiconductor) transistors.
For logic applications, it has been suggested that a FET with a ferroelectric gate stack may achieve negative capacitance and a sub-threshold slope below 60 mV/dec [12–15]. A FeFET-based SRAM with self-adjusted VTH was also developed to improve the static noise margin in a scaled SRAM with low VDD [16].
This chapter is organized in three main sections. Fe-NAND will be discussed in detail as an example of FeFET applications in standalone memory in Section 6.2. FeFET-based SRAM will be covered in Section 6.3. Section 6.4 addresses some system-level considerations with Fe-NAND memory in solid state drive (SSD) systems.
Recently, SSD based on NAND flash memories was adopted in mobile and PC applications. SSD provides exceptional bandwidth, random I/O performance superior to a hard disk drive (HDD), power saving, and improved system reliability. SSD is also promising in enterprise applications including data centers [22,23]. In the last five years, as the data traffic on the internet rapidly increased, the power consumption at data centers in the United States doubled to a level corresponding to the output of five nuclear power plants. The Fe-NAND flash memory is suitable for enterprise SSD owing to its low power operation and high endurance.
The Fe-NAND flash memory is made up of serially connected MFIS transistors, as shown in Figure 6.2a. The MFIS device in Figure 6.2b uses a ferroelectric SrBi₂Ta₂O₉ (SBT) layer and a Hf-Al-O (HAO) buffer layer in the gate stack to overcome the retention loss issues in FeFET [6]. The SBT/HAO interface is chemically stable, and gate leakage is suppressed to below 1 nA/cm². The HAO/Si interface is of high quality, as required for the transistor channel. Data retention has been measured for up to 33 days.
Ferroelectric FETs are in principle scalable below 10 nm, down to the crystal unit-cell size, because data are stored in the polarization direction of a ferroelectric gate insulator. The program and erase operation conditions are illustrated in Figure 6.3c. With program/erase pulses of 6 V amplitude and 10 μs width, a 0.5 V VTH window is realized (Figure 6.3). This voltage is significantly lower than the operation voltage of a conventional floating-gate NAND (FG-NAND) flash (∼20 V), which contributes to the lower operation power of Fe-NAND. Fe-NAND also achieves much longer cycling endurance, for example, 10⁸ cycles in Figure 6.2d in comparison to ∼10⁴ cycles in FG-NAND flash [17].
While the lower operation voltages of Fe-NAND help to reduce operation power, the difference between the program voltage and the read voltage also becomes smaller, and consequently read/program disturbance may increase. To reduce disturbance, a conventional NAND flash memory can adopt a negative VTH cell scheme [24], where the midpoint between the VTHs of erased and programmed cells is negative, as shown in Figure 6.4a. However, if the negative VTH cell scheme is applied to the Fe-NAND flash memory, data retention degrades drastically because of the depolarization field in the ferroelectric layer. Therefore, a zero VTH memory cell scheme has been proposed for Fe-NAND flash memory to achieve both long retention and strong resistance against read/program disturbance, as shown in Figure 6.4b. With this scheme, the measured VTH shift due to read disturb, program disturb, and retention loss is reduced by 32%, 24%, and 10%, respectively [19]. In comparison, the negative VTH cell scheme results in a VTH shift of as much as 192% due to retention loss. The positive VTH cell scheme, where the midpoint between the VTHs of erased and programmed cells is positive, suffers from severe read and program disturbance.
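The three cell schemes differ only in where the midpoint of the erased and programmed VTH values sits relative to 0 V. A minimal sketch of that definition follows; the function name, the tolerance `tol`, and the example voltages are illustrative assumptions and are not taken from the measurements in [19].

```python
def classify_vth_scheme(vth_erased: float, vth_programmed: float,
                        tol: float = 0.05) -> str:
    """Label a cell scheme by the midpoint of its two VTH states (in volts).

    tol is a hypothetical tolerance for calling the midpoint "zero".
    """
    midpoint = (vth_erased + vth_programmed) / 2.0
    if abs(midpoint) <= tol:
        return "zero"
    return "negative" if midpoint < 0 else "positive"


# Hypothetical VTH pairs, each with a 0.5 V window.
zero_scheme = classify_vth_scheme(-0.25, 0.25)
negative_scheme = classify_vth_scheme(-1.0, -0.5)
positive_scheme = classify_vth_scheme(0.5, 1.0)
```

The sketch only encodes the naming convention; the retention and disturb behavior of each scheme comes from device measurements, not from this classification.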
A Fe-NAND flash memory with a nonvolatile (NV) page buffer has also been proposed [18]. A critical problem of SSD is slow random write. The write unit in a NAND flash memory is a page with a typical size of 4–8 KB, usually composed of the memory cells sharing a word-line. A page is written only once to avoid program disturb. The large page size is acceptable for digital camera, MP3 player, and camcorder applications because their data size is typically over 1 MB and multiple pages are programmed sequentially. For PC and data center applications, however, the minimum write unit of the operating system (OS) is a sector of 512 Bytes, and 50% of the data written by the OS is smaller than eight sectors, that is, 4 KB. Random writes of data smaller than the page size therefore happen frequently. When an OS writes one sector in SSD, the remaining 80% of the page becomes garbage. As garbage accumulates, garbage collection is performed to recover usable memory capacity. Garbage collection takes as much as 100 ms [25], more than 100 times longer than the page programming time of ∼800 μs, and thus causes serious performance degradation. As memory cell size scales down, more cells are connected to each word-line and the page size increases, which widens the discrepancy between the page size and the sector size and further degrades SSD performance.
To solve the random write issue, a batch write algorithm is proposed, as shown in Figure 6.5. A page buffer in the Fe-NAND flash memory temporarily stores program data. To avoid a random write, the memory programming starts only after the data to be programmed accumulate to the page size in a page buffer. In Step 1, when the OS issues one sector (512 Bytes) “write” command to the SSD, data are stored in page buffers without programming to memory cells. The NAND controller reports to the OS that the write is completed although no memory cell programming actually occurs. In Step 2, when the OS issues the second sector “write” command, the second data are also temporarily stored in the page buffer. The process continues until the data in the page buffer reach the page size (Step 3), and then memory cell programming starts in Step 4. As the logical address issued by the OS and the actually written NAND physical address are different, the NAND controller updates the logical–physical address mapping table. The batch write algorithm eliminates the data fragmentation of a page during the random write. Considering the sequential write of the SSD is over ten times faster than the random write [26], the batch write algorithm can significantly increase the SSD speed.
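The four steps of the batch write algorithm can be sketched as a toy controller model. This is a minimal illustration under stated assumptions, not the controller of [18]: the class name, the 4 KB page size (the text gives 4–8 KB), and the in-memory `pages` list standing in for the NAND array are all hypothetical.

```python
from dataclasses import dataclass, field

SECTOR_BYTES = 512                      # OS write unit, from the text
PAGE_BYTES = 4096                       # assumed page size (text: 4-8 KB)
SECTORS_PER_PAGE = PAGE_BYTES // SECTOR_BYTES


@dataclass
class BatchWriteController:
    """Toy NAND controller that programs memory cells only in full pages."""
    page_buffer: list = field(default_factory=list)  # (logical addr, data)
    mapping: dict = field(default_factory=dict)      # logical -> (page, slot)
    pages: list = field(default_factory=list)        # stand-in for NAND array

    def write_sector(self, logical_addr: int, data: bytes) -> str:
        """Steps 1-3: buffer the sector and report completion to the OS."""
        if len(data) != SECTOR_BYTES:
            raise ValueError("expected exactly one 512-byte sector")
        self.page_buffer.append((logical_addr, data))
        if len(self.page_buffer) == SECTORS_PER_PAGE:
            self._program_page()                     # Step 4
        return "write completed"

    def _program_page(self) -> None:
        """Step 4: program one full page, then update the mapping table."""
        page_index = len(self.pages)
        self.pages.append([data for _, data in self.page_buffer])
        for slot, (logical_addr, _) in enumerate(self.page_buffer):
            self.mapping[logical_addr] = (page_index, slot)
        self.page_buffer.clear()


# Eight random single-sector writes fill exactly one 4 KB page.
ctrl = BatchWriteController()
for addr in (7, 1000, 42, 3, 900, 15, 8, 77):
    ctrl.write_sector(addr, bytes(SECTOR_BYTES))
```

In this model the first seven writes touch only the page buffer; the eighth triggers a single page program, so no partially filled page is ever written. The acknowledgment before programming is exactly what makes the buffer contents vulnerable to power loss.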
One problem with the batch write algorithm is data loss during a power outage. If a power outage happens in Step 4 of Figure 6.5, all data in the page buffers are lost since page buffers are volatile latch circuits. The computer system would fail because the OS recognizes that the program in steps 1–3 has been completed successfully. High reliability against a power outage is essential in an enterprise SSD. To solve this problem, a nonvolatile (NV) page buffer is proposed. The NV-page buffer consists of a volatile latch and a NV-latch, as shown in Figure 6.6. The NV-latch consists of one NMOS and one ferroelectric NMOS. The NV-latch is realized with no additional process because it has the same gate stack structure as memory cells. The area penalty for the NV-latch is less than 1% of the die size as only two transistors are added. In addition, unlike memory cells where the well is biased at 6 V, the well of the ferroelectric transistor in the NV-latch is fixed at 0 V and thus is located in the same well as the other NMOS. In normal operations, the memory cell is programmed based on data in the latch, similar to the conventional NAND flash memory. When the power outage occurs, data in the latch is transferred to the NV-latch to avoid a data loss. When power is restored, the data in the NV-latch is copied to the latch and the memory cell programming is performed based on the data in the latch. An important benefit of the Fe-NAND is that the NV-page buffer can be implemented in the peripheral circuits without any additional process cost because the Fe-FET structure is already implemented for the memory cells.
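The save and restore sequence of the NV-page buffer can be illustrated with a small behavioral model. It is only a sketch: the class and method names are hypothetical, and how the outage is detected in time to trigger the transfer is an assumption that the text does not describe.

```python
class NVPageBuffer:
    """Behavioral model of a volatile latch paired with an NV-latch."""

    def __init__(self):
        self.latch = None      # volatile latch: contents vanish at power-off
        self.nv_latch = None   # ferroelectric NV-latch: contents persist

    def load(self, data: bytes) -> None:
        """Normal operation: program data waits in the volatile latch."""
        self.latch = data

    def save_on_power_fail(self) -> None:
        """Power outage detected (assumed): copy latch into the NV-latch."""
        self.nv_latch = self.latch

    def power_off(self) -> None:
        """Supply is gone: the volatile latch loses its contents."""
        self.latch = None

    def restore_on_power_up(self):
        """Power restored: copy the NV-latch back for cell programming."""
        self.latch = self.nv_latch
        return self.latch


buf = NVPageBuffer()
buf.load(b"sector data awaiting a full page")
buf.save_on_power_fail()
buf.power_off()
restored = buf.restore_on_power_up()
```

Without the call to `save_on_power_fail`, `restore_on_power_up` would return `None`, which corresponds to the data loss of a purely volatile page buffer.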
The active power consumption of digital circuits is proportional to f × C × VDD²; therefore, decreasing the supply voltage (VDD) is essential for reducing the power consumption of CPU and system on chip (SoC). To realize a low voltage/power CPU and SoC, a very low supply voltage (e.g., 0.5 V) is required, which presents a challenge for the SRAM noise margin. As shown in Figure 6.7, the static noise margin (SNM) of SRAM is represented by the diagonal length of the largest square in the butterfly curves of the two inverters in an SRAM cell. As VDD decreases, SNM is also reduced. At a high VDD, SNM is large enough that the hold/read of the SRAM is stable. As the feature size of SRAM decreases to sub-30 nm, the VTH variation due to random dopant fluctuation (RDF) increases significantly. At a VDD of 0.5 V, the SNM of conventional SRAM decreases to almost zero and the conventional SRAM can no longer operate, as shown in Figure 6.7 [27]. Because decreasing the VDD of SRAM is difficult in nano-scale CMOS, the VDD of CPU and SoC cannot be lowered further.
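The quadratic dependence on VDD is easy to see numerically. The helper below is a minimal sketch of the proportionality only; the function name and the example voltages are illustrative assumptions, and leakage power is ignored.

```python
def relative_active_power(vdd_new: float, vdd_old: float,
                          f_ratio: float = 1.0, c_ratio: float = 1.0) -> float:
    """Ratio P_new / P_old for active power proportional to f * C * VDD^2."""
    return f_ratio * c_ratio * (vdd_new / vdd_old) ** 2


# Halving the supply from a hypothetical 1.0 V to 0.5 V at fixed f and C
# leaves one quarter of the original active power.
quarter = relative_active_power(0.5, 1.0)
```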
To overcome this problem, a 6T-SRAM with Fe-FETs has been proposed [16]. It has the same structure as a conventional SRAM but uses six FeFETs instead. The Fe-FET has the same structure as the MFIS in Section 6.2 with a ferroelectric SBT layer integrated in a gate stack of standard CMOS transistors with the metal gate (Pt) and a high-κ HfAlO buffer layer between the SBT and the Si substrate.
The VTH of Fe-FETs is changed by controlling the electric field between the gate and the body/channel, as shown in Figure 6.8. For example, a VGB of 0.5 V applied to an NMOS forces positive polarization near the channel and decreases VTH, whereas a reversed VGB increases VTH. The opposite trend of VTH self-adjustment occurs in PMOS.
By biasing the bodies of the NMOS and the PMOS to VDD and VSS, respectively, the VTH of Fe-FETs automatically shifts to increase SNM. Figure 6.9 illustrates SRAM storing data “0” (a) and data “1” (b), with the directions of VTH adjustment of the pull-up and pull-down transistors marked. The VTH adjustments shift the I-V characteristics of the left and right inverters. For example, when the SRAM holds data “0,” the trip point of the right inverter decreases and that of the left inverter increases. As a result, the storage nodes V1 and V2 become more likely to hold 0 V and 0.5 V, respectively. Therefore, holding data “0” becomes more stable, that is, SNM is improved. A similar analysis applies to data “1.”
Monte Carlo simulation shows a 60% increase of SNM, as shown in Figure 6.10 [16]. The measured SNM of a fabricated FeFET SRAM chip is plotted in (a) for data (V1, V2) = (Low, High) and in (b) for data (V1, V2) = (High, Low). A large SNM of 1.46 V is demonstrated, which is even larger than the 1.27 V SNM of an ideal inverter with an infinitely steep slope [16].
The on-current of the read NMOS determines the read speed. During the read operation, the VTH of the read transistor is automatically set to the low value. As a result, the read cell current increases to enhance the read speed, as shown in Figure 6.11.
The sub-threshold current of the off-state FETs shown in Figure 6.12 determines the leakage of the SRAM cell. During the stand-by operation, the VTH of these transistors is automatically set to high values to reduce leakage current. Leakage reduction of 42% is demonstrated by measurement [16].
The enlarged SNM in the ferroelectric SRAM enables a VDD reduction of 0.11 V, which decreases the active power by 32% [16]. Because the transistor count is kept at six, as in a conventional SRAM, the ferroelectric SRAM cell also retains the smallest cell size.
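As a rough consistency check, the two figures above can be related through the f × C × VDD² proportionality used earlier in this chapter. The sketch assumes that f and C are unchanged and that the 32% refers to active power only; under those assumptions it solves for the starting supply voltage, which is not stated in the text and is only inferred here.

```python
import math


def implied_starting_vdd(delta_v: float, power_saving: float) -> float:
    """Solve ((V - delta_v) / V)^2 = 1 - power_saving for V (in volts)."""
    voltage_ratio = math.sqrt(1.0 - power_saving)   # V_new / V_old
    return delta_v / (1.0 - voltage_ratio)


# A 0.11 V reduction that saves 32% of active power implies a starting
# supply of roughly 0.63 V under the stated assumptions.
v_old = implied_starting_vdd(0.11, 0.32)
v_new = v_old - 0.11
```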
The write speed in SSD can be enhanced by increasing the number of NAND chips (NNAND) written in parallel; however, NNAND is limited by the chip power consumption within the given SSD power budget because the current also increases in proportion to NNAND [23]. Therefore, minimizing NAND power consumption is essential for realizing high-speed SSD. As the feature size of NAND flash memory decreases, the total bit-line capacitance increases and contributes significantly to the power consumption. The most effective way to decrease power consumption is to reduce VCC. However, in conventional FG-NAND, decreasing VCC below 2 V increases the total power consumption because more charge pump stages are needed to boost VCC to 20 V for programming. The additional power consumed by the charge pumps surpasses the power saved by lowering VCC.
Fe-NAND can be programmed with a much lower voltage, which helps to suppress the power consumption of the charge pump. As shown in Figure 6.13, the NNAND of Fe-NAND is maximized around VCC = 1.0 V. A higher VCC increases the bit-line charging current and the total power consumption, which results in a lower NNAND under the power constraint. A lower VCC increases the required number of charge pump stages and the total power consumption, which also reduces NNAND. The optimal NNAND of Fe-NAND is 6.9 times higher than that of FG-NAND. Note that the NNAND of FG-NAND peaks at a higher VCC because of the higher programming voltage of FG-NAND.
The improved NNAND of Fe-NAND translates directly into better performance of the Fe-NAND-based SSD. As shown in Figure 6.14, the Fe-NAND flash SSD achieves a throughput of 9.5 GB/s at VCC = 1.0 V, significantly higher than that of the FG-NAND SSD (below 2 GB/s).
FeFET has demonstrated promising potential for both standalone flash memory applications and embedded logic/SRAM applications. The Fe-NAND flash memory discussed in this chapter achieves a much lower operation voltage (∼6 V) than that of conventional FG-NAND (∼20 V), which not only reduces power consumption but also improves reliability (from 10⁴ to 10⁸ cycles). For SRAM/logic applications, the ferroelectric SRAM has a unique configuration to adjust VTH by biasing the bodies of NMOS and PMOS to VDD and VSS. During the read and the hold, the VTH of Fe-FETs automatically changes to increase SNM by 60%. In the standby mode, the VTH of the FeFET SRAM cell increases to decrease the leakage current by 42%. During reading, the VTH of the read transistor decreases to increase the cell read current for a fast reading speed. During writing, the VTH of the SRAM cell dynamically changes and assists the cell data to flip. The enlarged SNM enables VDD reduction by 0.11 V, which decreases the active power by 32%. The ferroelectric SRAM also minimizes footprint.
SSD based on Fe-NAND benefits significantly from the low operation voltage of Fe-NAND. A lower supply voltage can be used without incurring higher power consumption from charge pumps. The 86% power reduction in Fe-NAND increases the number of NAND chips written in parallel in SSD by 6.9 times and enhances the SSD performance up to 9.5 GB/s.
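The link between per-chip power and parallelism can be checked with one line of arithmetic. The sketch assumes a fixed SSD power budget that is shared equally by the chips written in parallel, so that the chip count scales inversely with per-chip power; the function name is hypothetical and this simple model is not the analysis behind Figure 6.13.

```python
def parallel_chip_gain(power_reduction: float) -> float:
    """Factor by which the parallel chip count can grow at a fixed budget."""
    return 1.0 / (1.0 - power_reduction)


# An 86% per-chip power reduction allows roughly 7.1 times as many chips,
# the same order as the reported factor of 6.9.
gain = parallel_chip_gain(0.86)
```

The small gap between the two factors may come from rounding of the 86% figure or from effects outside this simple model.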
The performance of FeFET devices and systems hinges on high-quality materials and integration technologies. The SBT/HAO gate stack utilized in the MFIS and ferroelectric SRAM structures in this chapter plays a critical role in enabling functional FeFETs with good data retention. Integration of ferroelectric materials in a standard CMOS process has always been a challenge. The recent discovery of ferroelectricity in Si-doped HfO₂ allows fabrication of FeFETs at a highly scaled node in a state-of-the-art CMOS process [30]. A 28 nm FeFET with a TiN/Si:HfO₂/SiO₂/Si gate stack has been demonstrated [31]. A 0.9 V memory window was achieved with ±5 V, 20 ns program/erase pulses. This fabrication-friendly material may provide a promising solution for ferroelectric devices including FeFET.
Table 6.1 summarizes some key parameters of FeFET memories reported in the literature.
Table 6.1 Summary of key FeFET memory parameters
Parameter | Status | Value
Device structure | | 1-Transistor (1T)
Feature size F | Demonstrated | 28 nm [31]
Feature size F | Projected | The same as CMOS transistor
Cell size | Demonstrated | 4F² [32]
Cell size | Projected | 4F²
W/E speed | Demonstrated | 10 ns [33]
W/E speed | Projected | Ferroelectric switching time < 100 ps
W/E operation voltage | Demonstrated | ±5 V (20 ns pulse) [31]
W/E operation voltage | Projected | n/a
Write energy per bit | Demonstrated | 1 fJ [34]
Write energy per bit | Projected | 0.1 fJ
Cycling endurance | Demonstrated | 10¹² [6]
Cycling endurance | Projected | > 10¹²
Retention | Demonstrated | 33 d [35]
Retention | Projected | > 10 yr