Figure 1.1 |
History of semiconductor transistors and logic styles |
2 |
Figure 1.2 |
Evolutionary paths of the microelectronics industry |
3 |
Figure 1.3 |
Interconnect system composed of groups of local, semi-global, and global layers. The metal layers in each group are typically of different thickness |
4 |
Figure 1.4 |
Repeaters are inserted at specific distances to improve the interconnect delay |
5 |
Figure 1.5 |
Interconnect shielding to improve signal integrity, (A) single-sided shielding, and (B) double-sided shielding. The shield and signal lines are, respectively, illustrated by the gray and white color |
5 |
Figure 1.6 |
Cross-section of a joint MOS (JMOS) inverter |
7 |
Figure 1.7 |
Reduction in wirelength where the original 2-D circuit is partitioned into two and four tiers |
8 |
Figure 1.8 |
Heterogeneous 3-D SoC comprising sensors and processing tiers |
9 |
Figure 2.1 |
Three-dimensional stacked inverter |
16 |
Figure 2.2 |
Examples of SiP technologies. (A) Wire-bonded SiP, (B) solder balls at the perimeter of the tiers, (C) area array vertical interconnects, and (D) interconnects on the faces of the SiP |
17 |
Figure 2.3 |
Different communication schemes for 3-D ICs. (A) Short TSVs, (B) inductive coupling, and (C) capacitive coupling |
18 |
Figure 2.4 |
Typical SoP, which can include both SiP and SoCs |
19 |
Figure 2.5 |
Manufacturing and design challenges for 3-D integration |
20 |
Figure 2.6 |
System miniaturization through the integration of sophisticated 3-D ICs |
21 |
Figure 2.7 |
Wire-bonded SiP. (A) Dissimilar dies with multiple row bonding, (B) wire-bonded stack delimited by spacer, (C) SiP with die-to-die and die-to-package wire bonding, and (D) top view of wire-bonded SiP |
22 |
Figure 2.8 |
SiP with peripheral connections. (A) Solder balls, (B) through hole via and spacers, and (C) through hole via in a PCB frame structure |
24 |
Figure 2.9 |
Basic manufacturing phases of an SiP. (A) Interposer bumping and solder ball deposition, (B) die attachment, (C) tier stacking, and (D) epoxy underfill for enhanced reliability |
25 |
Figure 2.10 |
Cross-section of the SiP after removing the mold. (A) The SiP encapsulated in epoxy resin, (B) sawing to expose the metal traces, and (C) sawing to expose the bonding wires |
28 |
Figure 2.11 |
Interposer-based 2.5-D systems where (A) ICs are mounted on only one face of the interposer (single side), and (B) ICs are attached to both sides of the interposer (double side) |
29 |
Figure 2.12 |
Embedded interposer within a package substrate enabling multi-IC integration |
31 |
Figure 2.13 |
An SiP system comprising (A) a top IC with Cu pillars and a bottom IC with solder bumps on a TSH interposer, and (B) the dimensions of several components (not shown to scale) |
33 |
Figure 3.1 |
Typical interconnect paths for (A) wire-bonded SiP, (B) SiP with solder balls, and (C) 3-D IC with TSVs |
38 |
Figure 3.2 |
Cross-section of a stacked 3-D IC with a planarized heat shield to avoid degradation of the transistor characteristics on the first layer due to the temperature of the fabrication processes |
40 |
Figure 3.3 |
Cross-section of a device level stacked 3-D IC with a PMOS device on the bottom layer and an NMOS device in recrystallized silicon on the second layer |
40 |
Figure 3.4 |
Processing steps for laterally crystallized TFT based on Ge-seeding. (A) Deposition of amorphous silicon, (B) creating seeding windows, (C) deposition of seeding materials, (D) producing silicon islands, and (E) processing of TFTs |
41 |
Figure 3.5 |
Processing steps for vertical and lateral growth of 3-D SOI devices. (A) Definition of SOI islands, (B) silicon dioxide deposition, (C) formation of SEG window, (D) silicon growth within the SEG window, (E) etching of redundant silicon with CMP, (F) definition of upper device layer, (G) deposition of upper layer, (H) formation of SOI islands on the upper tier |
43 |
Figure 3.6 |
Basic processing steps for a 3-D inverter utilizing the local clustering approach. (A) Oxide deposition, (B) wafer patterning and active area definition, (C) low temperature oxide deposition, (D) deposition of nitride film, (E) via formation at the drain side, (F) etching of the nitride film, (G) boron doping, (H) active area definition, (I) gate oxide growth by thermal oxidation, (J) deposition of doped polysilicon |
44 |
Figure 3.7 |
Sequential process for fabricating monolithic 3-D circuits, where (A) the SOI devices in the first layer are manufactured with standard SOI processes, (B) molecular bonding allows the transfer of a high quality substrate, (C) the devices in the upper layers are formed, and (D) metal contacts connect the device layers |
45 |
Figure 3.8 |
Monolithically stacked devices where the interlayer contacts (3-D contact) and the standard metal contacts (tungsten plug) connecting the devices are illustrated. The 3-D contact has similar traits to a standard contact connecting two metal layers |
46 |
Figure 3.9 |
Typical fabrication steps for a 3-D IC process. (A) Wafer preparation, (B) TSV etching, (C) wafer thinning, bumping, and handle wafer attachment, (D) wafer bonding, and (E) handle wafer removal |
49 |
Figure 3.10 |
The cavity alignment method, (A) the cavity template is aligned and bonded to the substrate, (B) the individual tiers are placed in the cavity through compression, (C) the 3-D stack is assembled through thermal compression, and (D) the cavity template is removed |
52 |
Figure 3.11 |
Process for face-to-face bonding and substrate assembly, removing the need for TSVs |
53 |
Figure 3.12 |
Metal-to-metal bonding; (A) square bumps, and (B) conic bumps for improved bonding quality |
54 |
Figure 3.13 |
Capacitively coupled 3-D IC. The large plate capacitors are utilized for power transfer, while the small plate capacitors provide signal propagation |
56 |
Figure 3.14 |
Inductively coupled 3-D ICs. Galvanic connections may be used for power delivery |
57 |
Figure 3.15 |
Basic steps of a via-last manufacturing process (not to scale) |
59 |
Figure 3.16 |
Basic steps of a via-first manufacturing process (not to scale) |
59 |
Figure 3.17 |
Basic steps of a via-middle manufacturing process (not to scale) |
60 |
Figure 3.18 |
TSV formation and filling after FEOL (wafer thinning) and BEOL (via-last approach) |
61 |
Figure 3.19 |
TSV shapes. (A) Straight and (B) tapered |
62 |
Figure 3.20 |
The scallops formed due to the time multiplexed nature of the BOSCH process |
62 |
Figure 3.21 |
Poor TSV filling resulting in void formation, (A) large void at the bottom, and (B) seam void |
63 |
Figure 3.22 |
Structure of partial TSV and related materials |
64 |
Figure 4.1 |
Early TSV from patents filed by (A) William Shockley, and (B) Merlin Smith and Emanuel Stern of IBM |
68 |
Figure 4.2 |
Equivalent π-model of a TSV |
69 |
Figure 4.3 |
Top view of a TSV in silicon depicting the oxide layer, TiCu seed layer, and copper TSV |
71 |
Figure 4.4 |
3-D via structure. (A) 3-D via with top and bottom copper landings, and (B) equivalent structure without metal landings |
72 |
Figure 4.5 |
Current profile due to the proximity effect for (A) currents propagating in opposite directions, and (B) currents flowing in the same direction |
76 |
Figure 4.6 |
Cross-sectional view of different CMOS technologies with TSVs depicting the formation of a depletion region around the TSV in (A) bulk CMOS, and (B) bulk CMOS with a p+ buried layer. The TSVs in either PD-SOI (shown in (C) top) or FD-SOI (shown in (C) bottom) reveal minimal formation of a depletion region |
80 |
Figure 4.7 |
Ratio of the total TSV capacitance to the oxide capacitance as a function of applied voltage Vg |
81 |
Figure 4.8 |
Physical parameters and materials used in the compact models of a TSV for a single device layer, as listed in Table 4.6. (A) Side view, and (B) top view |
83 |
Figure 4.9 |
Resistance of a cylindrical 3-D via at DC, 1 GHz, and 2 GHz |
86 |
Figure 4.10 |
Per cent error as a function of frequency for the resistance of a 3-D via (a.r.=aspect ratio) |
87 |
Figure 4.11 |
Self-inductance L11 of a cylindrical 3-D via |
93 |
Figure 4.12 |
Mutual inductance L21 of a cylindrical 3-D via with a 20 µm diameter |
94 |
Figure 4.13 |
Mutual inductance L21 between two 3-D vias with different lengths (D=10 µm, and 3Lg=Ly) |
95 |
Figure 4.14 |
Capacitance of a cylindrical 3-D via over a ground plane |
101 |
Figure 4.15 |
Coupling capacitance between two 3-D vias over a ground plane (D=20 µm) |
103 |
Figure 4.16 |
Frequency range applicable to Q3D models and closed-form inductance expressions |
106 |
Figure 4.17 |
Critical dimensions of a 3-D via over a ground plane for the MITLL 3-D process |
108 |
Figure 4.18 |
Circuit model for RLC extraction of (A) two 3-D vias, and (B) two 3-D vias with a shield via between two signal vias |
111 |
Figure 4.19 |
Effect of a return path on the loop inductance. (A) Return path placed on 3-D via 2, and (B) return path placed on 3-D via 3 |
114 |
Figure 5.1 |
Heterogeneous 3-D integrated circuit |
120 |
Figure 5.2 |
Model of noise coupling from TSV to a victim device through a silicon substrate. (A) General model, and (B) reduced model |
121 |
Figure 5.3 |
Noise coupling from a TSV to a victim device. (A) Short-circuit Ge substrate model, and (B) open circuit GaAs substrate model |
123 |
Figure 5.4 |
Isolation efficiency of a noise coupled system for different substrate materials. (A) Silicon, (B) germanium, and (C) gallium arsenide |
125 |
Figure 5.5 |
Equivalent small-signal model of a noise coupled system |
126 |
Figure 5.6 |
Resistance and inductance versus line width of the ground network. The ground network is composed of copper-based interconnects |
128 |
Figure 5.7 |
Isolation efficiency of a noise coupled system as a function of the line width of the ground network for different substrate materials. (A) Silicon, (B) germanium, and (C) gallium arsenide |
129 |
Figure 5.8 |
Distance from aggressor module “A” on tier m to victim module “V” on tier n |
130 |
Figure 5.9 |
Effect of distance between an aggressor and victim on the isolation efficiency for a Ge substrate. The resonant frequency is observed at the peak isolation efficiency due to the increasing reactance of the ground network |
130 |
Figure 5.10 |
Keep out region around an aggressor TSV. The victim modules (Victim) should be placed outside this region |
131 |
Figure 5.11 |
Isolation efficiency versus frequency and radius of keep out region for different substrate materials. (A) Si, (B) Ge, and (C) GaAs |
132 |
Figure 5.12 |
Keep out region around aggressor TSV for Nmax=−40 dB. The victim modules should be placed on the isolation efficiency surface below the base surface |
133 |
Figure 5.13 |
Comparison between SPICE model and extracted transfer function for different substrate materials. (A) Si, (B) Ge, and (C) GaAs |
134 |
Figure 6.1 |
An inductive link between the transmitter and receiver circuits including the coupled on-chip inductors |
139 |
Figure 6.2 |
Equivalent circuit of an inductive link including the parasitic resistance and capacitance of the on-chip inductors |
139 |
Figure 6.3 |
Model for pulse modulation. (A) Current of the transmitter modeled as a Gaussian pulse and (B) voltage induced on the receiver |
140 |
Figure 6.4 |
Coupling efficiency for distance X and decreasing outer diameter dout |
141 |
Figure 6.5 |
Square spiral on-chip inductor with n=7 turns, illustrating the geometric parameters |
142 |
Figure 6.6 |
Flow diagram for the design of the coils in an inductive link under power, performance, and area constraints |
144 |
Figure 6.7 |
Transceiver circuit of a synchronous inductive link |
146 |
Figure 6.8 |
Transceiver circuit of an asynchronous inductive link |
147 |
Figure 6.9 |
Block diagram of an inductive coupling scheme with burst transmission |
148 |
Figure 6.10 |
Efficiency of TSV and inductive interfaces with higher multiplexing density |
151 |
Figure 6.11 |
Top view of a structure comprising an inductive link and the return path through a power delivery network placed in different locations |
154 |
Figure 6.12 |
Noise induced by an inductive pair for varying δc |
154 |
Figure 6.13 |
Array of inductive links and P/G loops connected to C4 supply pads. The power and ground lines are depicted, respectively, by solid and dashed lines |
156 |
Figure 6.14 |
Parasitic noise induced on a power wire depending upon the distance of the wire from the inductor |
157 |
Figure 6.15 |
Power delivery network topologies. (A) Interdigitated P/G–P/G topology, (B) paired type-I P/G–P/G topology, and (C) paired type-II P/P–G/G topology |
157 |
Figure 6.16 |
Wireless power transmission for standard inductive coupling |
160 |
Figure 7.1 |
An example of the method used to determine the distribution of the interconnect length. Group NA includes one gate, group NB includes the gates located at a distance smaller than l (encircled by the dashed curve), and NC is the group of gates at distance l from group NA (encircled by the solid curve). In this example, l=4 (the distance is measured in gate pitches) |
164 |
Figure 7.2 |
An example of the method used to determine the interconnect length distribution in 3-D circuits. (A) Partial Manhattan hemisphere, and (B) cross-section of the partial Manhattan hemisphere along e-e′. The gates in NB and NC are shown, respectively, with light and dark gray tones |
167 |
Figure 7.3 |
Example of starting and nonstarting gates. Gates P and Q can be starting gates while S is a nonstarting gate |
167 |
Figure 7.4 |
Possible vertical interconnections for two cells with each cell containing n gates |
169 |
Figure 7.5 |
Interconnect length distribution for a 2-D and 3-D IC |
170 |
Figure 7.6 |
Variation of gate pitch, total interconnect length, and interconnect power consumption with the number of tiers |
171 |
Figure 8.1 |
Cross-sectional view. (A) TSV middle, and (B) TSV last |
176 |
Figure 8.2 |
Processing flow for TSV middle and TSV last considered in terms of cost and complexity |
177 |
Figure 8.3 |
Comparison of TSV lithography cost for different TSV process flows and geometries. The difference in cost between TSV middle and TSV last is due to different process equipment |
178 |
Figure 8.4 |
Comparison of processing cost of the TSV etching step for different TSV geometries. The processing cost is normalized to the cost of etching a 5×50 TSV middle structure |
179 |
Figure 8.5 |
Comparison of processing cost to deposit the TSV oxide liner for different TSV geometries. The processing cost is normalized to the cost of the liner deposition for a 5×50 TSV middle structure. In the case of TSV middle, the oxide liner at the field of the wafer is removed by CMP. For TSV last, no CMP polishing of the liner is necessary |
180 |
Figure 8.6 |
Comparison of in-via oxide liner etch processing cost for different TSV last geometries. The TSVs with a smaller diameter require longer liner etch processing time. The processing cost is normalized to the process cost of a 5×50 TSV last flow |
181 |
Figure 8.7 |
Cost comparison of barrier seed process for TSV middle geometries. Non-PVD deposition approaches can be applied to TSV sizes of 3×50 and 2×40. The processing cost is normalized to the process cost of a 5×50 TSV middle flow |
182 |
Figure 8.8 |
Cost comparison of barrier seed process for TSV last geometries. PVD processing is applied for all TSV last sizes. The processing cost is normalized to the process cost of 5×50 TSV middle flow |
182 |
Figure 8.9 |
Cost comparison of TSV Cu plating process for both TSV middle and TSV last flows for different TSV geometries. The cost comparison considers processing and material costs and is normalized to the process cost of the 5×50 TSV middle (POR) flow |
183 |
Figure 8.10 |
Cost of Cu CMP for different Cu overburden thicknesses. The fine Cu polish step is the dominant cost component for Cu thicknesses up to 2,000 nm |
183 |
Figure 8.11 |
CMP benchmark of deposition materials used in TSV processing. For each material, a thickness of 100 nm is considered to estimate polishing time and slurry consumption |
184 |
Figure 8.12 |
Cost benchmark of backside processing steps for TSV middle and TSV last flows |
186 |
Figure 8.13 |
Benchmark of overall processing costs for different TSV geometries for both TSV middle and TSV last flows |
186 |
Figure 8.14 |
Cost benchmark of the 5×50 and 10×100 TSV flows. For the 5×50 TSV middle, polishing the oxide liner increases the cost of the CMP step. For the 5×50 TSV last process, the liner deposition, liner etch steps, and backside CMP are the primary cost differentiators. For the 10×100 TSV middle process, polishing the oxide liner increases the overall processing cost by up to 9% as compared to 10×100 TSV last flow |
187 |
Figure 8.15 |
Cost benchmark of TSV middle processing flows for different TSV geometries. Note that the TSV size and pitch are scaled while maintaining the TSV processing cost |
188 |
Figure 8.16 |
System integrated on top of an interposer substrate. The TSVs connect the interposer to the package substrate |
189 |
Figure 8.17 |
Comparison of processing costs per wafer for different features of an interposer substrate. All of the processing costs are normalized to the wafer cost of processing a 10 µm × 100 µm TSV middle flow |
190 |
Figure 8.18 |
Different interposer configurations: (A) single metal layer over a power metal plane, (B) two thick metal layers, and (C) MIM capacitor between power and ground metal planes with two thick metal layers |
192 |
Figure 8.19 |
Comparison of wafer-level processing cost for different interposer structures. The cost of each component is normalized to the cost of the 10 µm × 100 µm TSV |
192 |
Figure 8.20 |
3-D stacking approaches: vertical stack of three active dice, (A) D2W or W2W stacking, and (B) 2.5-D interposer-based stacking |
193 |
Figure 8.21 |
3-D integration technologies. (A) Three die stack. The stacking interface between the dice is microbumps. TSVs are fabricated on die 1 and die 2 to enable vertical signal propagation. (B) Three active dice on an interposer substrate. The active dice are stacked using microbumps. The TSVs are fabricated within the interposer die to provide access to the backside. The interposer is connected to the package substrate (not shown) by Cu pillars |
194 |
Figure 8.22 |
Comparison of processing cost per wafer to enable 3-D stacking. The features are processed either on the active dice and/or on the interposer substrate. For the die pick and place step, processing of 541 die/wafer is assumed |
195 |
Figure 8.23 |
Cost comparison of different stacking approaches and components. The cost of the compound yield losses is illustrated for each stacking approach. An area of 10 mm×10 mm is considered for the active dice |
196 |
Figure 8.24 |
Process cost and total cost of a 3-D system per die area as a function of active die size. Three different 3-D integration approaches are considered: D2W, W2W, and 2.5-D interposer |
197 |
Figure 8.25 |
Effect of interposer processing yield and test fault coverage on the cost of an interposer-based 2.5-D system |
198 |
Figure 8.26 |
Cost of 2.5-D interposer-based system in terms of the size of the stacked active die, interposer die processing yield (YINT), and fault coverage (FC) of interposer prestack testing. (A) YINT=99%, FC=100%, (B) YINT=99%, FC=50%, (C) YINT=99%, FC=0%, (D) YINT=90%, FC=100%, (E) YINT=90%, FC=50%, (F) YINT=90%, FC=0%, (G) YINT=80%, FC=100%, (H) YINT=80%, FC=50%, and (I) YINT=80%, FC=0% |
199 |
Figure 9.1 |
Example of positive (shown with solid lines) and negative (shown with dashed lines) step lines for block b |
205 |
Figure 9.2 |
Example of SP representation, where (A) is a group of blocks comprising a floorplan, (B) positive step lines for these blocks, and (C) negative step lines for the blocks |
206 |
Figure 9.3 |
Example of a net bounding box connecting pins from blocks a and c. The HPWL metric is the half length of the perimeter of the net bounding box. The solid line shows a possible net route to connect pins of blocks a and c marked by the solid squares |
207 |
Figure 9.4 |
Example of computing slack where (A) the blocks are floorplanned in left-to-right and top-to-bottom manner, and (B) the blocks are floorplanned in right-to-left and bottom-to-top mode |
208 |
Figure 9.5 |
Upper bound of area and volume for two- and three-dimensional slicing floorplans (F) depicted, respectively, by the solid and dashed curve for different shape aspect ratios. Vtotal (Atotal) and Vmax (Amax) are, respectively, the total and maximum volume (area) of a 3-D (2-D) system |
209 |
Figure 9.6 |
Floorplanning strategies for 3-D ICs. (A) Single step approach, and (B) multistep approach |
211 |
Figure 9.7 |
Different metrics to determine the length of a 3-D net, (A) the classic HPWL metric including only the pins of the net in all tiers, (B) an extended bounding box including the TSV locations, (C) the bounding box of the segments of the net within tier 2, and (D) the bounding box of the segment of the net belonging to tier 3 |
215 |
Figure 9.8 |
Flow of two stage floorplanning methods considering the TSV locations |
216 |
Figure 9.9 |
Whitespace within the bounding box of the intertier net can be used for placing a TSV without increasing the wirelength. This whitespace defines the candidate TSV islands. The whitespace outside the bounding box describes noncandidate TSV islands, as placing a TSV into these regions increases the wirelength |
218 |
Figure 9.10 |
A two tier floorplan with three intertier connected nets, (A) the blocks and pins, (B) the virtual die with the projection of the bounding box of each net, and (C) the routed nets and corresponding TSV island are shown. The notation pi,j is the pin of net i in tier j. The pins connected by each net are also indicated in the figure |
220 |
Figure 9.11 |
A three tier circuit, (A) the independent feasible region for a two pin net starting from tier 1 and terminating in tier 3 is shown by the dashed rectangle, (B) the allowed row (intertier) and column (intratier) connections are depicted with dashed lines, and (C) a potential route for this net is shown by the solid line. The dots illustrate available locations for buffers in each row (tier) |
222 |
Figure 9.12 |
Design flow of microarchitectural floorplanning process for 3-D microprocessors |
225 |
Figure 9.13 |
Two force directed placement processes, (A) the TSVs and circuit cells are placed simultaneously, and (B) the TSVs are placed prior to the circuit cells and behave as placement obstacles |
232 |
Figure 9.14 |
TSV assignment based on the MST of a net, where the closest TSV to the shortest edge of the net is inscribed by the dotted ellipse |
233 |
Figure 9.15 |
Analytic placement process for 3-D circuits considering number of TSVs and wirelength |
238 |
Figure 9.16 |
Process for determining available whitespace (WS), which is illustrated by the white regions |
240 |
Figure 9.17 |
Block placement of an SOP. (A) Initial placement, and (B) increase in the total area in the x and y directions to extend the area of the whitespace |
240 |
Figure 9.18 |
Layout of supercells. Supercells have the same height and varying width. The space around the supercells is used for buffers and TSVs |
243 |
Figure 9.19 |
An example of computing the matrices of a two tier grid, (A) route counts, and (B) routing density |
243 |
Figure 9.20 |
Channel alignment procedure to create intertier routing channels |
246 |
Figure 9.21 |
Pseudocode of 3-D routing algorithm targeting reductions in both performance and temperature |
247 |
Figure 9.22 |
An SOP consisting of n tiers. The vertical dashed lines correspond to vias between the routing layers, and the thick vertical solid lines correspond to through silicon vias that penetrate the device layers |
248 |
Figure 9.23 |
Stages of a 3-D global routing algorithm |
249 |
Figure 9.24 |
Layout windows with different area markers; (A) layout window for tier 1, and (B) layout window for tier 2 (the windows are not on the same scale) |
251 |
Figure 10.1 |
Global interconnect structures for impedance extraction. (A) Three parallel metal lines over a ground plane in a 2-D circuit, and (B) three parallel metal lines sandwiched between two ground planes in a 3-D circuit |
254 |
Figure 10.2 |
A three-tier FDSOI 3-D circuit. Tiers one and two are face-to-face bonded, while tiers two and three are face-to-back bonded |
255 |
Figure 10.3 |
Capacitance extraction for an intertier via structure, (A) intertier via surrounded by orthogonal metal layers, and (B) capacitance values for different via sizes and spacing values. The same dielectric material is assumed for all of the layers (i.e., εd = εi = εSiO2) |
256 |
Figure 10.4 |
Capacitance extraction for an intertier via structure, (A) intertier via through layers of dielectric and the bonding interface, surrounded by eight intertier vias, and (B) capacitance values for different via sizes and spacings |
257 |
Figure 10.5 |
Capacitance extraction for an intertier via structure, (A) intertier via through silicon substrate, surrounded by a thin insulator layer, and (B) capacitance for different via sizes and thicknesses of the insulator layer |
258 |
Figure 10.6 |
Two terminal intertier interconnect with single via and corresponding electrical model |
258 |
Figure 10.7 |
An example of interconnect sizing. (A) An interconnect of minimum width, Wmin, (B) uniform interconnect sizing W>Wmin, and (C) nonuniform interconnect sizing W=f(l) |
260 |
Figure 10.8 |
SPICE measurements of the 50% propagation delay of a 600 μm line versus the via location l1 for different values of r21. The interconnect parameters are r1=79.5 Ω/mm, rv1=5.7 Ω/mm, cv1=6 pF/mm, c2=439 fF/mm, c12=1.45, lv=20 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=50 Ω and CL=50 fF |
262 |
Figure 10.9 |
SPICE measurements of the 50% propagation delay for a 600 μm line versus the via location l1 for different values of r21. The interconnect parameters are r1=79.5 Ω/mm, rv1=5.7 Ω/mm, cv1=6 pF/mm, c2=439 fF/mm, c12=0.46, lv=20 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=50 Ω and CL=50 fF |
263 |
Figure 10.10 |
Decrease in the delay improvement caused by the nonoptimal placement of the intertier via for a 500 μm interconnect. The interconnect parameters are r1=23.5 Ω/mm, rv1=270 Ω/mm, cv1=270 fF/mm, c2=287 fF/mm, lv=15 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=30 Ω and CL=100 fF |
265 |
Figure 10.11 |
Decrease in the delay improvement due to the nonoptimal placement of the intertier via for a 500 μm interconnect. The interconnect parameters are r1=23.5 Ω/mm, rv1=6.7 Ω/mm, cv1=270 fF/mm, c2=287 fF/mm, lv=15 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=100 Ω and CL=100 fF |
266 |
Figure 10.12 |
Intertier interconnect consisting of m segments connecting two circuits located n tiers apart |
267 |
Figure 10.13 |
Intertier interconnect model composed of a set of nonuniform distributed RC segments |
268 |
Figure 10.14 |
Case (iii) of the two terminal net heuristic. The allowed interval is iteratively decreased ensuring the optimum via location is eventually determined |
271 |
Figure 10.15 |
A subset of interconnect instances depicted by the dashed lines for case (iv) of the via placement heuristic. The interconnect traverses eight tiers and has a length L=1.455 mm. The resistance rj and capacitance cj of each interconnect segment range, respectively, from 10 to 50 Ω/mm and 100 to 500 fF/mm |
271 |
Figure 10.16 |
Pseudocode of the proposed two terminal net via placement algorithm |
273 |
Figure 10.17 |
Average and maximum improvement in delay for different range of interconnect segment resistance and capacitance ratios. The vias are placed either at the center of the allowed intervals or randomly, as explained in the legend of the diagram |
276 |
Figure 10.18 |
Comparison of the average Elmore delay based on wire sizing and optimum via placement techniques. The instance where the optimum via placement outperforms wire sizing (and vice versa) is also depicted |
277 |
Figure 10.19 |
NAPC for minimum width and wire segments of equal length, wire sizing, and wire segments of equal length, and minimum width and optimum via placement, yielding segments of different length |
278 |
Figure 11.1 |
Intertier interconnect tree. (A) Typical intertier interconnect tree, and (B) intervals and directions that the intertier via can be placed |
282 |
Figure 11.2 |
Different intertier via moves. (A) Type-1 move (allowed), (B) type-2 move (allowed), and (C) type-3 move (prohibited) |
283 |
Figure 11.3 |
Simple interconnect tree, illustrating a critical path (w3=1) and on path and off path intertier vias |
286 |
Figure 11.4 |
Pseudocode of the Interconnect Tree Via Placement Algorithm (ITVPA) |
287 |
Figure 11.5 |
Pseudocode of the near-optimal Single Critical Sink interconnect tree Via Placement Algorithm (SCSVPA) |
288 |
Figure 11.6 |
A symmetric tree including two intertier vias. The interconnect parameters per tier are r1=10.98 Ω/mm, r2=11.97 Ω/mm, r3=96.31 Ω/mm, c1=147.89 fF/mm, c2=202 fF/mm, and c3=388.51 fF/mm, and the allowed interval ldi,v2=75 μm |
290 |
Figure 12.1 |
Cross-section of a 3-D stack illustrating (A) the variety of materials, including the package and heat sink, which increase the complexity of the thermal analysis process, and (B) a thermal circuit used to model the flow of heat along the z-direction |
297 |
Figure 12.2 |
Schematic of a cross-section of a 3-D system with intertier liquid cooling through (A) microfluidic channels, and (B) through a micropin array |
300 |
Figure 12.3 |
Example of the duality of thermal and electrical systems |
304 |
Figure 12.4 |
Thermal model of a 3-D circuit where 1-D heat transfer is assumed. Each layer is assumed homogeneous with a single thermal conductivity |
305 |
Figure 12.5 |
Increase in temperature in a 3-D circuit for different number of tiers and power densities |
307 |
Figure 12.6 |
Different vertical heat transfer paths within a 3-D IC |
308 |
Figure 12.7 |
Maximum temperature versus power density for 3-D ICs, SOI, and bulk CMOS. The difference among the curves for the 3-D ICs is that the first curve (3-D horizontal and vertical) includes thermal paths with a horizontal interconnect segment, while the second curve includes only continuous vertical flow of heat through the wires |
310 |
Figure 12.8 |
Unit tile (or cell) including a thermal resistor in each x, y, z-direction. A thermal capacitor models the heat capacity of the tile and a heat source qx,y,z for the power consumed by the devices or the joule heating of the wires within this cell |
311 |
Figure 12.9 |
Thermal model of a 3-D IC. (A) A 3-D tile stack, (B) one pillar of the stack, and (C) an equivalent thermal resistive network. R1 and Rp correspond, respectively, to the thermal resistance of the thick silicon substrate of the first tier and the thermal resistance of the package |
312 |
Figure 12.10 |
Cross-section of a cell including a TSV within the silicon substrate |
314 |
Figure 12.11 |
Simulation setup for determining the thermal conductivity of the cell shown in Fig. 12.10 along (A) the xy-plane, and (B) along the z-direction |
314 |
Figure 12.12 |
A segment of a three tier 3-D IC with a TTSV, where (A) is the geometric structure, and (B) is the cross-section of a TTSV of this segment. The area of the circuit is denoted by A0. The three main paths of heat transfer are depicted by the dashed lines |
316 |
Figure 12.13 |
Thermal model of a TTSV in a three tier circuit, where double notation is used to demonstrate that the model can be extended to a 3-D stack of n tiers |
318 |
Figure 12.14 |
Maximum rise in temperature in a three tier 3-D circuit for different dielectric liner thicknesses, where DTSV=10 μm. The other parameters are tSiO2=7 μm, tb=1 μm, tSi2=tSi3=45 μm, k1=1.3, and k2=0.55 |
319 |
Figure 12.15 |
Neighboring cells bending the isothermal curves due to the TSVs |
319 |
Figure 12.16 |
Schematic of a tapered TSV |
320 |
Figure 12.17 |
Thermal model of microchannel with conductive and convective thermal resistances |
322 |
Figure 12.18 |
Schematic illustration of the thermal wake effect, which leads to an exponential decay of the temperature downstream from the channel due to the heated cells located upstream. The transfer of heat occurs both downstream and transverse to the flow within the channel |
324 |
Figure 12.19 |
A four tier 3-D circuit discretized into a mesh |
325 |
Figure 12.20 |
Traditional V-cycles of multigrid methods with coarsening and refining stages |
326 |
Figure 12.21 |
Coarsening process excluding the BEOL layers in the z-direction to ensure that valuable physical information is not lost, improving the overall efficiency and accuracy of the multigrid technique |
327 |
Figure 12.22 |
Principle of power blurring method |
328 |
Figure 13.1 |
Cost function of the temperature |
335 |
Figure 13.2 |
A bucket structure example for a two tier circuit consisting of 12 blocks. (A) A two tier 3-D IC, (B) a 2×2 bucket structure imposed on a 3-D IC, and (C) the resulting bucket index |
335 |
Figure 13.3 |
Intertier moves. (A) An initial placement, (B) a z-neighbor swap between blocks a and h, and (C) a z-neighbor move for block l from the first tier to the second tier |
336 |
Figure 13.4 |
Three stage floorplanning process based on the force directed method |
340 |
Figure 13.5 |
Transition from a continuous 3-D space to discrete tiers. Block 2 is assigned to either the lower or upper tier, which results in different overlaps |
342 |
Figure 13.6 |
Mapping of a task graph onto physical PEs within a 3-D NoC |
343 |
Figure 13.7 |
Temperature balancing heuristic where (A) the tasks are sorted in descending power and assigned to super-tasks, (B) the temperature of each core, and (C) the super-tasks assigned to the super-cores |
348 |
Figure 13.8 |
First order thermal model, where each core is thermally modeled by a node with power Pi, specific heat Ci, and inter- and intratier thermal resistances |
349 |
Figure 13.9 |
3-D CMP consisting of a single four core tier with three tiers of SRAM and one tier of MRAM |
354 |
Figure 13.10 |
Dynamic thermal management schemes for a 3-D CMP employing a mixture of SRAM, MRAM, and DVFS, (A) SRAM-1 GHz core, (B) SRAM-3 GHz core, (C) SRAM-core DVFS, (D) hybrid-3 GHz core, and (E) hybrid-core DVFS |
355 |
Figure 13.11 |
Cross-sectional view of a 3-D ultra-thin system with peripheral copper TSVs |
358 |
Figure 13.12 |
Cross-sectional view of a two tier structure with a spatial heat source to evaluate the effects of the metal grid/plate and thickness of the adhesive materials on the thermal behavior of the structure (not to scale) |
359 |
Figure 13.13 |
Average temperature of a circuit surrounded by resistors used as heating elements where different means such as a TSV or metal ring are used to thermally isolate the circuit |
360 |
Figure 13.14 |
Thermal conductivity versus thermal via density |
366 |
Figure 13.15 |
Multi-level routing process with thermal via planning |
368 |
Figure 13.16 |
Heat propagation paths within a 3-D grid |
369 |
Figure 13.17 |
Routing grid for a two tier 3-D IC. Each horizontal edge of the grid is associated with a horizontal wire capacity. Each vertical edge is associated with an intertier via capacity |
372 |
Figure 13.18 |
Effect of a thermal wire on the routing capacity of each grid cell. vi and vj denote the capacity of the intertier vias for, respectively, cell i and j. The horizontal cell capacity is equal to the width of the cell boundary |
373 |
Figure 13.19 |
Flowchart of a temperature aware 3-D global routing technique |
373 |
Figure 13.20 |
Floorplan of a 3-D MPSoC, (A) cores and L2 caches are placed in separate tiers, and (B) cores and caches share the same tier |
375 |
Figure 14.1 |
Heat propagation from one tier spreading into a second stacked tier |
382 |
Figure 14.2 |
Physical layout, (A) on-chip resistive heater, (B) on-chip four-point resistive thermal sensor, and (C) overlay of the resistive heater and resistive thermal sensor |
384 |
Figure 14.3 |
Physical layout, (A) back metal resistive heater and (B) back metal four-point resistive thermal sensor |
385 |
Figure 14.4 |
Microphotograph of the test circuit depicting the back metal pattern with an overlay indicating the location of the on-chip thermal test sites |
386 |
Figure 14.5 |
Placement of thermal heaters and sensors, respectively, in metals 2 and 3 in the two stacked device planes. The placement of the back metal heaters and sensors is also shown |
387 |
Figure 14.6 |
Calibration of (A) on-chip thermal sensors, and (B) back metal thermal sensors |
389 |
Figure 14.7 |
Experimental results for the different test conditions. Each label describes the device plane, site location of the heater, and whether active cooling is applied |
390 |
Figure 14.8 |
Structure of the 3-D test circuit consisting of two silicon tiers and one back metal layer. Each tier has two separately controlled heaters (H1 and H2). The back metal is connected to WTop using thermal through silicon vias |
403 |
Figure 14.9 |
Comparison of temperatures for a horizontal path (length=1,300 μm) |
404 |
Figure 14.10 |
Comparison of temperatures for a vertical path (length=10 μm) |
404 |
Figure 14.11 |
Comparison of temperatures for a diagonal path (length=1,300 μm) |
405 |
Figure 14.12 |
Comparison of thermal resistance per unit length for a horizontal path (length=1,300 μm) |
405 |
Figure 14.13 |
Comparison of thermal resistance per unit length for a vertical path (length=1,300 μm) |
406 |
Figure 14.14 |
Comparison of thermal resistance per unit length for a diagonal path (length=1,300 μm) |
406 |
Figure 14.15 |
Simulated temperature at the WTop site 1 sensor for four densities of TSVs placed between the WTop site 1 heater/sensor pair and the back metal |
407 |
Figure 15.1 |
A data path depicting a pair of sequentially-adjacent registers |
411 |
Figure 15.2 |
Simple example of the MMM clock synthesis method where a clock tree is generated. (A) Without look-ahead, and (B) with look-ahead. An xy-cut leads to larger skew in (A) than a yx-cut in (B) |
412 |
Figure 15.3 |
TRR where the core of the region is a Manhattan arc and the boundary points are at a radius distance from the core |
414 |
Figure 15.4 |
Merging segment ms(u) for node u that is the parent node of nodes a and b based on TRRa and TRRb |
415 |
Figure 15.5 |
TRR whose core is a point, the placement location pl(p) of the parent node p (which is known from the previous iteration), and whose radius is equal to the wirelength of edge eu. The segment of ms(u) within the TRR is the thick line and represents the set of valid placement locations for node u |
416 |
Figure 15.6 |
Example of the DME method for a tree with eight sinks. (A) to (C) Bottom-up phase where the recursive derivation of the merging segments is accomplished and (D) to (F) top-down phase where the exact placement of each internal node is determined |
416 |
Figure 15.7 |
Two-dimensional four level H-tree |
418 |
Figure 15.8 |
Buffered and symmetric clock tree that drives a grid, where each unit grid constitutes a local clock network modeled as a lumped capacitor Cl_seg |
418 |
Figure 15.9 |
Global 3-D clock distribution networks based on planar symmetric H-trees, where during normal operation (A) one H-tree and multiple TSVs distribute the clock signal, and (B) two H-trees and a root TSV distribute the clock signal |
419 |
Figure 15.10 |
Cross-section of a 3-D stack of five tiers with one dedicated clock tier and four logic tiers |
420 |
Figure 15.11 |
Two clock delivery networks, (A) the networks are shorted only at the initial stages of the clock distribution, and (B) TSVs connect the clock networks at the lower levels of the clock network hierarchy |
422 |
Figure 15.12 |
Multi-TSV clock tree with 13 sinks and three TSVs spanning two tiers |
424 |
Figure 15.13 |
Several abstract trees for a set of eight sinks generated by the MMM-TB algorithm for different bounds of TSVs, (A) is the 2-D view of these trees and the dashed lines denote TSVs, (B) a 3-D view of the same trees, and (C) the resulting connection topologies where the gray rectangles refer to a TSV |
425 |
Figure 15.14 |
Pseudocode of the z-cut procedure for the MMM-TB algorithm |
426 |
Figure 15.15 |
A set of sinks S={a, b, c} where the effect of the recursive z-cuts in the MMM-TB algorithm is exemplified. (A) Two z-cuts are successively applied, (B) the source is in tier 3 and z-cut1 is followed by z-cut2, (C) the source is in tier 2 and the sinks in this tier are first extracted, and (D) the source is in tier 1 and z-cut2 is followed by z-cut1 |
427 |
Figure 15.16 |
Examples of merging segments for two intertier nodes u, v merged with node p. (A) An unbuffered tree, and (B) a buffered tree |
428 |
Figure 15.17 |
A tree with four sinks embedded in two tiers for different cases of embedding the internal nodes x1 and x2 and the root node sr and the resulting number of TSVs for each case. The notation xi,j (si,j) implies the placement of node xi (si) in tier j. (A) TSV=2, (B) TSV=3, (C) TSV=3, (D) TSV=4, (E) TSV=4, (F) TSV=3, (G) TSV=3, and (H) TSV=2 |
429 |
Figure 15.18 |
Different cases to determine the number of embedding tiers for node x, where the children nodes x1 and x2 are (A) clock sinks, and (C) and (E) are internal nodes. The minimum number of TSVs for (A), (C), and (E) is shown, respectively, in (B), (D), and (F) |
430 |
Figure 15.19 |
Three tier clock tree using a single TSV for the intertier connections. This topology is pre-bond testable as each tier includes a network connecting all of the sinks |
432 |
Figure 15.20 |
Pre-bond testable clock tree with multiple TSVs. The buffers are inserted before the TSVs, thereby not changing the capacitance of the tree in tier 1. TGs in tier 2 connect the redundant tree (shown as a dashed line) with the subtrees during pre-bond test. The TGs are switched off after bonding, disconnecting the redundant tree |
433 |
Figure 15.21 |
Portion of a 3-D clock tree consisting of several subtrees STi. The TSVs in (A) are replaced with TSV buffers in (B) to decouple the clock tree in tier 1 from the clock tree in tier 2 |
434 |
Figure 15.22 |
Different cases where TSV and/or clock buffers are inserted. (A) A clock buffer is inserted to balance the delay between the two branches where tdA<tdB, (B) multiple clock buffers are inserted due to long wires or high downstream capacitance, and (C) a TSV buffer is inserted to decouple the downstream clock tree, and a clock buffer is added to counterbalance the delay imbalance caused by the TSV buffer |
434 |
Figure 15.23 |
Self-configured circuit controlling the operation of the TG (N5 and P5) |
439 |
Figure 15.24 |
A two tier clock tree, (A) the initial TSV locations are shown, and (B) TSV1 and TSV3 are relocated within the whitespace. The relocation adds wirelength (shown by the dashed lines) which degrades the performance of the clock tree topology shown in (A) |
440 |
Figure 15.25 |
Pre-clustering stage of the whitespace-aware CTS method, (A) a set of sinks and whitespaces are projected onto a plane, (B) the sinks per tier are located beyond distance β⋅HPWLtier from whitespaces, (C) those sinks within a cluster belong to the same tier, and (D) the root of the subtrees from the clustered sinks in each tier and some non-clustered sinks is depicted |
441 |
Figure 15.26 |
Reconstruction of the merging segments |
442 |
Figure 15.27 |
Different TSV redundancy schemes. (A) Double (N-times) redundancy, (B) 4:2 shared spare topology with two spare TSVs, (C) 4:1 shared spare topology with one spare TSV, and (D) 4:2 shared spare topology with no spare TSVs |
444 |
Figure 15.28 |
Operation of a TSV TFC, (A) a pair TFC, (B) in pre-bond operation, the redundant tree is connected (shown with solid lines) while the TSVs are not present, (C) in post-bond operation with no defects, the clock signal is transferred by the TSVs, and (D) TSV2 is defective and part of the redundant tree is used to propagate the clock signal to an adjacent subtree through TG2 and MUX2 |
445 |
Figure 15.29 |
Example of fault tolerant CTS from adjacent TSVs. TSVA and TSVB are within distance rp and form a TFC pair |
446 |
Figure 16.1 |
Three wafers are individually fabricated with an FDSOI process |
450 |
Figure 16.2 |
The second wafer is face-to-face bonded with the first wafer |
450 |
Figure 16.3 |
The 3-D vias are formed and the surface is planarized with chemical mechanical polishing |
451 |
Figure 16.4 |
The backside vias are etched, and the backside metal is deposited on the second wafer |
451 |
Figure 16.5 |
The third wafer is face-to-back bonded with the second wafer and the 3-D vias for that tier are formed |
451 |
Figure 16.6 |
Backside metal is deposited and glass layers are cut to create openings for the pads |
452 |
Figure 16.7 |
Layer thicknesses in the 3-D IC MITLL technology |
453 |
Figure 16.8 |
Block diagram of the 3-D test IC. Each block has an area of approximately 1 mm2. The remaining area is reserved for the I/O pads (the gray shapes) |
455 |
Figure 16.9 |
Block diagram of the logic circuit included in each tier of each block |
455 |
Figure 16.10 |
Physical layout of a pseudorandom number generator |
456 |
Figure 16.11 |
Physical layout of 6×6 crossbar switch with 16-bit wide ports |
456 |
Figure 16.12 |
Cascoded current mirror with an additional control transistor |
457 |
Figure 16.13 |
Four stage cascoded current mirrors |
458 |
Figure 16.14 |
Physical layout of the test circuit. Some decoupling capacitors are highlighted |
459 |
Figure 16.15 |
Two-dimensional H-trees constituting a clock distribution network for a 3-D IC |
460 |
Figure 16.16 |
Different 3-D clock distribution networks within the test circuit. (A) H-trees, (B) H-tree and local rings/meshes, (C) H-tree and global rings, and (D) trunk based |
461 |
Figure 16.17 |
Physical layout of the clock distribution networks in the 3-D IC. (A) H-trees, (B) H-tree and local rings/meshes, (C) H-tree and global rings, and (D) trunk based |
462 |
Figure 16.18 |
Clock signal probes with RF pads |
463 |
Figure 16.19 |
Open drain transistor and circuit model of the probe (includes impedance of RF pads) |
463 |
Figure 16.20 |
Structure of clock signal path from Fig. 16.16A to model the clock skew. The number within each oval represents the number of parallel TSVs between device tiers |
464 |
Figure 16.21 |
Equivalent electrical model of a TSV |
466 |
Figure 16.22 |
Top view of fabricated 3-D test circuit |
467 |
Figure 16.23 |
Magnified view of one block of the fabricated 3-D test circuit |
468 |
Figure 16.24 |
Die assembly of the 3-D test circuit with RF probes |
469 |
Figure 16.25 |
Clock signal input and output waveform from the topology with global rings, as illustrated in Fig. 16.16C |
470 |
Figure 16.26 |
Maximum measured clock skew between two tiers within the different clock distribution networks |
471 |
Figure 16.27 |
Part of the clock distribution networks illustrated in Figs. 16.16A and B. (A) The local clock skew is individually adjusted within each tier for the H-tree topology, and (B) the local skew is simultaneously adjusted for all of the tiers for the local mesh topology |
471 |
Figure 16.28 |
Measured power consumption at 1 GHz of the different circuit blocks |
472 |
Figure 17.1 |
Classification of process variations and an illustration of the physical scale of the disparate sources of variations |
476 |
Figure 17.2 |
Example of intratier and intertier paths. (A) One random variable is required to model D2D variations, and (B) two random variables (one for each tier) are used to model D2D variations for the entire path |
477 |
Figure 17.3 |
Notation used in the delay variability model for 2-D and 3-D circuits. (A) 2-D circuit comprising two critical paths each with three logic gates, and (B) two-tier 3-D circuit containing three critical paths each with three stages, where two paths are intratier paths and one path is an intertier path. Two random variables are required in (B) to model the D2D variations of each tier |
479 |
Figure 17.4 |
Cdf of a 2-D circuit (dashed line), a 3-D circuit with uneven critical path distribution between the two tiers (dashed dotted line), and a 3-D circuit with the same number of critical paths in each tier (dotted line) |
481 |
Figure 17.5 |
3-D H-tree spanning four tiers. (A) Notation for all of the 64 sinks, and (B) certain sinks used to evaluate clock skew |
483 |
Figure 17.6 |
Elemental circuit to measure the distribution of delay due to variations in the buffer characteristics |
483 |
Figure 17.7 |
Electrical model of a segment of an intertier clock path |
485 |
Figure 17.8 |
Clock paths to sinks u and v where the paths share nu,v buffers |
489 |
Figure 17.9 |
A single via 3-D clock H-tree |
490 |
Figure 17.10 |
σ of skew for increasing number of tiers (tiers) and uncorrelated WID variations for both the multi and single via topologies, (A) between sinks in the first tier, and (B) between sinks in the first and topmost tiers |
492 |
Figure 17.11 |
Example multi-group 3-D clock topology |
493 |
Figure 17.12 |
σ of skew for 3-D clock tree topologies. (A) Intratier skew of sink pairs s1,2 and s1,3, and (B) intertier skew of sink pairs s1,6 and s1,7 within a group of data related tiers |
494 |
Figure 17.13 |
Simplified 1-D model of a power distribution network to evaluate global power noise. Rti and Cti denote, respectively, the TSV resistance and capacitance of tier i |
497 |
Figure 17.14 |
Amplitude and frequency of the resonant noise versus the switching current in different tiers |
498 |
Figure 17.15 |
Resonant supply noise and IR drop versus the total resistance of the TSVs |
499 |
Figure 17.16 |
Resonant noise versus the number of tiers |
499 |
Figure 17.17 |
Clock uncertainty between 3-D clock paths. (A) Two paths and flip flops, and (B) corresponding clock signals |
501 |
Figure 17.18 |
Skitter versus length of 3-D clock paths |
504 |
Figure 17.19 |
Skitter for Vn1=90 mV and different Vn2 |
505 |
Figure 17.20 |
Setup skitter versus (Vn2, Vn1). (A) 3-D plot of μJA, (B) contour of μJA, (C) 3-D plot of μJB, (D) contour of μJB, (E) contour of σJA, and (F) contour of σJB |
506 |
Figure 17.21 |
Hold skitter versus (Vn1 and Vn2). (A) Contours for σSA, and (B) contours for σSB |
507 |
Figure 17.22 |
Tradeoff between power and maximum allowed setup skitter max(J1,2) |
508 |
Figure 17.23 |
Skitter versus different ϕ (ϕ1=ϕ2). (A) change in μJ1,2, (B) change in σJ1,2, and (C) change in σS1,2 |
509 |
Figure 17.24 |
Skitter J1,2 versus shifted ϕ1 and ϕ2. (A) 3-D plot of σJ1,2 versus (ϕ2=ϕ1) for distribution (A), (B) contour map of σJ1,2 versus (ϕ2=ϕ1) for distribution (A), and (C) contour map of σJ1,2 for distribution (B) |
510 |
Figure 17.25 |
Skitter versus fn. (A) Change in J1,2, and (B) change in S1,2 |
511 |
Figure 17.26 |
Effect of fn on delay variations. (A) Mean and standard deviation of buffer delay versus Vdd, and (B) supply voltage to the clock path during propagation of a clock edge |
512 |
Figure 17.27 |
Synthesized 3-D clock tree. (A) Majority of clock buffers in the first tier, (B) majority of clock buffers in the third tier, and (C) regions where the skitter is measured |
513 |
Figure 17.28 |
Normalized number of TSVs and power dissipation for Cases 2 to 4 |
516 |
Figure 18.1 |
Cross-sectional view of power distribution system where several levels of the hierarchy, motherboard, PCB, package, and integrated circuit are shown. The VRM and the decoupling capacitors placed at all levels of the hierarchy are also illustrated |
520 |
Figure 18.2 |
A three tier circuit where DC–DC conversion is integrated in the upper tiers to reduce losses within the power delivery system |
522 |
Figure 18.3 |
Buck converter integrated within a separate tier and connected to the logic tier with TSVs |
522 |
Figure 18.4 |
3-D power delivery system. (A) DC–DC buck converters are integrated within only one tier, and (B) DC–DC converters are integrated in the tiers at both ends of the stack. Two different types of TSVs are noted, those TSVs that distribute a high (off-chip) voltage (VDDH) to the converters and those TSVs which distribute a low (on-chip) voltage (VDDL) downstream from the output of the converters |
523 |
Figure 18.5 |
Equivalent circuit of the on-chip power distribution network of an n tier 3-D circuit, where the total IR drop across the tiers is denoted as Vdrop. Only one buck converter is integrated in one tier and the on-chip power distribution network is modeled as a 1-D network |
524 |
Figure 18.6 |
Equivalent circuit of the on-chip power distribution network of an n tier 3-D circuit, where the total IR drop across the tiers is denoted as V′drop. Two buck converters are integrated within the tiers at both ends of the circuit, each supplying current to half of the tiers of the stack |
524 |
Figure 18.7 |
A converter providing current within a prototype 2-D circuit used to emulate a 3-D system comprising eight tiers, where the TSVs and active loads are connected in a daisy chain |
526 |
Figure 18.8 |
A multi-level power distribution network applied to a three tier circuit where each pair of power levels is mapped to a single tier |
527 |
Figure 18.9 |
Equivalent circuit diagram of a power distribution network of a 3-D circuit, (A) supplied by a single Vdd, and (B) supplied by several pairs of Vdd supplies |
527 |
Figure 18.10 |
A 3-D circuit consisting of two memory tiers and one processor tier |
528 |
Figure 18.11 |
Multi-level power delivery system where two pairs of voltage levels are employed in each tier. (A) All of the circuits are active, (B) right half of the circuit in each tier is inactive (shown in gray), and (C) left half of the circuit in each tier is inactive (shown in gray) |
529 |
Figure 18.12 |
Multi-level power delivery system where each tier is supplied by one pair of voltage levels. (A) All of the circuits are active, (B) the processor is inactive (shown in gray), and (C) the memory tiers are inactive (shown in gray) |
530 |
Figure 18.13 |
A 3-D power distribution network (not to scale), (A) the power (ground) meshes are connected by power (ground) TSVs, and (B) the equivalent circuit model of a package pin, TSV, and unit cell including the decoupling capacitance and current source |
531 |
Figure 18.14 |
The segmentation method linking successive unit cells to model an entire power distribution network |
534 |
Figure 18.15 |
Decomposition of a unit cell including both power and ground lines along the x and y directions. The different structures formed by the decomposition process are also illustrated. Two metal layers are utilized for the power distribution network |
534 |
Figure 18.16 |
Decomposed structures and equivalent RLGC lumped sections. The notation of the physical parameters used in Table 18.2 is also defined |
535 |
Figure 18.17 |
Iterative process for electro-thermal analysis |
537 |
Figure 18.18 |
Overview of power grid, (A) a small segment of a power grid, and (B) corresponding electrical model including the parasitic impedance of the package |
538 |
Figure 18.19 |
Cross-sectional view of a TSV. (A) A standard solid TSV, and (B) a CTSV with two layers of metal separated by a dielectric layer |
541 |
Figure 18.20 |
Current paths within a 3-D circuit. (A) Where the TSV is connected to the power lines on both the uppermost (MT) and the first (M1) metal layers, and (B) where the TSV is connected only to the topmost (MT) metal layer |
542 |
Figure 18.21 |
Equivalent circuit of the current flow paths illustrated in Fig. 18.20. (A) The TSV locally distributes current, and (B) only stacks of metal vias supply current to the load |
542 |
Figure 18.22 |
Voltage drop at the current source as a function of the current drawn by the power supply |
543 |
Figure 18.23 |
Voltage drop as a function of distance of the current source from the TSV |
544 |
Figure 18.24 |
Resistive grid to model a segment of a power distribution system. (A) In the uppermost (M6) metal layer, and (B) in the lowest (M1) metal layer |
546 |
Figure 18.25 |
SPICE simulation of the voltage drop on the M1 grid for different nodes with (solid curves) and without (dashed curves) the TSV path. No stacked vias are removed (d=0) |
547 |
Figure 18.26 |
SPICE simulation of the maximum voltage drop on the M1 grid by successively removing the stacked vias (i.e., increasing d) with (dashed curves) and without (solid curves) the TSV path |
548 |
Figure 18.27 |
SPICE simulation of the voltage drop on the M1 grid for different nodes and with no stacked vias removed (d=0) with (solid curves) and without (dashed curves) the TSV path. Only three current sources switch |
548 |
Figure 18.28 |
Nonuniform TSV tapering to address both power supply noise and temperature. (A) Opposite tapering is required to individually satisfy the power supply noise and temperature objectives, and (B) adapting the size of the TSVs across tiers to ensure that both objectives are satisfied |
550 |
Figure 18.29 |
Power supply noise from employing one tier of decoupling capacitance. (A) A 2-D system, (B) a four tier system with no tier for the decoupling capacitance, (C) a decoupling capacitance tier close to the package, and (D) a decoupling capacitance tier on top of the 3-D system |
552 |
Figure 18.30 |
Power supply noise from employing two tiers of decoupling capacitance. (A) A 2-D system, (B) one decoupling capacitance tier is placed next to the package and the second tier between tiers two and three, (C) one decoupling capacitance tier is placed on top of the stack and the second tier between tiers two and three, and (D) both decoupling capacitance tiers are placed on top of the stack |
553 |
Figure 18.31 |
Reconfigurable decoupling capacitance topology where the decoupling capacitor is connected to the power rail even if the sleep transistors are switched off |
554 |
Figure 18.32 |
Always on decoupling capacitance topology. The charge provided to the local circuit blocks flows through the sleep transistors |
555 |
Figure 18.33 |
A daisy chain of buffers switches the sleep transistors on, subsequently ensuring that the current gradually increases, limiting the abrupt current changes within the power grid |
556 |
Figure 18.34 |
Current flow within a three tier stack. Note the current flowing through the TSVs of each tier |
558 |
Figure 18.35 |
Optimization framework for 3-D power distribution networks where both power supply noise and temperature constraints are considered. (A) Optimal sizing process for the middle tier(s) is initially determined, (B) the flowchart of the algorithm, and (C) step by step description of the algorithm |
560 |
Figure 19.1 |
Power distribution network topologies. (A) interdigitated power network on all tiers with the 3-D vias distributing current on the periphery and through the middle of the circuit, (B) interdigitated power network on all tiers with the 3-D vias distributing current on the periphery, and (C) interdigitated power network on tiers 1 and 3 and power/ground planes on tier 2 with the 3-D vias distributing current on the periphery and through the middle of the circuit |
567 |
Figure 19.2 |
Layout of the power distribution network test circuit |
569 |
Figure 19.3 |
Layout of the test circuit containing three interdigitated power and ground networks and test circuits for generating and measuring noise. (A) Overlay of all three device planes, (B) power and ground networks of the bottom tier (tier 1), (C) power and ground networks of the middle tier (tier 2), and (D) power and ground networks of the top tier (tier 3) |
570 |
Figure 19.4 |
Layout of the pattern sequence source for the noise generation circuits. (A) All three device planes, (B) noise generation circuits on the bottom tier (tier 1), (C) noise generation circuits on the middle tier (tier 2), and (D) noise generation circuits on the top tier (tier 3) |
572 |
Figure 19.5 |
Pattern sequence source for the noise generation circuits. (A) Ring oscillator, (B) buffer used for the RO and PRNG, (C) 5-bit PRNG, (D) 6-bit PRNG, (E) 9-bit PRNG, and (F) 10-bit PRNG |
574 |
Figure 19.6 |
Individual components in Figs. 19.5C–F with the corresponding transistor sizes. (A) Inverter, (B) AND gate, (C) OR gate, (D) XNOR gate, (E) 2-to-1 MUX, and (F) D flip-flop |
576 |
Figure 19.7 |
Layout of the noise generation circuits. (A) All three device planes, (B) noise generation circuits on the bottom tier (tier 1), (C) noise generation circuits on the middle tier (tier 2), and (D) noise generation circuits on the top tier (tier 3) |
580 |
Figure 19.8 |
Schematic view of the (A) current mirror, and (B) switches that vary the total current through the current mirror |
581 |
Figure 19.9 |
Layout of the power and ground noise detection circuits including the control circuit. (A) All three device planes, (B) power and ground sense circuits for the bottom tier (tier 1), (C) power and ground sense circuits for the middle tier (tier 2) and control circuit for all three tiers, and (D) power and ground sense circuits for the top tier (tier 3) |
582 |
Figure 19.10 |
Rotating control logic to manage the RF output pads among the three device planes. The control signals to the RF pads are provided for both the power and ground detection signals for each device plane |
586 |
Figure 19.11 |
Block and I/O pin diagram of the DC and RF pad layout. The numbered rectangles are DC pads providing power and ground, and DC bias points for the current mirrors, reset signals, and electrostatic discharge protection. The light colored squares and rectangles are RF pads used to calibrate the sense circuits (internal to the labeled blocks) and measure noise on the power/ground networks (external to the labeled blocks) |
587 |
Figure 19.12 |
Microphotograph of the wire bonded test circuit |
589 |
Figure 19.13 |
Block level schematic of noise generation and detection circuits |
589 |
Figure 19.14 |
Source follower noise detection circuits detect noise on both the digital (A) power lines, and (B) ground lines |
590 |
Figure 19.15 |
Fabricated test circuit examining noise propagation within three different power distribution networks, and a distributed DC-to-DC rectifier. (A) Microphotograph of the 3-D test circuit, and (B) an enlarged image of Block 1 |
591 |
Figure 19.16 |
S-parameter characterization of the power and ground noise detection circuits |
592 |
Figure 19.17 |
Spectral analysis of the noise generated on the power line of Block 2, (A) with board level decoupling capacitance, and (B) without board level decoupling capacitance |
593 |
Figure 19.18 |
Time domain measurement of the generated noise on the power line of Block 2 without board level decoupling capacitance for a voltage bias on the current mirrors of (A) 0 volts, (B) 0.5 volts, (C) 0.75 volts, and (D) 1 volt |
594 |
Figure 19.19 |
Average noise voltage on the power and ground distribution networks with and without board level decoupling capacitance. (A) Average noise of power network without decoupling capacitance, (B) average noise of power network with decoupling capacitance, (C) average noise of ground network without decoupling capacitance, and (D) average noise of ground network with decoupling capacitance. A total of 4,096 data points are used to calculate the average noise for each topology at each current mirror bias voltage |
595 |
Figure 19.20 |
Peak noise voltage on the power and ground distribution networks with and without board level decoupling capacitance. (A) Peak noise of power network without decoupling capacitance, (B) peak noise of power network with decoupling capacitance, (C) peak noise of ground network without decoupling capacitance, and (D) peak noise of ground network with decoupling capacitance. A single peak data point (from 4,096 points) is determined for each topology at each current mirror bias voltage |
597 |
Figure 19.21 |
Equivalent electrical model of the cables, board, wirebonds, on-chip DC pads, power distribution networks, and TSVs |
601 |
Figure 20.1 |
Taxonomy of 3-D architectures for wire limited circuits |
606 |
Figure 20.2 |
Popular interconnection network topologies, (A) 3-D mesh, and (B) 2-D torus |
607 |
Figure 20.3 |
Different partitioning levels and related design complexity vs the architectural granularity for 3-D microprocessors |
608 |
Figure 20.4 |
An example of different partition levels for a 3-D microprocessor system at the (A) core, (B) functional unit block (FUB), (C) macrocell, and (D) transistor levels |
608 |
Figure 20.5 |
2-D organization of a cache memory with additional circuitry |
611 |
Figure 20.6 |
2-D and 3-D organization of a 32 Kb cache memory array. Nspd is the number of sets connected to a word line |
611 |
Figure 20.7 |
Word line partitioning onto two tiers of the 2-D cache memory shown in Fig. 20.5 |
612 |
Figure 20.8 |
Bit line partitioning onto two tiers of the 2-D cache memory shown in Fig. 20.5 |
613 |
Figure 20.9 |
Different organizations of a microprocessor system, (A) 2-D baseline system, (B) a second tier with 8 MB SRAM cache memory, (C) a second tier with 32 MB SRAM cache memory, and (D) a second tier with 64 MB DRAM cache memory |
614 |
Figure 20.10 |
Several NoC topologies (not to scale), (A) 2-D IC–2-D NoC, (B) 2-D IC–3-D NoC, (C) 3-D IC–2-D NoC, and (D) 3-D IC–3-D NoC |
617 |
Figure 20.11 |
Typical interconnect structure for intermediate metal layers |
624 |
Figure 20.12 |
Zero-load latency for several network sizes. (A) APE = 0.81 mm² and ch = 332.6 fF/mm, and (B) APE = 4 mm² and ch = 332.6 fF/mm |
626 |
Figure 20.13 |
Zero-load latency for various network sizes. (A) APE = 0.64 mm² and ch = 192.5 fF/mm, and (B) APE = 2.25 mm² and ch = 192.5 fF/mm |
627 |
Figure 20.14 |
Improvement in zero-load latency for different network sizes and PE areas (i.e., bus lengths). (A) 2-D IC–3-D NoC, and (B) 3-D IC–2-D NoC |
628 |
Figure 20.15 |
Zero-load latency for various network sizes. (A) APE = 1 mm² and ch = 332.6 fF/mm, and (B) APE = 4 mm² and ch = 332.6 fF/mm |
628 |
Figure 20.16 |
n3 and np values for minimum zero-load latency for various network sizes. (A) APE = 1 mm² and ch = 332.6 fF/mm, and (B) APE = 4 mm² and ch = 332.6 fF/mm |
629 |
Figure 20.17 |
Power consumption with delay constraints for several network sizes. (A) APE = 1 mm², ch = 332.6 fF/mm, and T0 = 500 ps, and (B) APE = 4 mm², ch = 332.6 fF/mm, and T0 = 500 ps |
630 |
Figure 20.18 |
Power consumption with delay constraints for several network sizes. (A) APE = 0.64 mm², ch = 192.5 fF/mm, and T0 = 1000 ps, and (B) APE = 2.25 mm², ch = 192.5 fF/mm, and T0 = 1000 ps |
631 |
Figure 20.19 |
Power consumption with delay constraints for various network sizes. (A) APE = 1 mm², ch = 332.6 fF/mm, and T0 = 500 ps, and (B) APE = 4 mm², ch = 332.6 fF/mm, and T0 = 500 ps |
632 |
Figure 20.20 |
An overview of the 3-D NoC simulator |
633 |
Figure 20.21 |
Position of the vertical interconnection links for each tier within a 3-D NoC (each tier is a 6 × 6 mesh), (A) fully connected 3-D NoC, (B) uniform distribution of vertical links, (C) vertical links at the center of the NoC, and (D) vertical links at the periphery of the NoC |
635 |
Figure 20.22 |
Effect of traffic load on the latency of a 2-D and 3-D torus NoC for each type of traffic and XYZ routing |
638 |
Figure 20.23 |
Latency of 64-node 2-D and 3-D mesh and torus NoCs under uniform traffic, XYZ routing, and several traffic loads |
639 |
Figure 20.24 |
Different performance metrics under uniform traffic and a normal traffic load of a 3-D NoC for alternative interconnection topologies with XYZ-OLD routing, (A) 64 network nodes, and (B) 144 network nodes |
640 |
Figure 20.25 |
Several performance metrics under uniform traffic and a low traffic load of a 3-D NoC for alternative interconnection topologies with XYZ routing, (A) a 4×4×4 3-D mesh, and (B) a 6×6×4 3-D mesh |
641 |
Figure 20.26 |
Typical FPGA architecture, (A) 2-D FPGA, (B) 2-D switch box, and (C) 3-D switch box. A routing track can connect three outgoing tracks in a 2-D SB, while in a 3-D SB, a routing track can connect five outgoing routing tracks |
642 |
Figure 20.27 |
Interconnects that span more than one logic block. Li denotes the length of these interconnects and i is the number of LBs traversed by these wires |
643 |
Figure 20.28 |
Interconnect delay for different numbers of physical tiers, (A) average length wires, and (B) die edge length interconnects |
645 |
Figure 20.29 |
Power dissipated by 2-D and 3-D FPGAs |
646 |
Figure C.1 |
Intertier interconnect consisting of m segments connecting two circuits located n tiers apart |
658 |
Figure D.1 |
Portion of an interconnect tree |
660 |
Figure E.1 |
Modeling spatial correlations using quad-tree partitioning |
662 |