List of Figures

Figure 1.1 History of semiconductor transistors and logic styles 2
Figure 1.2 Evolutionary paths of the microelectronics industry 3
Figure 1.3 Interconnect system composed of groups of local, semi-global, and global layers. The metal layers in each group are typically of different thickness 4
Figure 1.4 Repeaters are inserted at specific distances to improve the interconnect delay 5
Figure 1.5 Interconnect shielding to improve signal integrity, (A) single-sided shielding, and (B) double-sided shielding. The shield and signal lines are, respectively, illustrated by the gray and white color 5
Figure 1.6 Cross-section of a joint MOS (JMOS) inverter 7
Figure 1.7 Reduction in wirelength where the original 2-D circuit is implemented in two and four tiers 8
Figure 1.8 Heterogeneous 3-D SoC comprising sensors and processing tiers 9
Figure 2.1 Three-dimensional stacked inverter 16
Figure 2.2 Examples of SiP technologies. (A) Wire-bonded SiP, (B) solder balls at the perimeter of the tiers, (C) area array vertical interconnects, and (D) interconnects on the faces of the SiP 17
Figure 2.3 Different communication schemes for 3-D ICs. (A) Short TSVs, (B) inductive coupling, and (C) capacitive coupling 18
Figure 2.4 Typical SoP, which can include both SiP and SoCs 19
Figure 2.5 Manufacturing and design challenges for 3-D integration 20
Figure 2.6 System miniaturization through the integration of sophisticated 3-D ICs 21
Figure 2.7 Wire-bonded SiP. (A) Dissimilar dies with multiple row bonding, (B) wire-bonded stack delimited by spacer, (C) SiP with die-to-die and die-to-package wire bonding, and (D) top view of wire-bonded SiP 22
Figure 2.8 SiP with peripheral connections. (A) Solder balls, (B) through hole via and spacers, and (C) through hole via in a PCB frame structure 24
Figure 2.9 Basic manufacturing phases of an SiP. (A) Interposer bumping and solder ball deposition, (B) die attachment, (C) tier stacking, and (D) epoxy underfill for enhanced reliability 25
Figure 2.10 Cross-section of the SiP after removing the mold. (A) The SiP encapsulated in epoxy resin, (B) sawing to expose the metal traces, and (C) sawing to expose the bonding wires 28
Figure 2.11 Interposer-based 2.5-D systems where (A) ICs are mounted on only one face of the interposer (single side), and (B) ICs are attached to both sides of the interposer (double side) 29
Figure 2.12 Embedded interposer within a package substrate enabling multi-IC integration 31
Figure 2.13 An SiP system comprising (A) a top IC with Cu pillars and a bottom IC with solder bumps on a TSH interposer, and (B) the dimensions of several components (not shown to scale) 33
Figure 3.1 Typical interconnect paths for (A) wire-bonded SiP, (B) SiP with solder balls, and (C) 3-D IC with TSVs 38
Figure 3.2 Cross-section of a stacked 3-D IC with a planarized heat shield to avoid degradation of the transistor characteristics on the first layer due to the temperature of the fabrication processes 40
Figure 3.3 Cross-section of a device level stacked 3-D IC with a PMOS device on the bottom layer and an NMOS device in recrystallized silicon on the second layer 40
Figure 3.4 Processing steps for laterally crystallized TFT based on Ge-seeding. (A) Deposition of amorphous silicon, (B) creating seeding windows, (C) deposition of seeding materials, (D) producing silicon islands, and (E) processing of TFTs 41
Figure 3.5 Processing steps for vertical and lateral growth of 3-D SOI devices. (A) Definition of SOI islands, (B) silicon dioxide deposition, (C) formation of SEG window, (D) silicon growth within the SEG window, (E) etching of redundant silicon with CMP, (F) definition of upper device layer, (G) deposition of upper layer, (H) formation of SOI islands on the upper tier 43
Figure 3.6 Basic processing steps for a 3-D inverter utilizing the local clustering approach. (A) Oxide deposition, (B) wafer patterning and active area definition, (C) low temperature oxide deposition, (D) deposition of nitride film, (E) via formation at the drain side, (F) etching of the nitride film, (G) boron doping, (H) active area definition, (I) gate oxide growth by thermal oxidation, (J) deposition of doped polysilicon 44
Figure 3.7 Sequential process for fabricating monolithic 3-D circuits, where (A) the SOI devices in the first layer are manufactured with standard SOI processes, (B) molecular bonding allows the transfer of a high quality substrate, (C) the devices in the upper layers are formed, and (D) metal contacts connect the device layers 45
Figure 3.8 Monolithically stacked devices where the interlayer contacts (3-D contact) and the standard metal contacts (tungsten plug) connecting the devices are illustrated. The 3-D contact has similar traits to a standard contact connecting two metal layers 46
Figure 3.9 Typical fabrication steps for a 3-D IC process. (A) Wafer preparation, (B) TSV etching, (C) wafer thinning, bumping, and handle wafer attachment, (D) wafer bonding, and (E) handle wafer removal 49
Figure 3.10 The cavity alignment method, (A) the cavity template is aligned and bonded to the substrate, (B) the individual tiers are placed in the cavity through compression, (C) the 3-D stack is assembled through thermal compression, and (D) the cavity template is removed 52
Figure 3.11 Process for face-to-face bonding and substrate assembly, removing the need for TSVs 53
Figure 3.12 Metal-to-metal bonding; (A) square bumps, and (B) conic bumps for improved bonding quality 54
Figure 3.13 Capacitively coupled 3-D IC. The large plate capacitors are utilized for power transfer, while the small plate capacitors provide signal propagation 56
Figure 3.14 Inductively coupled 3-D ICs. Galvanic connections may be used for power delivery 57
Figure 3.15 Basic steps of a via-last manufacturing process (not to scale) 59
Figure 3.16 Basic steps of a via-first manufacturing process (not to scale) 59
Figure 3.17 Basic steps of a via-middle manufacturing process (not to scale) 60
Figure 3.18 TSV formation and filling after FEOL (wafer thinning) and BEOL (via-last approach) 61
Figure 3.19 TSV shapes. (A) Straight and (B) tapered 62
Figure 3.20 The scallops formed due to the time multiplexed nature of the Bosch process 62
Figure 3.21 Poor TSV filling resulting in void formation, (A) large void at the bottom, and (B) seam void 63
Figure 3.22 Structure of partial TSV and related materials 64
Figure 4.1 Early TSV from patents filed by (A) William Shockley, and (B) Merlin Smith and Emanuel Stern of IBM 68
Figure 4.2 Equivalent π-model of a TSV 69
Figure 4.3 Top view of a TSV in silicon depicting the oxide layer, TiCu seed layer, and copper TSV 71
Figure 4.4 3-D via structure. (A) 3-D via with top and bottom copper landings, and (B) equivalent structure without metal landings 72
Figure 4.5 Current profile due to the proximity effect for (A) currents propagating in opposite directions, and (B) currents flowing in the same direction 76
Figure 4.6 Cross-sectional view of different CMOS technologies with TSVs depicting the formation of a depletion region around the TSV in (A) bulk CMOS, and (B) bulk CMOS with a p+ buried layer. The TSVs in either PD-SOI (shown in (C) top) or FD-SOI (shown in (C) bottom) reveal minimal formation of a depletion region 80
Figure 4.7 Ratio of the total TSV capacitance to the oxide capacitance as a function of applied voltage Vg 81
Figure 4.8 Physical parameters and materials used in the compact models of a TSV for a single device layer, as listed in Table 4.6. (A) Side view, and (B) top view 83
Figure 4.9 Resistance of a cylindrical 3-D via at DC, 1 GHz, and 2 GHz 86
Figure 4.10 Per cent error as a function of frequency for the resistance of a 3-D via (a.r.=aspect ratio) 87
Figure 4.11 Self-inductance L11 of a cylindrical 3-D via 93
Figure 4.12 Mutual inductance L21 of a cylindrical 3-D via with a 20 µm diameter 94
Figure 4.13 Mutual inductance L21 between two 3-D vias with different lengths (D=10 µm, and 3g=y) 95
Figure 4.14 Capacitance of a cylindrical 3-D via over a ground plane 101
Figure 4.15 Coupling capacitance between two 3-D vias over a ground plane (D=20 µm) 103
Figure 4.16 Frequency range applicable to Q3D models and closed-form inductance expressions 106
Figure 4.17 Critical dimensions of a 3-D via over a ground plane for the MITLL 3-D process 108
Figure 4.18 Circuit model for RLC extraction of (A) two 3-D vias, and (B) two 3-D vias with a shield via between two signal vias 111
Figure 4.19 Effect of a return path on the loop inductance. (A) Return path placed on 3-D via 2, and (B) return path placed on 3-D via 3 114
Figure 5.1 Heterogeneous 3-D integrated circuit 120
Figure 5.2 Model of noise coupling from TSV to a victim device through a silicon substrate. (A) General model, and (B) reduced model 121
Figure 5.3 Noise coupling from a TSV to a victim device. (A) Short-circuit Ge substrate model, and (B) open circuit GaAs substrate model 123
Figure 5.4 Isolation efficiency of a noise coupled system for different substrate materials. (A) Silicon, (B) germanium, and (C) gallium arsenide 125
Figure 5.5 Equivalent small-signal model of a noise coupled system 126
Figure 5.6 Resistance and inductance versus line width of the ground network. The ground network is composed of copper-based interconnects 128
Figure 5.7 Isolation efficiency of a noise coupled system as a function of the line width of the ground network for different substrate materials. (A) Silicon, (B) germanium, and (C) gallium arsenide 129
Figure 5.8 Distance from aggressor module “A” on tier m to victim module “V” on tier n 130
Figure 5.9 Effect of distance between an aggressor and victim on the isolation efficiency for a Ge substrate. The resonant frequency is observed at the peak isolation efficiency due to the increasing reactance of the ground network 130
Figure 5.10 Keep out region around an aggressor TSV. The victim modules (Victim) should be placed outside this region 131
Figure 5.11 Isolation efficiency versus frequency and radius of keep out region for different substrate materials. (A) Si, (B) Ge, and (C) GaAs 132
Figure 5.12 Keep out region around aggressor TSV for Nmax=40 dB. The victim modules should be placed on the isolation efficiency surface below the base surface 133
Figure 5.13 Comparison between SPICE model and extracted transfer function for different substrate materials. (A) Si, (B) Ge, and (C) GaAs 134
Figure 6.1 An inductive link between the transmitter and receiver circuits including the coupled on-chip inductors 139
Figure 6.2 Equivalent circuit of an inductive link including the parasitic resistance and capacitance of the on-chip inductors 139
Figure 6.3 Model for pulse modulation. (A) Current of the transmitter modeled as a Gaussian pulse and (B) voltage induced on the receiver 140
Figure 6.4 Coupling efficiency for distance X and decreasing outer diameter dout 141
Figure 6.5 Square spiral on-chip inductor with n=7 turns, illustrating the geometric parameters 142
Figure 6.6 Flow diagram for the design of the coils in an inductive link under power, performance, and area constraints 144
Figure 6.7 Transceiver circuit of a synchronous inductive link 146
Figure 6.8 Transceiver circuit of an asynchronous inductive link 147
Figure 6.9 Block diagram of an inductive coupling scheme with burst transmission 148
Figure 6.10 Efficiency of TSV and inductive interfaces with higher multiplexing density 151
Figure 6.11 Top view of a structure comprising an inductive link and the return path through a power delivery network placed in different locations 154
Figure 6.12 Noise induced by an inductive pair for varying δc 154
Figure 6.13 Array of inductive links and P/G loops connected to C4 supply pads. The power and ground lines are depicted, respectively, by solid and dashed lines 156
Figure 6.14 Parasitic noise induced on a power wire depending upon the distance of the wire from the inductor 157
Figure 6.15 Power delivery network topologies. (A) Interdigitated P/G–P/G topology, (B) paired type-I P/G–P/G topology, and (C) paired type-II P/P–G/G topology 157
Figure 6.16 Wireless power transmission for standard inductive coupling 160
Figure 7.1 An example of the method used to determine the distribution of the interconnect length. Group NA includes one gate, group NB includes the gates located at a distance smaller than l (encircled by the dashed curve), and NC is the group of gates at distance l from group NA (encircled by the solid curve). In this example, l=4 (the distance is measured in gate pitches) 164
Figure 7.2 An example of the method used to determine the interconnect length distribution in 3-D circuits. (A) Partial Manhattan hemisphere, and (B) cross-section of the partial Manhattan hemisphere along e-e′. The gates in NB and NC are shown, respectively, with light and dark gray tones 167
Figure 7.3 Example of starting and nonstarting gates. Gates P and Q can be starting gates while S is a nonstarting gate 167
Figure 7.4 Possible vertical interconnections for two cells with each cell containing n gates 169
Figure 7.5 Interconnect length distribution for a 2-D and 3-D IC 170
Figure 7.6 Variation of gate pitch, total interconnect length, and interconnect power consumption with the number of tiers 171
Figure 8.1 Cross-sectional view. (A) TSV middle, and (B) TSV last 176
Figure 8.2 Processing flow for TSV middle and TSV last considered in terms of cost and complexity 177
Figure 8.3 Comparison of TSV lithography cost for different TSV process flows and geometries. The difference in cost between TSV middle and TSV last is due to different process equipment 178
Figure 8.4 Comparison of processing cost of the TSV etching step for different TSV geometries. The processing cost is normalized to the cost of etching a 5×50 TSV middle structure 179
Figure 8.5 Comparison of processing cost to deposit the TSV oxide liner for different TSV geometries. The processing cost is normalized to the cost of the liner deposition for a 5×50 TSV middle structure. In the case of TSV middle, the oxide liner at the field of the wafer is removed by CMP. For TSV last, no CMP polishing of the liner is necessary 180
Figure 8.6 Comparison of in-via oxide liner etch processing cost for different TSV last geometries. The TSVs with a smaller diameter require longer liner etch processing time. The processing cost is normalized to the process cost of a 5×50 TSV last flow 181
Figure 8.7 Cost comparison of barrier seed process for TSV middle geometries. Non-PVD deposition approaches can be applied to TSV sizes of 3×50 and 2×40. The processing cost is normalized to the process cost of a 5×50 TSV middle flow 182
Figure 8.8 Cost comparison of barrier seed process for TSV last geometries. PVD processing is applied for all TSV last sizes. The processing cost is normalized to the process cost of a 5×50 TSV middle flow 182
Figure 8.9 Cost comparison of TSV Cu plating process for both TSV middle and TSV last flows for different TSV geometries. The cost comparison considers processing and material costs and is normalized to the process cost of the 5×50 TSV middle (POR) flow 183
Figure 8.10 Cost of Cu CMP for different Cu overburden thicknesses. The fine Cu polish step is the dominant cost component for Cu thicknesses up to 2,000 nm 183
Figure 8.11 CMP benchmark of deposition materials used in TSV processing. For each material, a thickness of 100 nm is considered to estimate polishing time and slurry consumption 184
Figure 8.12 Cost benchmark of backside processing steps for TSV middle and TSV last flows 186
Figure 8.13 Benchmark of overall processing costs for different TSV geometries for both TSV middle and TSV last flows 186
Figure 8.14 Cost benchmark of the 5×50 and 10×100 TSV flows. For the 5×50 TSV middle, polishing the oxide liner increases the cost of the CMP step. For the 5×50 TSV last process, the liner deposition, liner etch steps, and backside CMP are the primary cost differentiators. For the 10×100 TSV middle process, polishing the oxide liner increases the overall processing cost by up to 9% as compared to 10×100 TSV last flow 187
Figure 8.15 Cost benchmark of TSV middle processing flows for different TSV geometries. Note that the TSV size and pitch are scaled while maintaining the TSV processing cost 188
Figure 8.16 System integrated on top of an interposer substrate. The TSVs connect the interposer to the package substrate 189
Figure 8.17 Comparison of processing costs per wafer for different features of an interposer substrate. All of the processing costs are normalized to the wafer cost of processing a 10 µm × 100 µm TSV middle flow 190
Figure 8.18 Different interposer configurations: (A) single metal layer over a power metal plane, (B) two thick metal layers, and (C) MIM capacitor between power and ground metal planes with two thick metal layers 192
Figure 8.19 Comparison of wafer-level processing cost for different interposer structures. The cost of each component is normalized to the cost of the 10 µm × 100 µm TSV 192
Figure 8.20 3-D stacking approaches: vertical stack of three active dice, (A) D2W or W2W stacking, and (B) 2.5-D interposer-based stacking 193
Figure 8.21 3-D integration technologies. (A) Three die stack. The stacking interface between the dice is microbumps. TSVs are fabricated on die 1 and die 2 to enable vertical signal propagation. (B) Three active dice on an interposer substrate. The active dice are stacked using microbumps. The TSVs are fabricated within the interposer die to provide access to the backside. The interposer is connected to the package substrate (not shown) by Cu pillars 194
Figure 8.22 Comparison of processing cost per wafer to enable 3-D stacking. The features are processed either on the active dice and/or on the interposer substrate. For the die pick and place step, processing of 541 die/wafer is assumed 195
Figure 8.23 Cost comparison of different stacking approaches and components. The cost of the compound yield losses is illustrated for each stacking approach. An area of 10 mm×10 mm is considered for the active dice 196
Figure 8.24 Process cost and total cost of a 3-D system per die area as a function of active die size. Three different 3-D integration approaches are considered: D2W, W2W, and 2.5-D interposer 197
Figure 8.25 Effect of interposer processing yield and test fault coverage on the cost of an interposer-based 2.5-D system 198
Figure 8.26 Cost of 2.5-D interposer-based system in terms of the size of the stacked active die, interposer die processing yield (YINT), and fault coverage (FC) of interposer prestack testing. (A) YINT=99%, FC=100%, (B) YINT=99%, FC=50%, (C) YINT=99%, FC=0%, (D) YINT=90%, FC=100%, (E) YINT=90%, FC=50%, (F) YINT=90%, FC=0%, (G) YINT=80%, FC=100%, (H) YINT=80%, FC=50%, and (I) YINT=80%, FC=0% 199
Figure 9.1 Example of positive (shown with solid lines) and negative (shown with dashed lines) step lines for block b 205
Figure 9.2 Example of SP representation, where (A) is a group of blocks comprising a floorplan, (B) positive step lines for these blocks, and (C) negative step lines for the blocks 206
Figure 9.3 Example of a net bounding box connecting pins from blocks a and c. The HPWL metric is the half length of the perimeter of the net bounding box. The solid line shows a possible net route to connect pins of blocks a and c marked by the solid squares 207
Figure 9.4 Example of computing slack where (A) the blocks are floorplanned in a left-to-right and top-to-bottom manner, and (B) the blocks are floorplanned in a right-to-left and bottom-to-top manner 208
Figure 9.5 Upper bound of area and volume for two- and three-dimensional slicing floorplans (F) depicted, respectively, by the solid and dashed curve for different shape aspect ratios. Vtotal (Atotal) and Vmax(Amax) are, respectively, the total and maximum volume (area) of a 3-D (2-D) system 209
Figure 9.6 Floorplanning strategies for 3-D ICs. (A) Single step approach, and (B) multistep approach 211
Figure 9.7 Different metrics to determine the length of a 3-D net, (A) the classic HPWL metric including only the pins of the net in all tiers, (B) an extended bounding box including the TSV locations, (C) the bounding box of the segments of the net within tier 2, and (D) the bounding box of the segment of the net belonging to tier 3 215
Figure 9.8 Flow of two stage floorplanning methods considering the TSV locations 216
Figure 9.9 Whitespace within the bounding box of the intertier net can be used for placing a TSV without increasing the wirelength. This whitespace defines the candidate TSV islands. The whitespace outside the bounding box describes noncandidate TSV islands, as placing a TSV into these regions increases the wirelength 218
Figure 9.10 A two tier floorplan with three intertier connected nets, (A) the blocks and pins, (B) the virtual die with the projection of the bounding box of each net, and (C) the routed nets and corresponding TSV island are shown. The notation pi,j is the pin of net i in tier j. The pins connected by each net are also indicated in the figure 220
Figure 9.11 A three tier circuit, (A) the independent feasible region for a two pin net starting from tier 1 and terminating in tier 3 is shown by the dashed rectangle, (B) the allowed row (intertier) and column (intratier) connections are depicted with dashed lines, and (C) a potential route for this net is shown by the solid line. The dots illustrate available locations for buffers in each row (tier) 222
Figure 9.12 Design flow of microarchitectural floorplanning process for 3-D microprocessors 225
Figure 9.13 Two force directed placement processes, (A) the TSVs and circuit cells are placed simultaneously, and (B) the TSVs are placed prior to the circuit cells and behave as placement obstacles 232
Figure 9.14 TSV assignment based on the MST of a net, where the closest TSV to the shortest edge of the net is inscribed by the dotted ellipse 233
Figure 9.15 Analytic placement process for 3-D circuits considering number of TSVs and wirelength 238
Figure 9.16 Process for determining available whitespace (WS), which is illustrated by the white regions 240
Figure 9.17 Block placement of an SOP. (A) Initial placement, and (B) increase in the total area in the x and y directions to extend the area of the whitespace 240
Figure 9.18 Layout of supercells. Supercells have the same height and varying width. The space around the supercells is used for buffers and TSVs 243
Figure 9.19 An example of computing the matrices of a two tier grid, (A) route counts, and (B) routing density 243
Figure 9.20 Channel alignment procedure to create intertier routing channels 246
Figure 9.21 Pseudocode of 3-D routing algorithm targeting reductions in both performance and temperature 247
Figure 9.22 An SOP consisting of n tiers. The vertical dashed lines correspond to vias between the routing layers, and the thick vertical solid lines correspond to through silicon vias that penetrate the device layers 248
Figure 9.23 Stages of a 3-D global routing algorithm 249
Figure 9.24 Layout windows with different area markers; (A) layout window for tier 1, and (B) layout window for tier 2 (the windows are not on the same scale) 251
Figure 10.1 Global interconnect structures for impedance extraction. (A) Three parallel metal lines over a ground plane in a 2-D circuit, and (B) three parallel metal lines sandwiched between two ground planes in a 3-D circuit 254
Figure 10.2 A three-tier FDSOI 3-D circuit. Tiers one and two are face-to-face bonded, while tiers two and three are face-to-back bonded 255
Figure 10.3 Capacitance extraction for an intertier via structure, (A) intertier via surrounded by orthogonal metal layers, and (B) capacitance values for different via sizes and spacing values. The same dielectric material is assumed for all of the layers (i.e., εd = εi = εSiO2) 256
Figure 10.4 Capacitance extraction for an intertier via structure, (A) intertier via through layers of dielectric and the bonding interface, surrounded by eight intertier vias, and (B) capacitance values for different via sizes and spacings 257
Figure 10.5 Capacitance extraction for an intertier via structure, (A) intertier via through silicon substrate, surrounded by a thin insulator layer, and (B) capacitance for different via sizes and thicknesses of the insulator layer 258
Figure 10.6 Two terminal intertier interconnect with single via and corresponding electrical model 258
Figure 10.7 An example of interconnect sizing. (A) An interconnect of minimum width, Wmin, (B) uniform interconnect sizing W>Wmin, and (C) nonuniform interconnect sizing W=f(l) 260
Figure 10.8 SPICE measurements of the 50% propagation delay of a 600 μm line versus the via location l1 for different values of r21. The interconnect parameters are r1=79.5 Ω/mm, rv1=5.7 Ω/mm, cv1=6 pF/mm, c2=439 fF/mm, c12=1.45, lv=20 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=50 Ω and CL=50 fF 262
Figure 10.9 SPICE measurements of the 50% propagation delay for a 600 μm line versus the via location l1 for different values of r21. The interconnect parameters are r1=79.5 Ω/mm, rv1=5.7 Ω/mm, cv1=6 pF/mm, c2=439 fF/mm, c12=0.46, lv=20 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=50 Ω and CL=50 fF 263
Figure 10.10 Decrease in the delay improvement caused by the nonoptimal placement of the intertier via for a 500 μm interconnect. The interconnect parameters are r1=23.5 Ω/mm, rv1=270 Ω/mm, cv1=270 fF/mm, c2=287 fF/mm, lv=15 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=30 Ω and CL=100 fF 265
Figure 10.11 Decrease in the delay improvement due to the nonoptimal placement of the intertier via for a 500 μm interconnect. The interconnect parameters are r1=23.5 Ω/mm, rv1=6.7 Ω/mm, cv1=270 fF/mm, c2=287 fF/mm, lv=15 μm, and n=2. The driver resistance and load capacitance are, respectively, RS=100 Ω and CL=100 fF 266
Figure 10.12 Intertier interconnect consisting of m segments connecting two circuits located n tiers apart 267
Figure 10.13 Intertier interconnect model composed of a set of nonuniform distributed RC segments 268
Figure 10.14 Case (iii) of the two terminal net heuristic. The allowed interval is iteratively decreased ensuring the optimum via location is eventually determined 271
Figure 10.15 A subset of interconnect instances depicted by the dashed lines for case (iv) of the via placement heuristic. The interconnect traverses eight tiers and has a length L=1.455 mm. The resistance rj and capacitance cj of each interconnect segment range, respectively, from 10 to 50 Ω/mm and 100 to 500 fF/mm 271
Figure 10.16 Pseudocode of the proposed two terminal net via placement algorithm 273
Figure 10.17 Average and maximum improvement in delay for different ranges of interconnect segment resistance and capacitance ratios. The vias are placed either at the center of the allowed intervals or randomly, as explained in the legend of the diagram 276
Figure 10.18 Comparison of the average Elmore delay based on wire sizing and optimum via placement techniques. The instance where the optimum via placement outperforms wire sizing (and vice versa) is also depicted 277
Figure 10.19 NAPC for minimum width and wire segments of equal length, wire sizing and wire segments of equal length, and minimum width and optimum via placement, yielding segments of different length 278
Figure 11.1 Intertier interconnect tree. (A) Typical intertier interconnect tree, and (B) intervals and directions that the intertier via can be placed 282
Figure 11.2 Different intertier via moves. (A) Type-1 move (allowed), (B) type-2 move (allowed), and (C) type-3 move (prohibited) 283
Figure 11.3 Simple interconnect tree, illustrating a critical path (w3=1) and on path and off path intertier vias 286
Figure 11.4 Pseudocode of the Interconnect Tree Via Placement Algorithm (ITVPA) 287
Figure 11.5 Pseudocode of the near-optimal Single Critical Sink interconnect tree Via Placement Algorithm (SCSVPA) 288
Figure 11.6 A symmetric tree including two intertier vias. The interconnect parameters per tier are r1=10.98 Ω/mm, r2=11.97 Ω/mm, r3=96.31 Ω/mm, c1=147.89 fF/mm, c2=202 fF/mm, and c3=388.51 fF/mm, and the allowed interval ldi,v2=75 μm 290
Figure 12.1 Cross-section of a 3-D stack illustrating (A) the variety of materials, including the package and heat sink, which increase the complexity of the thermal analysis process, and (B) a thermal circuit used to model the flow of heat along the z-direction 297
Figure 12.2 Schematic of a cross-section of a 3-D system with intertier liquid cooling through (A) microfluidic channels, and (B) a micropin array 300
Figure 12.3 Example of the duality of thermal and electrical systems 304
Figure 12.4 Thermal model of a 3-D circuit where 1-D heat transfer is assumed. Each layer is assumed homogeneous with a single thermal conductivity 305
Figure 12.5 Increase in temperature in a 3-D circuit for different numbers of tiers and power densities 307
Figure 12.6 Different vertical heat transfer paths within a 3-D IC 308
Figure 12.7 Maximum temperature versus power density for 3-D ICs, SOI, and bulk CMOS. The difference among the curves for the 3-D ICs is that the first curve (3-D horizontal and vertical) includes thermal paths with a horizontal interconnect segment, while the second curve includes only continuous vertical flow of heat through the wires 310
Figure 12.8 Unit tile (or cell) including a thermal resistor in each of the x-, y-, and z-directions. A thermal capacitor models the heat capacity of the tile, and a heat source qx,y,z models the power consumed by the devices or the joule heating of the wires within this cell 311
Figure 12.9 Thermal model of a 3-D IC. (A) A 3-D tile stack, (B) one pillar of the stack, and (C) an equivalent thermal resistive network. R1 and Rp correspond, respectively, to the thermal resistance of the thick silicon substrate of the first tier and the thermal resistance of the package 312
Figure 12.10 Cross-section of a cell including a TSV within the silicon substrate 314
Figure 12.11 Simulation setup for determining the thermal conductivity of the cell shown in Fig. 12.10 along (A) the xy-plane, and (B) the z-direction 314
Figure 12.12 A segment of a three tier 3-D IC with a TTSV, where (A) is the geometric structure, and (B) is the cross-section of a TTSV of this segment. The area of the circuit is denoted by A0. The three main paths of heat transfer are depicted by the dashed lines 316
Figure 12.13 Thermal model of a TTSV in a three tier circuit, where double notation is used to demonstrate that the model can be extended to a 3-D stack of n tiers 318
Figure 12.14 Maximum rise in temperature in a three tier 3-D circuit for different dielectric liner thicknesses, where DTSV=10 μm. The other parameters are tSiO2=7 μm, tb=1 μm, tSi2=tSi3=45 μm, k1=1.3, and k2=0.55 319
Figure 12.15 Neighboring cells bending the isothermal curves due to the TSVs 319
Figure 12.16 Schematic of a tapered TSV 320
Figure 12.17 Thermal model of microchannel with conductive and convective thermal resistances 322
Figure 12.18 Schematic illustration of the thermal wake effect, which leads to an exponential decay of the temperature downstream from the channel due to the heated cells located upstream. The transfer of heat occurs both downstream and transverse to the flow within the channel 324
Figure 12.19 A four tier 3-D circuit discretized into a mesh 325
Figure 12.20 Traditional V-cycles of multigrid methods with coarsening and refining stages 326
Figure 12.21 Coarsening process excluding the BEOL layers in the z-direction to ensure that valuable physical information is not lost, improving the overall efficiency and accuracy of the multigrid technique 327
Figure 12.22 Principle of power blurring method 328
Figure 13.1 Cost function of the temperature 335
Figure 13.2 A bucket structure example for a two tier circuit consisting of 12 blocks. (A) A two tier 3-D IC, (B) a 2×2 bucket structure imposed on a 3-D IC, and (C) the resulting bucket index 335
Figure 13.3 Intertier moves. (A) An initial placement, (B) a z-neighbor swap between blocks a and h, and (C) a z-neighbor move for block l from the first tier to the second tier 336
Figure 13.4 Three stage floorplanning process based on the force directed method 340
Figure 13.5 Transition from a continuous 3-D space to discrete tiers. Block 2 is assigned to either the lower or upper tier, which results in different overlaps 342
Figure 13.6 Mapping of a task graph onto physical PEs within a 3-D NoC 343
Figure 13.7 Temperature balancing heuristic where (A) the tasks are sorted in descending power and assigned to super-tasks, (B) the temperature of each core, and (C) the super-tasks assigned to the super-cores 348
Figure 13.8 First order thermal model, where each core is thermally modeled by a node with power Pi, specific heat Ci, and inter- and intratier thermal resistances 349
Figure 13.9 3-D CMP consisting of a single four core tier with three tiers of SRAM and one tier of MRAM 354
Figure 13.10 Dynamic thermal management schemes for a 3-D CMP employing a mixture of SRAM, MRAM, and DVFS, (A) SRAM-1 GHz core, (B) SRAM-3 GHz core, (C) SRAM-core DVFS, (D) hybrid-3 GHz core, and (E) hybrid-core DVFS 355
Figure 13.11 Cross-sectional view of a 3-D ultra-thin system with peripheral copper TSVs 358
Figure 13.12 Cross-sectional view of a two tier structure with a spatial heat source to evaluate the effects of the metal grid/plate and thickness of the adhesive materials on the thermal behavior of the structure (not to scale) 359
Figure 13.13 Average temperature of a circuit surrounded by resistors used as heating elements where different means such as a TSV or metal ring are used to thermally isolate the circuit 360
Figure 13.14 Thermal conductivity versus thermal via density 366
Figure 13.15 Multi-level routing process with thermal via planning 368
Figure 13.16 Heat propagation paths within a 3-D grid 369
Figure 13.17 Routing grid for a two tier 3-D IC. Each horizontal edge of the grid is associated with a horizontal wire capacity. Each vertical edge is associated with an intertier via capacity 372
Figure 13.18 Effect of a thermal wire on the routing capacity of each grid cell. vi and vj denote the capacity of the intertier vias for, respectively, cell i and j. The horizontal cell capacity is equal to the width of the cell boundary 373
Figure 13.19 Flowchart of a temperature aware 3-D global routing technique 373
Figure 13.20 Floorplan of a 3-D MPSoC, (A) cores and L2 caches are placed in separate tiers, and (B) cores and caches share the same tier 375
Figure 14.1 Heat propagation from one tier spreading into a second stacked tier 382
Figure 14.2 Physical layout, (A) on-chip resistive heater, (B) on-chip four-point resistive thermal sensor, and (C) overlay of the resistive heater and resistive thermal sensor 384
Figure 14.3 Physical layout, (A) back metal resistive heater and (B) back metal four-point resistive thermal sensor 385
Figure 14.4 Microphotograph of the test circuit depicting the back metal pattern with an overlay indicating the location of the on-chip thermal test sites 386
Figure 14.5 Placement of thermal heaters and sensors, respectively, in metals 2 and 3 in the two stacked device planes. The placement of the back metal heaters and sensors is also shown 387
Figure 14.6 Calibration of (A) on-chip thermal sensors, and (B) back metal thermal sensors 389
Figure 14.7 Experimental results for the different test conditions. Each label describes the device plane, site location of the heater, and whether active cooling is applied 390
Figure 14.8 Structure of the 3-D test circuit consisting of two silicon tiers and one back metal layer. Each tier has two separately controlled heaters (H1 and H2). The back metal is connected to WTop using thermal through silicon vias 403
Figure 14.9 Comparison of temperatures for a horizontal path (length=1,300 μm) 404
Figure 14.10 Comparison of temperatures for a vertical path (length=10 μm) 404
Figure 14.11 Comparison of temperatures for a diagonal path (length=1,300 μm) 405
Figure 14.12 Comparison of thermal resistance per unit length for a horizontal path (length=1,300 μm) 405
Figure 14.13 Comparison of thermal resistance per unit length for a vertical path (length=1,300 μm) 406
Figure 14.14 Comparison of thermal resistance per unit length for a diagonal path (length=1,300 μm) 406
Figure 14.15 Simulated temperature at the WTop site 1 sensor for four densities of TSVs placed between the WTop site 1 heater/sensor pair and the back metal 407
Figure 15.1 A data path depicting a pair of sequentially-adjacent registers 411
Figure 15.2 Simple example of the MMM clock synthesis method where a clock tree is generated. (A) Without look-ahead, and (B) with look-ahead. An xy-cut leads to larger skew in (A) than a yx-cut in (B) 412
Figure 15.3 TRR where the core of the region is a Manhattan arc and the boundary points are at a radius distance from the core 414
Figure 15.4 Merging segment ms(u) for node u that is the parent node of nodes a and b based on TRRa and TRRb 415
Figure 15.5 TRR with a core point. The placement location of the parent node p, pl(p) (which is known from the previous iteration) and radius equal to the wirelength of edge eu. The segment of ms(u) within the TRR is the thick line and represents the set of valid placement locations for node u 416
Figure 15.6 Example of the DME method for a tree with eight sinks. (A) to (C) Bottom-up phase where the recursive derivation of the merging segments is accomplished and (D) to (F) top-down phase where the exact placement of each internal node is determined 416
Figure 15.7 Two-dimensional four level H-tree 418
Figure 15.8 Buffered and symmetric clock tree that drives a grid, where each unit grid constitutes a local clock network modeled as a lumped capacitor Cl_seg 418
Figure 15.9 Global 3-D clock distribution networks based on planar symmetric H-trees, where during normal operation (A) one H-tree and multiple TSVs distribute the clock signal, and (B) two H-trees and a root TSV distribute the clock signal 419
Figure 15.10 Cross-section of a 3-D stack of five tiers with one dedicated clock tier and four logic tiers 420
Figure 15.11 Two clock delivery networks, (A) the networks are shorted only at the initial stages of the clock distribution, and (B) TSVs connect the clock networks at the lower levels of the clock network hierarchy 422
Figure 15.12 Multi-TSV clock tree with 13 sinks and three TSVs spanning two tiers 424
Figure 15.13 Several abstract trees for a set of eight sinks generated by the MMM-TB algorithm for different bounds of TSVs, (A) is the 2-D view of these trees and the dashed lines denote TSVs, (B) a 3-D view of the same trees, and (C) the resulting connection topologies where the gray rectangles refer to a TSV 425
Figure 15.14 Pseudocode of the z-cut procedure for the MMM-TB algorithm 426
Figure 15.15 A set of sinks S={a, b, c} where the effect of the recursive z-cuts in the MMM-TB algorithm is exemplified. (A) Two z-cuts are successively applied, (B) the source is in tier 3 and z-cut1 is followed by z-cut2, (C) the source is in tier 2 and the sinks in this tier are first extracted, and (D) the source is in tier 1 and z-cut2 is followed by z-cut1 427
Figure 15.16 Examples of merging segments for two intertier nodes u, v merged with node p. (A) An unbuffered tree, and (B) a buffered tree 428
Figure 15.17 A tree with four sinks embedded in two tiers for different cases of embedding the internal nodes x1 and x2 and the root node sr and the resulting number of TSVs for each case. The notation xi,j (si,j) implies the placement of node xi (si) in tier j. (A) TSV=2, (B) TSV=3, (C) TSV=3, (D) TSV=4, (E) TSV=4, (F) TSV=3, (G) TSV=3, and (H) TSV=2 429
Figure 15.18 Different cases to determine the number of embedding tiers for node x, where the children nodes x1 and x2 are (A) clock sinks, and (C) and (E) are internal nodes. The minimum number of TSVs for (A), (C), and (E) is shown, respectively, in (B), (D), and (F) 430
Figure 15.19 Three tier clock tree using a single TSV for the intertier connections. This topology is pre-bond testable as each tier includes a network connecting all of the sinks 432
Figure 15.20 Pre-bond testable clock tree with multiple TSVs. The buffers are inserted before the TSVs, thereby not changing the capacitance of the tree in tier 1. TGs in tier 2 connect the redundant tree (shown as a dashed line) with the subtrees during pre-bond test. The TGs are switched off after bonding, disconnecting the redundant tree 433
Figure 15.21 Portion of a 3-D clock tree consisting of several subtrees STi. The TSVs in (A) are replaced with TSV buffers in (B) to decouple the clock tree in tier 1 from the clock tree in tier 2 434
Figure 15.22 Different cases where TSV and/or clock buffers are inserted. (A) A clock buffer is inserted to balance the delay between the two branches where tdA<tdB, (B) multiple clock buffers are inserted due to long wires or high downstream capacitance, and (C) a TSV buffer is inserted to decouple the downstream clock tree, and a clock buffer is added to counterbalance the delay imbalance caused by the TSV buffer 434
Figure 15.23 Self-configured circuit controlling the operation of the TG (N5 and P5) 439
Figure 15.24 A two tier clock tree, (A) the initial TSV locations are shown, and (B) TSV1 and TSV3 are relocated within the whitespace. The relocation adds wirelength (shown by the dashed lines) which degrades the performance of the clock tree topology shown in (A) 440
Figure 15.25 Pre-clustering stage of the whitespace-aware CTS method, (A) a set of sinks and whitespaces are projected onto a plane, (B) the sinks per tier are located beyond distance β⋅HPWLtier from whitespaces, (C) those sinks within a cluster belong to the same tier, and (D) the root of the subtrees from the clustered sinks in each tier and some non-clustered sinks is depicted 441
Figure 15.26 Reconstruction of the merging segments 442
Figure 15.27 Different TSV redundancy schemes. (A) Double (N-times) redundancy, (B) 4:2 shared spare topology with two spare TSVs, (C) 4:1 shared spare topology with one spare TSV, and (D) 4:2 shared spare topology with no spare TSVs 444
Figure 15.28 Operation of a TSV TFC, (A) a pair TFC, (B) in pre-bond operation, the redundant tree is connected (shown with solid lines) while the TSVs are not present, (C) in post-bond operation with no defects, the clock signal is transferred by the TSVs, and (D) the TSV2 is defective and part of the redundant tree is used to propagate the clock signal to an adjacent subtree through TG2 and MUX2 445
Figure 15.29 Example of fault tolerant CTS from adjacent TSVs. TSVA and TSVB are within distance rp and form a TFC pair 446
Figure 16.1 Three wafers are individually fabricated with an FDSOI process 450
Figure 16.2 The second wafer is face-to-face bonded with the first wafer 450
Figure 16.3 The 3-D vias are formed and the surface is planarized with chemical mechanical polishing 451
Figure 16.4 The backside vias are etched, and the backside metal is deposited on the second wafer 451
Figure 16.5 The third wafer is face-to-back bonded with the second wafer and the 3-D vias for that tier are formed 451
Figure 16.6 Backside metal is deposited and glass layers are cut to create openings for the pads 452
Figure 16.7 Layer thicknesses in the 3-D IC MITLL technology 453
Figure 16.8 Block diagram of the 3-D test IC. Each block has an area of approximately 1 mm2. The remaining area is reserved for the I/O pads (the gray shapes) 455
Figure 16.9 Block diagram of the logic circuit included in each tier of each block 455
Figure 16.10 Physical layout of a pseudorandom number generator 456
Figure 16.11 Physical layout of 6×6 crossbar switch with 16-bit wide ports 456
Figure 16.12 Cascoded current mirror with an additional control transistor 457
Figure 16.13 Four stage cascoded current mirrors 458
Figure 16.14 Physical layout of the test circuit. Some decoupling capacitors are highlighted 459
Figure 16.15 Two-dimensional H-trees constituting a clock distribution network for a 3-D IC 460
Figure 16.16 Different 3-D clock distribution networks within the test circuit. (A) H-trees, (B) H-tree and local rings/meshes, (C) H-tree and global rings, and (D) trunk based 461
Figure 16.17 Physical layout of the clock distribution networks in the 3-D IC. (A) H-trees, (B) H-tree and local rings/meshes, (C) H-tree and global rings, and (D) trunk based 462
Figure 16.18 Clock signal probes with RF pads 463
Figure 16.19 Open drain transistor and circuit model of the probe (includes impedance of RF pads) 463
Figure 16.20 Structure of clock signal path from Fig. 16.16A to model the clock skew. The number within each oval represents the number of parallel TSVs between device tiers 464
Figure 16.21 Equivalent electrical model of a TSV 466
Figure 16.22 Top view of fabricated 3-D test circuit 467
Figure 16.23 Magnified view of one block of the fabricated 3-D test circuit 468
Figure 16.24 Die assembly of the 3-D test circuit with RF probes 469
Figure 16.25 Clock signal input and output waveform from the topology with global rings, as illustrated in Fig. 16.16C 470
Figure 16.26 Maximum measured clock skew between two tiers within the different clock distribution networks 471
Figure 16.27 Part of the clock distribution networks illustrated in Figs. 16.16A and B. (A) The local clock skew is individually adjusted within each tier for the H-tree topology, and (B) the local skew is simultaneously adjusted for all of the tiers for the local mesh topology 471
Figure 16.28 Measured power consumption at 1 GHz of the different circuit blocks 472
Figure 17.1 Classification of process variations and an illustration of the physical scale of the disparate sources of variations 476
Figure 17.2 Example of intratier and intertier paths. (A) One random variable is required to model D2D variations, and (B) two random variables (one for each tier) are used to model D2D variations for the entire path 477
Figure 17.3 Notation used in the delay variability model for 2-D and 3-D circuits. (A) 2-D circuit comprising two critical paths each with three logic gates, and (B) two-tier 3-D circuit contains three critical paths each with three stages, where two paths are intratier paths and one path is an intertier path. Two random variables are required in (B) to model the D2D variations of each tier 479
Figure 17.4 Cdf of a 2-D circuit (dashed line), a 3-D circuit with uneven critical path distribution between the two tiers (dashed dotted line), and a 3-D circuit with the same number of critical paths in each tier (dotted line) 481
Figure 17.5 3-D H-tree spanning four tiers. (A) Notation for all of the 64 sinks, and (B) certain sinks used to evaluate clock skew 483
Figure 17.6 Elemental circuit to measure the distribution of delay due to variations in the buffer characteristics 483
Figure 17.7 Electrical model of a segment of an intertier clock path 485
Figure 17.8 Clock paths to sinks u and v where the paths share nu,v buffers 489
Figure 17.9 A single via 3-D clock H-tree 490
Figure 17.10 σ of skew for increasing number of tiers (tiers) and uncorrelated WID variations for both the multi and single via topologies, (A) between sinks in the first tier, and (B) between sinks in the first and topmost tiers 492
Figure 17.11 Example multi-group 3-D clock topology 493
Figure 17.12 σ of skew for 3-D clock tree topologies. (A) Intratier skew of sink pairs s1,2 and s1,3, and (B) intertier skew of sink pairs s1,6 and s1,7 within a group of data related tiers 494
Figure 17.13 Simplified 1-D model of a power distribution network to evaluate global power noise. Rti and Cti denote, respectively, the TSV resistance and capacitance of tier i 497
Figure 17.14 Amplitude and frequency of the resonant noise versus the switching current in different tiers 498
Figure 17.15 Resonant supply noise and IR drop versus the total resistance of the TSVs 499
Figure 17.16 Resonant noise versus the number of tiers 499
Figure 17.17 Clock uncertainty between 3-D clock paths. (A) Two paths and flip flops, and (B) corresponding clock signals 501
Figure 17.18 Skitter versus length of 3-D clock paths 504
Figure 17.19 Skitter for Vn1=90 mV and different Vn2 505
Figure 17.20 Setup skitter versus (Vn2, Vn1). (A) 3-D plot of μJA, (B) contour of μJA, (C) 3-D plot of μJB, (D) contour of μJB, (E) contour of σJA, and (F) contour of σJB 506
Figure 17.21 Hold skitter versus (Vn1 and Vn2). (A) Contours for σSA, and (B) contours for σSB 507
Figure 17.22 Tradeoff between power and maximum allowed setup skitter max(J1,2) 508
Figure 17.23 Skitter versus different ϕ (ϕ1=ϕ2). (A) change in μJ1,2, (B) change in σJ1,2, and (C) change in σS1,2 509
Figure 17.24 Skitter J1,2 versus shifted ϕ1 and ϕ2. (A) 3-D plot of σJ1,2 versus (ϕ2=ϕ1) for distribution (A), (B) contour map of σJ1,2 versus (ϕ2=ϕ1) for distribution (A), and (C) contour map of σJ1,2 for distribution (B) 510
Figure 17.25 Skitter versus fn. (A) Change in J1,2, and (B) change in S1,2 511
Figure 17.26 Change of fn on delay variations. (A) Mean and standard deviation of buffer delay versus Vdd, and (B) supply voltage to the clock path during propagation of a clock edge 512
Figure 17.27 Synthesized 3-D clock tree. (A) Majority of clock buffers in the first tier, (B) majority of clock buffers in the third tier, and (C) regions where the skitter is measured 513
Figure 17.28 Normalized number of TSVs and power dissipation for Cases 2 to 4 516
Figure 18.1 Cross-sectional view of power distribution system where several levels of the hierarchy, motherboard, PCB, package, and integrated circuit are shown. The VRM and the decoupling capacitors placed at all levels of the hierarchy are also illustrated 520
Figure 18.2 A three tier circuit where DC–DC conversion is integrated in the upper tiers to reduce losses within the power delivery system 522
Figure 18.3 Buck converter integrated within a separate tier and connected to the logic tier with TSVs 522
Figure 18.4 3-D power delivery system. (A) DC–DC buck converters are integrated within only one tier, and (B) DC–DC converters are integrated in the tiers at both ends of the stack. Two different types of TSVs are noted, those TSVs that distribute a high (off-chip) voltage (VDDH) to the converters and those TSVs which distribute a low (on-chip) voltage (VDDL) downstream from the output of the converters 523
Figure 18.5 Equivalent circuit of the on-chip power distribution network of an n tier 3-D circuit, where the total IR drop across the tiers is denoted as Vdrop. Only one buck converter is integrated in one tier and the on-chip power distribution network is modeled as a 1-D network 524
Figure 18.6 Equivalent circuit of the on-chip power distribution network of an n tier 3-D circuit, where the total IR drop across the tiers is denoted as Vdrop. Two buck converters are integrated within the tiers at both ends of the circuit, each supplying current to half of the tiers of the stack 524
Figure 18.7 A converter providing current within a prototype 2-D circuit used to emulate a 3-D system comprising eight tiers, where the TSVs and active loads are connected in a daisy chain 526
Figure 18.8 A multi-level power distribution network applied to a three tier circuit where each pair of power levels is mapped to a single tier 527
Figure 18.9 Equivalent circuit diagram of a power distribution network of a 3-D circuit, (A) supplied by a single Vdd, and (B) supplied by several pairs of Vdd supplies 527
Figure 18.10 A 3-D circuit consisting of two memory tiers and one processor tier 528
Figure 18.11 Multi-level power delivery system where two pairs of voltage levels are employed in each tier. (A) All of the circuits are active, (B) right half of the circuit in each tier is inactive (shown in gray), and (C) left half of the circuit in each tier is inactive (shown in gray) 529
Figure 18.12 Multi-level power delivery system where each tier is supplied by one pair of voltage levels. (A) All of the circuits are active, (B) the processor is inactive (shown in gray), and (C) the memory tiers are inactive (shown in gray) 530
Figure 18.13 A 3-D power distribution network (not to scale), (A) the power (ground) meshes are connected by power (ground) TSVs, and (B) the equivalent circuit model of a package pin, TSV, and unit cell including the decoupling capacitance and current source 531
Figure 18.14 The segmentation method linking successive unit cells to model an entire power distribution network 534
Figure 18.15 Decomposition of a unit cell including both power and ground lines along the x and y directions. The different structures formed by the decomposition process are also illustrated. Two metal layers are utilized for the power distribution network 534
Figure 18.16 Decomposed structures and equivalent RLGC lumped sections. The notation of the physical parameters used in Table 18.2 is also defined 535
Figure 18.17 Iterative process for electro-thermal analysis 537
Figure 18.18 Overview of power grid, (A) a small segment of a power grid, and (B) corresponding electrical model including the parasitic impedance of the package 538
Figure 18.19 Cross-sectional view of a TSV. (A) A standard solid TSV, and (B) a CTSV with two layers of metal separated by a dielectric layer 541
Figure 18.20 Current paths within a 3-D circuit. (A) Where the TSV is connected to the power lines on both the uppermost (MT) and the first (M1) metal layers, and (B) where the TSV is connected only to the topmost (MT) metal layer 542
Figure 18.21 Equivalent circuit of the current flow paths illustrated in Fig. 18.20. (A) The TSV locally distributes current, and (B) only stacks of metal vias supply current to the load 542
Figure 18.22 Voltage drop at the current source as a function of the current drawn by the power supply 543
Figure 18.23 Voltage drop as a function of distance of the current source from the TSV 544
Figure 18.24 Resistive grid to model a segment of a power distribution system. (A) In the uppermost (M6) metal layer, and (B) in the lowest (M1) metal layer 546
Figure 18.25 SPICE simulation of the voltage drop on the M1 grid for different nodes with (solid curves) and without (dashed curves) the TSV path. No stacked vias are removed (d=0) 547
Figure 18.26 SPICE simulation of the maximum voltage drop on the M1 grid by successively removing the stacked vias (i.e., increasing d) with (dashed curves) and without (solid curves) the TSV path 548
Figure 18.27 SPICE simulation of the voltage drop on the M1 grid for different nodes and with no stacked vias removed (d=0) with (solid curves) and without (dashed curves) the TSV path. Only three current sources switch 548
Figure 18.28 Nonuniform TSV tapering to address both power supply noise and temperature. (A) Opposite tapering is required to individually satisfy the power supply noise and temperature objectives, and (B) adapting the size of the TSVs across tiers to ensure that both objectives are satisfied 550
Figure 18.29 Power supply noise from employing one tier of decoupling capacitance. (A) A 2-D system, (B) a four tier system with no tier for the decoupling capacitance, (C) a decoupling capacitance tier close to the package, and (D) a decoupling capacitance tier on top of the 3-D system 552
Figure 18.30 Power supply noise from employing two tiers of decoupling capacitance. (A) A 2-D system, (B) one decoupling capacitance tier is placed next to the package and the second tier between tiers two and three, (C) one decoupling capacitance tier is placed on top of the stack and the second tier between tiers two and three, and (D) both decoupling capacitance tiers are placed on top of the stack 553
Figure 18.31 Reconfigurable decoupling capacitance topology where the decoupling capacitor is connected to the power rail even if the sleep transistors are switched off 554
Figure 18.32 Always on decoupling capacitance topology. The charge provided to the local circuit blocks flows through the sleep transistors 555
Figure 18.33 A daisy chain of buffers switches the sleep transistors on, subsequently ensuring that the current gradually increases, limiting the abrupt current changes within the power grid 556
Figure 18.34 Current flow within a three tier stack. Note the current flowing through the TSVs of each tier 558
Figure 18.35 Optimization framework for 3-D power distribution networks where both power supply noise and temperature constraints are considered. (A) Optimal sizing process for the middle tier(s) is initially determined, (B) the flowchart of the algorithm, and (C) step by step description of the algorithm 560
Figure 19.1 Power distribution network topologies. (A) interdigitated power network on all tiers with the 3-D vias distributing current on the periphery and through the middle of the circuit, (B) interdigitated power network on all tiers with the 3-D vias distributing current on the periphery, and (C) interdigitated power network on tiers 1 and 3 and power/ground planes on tier 2 with the 3-D vias distributing current on the periphery and through the middle of the circuit 567
Figure 19.2 Layout of the power distribution network test circuit 569
Figure 19.3 Layout of the test circuit containing three interdigitated power and ground networks and test circuits for generating and measuring noise. (A) Overlay of all three device planes, (B) power and ground networks of the bottom tier (tier 1), (C) power and ground networks of the middle tier (tier 2), and (D) power and ground networks of the top tier (tier 3) 570
Figure 19.4 Layout of the pattern sequence source for the noise generation circuits. (A) All three device planes, (B) noise generation circuits on the bottom tier (tier 1), (C) noise generation circuits on the middle tier (tier 2), and (D) noise generation circuits on the top tier (tier 3) 572
Figure 19.5 Pattern sequence source for the noise generation circuits. (A) Ring oscillator, (B) buffer used for the RO and PRNG, (C) 5-bit PRNG, (D) 6-bit PRNG, (E) 9-bit PRNG, and (F) 10-bit PRNG 574
Figure 19.6 Individual components in Figs. 19.5C–F with the corresponding transistor sizes. (A) Inverter, (B) AND gate, (C) OR gate, (D) XNOR gate, (E) 2-to-1 MUX, and (F) D flip-flop 576
Figure 19.7 Layout of the noise generation circuits. (A) All three device planes, (B) noise generation circuits on the bottom tier (tier 1), (C) noise generation circuits on the middle tier (tier 2), and (D) noise generation circuits on the top tier (tier 3) 580
Figure 19.8 Schematic view of the (A) current mirror, and (B) switches that vary the total current through the current mirror 581
Figure 19.9 Layout of the power and ground noise detection circuits including the control circuit. (A) All three device planes, (B) power and ground sense circuits for the bottom tier (tier 1), (C) power and ground sense circuits for the middle tier (tier 2) and control circuit for all three tiers, and (D) power and ground sense circuits for the top tier (tier 3) 582
Figure 19.10 Rotating control logic to manage the RF output pads among the three device planes. The control signals to the RF pads are provided for both the power and ground detection signals for each device plane 586
Figure 19.11 Block and I/O pin diagram of the DC and RF pad layout. The numbered rectangles are DC pads providing power and ground, and DC bias points for the current mirrors, reset signals, and electrostatic discharge protection. The light colored squares and rectangles are RF pads used to calibrate the sense circuits (internal to the labeled blocks) and measure noise on the power/ground networks (external to the labeled blocks) 587
Figure 19.12 Microphotograph of the wire bonded test circuit 589
Figure 19.13 Block level schematic of noise generation and detection circuits 589
Figure 19.14 Source follower noise detection circuits detect noise on both the digital (A) power lines, and (B) ground lines 590
Figure 19.15 Fabricated test circuit examining noise propagation within three different power distribution networks, and a distributed DC-to-DC rectifier. (A) Microphotograph of the 3-D test circuit, and (B) an enlarged image of Block 1 591
Figure 19.16 S-parameter characterization of the power and ground noise detection circuits 592
Figure 19.17 Spectral analysis of the noise generated on the power line of Block 2, (A) with board level decoupling capacitance, and (B) without board level decoupling capacitance 593
Figure 19.18 Time domain measurement of the generated noise on the power line of Block 2 without board level decoupling capacitance for a voltage bias on the current mirrors of (A) 0 volts, (B) 0.5 volts, (C) 0.75 volts, and (D) 1 volt 594
Figure 19.19 Average noise voltage on the power and ground distribution networks with and without board level decoupling capacitance. (A) Average noise of power network without decoupling capacitance, (B) average noise of power network with decoupling capacitance, (C) average noise of ground network without decoupling capacitance, and (D) average noise of ground network with decoupling capacitance. A total of 4,096 data points are used to calculate the average noise for each topology at each current mirror bias voltage 595
Figure 19.20 Peak noise voltage on the power and ground distribution networks with and without board level decoupling capacitance. (A) Peak noise of power network without decoupling capacitance, (B) peak noise of power network with decoupling capacitance, (C) peak noise of ground network without decoupling capacitance, and (D) peak noise of ground network with decoupling capacitance. A single peak data point (from 4,096 points) is determined for each topology at each current mirror bias voltage 597
Figure 19.21 Equivalent electrical model of the cables, board, wirebonds, on-chip DC pads, power distribution networks, and TSVs 601
Figure 20.1 Taxonomy of 3-D architectures for wire limited circuits 606
Figure 20.2 Popular interconnection network topologies, (A) 3-D mesh, and (B) 2-D torus 607
Figure 20.3 Different partitioning levels and related design complexity vs the architectural granularity for 3-D microprocessors 608
Figure 20.4 An example of different partition levels for a 3-D microprocessor system at the (A) core, (B) functional unit block (FUB), (C) macrocell, and (D) transistor levels 608
Figure 20.5 2-D organization of a cache memory with additional circuitry 611
Figure 20.6 2-D and 3-D organization of a 32 Kb cache memory array. Nspd is the number of sets connected to a word line 611
Figure 20.7 Word line partitioning onto two tiers of the 2-D cache memory shown in Fig. 20.5 612
Figure 20.8 Bit line partitioning onto two tiers of the 2-D cache memory shown in Fig. 20.5 613
Figure 20.9 Different organizations of a microprocessor system, (A) 2-D baseline system, (B) a second tier with 8 MB SRAM cache memory, (C) a second tier with 32 MB SRAM cache memory, and (D) a second tier with 64 MB DRAM cache memory 614
Figure 20.10 Several NoC topologies (not to scale), (A) 2-D IC–2-D NoC, (B) 2-D IC–3-D NoC, (C) 3-D IC–2-D NoC, and (D) 3-D IC–3-D NoC 617
Figure 20.11 Typical interconnect structure for intermediate metal layers 624
Figure 20.12 Zero-load latency for several network sizes. (A) APE=0.81 mm2 and ch=332.6 fF/mm, and (B) APE=4 mm2 and ch=332.6 fF/mm 626
Figure 20.13 Zero-load latency for various network sizes. (A) APE=0.64 mm2 and ch=192.5 fF/mm, and (B) APE=2.25 mm2 and ch=192.5 fF/mm 627
Figure 20.14 Improvement in zero-load latency for different network sizes and PE areas (i.e., bus lengths). (A) 2-D IC–3-D NoC, and (B) 3-D IC–2-D NoC 628
Figure 20.15 Zero-load latency for various network sizes. (A) APE=1 mm2 and ch=332.6 fF/mm, and (B) APE=4 mm2 and ch=332.6 fF/mm 628
Figure 20.16 n3 and np values for minimum zero-load latency for various network sizes. (A) APE=1 mm2 and ch=332.6 fF/mm, and (B) APE=4 mm2 and ch=332.6 fF/mm 629
Figure 20.17 Power consumption with delay constraints for several network sizes. (A) APE=1 mm2, ch=332.6 fF/mm, and T0=500 ps, and (B) APE=4 mm2, ch=332.6 fF/mm, and T0=500 ps 630
Figure 20.18 Power consumption with delay constraints for several network sizes. (A) APE=0.64 mm2, ch=192.5 fF/mm, and T0=1000 ps, and (B) APE=2.25 mm2, ch=192.5 fF/mm, and T0=1000 ps 631
Figure 20.19 Power consumption with delay constraints for various network sizes. (A) APE=1 mm2, ch=332.6 fF/mm, and T0=500 ps, and (B) APE=4 mm2, ch=332.6 fF/mm, and T0=500 ps 632
Figure 20.20 An overview of the 3-D NoC simulator 633
Figure 20.21 Position of the vertical interconnection links for each tier within a 3-D NoC (each tier is a 6 × 6 mesh), (A) fully connected 3-D NoC, (B) uniform distribution of vertical links, (C) vertical links at the center of the NoC, and (D) vertical links at the periphery of the NoC 635
Figure 20.22 Effect of traffic load on the latency of a 2-D and 3-D torus NoC for each type of traffic and XYZ routing 638
Figure 20.23 Latency of 64 node 2-D and 3-D meshes and tori NoCs under uniform traffic, XYZ routing, and several traffic loads 639
Figure 20.24 Different performance metrics under uniform traffic and a normal traffic load of a 3-D NoC for alternative interconnection topologies with XYZ-OLD routing, (A) 64 network nodes, and (B) 144 network nodes 640
Figure 20.25 Several performance metrics under uniform traffic and a low traffic load of a 3-D NoC for alternative interconnection topologies with XYZ routing, (A) a 4×4×4 3-D mesh, and (B) a 6×6×4 3-D mesh 641
Figure 20.26 Typical FPGA architecture, (A) 2-D FPGA, (B) 2-D switch box, and (C) 3-D switch box. A routing track can connect three outgoing tracks in a 2-D SB, while in a 3-D SB, a routing track can connect five outgoing routing tracks 642
Figure 20.27 Interconnects that span more than one logic block. Li denotes the length of these interconnects and i is the number of LBs traversed by these wires 643
Figure 20.28 Interconnect delay for different numbers of physical tiers, (A) average length wires, and (B) die edge length interconnects 645
Figure 20.29 Power dissipated by 2-D and 3-D FPGAs 646
Figure C.1 Intertier interconnect consisting of m segments connecting two circuits located n tiers apart 658
Figure D.1 Portion of an interconnect tree 660
Figure E.1 Modeling spatial correlations using quad-tree partitioning 662