T

Tag

AMD Opteron data cache, B-12 to B-14

ARM Cortex-A8, 115

cache optimization, 79–80

dynamic scheduling, 177

invalidate protocols, 357

memory hierarchy basics, 74, 77–78

virtual memory fast address translation, B-46

write strategy, B-10

Tag check (TC)

MIPS R4000, C-63

R4000 pipeline, B-62 to B-63

R4000 pipeline structure, C-63

write process, B-10

Tag fields

block identification, B-8

dynamic scheduling, 173, 175

Tail duplication, superblock scheduling, H-21

Tailgating, definition, G-20

Tandem Computers

cluster history, L-62, L-72

faults, D-14

overview, D-12 to D-13

Target address

branch hazards, C-21, C-42

branch penalty reduction, C-22 to C-23

branch-target buffer, 206

control flow instructions, A-17 to A-18

GPU conditional branching, 301

Intel Core i7 branch predictor, 166

MIPS control flow instructions, A-38

MIPS implementation, C-32

MIPS pipeline, C-36, C-37

MIPS R4000, C-25

pipeline branches, C-39

RISC instruction set, C-5

Target channel adapters (TCAs), switch vs. NIC, F-86

Target instructions

branch delay slot scheduling, C-24

as branch-target buffer variation, 206

GPU conditional branching, 301

Task-level parallelism (TLP), definition, 9

TB See Translation buffer (TB)

TB-80 VME rack

example, D-38

MTTF calculation, D-40 to D-41

TC See Tag check (TC)

TCAs See Target channel adapters (TCAs)

TCO See Total Cost of Ownership (TCO)

TCP See Transmission Control Protocol (TCP)

TCP/IP See Transmission Control Protocol/Internet Protocol (TCP/IP)

TDMA See Time division multiple access (TDMA)

TDP See Thermal design power (TDP)

Technology trends

basic considerations, 17–18

performance, 18–19

Teleconferencing, multimedia support, K-17

Temporal locality

blocking, 89–90

cache optimization, B-26

coining of term, L-11

definition, 45, B-2

memory hierarchy design, 72

TERA processor, L-34

Terminate events

exceptions, C-45 to C-46

hardware-based speculation, 188

loop unrolling, 161

Tertiary Disk project

failure statistics, D-13

overview, D-12

system log, D-43

Test-and-set operation, synchronization, 388

Texas Instruments 8847

arithmetic functions, J-58 to J-61

chip comparison, J-58

chip layout, J-59

Texas Instruments ASC

first vector computers, L-44

peak performance vs. start-up overhead, 331

TFLOPS, parallel processing debates, L-57 to L-58

TFT See Thin-film transistor (TFT)

Thacker, Chuck, F-99

Thermal design power (TDP), power trends, 22

Thin-film transistor (TFT), Sanyo VPC-SX500 digital camera, E-19

Thinking Machines, L-44, L-56

Thinking Multiprocessors CM-5, L-60

Think time, transactions, D-16, D-17

Third-level caches See also L3 caches

ILP, 245

interconnection network, F-87

SRAM, 98–99

Thrash, memory hierarchy, B-25

Thread Block

CUDA Threads, 297, 300, 303

definition, 292, 313

Fermi GTX 480 GPU floorplan, 295

function, 294

GPU hardware levels, 296

GPU Memory performance, 332

GPU programming, 289–290

Grid mapping, 293

mapping example, 293

multithreaded SIMD Processor, 294

NVIDIA GPU computational structures, 291

NVIDIA GPU Memory structures, 304

PTX Instructions, 298

Thread Block Scheduler

definition, 292, 309, 313–314

Fermi GTX 480 GPU floorplan, 295

function, 294, 311

GPU, 296

Grid mapping, 293

multithreaded SIMD Processor, 294

Thread-level parallelism (TLP)

advanced directory protocol case study, 420–426

Amdahl’s law and parallel computers, 406–407

centralized shared-memory multiprocessors

basic considerations, 351–352

cache coherence, 352–353

cache coherence enforcement, 354–355

cache coherence example, 357–362

cache coherence extensions, 362–363

invalidate protocol implementation, 356–357

SMP and snooping limitations, 363–364

snooping coherence implementation, 365–366

snooping coherence protocols, 355–356

definition, 9

directory-based cache coherence

case study, 418–420

protocol basics, 380–382

protocol example, 382–386

DSM and directory-based coherence, 378–380

embedded systems, E-15

IBM Power7, 215

from ILP, 4–5

inclusion, 397–398

Intel Core i7 performance/energy efficiency, 401–405

memory consistency models

basic considerations, 392–393

compiler optimization, 396

programming viewpoint, 393–394

relaxed consistency models, 394–395

speculation to hide latency, 396–397

MIMDs, 344–345

multicore processor performance, 400–401

multicore processors and SMT, 404–405

multiprocessing/multithreading-based performance, 398–400

multiprocessor architecture, 346–348

multiprocessor cost effectiveness, 407

multiprocessor performance, 405–406

multiprocessor software development, 407–409

vs. multithreading, 223–224

multithreading history, L-34 to L-35

parallel processing challenges, 349–351

single-chip multicore processor case study, 412–418

Sun T1 multithreading, 226–229

symmetric shared-memory multiprocessor performance

commercial workload, 367–369

commercial workload measurement, 369–374

multiprogramming and OS workload, 374–378

overview, 366–367

synchronization

basic considerations, 386–387

basic hardware primitives, 387–389

locks via coherence, 389–391

Thread Processor

definition, 292, 314

GPU, 315

Thread Processor Registers, definition, 292

Thread Scheduler in a Multithreaded CPU, definition, 292

Thread of SIMD Instructions

characteristics, 295–296

CUDA Thread, 303

definition, 292, 313

Grid mapping, 293

lane recognition, 300

scheduling example, 297

terminology comparison, 314

vector/GPU comparison, 308–309

Thread of Vector Instructions, definition, 292

Three-dimensional space, direct networks, F-38

Three-level cache hierarchy

commercial workloads, 368

ILP, 245

Intel Core i7, 118, 118

Throttling, packets, F-10

Throughput See also Bandwidth

definition, C-3, F-13

disk storage, D-4

Google WSC, 470

ILP, 245

instruction fetch bandwidth, 202

Intel Core i7, 236–237

kernel characteristics, 327

memory banks, 276

multiple lanes, 271

parallelism, 44

performance considerations, 36

performance trends, 18–19

pipelining basics, C-10

precise exceptions, C-60

producer-server model, D-16

vs. response time, D-17

routing comparison, F-54

server benchmarks, 40–41

servers, 7

storage systems, D-16 to D-18

uniprocessors, TLP

basic considerations, 223–226

fine-grained multithreading on Sun T1, 226–229

superscalar SMT, 230–232

and virtual channels, F-93

WSCs, 434

Ticks

cache coherence, 391

processor performance equation, 48–49

Tilera TILE-Gx processors, OCNs, F-3

Time-cost relationship, components, 27–28

Time division multiple access (TDMA), cell phones, E-25

Time of flight

communication latency, I-3 to I-4

interconnection networks, F-13

Timing independent, L-17 to L-18

TI TMS320C6x DSP

architecture, E-9

characteristics, E-8 to E-10

instruction packet, E-10

TI TMS320C55 DSP

architecture, E-7

characteristics, E-7 to E-8

data operands, E-6

TLB See Translation lookaside buffer (TLB)

TLP See Task-level parallelism (TLP); Thread-level parallelism (TLP)

Tomasulo’s algorithm

advantages, 177–178

dynamic scheduling, 170–176

FP unit, 185

loop-based example, 179, 181–183

MIPS FP unit, 173

step details, 178, 180

TOP500, L-58

Top Of Stack (TOS) register, ISA operands, A-4

Topology

Beneš networks, F-33

centralized switched networks, F-30 to F-34, F-31

definition, F-29

direct networks, F-37

distributed switched networks, F-34 to F-40

interconnection networks, F-21 to F-22, F-44

basic considerations, F-29 to F-30

fault tolerance, F-67

network performance and cost, F-40

network performance effects, F-40 to F-44

rings, F-36

routing/arbitration/switching impact, F-52

system area network history, F-100 to F-101

Torus networks

characteristics, F-36

commercial interconnection networks, F-63

direct networks, F-37

fault tolerance, F-67

IBM Blue Gene/L, F-72 to F-74

NEWS communication, F-43

routing comparison, F-54

system area network history, F-102

TOS See Top Of Stack (TOS) register

Total Cost of Ownership (TCO), WSC case study, 476–479

Total store ordering, relaxed consistency models, 395

Tournament predictors

early schemes, L-27 to L-28

ILP for realizable processors, 216

local/global predictor combinations, 164–166

Toy programs, performance benchmarks, 37

TP See Transaction-processing (TP)

TPC See Transaction Processing Council (TPC)

Trace compaction, basic process, H-19

Trace scheduling

basic approach, H-19 to H-21

overview, H-20

Trace selection, definition, H-19

Tradebeans benchmark, SMT on superscalar processors, 230

Traffic intensity, queuing theory, D-25

Trailer

messages, F-6

packet format, F-7

Transaction components, D-16, D-17, I-38 to I-39

Transaction-processing (TP)

server benchmarks, 41

storage system benchmarks, D-18 to D-19

Transaction Processing Council (TPC)

benchmarks overview, D-18 to D-19, D-19

parallelism, 44

performance results reporting, 41

server benchmarks, 41

TPC-B, shared-memory workloads, 368

TPC-C

file system benchmarking, D-20

IBM eServer p5 processor, 409

multiprocessing/multithreading-based performance, 398

multiprocessor cost effectiveness, 407

single vs. multiple thread executions, 228

Sun T1 multithreading unicore performance, 227–229, 229

WSC services, 441

TPC-D, shared-memory workloads, 368–369

TPC-E, shared-memory workloads, 368–369

Transfers See also Data transfers

as early control flow instruction definition, A-16

Transforms, DSP, E-5

Transient failure, commercial interconnection networks, F-66

Transient faults, storage systems, D-11

Transistors

clock rate considerations, 244

dependability, 33–36

energy and power, 23–26

ILP, 245

performance scaling, 19–21

processor comparisons, 324

processor trends, 2

RISC instructions, A-3

shrinking, 55

static power, 26

technology trends, 17–18

Translation buffer (TB)

virtual memory block identification, B-45

virtual memory fast address translation, B-46

Translation lookaside buffer (TLB)

address translation, B-39

AMD64 paged virtual memory, B-56 to B-57

ARM Cortex-A8, 114–115

cache optimization, 80, B-37

coining of term, L-9

Intel Core i7, 118, 120–121

interconnection network protection, F-86

memory hierarchy, B-48 to B-49

memory hierarchy basics, 78

MIPS64 instructions, K-27

Opteron, B-47

Opteron memory hierarchy, B-57

RISC code size, A-23

shared-memory workloads, 369–370

speculation advantages/disadvantages, 210–211

strided access interactions, 323

Virtual Machines, 110

virtual memory block identification, B-45

virtual memory fast address translation, B-46

virtual memory page size selection, B-47

virtual memory protection, 106–107

Transmission Control Protocol (TCP), congestion management, F-65

Transmission Control Protocol/Internet Protocol (TCP/IP)

ATM, F-79

headers, F-84

internetworking, F-81, F-83 to F-84, F-89

reliance on, F-95

WAN history, F-98

Transmission speed, interconnection network performance, F-13

Transmission time

communication latency, I-3 to I-4

time of flight, F-13 to F-14

Transport latency

time of flight, F-14

topology, F-35 to F-36

Transport layer, definition, F-82

Transputer, F-100

Tree-based barrier, large-scale multiprocessor synchronization, I-19

Tree height reduction, definition, H-11

Trees, MINs with nonblocking, F-34

Trellis codes, definition, E-7

TRIPS Edge processor, F-63

characteristics, F-73

Trojan horses

definition, B-51

segmented virtual memory, B-53

True dependence

finding, H-7 to H-8

loop-level parallelism calculations, 320

vs. name dependence, 153

True sharing misses

commercial workloads, 371, 373

definition, 366–367

multiprogramming workloads, 377

True speedup, multiprocessor performance, 406

TSMC, Stratton, F-3

TSS operating system, L-9

Turbo mode

hardware enhancements, 56

microprocessors, 26

Turing, Alan, L-4, L-19

Turn Model routing algorithm, example calculations, F-47 to F-48

Two-level branch predictors

branch costs, 163

Intel Core i7, 166

tournament predictors, 165

Two-level cache hierarchy

cache optimization, B-31

ILP, 245

Two’s complement, J-7 to J-8

Two-way conflict misses, definition, B-23

Two-way set associativity

ARM Cortex-A8, 233

cache block placement, B-7, B-8

cache miss rates, B-24

cache miss rates vs. size, B-33

cache optimization, B-38

cache organization calculations, B-19 to B-20

commercial workload, 370–373, 371

multiprogramming workload, 374–375

nonblocking cache, 84

Opteron data cache, B-13 to B-14

2:1 cache rule of thumb, B-29

virtual to cache access scenario, B-39

TX-2, L-34, L-49

“Typical” program, instruction set considerations, A-43

Table of Contents for Computer Architecture: A Quantitative Approach
