Index

Page references in bold represent figures and tables.

Numbers

2:1 cache rule of thumb, definition, B-29

80x86 See Intel 80x86 processors

A

ABC (Atanasoff Berry Computer), L-5

ABI See Application binary interface (ABI)

Absolute addressing mode, Intel 80x86, K-47

Accelerated Strategic Computing Initiative (ASCI)

ASCI Red, F-100

ASCI White, F-67, F-100

system area network history, F-101

Access 1/Access 2 stages, TI 320C55 DSP, E-7

Access bit

IA-32 descriptor table, B-52

Access time See also Average Memory Access Time (AMAT)

vs. block size, B-28

distributed-memory multiprocessor, 348

DRAM/magnetic disk, D-3

memory hierarchy basics, 77

miss penalties, 218, B-42

NUMA, 348

paging, B-43

shared-memory multiprocessor, 347, 363

slowdown causes, B-3

TLP workloads, 369–370

during write, B-45

WSC memory hierarchy, 444

Access time gap, disk storage, D-3

ACID See Atomicity-consistency-isolation-durability (ACID)

Acknowledgment, packets, F-16

ACM See Association of Computing Machinery (ACM)

ACS project, L-28 to L-29

Active low power modes, WSCs, 472

Ada language, integer division/remainder, J-12

Adaptive routing

definition, F-47

vs. deterministic routing, F-52 to F-55, F-54

network fault tolerance, F-94

and overhead, F-93 to F-94

Adders

carry-lookahead, J-37 to J-41

chip comparison, J-60

full, J-2, J-3

half, J-2

integer division speedup, J-54 to J-58

integer multiplication speedup

even/odd array, J-52

many adders, J-50, J-50 to J-54

multipass array multiplier, J-51

signed-digit addition table, J-54

single adder, J-47 to J-49, J-48 to J-49

Wallace tree, J-53

radix-2 division, J-55

radix-4 division, J-56

radix-4 SRT division, J-57

ripple-carry, J-3, J-3

time/space requirements, J-44

Addition operations

chip comparison, J-61

floating point

denormals, J-26 to J-27

overview, J-21 to J-25

rules, J-24

speedup, J-25 to J-26

integer, speedup

carry-lookahead, J-37 to J-41

carry-lookahead circuit, J-38

carry-lookahead tree, J-40

carry-lookahead tree adder, J-41

carry-select adder, J-43, J-43 to J-44, J-44

carry-skip adder, J-41 to J-43, J-42

overview, J-37

ripple-carry addition, J-3

Address aliasing prediction

definition, 213

ideal processor, 214

ILP for realizable processors, 216

Address Coalescing Unit

function, 310

gather-scatter, 329

GPUs, 300

Multithreaded SIMD Processor block diagram, 294

vector processor, 310

Address fault, virtual memory definition, B-42

Addressing modes

comparison, A-11

compiler writing-architecture relationship, A-30

control flow instructions, A-17 to A-18

desktop architectures, K-5

displacement mode, A-10

embedded architectures, K-6

instruction set encoding, A-21

Intel 80x86, K-47 to K-49, K-58 to K-59, K-59 to K-60

Intel 80x86 operands, K-59

ISA, 11–12, A-9 to A-10

MIPS data transfers, A-34

RISC architectures, K-5 to K-6

selection, A-9

VAX, K-66 to K-68, K-71

VAX instruction encoding, K-68 to K-69

Address offset, virtual memory, B-56

Address space

Fermi GPU architecture, 306–307

memory hierarchy, B-48 to B-49, B-57 to B-58

Multimedia SIMD vs. GPUs, 312

SMP/DSM shared memory, 348

virtual memory, B-40 to B-41

Address specifier

instruction set encoding, A-21

VAX instruction encoding, K-68 to K-69

Address stage, TI 320C55 DSP, E-7

Address trace, cache performance, B-4

Address translation

AMD64 paged virtual memory, B-55 to B-56

during indexing, B-36 to B-40

memory hierarchy basics, 77–78

Opteron data TLB, B-47

virtual memory, B-46

virtual memory definition, B-42

virtual memory protection, 106

Administrative costs, WSC vs. datacenters, 455

Adobe Photoshop, multimedia support, K-17

Advanced directory protocol

basic function, 283

case studies, 420–426

Advanced load address table (ALAT)

IA-64 ISA, H-40

vector sparse matrices, G-13

Advanced loads, IA-64 ISA, H-40

Advanced mobile phone service (AMPS), cell phones, E-25

Advanced Research Project Agency See ARPA (Advanced Research Project Agency)

Advanced RISC Machine See ARM (Advanced RISC Machine)

Advanced Simulation and Computing (ASC) program, system area network history, F-101

Advanced Switching Interconnect (ASI), storage area network history, F-103

Advanced Switching SAN, F-67

Advanced Technology Attachment disks See ATA (Advanced Technology Attachment) disks

Advanced Vector Extensions (AVX)

double-precision FP programs, 284

vs. vector architectures, 282

Affine, loop-level parallelism dependences, 318–320, H-6

After rounding rule, J-36

Aggregate bandwidth

definition, F-13

effective bandwidth calculations, F-18 to F-19

interconnection networks, F-89

routing, F-47

shared- vs. switched-media networks, F-22, F-24 to F-25

switched-media networks, F-24

switch microarchitecture, F-56

Aiken, Howard, L-3 to L-4

Airflow

containers, 466

Google WSC server, 467

Airside economization, WSC cooling systems, 449

Akamai, as Content Delivery Network, 460

ALAT See Advanced load address table (ALAT)

Alewife machine, L-61

ALGOL, L-16

Aliased variables, and compiler technology, A-27 to A-28

Aliases, address translation, B-38

Alignment, memory address interpretation, A-7 to A-8, A-8

Allen, Fran, L-28

Alliant processors, vector processor history, G-26

AltaVista search

cluster history, L-62, L-73

shared-memory workloads, 369, 370

ALUs See Arithmetic-logical units (ALUs)

AMAT See Average Memory Access Time (AMAT)

Amazon

cloud computing, 455

Dynamo, 438, 452

Amazon Elastic Compute Cloud (EC2), 456–457

MapReduce cost calculations, 458–459

price and characteristics, 458

utility computing, L-74

Amazon Simple Storage Service (S3), 456–457

Amazon Web Services (AWS)

cloud computing providers, 471–472

MapReduce cost calculations, 458–460, 459

as utility computing, 456–461

WSC cost-performance, 474

Xen VM, 111

Amdahl, Gene, L-28

Amdahl’s law

computer design principles, 46–48

computer system power consumption case study, 63–64

DRAM, 99

and parallel computers, 406–407

parallel processing calculations, 349–350

pitfalls, 55–56

vs. processor performance equation, 51

scalar performance, 331

software overhead, F-91

VMIPS on Linpack, G-18

WSC processor cost-performance, 472–473

AMD Athlon 64, Itanium 2 comparison, H-43

AMD Barcelona microprocessor, Google WSC server, 467

AMD Fusion, L-52

AMD K-5, L-30

AMD Opteron

address translation, B-38

Amazon Web Services, 457

architecture, 15

cache coherence, 361

data cache example, B-12 to B-15, B-13

Google WSC servers, 468–469

inclusion, 398

manufacturing cost, 62

misses per instruction, B-15

MOESI protocol, 362

multicore processor performance, 400–401

multilevel exclusion, B-35

NetApp FAS6000 filer, D-42

paged virtual memory example, B-54 to B-57

vs. Pentium protection, B-57

real-world server considerations, 52–55

server energy savings, 25

snooping limitations, 363–364

SPEC benchmarks, 43

TLB during address translation, B-47

AMD processors

architecture flaws vs. success, A-45

GPU computing history, L-52

power consumption, F-85

recent advances, L-33

RISC history, L-22

shared-memory multiprogramming workload, 378

terminology, 313–315

tournament predictors, 164

Virtual Machines, 110

VMMs, 129

Amortization of overhead, sorting case study, D-64 to D-67

AMPS See Advanced mobile phone service (AMPS)

Andreessen, Marc, F-98

Android OS, 324

Annulling delayed branch, instructions, K-25

Antenna, radio receiver, E-23

Antialiasing, address translation, B-38

Antidependences

compiler history, L-30 to L-31

definition, 152

finding, H-7 to H-8

loop-level parallelism calculations, 320

MIPS scoreboarding, C-72, C-79

Apogee Software, A-44

Apollo DN 10000, L-30

Apple iPad

ARM Cortex-A8, 114

memory hierarchy basics, 78

Application binary interface (ABI), control flow instructions, A-20

Application layer, definition, F-82

Applied Minds, L-74

Arbitration algorithm

collision detection, F-23

commercial interconnection networks, F-56

examples, F-49

Intel SCCC, F-70

interconnection networks, F-21 to F-22, F-27, F-49 to F-50

network impact, F-52 to F-55

SAN characteristics, F-76

switched-media networks, F-24

switch microarchitecture, F-57 to F-58

switch microarchitecture pipelining, F-60

system area network history, F-100

Architect-compiler writer relationship, A-29 to A-30

Architecturally visible registers, register renaming vs. ROB, 208–209

Architectural Support for Compilers and Operating Systems (ASPLOS), L-11

Architecture See also Computer architecture; CUDA (Compute Unified Device Architecture); Instruction set architecture (ISA); Vector architectures

compiler writer-architect relationship, A-29 to A-30

definition, 15

heterogeneous, 262

microarchitecture, 15–16, 247–254

stack, A-3, A-27, A-44 to A-45

Areal density, disk storage, D-2

Argument pointer, VAX, K-71

Arithmetic intensity

as FP operation, 286, 286–288

Roofline model, 326, 326–327

Arithmetic/logical instructions

desktop RISCs, K-11, K-22

embedded RISCs, K-15, K-24

Intel 80x86, K-49, K-53

SPARC, K-31

VAX, B-73

Arithmetic-logical units (ALUs)

ARM Cortex-A8, 234, 236

basic MIPS pipeline, C-36

branch condition evaluation, A-19

data forwarding, C-40 to C-41

data hazards requiring stalls, C-19 to C-20

data hazard stall minimization, C-17 to C-19

DSP media extensions, E-10

effective address cycle, C-6

hardware-based execution, 185

hardware-based speculation, 200–201, 201

IA-64 instructions, H-35

immediate operands, A-12

integer division, J-54

integer multiplication, J-48

integer shifting over zeros, J-45 to J-46

Intel Core i7, 238

ISA operands, A-4 to A-5

ISA performance and efficiency prediction, 241

load interlocks, C-39

microarchitectural techniques case study, 253

MIPS operations, A-35, A-37

MIPS pipeline control, C-38 to C-39

MIPS pipeline FP operations, C-52 to C-53

MIPS R4000, C-65

operand forwarding, C-19

operands per instruction example, A-6

parallelism, 45

pipeline branch issues, C-39 to C-41

pipeline execution rate, C-10 to C-11

power/DLP issues, 322

RISC architectures, K-5

RISC classic pipeline, C-7

RISC instruction set, C-4

simple MIPS implementation, C-31 to C-33

TX-2, L-49

ARM (Advanced RISC Machine)

addressing modes, K-5, K-6

arithmetic/logical instructions, K-15, K-24

characteristics, K-4

condition codes, K-12 to K-13

constant extension, K-9

control flow instructions, 14

data transfer instructions, K-23

embedded instruction format, K-8

GPU computing history, L-52

ISA class, 11

memory addressing, 11

multiply-accumulate, K-20

operands, 12

RISC instruction set lineage, K-43

unique instructions, K-36 to K-37

ARM AMBA, OCNs, F-3

ARM Cortex-A8

dynamic scheduling, 170

ILP concepts, 148

instruction decode, 234

ISA performance and efficiency prediction, 241–243

memory access penalty, 117

memory hierarchy design, 78, 114–117, 115

memory performance, 115–117

multibanked caches, 86

overview, 233

pipeline performance, 233–236, 235

pipeline structure, 232

processor comparison, 242

way prediction, 81

ARM Cortex-A9

vs. A8 performance, 236

Tegra 2, mobile vs. server GPUs, 323–324, 324

ARM Thumb

addressing modes, K-6

arithmetic/logical instructions, K-24

characteristics, K-4

condition codes, K-14

constant extension, K-9

data transfer instructions, K-23

embedded instruction format, K-8

ISAs, 14

multiply-accumulate, K-20

RISC code size, A-23

unique instructions, K-37 to K-38

ARPA (Advanced Research Project Agency)

LAN history, F-99 to F-100

WAN history, F-97

ARPANET, WAN history, F-97 to F-98

Array multiplier

example, J-50

integers, J-50

multipass system, J-51

Arrays

access age, 91

blocking, 89–90

bubble sort procedure, K-76

cluster server outage/anomaly statistics, 435

examples, 90

FFT kernel, I-7

Google WSC servers, 469

Layer 3 network linkage, 445

loop interchange, 88–89

loop-level parallelism dependences, 318–319

ocean application, I-9 to I-10

recurrences, H-12

WSC memory hierarchy, 445

WSCs, 443

Array switch, WSCs, 443–444

ASC See Advanced Simulation and Computing (ASC) program

ASCI See Accelerated Strategic Computing Initiative (ASCI)

ASCII character format, 12, A-14

ASC Purple, F-67, F-100

ASI See Advanced Switching Interconnect (ASI)

ASPLOS See Architectural Support for Compilers and Operating Systems (ASPLOS)

Assembly language, 2

Association of Computing Machinery (ACM), L-3

Associativity See also Set associativity

cache block, B-9 to B-10, B-10

cache optimization, B-22 to B-24, B-26, B-28 to B-30

cloud computing, 460–461

loop-level parallelism, 322

multilevel inclusion, 398

Opteron data cache, B-14

shared-memory multiprocessors, 368

Astronautics ZS-1, L-29

Asynchronous events, exception requirements, C-44 to C-45

Asynchronous I/O, storage systems, D-35

Asynchronous Transfer Mode (ATM)

interconnection networks, F-89

LAN history, F-99

packet format, F-75

total time statistics, F-90

VOQs, F-60

as WAN, F-79

WAN history, F-98

WANs, F-4

ATA (Advanced Technology Attachment) disks

Berkeley’s Tertiary Disk project, D-12

disk storage, D-4

historical background, L-81

power, D-5

RAID 6, D-9

server energy savings, 25

Atanasoff, John, L-5

Atanasoff Berry Computer (ABC), L-5

ATI Radeon 9700, L-51

Atlas computer, L-9

ATM See Asynchronous Transfer Mode (ATM)

ATM systems

server benchmarks, 41

TP benchmarks, D-18

Atomic exchange

lock implementation, 389–390

synchronization, 387–388

Atomic instructions

barrier synchronization, I-14

Core i7, 329

Fermi GPU, 308

T1 multithreading unicore performance, 229

Atomicity-consistency-isolation-durability (ACID), vs. WSC storage, 439

Atomic operations

cache coherence, 360–361

snooping cache coherence implementation, 365

“Atomic swap,” definition, K-20

Attributes field, IA-32 descriptor table, B-52

Autoincrement deferred addressing, VAX, K-67

Autonet, F-48

Availability

commercial interconnection networks, F-66

computer architecture, 11, 15

computer systems, D-43 to D-44, D-44

data on Internet, 344

fault detection, 57–58

I/O system design/evaluation, D-36

loop-level parallelism, 217–218

mainstream computing classes, 5

modules, 34

open-source software, 457

RAID systems, 60

as server characteristic, 7

servers, 16

source operands, C-74

WSCs, 8, 433–435, 438–439

Average instruction execution time, L-6

Average Memory Access Time (AMAT)

block size calculations, B-26 to B-28

cache optimizations, B-22, B-26 to B-32, B-36

cache performance, B-16 to B-21

calculation, B-16 to B-17

centralized shared-memory architectures, 351–352

definition, B-30 to B-31

memory hierarchy basics, 75–76

miss penalty reduction, B-32

via miss rates, B-29, B-29 to B-30

as processor performance predictor, B-17 to B-20

Average reception factor

centralized switched networks, F-32

multi-device interconnection networks, F-26

AVX See Advanced Vector Extensions (AVX)

AWS See Amazon Web Services (AWS)
