R

Race-to-halt, definition, 26

Rack units (U), WSC architecture, 441

Radio frequency amplifier, radio receiver, E-23

Radio receiver, components, E-23

Radio waves, wireless networks, E-21

Radix-2 multiplication/division, J-4 to J-7, J-6, J-55

Radix-4 multiplication/division, J-48 to J-49, J-49, J-56 to J-57, J-60 to J-61

Radix-8 multiplication, J-49

RAID (Redundant array of inexpensive disks)

data replication, 439

dependability benchmarks, D-21, D-22

disk array deconstruction case study, D-51, D-55

disk deconstruction case study, D-48

hardware dependability, D-15

historical background, L-79 to L-80

I/O subsystem design, D-59 to D-61

logical units, D-35

memory dependability, 104

NetApp FAS6000 filer, D-41 to D-42

overview, D-6 to D-8, D-7

performance prediction, D-57 to D-59

reconstruction case study, D-55 to D-57

row-diagonal parity, D-9

WSC storage, 442

RAID 0, definition, D-6

RAID 1

definition, D-6

historical background, L-79

RAID 2

definition, D-6

historical background, L-79

RAID 3

definition, D-7

historical background, L-79 to L-80

RAID 4

definition, D-7

historical background, L-79 to L-80

RAID 5

definition, D-8

historical background, L-79 to L-80

RAID 6

characteristics, D-8 to D-9

hardware dependability, D-15

RAID 10, D-8

RAM (random access memory), switch microarchitecture, F-57

RAMAC-350 (Random Access Method of Accounting Control), L-77 to L-78, L-80 to L-81

Random Access Method of Accounting Control, L-77 to L-78

Random replacement

cache misses, B-10

definition, B-9

Random variables, distribution, D-26 to D-34

RAR See Read after read (RAR)

RAS See Row access strobe (RAS)

RAW See Read after write (RAW)

Ray casting (RC)

GPU comparisons, 329

throughput computing kernel, 327

RDMA See Remote direct memory access (RDMA)

Read after read (RAR), absence of data hazard, 154

Read after write (RAW)

data hazards, 153

dynamic scheduling with Tomasulo’s algorithm, 170–171

first vector computers, L-45

hazards, stalls, C-55

hazards and forwarding, C-55 to C-57

instruction set complications, C-50

microarchitectural techniques case study, 253

MIPS FP pipeline performance, C-60 to C-61

MIPS pipeline control, C-37 to C-38

MIPS pipeline FP operations, C-53

MIPS scoreboarding, C-74

ROB, 192

TI TMS320C55 DSP, E-8

Tomasulo’s algorithm, 182

unoptimized code, C-81

Read miss

AMD Opteron data cache, B-14

cache coherence, 357, 358, 359–361

coherence extensions, 362

directory-based cache coherence protocol example, 380, 382–386

memory hierarchy basics, 76–77

memory stall clock cycles, B-4

miss penalty reduction, B-35 to B-36

Opteron data cache, B-14

vs. write-through, B-11

Read operands stage

ID pipe stage, 170

MIPS scoreboarding, C-74 to C-75

out-of-order execution, C-71

Realizable processors, ILP limitations, 216–220

Real memory, Virtual Machines, 110

Real-time constraints, definition, E-2

Real-time performance, PMDs, 6

Real-time performance requirement, definition, E-3

Real-time processing, embedded systems, E-3 to E-5

Rearrangeably nonblocking, centralized switched networks, F-32 to F-33

Receiving overhead

communication latency, I-3 to I-4

interconnection networks, F-88

OCNs vs. SANs, F-27

time of flight, F-14

RECN See Regional explicit congestion notification (RECN)

Reconfiguration deadlock, routing, F-44

Reconstruction, RAID, D-55 to D-57

Recovery time, vector processor, G-8

Recurrences

basic approach, H-11

loop-carried dependences, H-5

Red-black Gauss-Seidel, Ocean application, I-9 to I-10

Reduced Instruction Set Computer See RISC (Reduced Instruction Set Computer)

Reductions

commercial workloads, 371

cost trends, 28

loop-level parallelism dependences, 321

multiprogramming workloads, 377

T1 multithreading unicore performance, 227

WSCs, 438

Redundancy

Amdahl’s law, 48

chip fabrication cost case study, 61–62

computer system power consumption case study, 63–64

index checks, B-8

integrated circuit cost, 32

integrated circuit failure, 35

simple MIPS implementation, C-33

WSC, 433, 435, 439

WSC bottleneck, 461

WSC storage, 442

Redundant array of inexpensive disks See RAID (Redundant array of inexpensive disks)

Redundant multiplication, integers, J-48

Redundant power supplies, example calculations, 35

Reference bit

memory hierarchy, B-52

virtual memory block replacement, B-45

Regional explicit congestion notification (RECN), congestion management, F-66

MIPS, 12

VAX, K-67

compilers, 396, A-26 to A-29

VAX sort, K-76

VAX swap, K-72

Register fetch (RF)

MIPS data path, C-34

MIPS R4000, C-63

pipeline branches, C-41

simple MIPS implementation, C-31

simple RISC implementation, C-5 to C-6

data hazards, C-16, C-18, C-20

dynamic scheduling, 172, 173, 175, 177–178

Fermi GPU, 306

field, 176

hardware-based speculation, 184

longer latency pipelines, C-55 to C-57

MIPS exceptions, C-49

MIPS implementation, C-31, C-33

MIPS R4000, C-64

MIPS scoreboarding, C-75

Multimedia SIMD Extensions, 282, 285

multiple lanes, 272, 273

multithreading, 224

OCNs, F-3

precise exceptions, C-59

RISC classic pipeline, C-7 to C-8

RISC instruction set, C-5 to C-6

scoreboarding, C-73, C-75

speculation support, 208

structural hazards, C-13

Tomasulo’s algorithm, 180, 182

vector architecture, 264

VMIPS, 265, 308

architect-compiler writer relationship, A-30

dynamic scheduling, 171

Intel 80x86, K-52

ISA classification, 11, A-3 to A-6

Register renaming

dynamic scheduling, 169–172

hardware vs. software speculation, 222

ideal processor, 214

ILP hardware model, 214

ILP limitations, 213, 216–217

ILP for realizable processors, 216

instruction delivery and speculation, 202

microarchitectural techniques case study, 247–254

name dependences, 153

vs. ROB, 208–210

ROB instruction, 186

sample code, 250

SMT, 225

speculation, 208–210

superscalar code, 251

Tomasulo’s algorithm, 183

WAW/WAR hazards, 220

Registers

DSP examples, E-6

IA-64, H-33 to H-34

instructions and hazards, C-17

Intel 80x86, K-47 to K-49, K-48

network interface functions, F-7

pipe stages, C-35

PowerPC, K-10 to K-11

VAX swap, B-74 to B-75

Regularity

bidirectional MINs, F-33 to F-34

compiler writing-architecture relationship, A-30

Relative speedup, multiprocessor performance, 406

Relaxed consistency models

basic considerations, 394–395

compiler optimization, 396

WSC storage software, 439

Release consistency, relaxed consistency models, 395

Reliability

Amdahl’s law calculations, 56

commercial interconnection networks, F-66

example calculations, 48

I/O subsystem design, D-59 to D-61

modules, SLAs, 34

MTTF, 57

redundant power supplies, 34–35

storage systems, D-44

transistor scaling, 21

Relocation, virtual memory, B-42

Remainder, floating point, J-31 to J-32

Remington-Rand, L-5

Remote direct memory access (RDMA), InfiniBand, F-76

Remote node, directory-based cache coherence protocol basics, 381–382

Reorder buffer (ROB)

compiler-based speculation, H-31

dependent instructions, 199

dynamic scheduling, 175

FP unit with Tomasulo’s algorithm, 185

hardware-based speculation, 184–192

ILP exploitation, 199–200

ILP limitations, 216

Intel Core i7, 238

vs. register renaming, 208–210

Repeat interval, MIPS pipeline FP operations, C-52 to C-53

Replication

cache coherent multiprocessors, 354

centralized shared-memory architectures, 351–352

coherence enforcement, 354

R4000 performance, C-70

RAID storage servers, 439

TLP, 344

virtual memory, B-48 to B-49

WSCs, 438

Reply, messages, F-6

Reproducibility, performance results reporting, 41

Request

messages, F-6

switch microarchitecture, F-58

Requested protection level, segmented virtual memory, B-54

Request-level parallelism (RLP)

basic characteristics, 345

definition, 9

from ILP, 4–5

MIMD, 10

multicore processors, 400

multiprocessors, 345

parallelism advantages, 44

server benchmarks, 40

WSCs, 434, 436

Request phase, arbitration, F-49

Request-reply deadlock, routing, F-44

Reservation stations

dependent instructions, 199–200

dynamic scheduling, 178

example, 177

fields, 176

hardware-based speculation, 184, 186, 189–191

ILP exploitation, 197, 199–200

Intel Core i7, 238–240

loop iteration example, 181

microarchitectural techniques case study, 253–254

speculation, 208–209

Tomasulo’s algorithm, 172, 173, 174–176, 179, 180, 180–182

Resource allocation

computer design principles, 45

WSC case study, 478–479

Resource sparing, commercial interconnection networks, F-66

Response time See also Latency

I/O benchmarks, D-18

performance considerations, 36

performance trends, 18–19

producer-server model, D-16

server benchmarks, 40–41

storage systems, D-16 to D-18

vs. throughput, D-17

user experience, 4

WSCs, 450

Responsiveness

PMDs, 6

as server characteristic, 7

Restartable pipeline

definition, C-45

exceptions, C-46 to C-47

Restorations, SLA states, 34

Restoring division, J-5, J-6

Resume events

control dependences, 156

exceptions, C-45 to C-46

hardware-based speculation, 188

Return address predictors

instruction fetch bandwidth, 206–207

prediction accuracy, 207

Returns

Amdahl’s law, 47

cache coherence, 352–353

compiler technology and architectural decisions, A-28

control flow instructions, 14, A-17, A-21

hardware primitives, 388

Intel 80x86 integer operations, K-51

invocation options, A-19

procedure invocation options, A-19

return address predictors, 206

Reverse path, cell phones, E-24

RF See Register fetch (RF)

Rings

characteristics, F-73

NEWS communication, F-42

OCN history, F-104

process protection, B-50

topology, F-35 to F-36, F-36

Ripple-carry adder, J-3, J-3, J-42

chip comparison, J-60

Ripple-carry addition, J-2 to J-3

RISC (Reduced Instruction Set Computer)

addressing modes, K-5 to K-6

Alpha-unique instructions, K-27 to K-29

architecture flaws vs. success, A-45

ARM-unique instructions, K-36 to K-37

basic concept, C-4 to C-5

basic systems, K-3 to K-5

cache performance, B-6

classic pipeline stages, C-6 to C-10

code size, A-23 to A-24

compiler history, L-31

desktop/server systems, K-4

instruction formats, K-7

multimedia extensions, K-16 to K-19

desktop systems

addressing modes, K-5

arithmetic/logical instructions, K-11, K-22

conditional branches, K-17

constant extension, K-9

control instructions, K-12

conventions, K-13

data transfer instructions, K-10, K-21

features, K-44

FP instructions, K-13, K-23

multimedia extensions, K-18

development, 2

early pipelined CPUs, L-26

embedded systems, K-4

addressing modes, K-6

arithmetic/logical instructions, K-15, K-24

conditional branches, K-17

constant extension, K-9

control instructions, K-16

conventions, K-16

data transfers, K-14, K-23

DSP extensions, K-19

instruction formats, K-8

multiply-accumulate, K-20

historical background, L-19 to L-21

instruction formats, K-5 to K-6

instruction set lineage, K-43

ISA performance and efficiency prediction, 241

M32R-unique instructions, K-39 to K-40

MIPS16-unique instructions, K-40 to K-42

MIPS64-unique instructions, K-24 to K-27

MIPS core common extensions, K-19 to K-24

MIPS M2000 vs. VAX 8700, L-21

Multimedia SIMD Extensions history, L-49 to L-50

operations, 12

PA-RISC-unique, K-33 to K-35

pipelining efficiency, C-70

PowerPC-unique instructions, K-32 to K-33

Sanyo VPC-SX500 digital camera, E-19

simple implementation, C-5 to C-6

simple pipeline, C-7

SPARC-unique instructions, K-29 to K-32

Sun T1 multithreading, 226–227

SuperH-unique instructions, K-38 to K-39

Thumb-unique instructions, K-37 to K-38

vector processor history, G-26

Virtual Machines ISA support, 109

Virtual Machines and virtual memory and I/O, 110

RISC-I, L-19 to L-20

RISC-II, L-19 to L-20

RLP See Request-level parallelism (RLP)

ROB See Reorder buffer (ROB)

Roofline model

GPU performance, 326

memory bandwidth, 332

Multimedia SIMD Extensions, 285–288, 287

Round digit, J-18

Rounding modes, J-14, J-17 to J-19, J-18, J-20

FP precisions, J-34

fused multiply-add, J-33

Round-robin (RR)

arbitration, F-49

IBM 360, K-85 to K-86

InfiniBand, F-74

Routers

BARRNet, F-80

Ethernet, F-79

Routing algorithm

commercial interconnection networks, F-56

fault tolerance, F-67

implementation, F-57

Intel SCCC, F-70

interconnection networks, F-21 to F-22, F-27, F-44 to F-48

mesh network, F-46

network impact, F-52 to F-55

OCN history, F-104

and overhead, F-93 to F-94

SAN characteristics, F-76

switched-media networks, F-24

switch microarchitecture pipelining, F-61

system area network history, F-100

Row access strobe (RAS), DRAM, 98

Row-diagonal parity

example, D-9

RAID, D-9

Row major order, blocking, 89

RR See Round-robin (RR)

RS format instructions, IBM 360, K-87

Ruby on Rails, hardware impact on software development, 4

RX format instructions, IBM 360, K-86 to K-87

Table of Contents for Computer Architecture: A Quantitative Approach
