R

Race-to-halt, definition, 26
Rack units (U), WSC architecture, 441
Radio frequency amplifier, radio receiver, E-23
Radio receiver, components, E-23
Radio waves, wireless networks, E-21
Radix-2 multiplication/division, J-4 to J-7, J-6, J-55
Radix-4 multiplication/division, J-48 to J-49, J-49, J-56 to J-57, J-60 to J-61
Radix-8 multiplication, J-49
RAID (Redundant array of inexpensive disks)
data replication, 439
dependability benchmarks, D-21, D-22
disk array deconstruction case study, D-51, D-55
disk deconstruction case study, D-48
hardware dependability, D-15
historical background, L-79 to L-80
I/O subsystem design, D-59 to D-61
logical units, D-35
memory dependability, 104
NetApp FAS6000 filer, D-41 to D-42
overview, D-6 to D-8, D-7
performance prediction, D-57 to D-59
reconstruction case study, D-55 to D-57
row-diagonal parity, D-9
WSC storage, 442
RAID 0, definition, D-6
RAID 1
definition, D-6
historical background, L-79
RAID 2
definition, D-6
historical background, L-79
RAID 3
definition, D-7
historical background, L-79 to L-80
RAID 4
definition, D-7
historical background, L-79 to L-80
RAID 5
definition, D-8
historical background, L-79 to L-80
RAID 6
characteristics, D-8 to D-9
hardware dependability, D-15
RAID 10, D-8
RAM (random access memory), switch microarchitecture, F-57
RAMAC-350 (Random Access Method of Accounting Control), L-77 to L-78, L-80 to L-81
Random Access Method of Accounting Control, L-77 to L-78
Random replacement
cache misses, B-10
definition, B-9
Random variables, distribution, D-26 to D-34
Ray casting (RC)
GPU comparisons, 329
throughput computing kernel, 327
Read after read (RAR), absence of data hazard, 154
Read after write (RAW)
data hazards, 153
dynamic scheduling with Tomasulo’s algorithm, 170–171
first vector computers, L-45
hazards, stalls, C-55
hazards and forwarding, C-55 to C-57
instruction set complications, C-50
microarchitectural techniques case study, 253
MIPS FP pipeline performance, C-60 to C-61
MIPS pipeline control, C-37 to C-38
MIPS pipeline FP operations, C-53
MIPS scoreboarding, C-74
ROB, 192
TI TMS320C55 DSP, E-8
Tomasulo’s algorithm, 182
unoptimized code, C-81
Read miss
AMD Opteron data cache, B-14
cache coherence, 357, 358, 359–361
coherence extensions, 362
directory-based cache coherence protocol example, 380, 382–386
memory hierarchy basics, 76–77
memory stall clock cycles, B-4
miss penalty reduction, B-35 to B-36
Opteron data cache, B-14
vs. write-through, B-11
Read operands stage
ID pipe stage, 170
MIPS scoreboarding, C-74 to C-75
out-of-order execution, C-71
Realizable processors, ILP limitations, 216–220
Real memory, Virtual Machines, 110
Real-time constraints, definition, E-2
Real-time performance, PMDs, 6
Real-time performance requirement, definition, E-3
Real-time processing, embedded systems, E-3 to E-5
Rearrangeably nonblocking, centralized switched networks, F-32 to F-33
Receiving overhead
communication latency, I-3 to I-4
interconnection networks, F-88
OCNs vs. SANs, F-27
time of flight, F-14
Reconfiguration deadlock, routing, F-44
Reconstruction, RAID, D-55 to D-57
Recovery time, vector processor, G-8
Recurrences
basic approach, H-11
loop-carried dependences, H-5
Red-black Gauss-Seidel, Ocean application, I-9 to I-10
Reduced Instruction Set Computer. See RISC (Reduced Instruction Set Computer)
Reductions
commercial workloads, 371
cost trends, 28
loop-level parallelism dependences, 321
multiprogramming workloads, 377
T1 multithreading unicore performance, 227
WSCs, 438
Redundancy
Amdahl’s law, 48
chip fabrication cost case study, 61–62
computer system power consumption case study, 63–64
index checks, B-8
integrated circuit cost, 32
integrated circuit failure, 35
simple MIPS implementation, C-33
WSC, 433, 435, 439
WSC bottleneck, 461
WSC storage, 442
Redundant array of inexpensive disks. See RAID (Redundant array of inexpensive disks)
Redundant multiplication, integers, J-48
Redundant power supplies, example calculations, 35
Reference bit
memory hierarchy, B-52
virtual memory block replacement, B-45
Regional explicit congestion notification (RECN), congestion management, F-66
Register addressing mode
MIPS, 12
VAX, K-67
Register allocation
compilers, 396, A-26 to A-29
VAX sort, K-76
VAX swap, K-72
Register deferred addressing, VAX, K-67
Register definition, 314
Register fetch (RF)
MIPS data path, C-34
MIPS R4000, C-63
pipeline branches, C-41
simple MIPS implementation, C-31
simple RISC implementation, C-5 to C-6
Register file
data hazards, C-16, C-18, C-20
dynamic scheduling, 172, 173, 175, 177–178
Fermi GPU, 306
field, 176
hardware-based speculation, 184
longer latency pipelines, C-55 to C-57
MIPS exceptions, C-49
MIPS implementation, C-31, C-33
MIPS R4000, C-64
MIPS scoreboarding, C-75
Multimedia SIMD Extensions, 282, 285
multiple lanes, 272, 273
multithreading, 224
OCNs, F-3
precise exceptions, C-59
RISC classic pipeline, C-7 to C-8
RISC instruction set, C-5 to C-6
scoreboarding, C-73, C-75
speculation support, 208
structural hazards, C-13
Tomasulo’s algorithm, 180, 182
vector architecture, 264
VMIPS, 265, 308
Register indirect addressing mode, Intel 80x86, K-47
Register management, software-pipelined loops, H-14
Register-memory instruction set architecture
architect-compiler writer relationship, A-30
dynamic scheduling, 171
Intel 80x86, K-52
ISA classification, 11, A-3 to A-6
Register prefetch, cache optimization, 92
Register renaming
dynamic scheduling, 169–172
hardware vs. software speculation, 222
ideal processor, 214
ILP hardware model, 214
ILP limitations, 213, 216–217
ILP for realizable processors, 216
instruction delivery and speculation, 202
microarchitectural techniques case study, 247–254
name dependences, 153
vs. ROB, 208–210
ROB instruction, 186
sample code, 250
SMT, 225
speculation, 208–210
superscalar code, 251
Tomasulo’s algorithm, 183
WAW/WAR hazards, 220
Register result status, MIPS scoreboard, C-76
Registers
DSP examples, E-6
IA-64, H-33 to H-34
instructions and hazards, C-17
Intel 80x86, K-47 to K-49, K-48
network interface functions, F-7
pipe stages, C-35
PowerPC, K-10 to K-11
VAX swap, B-74 to B-75
Register stack engine, IA-64, H-34
Register tag example, 177
Register windows, SPARC instructions, K-29 to K-30
Regularity
bidirectional MINs, F-33 to F-34
compiler writing-architecture relationship, A-30
Relative speedup, multiprocessor performance, 406
Relaxed consistency models
basic considerations, 394–395
compiler optimization, 396
WSC storage software, 439
Release consistency, relaxed consistency models, 395
Reliability
Amdahl’s law calculations, 56
commercial interconnection networks, F-66
example calculations, 48
I/O subsystem design, D-59 to D-61
modules, SLAs, 34
MTTF, 57
redundant power supplies, 34–35
storage systems, D-44
transistor scaling, 21
Relocation, virtual memory, B-42
Remainder, floating point, J-31 to J-32
Remington-Rand, L-5
Remote direct memory access (RDMA), InfiniBand, F-76
Remote node, directory-based cache coherence protocol basics, 381–382
Reorder buffer (ROB)
compiler-based speculation, H-31
dependent instructions, 199
dynamic scheduling, 175
FP unit with Tomasulo’s algorithm, 185
hardware-based speculation, 184–192
ILP exploitation, 199–200
ILP limitations, 216
Intel Core i7, 238
vs. register renaming, 208–210
Repeat interval, MIPS pipeline FP operations, C-52 to C-53
Replication
cache coherent multiprocessors, 354
centralized shared-memory architectures, 351–352
coherence enforcement, 354
R4000 performance, C-70
RAID storage servers, 439
TLP, 344
virtual memory, B-48 to B-49
WSCs, 438
Reply, messages, F-6
Reproducibility, performance results reporting, 41
Request
messages, F-6
switch microarchitecture, F-58
Requested protection level, segmented virtual memory, B-54
Request-level parallelism (RLP)
basic characteristics, 345
definition, 9
from ILP, 4–5
MIMD, 10
multicore processors, 400
multiprocessors, 345
parallelism advantages, 44
server benchmarks, 40
WSCs, 434, 436
Request phase, arbitration, F-49
Request-reply deadlock, routing, F-44
Reservation stations
dependent instructions, 199–200
dynamic scheduling, 178
example, 177
fields, 176
hardware-based speculation, 184, 186, 189–191
ILP exploitation, 197, 199–200
Intel Core i7, 238–240
loop iteration example, 181
microarchitectural techniques case study, 253–254
speculation, 208–209
Tomasulo’s algorithm, 172, 173, 174–176, 179, 180, 180–182
Resource allocation
computer design principles, 45
WSC case study, 478–479
Resource sparing, commercial interconnection networks, F-66
Response time. See also Latency
I/O benchmarks, D-18
performance considerations, 36
performance trends, 18–19
producer-server model, D-16
server benchmarks, 40–41
storage systems, D-16 to D-18
vs. throughput, D-17
user experience, 4
WSCs, 450
Responsiveness
PMDs, 6
as server characteristic, 7
Restartable pipeline
definition, C-45
exceptions, C-46 to C-47
Restorations, SLA states, 34
Restoring division, J-5, J-6
Resume events
control dependences, 156
exceptions, C-45 to C-46
hardware-based speculation, 188
Return address predictors
instruction fetch bandwidth, 206–207
prediction accuracy, 207
Returns
Amdahl’s law, 47
cache coherence, 352–353
compiler technology and architectural decisions, A-28
control flow instructions, 14, A-17, A-21
hardware primitives, 388
Intel 80x86 integer operations, K-51
invocation options, A-19
procedure invocation options, A-19
return address predictors, 206
Reverse path, cell phones, E-24
Rings
characteristics, F-73
NEWS communication, F-42
OCN history, F-104
process protection, B-50
topology, F-35 to F-36, F-36
Ripple-carry adder, J-3, J-3, J-42
chip comparison, J-60
Ripple-carry addition, J-2 to J-3
RISC (Reduced Instruction Set Computer)
addressing modes, K-5 to K-6
Alpha-unique instructions, K-27 to K-29
architecture flaws vs. success, A-45
ARM-unique instructions, K-36 to K-37
basic concept, C-4 to C-5
basic systems, K-3 to K-5
cache performance, B-6
classic pipeline stages, C-6 to C-10
code size, A-23 to A-24
compiler history, L-31
desktop/server systems, K-4
instruction formats, K-7
multimedia extensions, K-16 to K-19
desktop systems
addressing modes, K-5
arithmetic/logical instructions, K-11, K-22
conditional branches, K-17
constant extension, K-9
control instructions, K-12
conventions, K-13
data transfer instructions, K-10, K-21
features, K-44
FP instructions, K-13, K-23
multimedia extensions, K-18
development, 2
early pipelined CPUs, L-26
embedded systems, K-4
addressing modes, K-6
arithmetic/logical instructions, K-15, K-24
conditional branches, K-17
constant extension, K-9
control instructions, K-16
conventions, K-16
data transfers, K-14, K-23
DSP extensions, K-19
instruction formats, K-8
multiply-accumulate, K-20
historical background, L-19 to L-21
instruction formats, K-5 to K-6
instruction set lineage, K-43
ISA performance and efficiency prediction, 241
M32R-unique instructions, K-39 to K-40
MIPS16-unique instructions, K-40 to K-42
MIPS64-unique instructions, K-24 to K-27
MIPS core common extensions, K-19 to K-24
MIPS M2000 vs. VAX 8700, L-21
Multimedia SIMD Extensions history, L-49 to L-50
operations, 12
PA-RISC-unique, K-33 to K-35
pipelining efficiency, C-70
PowerPC-unique instructions, K-32 to K-33
Sanyo VPC-SX500 digital camera, E-19
simple implementation, C-5 to C-6
simple pipeline, C-7
SPARC-unique instructions, K-29 to K-32
Sun T1 multithreading, 226–227
SuperH-unique instructions, K-38 to K-39
Thumb-unique instructions, K-37 to K-38
vector processor history, G-26
Virtual Machines ISA support, 109
Virtual Machines and virtual memory and I/O, 110
RISC-I, L-19 to L-20
RISC-II, L-19 to L-20
Roofline model
GPU performance, 326
memory bandwidth, 332
Multimedia SIMD Extensions, 285–288, 287
Round digit, J-18
Rounding modes, J-14, J-17 to J-19, J-18, J-20
FP precisions, J-34
fused multiply-add, J-33
Round-robin (RR)
arbitration, F-49
IBM 360, K-85 to K-86
InfiniBand, F-74
Routers
BARRNet, F-80
Ethernet, F-79
Routing algorithm
commercial interconnection networks, F-56
fault tolerance, F-67
implementation, F-57
Intel SCCC, F-70
interconnection networks, F-21 to F-22, F-27, F-44 to F-48
mesh network, F-46
network impact, F-52 to F-55
OCN history, F-104
and overhead, F-93 to F-94
SAN characteristics, F-76
switched-media networks, F-24
switch microarchitecture pipelining, F-61
system area network history, F-100
Row access strobe (RAS), DRAM, 98
Row-diagonal parity
example, D-9
RAID, D-9
Row major order, blocking, 89
RS format instructions, IBM 360, K-87
Ruby on Rails, hardware impact on software development, 4
RX format instructions, IBM 360, K-86 to K-87