W

Wafers
example, 31
integrated circuit cost trends, 28–32
Wafer yield
chip costs, 32
definition, 30
Waiting line, definition, D-24
Wait time, shared-media networks, F-23
Wallace tree
example, J-53, J-53
historical background, J-63
Wall-clock time
execution time, 36
scientific applications on parallel processors, I-33
Warehouse-scale computers (WSCs)
Amazon Web Services, 456–461
basic concept, 432
characteristics, 8
cloud computing, 455–461
cloud computing providers, 471–472
cluster history, L-72 to L-73
computer architecture
array switch, 443
basic considerations, 441–442
memory hierarchy, 443, 443–446, 444
storage, 442–443
as computer class, 5
computer cluster forerunners, 435–436
cost-performance, 472–473
definition, 345
and ECC memory, 473–474
efficiency measurement, 450–452
facility capital costs, 472
Flash memory, 474–475
Google
containers, 464–465
cooling and power, 465–468
monitoring and repairing, 469–470
PUE, 468
server, 467
servers, 468–469
MapReduce, 437–438
network as bottleneck, 461
physical infrastructure and costs, 446–450
power modes, 472
programming models and workloads, 436–441
query response-time curve, 482
relaxed consistency, 439
resource allocation, 478–479
server energy efficiency, 462–464
vs. servers, 432–434
SPECPower benchmarks, 463
switch hierarchy, 441–442, 442
TCO case study, 476–478
Warp, L-31
definition, 292, 313
terminology comparison, 314
Warp Scheduler
definition, 292, 314
Multithreaded SIMD Processor, 294
Wavelength division multiplexing (WDM), WAN history, F-98
Way prediction, cache optimization, 81–82
Way selection, 82
Weak ordering, relaxed consistency models, 395
Weak scaling, Amdahl’s law and parallel computers, 406–407
Web index search, shared-memory workloads, 369
Web servers
benchmarking, D-20 to D-21
dependability benchmarks, D-21
ILP for realizable processors, 218
performance benchmarks, 40
WAN history, F-98
Weighted arithmetic mean time, D-27
Weitek 3364
arithmetic functions, J-58 to J-61
chip comparison, J-58
chip layout, J-60
West-first routing, F-47 to F-48
Wet-bulb temperature
Google WSC, 466
WSC cooling systems, 449
Whirlwind project, L-4
Wide area networks (WANs)
ATM, F-79
characteristics, F-4
cross-company interoperability, F-64
effective bandwidth, F-18
fault tolerance, F-68
historical overview, F-97 to F-99
InfiniBand, F-74
interconnection network domain relationship, F-4
latency and effective bandwidth, F-26 to F-28
offload engines, F-8
packet latency, F-13, F-14 to F-16
routers/gateways, F-79
switches, F-29
switching, F-51
time of flight, F-13
topology, F-30
Wilkes, Maurice, L-3
Winchester, L-78
Window
latency, B-21
processor performance calculations, 218
scoreboarding definition, C-78
TCP/IP headers, F-84
Windowing, congestion management, F-65
Window size
ILP limitations, 221
ILP for realizable processors, 216–217
vs. parallelism, 217
Windows operating systems See Microsoft Windows
Wireless networks
basic challenges, E-21
and cell phones, E-21 to E-22
Wires
energy and power, 23
scaling, 19–21
Within instruction exceptions
definition, C-45
instruction set complications, C-50
stopping/restarting execution, C-46
Word count, definition, B-53
Word displacement addressing, VAX, K-67
Word offset, MIPS, C-32
Words
aligned/misaligned addresses, A-8
AMD Opteron data cache, B-15
DSP, E-6
Intel 80x86, K-50
memory address interpretation, A-7 to A-8
MIPS data transfers, A-34
MIPS data types, A-34
MIPS unaligned reads, K-26
operand sizes/types, 12
as operand type, A-13 to A-14
VAX, K-70
Working set effect, definition, I-24
Workloads
execution time, 37
Google search, 439
Java and PARSEC without SMT, 403–404
RAID performance prediction, D-57 to D-59
symmetric shared-memory multiprocessor performance, 367–374, I-21 to I-26
WSC goals/requirements, 433
WSC resource allocation case study, 478–479
WSCs, 436–441
Wormhole switching, F-51, F-88
performance issues, F-92 to F-93
system area network history, F-101
Worst-case execution time (WCET), definition, E-4
Write after read (WAR)
data hazards, 153–154, 169
dynamic scheduling with Tomasulo’s algorithm, 170–171
hazards and forwarding, C-55
ILP limitation studies, 220
MIPS scoreboarding, C-72, C-74 to C-75, C-79
multiple-issue processors, L-28
register renaming vs. ROB, 208
ROB, 192
TI TMS320C55 DSP, E-8
Tomasulo’s advantages, 177–178
Tomasulo’s algorithm, 182–183
Write after write (WAW)
data hazards, 153, 169
dynamic scheduling with Tomasulo’s algorithm, 170–171
execution sequences, C-80
hazards and forwarding, C-55 to C-58
ILP limitation studies, 220
microarchitectural techniques case study, 253
MIPS FP pipeline performance, C-60 to C-61
MIPS scoreboarding, C-74, C-79
multiple-issue processors, L-28
register renaming vs. ROB, 208
ROB, 192
Tomasulo’s advantages, 177–178
Write allocate
AMD Opteron data cache, B-12
definition, B-11
example calculation, B-12
Write-back cache
AMD Opteron example, B-12, B-14
coherence maintenance, 381
coherency, 359
definition, B-11
directory-based cache coherence, 383, 386
Flash memory, 474
FP register file, C-56
invalidate protocols, 355–357, 360
memory hierarchy basics, 75
snooping coherence, 355, 356–357, 359
Write-back cycle (WB)
basic MIPS pipeline, C-36
data hazard stall minimization, C-17
execution sequences, C-80
hazards and forwarding, C-55 to C-56
MIPS exceptions, C-49
MIPS pipeline, C-52
MIPS pipeline control, C-39
MIPS R4000, C-63, C-65
MIPS scoreboarding, C-74
pipeline branch issues, C-40
RISC classic pipeline, C-7 to C-8, C-10
simple MIPS implementation, C-33
simple RISC implementation, C-6
Write broadcast protocol, definition, 356
Write buffer
AMD Opteron data cache, B-14
Intel Core i7, 118, 121
invalidate protocol, 356
memory consistency, 393
memory hierarchy basics, 75
miss penalty reduction, 87, B-32, B-35 to B-36
write merging example, 88
write strategy, B-11
Write hit
cache coherence, 358
directory-based coherence, 424
single-chip multicore multiprocessor, 414
snooping coherence, 359
write process, B-11
Write invalidate protocol
directory-based cache coherence protocol example, 382–383
example, 359, 360
implementation, 356–357
snooping coherence, 355–356
Write merging
example, 88
miss penalty reduction, 87
Write miss
AMD Opteron data cache, B-12, B-14
cache coherence, 358, 359, 360, 361
definition, 385
directory-based cache coherence, 380–383, 385–386
example calculation, B-12
locks via coherence, 390
memory hierarchy basics, 76–77
memory stall clock cycles, B-4
Opteron data cache, B-12, B-14
snooping cache coherence, 365
write process, B-11 to B-12
write speed calculations, 393
Write result stage
data hazards, 154
dynamic scheduling, 174–175
hardware-based speculation, 192
instruction steps, 175
ROB instruction, 186
scoreboarding, C-74 to C-75, C-78 to C-80
status table examples, C-77
Tomasulo’s algorithm, 178, 180, 190
Write serialization
hardware primitives, 387
multiprocessor cache coherency, 353
snooping coherence, 356
Write stall, definition, B-11
Write strategy
memory hierarchy considerations, B-6, B-10 to B-12
virtual memory, B-45 to B-46
Write-through cache
average memory access time, B-16
coherency, 352
invalidate protocol, 356
memory hierarchy basics, 74–75
miss penalties, B-32
optimization, B-35
snooping coherence, 359
write process, B-11 to B-12
Write update protocol, definition, 356