Example calculations
barrier synchronization, I-15
branch-target buffer branch penalty, 205–206
carry-lookahead adder, J-39
compiler-based speculation, H-29 to H-31
conditional instructions, H-23 to H-24
credit-based control flow, F-10 to F-11
crossbar switch interconnections, F-31 to F-32
data dependences, H-3 to H-4
DAXPY on VMIPS, G-18 to G-20
dependence analysis, H-7 to H-8
deterministic vs. adaptive routing, F-52 to F-55
dimension-order routing, F-47 to F-48
disk subsystem failure rates, 48
fetch-and-increment barrier, I-20 to I-21
fixed-point arithmetic, E-5 to E-6
floating-point addition, J-24 to J-25
floating-point square root, 47–48
hardware-based speculation, 200–201
integer multiplication, J-9
interconnecting node costs, F-35
interconnection network latency and effective bandwidth, F-26 to F-28
I/O system utilization, D-26
large-scale multiprocessor locks, I-20
large-scale multiprocessor synchronization, I-12 to I-13
loop-level parallelism, 317
loop-level parallelism dependences, 320
microprocessor dynamic energy/power, 23
multiplication algorithm, J-19
network effective bandwidth, F-18
network topologies, F-41 to F-43
Ocean application, I-11 to I-12
packet latency, F-14 to F-15
power-performance benchmarks, 439–440
predicated instructions, H-25
processor performance comparison, 218–219
queue waiting time, D-28 to D-29
radix-4 SRT division, J-56
redundant power supply reliability, 35
sequential consistency, 393
signed-digit numbers, J-53
SIMD multimedia instructions, 284–285
single-precision numbers, J-15, J-17
software pipelining, H-13 to H-14
torus topology interconnections, F-36 to F-38
true sharing misses and false sharing, 366–367
vector memory systems, G-9
vector vs. scalar operation, G-19
vector sequence chimes, 270
VMIPS vector operation, G-6 to G-7
write vs. no-write allocate, B-12
WSC running service availability, 434–435
WSC server data transfer, 446