B
Back-off time, shared-media networks, F-23
Backpressure, congestion management, F-65
Backside bus, centralized shared-memory multiprocessors, 351
Balanced systems, sorting case study, D-64 to D-67
Balanced tree, MINs with nonblocking, F-34
Bandwidth
See also Throughput
centralized shared-memory multiprocessors, 351–352
communication mechanism, I-3
congestion management, F-64 to F-65
Ethernet and bridges, F-78
interconnection networks, F-28
multi-device networks, F-25 to F-29
performance considerations, F-89
two-device networks, F-12 to F-20
memory, and vector performance, 332
network performance and topology, F-41
performance milestones, 20
point-to-point links and switches, D-34
routing/arbitration/switching impact, F-52
shared- vs. switched-media networks, F-22
switched-media networks, F-24
system area network history, F-101
vs. TCP/IP reliance, F-95
Bandwidth gap, disk storage, D-3
Banerjee, Utpal, L-30 to L-31
Bank busy time, vector memory systems, G-9
Banked memory
See also Memory banks
vector architectures, G-10
Barcelona Supercomputer Center, F-76
Barnes
characteristics, I-8 to I-9
distributed-memory multiprocessor, I-32
symmetric shared-memory multiprocessors, I-22, I-23, I-25
Barnes-Hut
n-body algorithm, basic concept, I-8 to I-9
Barriers
commercial workloads, 370
fetch-and-increment, I-20 to I-21
large-scale multiprocessor synchronization, I-13 to I-16, I-14, I-16, I-19, I-20
Based indexed addressing mode, Intel 80x86, K-49, K-58
Batch processing workloads
WSC goals/requirements, 433
Bay Area Research Network (BARRNet), F-80
Before rounding rule, J-36
Benchmarking
See also specific benchmark suites
embedded applications
basic considerations, E-12
power consumption and efficiency, E-13
instruction set operations, A-15
as performance measurement, 37–41
real-world server considerations, 52–55
response time restrictions, D-18
server performance, 40–41
sorting case study, D-64 to D-67
Beneš topology
centralized switched networks, F-33
Berkeley’s Tertiary Disk project
Best-case lower bounds, multi-device interconnection networks, F-25
Best-case upper bounds
multi-device interconnection networks, F-26
network performance and topology, F-41
Between instruction exceptions, definition, C-45
Bidirectional multistage interconnection networks
characteristics, F-33 to F-34
SAN characteristics, F-76
Bidirectional rings, topology, F-35 to F-36
Big Endian
interconnection networks, F-12
memory address interpretation, A-7
MIPS core extensions, K-20 to K-21
MIPS data transfers, A-34
Binary code compatibility
Binary-coded decimal, definition, A-14
Binary-to-decimal conversion, FP precisions, J-34
Bing search
delays and user behavior, 451
WSC processor cost-performance, 473
Bisection bandwidth
as network cost constraint, F-89
network performance and topology, F-41
Bisection bandwidth, WSC array switch, 443
Bisection traffic fraction, network performance and topology, F-41
Bit error rate (BER), wireless networks, E-21
Bit rot, case study, D-61 to D-64
Bit selection, block placement, B-7
Black box network
basic concept, F-5 to F-6
effective bandwidth, F-17
switched-media networks, F-24
switched network topologies, F-40
Block addressing
interleaved cache banks, 86
memory hierarchy basics, 74
Blocked floating point arithmetic, DSP, E-6
Blocking
centralized switched networks, F-32
network performance and topology, F-41
Blocking calls, shared-memory multiprocessor workload, 369
Blocking factor, definition, 90
Block multithreading, definition, L-34
Block placement
memory hierarchy considerations, B-7
Blocks
See also Cache block; Thread Block
vs. bytes per reference, 378
compiler optimizations, 89–90
disk array deconstruction, D-51, D-55
disk deconstruction case study, D-48 to D-51
global code scheduling, H-15 to H-16
L3 cache size, misses per instruction, 371
memory hierarchy basics, 74
placement in main memory, B-44
RAID performance prediction, D-57 to D-58
Block servers, vs. filers, D-34 to D-35
Block size
memory hierarchy basics, 76
Block transfer engine (BLT)
interconnection network protection, F-87
Body of Vectorized Loop
GPU Memory structure, 304
Thread Block Scheduler, 314
Booth recoding, J-8 to J-9, J-9, J-10 to J-11
chip comparison, J-60 to J-61
integer multiplication, J-49
Bose-Einstein formula, definition, 30
Bounds checking, segmented virtual memory, B-52
Branches
MIPS control flow instructions, A-38
RISC instruction set, C-5
Branch folding, definition, 206
Branch hazards
basic considerations, C-21
Branch offsets, control flow instructions, A-18
Branch penalty
instruction fetch bandwidth, 203–206
simple scheme examples, C-25
Branch prediction
early schemes, L-27 to L-28
instruction fetch bandwidth, 205
integrated instruction fetch units, 207
misprediction rates on SPEC89, 166
two-bit predictor comparison, 165
Branch registers
PowerPC instructions, K-32 to K-33
Branch stalls, MIPS R4000 pipeline, C-67
Branch-target address
MIPS control flow instructions, A-38
RISC instruction set, C-5
Branch-target buffers
branch hazard stalls, C-42
instruction fetch bandwidth, 203–206
instruction handling, 204
MIPS control flow instructions, A-38
Bubble sort, code example, K-76
Buffered crossbar switch, switch microarchitecture, F-62
Buffered wormhole switching, F-51
Buffers
DSM multiprocessor cache coherence, I-38 to I-40
interconnection networks, F-10 to F-11
network interface functions, F-7
switch microarchitecture, F-58 to F-60
Bundles
IA-64, H-34 to H-35, H-37
Bus-based coherent multiprocessors, L-59 to L-60
Buses
barrier synchronization, I-16
centralized shared-memory multiprocessors, 351
dynamic scheduling with Tomasulo’s algorithm, 172, 175
I/O bus replacements, D-34, D-34
large-scale multiprocessor synchronization, I-12 to I-13
scientific workloads on symmetric shared-memory multiprocessors, I-25
Sony PlayStation 2 Emotion Engine, E-18
vs. switched networks, F-2
switch microarchitecture, F-55 to F-56
Tomasulo’s algorithm, 180, 182
Byte displacement addressing, VAX, K-67
Byte offset
misaligned addresses, A-8
Bytes
aligned/misaligned addresses, A-8
arithmetic intensity example, 286
Intel 80x86 integer operations, K-51
MIPS data transfers, A-34
operand types/sizes, A-14
per reference, vs. block size, 378
Byte/word/long displacement deferred addressing, VAX, K-67