G

Gateways, Ethernet, F-79
Gather-Scatter
definition, 309
GPU comparisons, 329
multimedia instruction compiler support, A-31
sparse matrices, G-13 to G-14
vector architectures, 279–280
GE 645, L-9
General-Purpose Computing on GPUs (GPGPU), L-51 to L-52
General-purpose electronic computers, historical background, L-2 to L-4
General-purpose registers (GPRs)
advantages/disadvantages, A-6
IA-64, H-38
Intel 80x86, K-48
ISA classification, A-3 to A-5
MIPS data transfers, A-34
MIPS operations, A-36
MIPS64, A-34
VMIPS, 265
Geometric means, example calculations, 43–44
Gibson mix, L-6
Giga Thread Engine, definition, 292, 314
Global address space, segmented virtual memory, B-52
Global code scheduling
example, H-16
parallelism, H-15 to H-23
superblock scheduling, H-21 to H-23, H-22
trace scheduling, H-19 to H-21, H-20
Global common subexpression elimination, compiler structure, A-26
Global data area, and compiler technology, A-27
Global Environment for Network Innovation (GENI), F-98
Global load/store, definition, 309
Global Memory
definition, 292, 314
GPU programming, 290
locks via coherence, 390
Global miss rate
definition, B-31
multilevel caches, B-33
Global optimizations
compilers, A-26, A-29
optimization types, A-28
Global Positioning System, CDMA, E-25
Global predictors
Intel Core i7, 166
tournament predictors, 164–166
Global scheduling, ILP, VLIW processor, 194
Global system for mobile communication (GSM), cell phones, E-25
Goldschmidt’s division algorithm, J-29, J-61
Goldstine, Herman, L-2 to L-3
Google
Bigtable, 438, 441
cloud computing, 455
cluster history, L-62
containers, L-74
MapReduce, 437, 458–459, 459
server CPUs, 440
server power-performance benchmarks, 439–441
WSCs, 432, 449
containers, 464–465, 465
cooling and power, 465–468
monitoring and repairing, 469–470
PUE, 468
servers, 467, 468–469
Google App Engine, L-74
Google Clusters
memory dependability, 104
power consumption, F-85
Google File System (GFS)
MapReduce, 438
WSC storage, 442–443
Google Goggles
PMDs, 6
user experience, 4
Google search
shared-memory workloads, 369
workload demands, 439
Gordon Bell Prize, L-57
GPGPU (General-Purpose Computing on GPUs), L-51 to L-52
GPU (Graphics Processing Unit)
banked and graphics memory, 322–323
computing history, L-52
definition, 9
DLP
basic considerations, 288
basic PTX thread instructions, 299
conditional branching, 300–303
coprocessor relationship, 330–331
definitions, 309
Fermi GPU architecture innovations, 305–308
Fermi GTX 480 floorplan, 295
GPUs vs. vector architectures, 308–312, 310
mapping examples, 293
Multimedia SIMD comparison, 312
multithreaded SIMD Processor block diagram, 294
NVIDIA computational structures, 291–297
NVIDIA/CUDA and AMD terminology, 313–315
NVIDIA GPU ISA, 298–300
NVIDIA GPU Memory structures, 304, 304–305
programming, 288–291
SIMD thread scheduling, 297
terminology, 292
fine-grained multithreading, 224
future features, 332
gather/scatter operations, 280
historical background, L-50
loop-level parallelism, 150
vs. MIMD with Multimedia SIMD, 324–330
mobile client/server features, 324, 324
power/DLP issues, 322
raw/relative performance, 328
Roofline model, 326
scalable, L-50 to L-51
strided access-TLB interactions, 323
thread count and memory performance, 332
TLP, 346
vector kernel implementation, 334–336
vs. vector processor operation, 276
GPU Memory
caches, 306
CUDA program, 289
definition, 292, 309, 314
future architectures, 333
GPU programming, 288
NVIDIA, 304, 304–305
splitting from main memory, 330
Gradual underflow, J-15, J-36
Grain size
MIMD, 10
TLP, 346
Grant phase, arbitration, F-49
Graph coloring, register allocation, A-26 to A-27
Graphics double data rate (GDDR)
characteristics, 102
Fermi GTX 480 GPU, 295, 324
Graphics dynamic random-access memory (GDRAM)
bandwidth issues, 322–323
characteristics, 102
Graphics-intensive benchmarks, desktop performance, 38
Graphics pipelines, historical background, L-51
Graphics Processing Unit. See GPU (Graphics Processing Unit)
Graphics synchronous dynamic random-access memory (GSDRAM), characteristics, 102
Graphics Synthesizer, Sony PlayStation 2, E-16, E-16 to E-17
Greater than condition code, PowerPC, K-10 to K-11
Greatest common divisor (GCD) test, loop-level parallelism dependences, 319, H-7
Grid
arithmetic intensity, 286
CUDA parallelism, 290
definition, 292, 309, 313
and GPU, 291
GPU Memory structures, 304
GPU terms, 308
mapping example, 293
NVIDIA GPU computational structures, 291
SIMD Processors, 295
Thread Blocks, 295
Grid computing, L-73 to L-74
Grid topology
characteristics, F-36
direct networks, F-37
Guest definition, 108
Guest domains, Xen VM, 111