E

Early restart, miss penalty reduction, 86
Earth Simulator, L-46, L-48, L-63
Eckert, J. Presper, L-2 to L-3, L-5, L-19
Eckert-Mauchly Computer Corporation, L-4 to L-5, L-56
ECL minicomputer, L-19
Economies of scale
WSC vs. datacenter costs, 455–456
WSCs, 434
EDSAC (Electronic Delay Storage Automatic Calculator), L-3
EDVAC (Electronic Discrete Variable Automatic Computer), L-2 to L-3
EEPROM (Electronically Erasable Programmable Read-Only Memory)
compiler-code size considerations, A-44
Flash Memory, 102–104
memory hierarchy design, 72
Effective address
ALU, C-7, C-33
data dependences, 152
definition, A-9
execution/effective address cycle, C-6, C-31 to C-32, C-63
hardware-based speculation, 186, 190, 192
load interlocks, C-39
load-store, 174, 176, C-4
RISC instruction set, C-4 to C-5
simple MIPS implementation, C-31 to C-32
simple RISC implementation, C-6
TLB, B-49
Tomasulo’s algorithm, 173, 178, 182
Effective bandwidth
definition, F-13
example calculations, F-18
vs. interconnected nodes, F-28
interconnection networks
multi-device networks, F-25 to F-29
two-device networks, F-12 to F-20
vs. packet size, F-19
Efficiency factor, F-52
Eight-way set associativity
ARM Cortex-A8, 114
cache optimization, B-29
conflict misses, B-23
data cache misses, B-10
Elapsed time, execution time, 36
Elastic Block Storage (EBS), MapReduce cost calculations, 458–460, 459
Electronically Erasable Programmable Read-Only Memory. See EEPROM (Electronically Erasable Programmable Read-Only Memory)
Electronic Delay Storage Automatic Calculator (EDSAC), L-3
Electronic Design News Embedded Microprocessor Benchmark Consortium (EEMBC)
benchmark classes, E-12
ISA code size, A-44
kernel suites, E-12
performance benchmarks, 38
power consumption and efficiency metrics, E-13
Electronic Discrete Variable Automatic Computer (EDVAC), L-2 to L-3
Electronic Numerical Integrator and Calculator (ENIAC), L-2 to L-3, L-5 to L-6, L-77
Element group, definition, 272
Embedded multiprocessors, characteristics, E-14 to E-15
Embedded systems
benchmarks
basic considerations, E-12
power consumption and efficiency, E-13
cell phone case study
Nokia circuit board, E-24
overview, E-20
phone block diagram, E-23
phone characteristics, E-22 to E-24
radio receiver, E-23
standards and evolution, E-25
wireless networks, E-21 to E-22
characteristics, 8–9, E-4
as computer class, 5
digital signal processors
definition, E-3
desktop multimedia support, E-11
examples and characteristics, E-6
media extensions, E-10 to E-11
overview, E-5 to E-7
TI TMS320C6x, E-8 to E-10
TI TMS320C6x instruction packet, E-10
TI TMS320C55, E-6 to E-7, E-7 to E-8
TI TMS320C64x, E-9
EEMBC benchmark suite, E-12
overview, E-2
performance, E-13 to E-14
real-time processing, E-3 to E-5
RISC systems
addressing modes, K-6
addressing modes and instruction formats, K-5 to K-6
arithmetic/logical instructions, K-24
conditional branches, K-17
constant extension, K-9
control instructions, K-16
conventions, K-16
data transfer instructions, K-14, K-23
DSP extensions, K-19
examples, K-3, K-4
instruction formats, K-8
multiply-accumulate, K-20
Sanyo digital camera SOC, E-20
Sanyo VPC-SX500 digital camera case study, E-19
Sony PlayStation 2 block diagram, E-16
Sony PlayStation 2 Emotion Engine case study, E-15 to E-18
Sony PlayStation 2 Emotion Engine organization, E-18
EMC, L-80
Emotion Engine
organization modes, E-18
Sony PlayStation 2 case study, E-15 to E-18
empowerTel Networks, MXP processor, E-14
Encoding
control flow instructions, A-18
erasure encoding, 439
instruction set, A-21 to A-24, A-22
Intel 80x86 instructions, K-55, K-58
ISAs, 14, A-5 to A-6
MIPS ISA, A-33
MIPS pipeline, C-36
opcode, A-13
VAX instructions, K-68 to K-70, K-69
VLIW model, 195–196
Encore Multimax, L-59
End-to-end flow control
congestion management, F-65
vs. network-only features, F-94 to F-95
Energy efficiency. See also Power consumption
Climate Savers Computing Initiative, 462
embedded benchmarks, E-13
hardware fallacies, 56
ILP exploitation, 201
Intel Core i7, 401–405
ISA, 241–243
microprocessor, 23–26
PMDs, 6
processor performance equation, 52
servers, 25
and speculation, 211–212
system trends, 21–23
WSC, measurement, 450–452
WSC goals/requirements, 433
WSC infrastructure, 447–449
WSC servers, 462–464
Energy proportionality, WSC servers, 462
Engineering Research Associates (ERA), L-4 to L-5
ENIAC (Electronic Numerical Integrator and Calculator), L-2 to L-3, L-5 to L-6, L-77
Enigma coding machine, L-4
Entry time, transactions, D-16, D-17
Environmental faults, storage systems, D-11
EPIC approach
historical background, L-32
IA-64, H-33
VLIW processors, 194, 196
Equal condition code, PowerPC, K-10 to K-11
Erasure encoding, WSCs, 439
Error-Correcting Code (ECC)
disk storage, D-11
fault detection pitfalls, 58
Fermi GPU architecture, 307
hardware dependability, D-15
memory dependability, 104
RAID 2, D-6
and WSCs, 473–474
Error handling, interconnection networks, F-12
Errors, definition, D-10 to D-11
Escape resource set, F-47
ETA processor, vector processor history, G-26 to G-27
Ethernet
and bandwidth, F-78
commercial interconnection networks, F-63
cross-company interoperability, F-64
interconnection networks, F-89
as LAN, F-77 to F-79
LAN history, F-99
LANs, F-4
packet format, F-75
shared-media networks, F-23
shared- vs. switched-media networks, F-22
storage area network history, F-102
switch vs. NIC, F-86
system area networks, F-100
total time statistics, F-90
WAN history, F-98
Ethernet switches
architecture considerations, 16
Dell servers, 53
Google WSC, 464–465, 469
historical performance milestones, 20
WSCs, 441–444
European Center for Particle Research (CERN), F-98
Even/odd array
example, J-52
integer multiplication, J-52
EVEN-ODD scheme, development, D-10
Example calculations
average memory access time, B-16 to B-17
barrier synchronization, I-15
block size and average memory access time, B-26 to B-28
branch predictors, 164
branch schemes, C-25 to C-26
branch-target buffer branch penalty, 205–206
bundles, H-35 to H-36
cache behavior impact, B-18, B-21
cache hits, B-5
cache misses, 83–84, 93–95
cache organization impact, B-19 to B-20
carry-lookahead adder, J-39
chime approximation, G-2
compiler-based speculation, H-29 to H-31
conditional instructions, H-23 to H-24
CPI and FP, 50–51
credit-based flow control, F-10 to F-11
crossbar switch interconnections, F-31 to F-32
data dependences, H-3 to H-4
DAXPY on VMIPS, G-18 to G-20
dependence analysis, H-7 to H-8
deterministic vs. adaptive routing, F-52 to F-55
dies, 29
die yield, 31
dimension-order routing, F-47 to F-48
disk subsystem failure rates, 48
fault tolerance, F-68
fetch-and-increment barrier, I-20 to I-21
FFT, I-27 to I-29
fixed-point arithmetic, E-5 to E-6
floating-point addition, J-24 to J-25
floating-point square root, 47–48
GCD test, 319, H-7
geometric means, 43–44
hardware-based speculation, 200–201
inclusion, 397
information tables, 176–177
integer multiplication, J-9
interconnecting node costs, F-35
interconnection network latency and effective bandwidth, F-26 to F-28
I/O system utilization, D-26
L1 cache speed, 80
large-scale multiprocessor locks, I-20
large-scale multiprocessor synchronization, I-12 to I-13
loop-carried dependences, 316, H-4 to H-5
loop-level parallelism, 317
loop-level parallelism dependences, 320
loop unrolling, 158–160
MapReduce cost on EC2, 458–460
memory banks, 276
microprocessor dynamic energy/power, 23
MIPS/VMIPS for DAXPY loop, 267–268
miss penalty, B-33 to B-34
miss rates, B-6, B-31 to B-32
miss rates and cache sizes, B-29 to B-30
miss support, 85
M/M/1 model, D-33
MTTF, 34–35
multimedia instruction compiler support, A-31 to A-32
multiplication algorithm, J-19
network effective bandwidth, F-18
network topologies, F-41 to F-43
Ocean application, I-11 to I-12
packet latency, F-14 to F-15
parallel processing, 349–350, I-33 to I-34
pipeline execution rate, C-10 to C-11
pipeline structural hazards, C-14 to C-15
power-performance benchmarks, 439–440
predicated instructions, H-25
processor performance comparison, 218–219
queue I/O requests, D-29
queue waiting time, D-28 to D-29
queuing, D-31
radix-4 SRT division, J-56
redundant power supply reliability, 35
ROB commit, 187
ROB instructions, 189
scoreboarding, C-77
sequential consistency, 393
server costs, 454–455
server power, 463
signed-digit numbers, J-53
signed numbers, J-7
SIMD multimedia instructions, 284–285
single-precision numbers, J-15, J-17
software pipelining, H-13 to H-14
speedup, 47
status tables, 178
strides, 279
TB-80 cluster MTTF, D-41
TB-80 IOPS, D-39 to D-40
torus topology interconnections, F-36 to F-38
true sharing misses and false sharing, 366–367
VAX instructions, K-67
vector memory systems, G-9
vector performance, G-8
vector vs. scalar operation, G-19
vector sequence chimes, 270
VLIW processors, 195
VMIPS vector operation, G-6 to G-7
way selection, 82
write buffer and read misses, B-35 to B-36
write vs. no-write allocate, B-12
WSC memory latency, 445
WSC running service availability, 434–435
WSC server data transfer, 446
Exceptions
ALU instructions, C-4
architecture-specific examples, C-44
categories, C-46
control dependence, 154–155
floating-point arithmetic, J-34 to J-35
hardware-based speculation, 190
imprecise, 169–170, 188
long latency pipelines, C-55
out-of-order completion, 169–170
precise, C-47, C-58 to C-60
preservation via hardware support, H-28 to H-32
return address buffer, 207
ROB instructions, 190
speculative execution, 222
stopping/restarting, C-46 to C-47
types and requirements, C-43 to C-46
Execute step
instruction steps, 174
Itanium 2, H-42
ROB instruction, 186
TI 320C55 DSP, E-7
Execution address cycle (EX)
basic MIPS pipeline, C-36
data hazards requiring stalls, C-21
data hazard stall minimization, C-17
exception stopping/restarting, C-46 to C-47
hazards and forwarding, C-56 to C-57
MIPS FP operations, basic considerations, C-51 to C-53
MIPS pipeline, C-52
MIPS pipeline control, C-36 to C-39
MIPS R4000, C-63 to C-64, C-64
MIPS scoreboarding, C-72, C-74, C-77
out-of-order execution, C-71
pipeline branch issues, C-40, C-42
RISC classic pipeline, C-10
simple MIPS implementation, C-31 to C-32
simple RISC implementation, C-6
Execution time
Amdahl’s law, 46–47, 406
application/OS misses, B-59
cache performance, B-3 to B-4, B-16
calculation, 36
commercial workloads, 369–370, 370
energy efficiency, 211
integrated circuits, 22
loop unrolling, 160
multilevel caches, B-32 to B-34
multiprocessor performance, 405–406
multiprogrammed parallel “make” workload, 375
multithreading, 232
performance equations, B-22
pipelining performance, C-3, C-10 to C-11
PMDs, 6
principle of locality, 45
processor comparisons, 243
processor performance equation, 49, 51
reduction, B-19
second-level cache size, B-34
SPEC benchmarks, 42–44, 43, 56
and stall time, B-21
vector length, G-7
vector mask registers, 276
vector operations, 268–271
Expand-down field, B-53
Explicit operands, ISA classifications, A-3 to A-4
Explicit parallelism, IA-64, H-34 to H-35
Explicit unit stride, GPUs vs. vector architectures, 310
Exponential back-off
large-scale multiprocessor synchronization, I-17
spin lock, I-17
Exponential distribution, definition, D-27
Extended accumulator
flawed architectures, A-44
ISA classification, A-3