Note: Page numbers in bold indicate tables; those in italics indicate figures.
ABFT (algorithm-based fault tolerance), 580, 748‒750
ABORT mode in FT-MPI, 746
AC (autonomic computing) systems, 438
accelerated processing units (APUs), 593, 594, 595, 607
accelerators
and CPUs, 709
and DAGuE engine, 707‒709, 708, 721‒722
hardware, 63
ACS (adaptive compression service), 352‒353, 352
A/D (analog-to-digital) converters, 636
adaptive compression algorithm, 344, 350, 352‒363
adaptive compression service, 352‒353, 352
information collection, 357
node state determination, 357‒359, 360
performance evaluations, 360‒362, 361, 362
queueing model, 353‒357, 353, 356
adaptive compression service (ACS), 352‒353, 352
adaptive dynamic loop scheduling algorithms, 441
addition-based distributed algorithms for connected dominating sets, 11‒12
ADLB (asynchronous dynamic load balancing library), 771‒772
advanced anonymization technique for cloud computing, 287‒288
aerospace applications of embedded systems, 632
affinity-based partitioning approaches, 527, 528
aggregation patterns, 172, 173, 173, 174, 175, 178‒181, 180‒181, 182
algorithm overhead, 10
algorithm-based fault tolerance (ABFT), 580, 748‒750
Amazon product recommendation algorithm, 474, 488
AMD Magny-Cours-based system, 432‒434, 433
AMD Phenom II processor, 65‒66
AMG benchmark, 164, 164, 166‒167, 168, 168, 169, 170, 171
analog-to-digital (A/D) converters, 636
Android, 152
APM (automatic path migration), 741‒742
Apple, 153
application deployment in runtime environments, 566‒567, 571‒572
application kernels, 689, 691, 691, 691‒692, 693, 694, 695, 695
application software in embedded systems, 638
application start-up in communication infrastructure, 568
application-based fault resilience, 738, 748‒753, 751
application-level checkpointing, 750‒753, 751
application-specific integrated circuits (ASICs) in embedded systems, 635, 636
APUs (accelerated processing units), 593, 594, 595, 607
Arnoldi algorithm, 480
Array Building Blocks, 70
array copying, 259‒260, 260
ASICs (application-specific integrated circuits) in embedded systems, 635, 636
asset custodian in cloud computing, 279
asset owner in cloud computing, 279
associative maps in POET, 266
asynchronous dynamic load balancing library (ADLB), 771‒772
augmented cubes, diagnosability of, 120‒122, 120, 121
automatic fault tolerance, 579
automatic path migration (APM), 741‒742
automotive applications of embedded systems, 633‒634
autonomic computing (AC) systems, 438
autonomous embedded systems, 631
bandwidth allocation protocol (divide and conquer), 43‒44, 44
Basic Local Alignment Search Tool (BLAST), 496, 497, 499, 500, 501, 751
BC (branch comparison) binary operator, 407, 408
BCET (best-case execution time), 391
behavior of users, modeling, 471, 472‒473, 473, 497, 498
best-case execution time (BCET), 391
BGM model for determining diagnosability, 97‒98
biased neighbor selection (BNS), 42
big bang simulation, 40
binomial trees, 570, 571, 573, 574, 576
biosensors, 227
bitmap join indexes (BJI), 529, 544‒545, 545, 546, 547‒548, 548
BJI (bitmap join indexes), 529, 544‒545, 545, 546, 547‒548, 548
BLANK mode in FT-MPI, 746
BLAST (Basic Local Alignment Search Tool), 496, 497, 499, 500, 501, 751
block distribution, 214
blocking communication in MPI, 763
blocking coordinated checkpoint protocols, 744
BNS (biased neighbor selection), 42
boundary integral equations, 717
branch comparison (BC) binary operator, 407, 408
branch prediction, 58
Brook, in mobile multicore computing, 151
buffered send in MPI, 763
cache hierarchy, 256
cache memory, 58, 62‒63, 65‒67
caches
in embedded systems, 637
and search engines, 469
shared, in multicore computing, 62‒63
CAF (Co-Array Fortran), 765, 767‒768, 767
call delivery. See terminal paging
call handling models
call plus location update (CPLU) model, 189, 190, 191, 192‒194, 197‒199, 201‒202, 202‒203, 203, 204‒205
call without location update (CWLU) model, 189, 190‒192, 194‒196, 199‒202, 202‒203, 203, 204‒205
call plus location update (CPLU) model, 189, 190, 191
and location update cost minimization with terminal paging constraint, 197‒199, 203, 204‒205
and terminal paging cost minimization with location update constraint, 192‒194, 201‒202, 202‒203
call without location update (CWLU) model, 189, 190‒192
and location update cost minimization with terminal paging constraint, 199‒201, 203, 204‒205
and terminal paging cost minimization with location update constraint, 194‒196, 201‒202, 202‒203
capacity, relationship with reputation, 312‒316, 312, 313, 314, 315
Catanzaro DSLs, 223
CDS-based topology control, 10‒12
CDSs. See connected dominating sets
Cell BE (Cell Broadband Engine), 590, 590‒591
Cell Broadband Engine (Cell BE), 590, 590‒591
central processing units. See CPUs
centralized algorithms for connected dominating sets, 11
centralized approach for service provider selection, 664, 665, 666, 669
CGs (computational grids), 395, 397‒398, 398
chares, in Charm++, 748, 769‒770
Charm++, 222, 223, 748, 769‒770
checkpointing
at application-level, 750‒753, 751
at systems software level, 743‒745, 743, 748
chip multiprocessors
iterative procedure, 135‒136, 136, 138, 139
queuing network modeling, 132‒136, 133, 136, 139‒140, 141
simulation tools, 130‒131, 130, 136‒139, 140‒141
testing, 136‒139, 137, 138‒139
chip-kill, 740
chipspare, 740
Cholesky factorization, 728, 729‒730, 730
chromosomes in genetic algorithms, 13, 17, 18, 19, 20, 21, 22, 23, 23
CiLK-NOW, 748
circuit satisfiability, 216‒218, 217, 218, 221
Cisco Metro chip, 75
cliques, in junction trees, 420, 421, 422, 424, 426, 429‒430
clocks
in X10, 773
cloud computing, 1‒2, 277‒278, 282, 288, 613‒614
advanced anonymization technique, 287‒288
computational assets of stakeholders, 278, 279, 280, 282
confidentiality concerns, 277‒278, 282, 285, 286, 288
expenses related to, 626
integrity of output, 282
and matrix multiplication, 281‒285, 282
privacy concerns, 277, 278, 280, 286, 288
and query processing on relational databases, 286‒288, 287, 287
server placement, 614‒620, 615
cluster storage subsystems, 325‒326, 325
cluster-based gossip protocols, 37
CMPs. See chip multiprocessors
Co-Array Fortran (CAF), 765, 767‒768, 767
code encapsulation, 644
code generation, 639
code integration, 644
code templates in POET, 266
coding, multi-instantiation of, 535, 535
collective/group communications, 569, 764‒765
collusion, 292, 299, 300, 309, 310, 311
communication infrastructure in runtime environments, 566‒570, 575, 576
communication processors (CPs), organization of, 127, 127
communication-computation operations in MPI, 764‒765
communication-induced checkpoint protocols, 744‒745
communication-only in MPI, 764
communicators in MPI, 762
comparison algorithms
BLAST, 496, 497, 499, 500, 501
comparative analysis of, 498‒499, 499, 499‒500, 501
Dice coefficient, 487‒488, 499, 500, 501
Hamming distance, 489, 499, 500, 501
Jaccard similarity coefficient, 486‒487, 488, 489, 499, 499, 501
Levenshtein distance, 490‒492, 491, 497, 499, 500, 501
Monge-Elkan algorithm, 496‒497, 499, 501
Needleman-Wunsch algorithm, 492‒494, 493, 494, 497, 499, 500, 501
Smith-Waterman algorithm, 494‒495, 495, 496, 497, 499, 501
Tanimoto coefficient, 487, 489, 499, 500, 501
vector cosine, 483‒484, 484, 486‒487, 489, 499, 499, 501
Wagner-Fisher algorithm, 490‒491, 491, 495‒497, 499, 501
comparison diagnosis model for determining diagnosability, 97, 98, 101, 101
complete checkpointing, 745
component implementation in embedded systems design, 639
component test/validation in embedded systems design, 639
composite partitioning, 530
composition plan in service composition framework, 662, 669‒670
compression, 344, 345, 362‒363
algorithms, 344, 345‒346, 346, 347, 347, 350, 352‒363, 352, 353, 356, 360, 361, 362
Lempel-Ziv-Welch (LZW) algorithm, 344, 345‒346, 346, 347, 347, 352
lossy/lossless, 345
no-compression scheme, 348, 350, 360‒362, 361, 362
packet delay, 348‒350, 349, 351, 358‒359
packet generation rate, 349, 350, 351
ratio, 344
scheme, 348, 350, 360‒362, 361, 362
in sensor nodes, 345
computation model in parallel programming models, 760
computation reordering, 257‒259, 258
computational grids (CGs), 395, 397‒398, 398
computational models in cloud computing, 280
Compute Unified Device Architecture (CUDA), 84, 151, 595‒598, 597, 599.
See also MPI + CUDA hybrid programming pattern; MPI + OpenMP/CUDA hybrid programming pattern
computed output data in cloud computing, 280
concurrent CPU cores, 256
concurrent itinerary structures, 515, 515‒516
confidentiality concerns in cloud computing, 277‒278, 282, 285, 286, 288
connected dominating sets (CDSs), 9, 10‒11
CDS-based topology control, 10‒12
reliability, 14, 16, 17, 21‒22, 26, 27
See also reliable minimum-sized connected dominating sets
CoolStreaming mesh-pull overlay, 34
CoopNet multiple-tree streaming, 33
coordinated checkpoint protocols, 744
coordinated fault tolerance protocols, 579
Core i7 processor, 65
Core2Duo architecture, 256, 256
cores
addition of in mobile multicore computing, 154
cost per core, 626
correction schemes in genetic operations, 25
cosimulation, 644
cost optimization, 185‒190, 202, 206
location management, 185‒186, 187, 187‒191, 188
location update cost minimization with terminal paging constraint, 196‒201, 203, 204‒205
terminal paging cost minimization with location update constraint, 192‒196, 201‒202, 202‒203
trade-off between location update cost and terminal paging cost, 186
cost-based horizontal partitioning approaches, 527, 528
CPLU model. See call plus location update model
CPs (communication processors), organization of, 127, 127
CPSs (cyberphysical systems), 632
CPU/GPU systems
and legacy code, 82
scalable programming patterns for, 84‒86
TianHe-1A supercomputer, 82‒83, 83
workload distribution, 86‒87, 90, 90‒91, 91
See also CPUs; GPUs
CPUs (central processing units)
and accelerators, 709
See also CPU/GPU systems
CRCs (cyclic-redundancy checks), 741
Credence reputation management system, 294
crossed cubes, diagnosability of, 111, 111‒112
CUDA (Compute Unified Device Architecture), 84, 151, 595‒598, 597, 599
See also MPI + CUDA hybrid programming pattern; MPI + OpenMP/CUDA hybrid programming pattern
cumulative objective function and scheduling, 401‒402
customized virtual clusters (CVCs), 367, 369, 370
CVCs (customized virtual clusters), 367, 369, 370
CWLU model. See call without location update model
CX (cycle crossover method) in combinatorial optimization problems, 403
cyberphysical systems (CPSs), 632
cycle crossover (CX) method in combinatorial optimization problems, 403
cyclic distribution, 214
cyclic-redundancy checks (CRCs), 741
D/A (digital-to-analog) converters, 637
DAC (divide and conquer) bandwidth allocation protocol, 43‒44, 44
DLS (dynamic loop scheduling) algorithms, 438, 440, 441, 444‒445.
See also DLS-with-RL approach to load balancing
DLS-with-RL approach to load balancing, 439, 444‒455, 446, 447, 450, 451, 452, 453, 454, 462‒463
application component, 448
experimental setup/results, 455‒462, 456, 457, 459, 460
scheduling component, 448
DM (deadline monotonic) algorithm, 388
DNM (deterministic network model), 12‒13
domain values, decomposition of, 531‒532, 531, 533, 533
domain-specific languages (DSLs), 212‒213, 212, 223
dominatees, 10
dominating sets (DSs), 8‒9, 8, 238‒239
dominators, 10
DORMQR in QR factorization, 719
double-precision computations and VOCL, 692, 694, 695, 695
DPLASMA, 701, 718‒725, 720‒721, 723, 724.
See also DAGuE engine; PLASMA
DPnDP (Design Patterns and Distributed Process), 222, 223
DRAM, 739
DSI (distributed spatial index), 510
DSLs (domain-specific languages), 212‒213, 212, 223
DSs (dominating sets), 8‒9, 8, 238‒239
DSSMQR in QR factorization, 719
DTSQRT in QR factorization, 719
DVFS. See dynamic voltage and frequency scaling
dynamic bandwidth auctions, 43, 45
dynamic concurrency throttling (DCT), 159‒160, 164‒165, 164, 168, 169, 170, 171
dynamic loop scheduling (DLS) algorithms, 438, 440, 441, 444‒445.
See also DLS-with-RL approach to load balancing
dynamic scheduling, 400, 408, 409, 410, 413, 414, 416, 416
dynamic voltage and frequency scaling (DVFS)
in mobile multicore computing, 147‒148, 149‒150
in MPI/OpenMP model, 159‒160, 161, 165‒166, 168, 169, 170, 171
dynamic-priority scheduling algorithms, 388
ECCs (error correction codes), 739‒740
ECUs (electronic control units), 633‒634
edge costs in task aggregation, 176‒177
efficient overlay construction, 47
EIB (element interconnect bus) in Cell BE, 590, 591
EigenTrust reputation management system, 292‒293
electronic control units (ECUs), 633‒634
element interconnect bus (EIB) in Cell BE, 590, 591
elitist generational replacement method, 403
embarrassingly parallel pattern, 216
embedded computing research initiatives in mobile multicore computing, 153
embedded operating systems (EOSs), 637‒638
embedded systems, 3‒4, 629‒630
autonomous, 631
caches in, 637
modeling of, 631, 638‒644, 640, 654‒655
scalability of, 630‒631, 645, 646
scale-out/scale-up, 630
software, 634‒635, 635, 637‒638
verification, 646
See also wireless sensor networks
empirical tuning, 262‒263, 263, 264‒265, 265, 265‒272, 268, 268, 269, 270, 271, 272, 272
energy consumption. See power consumption
energy stretch factor and topology control, 10
environment partitioning, 670
Environment-Specific Service (ESS) in ORTE, 586
EOSs (embedded operating systems), 637‒638
ERRMGR (Error Manager) in ORTE, 585
error correction codes (ECCs), 739‒740
Error Manager (ERRMGR) in ORTE, 585
error return handling, 687
ESRT (event-to-sink reliable transport), 245
ESS (Environment-Specific Service) in ORTE, 586
ETC (expected time to compute) matrix model, 399
event-to-sink reliable transport (ESRT), 245
evidence collection, 422, 424‒425, 430
evidence distribution, 422, 424‒425, 430
evidence propagation, 422, 423, 424‒425, 430
exact inference, 420, 421‒422, 421, 424, 425
pointer-jumping-based. See pointer jumping
scheduling-based (SEI), 429, 430‒433, 431‒433
expected time to compute (ETC) matrix model, 399
explicit parallelization, 210‒211, 222‒224
and FraSPA, 211‒219, 212, 214‒215, 216, 217, 218, 219, 220, 221, 221‒222, 223‒224
exponential failure law, 235
extended ECC, 740
factorization algorithms, 716‒718
fail-stop failure model, 577, 749
FASTA algorithm, 496, 497, 499, 501
fault diagnosis, 97, 122. See also diagnosability
fault tolerance
algorithm-based fault tolerance (ABFT), 580, 748‒750
automatic fault tolerance, 579
in density-aware itinerary-based KNN, 517
and hexagonal 1-fault-tolerant model, 232, 233, 234, 234‒235
and minimum coverage model, 234, 234, 235
and pivot routing, 238‒244, 243
probability to function, 234‒235
in runtime environments, 577‒581, 581, 585
and square 1-fault-tolerant model, 232‒233, 232, 234, 234, 235
in wireless sensor networks, 228, 231‒233, 232, 234, 234‒235, 238‒244, 243, 647‒654, 648‒650
fault-resilience techniques, 737, 753
algorithm-based fault tolerance (ABFT), 580, 748‒750
application-based resilience, 738, 748‒753, 751
checkpointing, 743‒745, 743, 748, 750‒753, 751
cyclic-redundancy checks (CRCs), 741
error correction codes (ECCs), 739‒740
fail-stop fault recovery, 749
hardware resilience, 737‒743, 753
nonstop hot-replacement-based fault recovery, 749‒750
reliability, 741
software resilience, 738, 743‒748, 743, 753
fault-resilient application environment, 4
fault-tolerant message-passing interface (FT-MPI), 579, 746
FCFS (first come, first served) memory, 133, 133
Fibonacci trees, 570, 571, 578
field-programmable gate arrays (FPGAs), 636‒637
file fragmentation/allocation, 321‒322, 324‒325, 326, 327‒328, 328, 339
fine-grained reputation system (Zhang/Fang), 306‒307
first come, first served (FCFS) memory, 133, 133
fitness
and genetic algorithms, 17, 18, 21‒23, 23
and scheduling, 399, 401‒402, 410, 411‒413
fixed-priority scheduling algorithms, 388
flash memory, 637
flat membership protocol, 36, 37, 38
FlexRay communication standard, 634
flowtime, 401, 402, 410, 414, 415, 416, 416
folded hypercubes, diagnosability of, 119‒120, 119
Fortress, 774
FPGAs (field-programmable gate arrays), 636‒637
Framework for Synthesizing Parallel Applications. See FraSPA
FraSPA (Framework for Synthesizing Parallel Applications), 211‒213, 212, 215‒216, 216, 221‒222, 223‒224
circuit satisfiability, 216‒218, 217, 218, 221
data mapping in, 214
design templates in, 213
evaluation/experimental setup, 218‒219
and explicit parallelization, 211‒219, 212, 214‒215, 216, 217, 218, 219, 220, 221, 221‒222, 223‒224
and High-Level Parallelization Language (Hi-PaL), 212‒213, 212, 214‒215, 214, 215, 217, 217, 219
and Mandelbrot set, 218, 219, 220, 221
and MPI, 211
patterns in, 216
results/analysis, 219, 221, 221
source-to-source compiler (SSC), 212, 212, 213
frequency
and MPI/OpenMP model, 161‒162, 165‒166
FT-MPI (fault-tolerant message-passing interface), 579, 746
fused CPU/GPU systems, 593, 594, 595
FuzzyTrust reputation management system, 294
GA (Global Arrays), 765, 768‒769
GA + TS hybrid grid scheduler, 396, 404‒406, 405, 414, 414, 415, 415, 416, 416, 417
GA scheduler. See single-population genetic schedulers
GA-CX-S-SS algorithm, 410, 410‒412, 411‒413, 414, 415, 415, 416, 416
and efficient overlay construction, 47
and incentive scheme design, 45‒46
and selfish overlay construction, 46‒47
GAs. See genetic algorithms
generalized bound, 390‒392, 393
generalized cubes, diagnosability of, 115‒116, 115, 116
general-purpose graphics processing units. See GPGPUs
generic programming, 73
genes in genetic algorithms, 17, 19
genetic algorithm and tabu search hybrid grid scheduler. See GA + TS hybrid grid scheduler
genetic algorithms (GAs)
chromosomes in, 13, 17, 18, 19, 20, 21, 22, 23, 23
crossover operation, 18, 23‒25, 24
and fitness, 17, 18, 21‒23, 23
inheritance population initialization (IPI), 19‒21, 20
mutation operation, 18, 23‒24, 25, 25
performance evaluation, 26‒27, 26, 27
recombination in, 18
and reliable minimum-sized connected dominating set, 13, 14‒15, 17‒28, 18, 20, 23, 23, 24, 25, 26, 27
selection scheme, 22‒23, 23, 23
geographically scalable systems, 630
GFMC (Green's function Monte Carlo), 752‒753
global alignment, 492‒494, 493, 494, 497
Global Arrays (GA), 765, 768‒769
global memory, 601, 601‒602, 604, 605
global network positioning (GNP), 39‒40
global partitioned indexes, 530‒531
global variables in POET, 266
GNP (global network positioning), 39‒40
gossip protocols
cluster-based, 37
HiScamp, 37
system reliability of, 36
GossipTrust reputation management system, 295‒296
GPGPUs (general-purpose graphics processing units), 147, 149, 150, 152, 153, 154‒155, 675. See also GPUs
GPU computing, 595‒596, 605‒607, 606‒607
GPUs (graphics processing units), 591‒593, 592, 595, 607, 675, 696‒697
and accelerators, 708‒709, 721‒722
fused CPU/GPU systems, 593, 594, 595
memory, 593, 600‒602, 601, 677
NVIDIA GPU, 677
scalability of applications, 81‒82, 94
separate CPU/GPU systems, 593, 594
See also CPU/GPU systems; GPGPUs; GPU computing; VOCL
graph metrics and topology control, 10
graphics processing units. See GPUs
Green's function Monte Carlo (GFMC), 752‒753
grid simulation tools
Opportunistic Grid Simulation Tool (OGST), 788‒789, 790
Schelling Manager (SchMng), 789‒790, 791, 792, 793, 794, 795
Grid Uniandes Management Application (GUMA), 370
grids, 777‒778. See also opportunistic grids
Group Communications (GRPCOMM) framework, in ORTE, 586
growth prediction in cloud computing, 620‒625
GRPCOMM (Group Communications) framework, in ORTE, 586
GUMA (Grid Uniandes Management Application), 370
Hamming distance, 489, 499, 500, 501
hard memory errors, 739
hard real-time systems, 387‒388
hardware accelerators, 63
hardware fault resilience, 737‒738, 753
processor resilience, 738
hardware/software (HW/SW) partitioning, 639
HARNESS, 580
heartbeat failure detection, 577‒578
heterogeneous computing, 589‒590, 595
APUs (accelerated processing units), 593, 594, 595
Cell BE (Cell Broadband Engine), 590, 590‒591
fused CPU/GPU systems, 593, 594, 595
separate CPU/GPU systems, 593, 594
vulnerabilities, 322, 324, 327‒328, 328, 339
heterogeneous overlay construction and video streaming, 35, 37‒39
hexagonal cell structure for wireless networks, 187, 187
hexagonal 1-fault-tolerant model, 232, 233, 234, 234‒235, 237
HGS (hierarchic genetic strategy) model, 396
HGS-Sched (hierarchic genetic strategy scheduler), 396, 406‒408, 406, 412, 413, 414, 415, 415, 416, 416, 417
hierarchic genetic strategy (HGS) model, 396
hierarchic genetic strategy scheduler. See HGS-Sched
hierarchical membership protocol, 36, 37, 38
hierarchical software engineering, 4
High Productivity Computing Systems (HPCS) program, 772
high-dimensional problem sets, 3‒4
High-Level Parallelization Language (Hi-PaL), 212‒213, 212, 214‒215, 214, 215, 217, 217, 219
highly dynamic problem sets, 3‒4
high-performance computing (HPC), 209‒210
high-productivity parallel programming models, 772
Fortress, 774
hill climbing algorithm, 538, 542, 542, 543, 549‒550, 549, 550, 551, 551
merge operations, 540‒542, 541
random distribution, 540, 549, 549, 550
vs. simulated annealing algorithm, 552, 552‒553, 553
split operations, 540‒541, 541, 542
uniform distribution, 538‒539, 539, 549, 549, 550
and Zipf distribution, 539‒540, 539‒540, 549, 549, 550
Hi-PaL. See High-Level Parallelization Language
HiScamp gossip protocol, 37
homomorphic encryption, 283
honest players reputation management system, 302‒303
hop stretch factor and topology control, 9‒10
horizontal partitioning (HP), 524‒527, 559‒560
and bitmap join indexes, 529, 544‒545, 545, 546, 547‒548, 548, 553, 554
data mining-based approaches, 527, 528
and decomposition of domain values, 531‒532, 531, 533, 533
experimental studies, 549‒553, 549, 550, 551, 552, 553, 554
fragmentation methodology, 532‒535, 533, 534
and hill climbing algorithm, 538‒542, 539, 540, 541, 542, 543, 549‒550, 549, 550, 551, 551, 552, 553
and index fragmentation, 530‒531
and interval composite mode, 530
one-domain HP problem, 535‒538
physical design simulator tool (SimulPh.D), 553‒556, 555, 556‒558, 558‒559, 560
and simulated annealing algorithm, 543, 543, 550‒553, 551, 552, 553
threshold-based approaches, 527, 527, 528
unconstrained approaches, 527‒528, 527
HP. See horizontal partitioning
HPC (high-performance computing), 209‒210
HPCS (High Productivity Computing Systems) program, 772
HP-Palm, 153
H-Trust reputation management system, 296‒297
hybrid approach for service provider selection, 666, 667‒668
hybrid genetic-based metaheuristic models, 404. See also GA + TS hybrid grid scheduler
hybrid implementations of CPU/GPU systems, 87‒89
hybrid programming models, 158‒159, 760. See also MPI/OpenMP model
hypercubes, diagnosability of, 110‒111, 111, 119‒120, 119
hyper-Petersen networks, diagnosability of, 118‒119, 118
IBM POWER7 processor, 66
ILP (instruction-level parallelism), 58
image processing in cloud computing, 285‒286
implicit parallelization, 210
incentive scheme design, 45‒46
incremental checkpointing, 745
indistinguishable sets, 102
information collection in adaptive compression algorithm, 357
infrastructure-based KNN queries, 508
DSI (distributed spatial index), 510
KNN perimeter tree (KPT), 511, 511
peer-to-peer indexing structure, 509‒510, 510
infrastructure-free KNN queries, 508, 511
density-aware itinerary-based KNN (DIKNN), 516‒517
itinerary-based, 511‒520, 512‒515, 518
parallel concentric-circle itinerary-based KNN (PCIKNN), 518‒519, 518
inheritance population initialization (IPI), 19‒21, 20
in-network approach to k-nearest neighbor query processing, 508, 508‒509
infrastructure-based methods, 508, 509‒511, 510, 511
infrastructure-free methods, 508, 511‒519, 512‒515, 518
input data in cloud computing, 280
Input/Output Forwarding (IOF) Service, in ORTE, 586
instruction-level parallelism (ILP), 58
instructions per cycle (IPC) in task aggregation, 174‒175
Intel Array Building Blocks, 70
Intel Core i7 processor, 65
Intel Core2Duo architecture, 256, 256
Intel QuickPath Interconnect (QPI), 64
Intel Single-Chip Cloud Computer, 76
Intel Teraflops Research Chip, 60, 76‒77
Intel Threading Building Blocks, 70‒71
intelligent adversaries, 316
interactive embedded system, 629
interoverlay cooperation, 35‒36, 43‒45, 43, 44
interoverlay optimization (IOO), 42
interval composite mode and horizontal partitioning, 530
interval partitioning, 530
intrusion detection/tolerance, 323
inverse Zipf distribution, 539, 540, 549, 549, 550
IOF (Input/Output Forwarding) Service, in ORTE, 586
IOO (interoverlay optimization), 42
IPC (instructions per cycle) in task aggregation, 174‒175
IPI (inheritance population initialization), 19‒21, 20
IRS benchmark, and MPI/OpenMP model, 166‒167, 167, 169, 170, 171
itinerary-based infrastructure-free KNN queries, 511‒516, 512‒515, 519‒520
concurrent structures, 515, 515‒516
DIKNN (density-aware itinerary-based KNN), 516‒517
KNN boundary estimation phase, 512, 512, 516, 518, 519
PCIKNN (parallel concentric-circle itinerary-based KNN), 518‒519, 518
query dissemination/data collection phase, 512, 512‒515, 513‒515, 517
Jaccard similarity coefficient, 486‒487, 488, 489, 499, 499, 501
Jensen-Shannon (JS) divergence, 486
Job Data Flow (JDF) in DAGuE engine, 709‒712, 709, 710, 719‒720, 720‒721, 725, 726
job scheduling. See scheduling
JS (Jensen-Shannon) divergence, 486
junction trees, 420‒435, 421, 428, 432
k-dominating sets, 238
kernel argument caching, 684
kernel code in OpenCL, 604‒605, 607‒609
kernel execution in VOCL, 688
kernel-level threads, 69
kernels, 677
k-fold cover t-sets, 239‒244, 243, 250
k-fold dominating sets, 238‒239, 250
k-fold pivot routing, 238‒244, 243
KL (Kullback‒Leibler) divergence, 484‒486
K-means algorithm, 528
k-nearest neighbor, 474‒476, 475
in mobile sensor networks, 507‒508, 519‒520
in-network approach, 508, 508‒520, 510‒515, 518
KNN. See k-nearest neighbor
KNN perimeter tree (KPT), 511, 511
KPT (KNN perimeter tree), 511, 511
Kullback‒Leibler (KL) divergence, 484‒486
Lanczos algorithm, 480
latency, 630
latent semantic indexing (LSI), 485‒486
leaf nodes, in video streaming, 31, 33
legacy code and CPU/GPU systems, 82
Lempel-Ziv-Welch (LZW) algorithm, 344, 345‒346, 346, 347, 347, 352
Levenshtein distance, 490‒492, 491, 497, 499, 500, 501
linear ranking method in combinatorial optimization problems, 403
link quality index (LQI), 15
list partitioning, 530
LJFR-SJFR (longest job to fastest resource-shortest job to fastest resource) method, 403
load balancing, 438
DLS-with-RL approach to, 439, 444‒463, 446, 447, 450, 451, 452, 453, 454, 456, 457, 459, 460
and dynamic loop scheduling algorithms, 438, 440, 441
and reinforcement learning, 439, 444‒445
load imbalance, 437‒438, 439‒440
load-scalable systems, 630
local alignment, 494‒495, 495, 496, 497
local computation in mobile computing, 145‒146, 146
local memory, 601, 601, 604, 605, 608‒609
local partitioned indexes, 530‒531
local storage and many-core architectures, 74‒75
local variables in POET, 266
locality-aware overlay construction, 42‒43
locality-aware programming in Chapel, 73
localized approach for service provider selection, 667, 668, 669, 671
locally twisted cubes, diagnosability of, 114‒115, 115
location management, 185‒186, 187, 187‒191, 188
location registration. See location update
location update, 185‒186, 188‒189
location update cost minimization with terminal paging constraint, 196‒201, 203, 204‒205
terminal paging cost minimization with location update constraint, 192‒196, 201‒202, 202‒203
location-aware topology matching (LTM), 42
log-based checkpoint protocols, 745
longest job to fastest resource-shortest job to fastest resource (LJFR-SJFR) method, 403
longevity, relationship with reputation, 312‒316, 312, 313, 314, 315
longwave radiation process. See RRTM_LW scheme
loop parallelization, 258, 258‒259
loop scheduling algorithms, 438, 440, 441, 444‒445
loop-invariant code motion, 261, 262
lossy links, 13
lossy/lossless compression, 345
LQI (link quality index), 15
LSI (latent semantic indexing), 485‒486
LTM (location-aware topology matching), 42
LU factorization, 728
LZW algorithm. See Lempel-Ziv-Welch algorithm
reinforcement learning, 439, 442‒445, 443
supervised learning, 442
unsupervised learning, 442
makespan, 400‒401, 402, 410, 414, 415, 416, 416
manager-worker pattern, 216
Mandelbrot set and FraSPA, 218, 219, 220, 221
many-core computing, 55‒60, 57, 74‒77. See also multicore computing
MapReduce, 223
Markov models, 235, 236, 236, 237, 472‒473, 643
master, in Green's function Monte Carlo, 752
master-worker strategy in loop scheduling algorithms, 455
matching, 99
matrix computations and decomposition, 717‒718
matrix multiplication in cloud computing, 281‒285, 282
matrix transpose application kernel, 689, 691, 691, 691‒692, 693, 695, 695
Maze P2P file-sharing system, 299
MBRs (minimum bounding rectangles) in peer-to-peer indexing, 509‒510
MC (MPI + CUDA) hybrid programming pattern, 84‒85, 85
MCDSs (minimum connected dominating sets), 10‒11, 13
MCT (minimum completion time) heuristics, 403
mean time to failure (MTTF), 649, 650, 651‒653, 651‒653
medical applications of embedded systems, 632‒633
membership management protocols for random overlay construction, 36‒37, 38
memory
in AMD Phenom II processor, 65
in IBM POWER7 processor, 66
in Intel Core i7 processor, 65
in Intel Single-Chip Cloud Computer, 76
levels, 62
in many-core architectures, 74
in Oracle SPARC T3 processor, 66‒67
scrubbing, 740
and Sequoia programming model, 71
in Sony/IBM/Toshiba Cell BE processor, 67
subsystems, 637
merge operations and hill climbing algorithm, 540‒542, 541
Meridian proximity-aware overlay construction scheme, 40‒41, 41
mesh cell structure for wireless networks, 187, 188
mesh overlay structure in video streaming, 32, 33‒34, 35
mesh pattern, 216
message-logging fault tolerance protocols, 579
Message-Passing Interface. See MPI
Metro chip, 75
Microsoft, 153
middleware, 638
minimum bounding rectangles (MBRs) in peer-to-peer indexing, 509‒510
minimum completion time (MCT) heuristics, 403
minimum connected dominating sets (MCDSs), 10‒11, 13
minimum coverage model, 234, 234, 235, 236, 236
minterm generation-based partitioning approaches, 527, 527
MM model for determining diagnosability, 97, 98, 101, 101
MM* model for determining diagnosability, 98, 101‒103, 101, 102, 103, 105‒110, 106, 107, 109, 110, 120, 122
MMC. See mobile multicore computing
mobile multicore computing, 145‒147, 154‒155
company-specific initiatives, 152‒153
cores, addition of, 154
dynamic voltage frequency scaling in, 147‒148, 149‒150
and embedded computing research initiatives, 153
GPGPUs in, 147, 149, 150, 152, 153, 154‒155
multitasking applications, 148, 150‒151, 152, 153
power scaling, 147‒148, 150‒151, 152, 153
software-driven energy efficiency, 153‒154, 154
mobile sensor networks, 507‒520, 508, 510‒515, 518
mobility issues in density-aware itinerary-based KNN, 517
Möbius cubes, diagnosability of, 112, 113
MOC (MPI + OpenMP/CUDA) hybrid programming pattern, 85, 85‒86
model encapsulation, 644
model translation, 644
modeling
of embedded systems, 631, 638‒644, 640, 654‒655
multiparadigm, 643
of user behavior, 471, 472‒473, 473, 497, 498
models, 638
Monge-Elkan algorithm, 496‒497, 499, 501
move mutation in combinatorial optimization problems, 403
movement-based location update, 188
MPD (multipurpose daemons), 578
MPD runtime environment, 582‒583, 583
MPI (Message-Passing Interface), 68, 70, 83‒84, 211, 564, 761‒762
collective/group communication in, 764‒765
communicators, 762
and DAGuE engine, 722‒723, 724‒725
data types, 762
and explicit parallelization, 210‒211
and FraSPA, 211
ranks, 762
and software fault resilience, 746‒747
tags, 762
two-sided communication in, 763‒764
See also MPI/OpenMP model; MPI + CUDA hybrid programming pattern; MPI + OpenMP/CUDA hybrid programming pattern
MPI/OpenMP model, 158‒160, 159, 160, 182
AMG benchmark, 164, 164, 166‒167, 168, 168, 169, 170, 171
and dynamic concurrency throttling, 159‒160, 164‒165, 164, 168, 169, 170, 171
and dynamic voltage and frequency scaling, 159‒160, 161, 165‒166, 168, 169, 170, 171
and frequency, 161‒162, 165‒166
IRS benchmark, 166‒167, 167, 169, 170, 171
performance evaluation, 166‒168, 167, 168, 169, 170, 171
and power consumption, 160, 160‒162, 168, 170
and profile-driven static mapping, 164‒165, 164
and THREAD_MASTERONLY model, 158‒159
time prediction, 162‒163, 163, 164
MPI + CUDA hybrid programming pattern, 84‒85, 85
MPI + OpenMP/CUDA (MOC) hybrid programming pattern, 85, 85‒86
M-to-N models, 69
MTTF (mean time to failure), 649, 650, 651‒653, 651‒653
multicore computing, 55‒60, 57, 77, 157, 181‒182, 699‒701, 732‒733
and communication efficiency, 700
on-chip interconnection networks, 64
and power consumption, 60, 65‒66
programming models for, 67‒74, 77
and simultaneous multithreading, 61‒62, 61
See also many-core computing
multi-instantiation of coding, 535, 535
multilevel genetic schedulers, 408‒412, 409‒414, 415, 414‒416, 416
multiparadigm modeling, 643
multipurpose daemons (MPD), 578
multiple issue of instructions, 58‒59
multiple leader distributed algorithms for connected dominating sets, 11, 12
multiple-tree overlay structure in video
multitasking, 148, 150‒151, 152, 153
multiunit embedded systems, 629‒630, 646
NAS Parallel Benchmarks (NPBs), 87‒88
N-body application kernel, 689, 691, 691, 691‒692, 693, 695, 695
Needleman-Wunsch algorithm, 492‒494, 493, 494, 497, 499, 500, 501
neighborhoods of similarity, 472, 473, 474
k-nearest neighbor model, 474‒476, 475, 481
SVD (singular value decomposition) model, 476‒479, 476‒479, 480‒481, 481, 498
See also similarity metrics
Netflix, 480
network access time, 468
network coordinate systems, 39‒40
network fault resilience, 740‒742
network-aware overlay construction, 35, 39‒42, 41
networked embedded systems, 629‒630, 646
network-on-chip (NoC), 645
NoC (network-on-chip), 645
no-compression scheme, 348, 350, 360‒362, 361, 362
node reputation, time to estimate, 316
node state determination in adaptive compression algorithm, 357‒359, 360
node-to-node delivery ratio and probabilistic network model, 16
Nokia, 153
nonadaptive dynamic loop scheduling algorithms, 441
nonblocking communication in MPI, 763
nonblocking coordinated checkpoint protocols, 744
nonstop hot-replacement-based fault recovery, 749‒750
NPBs (NAS Parallel Benchmarks), 87‒88
NPS network coordinate systems, 40
numerical stability, 479
NVIDIA GPU, 677
object-oriented programming in Chapel, 73
ODEs (ordinary differential equations), 641‒642
ODLS (Open RTE Daemon's Local Launch Subsystem) in ORTE, 585
off-chip memory in GPUs, 593, 677
OGST (Opportunistic Grid Simulation Tool), 788‒789, 790
on-chip memory in GPUs, 677
on-demand approach to connecting processes, 568
(1, 2)-matching composition networks (MCNs), 99, 99, 103‒105
one-domain HP problem, 535‒538
one-hop neighborhood and probabilistic network model, 15
1-to-N/1-to-1 models, 69
OOB (out-of-band) messages, 566, 567
OOB framework in ORTE, 585‒586
Open Computing Language. See OpenCL
Open RTE Daemon's Local Launch Subsystem (ODLS) in ORTE, 585
OpenCL (Open Computing Language), 598, 599, 602, 677
local memory optimization, 604, 605, 608‒609
in mobile multicore computing, 151, 152
See also VOCL (virtual OpenCL)
OpenMP phase groups, 159, 159, 160, 161, 165
OpenMP programming model, 70, 84. See also MPI + OpenMP/CUDA hybrid programming pattern
operating systems (OS), 637‒638
Opportunistic Grid Simulation Tool (OGST), 788‒789, 790
opportunistic grids, 365‒366, 367, 382
and dedicated clusters, 366
and power consumption, 366, 368‒369, 372‒374, 372‒373, 376‒380, 377, 377, 378, 378, 379, 380, 380, 382
and virtualization, 366, 367, 368, 369‒370, 382
See also UnaGrid
optimistic logging, 745
Oracle
horizontal partitioning in, 529‒531, 529
ordinary differential equations (ODEs), 641‒642
OS (operating systems), 637‒638
out-of-band (OOB) messages, 566, 567
output devices in embedded systems, 637
Overcast single-tree streaming, 32‒33
overhead
of algorithm, 10
in VOCL, 682, 683, 683‒684, 684, 688, 688, 691‒692, 693‒694, 696
overlay structures in video streaming
construction of, 34‒45, 38, 41, 43, 44, 46‒47
unstructured meshes, 32, 33‒34, 35
owned test beds, 778
packet delay, 348‒350, 349, 351, 358‒359
packet distribution game (PDG), 45, 46
packet exchange game (PEG), 45‒46
packet generation rate, 349, 350, 351
packet waiting time, 356
packets in chip multiprocessors, 127, 128
Pandora.com, 475
parallel concentric-circle itinerary-based KNN (PCIKNN), 518‒519, 518
parallel loop pattern, 216
parallel programming, 209‒210, 759‒760, 775
ADLB (asynchronous dynamic load balancing library), 771‒772
CAF (Co-Array Fortran), 765, 767‒768, 767
computation model, 760
data model, 760
distributed-memory model, 564
expression of parallelism, 760‒761
Fortress, 774
GA (Global Arrays), 765, 768‒769
high-productivity models, 772‒774
PGAS (partitioned global address space) models, 765‒769, 765, 767
performance model, 761
Scioto (scalable collections of task objects), 770‒771
UPC (Unified Parallel C), 765‒766
partial differential equations (PDEs), 642
partially matched crossover (PMX) method in combinatorial optimization problems, 403
partition independence, 524
partitioned global address space models. See PGAS models
partitioning, 524
affinity-based approaches, 527, 528
composite, 530
hardware/software (HW/SW), 639
horizontal. See horizontal partitioning
interval, 530
list, 530
minterm generation-based approaches, 527, 527
virtual column, 530
partition-wise joins, 524
PASIS architecture, 324
PCIKNN (parallel concentric-circle itinerary-based KNN), 518‒519, 518
PCoord network coordinate systems, 40
PDEs (partial differential equations), 642
PDG (packet distribution game), 45, 46
for overlay construction, 34‒45, 38, 41, 43, 44, 46‒47
peer-to-peer indexing structure, 509‒510, 510
peer-to-peer (P2P) networks, 31, 291
peer-to-peer video streaming systems, 31
PeerTrust reputation management system, 293
PEG (packet exchange game), 45‒46
performance compared to scalability, 630
performance model, 761
permutation-based encoding, 402‒403
persistence-based service recovery method, 669
persistent communication in MPI, 763
Personalized Trust (PET) reputation management system, 296
pervasive computing environments (PvCEs), 659‒661, 661
dynamicity in, 663
scalability in, 660, 663‒664, 665, 666‒670
service composition in. See service composition
pessimistic logging, 745
PET (Personalized Trust) reputation management system, 296
Petri nets, 643
PGAS (partitioned global address space) models, 564, 765, 765
CAF (Co-Array Fortran), 765, 767‒768, 767
GA (Global Arrays), 765, 768‒769
UPC (Unified Parallel C), 765‒766
physical design simulator tool (SimulPh.D), 553‒556, 555, 556‒558, 558‒559, 560
PIC network coordinate systems, 40
pipeline pattern, 216
pipelines, 58
pivot routing, 238‒244, 243, 250
places, in X10 programming model, 72
PLASMA, 712, 713, 718‒725, 720‒721, 723, 724. See also DAGuE engine; DPLASMA
PLM (Process Life Cycle Management), in ORTE, 585
PMC model for determining diagnosability, 97, 100‒101, 100, 103‒105, 111, 112, 114, 115, 116, 117‒119, 120‒121
PMs (process managers) in MPD runtime environment, 583
PMX (partially matched crossover) method in combinatorial optimization problems, 403
PNM (probabilistic network model), 13, 14‒17, 27‒28
POET, 255, 262‒263, 263, 264‒265, 265, 267‒268, 268, 268, 270, 271‒272, 272, 272, 273
associative maps, 266
code templates, 266
control flow, 267
correctness of optimized code, 271
lists, 265
parameterized optimizations, 268, 268, 270, 270‒271, 271
script, 267
tagging input code for optimization, 268, 269, 270
tracing optimizations of input code, 268, 269‒270, 269, 270
tuples, 266
xform handles, 266
pointer jumping, 419‒421, 420, 422, 434
and AMD Magny-Cours-based system, 432‒434, 433
and directed acyclic graph (DAG)-structured computations, 427
and exact inference, 420, 421‒422, 421, 424, 426
experimental implementation, 428‒434, 428, 431‒434
and Intel Nehalem-EX-based system, 419, 428, 432, 431‒432, 434, 434
point-to-point communications, 569
pollution avoidance, reputation systems for, 294‒295
polymorphism, and Chapel, 73
POSIX Threads, 68
power consumption, 60, 158, 161‒162
in AMD Phenom II processor, 65‒66
and dynamic voltage and frequency scaling, 165‒166
in Intel Core i7 processor, 65
and KNN queries in mobile sensor networks, 512, 516, 517, 518‒519, 520
in many-core architectures, 74
and MPI/OpenMP model, 160, 160‒162, 168, 170
and multicore computing, 60, 65‒66
and opportunistic grids, 366, 368‒369, 372‒374, 372‒373, 376‒380, 377, 377, 378, 378, 379, 380, 380, 382
and task aggregation, 170, 172, 178, 179
in Teraflops Research Chip, 60
and UnaGrid, 372‒374, 372‒373, 376‒380, 377, 377, 378, 378, 379, 380, 380
in wireless sensor networks, 228, 229, 239, 240
power function model and growth prediction in cloud computing, 623
power processing element (PPE) in Cell CBE, 590, 590
Power Processor Element (PPE) in Sony/IBM/Toshiba Cell BE processor, 67
power scaling in mobile multicore computing, 147‒148, 150‒151, 152, 153
POWER7 processor, 66
PowerTrust reputation management system, 307‒309
PPE (power processing element) in Cell CBE, 590, 590
PPE (Power Processor Element) in Sony/IBM/Toshiba Cell BE processor, 67
predicates clustering, and decomposition of domains, 532
predictive search engines, 470‒472, 497‒498
and behavior modeling, 471, 472‒473, 473, 497, 498
k-nearest neighbor model, 474‒476, 475
neighborhoods of similarity, 472, 473, 474‒481, 475, 476‒479, 481, 497‒498
product recommendation algorithm (Amazon), 474
prefetching approach to connecting processes, 567‒568
preliminary design in embedded systems design, 639
privacy concerns in cloud computing, 277, 278, 280, 286, 288
private memory in GPUs, 600‒601, 601
private-memory data model, 760
probabilistic network model (PNM), 13, 14‒17, 27‒28
probability to function, 234‒235
problem sets, high-dimensional/highly dynamic, 3‒4
Process Life Cycle Management (PLM), in ORTE, 585
process managers (PMs) in MPD runtime environment, 583
process-driven techniques for software fault resilience, 746‒747
processing units in embedded systems, 636
processor fault resilience, 738
processors in embedded systems, 635, 636
product recommendation algorithm (Amazon), 474, 488
production in embedded systems design, 639
profile-driven static mapping, 164‒165, 164
programming models
defined, 759
Intel Array Building Blocks, 70
Intel Threading Building Blocks, 70‒71
MPI. See MPI (Message-Passing Interface)
for multicore computing, 67‒74, 77
Sequoia, 71
X10, 72
programming paradigm, 3
provided services description in service composition framework, 661
proximity-aware overlay construction, 35, 39‒42, 41
Pseudo Trust (PT) reputation management system, 304
PT (Pseudo Trust) reputation management system, 304
P2301 standard, 2
P2P (peer-to-peer) networks, 31, 291
P2PRep reputation management system, 302
punishment mechanism in Sorcery, 302
PvCEs. See pervasive computing environments
Q-learning, 444, 446, 447, 456, 458, 459, 459, 460, 460, 461, 462
QPI (Intel QuickPath Interconnect), 64
QR factorization, 712, 713, 718‒727, 720‒721, 723, 724, 726, 727
query entry process in search engines, 469, 469
query processing on relational databases, in cloud computing, 286‒288, 287, 287
query-driven approaches to domain decomposition, 531, 532
query-response system in search engines, 468, 468
queueing models, 353, 353, 643
and adaptive compression algorithm, 353‒357, 353, 356
queueing analysis for sensor node, 355‒357, 356
queuing network modeling, 132‒136, 133, 136, 139‒140, 141
QuickPath Interconnect (QPI), 64
radar cross-section problem and dense linear system solvers, 717
RAID (redundant array of independent disks), 742‒743
random distribution
and hill climbing algorithm, 540, 549, 549, 550
and simulated annealing algorithm, 543, 550, 551
random overlay construction, 34, 36‒37
random Zipf distribution
and hill climbing algorithm, 539, 540, 549, 549, 550
and simulated annealing algorithm, 550, 551
ranks in MPI, 762
RAS (Resource Allocation System) in ORTE, 585
rate monotonic (RM) algorithm, 388
reactive embedded system, 629
ready send in MPI, 763
real test beds, 778
real-time application development and worst-case execution time, 638
real-time tasks, 387
execution time, adjustment of, 391‒392, 393
REBUILD mode in FT-MPI, 746
reciprocative decision function, 300‒301
reconfiguration and composition plan, 669
recursive circulant, diagnosability of, 116‒118, 117
RedMPI, 747
redundancy elimination, 257, 261‒262, 261
redundancy-based service recovery, 669
redundant array of independent disks (RAID), 742‒743
redundant evaluation elimination, 261, 262
redundant execution, 738
referential partitioning mode, 530
regular send in MPI, 763
reinforcement learning, 439, 442‒445, 443
DLS-with-RL approach to load balancing, 439, 444‒463, 446, 447, 450, 451, 452, 453, 454, 456, 457, 459, 460
model-free, 443
in scientific applications, 444‒445
relevance, in search engines, 467‒468
reliability
and network fault resilience, 741
of wireless sensor networks, 227‒229, 235‒237, 236, 237, 653‒654, 654
reliable minimum-sized connected dominating sets genetic algorithm, 13, 14‒15, 17‒28, 18, 20, 23, 23, 24, 25, 26, 27
reliable rating aggregation system, 305
relocation-based service recovery method, 669
remote computation in mobile computing, 145‒146, 146
remotely accessible memory, 760
repair, in composition plan, 669
replicable pattern, 216
reputation, 312‒316, 312, 313, 314, 315. See also reputation management systems
reputation management systems, 291‒292, 316‒317
and accountability, 299
and accuracy, 292
accurate trust model, 312‒316, 312, 313, 314, 315
and causes of misbehavior, 316
and collusion, 292, 299, 300, 309, 310, 311
Credence, 294
decentralized recommendation chains, 305‒306
fine-grained system (Zhang/Fang), 306‒307
FuzzyTrust, 294
and intelligent adversaries, 316
Maze P2P file-sharing system, 299
node reputation, time to estimate, 316
PeerTrust, 293
Personalized Trust (PET), 296
Pseudo Trust (PT), 304
P2PRep, 302
reciprocative decision function, 300‒301
and reliability, 299
reliable rating aggregation system, 305
reputation-based fines in electronic markets, 305
reputation-based trust management system, 298
robust system (Buchegger/Boudec), 306
and scalability, 292
Scrubber, 294
and security, 299
and social network environment, 309‒311, 311, 316
stamp trading schemes/protocols, 299‒300
trust inference system, 297
TrustMe, 304
Xrep/X2Rep, 301
reputation protocols, 299, 300
reputation-based fines in electronic markets, 305
reputation-based trust management system, 298
requirement description in service composition framework, 661‒662
requirement specifications in embedded systems design, 639
Resource Allocation System (RAS) in ORTE, 585
resource management and scalability, 157‒158
Resource Mapping System (RMAPS), in ORTE, 585
response time in search engines, 467‒468
restricted growth functions, 535, 536
r-hop neighborhood and PNM, 15‒16
risk evaluation and PET reputation management system, 296
RL. See reinforcement learning
RM (rate monotonic) algorithm, 388
RMAPS (Resource Mapping System), in ORTE, 585
RMCDS-GA (reliable minimum-sized connected dominating sets genetic algorithm), 13, 14‒15, 17‒28, 18, 20, 23, 23, 24, 25, 26, 27
RML (Runtime Messaging Layer) framework in ORTE, 586
robustness to mobility, and topology control, 10
roulette wheel selection (RWS), 22, 23, 23
ROUTED framework in ORTE, 585
RRTM_LW scheme, 88‒89, 89, 89, 90‒91, 90‒91, 93‒94, 93
RTS/CTS mechanism, 355
rule generator in FraSPA, 212, 213
runtime environments, 563, 565‒566, 586
application deployment, 566‒567, 571‒576, 574, 576
communication infrastructure, 566‒570
daemons in, 566, 569, 570, 571, 572, 573, 574, 576, 577, 578
and fault tolerance, 577‒581, 581, 585
and parallel programming models, 564
and portability, 566
topologies in, 566, 569‒570, 573‒576, 574, 576, 577‒578, 582, 585
Runtime Messaging Layer (RML) framework in ORTE, 586
RWS (roulette wheel selection), 22, 23, 23
sample-and-hold circuits in embedded systems, 636
SAP (secure allocation processing) algorithm, 332‒333
SARSA, 444, 446, 447, 456, 458, 459, 459, 460, 460, 461, 462
scalability, xix, 1, 3‒4, 157‒158
of application deployment in runtime environments, 575‒576, 576
of Cholesky factorization, 730, 730
of communication infrastructure in runtime environments, 569‒570
of embedded systems, 630‒631, 645, 646
of energy-saving opportunities, 167
geographically scalable systems, 630
of GPU applications, 81‒82, 94
load-scalable systems, 630
in modeling/verification, 639‒640
and opportunistic grids, 366‒368
performance compared to, 630
in pervasive computing environments, 660, 663‒664, 665, 666‒670
and reputation management systems, 292
and resource management, 157‒158
of service composition, 660, 663‒664, 665, 666‒670, 671
scalable computing/communications, xix, 1‒4
scalable programming paradigm, 3
scalable programming patterns for CPU/GPU systems, 84‒86
ScaLAPACK numerical library, 728‒729, 728, 731
scalar replacement, 259, 260, 260‒261
scale-out/scale-up in embedded systems, 630
SCC (Intel Single-Chip Cloud Computer), 76
schedulability region, 390, 391
scheduling, 398
and cumulative objective function, 401‒402
dynamic, 400, 408, 409, 410, 413, 414, 416, 416
and fitness, 399, 401‒402, 410, 411‒413
and flowtime, 401, 402, 410, 414, 415, 416, 416
GA + TS hybrid grid scheduler, 396, 404‒406, 405, 414, 414, 415, 415, 416, 416, 417
HGS-Sched, 396, 406‒408, 406, 412, 413, 414, 415, 415, 416, 416, 417
and makespan, 400‒401, 402, 410, 414, 415, 416, 416
of real-time tasks, 388, 389‒391, 393
single-/multilevel genetic schedulers, 408‒412, 409‒414, 415, 414‒416, 416
single-population genetic schedulers, 396, 402‒404, 402, 409‒411, 410, 414, 415, 415, 416, 416, 417. See also GA + TS hybrid grid scheduler
static, 400, 408, 409, 410, 413, 414, 416, 416
scheduling-based exact inference (SEI), 429‒434, 431‒433
Schelling Manager (SchMng), 789‒790, 791, 792, 793, 794, 795
SchMng (Schelling Manager), 789‒790, 791, 792, 793, 794, 795
scientific applications
iterative, 440
load imbalance in, 437‒438, 439‒440
and machine learning, 441‒445, 443
reinforcement learning in, 444‒445
time-stepping applications, 438, 440, 440
scientific codes optimization, 255, 256‒257, 272‒273
computation reordering, 257‒259, 258
data layout reordering, 259‒261, 260
empirical tuning, 262‒263, 263, 264‒265, 265, 265‒272, 268, 268, 269, 270, 271, 272, 272
redundancy elimination, 257, 261‒262, 261
Scioto (scalable collections of task objects), 770‒771
scratchpad memory, 637, 722, 723
Scrubber reputation management system, 294
SDC (silent data corruption), 747
search engines
and data caching, 469
most popular searches, 501‒502, 502
network access time, 468
query-response system, 468, 468
search time, 468, 469‒470, 470
and web caching, 469
See also predictive search engines
secure allocation processing (SAP) algorithm, 332‒333
security in distributed storage systems, 321‒322, 323, 339
file fragmentation/allocation, 321‒322, 324‒325, 326, 327‒328, 328, 339
heterogeneous vulnerabilities, 322, 324, 327‒328, 328, 339
intrusion detection/tolerance, 323
See also S-FAS fragmentation allocation scheme
seed daemons, 570, 571, 573, 575
SEI (scheduling-based exact inference), 429‒434, 430‒433
selfish overlay construction, 46‒47
sensors
biosensors, 227
in embedded systems, 636
spatial distribution of, 229
separate CPU/GPU systems, 593, 594
sequence alignment with mpiBLAST, 750‒752, 751
Sequoia programming model, 71
server placement in cloud computing, 614‒620, 615
server type and security, 322, 323, 327‒328, 328
service adaptation in service composition framework, 662‒663
service composition, 659‒660, 671
composition plan, 662, 669‒670
dynamic, 663
and network size, 668
provided services description, 661
and request complexity, 668‒669
requirement description, 661‒662
scalability of, 660, 663‒664, 665, 666‒670, 671
service monitors, 663
service provider selection, 662, 664, 665‒666, 666‒670
static, 663
service locality and service composition, 670
service monitors in service composition framework, 663
service provider selection, 662, 664, 665‒666, 666‒670
service user in cloud computing, 279
SEs (SIMD Engines), 591‒592, 592
S-FAS fragmentation allocation scheme, 323, 339‒341
allocation principles, 332
dynamic assurance, 331‒332, 333, 337, 337‒338, 338
and heterogeneity, 322, 323, 324, 328‒329, 332, 333, 334, 334‒335, 340‒341
and number of file fragments, 335‒336, 336, 338, 339, 340
and number of fragments transmitted across storage clusters, 337‒338, 338
performance evaluation, 338‒339, 340
and probability of fragment interception, 337, 337
prototype architecture/design, 333
secure allocation processing algorithm, 332‒333
and size of server groups, 335, 335
storage assurance, 330‒331, 333‒336, 334, 335, 336
and threshold, 336
SFTrust reputation management system, 303‒304
SGEMM/DGEMM application kernel, 689, 691, 691, 691‒692, 693, 695, 695
shared caches, in multicore computing, 62‒63
shared history and reciprocative decision function, 300
short-term history and reciprocative decision function, 300‒301
SHRINK mode in FT-MPI, 746
silent data corruption (SDC), 747
SIMD Engines (SEs), 591‒592, 592
similarity coefficients, 487
similarity metrics, 481
string-based, 472, 481, 489‒497, 491, 493‒495, 497, 498, 499, 501
vector-based, 472, 481‒489, 482, 482, 484, 497, 498, 499, 499, 501
simulated annealing algorithm, 543, 543, 550, 551, 552, 552‒553, 553
simulation
in chip multiprocessor performance analysis, 130‒131, 130, 136‒139, 140‒141
in grid testing, 777‒790, 780‒787, 789‒790, 791, 792, 793, 794‒795, 794
See also grid simulation tools
SimulPh.D (physical design simulator tool), 553‒556, 555, 556‒558, 558‒559, 560
simultaneous multithreading (SMT), 61‒62, 61, 132, 133
single leader distributed algorithms for connected dominating sets, 11‒12
single processor/uniprocessor systems, 55‒57, 61‒62, 61, 157
single tree overlay structure in video streaming, 31, 32‒33, 35
Single-Chip Cloud Computer (SCC), 76
single-level genetic schedulers, 408‒412, 409‒414, 415, 414‒416, 416
single-point crossover operation, 24, 25
single-population genetic schedulers, 396, 402‒404, 402, 409‒411, 410, 414, 415, 415, 416, 416, 417. See also GA + TS hybrid grid scheduler
single-precision computations and VOCL, 691, 692, 693, 695, 695
single-program, multiple-data (SPMD) programming, 701
single-unit embedded systems, 629, 645
singular value decomposition (SVD) model, 476‒479, 476‒479, 480‒481, 481, 498
SL (supervised learning), 442
smart sensors, 227
smartphones, 145, 149, 149, 150
Smith-Waterman algorithm, 494‒495, 495, 496, 497, 499, 501, 689, 691, 691, 691‒692, 693, 695, 695
SMP (symmetric multiprocessing), 152
SMT (simultaneous multithreading), 61‒62, 61, 132, 133
SNAPC (Snapshot Coordination Interface) in ORTE, 586
Snapshot Coordination Interface (SNAPC) in ORTE, 586
social network mechanism in Sorcery, 301‒302
social networks, and reputation management systems, 309‒311, 311, 316
SocialTrust reputation management system, 309‒311, 311
soft real-time systems, 388
software engineering, 4
software fault resilience, 738, 743, 753
checkpointing, 743‒745, 743, 748
enhancements to parallel programming models, 745‒748
software-driven energy efficiency in mobile multicore computing, 153‒154, 154
solid-state drives (SSDs), 742
Sony/IBM/Toshiba Cell BE processor, 64, 67
Sorcery reputation management system, 301‒302
source-to-source compiler (SSC) in FraSPA, 212, 212, 213
space/aerospace applications of embedded systems, 632
space-redundant execution, 738
spatial irregularity in density-aware itinerary-based KNN, 516‒517
SPE (Synergistic Processor Element) in Sony/IBM/Toshiba Cell BE processor, 67
SPIRAL, 223
split operations and hill climbing algorithm, 540‒541, 541, 542
SplitStream multiple-tree streaming, 33
SPMD (single-program, multiple-data) programming, 701
sprouting operations, 407
SPs (stream processors), 591
SPUs (synergistic processing units), 590, 590‒591
square 1-fault-tolerant model, 232‒233, 232, 234, 234, 235, 236, 237
SSC (source-to-source compiler) in FraSPA, 212, 212, 213
SSDs (solid-state drives), 742
stamp trading schemes/protocols, 299‒300
standards
FlexRay, 634
IEEE P2301, 2
star schemas, 523‒524, 525‒526
star topology, 570, 573, 574, 575‒576, 576, 578
state machines, 642
static scheduling, 400, 408, 409, 410, 413, 414, 416, 416
STATIC scheduling algorithm, 458
static variables in POET, 266
steady-state strategy in combinatorial optimization problems, 403‒404, 410
ST-Ericsson/ARM/Google collaboration, 152
storage fault resilience, 742‒743
stream processors (SPs), 591
streaming languages in mobile multicore computing, 151‒152
StreamIt, 151
stretch factors and topology control, 9‒10
string-based similarity metrics, 472, 481, 489, 497, 501
converting distance to similarity, 490
Levenshtein distance, 490‒492, 491, 497, 499, 500, 501
Needleman-Wunsch algorithm, 492‒494, 493, 494, 497, 499, 500, 501
Smith-Waterman algorithm, 494‒495, 495, 496, 497, 499, 501
string alignment, 492‒495, 493‒495, 496
Wagner-Fischer algorithm, 490‒491, 491, 495‒497, 499, 501
structural parallelism, 420‒435
subtraction-based distributed algorithms for connected dominating sets, 11
supervised learning (SL), 442
SVD (singular value decomposition) model, 476‒479, 476‒479, 480‒481, 481, 498
swap mutation in combinatorial optimization problems, 403
symmetric multiprocessing (SMP), 152
synchronization in MPI, 764
synchronous send in MPI, 763
syndrome of diagnosis, 100, 101, 102
synergistic processing units (SPUs), 590, 590‒591
Synergistic Processor Element (SPE) in Sony/IBM/Toshiba Cell BE processor, 67
system integration in embedded systems design, 639
system verification/evaluation in embedded systems design, 639
system virtual machines, 368
system-level checkpointing, 744‒745
systems software level checkpointing, 743‒745, 743, 748
tags in MPI, 762
Tanimoto coefficient, 487, 489, 499, 500, 501
task aggregation, 170, 172‒173, 173, 182
aggregation patterns, 172, 173, 173, 174, 175, 178‒181, 180‒181, 182
communication, 173, 175, 176, 177, 178
computation, 173, 174‒175, 178
and power consumption, 170, 172, 178, 179
task class, in DAGuE Job Data Flow, 709‒710, 711, 712
task execution time, adjustment of, 391‒392, 393
task grouping in task aggregation, 175‒177, 182
task-driven techniques for software fault resilience, 748
task-parallel programming models, 769
ADLB (asynchronous dynamic load balancing library), 771‒772
t-diagnosable systems, 100‒103, 101, 102, 103
telemedicine, 633
Teraflops Research Chip, 60, 76‒77
location update cost minimization with terminal paging constraint, 196‒201
terminal paging cost minimization with location update constraint, 192‒196
test beds, 778
Texas Instruments, 152
TH-1A (TianHe-1A) supercomputer, 82‒83, 83, 90, 90‒91, 91
THREAD_MASTERONLY model, 158‒159
Threading Building Blocks, 70‒71
threads
kernel-level, 69
migration, 134
and MPI + OpenMP/CUDA hybrid programming pattern, 86
and multitasking, 148
and performance analysis of chip multiprocessor, 126, 127, 128, 129, 129, 130, 131, 132‒134, 133, 135, 136, 140
POSIX, 68
simultaneous multithreading, 132, 133
user-level, 69
threshold-based approaches to horizontal partitioning, 527, 527, 528
TianHe-1A (TH-1A) supercomputer, 82‒83, 83, 90, 90‒91, 91
tile QR algorithm, 712, 713, 718‒727, 720‒721, 723, 724, 726, 727
TILEPro64 chip, 76
Tilera TILEPro64 chip, 76
time prediction, and MPI/OpenMP model, 162‒163, 163, 164
time-based location update, 188
time-redundant execution, 738
time-stepping applications, 438, 440, 440, 444, 445, 449‒451, 450, 451
topology control
and algorithm overhead, 10
total search time in search engines, 468, 469‒470, 470
transformational embedded system, 629
transmission range, 246‒249, 248, 249, 250, 251
transmission success ratio (TSR), 13, 15, 21‒22
tree-based topologies, 570‒571, 573‒576, 574, 576, 578
Trilinos, 4
trust inference system, 297
TrustGuard reputation management system, 293‒294
TrustMe reputation management system, 304
TS algorithm, 404‒406, 405, 412, 414, 414. See also GA + TS hybrid grid scheduler
TSR (transmission success ratio), 13, 15, 21‒22
twisted cubes, diagnosability of, 112‒115, 114, 115
2-matching composition networks (MCNs), 105‒110, 106, 107, 109, 110
two-point crossover operation, 24, 25
two-sided communication in MPI, 763‒764
UL (unsupervised learning), 442
UML, 643
UnaGrid, 367, 368, 369, 374‒375, 382
CPU performance, 375‒376, 376, 376
intrusion level, 375, 375‒376, 376, 376
performance degradation, 381, 381‒382
power consumption, 372‒374, 372‒373, 376‒380, 377, 377, 378, 378, 379, 380, 380
unconstrained approaches to horizontal partitioning, 527‒528, 527
uncoordinated checkpoint protocols, 744
Unified Parallel C (UPC), 765‒766
uniform crossover operation, 24, 25
uniform distribution
and hill climbing algorithm, 538‒539, 539, 549, 549, 550
and simulated annealing algorithm, 543, 550, 551
uniprocessor/single processor systems, 55‒57, 61‒62, 61, 157
unstructured mesh overlay structure in video streaming, 32, 33‒34, 35
unsupervised learning, 442
UPC (Unified Parallel C), 765‒766
user behavior, modeling, 471, 472‒473, 473, 497, 498
user-driven approaches to domain decomposition, 531‒532, 531
user-level threads, 69
VB (virtual backbone), 8‒9, 10
vector cosine, 483‒484, 484, 486‒487, 489, 499, 499, 501
vector-based similarity metrics, 472, 481‒483, 482, 482, 488‒489, 497, 501
vector cosine, 483‒484, 484, 486‒487, 489, 499, 499, 501
vectorization in OpenCL, 604‒605, 609
very long instruction word (VLIW) architecture, 592
video streaming and heterogeneous overlay construction, 35, 37‒39
video-on-demand (VOD) systems, 34
view-upload decoupling (VUD), 44‒45, 44
virtual backbone (VB), 8‒9, 10
virtual column partitioning, 530
virtual OpenCL. See VOCL
virtualization, 366, 367, 368, 369‒370, 382
Vivaldi network coordinate systems, 40
VLIW (very long instruction word) architecture, 592
VOCL (virtual OpenCL), 676‒677, 676, 696‒697
and application kernels, 689, 691, 691, 691‒692, 693, 694, 695, 695
communication channels, 680‒681
data transfer pipelining, 684‒687, 685, 686
double-precision computations, 692, 694, 695, 695
experimental evaluation, 687‒689, 687, 688, 690, 691, 691, 691‒692, 693, 694, 694‒695, 695
kernel argument caching, 684, 684
kernel execution, 688
multiple virtual GPUs, 694‒695, 695
multiple-level handler translation, 680, 680
native OpenCL function calls, 681, 682
optimizations, 682‒687, 683, 684, 685, 686
and overhead, 682, 683, 683‒684, 684, 688, 688, 691‒692, 693‒694, 696
single-precision computations, 691, 692, 693, 695, 695
VOCL function operations, 679, 679
VOCL proxy, 678, 678, 680‒681, 682, 692
See also OpenCL
VOD (video-on-demand) systems, 34
VUD (view-upload decoupling), 44‒45, 44
Wagner-Fischer algorithm, 490‒491, 491, 495‒497, 499, 501
walker nodes in Green's function Monte Carlo, 752‒753
WCET (worst-case execution time), 387, 391, 638
web caching in search engines, 469
when clauses in X10, 773
wildcard receives
in MPI, 764
and software fault resilience, 746‒747
wireless communication networks
cost optimization in. See cost optimization
wireless multicast advantage (WMA), 240, 246, 246
wireless sensor networks (WSNs), 239‒240, 245‒246, 245, 246, 248, 343, 646‒647, 647
compression in, 344
congestion in, 245
connectivity of, 228, 230‒231, 232, 239
coverage in, 228, 230‒237, 232, 234, 236, 237
deterministic network model, 12‒13
and event-to-sink reliable transport, 245
fault detection in, 647
fault tolerance in, 228, 231‒233, 232, 234, 234‒235, 238‒244, 243, 647‒654, 648‒650
mean time to failure in, 649, 650, 651‒653, 651‒653
nodes in, 228‒229, 231, 232, 238, 239, 240, 241, 245, 246, 247, 250, 251
and pivot routing, 238‒244, 243, 250
power consumption in, 228, 229, 239, 240
queueing model for, 353‒357, 353, 356
reliability of, 227‒229, 235‒237, 236, 237, 653‒654, 654
sensor nodes, 343
topology control in, 7‒9, 8, 9, 10‒12
and transmission range, 246‒249, 248, 249, 250, 251
wireless multicast advantage, 240, 246, 246
WMA (wireless multicast advantage), 240, 246, 246
work stealing in Scioto, 771
worker nodes in Green's function Monte Carlo, 752
workload distribution in CPU/GPU systems, 86‒87, 90, 90‒91, 91
worst-case execution time (WCET), 387, 391, 638
WSNs. See wireless sensor networks
xform handles in POET, 266
Xrep/X2Rep reputation management system, 301
Zipf distribution
and hill climbing algorithm, 539‒540, 539‒540, 549, 549, 550