Index

Note: Page numbers followed by f indicate figures and t indicate tables.

A

Adders 
carry-look-ahead adder 103, 103f
prefix computation 103–104
ripple-carry adder 102, 102f
Advanced vector extensions (AVX) 304
Alice programming language 13
Amdahl’s law 199–201
Anonymous processes, Python 34–36, 35f
append method 248
Array processors 304
Arrays 173, 193–194
Asynchronous communication 139
Asynchronous computing 314

B

Barriers 132, 263–264
Bit serial adder 113
Blocking command 20–21, 21f, 22f
Blocking communication 313
Bounded buffer 251–252
Broadcast 90
Broadcast commands 20–21, 22f
Bulk synchronous parallel (BSP) model 
definition 136
direction and velocity 136
grid points 137, 137f
MPI 119
superstep 136, 137f
Bus arbiter 113
Buses 95–96

C

Call stack 165
Carry-look-ahead adder 103, 103f
catch statement 221
Center for Curriculum Development and Educational Resources (CDER) 1–3
Cluster computing 
high-performance networks 140–148
outcomes 117
Clusters 310
Coarse-grained data parallelism 
definition 309
distributed memory 310–314, 311f, 312f
shared memory parallelism 309–310, 310f
task scheduling 314–315
Collective communication 
barrier synchronization 132
broadcast collective 133
gather collective 133, 135
message passing programs 134f
reduce collective 135, 136
Communication 
blocking 127–132
collective 132–136
nonblocking 138–140, 141f, 313–314
point-to-point 127–132
two-sided 127–132
Communicators 128
Compute unified device architecture (CUDA) language 307–308
Concurrency control 161–164
Condition variables 
bounded buffer 251–252
bugs 254, 257
full/empty buffer 255–256
inexcusably inefficient 252
newCondition method 257
notify 253
notifyAll 253, 256
ReentrantLock class 257
spinning 252, 257
wait 253
Context switch 30–31
Convergecast 90, 92–93
Conway’s Game of Life 6
cells 300–301
coarse-grained data parallelism 309–315
data parallelism 303–308
data partitioning 316–318, 316f, 317f
load balancing 320
loop-based parallelism 308–309
outcome 300
programming 301–302
rules of 301
work/minimizing communication 318–319, 319f
Copy-scans 
n ≤ p 275
n > p 275
Counters 
AND gates 108, 108f
flip-flop propagation delay 109f
internet 110
prefix computations 107
ripple 109, 109f
synchronous circuit 109
Cray SeaStar interconnection 100

D

Dataflow 315
Data parallelism 171
GPUs 307–308
sequential reference code 303–304
SIMD 304
vector instructions 304–306, 305f
vector pipelining 306–307
Data partitioning 
distributed case 316
latency 316
matrix-vector product algorithm 317
one-dimensional partitioning 316, 317–318, 317f
strong scaling 317
two-dimensional partitioning 316, 317–318, 317f
weak scaling 317
Deadlock 5, 245–248, 313
Digital logic 
adders 102–104
counters 107–110
field-programmable gate arrays 113–114
finite state machines 113
flip-flops 112–113
hardware 84
Karnaugh maps 99–102, 100f, 101f
latches 112–113
logic gates 90–96
multiplexers 104–106, 105f, 106f, 107f
number representation 85–89
outcomes 83
pedagogical levels, PDC 84t
programmable logic devices 113–114
recursive thinking 88–89
shift registers 110–112, 110f, 111f
timing analysis 97–99, 97f, 98f
tree representation 86–87, 86f
trees and interconnection networks 88–89
unary number vs. binary number 88–89
Verilog 113
Directed acyclic graphs (DAGs) 
applications 98–99
data hazards 98–99
faster circuit/algorithm 98
finite impulse response filter 98
multiplexer circuit 97, 97f, 98f
speedup and parallelism 196–198
steady state 97
synchronization 98–99
threads 99
work and span 194–196, 195f, 197f
Distributed memory parallelism 310–311, 311f
Distributed memory programming 
blocking communication 313
deadlock 313
MPI library 311, 312
nonblocking communication 313–314
pseudocode 314
two-sided communication 312
Divide-and-conquer parallelism 
array elements 181
helper thread 185
implementations of 184
insertion sort 185
java.lang.InterruptedException 183
recursion tree 181, 183f
strategies 5
doubleBalance method 225
Dynamic scheduling 80–81

E

Early adopter program 2
EduPar workshop series 2
Ethernet 140–142, 148, 310
Exclusive scan 269–270

F

Fan-out and fan-in 
associative operations 90
broadcast 90
convergecast 90, 92–93
efficiency 94
feasible schedule 93–94, 94f
Monopoly 92–93, 93f
multicast 90, 92f
OR gate 90
recursive decomposition 94–95
speedup 94
trees 90
finally statement 221
Fine-grained parallelism  See Data parallelism
Flip-flops 112–113
FORALL loop 174
Fork-join parallelism 5, 177
Amdahl’s law 199–201
arrays 173, 193–194
vs. concurrency control 161–164
cores 161
dataflow 171
data parallelism 171
divide-and-conquer parallelism 181–186
error-prone 170
expected time bound 198–199
FORALL loop 174
Java framework 187–190
java.lang.Thread method 176
join method 177
maps 190–193
material 158–160
mergesort 209–211
message passing 171
Moore’s law 199–201
multithreaded programming 160–161
outcomes 157
pack algorithm 206–207
prefix-sum algorithm 202–206
processors 173, 178–181
quicksort 207–209, 209f
reductions 190–193
sequential programming 160
shared memory 164–170
speedup and parallelism, DAG 196–198
sum method 175, 176, 179
SumThread constructor 175
synchronization 176
threads 164–170
time-slicing 160
work and span, DAG 194–196, 195f, 197f

G

Graphics processing unit (GPU) 
data parallelism 307–308
supercomputers 150, 151

H

Halo region 318–319
Heap data structure 165
Hiding 216
High-performance computing (HPC) community 119, 125–127, 126f
High-performance networks 
Cray supercomputing networks 140–142
Ethernet 140–142
features of 142
HPC vs. traditional networking 144
InfiniBand 140–142
operating system 142, 143f
performance of 142
RDMA 144–146

I

IBM Blue Gene/L 100
Immutable memory 238–239
Inclusive scan 269–270, 274
Independent iterations 308
InfiniBand 140–142, 310
Integrated development environment (IDE) 27
International Parallel and Distributed Processing Symposium (IPDPS) 2
isEmpty method 232

J

Java 
condition variables 225–226
doubleBalance method 225
private field 223
public field 224
synchronized statement 222–223
unusual style 223
Java ForkJoin framework 
classes and methods 188
ForkJoinPool 188
java.util.concurrent package 187
RecursiveAction class 188–189, 190
RecursiveTask 189
Jitter 79
Joining 64–65
join method 42–43, 43f, 49, 50f, 177, 190

K

Karnaugh maps 
adjacency graph 99–100, 100f, 101f
definition 99–100
hypercube 102
mesh and torus 100
ring-based topologies 100
synchronization 101
topology of 4
wraparound connections 101
Kogge-Stone circuit 108f

L

Ladner-Fischer circuit 108f
Latches 112–113
Latency 316, 318
Lightweight kernels 151
Load imbalance 320
Lock 
catch statement 221
concurrency control and 220
finally statement 221
Java 222–226
pseudocode 220, 221
reentrant locks 222
throw statement 221
Logic gates 
buses 95–96
fan-in 90–95
fan-out 90–95
tristate gates 95–96
Loop-based parallelism 308–309
Loop unrolling 306

M

Mandelbrot 4
Mandelbrot set 
explicit threading approach 81
fractal image 72–73, 73f
OpenMP 75–81, 76f, 78f, 80f
sequential program 74, 75f
uses 73–74
Map-reduce strategies 5
Masking 216
Master-worker model 315
Medium access control (MAC) protocols 110
Meld operation 277
Memory 
distributed 310–311, 311f
shared 309–310, 310f
Mergesort 
fork-join parallelism 209–211
Pool/map paradigm 54, 56f, 57f
queue objects 46–47, 46f
spawning process 46–47, 46f
Message passing  See Message passing interface (MPI)
Message passing interface (MPI) 4–5, 311
BSP model 119, 136–138
cluster computing 118–127
collective communication 132–136
definition 118
HPC community 119, 125–127, 126f
internet 118
LAM 119
nodes 119
nonblocking communication 138–140, 141f
nonuniform memory access 120–121
OpenMP 148–150, 150f
Parallel Virtual Machine middleware 119
point-to-point communication 127–132
scalability 122–123
shared memory systems 119–120
SPMD 123–125
supercomputers 150–152, 152t
threads and 150f
Monitors 258
Moore’s law 201
MPI_Recv() 131–132
MPI_Send() 131–132
Multicast 90, 92f
Multiple instruction, multiple data (MIMD) 310
Multiplexers 104–106, 105f, 106f, 107f
multiprocessing module 27, 28–29
Multithreaded prime counter 
C 64f
C++ 64f
Java 65f
nextCand loop 63
Mutex 66, 67
Mutual exclusion 66

N

Networks  See also High-performance networks
cluster computing 117, 140–148
interconnection 88–89, 95–96
and MPI 
Next costume command 15
Node 118, 119, 310
Nonblocking command 20–21, 21f, 22f
Nonblocking communication 138–140, 141f, 313–314
notify() method 253
notifyAll() method 253, 256

O

One-dimensional partitioning 316, 317–318, 317f
OpenMP 308
compared to explicit threading 81
dynamic scheduling 80–81
race condition 77
OR gate 90
Outline 264
Overhead 79

P

Parallel and distributed computing (PDC) 1
CDER 2–3
circuit fan-in and fan-out 4
Conway’s Game of Life 6
course material 6–7
courseware repository 2
early adopter program 2
EduPar workshop series 2
fork-join parallelism 5
MPI 4–5
organization 3
prime counting 4
Python-based computer science course 6
Python programming language 4
Scratch 3–4
shared-memory concurrency control 5
topology 4
Parallelizable loops 262–263
Parallel partitioning 279–281
analysis 281
meld operation 277
permute operation 278–279
Parallel performance 49–52, 50f, 51f
Parallel reduction 
global variables 264–265
n ≤ p 265–267, 266f
n > p 267–269
operator 264
Parallel running times 293–294, 295f, 296f
Pattern recognizer 113
Permute operations 278–279
Pipeline 214, 306–307
Play sound command 20
Point-to-point communication 
and message size 130–131, 131f
sequential processes 127
telephone, classic game of 128–129, 129f
unique addresses 127–128
using buffers for storage 130
Pool/map paradigm, Python 
abstract procedure 52
mergesort 54, 56f, 57f
π Monte Carlo estimation 52, 53f
Riemann sum 54, 55f
Pragmas 60
Prefix-sum algorithm 
definition 202
down pass 204–206, 205f
sum-an-array problem 206
up pass 202–203, 202f, 203f
Prime counting module 4
assignment 71–72
C program 69f
C++ program 69f
definition 61
Java program 69f
load balance 70–71, 70f
multiple thread creation 63, 64f, 65f
privatizing variables 68
race condition 66–68, 66f
sequential code 62–63, 62f
thread joining 64–65
uses 62
Privatizing variables 68
Program counter 165
Public method 167
Python 
anonymous processes 34–36, 35f
child process 43–44, 44f
communicating via queue 40–41, 40f
communication 27
context switches 30–31
course materials 30
data structures 29, 47–48, 47f, 48f
extended communication via queue 41, 42f
goals 26
graphics courses 29
harnessing parallel hardware 27
hole diggers 38–40, 39f
introductory courses 29
join method 42–43, 43f
limited programming ability 28
lock to control printing 36–38, 38f
mergesort 46–47, 46f
multiple child process 45, 45f
multiprocessing module 27, 28–29
outcomes 25
Pool/map paradigm 52–54, 53f, 55f, 56f, 57f
process names 36, 37f
Python 2.x 26–27
requirements 26
security/resource management 28
spawning process 31–34, 31f, 33f, 34f
speedup 48–52
tools 26–27
traditional content 29–30
Python-based computer science course 6
Python 2.x 26–27

Q

Queue 
communication 40–41, 40f
extended communication 41, 42f
Quicksort 
analysis 284–285
fork-join parallelism 207–209, 209f
partitioning in 282, 283f, 284f
segment bits 282
segmented scan 282

R

Race condition, shared-memory concurrency control 
bad interleavings 226, 227–232, 227f
data race 226–227, 227f, 232–237
definition 226
stacks 227–232
Random numbers 48–49, 49f
Rank, MPI 128
Reader/writer locks 249–251
Recursion tree 181, 183f
Reentrant locks 222
Remote direct memory access (RDMA) 
Ethernet 148
main memory and 146–148
queue-pair model 144–146, 147f
send-recv model 146
uses 144
write/read 146
Riemann sum 54, 55f
Ripple-carry adder 102, 102f
Ripple counters 109, 109f

S

Scaling 
strong 317
weak 317
Scan operation 
exclusive scan 269–270
inclusive scan 269–270, 274
n ≤ p 270–272, 270f
n > p 272–274
op function 270
Scratch 
advantage of 3–4
blocking command 20–21, 21f, 22f
cartoonish characters and photographs 13
communication, clean solutions 14–17, 15f, 16f, 17f
computer literacy course 12
context of 12–13
CS1 13
dragging blocks 13
nonblocking command 20–21, 21f, 22f
outcomes 11
PDC 11–12
private variable 21–23, 22f
race conditions 17–20, 19f
shared variable 21–23, 22f
toy language 13–14
Segmented copy-scans 289–290
Segmented inclusive scans 289
Segmented reductions 290–292
Segmented scan 282, 285–289
Sequential implementation 302
Sequential vs. parallel performance 49–52, 50f, 51f
Sequential running times 293–294, 295f, 296f
Shared-memory concurrency control 5
BankAccount class 215
cooks 214–215
deadlock 245–248
debug 215
failure/performance isolation 216
immutable memory 238–239
mutual exclusion (See Lock)
outcomes 213
parallelism 216
pipeline 214
processor utilization 216
race condition 226–237
responsiveness 216
synchronization 215–226, 240–245, 248–259
thread 215
thread-local memory 237–238
Shared memory parallelism 309, 310f
Shared memory systems 119–120
Shift registers 110–112, 110f, 111f
Single program, multiple data (SPMD) 310
advantage of 125
applications 125
data parallelism 304
definition 123
extreme scales 124
failures 125
STAT 124
synchronization 123–124
Spawning process, Python 
mergesort 46–47, 46f
multiple 32–34, 33f
Pool and Pool.map mechanism 34, 34f
single process 31–32
Sprite 14–15
Stacks 
array 229
index 229
intermediate states 227–228
myPeekHelperWrong causes 230, 231f
pop operation 228, 231
push operation 228, 230
StackEmptyException 231
synchronized keyword 229
Stack trace analysis tool (STAT) 124
start method 32, 35–36, 167, 185
Static scheduling 80
Streaming SIMD Extensions (SSE) instructions 304
StringBuffer class 248
Strong scaling 317
Subdomain 316–317
sum method 175, 176, 179
Supercomputers 150–152, 152t
Synchronization 215–216
A-B-A problem 243–244
assignment statements 242
atomic operations 244
BankAccount class 217
barriers 258
coarse-grained locking 241–242
ConcurrentHashMap class 244, 245
condition variables 251–258
data races 240
fine-grained locking 241
Hashtable class 244
locking 240
monitors 258
mutual exclusion 218–219
possible interleavings 216–217
primitives 220
race conditions 242
reader/writer locks 249–251
semaphores 258
stale information 218
thread scheduler 217
vertical timeline 217–218
while loop 219
System.out.println method 168

T

Think command 20
Threads 
DAGs 99
MPI 150f
OpenMP 60
shared-memory concurrency control 215
Threads and shared memory 170f
aliasing 170
call stack 165
constructor 168
definition 164
heap data structure 165
interference 166
multithreaded program 166f
new thread creation 167, 169f
nondeterministic 167
program counter 165
scheduling 164
sequential program 164–165, 165f
throw statement 221
Tightly coupled systems 122–123
Time-slicing 160
Toy language 13–14
transferTo method 245, 246, 247, 248
Tristate gates 95–96

U

Unary number vs. binary number 88–89

V

Vector pipelining 306–307
Verilog 113
volatile keyword 235

W

Weak scaling 317
Whack-a-mole game 22, 22f
While loops 219, 254