21. Parallel Programming Using Threads (1/5)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 21

Parallel Programming Using Threads

21.1 Parallel Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

21.2 Multi-Tasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

21.3 POSIX Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

21.4 Subset Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

21.4.1 Generate Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

21.4.2 Sequential Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

21.4.3 Multi-Threaded Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

21.5 Interleaving the Execution of Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

21.6 Thread Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

21.7 Amdahl’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

Multi-core processors are everywhere: high-end desktops have multiple cores. Even mobile

phones also have multi-core chips. It has become diﬃcult to make individual cores faster, but

adding more cores is easier. More cores do not necessarily mean faster programs, because

many factors aﬀect a computer’s performance. The number of cores is one factor, but the

software is also important. If a program does not take advantage of multiple cores, then

it may as well be running on a single core processor. If a program is written for multiple

cores, then the program can be referred to as a parallel program. If a program is not written

for multiple cores, then it is called a sequential program. All programs so far in this book

are sequential programs. This chapter provides an introduction to writing parallel programs

using threads.

21.1 Parallel Programming

There are many ways to write parallel programs. A parallel program performs multiple

operations at the same time. Parallel programs are often classiﬁed into three categories:

1. SIMD: single instruction, multiple data. This is commonly used in signal processing,

such as manipulating data in arrays. When adding two arrays, the same instruction

(addition) is applied to all array elements at once.

2. MISD: multiple instructions, single data. Diﬀerent operations are performed on the

same piece of data. As an example, census data are widely studied for various purposes.

The same data may be processed simultaneously by diﬀerent threads of execution, in

order to ﬁnd, for example, the average and median age of a population at once.

3. MIMD: multiple instructions, multiple data. This is the most general case. Diﬀerent

operations are performed on diﬀerent pieces of data.

A sequential program is SISD: At any moment, the program is performing one operation

on one piece of data.

335

336 Intermediate C Programming

21.2 Multi-Tasking

If parallel programs take advantage of multiple cores, then does this mean that parallel

programs cannot run on single-core processors? Yes, they can. The operating system is be-

tween the program and the cores, and provides an abstraction so that programs do not need

to be too concerned about the number of cores. For example, how many computer programs

are running simultaneously on your computer right now? There could be a web browser, a

text editor, a music player, instant messaging, and many other programs, including various

system services.

How many cores does your computer have? One? Two? Four? It is the operating system

that makes sure everything works—every program gets a turn to execute some of its code

in a timely fashion. Speciﬁcally, the operating system gives each program a short time

interval (usually several milliseconds) in which one or several cores execute the program

and the program makes some progress. After this interval, the operating system suspends

the program, and then allows another program to run. By giving every program a short

time interval to make some progress, the operating system gives the user the impression that

every program makes progress. If a processor has only one core, then only one program can

run at any given moment. This is called multi-tasking and is analogous to several children

sharing a slide in a playground. Even though only one person can slide down at any moment

(for safety), everyone gets a turn, and everyone enjoys the slide.

In order to improve the overall performance, the operating system may change the

lengths of the time intervals depending on what particular programs are doing. For example,

when a program wants to read data from a ﬁle, this program has to wait for the disk to

get the data. During this waiting period, the operating system shortens the program’s time

interval so that another program can use the processor. If a program waits for a user to

enter something on the keyboard, then the operating system also shortens the program’s

time interval while the program is waiting for the user’s input. Shortened intervals also

occur when a program is waiting on data from the network. Because the lengths of the time

intervals can change, it is diﬃcult to predict exactly which program is executing at any

given moment of time.

21.3 POSIX Threads

Threading is one way to write parallel programs. Each thread in a program may execute

the code in parallel. A sequential program can be thought of as having a single thread of

execution. Parallel programs have two or more threads. This book uses POSIX threads, also

called pthread. POSIX stands for Portable Operating System Interface and it is a standard

that most operating systems support.

A typical multi-thread program starts with only one thread, called the main thread or the

ﬁrst thread. The main thread creates one or several threads by calling the pthread create

function. This function requires four arguments:

1. The ﬁrst argument is an address of a structure called pthread t. This is a structure

deﬁned in pthread.h and it stores information about the newly created thread. You do

not need to know precisely what information the pthread library stores in pthread t.

Pthreads needs to manage some data on each thread of execution, and this data is

Parallel Programming Using Threads 337

stored in pthread t. This is similar to the FILE pointer: it stores some information

about an opened ﬁle.

2. The second argument is the address of a structure called pthread attr t. This spec-

iﬁes optional attributes for initializing a thread. If this argument is NULL, the thread

is initialized with default values.

3. The third argument is the name of a function. Section 9.2 explains how to use a

function as an argument to another function (qsort). There are some diﬀerences

here. For pthread create, the function’s return type must be void* and the function

takes precisely one argument. For qsort, it expects a function that returns an integer

and the function takes two pointers as arguments.

4. The fourth argument is the argument passed to the function speciﬁed in the third

argument.

After calling pthread create, a new thread (the second thread) is created and executes

the function speciﬁed in the third argument. The main thread and the new thread may

now run simultaneously (if there are two or more cores). If the program equally divides its

tasks between the ﬁrst thread and the second thread, then the program can be about twice

as fast. On a single core machine, the two threads take turns and there is no performance

improvement.

The main thread should call pthread join on the second thread before terminating.

This function causes the calling (main) thread to wait until another thread terminates.

After the call to pthread join, there will only be one thread executing. If the main thread

does not wait for the second thread to ﬁnish, and the main thread terminates, then the

second thread will also be terminated. Below is a code listing for a simple program creating

one thread:

// thread1 .c1

#in clude < pthread .h >2

#in clude < stdio .h >3

#in clude < stdlib .h >4

void * p rintHello ( void * arg )5

int * intptr = ( in t *) arg ;7

int val = * intptr ;8

printf (" Hello World ! arg = %d n" , val ) ;9

return NULL ;10

}11

int main ( i n t argc , char * argv [])12

{13

pthread_t second ;14

int rtv ; // return value of pth r ead_cr eate15

int arg = 12345;16

rtv = p thread _creat e (& second , NULL ,17

printHello , ( void *) & arg ) ;18

i f ( rtv != 0)19

{20

printf (" ERROR ; pth r ead_cr eate () returns %d n" , rtv ) ;21

return EXIT _FAILUR E ;22

}23

rtv = p thread_ join ( second , NULL ) ;24

i f ( rtv != 0)25

{26

338 Intermediate C Programming

printf (" ERROR ; pthr ead_join () returns %d n" , rtv ) ;27

return EXIT _FAILUR E ;28

}29

return EXIT _SUCCES S ;30

}31

To compile this ﬁle into an executable program, we need to add -lpthread at the end

of the gcc command in this way:

$ gcc -g -Wall -Wshadow thread1.c -o thread1 -lpthread

This will link the pthread library to the executable program. When the program runs,

it prints the following output:

Hello World! arg = 12345

How does this program work? Let’s start at line 17 where pthread

create is called.

The ﬁrst argument is the address of a pthread t object created at line 14. The second

argument is NULL, and thus the thread is initialized with the default attributes. The third

argument is printHello. This is the address of the printHello function and pthread will

use this address to execute printHello of lines 4 to 10. The function’s return type must be

void *. The function must take one and only one argument and the type must be void *.

The fourth argument to pthread create is the argument that pthreads will in turn pass

to printHello when it calls this function. In this case, it is the address of an integer. If

pthread create succeeds, it returns zero. This indicates that the thread is created normally.

There are now two parallel threads in the program.

Inside printHello, the argument is cast to the correct type. Since line 18 uses the

address of an integer, the correct type is int *. Line 7 dereferences the address to get the

integer value. The printHello has no useful information to return so it returns NULL at line

9. The main thread calls pthread join to wait for the second thread to ﬁnish executing

before main terminates. If pthread join is not called, the main thread may terminate and

destroy the second thread before it gets a chance to print anything to the terminal.

21.4 Subset Sum

The subset sum problem is commonly encountered when studying cryptography. This

problem can be deﬁned in diﬀerent ways. Here is one deﬁnition. Consider a positive integer

k and a set of positive integers A = {a

, a

, ..., a

}. Is it possible to ﬁnd a subset B =

, b

, ..., b

} ⊂ A such that k = b

+ b

+ ... + b

? Note that B cannot be the empty set

because k > 0. Since B is a subset of A, m must be less than or equal to n.

Consider the following example: A = {19, 10, 2, 9, 8} and k = 20. We have k = 10+2+8,

and thus there is a solution. Consider another example, A = {3, 5, 1, 2, 6} and k = 21. In

this case, the sum of all elements in A is 3 + 5 + 1 + 2 + 6 = 17, which is smaller than

k. Hence there is no solution. Sometimes multiple solutions are possible. For example, if

A = {1, 2, 3, 4, 5} and k is 7. Here we have the solutions: {1, 2, 4}, {3, 4}, and {2, 5}.

This problem asks you to write a program that counts the number of subsets whose sum

equals to the value of k. For a set of n elements, there are 2

subsets. This program restricts

the size of sets to 31 so that the number of subsets does not exceed 2

Parallel Programming Using Threads 339

21.4.1 Generate Test Cases

As always, it is good practice to think about testing a program before it is written. In

this case we can write a program that generates test cases. This generator program takes

three command-line arguments:

• argv[1] is the number of elements. For this program, the value must be between 3

and 31.

• argv[2] is ”1” if the set is valid, and ”0” if the set is invalid. A valid set must contain

distinct positive integers. If an element is zero or negative, then the set is invalid. If

two or more elements have the same value, then the set is also invalid.

• argv[3] is ”1” if there is a solution for the test case. It is ”0” if there is no solution.

The program ﬁrst generates an array of positive integers. The values are guaranteed to

be distinct because each element is larger than the previous element. If we want to generate

an invalid set, then the program chooses one of the two possibilities: Either it makes an

element zero or negative, or it makes two elements have the same value. If the set is invalid,

then the value of k is irrelevant and the generator chooses a random number for k.

For a valid set with a solution, k is the sum of some elements. Each element has a one-

third chance of being part of the sum equal to k. It is possible that no element is added. In

this case, k will be zero and the generator program corrects this situation by setting k to

the ﬁrst element. If the set is valid but the generator creates a test case with no solution,

then k is greater than the sum of all elements. The output or the generator program prints

k ﬁrst, and then the set elements one after another. The complete generator program is

shown below:

// testgen .c1

#in clude < stdio .h >2

#in clude < stdlib .h >3

#in clude < time .h >4

// command line argu m ents5

// argv [1]: the number of e lements6

// must be between 3 and 317

// argv [2]: "1" the array is valid8

// "0" the array is invalid9

// argv [3]: "1" if there is a s o lution10

// "0" if there is no solution11

void swap ( i n t * a , i nt * b )12

{13

int t = * a;14

* a = * b ;15

* b = t ;16

}17

void shuffl eArray ( i n t * arr , i nt num )18

{19

int ind1 , ind2 , ind3 ;20

for ( ind1 = 0; ind1 < 1000; ind1 ++)21

{22

ind2 = rand () % num ;23

ind3 = rand () % num ;24

swap (& arr [ ind2 ] , & arr [ ind3 ]);25

}26

}27

void printArr a y ( i n t * arr , in t num , in t kval )28

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 21. Parallel Programming Using Threads (1/5)

Create new playlist

Sign In

Sign Up

Table of Contents for
21. Parallel Programming Using Threads (1/5)