Chapter 21
Parallel Programming Using Threads
21.1 Parallel Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
21.2 Multi-Tasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
21.3 POSIX Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
21.4 Subset Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
21.4.1 Generate Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
21.4.2 Sequential Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
21.4.3 Multi-Threaded Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
21.5 Interleaving the Execution of Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
21.6 Thread Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
21.7 Amdahl’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Multi-core processors are everywhere: high-end desktops have multiple cores. Even mobile
phones also have multi-core chips. It has become difficult to make individual cores faster, but
adding more cores is easier. More cores do not necessarily mean faster programs, because
many factors affect a computer’s performance. The number of cores is one factor, but the
software is also important. If a program does not take advantage of multiple cores, then
it may as well be running on a single core processor. If a program is written for multiple
cores, then the program can be referred to as a parallel program. If a program is not written
for multiple cores, then it is called a sequential program. All programs so far in this book
are sequential programs. This chapter provides an introduction to writing parallel programs
using threads.
21.1 Parallel Programming
There are many ways to write parallel programs. A parallel program performs multiple
operations at the same time. Parallel programs are often classified into three categories:
1. SIMD: single instruction, multiple data. This is commonly used in signal processing,
such as manipulating data in arrays. When adding two arrays, the same instruction
(addition) is applied to all array elements at once.
2. MISD: multiple instructions, single data. Different operations are performed on the
same piece of data. As an example, census data are widely studied for various purposes.
The same data may be processed simultaneously by different threads of execution, in
order to find, for example, the average and median age of a population at once.
3. MIMD: multiple instructions, multiple data. This is the most general case. Different
operations are performed on different pieces of data.
A sequential program is SISD: At any moment, the program is performing one operation
on one piece of data.
335
336 Intermediate C Programming
21.2 Multi-Tasking
If parallel programs take advantage of multiple cores, then does this mean that parallel
programs cannot run on single-core processors? Yes, they can. The operating system is be-
tween the program and the cores, and provides an abstraction so that programs do not need
to be too concerned about the number of cores. For example, how many computer programs
are running simultaneously on your computer right now? There could be a web browser, a
text editor, a music player, instant messaging, and many other programs, including various
system services.
How many cores does your computer have? One? Two? Four? It is the operating system
that makes sure everything works—every program gets a turn to execute some of its code
in a timely fashion. Specifically, the operating system gives each program a short time
interval (usually several milliseconds) in which one or several cores execute the program
and the program makes some progress. After this interval, the operating system suspends
the program, and then allows another program to run. By giving every program a short
time interval to make some progress, the operating system gives the user the impression that
every program makes progress. If a processor has only one core, then only one program can
run at any given moment. This is called multi-tasking and is analogous to several children
sharing a slide in a playground. Even though only one person can slide down at any moment
(for safety), everyone gets a turn, and everyone enjoys the slide.
In order to improve the overall performance, the operating system may change the
lengths of the time intervals depending on what particular programs are doing. For example,
when a program wants to read data from a file, this program has to wait for the disk to
get the data. During this waiting period, the operating system shortens the program’s time
interval so that another program can use the processor. If a program waits for a user to
enter something on the keyboard, then the operating system also shortens the program’s
time interval while the program is waiting for the user’s input. Shortened intervals also
occur when a program is waiting on data from the network. Because the lengths of the time
intervals can change, it is difficult to predict exactly which program is executing at any
given moment of time.
21.3 POSIX Threads
Threading is one way to write parallel programs. Each thread in a program may execute
the code in parallel. A sequential program can be thought of as having a single thread of
execution. Parallel programs have two or more threads. This book uses POSIX threads, also
called pthread. POSIX stands for Portable Operating System Interface and it is a standard
that most operating systems support.
A typical multi-thread program starts with only one thread, called the main thread or the
first thread. The main thread creates one or several threads by calling the pthread create
function. This function requires four arguments:
1. The first argument is an address of a structure called pthread t. This is a structure
defined in pthread.h and it stores information about the newly created thread. You do
not need to know precisely what information the pthread library stores in pthread t.
Pthreads needs to manage some data on each thread of execution, and this data is
Parallel Programming Using Threads 337
stored in pthread t. This is similar to the FILE pointer: it stores some information
about an opened file.
2. The second argument is the address of a structure called pthread attr t. This spec-
ifies optional attributes for initializing a thread. If this argument is NULL, the thread
is initialized with default values.
3. The third argument is the name of a function. Section 9.2 explains how to use a
function as an argument to another function (qsort). There are some differences
here. For pthread create, the function’s return type must be void* and the function
takes precisely one argument. For qsort, it expects a function that returns an integer
and the function takes two pointers as arguments.
4. The fourth argument is the argument passed to the function specified in the third
argument.
After calling pthread create, a new thread (the second thread) is created and executes
the function specified in the third argument. The main thread and the new thread may
now run simultaneously (if there are two or more cores). If the program equally divides its
tasks between the first thread and the second thread, then the program can be about twice
as fast. On a single core machine, the two threads take turns and there is no performance
improvement.
The main thread should call pthread join on the second thread before terminating.
This function causes the calling (main) thread to wait until another thread terminates.
After the call to pthread join, there will only be one thread executing. If the main thread
does not wait for the second thread to finish, and the main thread terminates, then the
second thread will also be terminated. Below is a code listing for a simple program creating
one thread:
// thread1 .c1
#in clude < pthread .h >2
#in clude < stdio .h >3
#in clude < stdlib .h >4
void * p rintHello ( void * arg )5
{6
int * intptr = ( in t *) arg ;7
int val = * intptr ;8
printf (" Hello World ! arg = %d n" , val ) ;9
return NULL ;10
}11
int main ( i n t argc , char * argv [])12
{13
pthread_t second ;14
int rtv ; // return value of pth r ead_cr eate15
int arg = 12345;16
rtv = p thread _creat e (& second , NULL ,17
printHello , ( void *) & arg ) ;18
i f ( rtv != 0)19
{20
printf (" ERROR ; pth r ead_cr eate () returns %d n" , rtv ) ;21
return EXIT _FAILUR E ;22
}23
rtv = p thread_ join ( second , NULL ) ;24
i f ( rtv != 0)25
{26
338 Intermediate C Programming
printf (" ERROR ; pthr ead_join () returns %d n" , rtv ) ;27
return EXIT _FAILUR E ;28
}29
return EXIT _SUCCES S ;30
}31
To compile this file into an executable program, we need to add -lpthread at the end
of the gcc command in this way:
$ gcc -g -Wall -Wshadow thread1.c -o thread1 -lpthread
This will link the pthread library to the executable program. When the program runs,
it prints the following output:
Hello World! arg = 12345
How does this program work? Let’s start at line 17 where pthread
create is called.
The first argument is the address of a pthread t object created at line 14. The second
argument is NULL, and thus the thread is initialized with the default attributes. The third
argument is printHello. This is the address of the printHello function and pthread will
use this address to execute printHello of lines 4 to 10. The function’s return type must be
void *. The function must take one and only one argument and the type must be void *.
The fourth argument to pthread create is the argument that pthreads will in turn pass
to printHello when it calls this function. In this case, it is the address of an integer. If
pthread create succeeds, it returns zero. This indicates that the thread is created normally.
There are now two parallel threads in the program.
Inside printHello, the argument is cast to the correct type. Since line 18 uses the
address of an integer, the correct type is int *. Line 7 dereferences the address to get the
integer value. The printHello has no useful information to return so it returns NULL at line
9. The main thread calls pthread join to wait for the second thread to finish executing
before main terminates. If pthread join is not called, the main thread may terminate and
destroy the second thread before it gets a chance to print anything to the terminal.
21.4 Subset Sum
The subset sum problem is commonly encountered when studying cryptography. This
problem can be defined in different ways. Here is one definition. Consider a positive integer
k and a set of positive integers A = {a
1
, a
2
, ..., a
n
}. Is it possible to find a subset B =
{b
1
, b
2
, ..., b
m
} A such that k = b
1
+ b
2
+ ... + b
m
? Note that B cannot be the empty set
because k > 0. Since B is a subset of A, m must be less than or equal to n.
Consider the following example: A = {19, 10, 2, 9, 8} and k = 20. We have k = 10+2+8,
and thus there is a solution. Consider another example, A = {3, 5, 1, 2, 6} and k = 21. In
this case, the sum of all elements in A is 3 + 5 + 1 + 2 + 6 = 17, which is smaller than
k. Hence there is no solution. Sometimes multiple solutions are possible. For example, if
A = {1, 2, 3, 4, 5} and k is 7. Here we have the solutions: {1, 2, 4}, {3, 4}, and {2, 5}.
This problem asks you to write a program that counts the number of subsets whose sum
equals to the value of k. For a set of n elements, there are 2
n
subsets. This program restricts
the size of sets to 31 so that the number of subsets does not exceed 2
31
.
Parallel Programming Using Threads 339
21.4.1 Generate Test Cases
As always, it is good practice to think about testing a program before it is written. In
this case we can write a program that generates test cases. This generator program takes
three command-line arguments:
argv[1] is the number of elements. For this program, the value must be between 3
and 31.
argv[2] is ”1” if the set is valid, and ”0” if the set is invalid. A valid set must contain
distinct positive integers. If an element is zero or negative, then the set is invalid. If
two or more elements have the same value, then the set is also invalid.
argv[3] is ”1” if there is a solution for the test case. It is ”0” if there is no solution.
The program first generates an array of positive integers. The values are guaranteed to
be distinct because each element is larger than the previous element. If we want to generate
an invalid set, then the program chooses one of the two possibilities: Either it makes an
element zero or negative, or it makes two elements have the same value. If the set is invalid,
then the value of k is irrelevant and the generator chooses a random number for k.
For a valid set with a solution, k is the sum of some elements. Each element has a one-
third chance of being part of the sum equal to k. It is possible that no element is added. In
this case, k will be zero and the generator program corrects this situation by setting k to
the first element. If the set is valid but the generator creates a test case with no solution,
then k is greater than the sum of all elements. The output or the generator program prints
k first, and then the set elements one after another. The complete generator program is
shown below:
// testgen .c1
#in clude < stdio .h >2
#in clude < stdlib .h >3
#in clude < time .h >4
// command line argu m ents5
// argv [1]: the number of e lements6
// must be between 3 and 317
// argv [2]: "1" the array is valid8
// "0" the array is invalid9
// argv [3]: "1" if there is a s o lution10
// "0" if there is no solution11
void swap ( i n t * a , i nt * b )12
{13
int t = * a;14
* a = * b ;15
* b = t ;16
}17
void shuffl eArray ( i n t * arr , i nt num )18
{19
int ind1 , ind2 , ind3 ;20
for ( ind1 = 0; ind1 < 1000; ind1 ++)21
{22
ind2 = rand () % num ;23
ind3 = rand () % num ;24
swap (& arr [ ind2 ] , & arr [ ind3 ]);25
}26
}27
void printArr a y ( i n t * arr , in t num , in t kval )28
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset