A quotation variously ascribed to A.J.P. Taylor, Arnold Toynbee, and Winston Churchill describes history as “just one thing after another.” C# code is much the same—we write sequences of statements that will be executed one after another. Loops and conditional statements spice things up a little by changing the order, but there is always an order. While individual bits of C# code behave this way, programs as a whole do not have to.
For example, web servers are able to handle multiple requests simultaneously. The user interface for a program working on a slow operation should be able to respond if the user clicks a Cancel button before that slow work is complete. And more or less any computer bought recently will have a multicore processor capable of executing multiple pieces of code simultaneously.
C# can handle this kind of concurrent work thanks to the .NET Framework’s support for multithreading and asynchronous programming. We have a wide array of concurrency tools and there are many ways to use them—each example in the previous paragraph would use a different combination of threading mechanisms. Since there are many ways to approach concurrency problems, it’s worth drawing a clear distinction between the most common reasons for using the techniques and features this chapter describes.
Perhaps the most easily understood goal is parallel execution. A computer with a multicore processor (or maybe even multiple separate processor chips) has the capacity to run multiple bits of code simultaneously. If your program performs processor-intensive tasks, it might be able to work faster by using several cores at once. For example, video encoding is a slow, computationally complex process, and if you have, say, a quad-core computer, you might hope that by using all four cores simultaneously you’d be able to encode a video four times faster than you could with a conventional one-thing-after-another approach. As we’ll see, things never work out quite that well in practice—video encoding on four cores might turn out to run only three times as fast as it does on one core, for example. But even though results often fall short of naive expectations, the ability to perform multiple calculations at the same time—in parallel, as it were—can often provide a worthwhile speed boost. You’ll need to use some of the programming techniques in this chapter to achieve this in C#.
A less obvious (but, it turns out, more widespread) use of multithreading is multiplexing—sharing a single resource across multiple simultaneous operations. This is more or less the inverse of the previous idea—rather than taking one task and spreading it across multiple processor cores, we are trying to run more tasks than there are processor cores. Web servers do this. Interesting websites usually rely on databases, so the typical processing sequence for a web page looks like this: inspect the request, look up the necessary information in the database, sit around and wait for the database to respond, and then generate the response. If a web server were to handle requests one at a time, that “sit around and wait” part would mean servers spent large amounts of time sitting idle. So even on a computer with just one processor core, handling one request at a time would be inefficient—the CPU could be getting on with processing other requests instead of idly waiting for a response from a database. Multithreading and asynchronous programming make it possible for servers to keep multiple requests on the go simultaneously in order to make full use of the available CPU resources.
A third reason for using multithreading techniques is to ensure the responsiveness of a user interface. A typical desktop application usually has different motives for multithreading than a server application—since the program is being used by just one person, it’s probably not helpful to build an application that can work on large numbers of requests simultaneously to maximize the use of the CPU. However, even though an individual user will mostly want to do one thing at a time, it’s important that the application is still able to respond to input if the one thing being done happens to be going slowly—otherwise, the user may suspect that the application has crashed. So rather than being able to do numerous things at once we have less ambitious aims: work in progress shouldn’t stop us from being able to do something else as soon as the user asks. This involves some similar techniques to those required in multiplexing, although the need for cancellation and coordination can make user interface code more complex than server code, despite having fewer things in progress at any one time.
A related reason for employing concurrency is speculation. It may be possible to improve the responsiveness to user input by anticipating future actions, starting on the work before the user asks for it. For example, a map application might start to fetch parts of the map that haven’t scrolled into view yet so that they are ready by the time the user wants to look at them. Obviously, speculative work is sometimes wasted, but if the user has CPU resources that would otherwise be sitting idle, the benefits can outweigh the effective cost.
Although parallel execution, multiplexing, and responsiveness are distinct goals, there’s considerable overlap in the tools and techniques used to achieve them. So the ideas and features shown in this chapter are applicable to all of these goals. We’ll begin by looking at threads.
Threads execute code. They keep track of which statement to execute next, they store the values of local variables, and they remember how we got to the current method so that execution can continue back in the calling method when the current one returns. All programs require these basic services in order to get anything done, so operating systems clearly need to be able to provide at least one thread per program. Multithreading just takes that a step further, allowing several different flows of execution—several threads—to be in progress at once even within a single program.
Example 16-1 executes code on three threads. All programs have at least one thread—the .NET Framework creates a thread on which to call your Main method[40]—but this example creates two more by using the Thread class in the System.Threading namespace. The Thread constructor takes a delegate to a method that it will invoke on the newly created thread when you call Start.
Example 16-1. Creating threads explicitly
using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Thread t1 = new Thread(One);
        Thread t2 = new Thread(Two);
        t1.Start();
        t2.Start();
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("Main: " + i);
        }
    }

    static void One()
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("One: " + i);
        }
    }

    static void Two()
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("Two: " + i);
        }
    }
}
All three threads do the same thing here—they loop around 100 times, printing out a message to show how far they’ve gotten. Here are the first few lines of output I get on my system:
Main: 0
Main: 1
Main: 2
Main: 3
Main: 4
Main: 5
Main: 6
Main: 7
Two: 0
One: 0
One: 1
One: 2
One: 3
One: 4
One: 5
One: 6
One: 7
Main: 8
Main: 9
Main: 10
Main: 11
...
You can see that the main thread managed to count up to 7 before the others got going—this is normal, because it takes a little while for a new thread to get up to speed, and it’s often possible for the thread that called Start to make considerable progress before the threads it created do any visible work. And once they’re underway, you can see that all three loops are making progress, although the interleaving is a little surprising.
This illustrates an important feature of multithreaded code—it tends to be somewhat unpredictable. This particular program can print something different each time. We don’t want to fill the book with page after page of this kind of output, so here’s a quick summary of how a different run on the same machine started out: Main got up to 7 as before, then One printed the number 0, and after that, Two printed numbers from 0 all the way up to 27 before either of the other threads managed to get any more numbers out. And just for fun, here’s what we saw when running on a virtual machine hosted on the same hardware, but with just two virtual cores available in the VM: One manages to get all the way to 25 before Main gets a look in, and Two doesn’t print out its first line until One has gotten to 41 and Main has gotten to 31. The specifics here are not all that interesting; the main point is the variability.
The behavior depends on things such as how many CPU cores the computer has and what else the machine was doing at the time. The fact that this particular example ends up with each individual thread managing to print out relatively long sequences before other threads interrupt is a surprising quirk—we got this output by running on a quad-core machine, so you’d think that all three threads would be able to run more or less independently. But this example is complicated by the fact that all the threads are trying to print out messages to a single console window. This is an example of contention—multiple threads fighting over a single resource. In general, it would be our responsibility to coordinate access, but the .NET Framework happens to resolve it for us in the specific case of Console output by making threads wait if they try to use the console while another thread is using it. So these threads are spending most of their time waiting for their turn to print a message. Once threads start waiting for things to happen, strange behaviors can emerge because of how they interact with the OS scheduler.
Threads don’t correspond directly to any physical feature of your computer—a program with four threads running on a quad-core computer might end up running one thread on each core, but it doesn’t usually happen that way. For one thing, your program shares the computer with other processes, so it can’t have all the cores to itself. Moreover, one of the main ideas behind threads is to provide an abstraction that’s mostly independent of the real number of processor cores. You are free to have far more threads than cores. It’s the job of the operating system scheduler to decide which thread gets to run on any particular processor core at any one time. (Or, more accurately, which thread gets to run on any particular logical processor—see the sidebar on the next page.)
A machine will usually have lots of threads—a quick glance at the Windows Task Manager’s Performance pane indicates that this machine currently has 1,340 threads. Who’d have thought that writing a book would be such a complex activity? The extent to which this outnumbers the machine’s four CPU cores highlights the fact that threads are an abstraction. They offer the illusion that the computer has an almost endless capacity for executing concurrent tasks.
Threads are able to outnumber logical processors by this margin because most threads spend the majority of their time waiting for something to happen. Most of those 1,340 threads have called operating system APIs that have blocked—they won’t return until they have some information to provide. For example, desktop applications spend most of their time inside a Windows API call that returns messages describing mouse and keyboard input and the occasional system message (such as color scheme change notifications). If the user doesn’t click on or type into an application, and if there are no system messages, these applications sit idle—their main thread remains blocked inside the API call until there’s a message to return. This explains how a quad-core machine can support 1,340 threads while the CPU usage registers as just 1 percent.
When a blocking API is finally ready to return, the thread becomes runnable. The operating system’s scheduler is responsible for deciding which runnable threads get to use which logical processors. In an ideal world, you’ll have exactly enough runnable threads to make full and perfect use of the CPU cycles you’ve paid for. In practice, there’s usually a mismatch, so either one or more logical processors will be idle, or there will be contention for processing resources.
In the latter case—where there are more runnable threads than logical processors—the scheduler has to decide which threads currently most deserve to run. If a thread runs without blocking for a while (typically a few milliseconds) and there are other runnable threads, the OS scheduler may preempt that thread—it interrupts its execution, stores information about what it was doing at the point at which it was preempted, and gives a different thread some CPU time. If a logical processor becomes available later (either because enough threads block or because some other thread was preempted) the OS will put things back to how they were before preemption, and allow it to carry on. The time for which a thread will be allowed to run before preemption is known as a quantum.
The upshot of this is that even if you have more threads than logical processors, and all of the threads are trying to execute code simultaneously, the OS scheduler arranges for all of them to make progress, despite outnumbering the logical processors. This illusion has a price: preempting a thread and scheduling a different thread to use the CPU slows things down, and you’ll often want to use the techniques we’ll see later that try to avoid forcing the scheduler to do this.
.NET’s threading system is designed so that threads do not have to correspond directly to OS threads, but in practice they always do. At one point, Microsoft thought that .NET threads would need to be able to correspond to OS fibers, an alternative to threads where the application takes a more active part in scheduling decisions. This requirement came from a SQL Server 2005 feature that was cut shortly before the final release, so the distinction between OS threads and .NET threads is now essentially academic (although the feature could conceivably reemerge in a future version). It’s useful to be aware of this because a handful of API features are designed to accommodate this feature, and also because there are plenty of articles you may run into on the Internet written either before the feature was cut or by people who haven’t realized it was cut.
Each thread has its own call stack, which means that items that live on the stack—function arguments and local variables—are local to the thread. We can exploit this to simplify Example 16-1, which contains three almost identical loops. Example 16-2 has just one copy of the loop which is shared by all three threads.
Example 16-2. Per-thread state on the stack
using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Thread t1 = new Thread(Go);
        Thread t2 = new Thread(Go);
        t1.Start("One");
        t2.Start("Two");
        Go("Main");
    }

    static void Go(object name)
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("{0}: {1}", name, i);
        }
    }
}
The Go method here contains the common loop—it has been modified slightly to take an argument so that each thread can print out either One, Two, or Main as before. Running this produces similar output to the previous example. (It’s not identical, of course, because these examples produce slightly different output every time they run.)

We used a different overload of the Start method—we’re now passing an argument. And less obviously, we’re using a different constructor overload for Thread too—Example 16-1 used a constructor that accepts a delegate to a method taking zero arguments, but Example 16-2 uses an overload that accepts a delegate to a method that takes a single object argument. This overload provides one way of passing information into a thread when you start it—the argument we pass to Start is passed on to the Go method here.
This example illustrates an important point: multiple threads can be inside the same function at any time. All three threads in Example 16-2 spend most of their time inside the Go method. But since each thread gets its own stack, the values of the name argument and the loop variable (i) can be different for each thread.

Information that lives elsewhere is not intrinsically private to one thread. Example 16-3 shows another variation on our example. As with Example 16-2, it uses a common Go method to run a loop on all three threads, but the loop variable (i) is now a static field of the Program class—all three threads share the same variable.
Example 16-3. Erroneous sharing of state between threads
using System;
using System.Threading;

class Program
{
    // Visible to all threads. (Bad, in this example.)
    static int i;

    static void Main(string[] args)
    {
        i = 0;
        Thread t1 = new Thread(Go);
        Thread t2 = new Thread(Go);
        t1.Start("One");
        t2.Start("Two");
        Go("Main");
    }

    static void Go(object name)
    {
        // Modifying shared state without suitable protection - bad!
        for ( ; i < 100; ++i)
        {
            Console.WriteLine("{0}: {1}", name, i);
        }
    }
}
This example has a problem: all three threads will try to read and write the shared field, and things often go wrong when you do this. You might think that with three threads all sharing a single common counter, with each thread incrementing that counter every time they loop and each thread running until the counter hits 100, we’d just see all the numbers from 0 to 99 once. But it’s not quite that simple. For one thing, you might see all three threads print out 0, because they may all get to the point where they’re trying to print out the first value before any of them has gotten as far as trying to increment the counter. (Remember, a for loop executes its iterator clause—the ++i in this example—at the end of each iteration.) Then again you might not see that—it all really depends on when the OS scheduler lets the threads run. But there’s a subtler problem: if two threads both attempt to execute the ++i at the same time, we may see anomalous results—the value of i may end up being lower than the number of times it has been incremented, for example. If you want to share state between threads, you’ll need to use some of the synchronization mechanisms discussed later in this chapter.
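To make that lost-update problem concrete, here is a small sketch (not one of the chapter’s numbered examples) that increments one counter with an ordinary ++ and another with Interlocked.Increment, one of the synchronization tools we’ll meet later. With several threads hammering on both, the unsynchronized counter routinely comes up short:

using System;
using System.Threading;

class Counters
{
    static int unsafeCount;
    static int safeCount;

    public static int[] Run()
    {
        unsafeCount = 0;
        safeCount = 0;
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.Length; ++t)
        {
            threads[t] = new Thread(delegate()
            {
                for (int i = 0; i < 100000; ++i)
                {
                    unsafeCount++;                        // Read-modify-write; updates can be lost.
                    Interlocked.Increment(ref safeCount); // Atomic increment; no updates lost.
                }
            });
            threads[t].Start();
        }
        foreach (Thread t in threads) { t.Join(); }
        return new int[] { unsafeCount, safeCount };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("Unsynchronized: {0}", counts[0]); // Usually less than 400000.
        Console.WriteLine("Interlocked:    {0}", counts[1]); // Always exactly 400000.
    }
}

On a multicore machine the unsynchronized total is usually well below 400,000, because two threads can read the same old value, both add one, and both write back the same new value—one increment simply vanishes.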
Be aware that using local variables is not necessarily a guarantee that the state you’re working with lives on the stack. For example, when using reference types (and most types are reference types) you need to keep in mind the distinction between the variable that contains the reference and the object to which that reference refers. Example 16-4 uses nothing but local variables, but ends up using the same StringBuilder object from all three threads—each thread might have its own local variable to refer to that object, but all three variables refer to the same object.
Example 16-4 does something slightly unusual with the Thread constructor. Our Go method now requires two arguments—the StringBuilder and the name—but Thread doesn’t provide a way to pass in more than one argument; we get to choose an argument count of either zero or one. So we’re using a lambda here to provide a zero-argument method for Thread, and that lambda passes the two arguments into Go, including the new StringBuilder argument. It has also enabled us to declare that the Go method is expecting the name to be a string, rather than the less specific object type used in the previous example. This technique doesn’t have anything to do with threading; it’s just a useful trick when you find yourself confronted with an API that takes a delegate that doesn’t have enough arguments. (And it’s not the cause of the problem here. Less concise ways of passing the object in would have had the same problem, and so would the use of multiple methods, which Example 16-1 illustrated.)
Example 16-4. Local variables, but shared state
using System;
using System.Threading;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        StringBuilder result = new StringBuilder();
        // Sharing the StringBuilder between threads. BAD!
        Thread t1 = new Thread(() => Go(result, "One"));
        Thread t2 = new Thread(() => Go(result, "Two"));
        t1.Start();
        t2.Start();
        Go(result, "Main");
        t1.Join();
        t2.Join();
        Console.WriteLine(result);
    }

    static void Go(StringBuilder sb, string name)
    {
        for (int i = 0; i < 100; ++i)
        {
            // All threads using the same StringBuilder - BAD!
            sb.AppendFormat("{0}: {1}", name, i);
            sb.AppendLine();
        }
    }
}
By the way, you’ll have noticed that this code calls Join on both Thread objects. The Join method blocks until the thread has finished—this code needs to do that because it prints the output only once the threads are done. This is a simple example of coordinating operations across multiple threads. However, it’s not sufficient to avoid problems here. Looking at the output, it’s clear that all is not well. Here are the first few lines, running on a quad-core system:
Main: One: Two: 00 Main: 1 2 MainTwo: 3 Main: 1 2 2 Two: 3One: Two: 4 Two: One: 6Two: One: 7 One: 8 : One: 9
That’s a whole lot more chaotic than the previous examples, which merely scrambled the order of the lines, and lost the odd increment. The reason this has gone more obviously wrong is that with the earlier examples, our attempt to observe the system profoundly changed its behavior. (That happens a lot with multithreaded code.) The calls to Console.WriteLine were imposing some order on the system, because the .NET Framework was forcing the threads to take it in turns when printing their output—that’s why we don’t get lines mixed up with one another. But Example 16-4 does all of its work in memory using a StringBuilder, and since it calls Console.WriteLine just once when it’s done, to print the results, nothing is forcing things to happen in any particular order, and so we can see the chaos in full effect.
There’s another reason Console.WriteLine is likely to have a significant effect on behavior: it’s relatively slow. The actual work being done by the examples so far is trivial—incrementing counters takes very little time, and concatenating strings is more complex but still pretty fast. Writing messages out to the screen is orders of magnitude slower. (The same would apply if you were writing messages to a logfile or the debugger.) So our previous examples were spending almost all of their time inside the code added to observe the code’s behavior, and almost no time executing the behavior we were hoping to observe.
This sort of problem makes multithreaded code remarkably resistant to debugging—it is far more sensitive to the observer’s paradox than other kinds of code. Generally speaking, as soon as you try to examine a threading bug, the bug goes away. (It comes back as soon as you stop looking, of course. Plenty of systems have gone live with debugging code compiled in because the debugging code made certain problems “go away.” Don’t rely on that—threading problems that appear to go away when you haven’t really fixed them are really just hiding.)
Just to be clear, the reason for the chaos is that even though each thread has its own local sb variable held privately on that thread’s stack, they all refer to the same StringBuilder object—we have three references to the same object. All three threads are trying to add output to the same StringBuilder at the same time, and the result is a mess.
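One way to tame this particular example—a sketch of one possible fix, not necessarily the approach the rest of this chapter will recommend—is to use C#’s lock statement so that only one thread at a time may touch the shared StringBuilder, and each thread appends a complete line before letting another thread in:

using System;
using System.Text;
using System.Threading;

class SafeAppend
{
    // All access to the shared StringBuilder is guarded by this object.
    static readonly object SyncRoot = new object();

    public static string Run()
    {
        StringBuilder result = new StringBuilder();
        Thread t1 = new Thread(() => Go(result, "One"));
        Thread t2 = new Thread(() => Go(result, "Two"));
        t1.Start();
        t2.Start();
        Go(result, "Main");
        t1.Join();
        t2.Join();
        return result.ToString();
    }

    static void Go(StringBuilder sb, string name)
    {
        for (int i = 0; i < 100; ++i)
        {
            // Only one thread at a time can be inside this block, so each
            // AppendFormat/AppendLine pair completes without interruption.
            lock (SyncRoot)
            {
                sb.AppendFormat("{0}: {1}", name, i);
                sb.AppendLine();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}

The lines can still come out in any order—the scheduler decides that—but every line is now intact, because no thread can interleave its appends with another’s.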
You need to be absolutely clear in your head on where the information you’re working with lives, where it came from, and whether other threads might be able to see it. Objects created by a particular thread are OK as long as you never make them visible to other threads—Example 16-4 hit problems because the main thread created a StringBuilder and then arranged for it to be accessible from the other two threads.
This means you need to be especially careful when using nested methods—either anonymous delegates or lambdas—because these provide a way to share local variables between threads. Example 16-5 shows how the problem in Example 16-3 can happen even with value type local variables. It has just one loop count variable (i) shared between all the threads, but it does this without making it a field—it’s a local variable.
Example 16-5. Local value type variables as shared state
using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        // Visible to all threads, thanks to use of
        // anonymous method. (Bad, in this example.)
        int i = 0;
        ParameterizedThreadStart go = delegate(object name)
        {
            // Modifying shared state without suitable protection - bad!
            for (; i < 100; ++i)
            {
                Console.WriteLine("{0}: {1}", name, i);
            }
        };
        Thread t1 = new Thread(go);
        Thread t2 = new Thread(go);
        t1.Start("One");
        t2.Start("Two");
        go("Main");
    }
}
This example demonstrates that while it may be convenient to think of value type local variables as living on the stack, it’s not always true. Example 16-5 contains an anonymous method that makes use of the local i variable declared by the containing method (Main), so the C# compiler has been obliged to convert that variable into a field inside a generated class, in order to make it possible for that one variable to be used from multiple methods.
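If what you actually wanted was a private counter per thread, the simplest fix—a sketch based on Example 16-5, not a numbered example from the book—is to declare the variable inside the anonymous method. Then each invocation of the delegate gets a fresh local on its own thread’s stack, nothing is captured, and the program behaves like Example 16-2 again, with every thread counting to 100 independently:

using System;
using System.Threading;

class PerThreadCounter
{
    public static void Main(string[] args)
    {
        ParameterizedThreadStart go = delegate(object name)
        {
            // i is declared inside the delegate, so the compiler does not
            // hoist it into a shared field - each thread gets its own copy.
            for (int i = 0; i < 100; ++i)
            {
                Console.WriteLine("{0}: {1}", name, i);
            }
        };
        Thread t1 = new Thread(go);
        Thread t2 = new Thread(go);
        t1.Start("One");
        t2.Start("Two");
        go("Main");
        t1.Join();
        t2.Join();
    }
}

The rule of thumb: a local is only captured (and therefore potentially shared) if it is declared outside the anonymous method but used inside it.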
To summarize: information that really does live on the stack is private to a particular thread. Unfortunately, using local variables doesn’t necessarily guarantee that the state you’re working with is on the stack. Be wary of reference types—no matter where the reference lives, the thing it refers to will not be on the stack, so you need to understand what other code might have a reference to the object you’re using. Be wary of value types whose implementation you do not control—value types are allowed to contain fields of reference types, so you’re not guaranteed to be safe just because you’re using a value type. And be wary of lambdas and anonymous methods—they can move information off the stack and into a place where it’s accessible to multiple threads at once. We’ll see later what to do if you really have to share information across threads.
The examples we’ve seen so far create threads explicitly in order to illustrate the operation of multiple threads. But .NET often creates threads automatically without you having created Thread objects. The most obvious example is the thread that the .NET Framework calls your Main method on, but there are others—some of the asynchronous communication mechanisms we saw in Chapter 13 call back into your code on different threads than the one you started work on. We’ll be seeing more examples of this later in the chapter when we examine .NET’s Asynchronous Programming Model.
In fact, it’s relatively unusual to create new threads explicitly. If you need concurrent execution and you’re not using some part of the .NET Framework that supplies you with threads when you need them, it’s often better to use the thread pool or the Task Parallel Library, both of which we’ll see later.
One problem with explicit thread creation is in knowing how many to create. Threads are relatively expensive—each one consumes system resources, and there are also factors that can limit the number of threads in a single process. There’s also a cost in switching between threads—the context switch that occurs when the OS scheduler moves a thread from one logical processor to another. If you have many more runnable threads than logical processors, you’ll pay this cost on a very regular basis, and it can start to have a significant effect on throughput. In an ideal world you would have no more threads than logical processors, avoiding any context switch overhead. However, most threads block from time to time, so in reality you tend to need more threads than logical processors if you want to fully use your CPU cycles. In general, you should try to keep the thread count as low as is practical—a single program that creates more than a handful per logical processor is likely to have problems.
Never build a service that creates a new thread for each incoming request. This is a classic rookie mistake, as it seems like an obvious thing to do. It appears to work for light workloads, but it runs straight into two of the biggest performance problems you can hit with threads. First, creating threads is expensive, so if each thread exists only for as long as it takes to handle a single request, you risk spending more CPU time on setting up and destroying threads than on useful work. Second, this approach doesn’t limit the number of threads, so as the system gets busy, its performance will get disproportionately worse thanks to context switching overhead and the memory footprint of the resources associated with each thread. You can avoid these problems by using either the asynchronous patterns or the thread pool techniques described later in this chapter.
Creating just enough threads is often hard, because getting the balance right depends on things such as your application’s current workload, other work in progress on the machine, and characteristics of the machine itself. Fortunately, .NET provides the thread pool to make this sort of thing easier.
The .NET Framework provides a thread pool, which is a collection of worker threads available to perform short pieces of work. The thread pool continuously adjusts the number of threads that are allowed to process work items simultaneously in an attempt to optimize throughput.
The exact algorithm used to adjust the thread count is not documented, but as a general rule, if the system is not busy, work will typically be serviced very quickly after you queue it up. But as the computer becomes busier, items will sit in the queue for longer—the thread pool tries to avoid the overheads of preemption, thread switching, and resource contention by not running too much concurrent work. When a system is already busy, trying to process more work items would probably slow it down further, and so keeping items queued up and processing fewer at a time is likely to result in better overall performance.
The simplest way to use the thread pool is through its QueueUserWorkItem method. Example 16-6 shows a modification to the previous examples—rather than creating threads, it uses the thread pool. QueueUserWorkItem takes a delegate to any method that accepts a single object as its argument, so it’s happy with the same Go method as Example 16-2. (Unlike the Thread constructor, there’s no overload that accepts a method without an argument—the thread pool insists on there being an argument whether you have a use for one or not.)
Example 16-6. Queuing work items for the thread pool
static void Main(string[] args)
{
    ThreadPool.QueueUserWorkItem(Go, "One");
    ThreadPool.QueueUserWorkItem(Go, "Two");
    Go("Main");
    // Problem: not waiting for work items to complete!
}
This example has a problem: if the main thread finishes first, the program may exit before the thread pool work items complete. So this only illustrates how to start work on the thread pool. This might not be a problem in practice—it depends on your application’s typical life cycle, but you may need to add additional code to coordinate completion. Running on a quad-core machine, this particular example behaves in much the same way as the previous ones, because the thread pool ends up creating a thread for both work items. On a single-core machine, you might see a difference—it could decide to let the first item run to completion and then run the second afterward.
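If you do need to wait for queued work to finish, one sketch of a fix (assuming .NET 4, whose System.Threading namespace includes CountdownEvent) is to have each work item signal a shared event as it completes, and have the main thread wait on it:

using System;
using System.Threading;

class PoolExample
{
    public static void Main(string[] args)
    {
        // One count per queued work item.
        using (CountdownEvent done = new CountdownEvent(2))
        {
            ThreadPool.QueueUserWorkItem(name => { Go(name); done.Signal(); }, "One");
            ThreadPool.QueueUserWorkItem(name => { Go(name); done.Signal(); }, "Two");
            Go("Main");
            done.Wait(); // Blocks until both work items have signaled.
        }
    }

    static void Go(object name)
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("{0}: {1}", name, i);
        }
    }
}

On frameworks before .NET 4 you could achieve the same effect with a pair of ManualResetEvent objects, one per work item, waiting on both before exiting.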
The thread pool is designed for fairly short pieces of work. One of the most important jobs it was originally introduced for was to handle web requests in ASP.NET web applications, so if you’re wondering how much work constitutes “fairly short,” a reasonable answer is “about as much work as it takes to generate a web page.”
.NET 4 introduces a new way to use the thread pool, the Task
Parallel Library, which offers a couple of advantages. First, it handles
certain common scenarios more efficiently than QueueUserWorkItem
. Second, it offers more
functionality. For example, tasks have much more comprehensive support
for handling errors and completion, issues Example 16-6 utterly fails to
address. If the main thread finishes before either of the work items is
complete, that example will simply exit without waiting for them! And if
you want the main thread to discover exceptions that occurred on the
thread pool threads, there’s no easy way to do that. If any of these
things is important to you, the Task Parallel Library is a better way to
use the thread pool. There’s a whole section on that later in this
chapter. For now, we’ll continue looking at some aspects of threading
that you need to know, no matter what multithreading mechanisms you may
be using.
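As a brief preview of that section, here is a hedged sketch showing how tasks address the completion and error problems that Example 16-6 ignores. The thrown exception is contrived for illustration.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        Task one = Task.Factory.StartNew(() => Console.WriteLine("One"));
        Task two = Task.Factory.StartNew(() =>
        {
            // Contrived failure, to show how errors surface.
            throw new InvalidOperationException("Something went wrong");
        });

        try
        {
            // WaitAll blocks until both tasks finish, and rethrows any
            // exceptions from the thread pool threads on this thread.
            Task.WaitAll(one, two);
        }
        catch (AggregateException ex)
        {
            foreach (Exception inner in ex.InnerExceptions)
            {
                Console.WriteLine("Task failed: {0}", inner.Message);
            }
        }
    }
}
```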
Not all threads are equal. Some work can be done on only certain threads. For example, WPF and Windows Forms both impose a similar requirement: an object representing something in the user interface must be used only on the thread that created that object in the first place. These objects have thread affinity, meaning that they belong to one particular thread.
Not all things with thread affinity are quite as obstinate as user interface elements. For example, while some COM objects have thread affinity issues, they are usually more flexible. (COM, the Component Object Model, is the basis of various Windows technologies including ActiveX controls. It predates .NET, and we’ll see how to use it from .NET in Chapter 19.) .NET handles some of the COM thread affinity work for you, making it possible to use a COM object from any thread. The main ways in which COM’s thread affinity will affect you are that certain objects will have different performance characteristics depending on which thread you call them on, and there may be additional complications if your COM objects use callbacks.
So thread affinity just means that the thread you’re calling on makes a difference. It doesn’t always mean that using the wrong thread is guaranteed to fail—it depends on what you’re using.
If you never write multithreaded code, you never have to worry about thread affinity—if you do everything on one thread, it will always be the right one. But as soon as multiple threads get involved—either explicitly or implicitly[41]—you may need to add code to get things back on the right thread.
ASP.NET has a similar problem. It makes contextual information about the current request available to the thread handling the request, so if you use multiple threads to handle a single request, those other threads will not have access to that contextual information. Strictly speaking, this isn’t a thread affinity issue—ASP.NET can use different threads at different stages of handling a single request—but it presents the same challenge to the developer: if you start trying to use ASP.NET objects from some random thread, you will have problems.
The .NET Framework defines a solution that’s common to WPF,
Windows Forms, and ASP.NET. The SynchronizationContext
class can help you out
if you find yourself on the wrong thread when using any of these
frameworks. Example 16-7
shows how you can use this in an event handler for a GUI application—the
click handler for a button, perhaps.
Example 16-7. Handling thread affinity with SynchronizationContext
SynchronizationContext originalContext = SynchronizationContext.Current;
ThreadPool.QueueUserWorkItem(delegate
{
    string text = File.ReadAllText(@"c:\temp\log.txt");
    originalContext.Post(delegate
    {
        myTextBox.Text = text;
    }, null);
});
The code reads all the text in from a file, and that's something that might take a while. Event handlers in WPF and Windows Forms are called on the thread that the event source belongs to—a UI thread. (Or the UI thread if, like most desktop applications, you have only one UI thread.) You should never do slow work on a UI thread—thread affinity means that if your code is busy using that thread, none of the UI elements belonging to that thread will be able to do anything until you're finished. The user interface will be unresponsive for as long as you keep the thread busy. So Example 16-7 uses the thread pool to do the work, keeping the UI thread free.
But the code wants to update the UI when it has finished—it’s
going to put the text it has retrieved from the file into a text box. Since a text box is a UI element, it has thread
affinity—we can update it only if we’re on the UI thread. This is where
SynchronizationContext
comes to the
rescue.
Before starting the slow work, Example 16-7 reads the SynchronizationContext
class’s Current
property. This static property returns
an object that represents the context you’re in when you read
it—precisely what that means will depend on what UI framework you’re
using. (The object you get back works differently depending on whether
your application uses WPF, Windows Forms, or ASP.NET.) But the exact
implementation doesn’t matter—you just need to hold on to it until you
need to get back to that context. Having grabbed the context while we
were in the click handler, we then kick off the work in the thread pool.
And once that work is complete, it calls the stored SynchronizationContext
object’s Post
method. Post
takes a delegate, and it’ll invoke that
delegate back in whatever context you were in when you grabbed the
context. So in this case, it’ll invoke our delegate on the UI thread
that the button belongs to. Since we’re back on our application’s UI
thread, we’re now able to update the text box.
There are some persistent myths surrounding threads, which sometimes lead to their overuse. As the current trend seems to be for the number of logical processors in typical machines to edge ever upward, developers sometimes feel practically obliged to write multithreaded code. Since using threads correctly is difficult and error-prone, it’s worth tackling some of these myths, in case you’re in a situation where a single-threaded solution might actually be better.
You need a thread to run code, but that’s not the only kind of work computers do. In fact, it’s a fairly unusual program that spends most of the time executing code; CPU usage of 100 percent is often a sign that a program has hung. Computers contain various kinds of specialized hardware capable of getting on with work while the CPU is either idle or off doing something else—messages can be sent and received over the network, data can be read from and written to disk, graphics can be rendered, sound can be played. Code needs to run to coordinate these activities, but that typically needs to happen at the start, when kicking off some work, and then again at the end once the work completes. In between, all the interesting work is being done by specialized hardware. The CPU has no role to play, and may well enter a low-power idle state where it’s not doing any work at all.
That’s why the fans on some computers spin up into high speed when the machine gets busy—most of the time the CPU is asleep and consuming relatively little power. The cooling system’s full capacity is needed only when the CPU is executing code for sustained periods.
Nontrivial code in real programs tends to involve multifaceted work, so the CPU might have work to do at various stages besides the start and finish, but even then, you tend to see long periods of waiting for things to happen punctuated by short bursts of CPU activity, particularly when multiple machines are involved (such as a web server and a database server). Even fast computer networks can take hundreds of microseconds to send a message, and while that may appear instantaneous to human eyes, modern CPUs are able to execute hundreds of thousands of instructions in that time. Measured against how fast CPUs run, network communications always appear glacially slow. And it’s a similar story with most I/O.
The nature of I/O is often not obvious from the way APIs are
structured. For example, look at the call to File.ReadAllText
in
Example 16-7—the
obvious way to think of that is as a method that reads all the
contents of a file off disk and returns the contents as text once it’s
finished. It seems like the thread we use to call that method is busy
doing work for as long as it takes. But in fact, most of the time the
thread is inside that method, it will almost certainly be blocked—it
won’t be in a runnable state, because it’s waiting for the disk
controller in the computer to fetch the file’s content off disk. And
unless the disk in question is a solid state drive, it could take
milliseconds to get the information into memory—that part of the
process will take orders of magnitude longer than the code inside
ReadAllText
that converts those
bytes back into a .NET string object.
Solid state drives change things only a little. You don't need to wait for bits of metal to lumber into position, but the fact that they are physically separate components slows things down—it takes time for the disk controller hardware to send suitable messages to the drive, and for the drive to send a response. The difference between the time spent retrieving data and the time spent processing it will not be as dramatic, but the code will still account for only a small proportion of the total.
The one situation in which this particular example might be dominated by CPU usage rather than time spent waiting for I/O is if the file in question is already in the operating system’s filesystem cache—when the OS reads data off disk it tends to keep a copy in memory for a while just in case you need it again. In that case, the disk doesn’t need to be involved at all, and it really is all CPU time. You need to be wary of this when testing performance—the first time a particular file is read will be much slower than all the rest. If you run a test hundreds of times to get a good average measurement, you’ll be ignoring the “first” and testing the “all the rest” case, but in a desktop application the one most users will notice is often the first case. For interactive code, the worst case can matter far more than the average.
An upshot of this is that using asynchronous APIs is sometimes much more effective than creating lots of threads. On the server, this can yield better throughput because you avoid the overheads associated with having more threads than you need. And on the client, it can sometimes simplify code considerably, as some asynchronous APIs let you work with a single-threaded model with no loss of responsiveness.
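For example, the synchronous File.ReadAllText call in Example 16-7 ties up a thread pool thread for the duration of the read. Here is a sketch of the asynchronous alternative using FileStream's BeginRead and EndRead. The file path is purely illustrative, and for simplicity this assumes a single read returns the whole file, which is not guaranteed in general.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

class Program
{
    static void Main()
    {
        // FileOptions.Asynchronous requests true asynchronous I/O.
        FileStream file = new FileStream(@"c:\temp\log.txt", FileMode.Open,
            FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous);
        byte[] buffer = new byte[(int) file.Length];
        ManualResetEvent done = new ManualResetEvent(false);

        file.BeginRead(buffer, 0, buffer.Length, asyncResult =>
        {
            // This callback runs once the disk controller has delivered the
            // data; no thread was blocked while the hardware did its work.
            int bytesRead = file.EndRead(asyncResult);
            Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, bytesRead));
            file.Close();
            done.Set();
        }, null);

        // The calling thread is free to do other work here.
        done.WaitOne();   // only this console demo needs to wait at the end
    }
}
```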
For years, each new generation of processors roughly doubled CPU performance—a new processor would do everything that the one you bought a couple of years ago could do, but twice as fast. We were in the luxurious position where our code just got faster and faster without our having to do anything, and because the growth was exponential—doubling up again and again—the cumulative effects were astounding. Computers are tens of millions of times faster than they were a few decades ago.
Sadly, that all stopped a few years ago because we started running into some harsh realities of physics. In response to this, CPU vendors have switched to providing more and more logical processors as a means of continuing to deliver processors that are, in some sense, twice as fast as the ones from a couple of years ago. They can do this because even though clock speeds have stopped doubling up, Moore’s law—that the number of transistors per chip doubles roughly every two years—is still in action for the time being.
Unfortunately, the doubling in speed between a single-core and dual-core system is hypothetical. Technically, the dual-core system might be able to perform twice as many calculations as the single-core one in any given length of time, but this is an improvement only if it’s possible to divide the work the user needs to do in a way that keeps both cores busy, and even if that’s possible, it’s effective only if other resources in the system such as memory and disk are able to provide input to the calculations fast enough for this double-speed processing.
Work often cannot progress in parallel. The second step of a calculation might depend on the results of the first step, in which case you can’t usefully run the two steps on different cores—the core running the second step would just have to sit around and wait for the result from the first step. It would probably be faster to run both steps on a single core, because that avoids the overheads of getting the results of the first step out of the first core and into the second core. Where the calculations are sequential, multiple cores don’t help.
So the nature of the work matters. Certain jobs are relatively easy to parallelize. For example, some kinds of image processing can be spread over multiple logical processors—if the processing is localized (e.g., applying a blur effect by smearing nearby pixels into one another), it’s possible for different logical processors to be working on different parts of the image. Even here you won’t get a 4x speedup on a four-core system, because some coordination might be necessary at the boundaries, and other parts of the system such as memory bandwidth may become a bottleneck, but these sorts of tasks typically see a useful improvement. However, these so-called embarrassingly parallel tasks are the exception rather than the rule—a lot of computation is sequential in practice. And of course, many problems live in a middle ground, where they can exploit parallelism up to a certain point, and no further. So there’s usually a limit to how far multithreading can help programs execute faster.
That doesn’t stop some people from trying to use as many multiple logical processors as possible, or from realizing when doing so has failed to make things better. It’s easy to be distracted by achieving high CPU usage, when the thing you really want to measure is how quickly you can get useful work done.
It’s possible to construct parallel solutions to problems that manage to use all the available CPU time on all logical processors, and yet which proceed more slowly than single-threaded code that does the same job on one logical processor. Figure 16-1 shows the CPU load reported by the Windows Task Manager for two different solutions to the same task.
The image on the left might make you feel that you're making better use of your multicore system than the one on the right—the righthand side is using far less than half the available CPU capacity. But measured by elapsed time, the code that produced the lefthand image took about 15 times longer to complete the task than the code that produced the righthand one!
The job in hand was trivial—both examples just increment a field
400 million times. Example 16-8 shows both main
loops. The Go
function is the one
that gets invoked concurrently on four threads. GoSingle
just runs multiple times in
succession to perform the iterations sequentially.
Example 16-8. Multithreaded versus single-threaded
class Program
{
    static int Count;
    const int Iterations = 100000000;

    static void Go()
    {
        for (int i = 0; i < Iterations; ++i)
        {
            Interlocked.Increment(ref Count);
        }
    }

    static void GoSingle(int repeat)
    {
        for (int outer = 0; outer < repeat; ++outer)
        {
            for (int i = 0; i < Iterations; ++i)
            {
                Count += 1;
            }
        }
    }
    ...
Here’s the code that launches Go
concurrently:
Count = 0;
List<Thread> threads = (from i in Enumerable.Range(0, 4)
                        select new Thread(Go)).ToList();
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());
This creates four threads, all of which call Go
. Next, it calls Start
on each of the threads. Having started
them all, it calls Join
on each
thread to wait for them all to complete. (We could have written three
loops here, but our use of LINQ and lambdas makes the code much more
compact. In particular, if you have a loop that invokes just one
operation on every item in a list, the List<T>
class’s ForEach
method is a
less cluttered way of expressing this than a foreach
loop.) The code to launch the
single-threaded version is a lot simpler:
Count = 0;
GoSingle(4);
Both produce the same result: the Count
field contains 400,000,000 at the end.
But the multithreaded version was much slower. One reason is the
difference in how the two versions increment the Count
. Here’s the line in question from the
single-threaded code:
Count += 1;
But if we try that in the multithreaded version, it doesn’t
work. It certainly makes it run nice and quickly—about three times
faster than the single-threaded version—but trying it a few times,
Count
comes to 110,460,151, then
133,533,503, then 133,888,803... The majority of increments are
getting lost—that’s the sort of thing that happens when you don’t use
suitable protection for accessing shared state. That’s why the
multithreaded version needs to do this:
Interlocked.Increment(ref Count);
The Interlocked
class in the
System.Threading
namespace provides
methods that perform certain simple operations in ways that work even
if multiple threads try to use them on the same data at the same time.
As the name suggests, Increment
increments the count, and it does this in a way that locks out any
other logical processors attempting to do the same thing at the same
time. It forces the logical processors to take it in turns.
It works—the Count
total is
correct with this code in place—but at some cost. On a quad-core
system, with all four cores burning away at 100 percent, this takes 15
times longer than the simple single-threaded solution.
In fact, the cost of Interlocked.Increment
does not fully explain
the difference. Modifying the single-threaded version to work the same
way makes it run about five times slower, but that’s still three times
faster than the multithreaded code. So a considerable amount of the
slowdown is down to the communication costs between the
processors.
Don’t take these numbers too seriously. This is clearly a
contrived example. (If we wanted it to run really fast we could just
have initialized Count
to
400,000,000 to start with, and removed the loop.) But while the
details are spurious, the basic principle applies broadly: the cost of
contention between logical processors that are supposed to be
cooperating can work against you. Sometimes they merely erode the
benefit—you might see a 2.5x speedup on a quad-core system, for
example. But sometimes they really do negate the benefit—contrived
though this example may be, much worse examples have cropped up in
real systems.
Some implementations may come out worse on some systems and better on others. For example, some parallel algorithms take a considerable hit relative to their sequential counterparts in order to be able to scale well on more processors. Such an algorithm might make sense only on systems where you have a large number of processors—it might be slower than the single-threaded version on a dual-core system, but very worthwhile on a system with 16 logical processors, for example.
The bottom line is that if you want to understand whether a parallel solution is effective, you need to compare it against a single-threaded solution on the same hardware. Just because your CPU loads indicate that you’re being extremely parallel, that’s not necessarily the same as being really fast. And unless your code will only ever run on a single hardware configuration, you need to perform the same comparison on lots of different machines to get a good idea of how often a parallel solution might be the best choice.
Even when multithreaded code provides a demonstrable performance benefit, it’s very hard to get right. We’ve already seen a few bizarre behaviors in some extremely simple examples. Achieving correct behavior in a real concurrent system can be very challenging. So we’ll look at two of the most common classes of pitfalls before examining some strategies for avoiding them.
The anomalies we’ve seen so far have all been examples
of a kind of concurrency hazard known as a race,
so called because the outcome is determined by which participant gets
to a particular place first. Example 16-1 displays different output
each time it runs, because there are three threads all trying to print
to the same console window, and the only thing that determines which
one gets to print the next line is whichever happens to be the next to
call Console.WriteLine
.[42] There is no coordination between the threads, and so
they all pile in at once. It’s a relatively complicated example in
some ways, because most of the race happens where we can’t see it—that
example spends almost all of its time inside Console.WriteLine
. It’s easier to understand
races when you can see all of the code.
So consider the broken variation of Example 16-8 where the
concurrently executing Go
method used a
simple Count += 1
instead of
Interlocked.Increment
. We saw that
using the +=
operator resulted in
lost increments, but why? The +=
operator has to do
three things: first it must read the current value of Count
, then it has to add 1 to that value,
and finally it has to store the result back into the variable. The RAM
chips in your computer don’t have the ability to perform calculations,
so there’s no getting away from the fact that the value has to go into
the CPU so that it can calculate the new value, and then it has to be
written back out again so that the new value is not forgotten. There
will always be a read and then a write.
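In C# terms, the single += statement behaves as though it had been written as three separate steps:

```csharp
int temp = Count;    // 1: read the current value
temp = temp + 1;     // 2: add 1
Count = temp;        // 3: write the result back
// Another thread can read or write Count between any of these steps.
```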
Consider what happens when two threads try to increment the same
Count
field. Let’s call them Thread
A and Thread B. Table 16-1 shows one
possible sequence of events. In this case it works out fine: Count
starts at 0, is incremented twice, and
ends up at 2.
Table 16-1. Two increments, one after the other
Count | Thread A | Thread B
---|---|---
0 | Read |
0 | Add 1 (0 + 1 = 1) |
1 | Write |
1 | | Read
1 | | Add 1 (1 + 1 = 2)
2 | | Write
But it might not work so well. Table 16-2 shows what happens if the
work overlaps, as could easily happen with multiple logical
processors. Thread B reads the current Count
while Thread A is already part of the
way through the job of incrementing it. When Thread B comes to write
back its update, it has no way of knowing that Thread A has updated
the value since B did its read, so it effectively loses A’s
increment.
Table 16-2. Lost increment due to overlap
Count | Thread A | Thread B
---|---|---
0 | Read |
0 | Add 1 (0 + 1 = 1) | Read
1 | Write | Add 1 (0 + 1 = 1)
1 | | Write
There are lots of variations on the order, some of which work fine and some of which fail. If your code makes possible an ordering that produces wrong results, sooner or later you’ll run into it.
Don’t fall into the trap of believing that a highly improbable outcome is effectively impossible. You’ll be fooling yourself—sooner or later the problem will bite. The only difference with the highly improbable problems is that they’re extremely hard to diagnose and debug.
The example shown here is about as simple as a race gets. With real code things tend to be a lot more complex, as you will probably be dealing with data structures more intricate than a single integer. But in general, if you have information which is visible to multiple threads and at least one of those threads is changing that information in any way, race conditions are likely to emerge if you don’t take steps to prevent them.
The solution to races is, on the face of it, obvious: the
threads need to take it in turns. If threads A and B simply
coordinated their operations so that either would wait until the other
was done when an update is in progress, we could avoid the problem.
Interlocked.Increment
does exactly
that, although it’s rather specialized. For the occasions when you’re
doing something more complex than incrementing a field, .NET provides
a set of synchronization mechanisms that let you force threads to take
it in turns. We’ll get to these shortly. However, this solution
introduces another class of problem.
When code waits in order to avoid stepping on other threads’ toes, it’s possible for the application to lock up, because all the threads can end up waiting for each other to finish. This tends not to happen with simple, short operations involving just a single piece of data. Lockups typically occur when a thread that already has exclusive access to some data starts waiting for something else.
The standard example involves transferring money between two bank accounts; let’s call them X and Y. Suppose two threads, A and B, are both attempting to transfer money between these two accounts; A transfers money from X to Y while B transfers from Y to X. Both threads will need to use some sort of synchronization mechanism to get exclusive access to the accounts in order to avoid race conditions of the kind previously discussed. But imagine that the following happens:
Initially, no threads are attempting to do anything to either account.
Thread A gets exclusive access to Account X.
Thread B gets exclusive access to Account Y.
Thread A attempts to get exclusive access to Account Y—it can’t because B has access, so A waits for B to relinquish Account Y.
Thread B attempts to get exclusive access to Account X—it can’t because A has access, so B waits for A to relinquish Account X.
The exact mechanism used to manage exclusive access is irrelevant because the outcome is the same: A has come to a halt waiting for B to let go of Y. But B isn’t going to let go of Y until it has managed to acquire X, and unfortunately it won’t be able to—A is in possession of X and has just come to a halt. Neither side can proceed because each is waiting for the other to let go. This is sometimes known as a deadly embrace.
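Using C#'s lock keyword (which we'll meet properly when we look at monitors), the deadly embrace can be sketched like this. The account objects and Sleep calls are contrivances to make the hazard easy to reproduce:

```csharp
using System.Threading;

class Bank
{
    static readonly object accountX = new object();
    static readonly object accountY = new object();

    static void TransferXToY()          // runs on Thread A
    {
        lock (accountX)
        {
            Thread.Sleep(10);           // widen the window so the hazard is easy to hit
            lock (accountY)
            {
                // move money from X to Y
            }
        }
    }

    static void TransferYToX()          // runs on Thread B
    {
        lock (accountY)
        {
            Thread.Sleep(10);
            lock (accountX)
            {
                // move money from Y to X
            }
        }
    }
}
```

The standard cure is to make every thread acquire locks in the same global order; if both methods locked accountX first, neither thread could end up holding one account while waiting for the other.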
This condition can cause both deadlocks and livelocks, and the distinction has to do with the mechanism used to manage exclusive access. If threads go into a blocked state while waiting for access, neither thread is runnable once we hit the deadly embrace, and that's typically described as a deadlock—the symptom is a system that has gone idle, despite having work to be getting on with. A livelock is similar, but tends to involve synchronization mechanisms that use CPU cycles while waiting—some synchronization primitives actively poll for availability rather than blocking. Active polling is just as subject to a deadly embrace as a blocking approach; it just has different symptoms—livelocks hang with high CPU usage.
The two concurrency hazards just described—races and deadly embraces—are not the only kinds of multithreading problems. There are endless ways in which you can get into trouble in concurrent systems, so we can really only scratch the surface. For example, besides issues that can compromise the correct behavior of your code, a whole host of concurrency issues can cause performance problems. For a deep discussion of the issues and what to do about them, we recommend Concurrent Programming on Windows by Joe Duffy (Addison-Wesley).
What can we do to avoid the numerous pitfalls of multithreaded code?
There are several approaches for mitigating the difficulties of multithreading, each with a different trade-off between difficulty and flexibility.
Obviously, the simplest way to avoid the risks inherent in multithreading is not to do it at all. This doesn’t necessarily mean abandoning everything in this chapter, however. One of the asynchronous patterns can enable certain kinds of applications to get some of the benefits of asynchrony while sticking with a single-threaded programming model.
If you’re going to have multiple threads, a good way to keep things simple is to avoid sharing information between them. ASP.NET encourages this model—it uses the thread pool to handle multiple requests simultaneously, but by default each individual request runs your code on just one thread. (You can opt into an explicitly asynchronous model if you want to use multiple threads per request, but for straightforward scenarios the single-threaded style is best.) So although the web application as a whole is able to run multiple concurrent threads, those threads don’t interact.
This approach requires some discipline. There’s nothing in .NET that enforces isolation[43]—you simply have to choose not to share data between threads. In a web application, that’s relatively easy because HTTP naturally discourages stateful communications, although if you start using caching techniques to improve performance, you lose some isolation because all your requests will end up using shared objects in your cache. And any information in a static field (or any object reachable directly or indirectly from a static field) is potentially shared.
Chances are good that most multithreaded applications will have at least some information that needs to be accessed by several threads, so complete isolation may not be realistic. But maximizing isolation is a good idea—keeping as much information as possible local to individual threads means not having to worry about concurrency hazards for any of that information.
When you really have to share data, you can often avoid many
concurrency hazards by sharing only immutable
data, that is, data that cannot be altered. Fields marked with
readonly
cannot be modified after
construction—the C# compiler enforces this—so you don’t have to worry
about whether those fields are being changed by other threads as you
try to use them. You need to be careful, though—readonly
applies only to the field itself,
and not the object the field refers to if it’s a reference type. (And
even if the field is a value type, if that value itself contains
fields of reference types, the objects being referred to are not
affected by readonly
.) So as with
isolation, this is an option that requires some discipline.
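A short sketch of that caveat: the readonly keyword protects the reference, not the object it refers to.

```csharp
using System.Collections.Generic;

class Settings
{
    // The compiler prevents reassigning this field after construction...
    private readonly List<string> names = new List<string>();

    // ...but the List<string> it refers to is still mutable, so two
    // threads calling Add at once can still corrupt shared state.
    public void Add(string name)
    {
        names.Add(name);
    }
}
```

To get genuine immutability, the collection itself must be unchangeable, for example by never modifying it after construction, or by handing out only copies.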
If you’re writing multithreaded code, sooner or later you will probably need to have at least some information that is accessible to multiple threads, and which occasionally needs to be changed—isolation and immutability are sometimes simply not options. In this case, you’ll need to synchronize access to the data—for example, anytime shared information is being modified, you’ll need to make sure no other threads are trying to read or write that information. This requires the most discipline of any of the solutions described here, and is likely to be the most complex, but it offers the most flexibility.
The .NET Framework provides a wide range of features to help you synchronize the way your threads access shared information, and these are the topic of the next section.
There are two important ways in which the operations of multiple threads may need to be coordinated. When you have shared modifiable data, it needs to be possible to make threads take it in turns to access that data. But it’s also often important for threads to be able to discover when something has happened—a thread might want to enter a blocking state until such time as it has useful work to do, for example. So some synchronization primitives provide notification rather than exclusive access. Some offer a combination of the two.
The most widely used synchronization primitive in .NET is the monitor. It is supported directly by the .NET Framework—any object can be used with this facility—and also by C#, which provides a special keyword for working with monitors. Monitors offer both mutual exclusion and notification.
The simplest use of a monitor is to ensure that threads take it in turns to access shared state. Example 16-9 shows some code that would need the kind of protection a monitor can provide before we could use it from multiple threads. It is designed for handling lists of recently used strings—you might use this sort of code to provide a recently used file list on an application’s File menu. This code makes no attempt to protect itself in the face of multithreading.
Example 16-9. Code unsuitable for multithreading
class MostRecentlyUsed
{
    private List<string> items = new List<string>();
    private int maxItems;

    public MostRecentlyUsed(int maximumItemCount)
    {
        maxItems = maximumItemCount;
    }

    public void UseItem(string item)
    {
        // If the item was already in the list, and isn't the first
        // item, remove it from its current position, since we're
        // about to make it the first item.
        int itemIndex = items.IndexOf(item);
        if (itemIndex > 0)
        {
            items.RemoveAt(itemIndex);
        }

        // If the item's already the first, we don't need to do anything.
        if (itemIndex != 0)
        {
            items.Insert(0, item);

            // Ensure we have no more than the maximum specified
            // number of items.
            if (items.Count > maxItems)
            {
                items.RemoveAt(items.Count - 1);
            }
        }
    }

    public IEnumerable<string> GetItems()
    {
        return items.ToArray();
    }
}
Example 16-10 is some test code to exercise the class.
Example 16-10. Testing the MostRecentlyUsed class
const int Iterations = 10000;

static void TestMru(MostRecentlyUsed mru)
{
    // Initializing random number generator with thread ID ensures
    // each thread provides different data. (Although it also makes
    // each test run different, which may not be ideal.)
    Random r = new Random(Thread.CurrentThread.ManagedThreadId);

    string[] items = { "One", "Two", "Three", "Four", "Five",
                       "Six", "Seven", "Eight" };
    for (int i = 0; i < Iterations; ++i)
    {
        mru.UseItem(items[r.Next(items.Length)]);
    }
}
Example 16-10 just feeds
in strings from a fixed set of items at random. Calling this test
function from just one thread produces the expected results: at the end,
the MostRecentlyUsed
object just
returns the most recent items put into it by this test. However, the
multithreaded test in Example 16-11
causes something quite different to happen.
Example 16-11. Executing a multithreaded test
MostRecentlyUsed mru = new MostRecentlyUsed(4);

const int TestThreadCount = 2;
List<Thread> threads =
    (from i in Enumerable.Range(0, TestThreadCount)
     select new Thread(() => TestMru(mru))).ToList();

threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

foreach (string item in mru.GetItems())
{
    Console.WriteLine(item);
}
This example crashes on a multicore machine—after a while, it throws an ArgumentOutOfRangeException. It doesn't crash at the same place on every run; it crashes inside either of the two calls to the List<T> class's RemoveAt method.
The exceptions occur due to races. For instance, consider this line of code from Example 16-9:
items.RemoveAt(items.Count - 1);
This reads the value of the Count
property, then subtracts 1 to get the
index of the last item in the list, and then removes that last item. The
race here is that some other thread may manage to remove an item from
the list in between this thread reading the Count
property and calling RemoveAt
. This causes the method to throw an
ArgumentOutOfRangeException
, because
we end up asking it to remove an item at an index that’s after the final
item.
In fact, we’re lucky we got an exception at all. The List<T>
class makes no guarantees when
we use it from multiple threads. Here’s what the documentation for the
class says in the Thread Safety section:
Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
This means that it’s our problem to make sure we never try to use
a List<T>
instance from more
than one thread at a time. It could fail in subtler ways than
crashing—it could corrupt data, for example.
List<T>
is not unusual.
Most types in the .NET Framework class library make no guarantees of
thread safety for instance members.
We could add similar documentation to our MostRecentlyUsed
class, declaring that it does
not make any guarantees either. In fact, that might well be the best
option—it’s very difficult for an individual class to guarantee to work
correctly in all possible multithreading scenarios. Only the application
that uses the class really knows what constitutes correct behavior. For
example, it might be necessary for a MostRecentlyUsed
object to be kept in sync
with some other object, in which case the application is going to have
to manage all synchronization itself, and there’s nothing useful that
our class could do on its own. This is one reason why the lack of thread
safety guarantees is so widespread in the class libraries—there isn’t a
good general-purpose definition of thread-safe for
individual types.
If we decide to make it the application’s problem, how would that look? We don’t have a real application here, only a test, so our test code would need to synchronize its calls into our object. Example 16-12 shows a suitably modified version of the test method from Example 16-10. (Note that Example 16-11 adds code that also uses the same object, so you might think we need to make a similar modification there. However, it waits until all the test threads have finished before touching the object, so its reads won’t overlap with their writes, making locking superfluous. Therefore, Example 16-12 is sufficient in this case.)
Example 16-12. Synchronization in the calling code
static void TestMru(MostRecentlyUsed mru)
{
    Random r = new Random(Thread.CurrentThread.ManagedThreadId);

    string[] items = { "One", "Two", "Three", "Four", "Five",
                       "Six", "Seven", "Eight" };
    for (int i = 0; i < Iterations; ++i)
    {
        lock (mru)
        {
            mru.UseItem(items[r.Next(items.Length)]);
        }
    }
}
The only modification here is to wrap the call to the MostRecentlyUsed
type’s UseItem
method with a
lock
block. The C# lock
syntax generates some code that uses the
Monitor
class, along with some
exception handling. Here’s what the lock
block in Example 16-12 is equivalent
to:
MostRecentlyUsed referenceToLock = mru;
bool lockAcquired = false;
try
{
    Monitor.Enter(referenceToLock, ref lockAcquired);
    mru.UseItem(items[r.Next(items.Length)]);
}
finally
{
    if (lockAcquired)
    {
        Monitor.Exit(referenceToLock);
    }
}
(This is what C# 4.0 generates. Older versions do something
slightly simpler that mishandles an obscure and unusual failure mode.
But the basic idea is the same in either case. The generated code copies
the mru
reference into a separate
variable to ensure correct operation even if the code inside the lock
block were to change mru
.)
The documentation says that Monitor.Enter
acquires an exclusive lock on
the object passed as the first argument, but what exactly does that
mean? Well, the first thread to do this will find that Monitor.Enter
returns immediately. Any other
threads that try to make the same call on the same object will be made
to wait—Monitor.Enter
on those other
threads will not return until the thread that currently owns the lock
releases it by calling Monitor.Exit
.
Only one thread can hold the lock at any time, so if multiple other
threads are waiting for the same object’s lock in Monitor.Enter
, the .NET Framework will pick
just one as the next owner of the lock, and the other threads will
continue to be blocked.
Holding a lock on an object has only one effect: it prevents any other thread from acquiring a lock on that object. It does nothing else. In particular, it does not prevent other threads from using that object. So it would be a mistake to think that acquiring the lock on an object means you have locked the object. That may sound like splitting hairs, but it’s the difference between working and broken code.
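To make that distinction concrete, here's a small sketch of our own (not one of the chapter's numbered examples): the main thread holds the monitor on a list, yet a second thread that never tries to acquire that monitor modifies the list anyway.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class LockDemo
{
    // Returns the list's count as observed while this thread holds the
    // list's monitor and another thread modifies the list regardless.
    public static int RunDemo()
    {
        var sharedList = new List<int>();
        lock (sharedList)
        {
            // The "rogue" thread never calls Monitor.Enter, so the lock
            // we are holding does nothing to impede it.
            Thread rogue = new Thread(() => sharedList.Add(42));
            rogue.Start();
            rogue.Join(); // completes even though we still hold the lock
            return sharedList.Count;
        }
    }
}
```

RunDemo returns 1: the rogue thread's Add succeeded while we were holding the "lock," because a monitor constrains only threads that also try to acquire that same monitor.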
Use of monitors is entirely a matter of convention—it is up to you
to decide which objects’ locks you use to protect which information.
Example 16-12 happens to
acquire a lock on the very object whose state is being protected, but in
fact, it’s quite common—preferable, even—to create separate
objects whose only job is to be the thing on which you acquire a lock.
There are a couple of reasons for this. First, you’ll often want
multiple pieces of data to fall under the protection of a single
lock—perhaps updates to our MostRecentlyUsed
object need to be done in
conjunction with changes to other state within the application, such as
a history-tracking service. When multiple objects are involved,
arbitrarily choosing one of those objects to act as the lock target is
likely to make your code harder to follow, because it may not be clear
to someone reading your code that you’re using that object’s lock to
protect multiple objects rather than just the one whose lock you’re
acquiring. If you create a special object whose only purpose is locking,
this makes it clear to anyone reading your code that she needs to think
about what state that lock protects.
The other reason to avoid acquiring a lock on the object you want
to synchronize access to is that you can’t always be sure that the
object doesn’t acquire a lock on itself—some developers write lock(this)
inside instance methods when trying
to make thread-safe objects (for whatever definition of thread-safe they
have chosen). It is a bad practice to acquire a lock on your this
reference, of course, because you can’t
necessarily know whether someone using your object will decide to try to
acquire a lock on it for his own purposes—internal locking is an
implementation detail, but your this
reference is public, and you don’t usually want implementation details
to be public.
So in short, you shouldn’t try to acquire a lock on any object you are trying to synchronize access to.
Bearing all that in mind, what should we do if we want to try to
make our MostRecentlyUsed
class more robust in
multithreaded environments? First, we need to decide what multithreading
scenarios we want to support. Simply declaring that we want the type to
be thread-safe is meaningless.
So let’s say that we want to allow multiple threads to call
UseItem
and GetItems
simultaneously without causing
exceptions. That’s a pretty weak guarantee—notice we’ve said nothing
about what state the object will be in afterward, merely that it won’t
actually blow up. Surely it would be better to guarantee to handle the
calls in the order in which they were made. Unfortunately, we can’t do
this if the locking logic lives entirely inside the class. The OS
scheduler might decide to preempt a thread moments after it called
UseItem
, and before it has had a
chance to get to any of our synchronization code.
For example, consider what could happen if Thread A calls UseItem
, and then before that call returns,
Thread B also calls UseItem
, and
before either returns, Thread C calls GetItems
. It’s fairly easy to think of at
least five reasonable outcomes. GetItems
might return neither of the items
passed in by A and B. It might return both, and there are two ways it
could do this—GetItems
returns an
ordered list and either A or B might come first. Or it could return just
one—either the one passed by A or just the one passed by B. If you need
coordination across multiple calls like this, it's
not going to be possible to do that inside MostRecentlyUsed
, because you only have the
opportunity to start synchronization work once calls are already
underway. This is another reason why synchronization code usually
belongs at the application level and not in the individual objects. So
about the best we can hope to achieve within this class is to prevent
the exceptions from occurring when it’s used from multiple threads.
Example 16-13 does this.
Example 16-13. Adding locking to a class
class MostRecentlyUsed
{
    private List<string> items = new List<string>();
    private int maxItems;
    private object lockObject = new object();

    public MostRecentlyUsed(int maximumItemCount)
    {
        maxItems = maximumItemCount;
    }

    public void UseItem(string item)
    {
        lock (lockObject)
        {
            // If the item was already in the list, and isn't the first item,
            // remove it from its current position, since we're about to make
            // it the first item.
            int itemIndex = items.IndexOf(item);
            if (itemIndex > 0)
            {
                items.RemoveAt(itemIndex);
            }

            // If the item's already the first, we don't need to do anything.
            if (itemIndex != 0)
            {
                items.Insert(0, item);

                // Ensure we have no more than the maximum specified
                // number of items.
                if (items.Count > maxItems)
                {
                    items.RemoveAt(items.Count - 1);
                }
            }
        }
    }

    public IEnumerable<string> GetItems()
    {
        lock (lockObject)
        {
            return items.ToArray();
        }
    }
}
Notice that we added a new field, lockObject
, which holds a reference to an
object whose only job is to be the thing on which we acquire a lock. And
we simply acquire this lock inside the methods that work with the list
of items. We have to hold the lock for the whole of the UseItem
method, because the code looks at the
state of the items list right at the start, and then the rest of its
operation is guided by what it found. The code simply won’t work if the
items list changes halfway through, and so we hold on to the lock for
the duration.
In this particular case, holding the lock for the whole method is unlikely to cause problems because this method won’t take long to run. But as a general rule, you want to avoid holding locks for any longer than necessary. The longer you hold a lock, the greater the chances of some other thread wanting to acquire the same lock, and being forced to wait. It’s a particularly bad idea to call code that might make a request over a network and wait for a response while you’re holding a lock (e.g., holding a lock while making a request to a database).
Be particularly wary of acquiring multiple locks—holding on to a lock while attempting to acquire another is a good recipe for deadlock. Sometimes it's unavoidable, though, in which case you need a strategy for preventing deadlocks. That's beyond the scope of this book, but lock leveling is one suitable approach—searching the Web for “lock leveling for multithreading” would be a good place to start.
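Although deadlock-avoidance strategies are beyond this book's scope, the shape of the idea is easy to sketch. In this illustrative fragment (the account type and its members are invented for the example), every thread acquires the two locks in a single agreed order, which is the essence of lock leveling: if no thread can hold a higher-level lock while waiting for a lower-level one, no cycle of mutually waiting threads can form.

```csharp
using System.Threading;

class Account
{
    // Each lockable resource gets a fixed "level"; lower levels are
    // always acquired first. Here we use a simple numeric Id.
    public readonly int Id;
    public readonly object SyncRoot = new object();
    public decimal Balance;

    public Account(int id, decimal balance) { Id = id; Balance = balance; }

    public static void Transfer(Account from, Account to, decimal amount)
    {
        // Always lock the account with the lower Id first, regardless of
        // transfer direction—so two concurrent opposite transfers cannot
        // each hold one lock while waiting for the other.
        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;
        lock (first.SyncRoot)
        {
            lock (second.SyncRoot)
            {
                from.Balance -= amount;
                to.Balance += amount;
            }
        }
    }
}
```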
As we mentioned several pages ago, the Monitor
class isn’t just about locking. It
also provides a form of notification.
Suppose we want to write some code that tests our
MostRecentlyUsed
class in
multithreaded scenarios. Even relatively simple tests pose a
challenge: for example, what if we want to verify that after a call to
UseItem
has returned
on one thread, the item it passed in becomes visible as the first item
returned if some different thread calls GetItems
? We’re not testing concurrent
use—we’re just testing sequential operations, where one thing happens
on one thread and then something else happens on another. How would we
write a test that coordinated these steps across threads? We need one
thread to wait until the other has done something. We could just use
the Thread
class’s Join
method again,
waiting for the first thread to exit. But what if we don’t want to let
it exit? We might want to perform a sequence of operations, with each
thread taking it in turns.
Monitors can help with this—as well as protecting shared state,
they also provide a way to discover when that state may have changed.
The Monitor
class provides a
Wait
method that
operates in conjunction with either a method called Pulse
or the related
PulseAll
. A thread that is waiting
for something to change can call Wait
, which will block until some other
thread calls Pulse
or PulseAll
. You must already hold the lock on
the object you pass as an argument to Wait
, Pulse
, or PulseAll
. Calling them without possessing
the lock will result in an exception.
Example 16-14 uses this
mechanism to provide the ability for one thread to wait for a second
thread to do something. The class’s only interesting state is a single
bool
field, canGo
, which is initially false
but will be set to true
when the second thread does whatever
we’re waiting for—that thread will call GoNow
to indicate this. Since this field is
going to be used from multiple threads, we need synchronization, so
WaitForIt
also has a lockObject
field which refers to an object
whose only job is to be the object for which we acquire a lock in
order to protect access to canGo
.
You should never attempt to acquire a lock directly on a
bool
, or on any other value type.
You can acquire a lock only on a reference type, so if you attempt
to pass a bool
to Monitor.Enter
, the C# compiler will do
what it always does when you pass a value to a method that expects
an object: it will create code that generates a box for the bool
, as we saw in Chapter 4. You would be acquiring
a lock on that box, not on the bool
itself. That’s a problem, because you
get a new box every time, and so your locking would do nothing
useful.
The lock
keyword in
C# prevents you from trying to acquire a lock on a value—you’ll get
a compiler error if you try. But if you call the Monitor
class’s
methods directly, C# will not prevent you from making this mistake.
So this is another good reason to get into the habit of creating an
object that is separate from the state it protects, and acquiring
locks on that object.
Example 16-14. Coordinating threads with Monitor
class WaitForIt
{
    private bool canGo;
    private object lockObject = new object();

    public void WaitUntilReady()
    {
        lock (lockObject)
        {
            while (!canGo)
            {
                Monitor.Wait(lockObject);
            }
        }
    }

    public void GoNow()
    {
        lock (lockObject)
        {
            canGo = true;
            // Wake me up, before you go go.
            Monitor.PulseAll(lockObject);
        }
    }
}
Both methods in this example acquire the lock before doing
anything, because both inspect the canGo
field, and we expect these to be
called on different threads. WaitUntilReady
then sits in a loop
until that field is true
. Each time
it goes around the loop, it calls Monitor.Wait
. This has three effects: first,
it relinquishes the lock—that’s important, because otherwise, the
thread that called GoNow
would never get
as far as setting the canGo
field;
second, it makes the thread calling WaitUntilReady
block
until some other thread calls either Pulse
or PulseAll
for lockObject
; third, when Wait
returns, it reacquires the lock.
Why use a loop here? Wouldn’t an if
statement followed by a single call to
Wait
work? In this case it would,
but in general it’s surprisingly easy to end up generating spurious
notifications. Suppose we modified this example so that as well as
offering a GoNow
, we had a third
method called OhHangOnAMinute
which put the canGo
field back to
false
—the class becomes a gate
which can open and close. It would be possible that by the time
WaitUntilReady
is woken up after
a call to GoNow
, the field had
already transitioned back to false
because of a call to OhHangOnAMinute
.
And while that can’t happen with this simpler example, in general it’s good to get in the habit of checking to see if the condition you were waiting for really holds when you come out of a wait, and be prepared to wait again if it doesn’t.
The GoNow
method acquires the
lock to make sure it’s safe to modify the canGo
field, which it sets to true
. Then it calls PulseAll
—this tells the .NET Framework to
wake up all threads currently waiting on lockObject
as soon as we release the lock.
(Pulse
would just release a single
thread, but since our WaitForIt
class just has two states—not ready and ready—it needs to release all
waiting threads when it becomes ready.) GoNow
then returns, releasing the lock as
the flow of execution leaves the lock
block, which means that any threads
waiting inside WaitUntilReady
are
now no longer blocked waiting for the pulse.
However, if multiple threads are waiting, they won’t all start
running at once, because Monitor.Wait
reacquires the lock before
returning. It relinquishes the lock only temporarily while it waits—it
insists that we hold the lock before calling it, and we will be
holding the lock again when it returns. Consequently, if PulseAll
happened to release multiple
threads, they still have to take it in turns as they come out of
Wait
.
When WaitUntilReady
gets to
proceed, the loop will check canGo
again, and this time it will be true
and the loop will finish. The code will
then leave the lock
block,
releasing the lock on lockObject
,
enabling the next waiting thread (if there are any) to do the same
thing—and so all waiting threads will become unblocked one after
another.
The monitor’s close integration between locking and
notification may seem a little odd—it’s even getting in our way
here. This example would work perfectly well if all waiting threads
were released simultaneously, instead of having to wait while they
acquire the lock in turn. But in fact, the combined locking and
notification is critical to most uses of Pulse
and Wait
. Notifications concern a change to
shared state, so it’s vital that code that raises notifications be
in possession of the lock, and also that when code discovers a
notification, it is in possession of the lock so that it can look at
the modified state immediately. Without this, all sorts of subtle
races can occur in the gap between notification and lock acquisition
or the gap between releasing a lock and waiting for notification.
Example 16-15 shows a simple program that
uses the WaitForIt
class from Example 16-14. It creates a thread
that waits for a while and then calls the GoNow
method. The main thread waits for that
to happen by calling the WaitUntilReady
method after starting the
thread.
Example 16-15. Using WaitForIt
class Program
{
    static void Main(string[] args)
    {
        WaitForIt waiter = new WaitForIt();

        ThreadStart twork = delegate
        {
            Console.WriteLine("Thread running...");
            Thread.Sleep(1000);
            Console.WriteLine("Notifying");
            waiter.GoNow();
            Console.WriteLine("Notified");
            Thread.Sleep(1000);
            Console.WriteLine("Thread exiting...");
        };
        Thread t = new Thread(twork);

        Console.WriteLine("Starting new thread");
        t.Start();

        Console.WriteLine("Waiting for thread to get going");
        waiter.WaitUntilReady();
        Console.WriteLine("Wait over");
    }
}
The output shows why this sort of coordination is often necessary:
Starting new thread
Waiting for thread to get going
Thread running...
Notifying
Notified
Wait over
Thread exiting...
Notice that the new thread didn’t start up immediately—the main
thread prints its “Waiting for thread to get going” message after
calling Start
to run the
thread, but this message appears before the new thread prints “Thread
running...” which is the very first thing that thread does. In other
words, just because the Thread
class’s Start
method has returned,
you have no guarantee that the newly created thread has actually done
anything yet. Only through the use of coordination mechanisms such as
Wait
and Pulse
can we impose some kind of order
across multiple threads.
Never use Thread.Sleep
to
try to solve ordering problems in production code—it’s not a
dependable or efficient technique. Example 16-15 uses it to make the coordination
problems more visible, but while it can be used to amplify or
explore problems in examples, you cannot rely on it, because it
makes no guarantees—making one thread sleep for a while to give
another thread a chance to catch up does not guarantee that the
other thread will catch up, particularly on
systems that experience heavy load.
The main thread happens not to get to run immediately after the
other thread calls GoNow
. (Or at
least, if it did run, it didn’t run for long enough to print out its
“Wait over” message—the other thread got in there first with its
“Notified” message.) You might see slightly different results each
time you run. Even though we can impose a little bit of order there’s
still going to be quite a lot of unpredictability in the exact order
of events. As you design multithreaded code, it’s important to be very
clear about how much order you are imposing with locking and
notifications—in this case, we are guaranteeing that our main thread
cannot possibly get to the line that prints out “Wait over” before the
second thread has reached the line that calls GoNow
, but that’s the only constraint—the
progress of the two threads could still be interleaved in numerous
different ways. You should never assume that the detailed order of
events you observe in practice will necessarily always happen.
While the Monitor
class and
the C# lock
keyword are the most
widely used synchronization mechanisms, there are some
alternatives.
Monitors are very useful and are typically a good first choice for locking, but there are some scenarios in which more specialized alternatives might offer slightly better performance or greater flexibility. Since it’s relatively unusual to use these alternatives, we won’t illustrate them in detail—this section will just describe what’s there and when it might be useful to you.
To understand why the alternatives exist, it’s useful to know something more about the capabilities and limitations of monitors. Monitors are designed for use within a single appdomain—you cannot use them to synchronize operations between processes, nor between appdomains sharing a process. Cross-process coordination is possible in Windows, but you need to use other mechanisms to do that. One reason for this is that monitors try to avoid getting the OS scheduler involved in locking where possible.
If your code hits a lock
statement (or calls Monitor.Enter
directly) for an object on which no other thread currently has a lock,
the .NET Framework is able to handle this situation efficiently. It does
not need to make calls into the operating system. Monitors can do this
because they’re appdomain-local; to ensure cross-appdomain
synchronization typically means getting help from the OS, and once you
need to call into the OS, lock acquisition becomes many times more
expensive. So when there’s no contention, monitors work really well. But
once blocking occurs—either due to contention or because of an explicit
call to Wait
—the OS scheduler has to
get involved because it’s the only thing that can move a thread between
runnable and blocked states. This is usually a good thing, because it
means the thread can wait efficiently; once a thread becomes blocked it
doesn’t consume CPU cycles, and the CPU is free to get on with other
useful work, or it may be able to go into its power-efficient idle state
when no threads need to run, which is particularly important for laptops
running on battery power. However, there are some situations where the
cost of getting the OS involved outweighs the benefits. This brings us
to the first of the specialized lock types.
SpinLock
, which is
new in .NET 4, provides similar locking functionality to monitors, but
when contention occurs it will just sit in a loop checking and
checking and checking again to see if the lock has become free yet—the
thread will consume CPU cycles while it does this. The “spin” in
SpinLock
refers to the code
spinning around in this loop waiting for the lock.
That might sound like an awful idea compared to the nice
power-efficient wait state that a thread can enter into when blocking
on a monitor. And often, it’s exactly as bad an idea as it sounds. The
majority of the time you won’t want to use SpinLock
. But it offers one possible
advantage: because it never falls back onto the OS scheduler, it’s a
more lightweight construct than a monitor, and so
if you very rarely encounter contention for a
particular lock, it might be cheaper. (And if the contention does
occur but is extremely short-lived, it’s possible that on a multicore
system the cost of spinning very briefly might actually be lower than
the cost of getting the scheduler involved. In general, spinlock
contention on a single-processor system is bad, although the SpinLock
implementation
mitigates this a little by yielding when it fails to acquire the lock on a single-processor machine.)
SpinLock
is a value type—a
struct. This contributes to its lightweight nature, as it can live
inside other objects, rather than needing its own space on the heap.
Of course, this also means you need to be careful not to assign a
SpinLock
into a local variable,
because you’d end up making a copy, and locking that copy would not
be useful.
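As a concrete illustration of the basic pattern (this usage sketch is ours, not a recommendation to use SpinLock), note that the field below is deliberately not readonly: SpinLock is a mutable struct, and reading a readonly field yields a defensive copy, so you would end up locking the copy rather than the shared instance.

```csharp
using System.Threading;

class Counter
{
    // Not readonly: SpinLock mutates its own state when entered and
    // exited, and locking a copy of the struct would protect nothing.
    private SpinLock spinLock = new SpinLock();
    private int count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            count += 1;
        }
        finally
        {
            if (lockTaken) { spinLock.Exit(); }
        }
    }

    public int Count
    {
        get { return count; }
    }
}
```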
Never use SpinLock
without
comparative performance tests: test all the performance metrics you
care about using ordinary monitors and compare that against how the
same test suite works when you replace a particular lock with a
SpinLock
, and consider switching
only if the tests demonstrate a clear benefit. If you don’t have a
test infrastructure capable of verifying that you meet all of your
performance requirements, or if you don’t have quantitative, clearly
specified performance requirements, your project is not ready for
SpinLock
.
For some reason, a lot of developers just love to be armchair performance tuners. An astounding amount of time and effort is wasted on mailing lists, on web forums, and in internal company meetings on heated debates over which constructs are theoretically faster than others. Sadly, empirical testing of any kind rarely enters into the equation in such discussions.
If someone tries to claim by logical argument alone that a
particular technique is faster, be highly suspicious of her claims.
Exotic and specialized synchronization primitives such as SpinLock
seem to bring out the worst in
these people. (And that’s the main reason we’re even mentioning
it—sooner or later you’ll have to deal with someone who has become
obsessed with finding a way to use SpinLock
.) Testing and measurement is the
only reliable path to performance.
Earlier in this chapter, we suggested immutability as a way to avoid having to synchronize access to your data—in the .NET Framework, it’s safe for any number of threads to read the same data simultaneously as long as no threads are trying to modify that data at the same time. Sometimes you may find yourself in the frustrating situation of having data that is almost read-only—perhaps a website contains a message of the day which, presumably, changes only once a day, but which you may need to incorporate into each of the hundreds of web pages your busy website serves up every second of the day.
The monitor’s mutually exclusive locking—where only one thread at a time gets to acquire a lock—seems ill-suited to this scenario. You could find that this shared data becomes a bottleneck—all threads are taking it in turns to access it, even though that really needs to happen only once a day. This is where reader/writer locks come in.
The idea behind a reader/writer lock is that when you acquire the lock, you declare whether you need to modify the data or just read it. The lock will allow any number of threads to get a read lock simultaneously, but if a thread tries to get a write lock, it’ll have to wait until all the current readers are done, and then, once a thread has a write lock, other threads attempting to get a lock of any kind will be made to wait until the write lock has been released. In other words, this kind of lock supports any number of simultaneous readers, but writers get exclusive access. (The precise details are, as ever, a little more complex, as a lock of this kind has to avoid the scenario where a never-ending stream of readers causes a writer to wait forever. New readers may be made to wait even when other readers are currently active simply to clear the decks for a waiting writer.)
While this sounds good in theory, the practical benefits can sometimes fall short of what theory might suggest. You shouldn’t even contemplate using a reader/writer lock unless you are seeing performance problems with a simpler monitor-based solution. This kind of lock is more complex, and so it’s quite possible you’ll end up making things slower, particularly in cases where you weren’t seeing much contention. (The example we gave of a website with a message of the day is quite likely to fall into that category. If the message is just a string, how long does it take to get hold of a reference to a string, really? Even with hundreds of requests per second, the chances of contention are probably pretty small.)
It doesn’t help that the first implementation offered by the
.NET Framework—the ReaderWriterLock
—was, frankly, not very
good. Your monitor-based solution had to be in a world of pain before
ReaderWriterLock
looked preferable.
Unfortunately, some of the problems of this lock couldn’t be fixed
without risking breaking existing code, so .NET 3.5 introduced a much
better replacement, ReaderWriterLockSlim
. If you need
reader/writer locking, you should use this newer variant unless you
absolutely have to support older versions of .NET. Be aware that
unlike ReaderWriterLock
, ReaderWriterLockSlim
implements IDisposable
, so you need to arrange for it
to be disposed at the right time. This means that if you use it as an
implementation detail of a class, your class will probably need to
implement IDisposable
too.
The Mutex
class
provides a similar style of locking to monitor. The name is short for
mutually exclusive, indicating that only one
thread can acquire the lock at any time. A mutex is significantly more
expensive than a monitor because it always gets the OS scheduler
involved. And it does that because mutexes work across process boundaries—you can create a Mutex
object with a name, and if another
process in the same Windows login session creates another Mutex
object with the same name, that object
refers to the same underlying Windows synchronization object. So to
acquire a Mutex
, you don’t merely
have to be the only thread in your application in possession of the
lock; you will be the only thread in the whole login session in
possession of the lock. (You can even make a global mutex that spans
all login sessions, meaning that yours will be the only thread on the
entire machine in possession of the lock.)
If you create a mutex without a name, it will be local to the
process, but it still relies on the OS because a Mutex
is essentially a wrapper around a
mutex object provided by the Windows kernel.
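A common use of a named mutex is making sure only one instance of an application runs at a time. Here is a sketch; the mutex name is an arbitrary string we made up, not anything defined by .NET:

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        bool createdNew;
        // Any process in the same login session that creates a Mutex with
        // this name gets the same underlying Windows synchronization object.
        using (var mutex = new Mutex(true, "Example16TestMutex_0x42",
                                     out createdNew))
        {
            if (!createdNew)
            {
                // Some other process already owns the named mutex.
                Console.WriteLine("Another instance is already running.");
                return;
            }

            Console.WriteLine("We own the mutex; doing work...");
            mutex.ReleaseMutex();
        }   // Dispose closes the handle to the kernel object
    }
}
```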
Monitors are not just for locking, of course—they offer coordination facilities through pulsing and waiting. And the .NET Framework offers some slightly more specialized types for coordination too.
Events provide a very similar service to the WaitForIt
class we built in Example 16-14—an event is
effectively a Boolean variable you can wait on. And rather than being a
simple one-shot mechanism as in Example 16-14, an event can go back and
forth between its two states.
.NET offers ManualResetEvent
and AutoResetEvent
classes. The
latter automatically reverts to
its default state when letting waiting threads go, whereas the manual
one remains in its so-called signaled state until
you explicitly reset it.
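Here's a short sketch showing the manual variant in action; the timing and messages are ours, purely for illustration:

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // Starts unsignaled: waiters will block until Set is called.
        var ready = new ManualResetEvent(false);

        var worker = new Thread(() =>
        {
            ready.WaitOne();                    // blocks until signaled
            Console.WriteLine("Released!");
        });
        worker.Start();

        Thread.Sleep(100);                      // simulate some setup work
        ready.Set();    // all current (and future) waiters now proceed
        worker.Join();

        ready.Reset();  // manual reset: back to the unsignaled state
    }
}
```

With an AutoResetEvent, the Set call would have released one waiting thread and then immediately reverted to the unsignaled state.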
AutoResetEvent
can be
problematic. There isn’t necessarily any correspondence between the
number of times you signal it and the number of times it releases
threads. If you signal it twice in a row when no threads are waiting,
it doesn’t keep count—it’s just as signaled after the second signal as
it was after the first one. This can lead to bugs where you
occasionally miss signals, and your code can grind to a halt. Approach
with caution.
These types are wrappers around the underlying Windows event
synchronization primitive, so as with Mutex
, you can use events for cross-process
coordination. And of course, this also means that you incur the cost of
getting the OS scheduler involved.
.NET 4 introduces an alternative called ManualResetEventSlim
. This will use
busy-waiting techniques similar to
a spinlock for short waits. So just like the Monitor
, it gets the scheduler involved only
when a wait is necessary. Therefore, unless you really need the extra
features available from the nonslim version (e.g., you need
cross-process synchronization) the ManualResetEventSlim
class is a better choice
than ManualResetEvent
if you’re using
.NET 4 or later.
.NET 4 introduces the CountdownEvent
class, which provides a handy
solution to a fairly common problem: knowing when you’re done. Remember
back in Example 16-6, we
ran into an issue with the thread pool. We queued up a couple of pieces
of work, but we had no way of knowing when they were done. One solution
to that would be to use the Task Parallel Library, which we’ll get to
shortly, but an alternative would have been to use the CountdownEvent
class.
CountdownEvent
is very simple.
For each piece of work you start, you call AddCount
(or if you know how many pieces of
work there will be up front, you can pass that number into the
constructor). For each piece of work that completes you call Signal
. And if you need to wait for all
outstanding work to complete (e.g., before your program exits), just
call Wait
.
The System.Collections.Concurrent
namespace
provides various collection classes that are designed to be used in
multithreaded environments. They look a little different from the normal
collection classes because they are designed to be used without needing
any locking, which means they can’t offer features that rely on things
staying consistent from one moment to the next. Numerical indexing is
out, for example, because the number of items in the collection may
change, as we saw when trying to use List<T>
in a multithreaded fashion in
Example 16-11. So these are not
thread-safe versions of normal collection classes—they are collections
whose APIs are designed to support multithreaded use without the caller
needing to use locks.
BlockingCollection
is not just
a multithreaded collection; it also offers associated coordination. It
provides a way for threads to sit and wait for items to become available
in the collection. Its Take
method will block
if the collection is empty. Once data becomes available, Take
will return one item. Any number of
threads may be waiting inside Take
at
any time, and other threads are free to call Add
. If you Add
enough items that all the threads waiting
to Take
are satisfied, and then you
keep calling Add
, that’s when items
start to get added to the collection. And if the collection is nonempty,
calls to Take
will return
immediately.
This allows you to have one or more threads dedicated to
processing work items generated by
other threads. The BlockingCollection
acts as a kind of buffer—if you generate items faster than you process
them, they will queue up in the BlockingCollection
, and if the processing
threads catch up, they will block efficiently until more items come
along.
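Here's a minimal producer/consumer sketch of that arrangement. Instead of calling Take in a loop, it uses GetConsumingEnumerable, which blocks in the same way but also ends the loop cleanly once the producer calls CompleteAdding; the item strings are ours, for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Program
{
    static void Main()
    {
        var work = new BlockingCollection<string>();

        // A single dedicated consumer thread processes items as they arrive.
        var consumer = new Thread(() =>
        {
            foreach (string item in work.GetConsumingEnumerable())
            {
                Console.WriteLine("Processing " + item);
            }
        });
        consumer.Start();

        work.Add("one");            // producer side: items queue up here
        work.Add("two");
        work.CompleteAdding();      // no more items; lets the consumer finish
        consumer.Join();
    }
}
```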
You could use this in a WPF application that needs to do slow work in the background—the UI thread could add work into a blocking collection, and then one or more worker threads could take items from the collection and process them. This is not hugely different from using the thread pool, but it gives you the opportunity to limit the number of worker threads—if you had just a single thread that performs background work, you might be able to get away with much simpler synchronization code, because all your background work is always done by the same thread.
We’ve looked at how to create threads explicitly and at the tools available for ensuring that our programs function correctly in a multithreaded world. Next we’re going to look at asynchronous programming models, where we don’t explicitly create new threads. We will need the locking and synchronization techniques we just explored because we are still working in a concurrent world; it’s just a slightly different programming style.
Some things are intrinsically slow. Reading all of the audio data off a CD, downloading a large file from a server at the end of a low-bandwidth connection on the opposite side of the world, or playing a sound—all of these processes have constraints that mean they’ll take a long time to complete, maybe seconds, minutes, or even hours. How should these sorts of operations look to the programmer?
One simple answer is that they don’t have to look any different from
faster operations. Our code consists of a sequence of statements—one thing
after another—and some statements take longer than others. This has the
useful property of being easy to understand. For example, if our code
calls the WebClient
class’s DownloadString
method,
our program doesn’t move on to the next step until the download is
complete, and so we can know not just what our code does, but also the
order in which it does it.
This style of API is sometimes described as synchronous—the time at which the API returns is determined by the time at which the operation finishes; execution progresses through the code in sync with the work being done. These are also sometimes known as blocking APIs, because they block the calling thread from further progress until work is complete.
Blocking APIs are problematic for user interfaces because the blocked thread can’t do anything else while slow work is in progress. Thread affinity means that code which responds to user input has to run on the UI thread, so if you’re keeping that thread busy, the UI will become unresponsive. It’s really annoying to use programs that stop responding to user input when they’re working—these applications seem to freeze anytime something takes too long, making them very frustrating to use. Failing to respond to user input within 100 ms is enough to disrupt the user’s concentration. (And it gets worse if your program’s user interface uses animation—the occasional glitch of just 15 ms is enough to make a smooth animation turn into something disappointingly choppy.)
Threads offer one solution to this: if you do all your potentially slow work on threads that aren’t responsible for handling user input, your application can remain responsive. However, this can sometimes seem like an overcomplicated solution—in a lot of cases, slow operations don’t work synchronously under the covers. Take fundamental operations such as reading and writing data from and to devices such as network cards or disks, for example. The kernel-mode device drivers that manage disk and network I/O are instructed by the operating system to start doing some work, and the OS expects the driver to configure the hardware to perform the necessary work and then return control to the operating system almost immediately—on the inside, Windows is built around the assumption that most slow work proceeds asynchronously, that there’s no need for code to progress strictly in sync with the work.
This asynchronous model is not limited to the internals of Windows—there are asynchronous public APIs. These typically return very quickly, long before the work in question is complete, and you then use either a notification mechanism or polling to discover when the work is finished. The exact details vary from one API to another, but these basic principles are universal. Many synchronous APIs really are just some code that starts an asynchronous operation and then makes the thread sleep until the operation completes.
An asynchronous API sounds like a pretty good fit for what we need to build responsive interactive applications.[45] So it seems somewhat ludicrous to create multiple threads in order to use synchronous APIs without losing responsiveness, when those synchronous APIs are just wrappers on top of intrinsically asynchronous underpinnings. Rather than creating new threads, we may as well just use asynchronous APIs directly where they are available, cutting out the middle man.
.NET defines two common patterns for asynchronous operations. There’s a low-level pattern which is powerful and corresponds efficiently to how Windows does things under the covers. And then there’s a slightly higher-level pattern which is less flexible but considerably simpler to use in GUI code.
The Asynchronous Programming Model (APM) is a pattern that many asynchronous APIs in the .NET Framework conform to. It defines common mechanisms for discovering when work is complete, for collecting the results of completed work, and for reporting errors that occurred during the asynchronous operation.
APIs that use the APM offer pairs of methods, starting with
Begin
and End
. For example, the
Socket
class in the System.Net.Sockets
namespace offers numerous
instances of this pattern: BeginAccept
and EndAccept
, BeginSend
and EndSend
, BeginConnect
and EndConnect
, and so on.
The exact signature of the Begin
method depends on what it does. For
example, a socket’s BeginConnect
needs the address to which you’d like to connect, whereas BeginReceive
needs to
know where you’d like to put the data and how much you’re ready to
receive. But the APM requires all Begin
methods to have the same final two
parameters: the method must take an AsyncCallback
delegate and an object
. And it also requires the method to
return an implementation of the IAsyncResult
interface. Here’s an example from
the Dns
class in System.Net
:
public static IAsyncResult BeginGetHostEntry(
    string hostNameOrAddress,
    AsyncCallback requestCallback,
    object stateObject)
Callers may pass a null
AsyncCallback
. But if they pass a non-null reference, the type
implementing the APM is required to invoke the callback once the
operation is complete. The AsyncCallback
delegate signature requires the
callback method to accept an IAsyncResult
argument—the APM implementation
will pass in the same IAsyncResult
to
this completion callback as it returns from the Begin
method. This object represents an
asynchronous operation in progress—many classes can have multiple
operations in progress simultaneously, and the IAsyncResult
distinguishes between
them.
Example 16-16 shows
one way to use this pattern. It calls the asynchronous BeginGetHostEntry
method
provided by the Dns
class. This looks
up the IP address for a computer, so it takes a string—the name of the
computer to find. And then it takes the two standard final APM
arguments—a delegate and an object. We can pass anything we like as the
object—the function we call doesn’t actually use it, it just hands it
back to us later. We could pass null
because our example doesn’t need the argument, but we’re passing a
number just to demonstrate where it comes out. The reason the APM offers
this argument is so that if you have multiple simultaneous asynchronous
operations in progress at once, you have a convenient way to associate
information with each operation. (This mattered much more in older
versions of C#, which didn’t offer anonymous methods or lambdas—back
then this argument was the easiest way to pass data into the
callback.)
Example 16-16. Using the Asynchronous Programming Model
class Program
{
    static void Main(string[] args)
    {
        Dns.BeginGetHostEntry("oreilly.com", OnGetHostEntryComplete, 42);
        Console.ReadKey();
    }

    static void OnGetHostEntryComplete(IAsyncResult iar)
    {
        IPHostEntry result = Dns.EndGetHostEntry(iar);
        Console.WriteLine(result.AddressList[0]);
        Console.WriteLine(iar.AsyncState);
    }
}
The Main
method waits until a
key is pressed—much like with work items in the thread pool, having
active asynchronous requests will not keep the process alive, so the
program would exit before finishing its work without that ReadKey
. (A more robust approach for a real
program that needed to wait for work to complete would be to use the
CountdownEvent
described
earlier.)
The Dns
class will call the
OnGetHostEntryComplete
method once it has finished its lookup. Notice that the first thing we
do is call the EndGetHostEntry
method—the other half of the APM. The End
method always takes
the IAsyncResult
object corresponding
to the call—recall that this identifies the call in progress, so this is
how EndGetHostEntry
knows which
particular lookup operation we want to get the results for.
The APM says nothing about which thread your callback will be
called on. In practice, it’s often a thread pool thread, but not
always. Some individual implementations might make guarantees about
what sort of thread you’ll be called on, but most don’t. And since you
don’t usually know what thread the callback occurred on, you will need
to take the same precautions you would when writing multithreaded code
where you explicitly create new threads. For example, in a WPF or
Windows Forms application, you’d need to use the SynchronizationContext
class or an
equivalent mechanism to get back to a UI thread if you wanted to make
updates to the UI when an asynchronous operation completes.
The End
method in the APM
returns any data that comes out of the operation. In this case, there’s
a single return value of IPHostEntry
,
but some implementations may return more by having out
or ref
arguments. Example 16-16
then prints the results, and finally prints the AsyncState
property of the IAsyncResult
, which will be 42—this is where
the value we passed as the final argument to BeginGetHostEntry
pops out.
This is not the only way to use the Asynchronous Programming
Model—you are allowed to pass null
as
the delegate argument. You have three other options, all revolving
around the IAsyncResult
object
returned by the Begin
call. You can
poll the IsCompleted
property to test for
completion. You can call the End
method at any time—if the work is
not finished this will block until it completes.[46] Or you can use the AsyncWaitHandle
property—this returns an
object that is a wrapper around a Win32 synchronization handle that will
become signaled when the work is complete. (That last one is rarely
used, and has some complications regarding ownership and lifetime of the
handle, which are described in the MSDN documentation. We mention this
technique only out of a pedantic sense of duty to completeness.)
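As a sketch of the polling option, here is Example 16-16 restructured to pass null for the callback and poll IsCompleted instead. The Resolve helper method is ours, invented for illustration; in a real program you'd do useful work between polls rather than sleeping:

```csharp
using System;
using System.Net;
using System.Threading;

class ApmPolling
{
    // Polls for completion rather than supplying an AsyncCallback.
    public static IPHostEntry Resolve(string hostName)
    {
        IAsyncResult iar = Dns.BeginGetHostEntry(hostName, null, null);

        while (!iar.IsCompleted)
        {
            Thread.Sleep(50);   // stand-in for doing other useful work
        }

        // End must be called eventually, whichever mechanism we use.
        return Dns.EndGetHostEntry(iar);
    }

    static void Main()
    {
        Console.WriteLine(Resolve("localhost").HostName);
    }
}
```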
You are required to call the End
method at some point, no matter how you
choose to wait for completion. Even if you don’t care about the
outcome of the operation you must still call the End
method. If you don’t, the operation
might leak resources.
Asynchronous operations can throw exceptions. If the exception is
the result of bad input, such as a null reference where an object is
required, the Begin
method will throw
an exception. But it’s possible that something failed while the
operation was in progress—perhaps
we lost network connectivity partway through some work. In this case,
the End
method will throw an
exception.
The Asynchronous Programming Model is widely used in the .NET
Framework class library, and while it is an efficient and flexible way
to support asynchronous operations, it’s slightly awkward to use in user
interfaces. The completion callback typically happens on some random
thread, so you can’t update the UI in that callback. And the support for
multiple simultaneous operations, possible because each operation is
represented by a distinct IAsyncResult
object, may be useful in server
environments, but it’s often just an unnecessary complication for
client-side code. So there’s an alternative pattern better suited to the
UI.
Some classes offer an alternative pattern for asynchronous
programming. You start an operation by calling a method whose name
typically ends in Async; for example, the WebClient
class’s DownloadDataAsync
method. And unlike the APM, you do not pass a delegate to the method.
Completion is indicated through an event, such as the DownloadDataCompleted
event. Classes that implement this pattern are required to use the
SynchronizationContext
class (or the
related AsyncOperationManager
) to
ensure that the event is raised in the same context in which the
operation was started. So in a user interface, this means that
completion events are raised on the UI thread.
This is, in effect, a single-threaded asynchronous model. You have the responsiveness benefits of asynchronous handling of slow operations, with fewer complications than multithreaded code. So in scenarios where this pattern is an option, it’s usually the best choice, as it is far simpler than the alternatives. It’s not always available, because some classes offer only the APM. (And some don’t offer any kind of asynchronous API, in which case you’d need to use one of the other multithreading mechanisms in this chapter to maintain a responsive UI.)
Single-threaded asynchronous code is more complex than sequential code, of course, so there’s still scope for trouble. For example, you need to be careful that you don’t attempt to set multiple asynchronous operations in flight simultaneously that might conflict. Also, components that implement this pattern call you back on the right thread only if you use them from the right thread in the first place—if you use a mixture of this pattern and other multithreading mechanisms, be aware that operations you kick off from worker threads will not complete on the UI thread.
There are two optional features of the event-based asynchronous
model. Some classes also offer progress change notification events, such
as the WebClient
class’s DownloadProgressChanged
event. (Such events
are also raised on the original thread.) And there may be cancellation
support. For example, WebClient
offers a CancelAsync
method.
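Here is a sketch of the whole pattern with WebClient, including the optional progress and cancellation features. (This is a console program for brevity, so there's no UI SynchronizationContext for the events to be marshaled to; in a WPF or Windows Forms application started from the UI thread, these handlers would run on that thread.)

```csharp
using System;
using System.Net;

class Program
{
    static void Main()
    {
        var client = new WebClient();

        // Completion event: raised in the context from which the
        // operation was started.
        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Cancelled)
                Console.WriteLine("Cancelled");
            else if (e.Error != null)
                Console.WriteLine("Failed: " + e.Error.Message);
            else
                Console.WriteLine("Got {0} characters", e.Result.Length);
        };

        // Optional progress notifications, raised in the same context.
        client.DownloadProgressChanged += (sender, e) =>
            Console.WriteLine("{0}% complete", e.ProgressPercentage);

        client.DownloadStringAsync(new Uri("http://oreilly.com/"));

        // Optional cancellation support would look like:
        // client.CancelAsync();

        Console.ReadKey();   // keep the process alive while the download runs
    }
}
```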
There’s no fundamental need for code to use either the APM or the
event-based asynchronous pattern. These are just conventions. You will
occasionally come across code that uses its own unusual solution for
asynchronous operation. This can happen when the design of the code in
question is constrained by external influences—for example, the System.Threading
namespace defines an Overlapped
class that provides a managed
representation of a Win32 asynchronous mechanism. Win32 does not have
any direct equivalent to either of the .NET asynchronous patterns, and
just tends to use function pointers for callbacks. .NET’s Overlapped
class mimics this by accepting a
delegate as an argument to a method. Conceptually, this isn’t very
different from the APM; it just happens not to conform exactly to the
pattern.
The standard asynchronous patterns are useful, but they are somewhat low-level. If you need to coordinate multiple operations, they leave you with a lot of work to do, particularly when it comes to robust error handling or cancellation. The Task Parallel Library provides a more comprehensive scheme for working with multiple concurrent operations.
.NET 4 introduces the Task Parallel Library (TPL), a set of
classes in the System.Threading.Tasks
namespace that help coordinate concurrent work. In some respects, the TPL
superficially resembles the thread pool, in that you submit small work
items (or tasks) and the TPL will take care of
working out how many threads to use at once in order to run your work. But
the TPL provides various services not available through direct use of the
thread pool, especially in the areas of error handling, cancellation, and
managing relationships between tasks.
You can associate tasks with one another—for example, tasks can have a parent-child relationship, which provides a way to wait until all of the tasks related to some higher-level operation are complete. You can also arrange for the completion of one task to kick off another related task.
Error handling gets tricky with asynchronous and concurrent code—what do you do if you’ve built an operation out of 20 related concurrent tasks, and just one of them fails while the rest are either in progress, already complete, or not yet started? The TPL provides a system for bringing work to an orderly halt, and collecting in a single place all of the errors that occurred.
The mechanisms required to halt work in the event of an error are also useful if you want to be able to stop work in progress for some other reason, such as when the user clicks a Cancel button.
We’ll start by looking at the most important concept in the TPL, which is, unsurprisingly, the task.
A task is some work your program needs to do.
It is represented by an instance of the Task
class, defined in the System.Threading.Tasks
namespace. This does
not define exactly how the work will be done. The work for a task might
be a method to be executed, but a task could also involve asynchronous
work that executes without needing to tie up a thread—the TPL has
support for creating task objects that work with APM implementations,
for instance. Example 16-17 shows how to
create new tasks that execute code.
Example 16-17. Executing code with tasks
using System;
using System.Threading.Tasks;

namespace TplExamples
{
    class Program
    {
        static void Main(string[] args)
        {
            Task.Factory.StartNew(Go, "One");
            Task.Factory.StartNew(Go, "Two");
            Console.ReadKey();
        }

        static void Go(object name)
        {
            for (int i = 0; i < 100; ++i)
            {
                Console.WriteLine("{0}: {1}", name, i);
            }
        }
    }
}
The Task
class provides a
static Factory
property that returns
a TaskFactory
object, which can be
used to create new tasks. The TPL defines the TaskFactory
abstraction so that it’s possible
to plug in different task creation strategies. The default factory
returned by Task.Factory
creates new
tasks that execute code on the thread pool, but it’s possible to create
factories that do something else. For example, you could make a task
factory that creates tasks that will run on a UI thread.
A factory’s StartNew
method creates
a code-based task. You pass it a delegate—it’ll accept either a method
with no arguments or a method that takes a single object as an argument.
If you want to pass more arguments, you can use the same lambda-based
trick we saw in Example 16-4.
Example 16-18 uses this to pass
two arguments to Go
, while using the
overload of StartNew
that takes a
zero-argument method. (The empty ()
tells C# to build a zero-argument lambda, which becomes the method
StartNew
invokes.)
Example 16-18. Passing more arguments with lambdas
static void Main(string[] args)
{
    Task.Factory.StartNew(() => Go("One", 100));
    Task.Factory.StartNew(() => Go("Two", 500));
    Console.ReadKey();
}

static void Go(string name, int iterations)
{
    for (int i = 0; i < iterations; ++i)
    {
        Console.WriteLine("{0}: {1}", name, i);
    }
}
These last two examples look pretty similar to the thread pool
examples from earlier, and they suffer from the same problem: they don’t
know when the work is complete, so we’ve used the dubious solution of
waiting for the user to press a key so that the program doesn’t exit
until the work is done. Fortunately, tasks provide a much better
solution to this: we can wait until they are finished. Task
provides a Wait
method that blocks
until the task is complete. This is an instance method, so we’d call it
once for each task. There’s also a static WaitAll
method that
takes an array of Task
objects and
blocks until they are all complete, illustrated in Example 16-19. (This method uses the params
modifier on its one argument, so we can
just pass each task as though it were a separate argument. The C#
compiler will take the two tasks Example 16-19 passes
to WaitAll
and wrap them in an array
for us.)
Example 16-19. Task.WaitAll
static void Main(string[] args)
{
    Task t1 = Task.Factory.StartNew(() => Go("One", 100));
    Task t2 = Task.Factory.StartNew(() => Go("Two", 500));
    Task.WaitAll(t1, t2);
}
Alternatively, we could create a parent task that contains both of these tasks as children.
If you write a code-based task that creates new tasks
from within an existing task, you can make those new tasks children of
the task in progress. Example 16-20 creates
the same two tasks as the previous examples, but does so inside
another task, passing in the AttachedToParent
member of the TaskCreationOptions
enumeration to establish
the parent-child relationship.
Example 16-20. Task with children
static void Main(string[] args)
{
    Task t = Task.Factory.StartNew(() =>
    {
        Task.Factory.StartNew(() => Go("One", 100),
                              TaskCreationOptions.AttachedToParent);
        Task.Factory.StartNew(() => Go("Two", 500),
                              TaskCreationOptions.AttachedToParent);
    });
    t.Wait();
}
Notice that this example calls Wait
only on the parent task. For a task to
be considered complete, not only must it have finished running, but so
must all of its children. (And that means that if the children have
children of their own, those all have to complete too.) So there’s no
need to list all the tasks and pass them to WaitAll
if there’s a single top-level task
that all the rest descend from.
Although code-based tasks are superficially similar to thread pool work items, the TPL is designed to let you use much smaller tasks than would work efficiently when using the thread pool directly. The TPL encourages fine-grained concurrency—the idea is that you provide it with a large number of small work items, which gives it plenty of freedom to work out how to allocate that work across logical processors. This is sometimes described as overexpression of concurrency. The theory is that as newer computers come out with more and more logical processors, code that overexpresses its concurrency will be able to take advantage of the higher logical processor count.
The TPL uses the CLR thread pool internally, so it might seem
surprising that the TPL is able to handle small work items more
efficiently, but the TPL provides access to some features added to the
thread pool in .NET 4, which you can’t use with the ThreadPool
class. The ThreadPool
class typically starts work in
the order you queued it up, so it’s a FIFO (first
in, first out) queue. (This is absolutely not guaranteed by the
documentation, but the fact that the ThreadPool
has behaved this way for years
means that changing this behavior would doubtless break lots of code.)
But when you set up work as Task
objects the thread pool works differently. Each logical processor gets
its own separate queue, and typically processes tasks in its queue in
LIFO (last in, first out) order. This turns out
to be far more efficient in a lot of scenarios, particularly when the
work items are small. This ordering is not strict, by the way; if one
logical processor manages to empty its work queue while others still
have plenty to do, the idle processor may steal
work from another processor, and will do so from the back end of that processor’s
queue. (If you’re wondering about the rationale behind how the thread
pool orders tasks, see the sidebar below.)
The examples we’ve seen so far simply perform work and return no results. But a task can produce a result.
The Task<TResult>
class
derives from Task
and adds a
Result
property which, once the
task is complete, contains the result produced by the task. Task<TResult>
represents a concept
known in some concurrent programming literature as a future—it represents some work that
will produce a result at some point.
The TaskFactory.StartNew
method we’ve already used can create either kind of task—it has
overloads that accept methods that return a result. (So you can pass a
Func<TResult>
or Func<object, TResult>
, instead of the
Action
or Action<object>
passed in the previous
examples.) These overloads return a Task<TResult>
. (Alternatively, you can
call StartNew
on the Task<TResult>.Factory
static
property.)
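Here is a minimal sketch of a future: the work runs on the thread pool, and reading Result blocks until the value is available. The SumTo method is ours, invented for illustration:

```csharp
using System;
using System.Threading.Tasks;

class FutureExample
{
    // Returns a Task<int> that will eventually hold the sum 1 + 2 + ... + n.
    public static Task<int> SumTo(int n)
    {
        return Task<int>.Factory.StartNew(() =>
        {
            int total = 0;
            for (int i = 1; i <= n; ++i) { total += i; }
            return total;
        });
    }

    static void Main()
    {
        Task<int> sum = SumTo(100);
        // Reading Result implicitly waits for the task to complete.
        Console.WriteLine(sum.Result);   // prints 5050
    }
}
```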
You can start a Task<TResult>
and then call Wait
to wait for the result, or you can read
the Result
property, which will
call Wait
for you if the result is
not yet available. But blocking until the task is complete may not be
especially useful—it’s just a very roundabout way of invoking the code
synchronously. In fact, you might sometimes want to do this—you might
create multiple child tasks and then wait for all of them to complete,
and you’d be able to take advantage of the TPL’s common
exception-handling framework to manage any errors. However, it’s often
useful to be able to provide some sort of callback method to be
invoked once the task completes, rather than blocking. You can do this
with a continuation.
A continuation is a task that gets invoked when another
task completes.[47] The Task
class
provides a ContinueWith
method
that lets you provide the code for that continuation task. It requires
a delegate that takes as its single argument the task that just
completed. ContinueWith
offers
overloads that allow the delegate to return a value (in which case the
continuation task will be another Task<TResult>
), or not to return a
value (in which case the continuation task will just be a Task
). ContinueWith
returns the Task
object that represents the
continuation. So you can string these things together:
static void Main(string[] args)
{
    Task t = Task.Factory.StartNew(() => Go("One", 100))
        .ContinueWith(t1 => Go("Two", 500))
        .ContinueWith(t2 => Go("Three", 200));
    t.Wait();
}
This will execute the three tasks one after another. Notice that
the t
variable here refers to the
third task—the final continuation. So t.Wait()
will wait until all the tasks are
complete—it doesn’t need to wait for the first two because the final
task can’t even start until the others are finished; waiting for the
last task implicitly means waiting for all three here.
Continuations are slightly more interesting when the initial task produces a result—the continuation can then do something with the output. For example, you might have a task that fetches some data from a server and then have a continuation that puts the result into the user interface. Of course, we need to be on the correct thread to update the UI, but the TPL can help us with this.
The TaskScheduler
class is responsible for working out when and how to execute tasks. If
you don’t specify a scheduler, you’ll end up with the default one,
which uses the thread pool. But you can provide other schedulers when
creating tasks—both StartNew
and ContinueWith
offer overloads that accept a
scheduler. The TPL offers a scheduler that uses the SynchronizationContext
, which can run tasks
on the UI thread. Example 16-21 shows
how to use this in an event handler in a WPF application.
Example 16-21. Continuation on a UI thread
void OnButtonClick(object sender, RoutedEventArgs e)
{
    TaskScheduler uiScheduler =
        TaskScheduler.FromCurrentSynchronizationContext();
    Task<string>.Factory.StartNew(GetData)
        .ContinueWith((task) => UpdateUi(task.Result), uiScheduler);
}

string GetData()
{
    WebClient w = new WebClient();
    return w.DownloadString("http://oreilly.com/");
}

void UpdateUi(string info)
{
    myTextBox.Text = info;
}
This example creates a task that returns a string, using the
default scheduler. This task will invoke the GetData
function on a thread pool thread.
But it also sets up a continuation using a TaskScheduler
that was obtained by calling
FromCurrentSynchronizationContext
.
This grabs the SynchronizationContext
class’s Current
property and returns a scheduler
that uses that context to run all tasks. Since the continuation
specifies that it wants to use this scheduler, it will run the
UpdateUi
method on
the UI thread.
The upshot is that GetData
runs on a thread pool thread, and then its return value is passed into
UpdateUi
on the UI thread.
We could use a similar trick to work with APM implementations, because task factories provide methods for creating APM-based tasks.
TaskFactory
and
TaskFactory<TResult>
provide
various overloads of a FromAsync
method. You
can pass this the Begin
and End
methods from an
APM implementation, along with the arguments you’d like to pass, and
it will return a Task
or Task<TResult>
that executes the
asynchronous operation, instead of one that invokes a delegate. Example 16-22 uses this to
wrap the asynchronous methods we used from the Dns
class in earlier examples in a
task.
Example 16-22. Creating a task from an APM implementation
TaskScheduler uiScheduler =
    TaskScheduler.FromCurrentSynchronizationContext();
Task<IPHostEntry>.Factory.FromAsync(
    Dns.BeginGetHostEntry, Dns.EndGetHostEntry, "oreilly.com", null)
    .ContinueWith((task) => UpdateUi(task.Result.AddressList[0].ToString()),
                  uiScheduler);
FromAsync
offers overloads
for versions of the APM that take zero, one, two, or three arguments,
which covers the vast majority of APM implementations. As well as
passing the Begin
and End
methods, we also pass the arguments, and
the additional object argument that all APM Begin
methods accept. (For the minority of
APM implementations that either require more arguments or have
out
or ref
parameters, there’s an overload of
FromAsync
that accepts an IAsyncResult
instead. This requires slightly
more code, but enables you to wrap any APM implementation as a
task.)
We’ve seen the main ways to create tasks, and to set up associations between them either with parent-child relationships or through continuations. But what happens if you want to stop some work after you’ve started it? Neither the thread pool nor the APM supports cancellation, but the TPL does.
Cancellation of asynchronous operations is surprisingly tricky. There are lots of awkward race conditions to contend with. The operation you’re trying to cancel might already have finished by the time you try to cancel it. Or if it hasn’t, it might have gotten beyond the point where it is able to stop, in which case cancellation is doomed to fail. Or the work might have failed, or be about to fail, when you cancel it. And even when cancellation is possible, it might take a while to happen. Handling and testing every possible combination is difficult enough when you have just one operation, but if you have multiple related tasks, it gets a whole lot harder.
Fortunately, .NET 4 introduces a new cancellation model that provides a well thought out and thoroughly tested solution to the common cancellation problems. This cancellation model is not limited to the TPL—you are free to use it on its own, and it also crops up in other parts of the .NET Framework. (The data parallelism classes we’ll be looking at later can use it, for example.)
If you want to be able to cancel an operation, you must pass it a
CancellationToken
. A cancellation
token allows the operation to discover whether the operation has been
canceled—it provides an IsCancellationRequested
property—and it’s also
possible to pass a delegate to its Register
method in
order to be called back if cancellation happens.
CancellationToken
only provides
facilities for discovering that cancellation has been requested. It does
not provide the ability to initiate cancellation. That is provided by a
separate class called CancellationTokenSource
. The reason for
splitting the discovery and control of cancellation across two types is
that it would otherwise be impossible to provide a task with
cancellation notifications without also granting that task the
capability of initiating cancellation. CancellationTokenSource
is a factory of
cancellation tokens—you ask it for a token and then pass that into the
operation you want to be able to cancel. Example 16-23 is similar to Example 16-21, but it passes a cancellation
token to StartNew
, and then uses the
source to cancel the operation if the user clicks a Cancel
button.
Example 16-23. Ineffectual cancellation
private CancellationTokenSource cancelSource;

void OnButtonClick(object sender, RoutedEventArgs e)
{
    cancelSource = new CancellationTokenSource();
    TaskScheduler uiScheduler =
        TaskScheduler.FromCurrentSynchronizationContext();
    Task<string>.Factory.StartNew(GetData, cancelSource.Token)
        .ContinueWith((task) => UpdateUi(task.Result), uiScheduler);
}

void OnCancelClick(object sender, RoutedEventArgs e)
{
    if (cancelSource != null)
    {
        cancelSource.Cancel();
    }
}

string GetData()
{
    WebClient w = new WebClient();
    return w.DownloadString("http://oreilly.com/");
}

void UpdateUi(string info)
{
    cancelSource = null;
    myTextBox.Text = info;
}
In fact, cancellation isn’t very effective in this example because this particular task consists of code that makes a single blocking method call. Cancellation will usually do nothing here in practice—the only situation in which it would have an effect is if the user managed to click Cancel before the task had even begun to execute. This illustrates an important issue: cancellation is never forced—it uses a cooperative approach, because the only alternative is killing the thread executing the work. And while that would be possible, forcibly terminating threads tends to leave the process in an uncertain state—it’s usually impossible to know whether the thread you just zapped happened to be in the middle of modifying some shared state. Since this leaves your program’s integrity in doubt, the only thing you can safely do next is kill the whole program, which is a bit drastic. So the cancellation model requires cooperation on the part of the task in question.
If you have divided your work into numerous relatively short
tasks, cancellation is more useful—if you cancel tasks that have been
queued up but not yet started, they will never run at all. Tasks already
in progress will continue to run, but if all your tasks are short, you
won’t have to wait long. If you have long-running tasks, however, you
will need to detect cancellation and act on it if you want to
respond swiftly. This means you will have to arrange for the
code you run as part of the tasks to have access to the cancellation
token, and that code must test the IsCancellationRequested
property from time to
time.
Cancellation isn’t the only reason a task or set of tasks might stop before finishing—things might be brought to a halt by exceptions.
A task can complete in one of three ways: it can run to
completion, it can be canceled, or it can fault.
The Task
object’s TaskStatus
property reflects this through
RanToCompletion
, Canceled
, and Faulted
values, respectively, and if the task
enters the Faulted
state, its
IsFaulted
property also becomes
true
. A code-based task will enter
the Faulted
state if its method
throws an exception. You can retrieve the exception information from the
task’s Exception
property. This
returns an AggregateException
, which
contains a list of exceptions in its InnerExceptions
property. It’s a list because
certain task usage patterns can end up hitting multiple exceptions; for
example, you might have multiple failing child tasks.
If you don’t check the IsFaulted
property and instead just attempt to
proceed, either by calling Wait
or by
attempting to fetch the Result
of a
Task<TResult>
, the AggregateException
will be thrown back into
your code.
It’s possible to write code that never looks for the exception.
Example 16-17 starts two tasks, and since
it ignores the Task
objects returned
by StartNew
, it clearly never does
anything more with the tasks. If they were children of another task that
wouldn’t matter—if you ignore
exceptions in child tasks they end up causing the parent task to fault.
But these are not child tasks, so if exceptions occur during their
execution, the program won’t notice. However, the TPL tries hard to make
sure you don’t ignore such exceptions—it uses a feature of the garbage
collector called finalization to discover when a
Task
that faulted is about to be
collected without your program ever having noticed the exception. When
it detects this, it throws the AggregateException
, which will cause your
program to crash unless you’ve configured your process to deal with
unhandled exceptions. (The .NET Framework runs all finalizers on a
dedicated thread, and it’s this thread that the TPL throws the exception
on.) The TaskScheduler
class offers
an UnobservedTaskException
event that
lets you customize the way these unhandled exceptions are dealt
with.
The upshot is that you should write error handling for any
nonchild tasks that could throw. One way to do this is to provide a
continuation specifically for error handling. The ContinueWith
method
takes an optional argument whose type is the TaskContinuationOptions
enumeration, which has
an OnlyOnFaulted
value—you could use
this to build a continuation that will run only when an unanticipated
exception occurs. (Of course, unanticipated exceptions are always bad
news because, by definition, you weren’t expecting them and therefore
have no idea what state your program is in. So you probably need to
terminate the program, which is what would have happened anyway if you
hadn’t written any error handling. However, you do get to write errors
to your logs, and perhaps make an emergency attempt to write out unsaved
data somewhere in the hope of recovering it when the program restarts.)
But in general, it’s preferable to handle errors by putting normal
try
...catch
blocks inside your code so that the
exceptions never make it out into the TPL in the first place.
The final concurrency feature we’re going to look at is data parallelism. This is where concurrency is driven by having lots of data items, rather than by explicitly creating numerous tasks or threads. It can be a simple approach to parallelism because you don’t have to tell the .NET Framework anything about how you want it to split up the work.
With tasks, the .NET Framework has no idea how many tasks you plan to create when you create the first one, but with data parallelism, it has the opportunity to see more of the problem before deciding how to spread the load across the available logical processors. So in some scenarios, it may be able to make more efficient use of the available resources.
The Parallel
class provides
a couple of methods for performing data-driven parallel execution. Its
For
and ForEach
methods are
similar in concept to C# for
and
foreach
loops, but rather than
iterating through collections one item at a time, on a system with
multiple logical processors available it will process multiple items
simultaneously.
Example 16-24 uses Parallel.For
. This code calculates pixel
values for a fractal known as the Mandelbrot set, a popular parallelism
demonstration because each pixel value can be calculated entirely
independently of all the others, so the scope for parallel execution is
effectively endless (unless machines with more logical processors than
pixels become available). And since it’s a relatively expensive
computation, the benefits of parallel execution are easy to see.
Normally, this sort of code would contain two nested for
loops, one to iterate over the rows of
pixels and one to iterate over the columns in each row. In this example,
the outer loop has been replaced with a Parallel.For
. (So this particular code cannot
exploit more processors than it calculates lines of pixels—therefore, we
don’t quite have scope for per-pixel parallelism, but since you would
typically generate an image a few hundred pixels tall, there is still a
reasonable amount of scope for concurrency here.)
Example 16-24. Parallel.For
static int[,] CalculateMandelbrotValues(int pixelWidth, int pixelHeight,
double left, double top, double width, double height, int maxIterations)
{
int[,] results = new int[pixelWidth, pixelHeight];
// Non-parallel version of following line would have looked like this:
// for(int pixelY = 0; pixelY < pixelHeight; ++pixelY)
Parallel.For(0, pixelHeight, pixelY =>
{
double y = top + (pixelY * height) / (double) pixelHeight;
for (int pixelX = 0; pixelX < pixelWidth; ++pixelX)
{
double x = left + (pixelX * width) / (double) pixelWidth;
// Note: this lives in the System.Numerics namespace in the
// System.Numerics assembly.
Complex c = new Complex(x, y);
Complex z = new Complex();
int iter;
for (iter = 1; z.Magnitude < 2 && iter < maxIterations; ++iter)
{
z = z * z + c;
}
if (iter == maxIterations) { iter = 0; }
results[pixelX, pixelY] = iter;
}
});
return results;
}
This structure, seen in the preceding code:
Parallel.For(0, pixelHeight, pixelY => { ... });
iterates over the same range as this:
for (int pixelY = 0; pixelY < pixelHeight; ++pixelY) { ... }
The syntax isn’t identical because Parallel.For
is just a method, not a language
feature. The first two arguments indicate the range—the start value is
inclusive (i.e., it will start from the specified value), but the end
value is exclusive (it stops one short of the end value). The final
argument to Parallel.For
is a
delegate that takes the iteration variable as its argument. Example 16-24 uses a lambda, whose minimal syntax introduces
the least possible extra clutter over a normal for
loop.
Parallel.For
will attempt to
execute the delegate on multiple logical processors simultaneously,
using the thread pool to attempt to make full, efficient use of the
available processors. The way it distributes the iterations across
logical processors may come as a surprise, though. It doesn’t simply
give the first row to the first logical processor, the second row to the
second logical processor, and so on. It carves the available rows into
chunks, and so the second logical processor will find itself starting
several rows ahead of the first. And it may decide to subdivide further
depending on the progress your code makes. So you must not rely on the
iteration being done in any particular order. It does this chunking to
avoid subdividing the work into pieces that are too small to handle
efficiently. Ideally, each CPU should be given work in lumps that are
large enough to minimize context switching and synchronization
overheads, but small enough that each CPU can be kept busy while there’s
work to be done. This chunking is one reason why data parallelism can
sometimes be more efficient than using tasks directly—the parallelism
gets to be exactly as fine-grained as necessary and no more so,
minimizing overheads.
Arguably, calling Example 16-24 data
parallelism is stretching a point—the “data” here is just the
numbers being fed into the calculations. Parallel.For
is no more or less data-oriented
than a typical for
loop with an
int
loop counter—it just iterates a
numeric variable over a particular range. However, you could
use exactly the same construct to iterate over a range of data instead
of a range of numbers. Alternatively, there’s Parallel.ForEach
, which is very similar in use
to Parallel.For
, except, as you’d
expect, it iterates over any IEnumerable<T>
like a C# foreach
loop, instead of using a range of
integers. It reads ahead into the enumeration to perform chunking. (And
if you provide it with an IList<T>
it will use the list’s indexer
to implement a more efficient partitioning strategy.)
There’s another way to perform parallel iteration over enumerable data: PLINQ.
Parallel LINQ (PLINQ) is a LINQ provider that enables any
IEnumerable<T>
to be processed
using normal LINQ query syntax, but in a way that works in parallel. On
the face of it, it’s deceptively simple. This:
var pq = from x in someList
         where x.SomeProperty > 42
         select x.Frob(x.Bar);
will use LINQ to Objects, assuming that someList
implements IEnumerable<T>
. Here’s the PLINQ
version:
var pq = from x in someList.AsParallel()
         where x.SomeProperty > 42
         select x.Frob(x.Bar);
The only difference here is the addition of a call to AsParallel
, an extension method that the
ParallelEnumerable
class makes
available on all IEnumerable<T>
implementations. It’s available to any code that has brought the
System.Linq
namespace into scope with
a suitable using
declaration.
AsParallel
returns a ParallelQuery<T>
, which means that the
normal LINQ to Objects implementation of the standard LINQ operators no
longer applies. All the same operators are available, but now they’re
supplied by ParallelEnumerable
, which is able to
execute certain operators in parallel.
Not all queries will execute in parallel. Some LINQ operators essentially force things to be done in a certain order, so PLINQ will inspect the structure of your query to decide which parts, if any, it can usefully run in parallel.
Iterating over the results with foreach
can restrict the extent to which the
query can execute in parallel, because foreach
asks for items one at a time—upstream
parts of the query may still be able to execute concurrently, but the
final results will be sequential. If you’d like to execute code for each
item and to allow work to proceed in parallel even for this final
processing step, PLINQ offers a ForAll
operator:
pq.ForAll(x => x.DoSomething());
This will execute the delegate once for each item the query returns, and can do so in parallel—it will use as many logical processors concurrently as possible to evaluate the query and to call the delegate you provide.
This means that all the usual multithreading caveats apply for the
code you run from ForAll
. In fact,
PLINQ can be a little dangerous as it’s not that obvious that your code
is going to run on multiple threads—it manages to make parallel code
look just a bit too normal. This is not always a problem—LINQ tends to
encourage a functional style of programming in its queries, meaning that
most of the data involved will be used in a read-only fashion, which
makes dealing with threading much simpler. But code executed by ForAll
is useful only if it has side
effects, so you need to be careful with whatever you put in
there.
To exploit the potential of multicore CPUs, you’ll need to run code on multiple threads. Threads can also be useful for keeping user interfaces responsive in the face of slow operations, although asynchronous programming techniques can be a better choice than creating threads explicitly. While you can create threads explicitly, the thread pool—used either directly or through the Task Parallel Library—is often preferable because it makes it easier for your code to adapt to the available CPU resources on the target machine. For code that needs to process large collections of data or perform uniform calculations across large ranges of numbers, data parallelism can help parallelize your execution without adding too much complication to your code.
No matter what multithreading mechanisms you use, you are likely to
need the synchronization and locking primitives to ensure that your code
avoids concurrency hazards such as
races. The monitor facility built into every .NET object, and exposed
through the Monitor
class and C#
lock
keyword, is usually the best
mechanism to use, but some more specialized primitives are available that
can work better if you happen to find yourself in one of the scenarios for
which they are designed.
[40] In fact, the CLR creates some utility threads for various purposes, so if you inspect the process’s thread count, you’ll see more than one.
[41] Always remember that even if you have not created any threads explicitly, that doesn’t mean you’re necessarily writing single-threaded code. Some .NET Framework classes will bring extra threads into play implicitly. For example, the CLR’s garbage collector runs finalizers on a distinct thread.
[42] More accurately, it’s whichever thread acquires the lock
Console.WriteLine
uses
internally to serialize access to the console.
[43] .NET does have an isolation mechanism: you can divide code into so-called appdomains. But this adds its own complications and is designed for slightly more coarse-grained divisions, and it’s really not well suited to this problem. ASP.NET can use this to isolate multiple web applications sharing a process, but does not use it to isolate individual requests.
[44] Yielding is when a thread tells the OS scheduler that it wants to give other threads a turn using the CPU now, rather than waiting to be preempted. If there are no other runnable threads, yielding does nothing, and the thread will continue to run.
[45] Asynchronous APIs tend to be used slightly differently in server-side code in web applications. There, they are most useful for when an application needs to communicate with multiple different external services to handle a single request.
[46] This isn’t always supported. For example, if you attempt such
an early call on an End
method
for a networking operation on the UI thread in a Silverlight
application, you’ll get an exception.
[47] In case you’ve come across continuations in the sense meant by languages such as Scheme that offer call with current continuation, be aware that this is not the same idea. There’s a tenuous connection in the sense that both represent the ability to continue some work sometime later, but they’re really quite different.