Chapter 16. Threads and Asynchronous Code

A quotation variously ascribed to A.J.P. Taylor, Arnold Toynbee, and Winston Churchill describes history as “just one thing after another.” C# code is much the same—we write sequences of statements that will be executed one after another. Loops and conditional statements spice things up a little by changing the order, but there is always an order. While individual bits of C# code behave this way, programs as a whole do not have to.

For example, web servers are able to handle multiple requests simultaneously. The user interface for a program working on a slow operation should be able to respond if the user clicks a Cancel button before that slow work is complete. And more or less any computer bought recently will have a multicore processor capable of executing multiple pieces of code simultaneously.

C# can handle this kind of concurrent work thanks to the .NET Framework’s support for multithreading and asynchronous programming. We have a wide array of concurrency tools and there are many ways to use them—each example in the previous paragraph would use a different combination of threading mechanisms. Since there are many ways to approach concurrency problems, it’s worth drawing a clear distinction between the most common reasons for using the techniques and features this chapter describes.

Perhaps the most easily understood goal is parallel execution. A computer with a multicore processor (or maybe even multiple separate processor chips) has the capacity to run multiple bits of code simultaneously. If your program performs processor-intensive tasks, it might be able to work faster by using several cores at once. For example, video encoding is a slow, computationally complex process, and if you have, say, a quad-core computer, you might hope that by using all four cores simultaneously you’d be able to encode a video four times faster than you could with a conventional one-thing-after-another approach. As we’ll see, things never work out quite that well in practice—video encoding on four cores might turn out to run only three times as fast as it does on one core, for example. But even though results often fall short of naive expectations, the ability to perform multiple calculations at the same time—in parallel, as it were—can often provide a worthwhile speed boost. You’ll need to use some of the programming techniques in this chapter to achieve this in C#.

A less obvious (but, it turns out, more widespread) use of multithreading is multiplexing—sharing a single resource across multiple simultaneous operations. This is more or less the inverse of the previous idea—rather than taking one task and spreading it across multiple processor cores, we are trying to run more tasks than there are processor cores. Web servers do this. Interesting websites usually rely on databases, so the typical processing sequence for a web page looks like this: inspect the request, look up the necessary information in the database, sit around and wait for the database to respond, and then generate the response. If a web server were to handle requests one at a time, that “sit around and wait” part would mean servers spent large amounts of time sitting idle. So even on a computer with just one processor core, handling one request at a time would be inefficient—the CPU could be getting on with processing other requests instead of idly waiting for a response from a database. Multithreading and asynchronous programming make it possible for servers to keep multiple requests on the go simultaneously in order to make full use of the available CPU resources.

A third reason for using multithreading techniques is to ensure the responsiveness of a user interface. A desktop application has different motives for multithreading than a server application—since the program is being used by just one person, it’s probably not helpful to build an application that can work on large numbers of requests simultaneously to maximize use of the CPU. However, even though an individual user will mostly want to do one thing at a time, it’s important that the application can still respond to input if the one thing being done happens to be going slowly—otherwise, the user may suspect that the application has crashed. So rather than trying to do numerous things at once, we have a less ambitious aim: work in progress shouldn’t stop us from being able to do something else as soon as the user asks. This involves techniques similar to those required for multiplexing, although the need for cancellation and coordination can make user interface code more complex than server code, despite having fewer things in progress at any one time.

A related reason for employing concurrency is speculation. It may be possible to improve the responsiveness to user input by anticipating future actions, starting on the work before the user asks for it. For example, a map application might start to fetch parts of the map that haven’t scrolled into view yet so that they are ready by the time the user wants to look at them. Obviously, speculative work is sometimes wasted, but if the user has CPU resources that would otherwise be sitting idle, the benefits can outweigh the effective cost.

Although parallel execution, multiplexing, and responsiveness are distinct goals, there’s considerable overlap in the tools and techniques used to achieve them. So the ideas and features shown in this chapter are applicable to all of these goals. We’ll begin by looking at threads.

Threads

Threads execute code. They keep track of which statement to execute next, they store the values of local variables, and they remember how we got to the current method so that execution can continue back in the calling method when the current one returns. All programs require these basic services in order to get anything done, so operating systems clearly need to be able to provide at least one thread per program. Multithreading just takes that a step further, allowing several different flows of execution—several threads—to be in progress at once even within a single program.

Example 16-1 executes code on three threads. All programs have at least one thread—the .NET Framework creates a thread on which to call your Main method[40]—but this example creates two more by using the Thread class in the System.Threading namespace. The Thread constructor takes a delegate to a method that it will invoke on the newly created thread when you call Start.

Example 16-1. Creating threads explicitly

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Thread t1 = new Thread(One);
        Thread t2 = new Thread(Two);

        t1.Start();
        t2.Start();

        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("Main: " + i);
        }

    }

    static void One()
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("One: " + i);
        }
    }

    static void Two()
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("Two: " + i);
        }
    }
}

All three threads do the same thing here—they loop around 100 times, printing out a message to show how far they’ve gotten. Here are the first few lines of output I get on my system:

Main: 0
Main: 1
Main: 2
Main: 3
Main: 4
Main: 5
Main: 6
Main: 7
Two: 0
One: 0
One: 1
One: 2
One: 3
One: 4
One: 5
One: 6
One: 7
Main: 8
Main: 9
Main: 10
Main: 11
...

You can see that the main thread managed to count up to 7 before the others got going—this is normal, because it takes a little while for a new thread to get up to speed, and it’s often possible for the thread that called Start to make considerable progress before the threads it created do any visible work. And once they’re underway, you can see that all three loops are making progress, although the interleaving is a little surprising.

This illustrates an important feature of multithreaded code—it tends to be somewhat unpredictable. This particular program can print something different each time it runs. We don’t want to fill the book with page after page of this kind of output, so here’s a quick summary of how a different run on the same machine started out: Main got up to 7 as before, then One printed the number 0, and after that, Two printed numbers from 0 all the way up to 27 before either of the other threads managed to get any more numbers out. And just for fun, here’s what we saw when running on a virtual machine hosted on the same hardware, but with just two virtual cores available in the VM: One managed to get all the way to 25 before Main got a look in, and Two didn’t print its first line until One had reached 41 and Main had reached 31. The specifics here are not all that interesting; the main point is the variability.

The behavior depends on things such as how many CPU cores the computer has and what else the machine was doing at the time. The fact that this particular example ends up with each individual thread managing to print out relatively long sequences before other threads interrupt is a surprising quirk—we got this output by running on a quad-core machine, so you’d think that all three threads would be able to run more or less independently. But this example is complicated by the fact that all the threads are trying to print out messages to a single console window. This is an example of contention—multiple threads fighting over a single resource. In general, it would be our responsibility to coordinate access, but the .NET Framework happens to resolve it for us in the specific case of Console output by making threads wait if they try to use the console while another thread is using it. So these threads are spending most of their time waiting for their turn to print a message. Once threads start waiting for things to happen, strange behaviors can emerge because of how they interact with the OS scheduler.

Threads and the OS Scheduler

Threads don’t correspond directly to any physical feature of your computer—a program with four threads running on a quad-core computer might end up running one thread on each core, but it doesn’t usually happen that way. For one thing, your program shares the computer with other processes, so it can’t have all the cores to itself. Moreover, one of the main ideas behind threads is to provide an abstraction that’s mostly independent of the real number of processor cores. You are free to have far more threads than cores. It’s the job of the operating system scheduler to decide which thread gets to run on any particular processor core at any one time. (Or, more accurately, which thread gets to run on any particular logical processor—see the sidebar on the next page.)

A machine will usually have lots of threads—a quick glance at the Windows Task Manager’s Performance pane indicates that this machine currently has 1,340 threads. Who’d have thought that writing a book would be such a complex activity? The extent to which this outnumbers the machine’s four CPU cores highlights the fact that threads are an abstraction. They offer the illusion that the computer has an almost endless capacity for executing concurrent tasks.

Threads are able to outnumber logical processors by this margin because most threads spend the majority of their time waiting for something to happen. Most of those 1,340 threads have called operating system APIs that have blocked—they won’t return until they have some information to provide. For example, desktop applications spend most of their time inside a Windows API call that returns messages describing mouse and keyboard input and the occasional system message (such as color scheme change notifications). If the user doesn’t click on or type into an application, and if there are no system messages, these applications sit idle—their main thread remains blocked inside the API call until there’s a message to return. This explains how a quad-core machine can support 1,340 threads while the CPU usage registers as just 1 percent.

When a blocking API is finally ready to return, the thread becomes runnable. The operating system’s scheduler is responsible for deciding which runnable threads get to use which logical processors. In an ideal world, you’ll have exactly enough runnable threads to make full and perfect use of the CPU cycles you’ve paid for. In practice, there’s usually a mismatch, so either one or more logical processors will be idle, or there will be contention for processing resources.

In the latter case—where there are more runnable threads than logical processors—the scheduler has to decide which threads currently most deserve to run. If a thread runs without blocking for a while (typically a few milliseconds) and there are other runnable threads, the OS scheduler may preempt that thread—it interrupts its execution, stores information about what it was doing at the point at which it was preempted, and gives a different thread some CPU time. If a logical processor becomes available later (either because enough threads block or because some other thread is preempted), the OS will restore the preempted thread’s state and allow it to carry on from where it left off. The length of time a thread is allowed to run before being preempted is known as a quantum.

The upshot of this is that even if you have more threads than logical processors, and all of those threads are trying to execute code simultaneously, the OS scheduler arranges for every one of them to make progress. This illusion has a price: preempting one thread and scheduling another in its place slows things down, so you’ll often want to use techniques—discussed later in this chapter—that avoid forcing the scheduler to do this more than necessary.

Note

.NET’s threading system is designed so that threads do not have to correspond directly to OS threads, but in practice they always do. At one point, Microsoft thought that .NET threads would need to be able to correspond to OS fibers, an alternative to threads where the application takes a more active part in scheduling decisions. This requirement came from a SQL Server 2005 feature that was cut shortly before the final release, so the distinction between OS threads and .NET threads is now essentially academic (although the feature could conceivably reemerge in a future version). It’s useful to be aware of this because a handful of API features are designed to accommodate this feature, and also because there are plenty of articles you may run into on the Internet written either before the feature was cut or by people who haven’t realized it was cut.

The Stack

Each thread has its own call stack, which means that items that live on the stack—function arguments and local variables—are local to the thread. We can exploit this to simplify Example 16-1, which contains three almost identical loops. Example 16-2 has just one copy of the loop, shared by all three threads.

Example 16-2. Per-thread state on the stack

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Thread t1 = new Thread(Go);
        Thread t2 = new Thread(Go);

        t1.Start("One");
        t2.Start("Two");

        Go("Main");
    }

    static void Go(object name)
    {
        for (int i = 0; i < 100; ++i)
        {
            Console.WriteLine("{0}: {1}", name, i);
        }
    }
}

The Go method here contains the common loop—it has been modified slightly to take an argument so that each thread can print out either One, Two, or Main as before. Running this produces similar output to the previous example. (It’s not identical, of course, because these examples produce slightly different output every time they run.)

Note

We used a different overload of the Start method—we’re now passing an argument. And less obviously, we’re using a different constructor overload for Thread too—Example 16-1 used a constructor that accepts a delegate to a method taking zero arguments, but Example 16-2 uses an overload that accepts a delegate to a method that takes a single object argument. This overload provides one way of passing information into a thread when you start it—the argument we pass to Start is passed on to the Go method here.
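To make those two overloads concrete, here is a small hedged snippet showing the delegate types involved; it assumes the One method from Example 16-1 and the Go method from Example 16-2:

// Thread's constructor accepts either of these delegate types:
ThreadStart noArgument = One;              // matches: static void One()
ParameterizedThreadStart oneArgument = Go; // matches: static void Go(object name)

Thread t1 = new Thread(noArgument);
Thread t2 = new Thread(oneArgument);

t1.Start();        // no argument to pass
t2.Start("Two");   // "Two" is passed on to Go as its object argument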

This example illustrates an important point: multiple threads can be executing the same function at the same time. All three threads in Example 16-2 spend most of their time inside the Go method. But since each thread gets its own stack, the values of the name argument and the loop variable (i) can be different for each thread.

Information that lives elsewhere is not intrinsically private to one thread. Example 16-3 shows another variation on our example. As with Example 16-2, it uses a common Go method to run a loop on all three threads, but the loop variable (i) is now a static field of the Program class—all three threads share the same variable.

Example 16-3. Erroneous sharing of state between threads

using System;
using System.Threading;

class Program
{
    // Visible to all threads. (Bad, in this example.)
    static int i;

    static void Main(string[] args)
    {
        i = 0;
        Thread t1 = new Thread(Go);
        Thread t2 = new Thread(Go);

        t1.Start("One");
        t2.Start("Two");

        Go("Main");
    }

    static void Go(object name)
    {
        // Modifying shared state without suitable protection - bad!
        for ( ; i < 100; ++i)
        {
            Console.WriteLine("{0}: {1}", name, i);
        }
    }
}

This example has a problem: all three threads will try to read and write the shared field, and things often go wrong when you do this. You might think that with three threads all sharing a single common counter, with each thread incrementing that counter every time they loop and each thread running until the counter hits 100, we’d just see all the numbers from 0 to 99 once. But it’s not quite that simple. For one thing, you might see all three threads print out 0, because they may all get to the point where they’re trying to print out the first value before any of them has gotten as far as trying to increment the counter. (Remember, a for loop executes its iterator clause—the ++i in this example—at the end of each iteration.) Then again you might not see that—it all really depends on when the OS scheduler lets the threads run. But there’s a subtler problem: if two threads both attempt to execute the ++i at the same time, we may see anomalous results—the value of i may end up being lower than the number of times it has been incremented, for example. If you want to share state between threads, you’ll need to use some of the synchronization mechanisms discussed later in this chapter.
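To give a flavor of what suitable protection looks like, here is a hedged sketch of one way to rewrite Go from Example 16-3 using the Interlocked class (which reappears later in this chapter); it guarantees that each of the values 1 through 100 is printed exactly once, by whichever thread happened to perform that particular increment:

static void Go(object name)
{
    while (true)
    {
        // Atomically add one to the shared counter and get the new value back.
        int current = Interlocked.Increment(ref i);
        if (current > 100)
        {
            break;
        }

        // No increments are lost, even when threads overlap.
        Console.WriteLine("{0}: {1}", name, current);
    }
}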

Be aware that using local variables is not necessarily a guarantee that the state you’re working with lives on the stack. For example, when using reference types (and most types are reference types) you need to keep in mind the distinction between the variable that contains the reference and the object to which that reference refers. Example 16-4 uses nothing but local variables, but ends up using the same StringBuilder object from all three threads—each thread might have its own local variable to refer to that object, but all three variables refer to the same object.

Note

Example 16-4 does something slightly unusual with the Thread constructor. Our Go method now requires two arguments—the StringBuilder and the name—but Thread doesn’t provide a way to pass in more than one argument; we get to choose an argument count of either zero or one. So we’re using a lambda here to provide a zero-argument method for Thread, and that lambda passes the two arguments into Go, including the new StringBuilder argument. It has also enabled us to declare that the Go method is expecting the name to be a string, rather than the less specific object type used in the previous example. This technique doesn’t have anything to do with threading; it’s just a useful trick when you find yourself confronted with an API that takes a delegate that doesn’t have enough arguments. (And it’s not the cause of the problem here. Less concise ways of passing the object in would have had the same problem, and so would the use of multiple methods, which Example 16-1 illustrated.)

Example 16-4. Local variables, but shared state

using System;
using System.Threading;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        StringBuilder result = new StringBuilder();

        // Sharing the StringBuilder between threads. BAD!
        Thread t1 = new Thread(() => Go(result, "One"));
        Thread t2 = new Thread(() => Go(result, "Two"));

        t1.Start();
        t2.Start();

        Go(result, "Main");

        t1.Join();
        t2.Join();

        Console.WriteLine(result);
    }

    static void Go(StringBuilder sb, string name)
    {
        for (int i = 0; i < 100; ++i)
        {
            // All threads using the same StringBuilder - BAD!
            sb.AppendFormat("{0}: {1}", name, i);
            sb.AppendLine();
        }
    }
}

By the way, you’ll have noticed that this code calls Join on both Thread objects. The Join method blocks until the thread has finished—this code needs to do that because it prints the output only once the threads are done. This is a simple example of coordinating operations across multiple threads. However, it’s not sufficient to avoid problems here. Looking at the output, it’s clear that all is not well. Here are the first few lines, running on a quad-core system:

Main: One: Two: 00
Main:
1

2
MainTwo: 3
Main: 1
2
2
Two: 3One: Two: 4
Two: One: 6Two: One: 7
One: 8
: One: 9

That’s a whole lot more chaotic than the previous examples, which merely scrambled the order of the lines, and lost the odd increment. The reason this has gone more obviously wrong is that with the earlier examples, our attempt to observe the system profoundly changed its behavior. (That happens a lot with multithreaded code.) The calls to Console.WriteLine were imposing some order on the system, because the .NET Framework was forcing the threads to take it in turns when printing their output—that’s why we don’t get lines mixed up with one another. But Example 16-4 does all of its work in memory using a StringBuilder, and since it calls Console.WriteLine just once when it’s done, to print the results, nothing is forcing things to happen in any particular order, and so we can see the chaos in full effect.

Note

There’s another reason Console.WriteLine is likely to have a significant effect on behavior: it’s relatively slow. The actual work being done by the examples so far is trivial—incrementing counters takes very little time, and concatenating strings is more complex but still pretty fast. Writing messages out to the screen is orders of magnitude slower. (The same would apply if you were writing messages to a logfile or the debugger.) So our previous examples were spending almost all of their time inside the code added to observe the code’s behavior, and almost no time executing the behavior we were hoping to observe.

This sort of problem makes multithreaded code remarkably resistant to debugging—it is far more sensitive to the observer’s paradox than other kinds of code. Generally speaking, as soon as you try to examine a threading bug, the bug goes away. (It comes back as soon as you stop looking, of course. Plenty of systems have gone live with debugging code compiled in because the debugging code made certain problems “go away.” Don’t rely on that—threading problems that appear to go away when you haven’t really fixed them are really just hiding.)

Just to be clear, the reason for the chaos is that even though each thread has its own local sb variable held privately on that thread’s stack, they all refer to the same StringBuilder object—we have three references to the same object. All three threads are trying to add output to the same StringBuilder at the same time, and the result is a mess.

You need to be absolutely clear in your head on where the information you’re working with lives, where it came from, and whether other threads might be able to see it. Objects created by a particular thread are OK as long as you never make them visible to other threads—Example 16-4 hit problems because the main thread created a StringBuilder and then arranged for it to be accessible from the other two threads.

This means you need to be especially careful when using nested methods—either anonymous delegates or lambdas—because these provide a way to share local variables between threads. Example 16-5 shows how the problem in Example 16-3 can happen even with value type local variables. It has just one loop count variable (i) shared between all the threads, but it does this without making it a field—it’s a local variable.

Example 16-5. Local value type variables as shared state

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        // Visible to all threads, thanks to use of
        // anonymous method. (Bad, in this example.)
        int i = 0;

        ParameterizedThreadStart go = delegate(object name)
        {
            // Modifying shared state without suitable protection - bad!
            for (; i < 100; ++i)
            {
                Console.WriteLine("{0}: {1}", name, i);
            }
        };


        Thread t1 = new Thread(go);
        Thread t2 = new Thread(go);

        t1.Start("One");
        t2.Start("Two");

        go("Main");
    }
}

This example demonstrates that while it may be convenient to think of value type local variables as living on the stack, it’s not always true. Example 16-5 contains an anonymous method that makes use of the local i variable declared by the containing method (Main), so the C# compiler has been obliged to convert that variable into a field inside a generated class, in order to make it possible for that one variable to be used from multiple methods.

To summarize: information that really does live on the stack is private to a particular thread. Unfortunately, using local variables doesn’t necessarily guarantee that the state you’re working with is on the stack. Be wary of reference types—no matter where the reference lives, the thing it refers to will not be on the stack, so you need to understand what other code might have a reference to the object you’re using. Be wary of value types whose implementation you do not control—value types are allowed to contain fields of reference types, so you’re not guaranteed to be safe just because you’re using a value type. And be wary of lambdas and anonymous methods—they can move information off the stack and into a place where it’s accessible to multiple threads at once. We’ll see later what to do if you really have to share information across threads.
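If the intent in Example 16-5 had been for each thread to count independently, the fix would be simply to declare the counter inside the anonymous method, so that each thread gets its own copy on its own stack—a minimal sketch:

ParameterizedThreadStart go = delegate(object name)
{
    // Declared inside the anonymous method, so each thread running this
    // delegate gets its own independent counter on its own stack.
    for (int i = 0; i < 100; ++i)
    {
        Console.WriteLine("{0}: {1}", name, i);
    }
};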

The examples we’ve seen so far create threads explicitly in order to illustrate the operation of multiple threads. But .NET often creates threads automatically without you having created Thread objects. The most obvious example is the thread that the .NET Framework calls your Main method on, but there are others—some of the asynchronous communication mechanisms we saw in Chapter 13 call back into your code on different threads than the one you started work on. We’ll be seeing more examples of this later in the chapter when we examine .NET’s Asynchronous Programming Model.

In fact, it’s relatively unusual to create new threads explicitly. If you need concurrent execution and you’re not using some part of the .NET Framework that supplies you with threads when you need them, it’s often better to use the thread pool or the Task Parallel Library, both of which we’ll see later.

One problem with explicit thread creation is in knowing how many to create. Threads are relatively expensive—each one consumes system resources, and there are also factors that can limit the number of threads in a single process. There’s also a cost in switching between threads—the context switch that occurs when the OS scheduler moves a thread from one logical processor to another. If you have many more runnable threads than logical processors, you’ll pay this cost on a very regular basis, and it can start to have a significant effect on throughput. In an ideal world you would have no more threads than logical processors, avoiding any context switch overhead. However, most threads block from time to time, so in reality you tend to need more threads than logical processors if you want to fully use your CPU cycles. In general, you should try to keep the thread count as low as is practical—a single program that creates more than a handful per logical processor is likely to have problems.

Warning

Never build a service that creates a new thread for each incoming request. This is a classic rookie mistake, as it seems like an obvious thing to do. It appears to work for light workloads, but it runs straight into two of the biggest performance problems you can hit with threads. First, creating threads is expensive, so if each thread exists only for as long as it takes to handle a single request, you risk spending more CPU time on setting up and destroying threads than on useful work. Second, this approach doesn’t limit the number of threads, so as the system gets busy, its performance will get disproportionately worse thanks to context switching overhead and the memory footprint of the resources associated with each thread. You can avoid these problems by using either the asynchronous patterns or the thread pool techniques described later in this chapter.

Creating just enough threads is often hard, because getting the balance right depends on things such as your application’s current workload, other work in progress on the machine, and characteristics of the machine itself. Fortunately, .NET provides the thread pool to make this sort of thing easier.

The Thread Pool

The .NET Framework provides a thread pool, which is a collection of worker threads available to perform short pieces of work. The thread pool continuously adjusts the number of threads that are allowed to process work items simultaneously in an attempt to optimize throughput.

The exact algorithm used to adjust the thread count is not documented, but as a general rule, if the system is not busy, work will typically be serviced very quickly after you queue it up. But as the computer becomes busier, items will sit in the queue for longer—the thread pool tries to avoid the overheads of preemption, thread switching, and resource contention by not running too much concurrent work. When a system is already busy, trying to process more work items would probably slow it down further, and so keeping items queued up and processing fewer at a time is likely to result in better overall performance.
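You don’t normally need to manage this process yourself, but the ThreadPool class does let you inspect its current settings, which can be useful when trying to understand its behavior. Here’s a brief sketch (worker threads run queued work items; the completion port threads relate to asynchronous I/O):

int workerThreads, completionPortThreads;

// The number of threads the pool will create on demand before it starts
// throttling the creation of new ones.
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("Minimum worker threads: {0}", workerThreads);

// The absolute upper limit on pool threads.
ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("Maximum worker threads: {0}", workerThreads);

// Roughly, how many more threads the pool could bring to bear right now.
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("Available worker threads: {0}", workerThreads);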

The simplest way to use the thread pool is through its QueueUserWorkItem method. Example 16-6 shows a modification to the previous examples—rather than creating threads, it uses the thread pool. QueueUserWorkItem takes a delegate to any method that accepts a single object as its argument, so it’s happy with the same Go method as Example 16-2. (Unlike the Thread constructor, there’s no overload that accepts a method without an argument—the thread pool insists on there being an argument whether you have a use for one or not.)

Example 16-6. Queuing work items for the thread pool

static void Main(string[] args)
{
    ThreadPool.QueueUserWorkItem(Go, "One");
    ThreadPool.QueueUserWorkItem(Go, "Two");

    Go("Main");
    // Problem: not waiting for work items to complete!
}

This example has a problem: if the main thread finishes first, the program may exit before the thread pool work items complete, so as it stands it only illustrates how to start work on the thread pool. Whether that matters depends on your application’s life cycle, but you will often need to add code to coordinate completion, as the sketch below shows. Running on a quad-core machine, this particular example behaves in much the same way as the previous ones, because the thread pool ends up creating a thread for each work item. On a single-core machine, you might see a difference—the pool could decide to let the first item run to completion and then run the second afterward.
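One simple way to coordinate completion—a hedged sketch only, using .NET 4’s CountdownEvent and the Go method from Example 16-2—is to have each work item signal when it finishes and make the main thread wait for both signals:

static void Main(string[] args)
{
    using (CountdownEvent done = new CountdownEvent(2))
    {
        ThreadPool.QueueUserWorkItem(state => { Go(state); done.Signal(); }, "One");
        ThreadPool.QueueUserWorkItem(state => { Go(state); done.Signal(); }, "Two");

        Go("Main");

        // Blocks until both work items have called Signal.
        done.Wait();
    }
}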

The thread pool is designed for fairly short pieces of work. One of the most important jobs it was originally introduced for was to handle web requests in ASP.NET web applications, so if you’re wondering how much work constitutes “fairly short,” a reasonable answer is “about as much work as it takes to generate a web page.”

.NET 4 introduces a new way to use the thread pool, the Task Parallel Library, which offers a couple of advantages. First, it handles certain common scenarios more efficiently than QueueUserWorkItem. Second, it offers more functionality. For example, tasks have much more comprehensive support for handling errors and completion, issues Example 16-6 utterly fails to address. If the main thread finishes before either of the work items is complete, that example will simply exit without waiting for them! And if you want the main thread to discover exceptions that occurred on the thread pool threads, there’s no easy way to do that. If any of these things is important to you, the Task Parallel Library is a better way to use the thread pool. There’s a whole section on that later in this chapter. For now, we’ll continue looking at some aspects of threading that you need to know, no matter what multithreading mechanisms you may be using.
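Before we do, here’s a hedged taste of what a Task-based version of Example 16-6 might look like. This is just a sketch—it assumes a using directive for System.Threading.Tasks, and the TPL section later in this chapter covers all of this properly:

static void Main(string[] args)
{
    Task t1 = Task.Factory.StartNew(() => Go("One"));
    Task t2 = Task.Factory.StartNew(() => Go("Two"));

    Go("Main");

    try
    {
        // Waits for both tasks, and rethrows any exceptions they threw,
        // wrapped in an AggregateException.
        Task.WaitAll(t1, t2);
    }
    catch (AggregateException ax)
    {
        foreach (Exception inner in ax.InnerExceptions)
        {
            Console.WriteLine("Work item failed: {0}", inner.Message);
        }
    }
}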

Thread Affinity and Context

Not all threads are equal. Some work can be done on only certain threads. For example, WPF and Windows Forms both impose a similar requirement: an object representing something in the user interface must be used only on the thread that created that object in the first place. These objects have thread affinity, meaning that they belong to one particular thread.

Note

Not all things with thread affinity are quite as obstinate as user interface elements. For example, while some COM objects have thread affinity issues, they are usually more flexible. (COM, the Component Object Model, is the basis of various Windows technologies including ActiveX controls. It predates .NET, and we’ll see how to use it from .NET in Chapter 19.) .NET handles some of the COM thread affinity work for you, making it possible to use a COM object from any thread. The main ways in which COM’s thread affinity will affect you are that certain objects will have different performance characteristics depending on which thread you call them on, and there may be additional complications if your COM objects use callbacks.

So thread affinity just means that the thread you’re calling on makes a difference. It doesn’t always mean that using the wrong thread is guaranteed to fail—it depends on what you’re using.

If you never write multithreaded code, you never have to worry about thread affinity—if you do everything on one thread, it will always be the right one. But as soon as multiple threads get involved—either explicitly or implicitly[41]—you may need to add code to get things back on the right thread.

ASP.NET has a similar problem. It makes contextual information about the current request available to the thread handling the request, so if you use multiple threads to handle a single request, those other threads will not have access to that contextual information. Strictly speaking, this isn’t a thread affinity issue—ASP.NET can use different threads at different stages of handling a single request—but it presents the same challenge to the developer: if you start trying to use ASP.NET objects from some random thread, you will have problems.

The .NET Framework defines a solution that’s common to WPF, Windows Forms, and ASP.NET. The SynchronizationContext class can help you out if you find yourself on the wrong thread when using any of these frameworks. Example 16-7 shows how you can use this in an event handler for a GUI application—the click handler for a button, perhaps.

Example 16-7. Handling thread affinity with SynchronizationContext

SynchronizationContext originalContext = SynchronizationContext.Current;

ThreadPool.QueueUserWorkItem(delegate
    {
        string text = File.ReadAllText(@"c:\temp\log.txt");

        originalContext.Post(delegate
        {
            myTextBox.Text = text;
        }, null);
    });

The code reads all the text in from a file, and that’s something that might take awhile. Event handlers in WPF and Windows Forms are called on the thread that the event source belongs to—a UI thread. (Or the UI thread if, like most desktop applications, you have only one UI thread.) You should never do slow work on a UI thread—thread affinity means that if your code is busy using that thread, none of the UI elements belonging to that thread will be able to do anything until you’re finished. The user interface will be unresponsive for as long as you keep the thread busy. So Example 16-7 uses the thread pool to do the work, keeping the UI thread free.

But the code wants to update the UI when it has finished—it’s going to put the text it has retrieved from the file into a text box. Since a text box is a UI element, it has thread affinity—we can update it only if we’re on the UI thread. This is where SynchronizationContext comes to the rescue.

Before starting the slow work, Example 16-7 reads the SynchronizationContext class’s Current property. This static property returns an object that represents the context you’re in when you read it—precisely what that means will depend on what UI framework you’re using. (The object you get back works differently depending on whether your application uses WPF, Windows Forms, or ASP.NET.) But the exact implementation doesn’t matter—you just need to hold on to it until you need to get back to that context. Having grabbed the context while we were in the click handler, we then kick off the work in the thread pool. And once that work is complete, it calls the stored SynchronizationContext object’s Post method. Post takes a delegate, and it’ll invoke that delegate back in whatever context you were in when you grabbed the context. So in this case, it’ll invoke our delegate on the UI thread that the button belongs to. Since we’re back on our application’s UI thread, we’re now able to update the text box.
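One detail worth knowing: Post queues the delegate and returns immediately, whereas the related Send method runs the delegate on the target context and waits for it to complete. A hedged sketch of the difference, reusing the names from Example 16-7:

// Post: fire and forget - the worker thread carries on immediately.
originalContext.Post(delegate { myTextBox.Text = text; }, null);

// Send: blocks the calling thread until the delegate has run on the UI thread.
originalContext.Send(delegate { myTextBox.Text = text; }, null);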

Common Thread Misconceptions

There are some persistent myths surrounding threads, which sometimes lead to their overuse. As the current trend seems to be for the number of logical processors in typical machines to edge ever upward, developers sometimes feel practically obliged to write multithreaded code. Since using threads correctly is difficult and error-prone, it’s worth tackling some of these myths, in case you’re in a situation where a single-threaded solution might actually be better.

Myth: Threads are necessary to get work done

You need a thread to run code, but that’s not the only kind of work computers do. In fact, it’s a fairly unusual program that spends most of its time executing code; CPU usage of 100 percent is often a sign that a program has hung. Computers contain various kinds of specialized hardware capable of getting on with work while the CPU is either idle or off doing something else—messages can be sent and received over the network, data can be read from and written to disk, graphics can be rendered, sound can be played. Code needs to run to coordinate these activities, but typically only at the start, when kicking off some work, and then again at the end, once the work completes. In between, all the interesting work is being done by the specialized hardware. The CPU has no role to play, and may well enter a low-power idle state where it’s not doing any work at all.

Note

That’s why the fans on some computers spin up into high speed when the machine gets busy—most of the time the CPU is asleep and consuming relatively little power. The cooling system’s full capacity is needed only when the CPU is executing code for sustained periods.

Nontrivial code in real programs tends to involve multifaceted work, so the CPU might have work to do at various stages besides the start and finish, but even then, you tend to see long periods of waiting for things to happen punctuated by short bursts of CPU activity, particularly when multiple machines are involved (such as a web server and a database server). Even fast computer networks can take hundreds of microseconds to send a message, and while that may appear instantaneous to human eyes, modern CPUs are able to execute hundreds of thousands of instructions in that time. Measured against how fast CPUs run, network communications always appear glacially slow. And it’s a similar story with most I/O.

The nature of I/O is often not obvious from the way APIs are structured. For example, look at the call to File.ReadAllText in Example 16-7—the obvious way to think of that is as a method that reads all the contents of a file off disk and returns the contents as text once it’s finished. It seems like the thread we use to call that method is busy doing work for as long as it takes. But in fact, most of the time the thread is inside that method, it will almost certainly be blocked—it won’t be in a runnable state, because it’s waiting for the disk controller in the computer to fetch the file’s content off disk. And unless the disk in question is a solid state drive, it could take milliseconds to get the information into memory—that part of the process will take orders of magnitude longer than the code inside ReadAllText that converts those bytes back into a .NET string object.

Note

Solid state drives change things only a little. You don’t need to wait for bits of metal to lumber into position, but the fact that they are physically separate components slows things down—it will take time for the disk controller hardware to send suitable messages to the drive, and for the drive to send a response. The difference between the time spent retrieving data and processing that data will not be as dramatic, but the code will still account for a small proportion.

The one situation in which this particular example might be dominated by CPU usage rather than time spent waiting for I/O is if the file in question is already in the operating system’s filesystem cache—when the OS reads data off disk it tends to keep a copy in memory for a while just in case you need it again. In that case, the disk doesn’t need to be involved at all, and it really is all CPU time. You need to be wary of this when testing performance—the first time a particular file is read will be much slower than all the rest. If you run a test hundreds of times to get a good average measurement, you’ll be ignoring the “first” and testing the “all the rest” case, but in a desktop application the one most users will notice is often the first case. For interactive code, the worst case can matter far more than the average.

An upshot of this is that using asynchronous APIs is sometimes much more effective than creating lots of threads. On the server, this can yield better throughput because you avoid the overheads associated with having more threads than you need. And on the client, it can sometimes simplify code considerably, as some asynchronous APIs let you work with a single-threaded model with no loss of responsiveness.
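As a hedged illustration of the idea (the asynchronous programming model shown here gets fuller treatment later in the chapter), this sketch starts a file read and returns immediately; the callback runs only once the disk controller has done its work, so no thread sits blocked in the meantime. The file path is just a placeholder:

FileStream file = new FileStream(@"c:\temp\log.txt", FileMode.Open, FileAccess.Read,
                                 FileShare.Read, 4096, FileOptions.Asynchronous);
byte[] buffer = new byte[(int) file.Length];

file.BeginRead(buffer, 0, buffer.Length, asyncResult =>
{
    // Called back once the data is in memory.
    int bytesRead = file.EndRead(asyncResult);
    Console.WriteLine("Read {0} bytes", bytesRead);
    file.Dispose();
}, null);

// This thread is free to get on with other work while the read is in progress.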

Myth: Multiple logical processors will necessarily make things faster

For years, processors managed to double CPU performance on a regular basis—a new processor would do everything that the one you bought a couple of years ago could do, but twice as fast. We were in the luxurious position where our code just got faster and faster without having to do anything, and because the growth was exponential—doubling up again and again—the cumulative effects were astounding. Computers are tens of millions of times faster than they were a few decades ago.

Sadly, that all stopped a few years ago because we started running into some harsh realities of physics. In response to this, CPU vendors have switched to providing more and more logical processors as a means of continuing to deliver processors that are, in some sense, twice as fast as the ones from a couple of years ago. They can do this because even though clock speeds have stopped doubling up, Moore’s law—that the number of transistors per chip doubles roughly every two years—is still in action for the time being.

Unfortunately, the doubling in speed between a single-core and dual-core system is hypothetical. Technically, the dual-core system might be able to perform twice as many calculations as the single-core one in any given length of time, but this is an improvement only if it’s possible to divide the work the user needs to do in a way that keeps both cores busy, and even if that’s possible, it’s effective only if other resources in the system such as memory and disk are able to provide input to the calculations fast enough for this double-speed processing.

Work often cannot progress in parallel. The second step of a calculation might depend on the results of the first step, in which case you can’t usefully run the two steps on different cores—the core running the second step would just have to sit around and wait for the result from the first step. It would probably be faster to run both steps on a single core, because that avoids the overheads of getting the results of the first step out of the first core and into the second core. Where the calculations are sequential, multiple cores don’t help.

So the nature of the work matters. Certain jobs are relatively easy to parallelize. For example, some kinds of image processing can be spread over multiple logical processors—if the processing is localized (e.g., applying a blur effect by smearing nearby pixels into one another), it’s possible for different logical processors to be working on different parts of the image. Even here you won’t get a 4x speedup on a four-core system, because some coordination might be necessary at the boundaries, and other parts of the system such as memory bandwidth may become a bottleneck, but these sorts of tasks typically see a useful improvement. However, these so-called embarrassingly parallel tasks are the exception rather than the rule—a lot of computation is sequential in practice. And of course, many problems live in a middle ground, where they can exploit parallelism up to a certain point, and no further. So there’s usually a limit to how far multithreading can help programs execute faster.

That doesn’t stop some people from trying to use as many logical processors as possible, or from failing to notice when doing so hasn’t made things any better. It’s easy to be distracted by achieving high CPU usage, when the thing you really want to measure is how quickly you can get useful work done.

Myth: Maxing the CPU must mean we’re going really fast

It’s possible to construct parallel solutions to problems that manage to use all the available CPU time on all logical processors, and yet which proceed more slowly than single-threaded code that does the same job on one logical processor. Figure 16-1 shows the CPU load reported by the Windows Task Manager for two different solutions to the same task.

The image on the left might make you feel that you’re making better use of your multicore system than the one on the right—the righthand side is using far less than half the available CPU capacity. But measured by elapsed time, the code that produced the lefthand image took about 15 times longer to complete the task than the code that produced the righthand one!

The job in hand was trivial—both examples just increment a field 400 million times. Example 16-8 shows both main loops. The Go function is the one that gets invoked concurrently on four threads. GoSingle just runs multiple times in succession to perform the iterations sequentially.


Figure 16-1. Using all logical processors (left) versus just one (right)

Example 16-8. Multithreaded versus single-threaded

class Program
{
    static int Count;
    const int Iterations = 100000000;

    static void Go()
    {
        for (int i = 0; i < Iterations; ++i)
        {
            Interlocked.Increment(ref Count);
        }
    }

    static void GoSingle(int repeat)
    {
        for (int outer = 0; outer < repeat; ++outer)
        {
            for (int i = 0; i < Iterations; ++i)
            {
                Count += 1;
            }
        }
    }

    ...

Here’s the code that launches Go concurrently:

Count = 0;
List<Thread> threads = (from i in Enumerable.Range(0, 4)
                        select new Thread(Go)).ToList();

threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

This creates four threads, all of which call Go. Next, it calls Start on each of the threads. Having started them all, it calls Join on each thread to wait for them all to complete. (We could have written three loops here, but our use of LINQ and lambdas makes the code much more compact. In particular, if you have a loop that invokes just one operation on every item in a list, the List<T> class’s ForEach method is a less cluttered way of expressing this than a foreach loop.) The code to launch the single-threaded version is a lot simpler:

Count = 0;
GoSingle(4);

Both produce the same result: the Count field contains 400,000,000 at the end. But the multithreaded version was much slower. One reason is the difference in how the two versions increment the Count. Here’s the line in question from the single-threaded code:

Count += 1;

But if we try that in the multithreaded version, it doesn’t work. It certainly makes it run nice and quickly—about three times faster than the single-threaded version—but trying it a few times, Count comes to 110,460,151, then 133,533,503, then 133,888,803... The majority of increments are getting lost—that’s the sort of thing that happens when you don’t use suitable protection for accessing shared state. That’s why the multithreaded version needs to do this:

Interlocked.Increment(ref Count);

The Interlocked class in the System.Threading namespace provides methods that perform certain simple operations in ways that work even if multiple threads try to use them on the same data at the same time. As the name suggests, Increment increments the count, and it does this in a way that locks out any other logical processors attempting to do the same thing at the same time. It forces the logical processors to take it in turns.

It works—the Count total is correct with this code in place—but at some cost. On a quad-core system, with all four cores burning away at 100 percent, this takes 15 times longer than the simple single-threaded solution.

In fact, the cost of Interlocked.Increment does not fully explain the difference. Modifying the single-threaded version to work the same way makes it run about five times slower, but that’s still three times faster than the multithreaded code. So a considerable amount of the slowdown is down to the communication costs between the processors.

Don’t take these numbers too seriously. This is clearly a contrived example. (If we wanted it to run really fast, we could just have initialized Count to 400,000,000 to start with and removed the loop.) But while the details are spurious, the basic principle applies broadly: the cost of contention between logical processors that are supposed to be cooperating can work against you. Sometimes these costs merely erode the benefit—you might see a 2.5x speedup on a quad-core system, for example. But sometimes they really do negate the benefit—contrived though this example may be, much worse examples have cropped up in real systems.

Note

Some implementations may come out worse on some systems and better on others. For example, some parallel algorithms take a considerable hit relative to their sequential counterparts in order to be able to scale well on more processors. Such an algorithm might make sense only on systems where you have a large number of processors—it might be slower than the single-threaded version on a dual-core system, but very worthwhile on a system with 16 logical processors, for example.

The bottom line is that if you want to understand whether a parallel solution is effective, you need to compare it against a single-threaded solution on the same hardware. Just because your CPU loads indicate that you’re being extremely parallel, that’s not necessarily the same as being really fast. And unless your code will only ever run on a single hardware configuration, you need to perform the same comparison on lots of different machines to get a good idea of how often a parallel solution might be the best choice.

Multithreaded Coding Is Hard

Even when multithreaded code provides a demonstrable performance benefit, it’s very hard to get right. We’ve already seen a few bizarre behaviors in some extremely simple examples. Achieving correct behavior in a real concurrent system can be very challenging. So we’ll look at two of the most common classes of pitfalls before examining some strategies for avoiding them.

Race conditions

The anomalies we’ve seen so far have all been examples of a kind of concurrency hazard known as a race, so called because the outcome is determined by which participant gets to a particular place first. Example 16-1 displays different output each time it runs, because there are three threads all trying to print to the same console window, and the only thing that determines which one gets to print the next line is whichever happens to be the next to call Console.WriteLine.[42] There is no coordination between the threads, and so they all pile in at once. It’s a relatively complicated example in some ways, because most of the race happens where we can’t see it—that example spends almost all of its time inside Console.WriteLine. It’s easier to understand races when you can see all of the code.

So consider the broken variation of Example 16-8 where the concurrently executing Go method used a simple Count += 1 instead of Interlocked.Increment. We saw that using the += operator resulted in lost increments, but why? The += operator has to do three things: first it must read the current value of Count, then it has to add 1 to that value, and finally it has to store the result back into the variable. The RAM chips in your computer don’t have the ability to perform calculations, so there’s no getting away from the fact that the value has to go into the CPU so that it can calculate the new value, and then it has to be written back out again so that the new value is not forgotten. There will always be a read and then a write.
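In other words, Count += 1 behaves as though it were written as three separate statements:

int temporary = Count;      // 1: read the current value
temporary = temporary + 1;  // 2: compute the new value
Count = temporary;          // 3: write the result back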

Consider what happens when two threads try to increment the same Count field. Let’s call them Thread A and Thread B. Table 16-1 shows one possible sequence of events. In this case it works out fine: Count starts at 0, is incremented twice, and ends up at 2.

Table 16-1. Two increments, one after the other

Count   Thread A              Thread B
0       Read Count (0)
0       Add 1 (0 + 1 = 1)
1       Write Count (1)
1                             Read Count (1)
1                             Add 1 (1 + 1 = 2)
2                             Write Count (2)

But it might not work so well. Table 16-2 shows what happens if the work overlaps, as could easily happen with multiple logical processors. Thread B reads the current Count while Thread A was already part of the way through the job of incrementing it. When Thread B comes to write back its update, it has no way of knowing that Thread A has updated the value since B did its read, so it effectively loses A’s increment.

Table 16-2. Lost increment due to overlap

Count   Thread A              Thread B
0       Read Count (0)
0       Add 1 (0 + 1 = 1)     Read Count (0)
1       Write Count (1)       Add 1 (0 + 1 = 1)
1                             Write Count (1)

There are lots of variations on the order, some of which work fine and some of which fail. If your code makes possible an ordering that produces wrong results, sooner or later you’ll run into it.

Warning

Don’t fall into the trap of believing that a highly improbable outcome is effectively impossible. You’ll be fooling yourself—sooner or later the problem will bite. The only difference with the highly improbable problems is that they’re extremely hard to diagnose and debug.

The example shown here is about as simple as a race gets. With real code things tend to be a lot more complex, as you will probably be dealing with data structures more intricate than a single integer. But in general, if you have information which is visible to multiple threads and at least one of those threads is changing that information in any way, race conditions are likely to emerge if you don’t take steps to prevent them.

The solution to races is, on the face of it, obvious: the threads need to take it in turns. If threads A and B simply coordinated their operations so that either would wait until the other was done when an update is in progress, we could avoid the problem. Interlocked.Increment does exactly that, although it’s rather specialized. For the occasions when you’re doing something more complex than incrementing a field, .NET provides a set of synchronization mechanisms that let you force threads to take it in turns. We’ll get to these shortly. However, this solution introduces another class of problem.

Deadlocks and livelocks

When code waits in order to avoid stepping on other threads’ toes, it’s possible for the application to lock up, because all the threads can end up waiting for each other to finish. This tends not to happen with simple, short operations involving just a single piece of data. Lockups typically occur when a thread that already has exclusive access to some data starts waiting for something else.

The standard example involves transferring money between two bank accounts; let’s call them X and Y. Suppose two threads, A and B, are both attempting to transfer money between these two accounts; A transfers money from X to Y while B transfers from Y to X. Both threads will need to use some sort of synchronization mechanism to get exclusive access to the accounts in order to avoid race conditions of the kind previously discussed. But imagine that the following happens:

  1. Initially, no threads are attempting to do anything to either account.

  2. Thread A gets exclusive access to Account X.

  3. Thread B gets exclusive access to Account Y.

  4. Thread A attempts to get exclusive access to Account Y—it can’t because B has access, so A waits for B to relinquish Account Y.

  5. Thread B attempts to get exclusive access to Account X—it can’t because A has access, so B waits for A to relinquish Account X.

The exact mechanism used to manage exclusive access is irrelevant because the outcome is the same: A has come to a halt waiting for B to let go of Y. But B isn’t going to let go of Y until it has managed to acquire X, and unfortunately it won’t be able to—A is in possession of X and has just come to a halt. Neither side can proceed because each is waiting for the other to let go. This is sometimes known as a deadly embrace.
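
The following sketch shows how little code it takes to set up that situation with monitors. The Account class and Transfer method are invented for illustration, but running Transfer(x, y, ...) on one thread and Transfer(y, x, ...) on another can produce exactly the deadly embrace just described.

class Account
{
    public decimal Balance;
    public readonly object Lock = new object();
}

static void Transfer(Account from, Account to, decimal amount)
{
    lock (from.Lock)        // thread A locks X while thread B locks Y
    {
        lock (to.Lock)      // now each thread waits for the other's lock
        {
            from.Balance -= amount;
            to.Balance += amount;
        }
    }
}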

This condition can cause both deadlocks and livelocks, and the distinction has to do with the mechanism used to manage exclusive access. If threads go into a blocked state while waiting for access, neither thread is runnable once we hit the deadly embrace, and that’s typically described as a deadlock—the symptom is a system that has gone idle, despite having work to be getting on with. A livelock is similar, but tends to involve synchronization mechanisms that use CPU cycles while waiting—some synchronization primitives actively poll for availability rather than blocking. Active polling is just as subject to a deadly embrace as a blocking approach, it just has different symptoms—livelocks hang with high CPU usage.

Warning

The two concurrency hazards just described—races and deadly embraces—are not the only kinds of multithreading problems. There are endless ways in which you can get into trouble in concurrent systems, so we can really only scratch the surface. For example, besides issues that can compromise the correct behavior of your code, a whole host of concurrency issues can cause performance problems. For a deep discussion of the issues and what to do about them, we recommend Concurrent Programming on Windows by Joe Duffy (Addison-Wesley).

What can we do to avoid the numerous pitfalls of multithreaded code?

Multithreading Survival Strategies

There are several approaches for mitigating the difficulties of multithreading, each with a different trade-off between difficulty and flexibility.

Abstinence

Obviously, the simplest way to avoid the risks inherent in multithreading is not to do it at all. This doesn’t necessarily mean abandoning everything in this chapter, however. One of the asynchronous patterns can enable certain kinds of applications to get some of the benefits of asynchrony while sticking with a single-threaded programming model.

Isolation

If you’re going to have multiple threads, a good way to keep things simple is to avoid sharing information between them. ASP.NET encourages this model—it uses the thread pool to handle multiple requests simultaneously, but by default each individual request runs your code on just one thread. (You can opt into an explicitly asynchronous model if you want to use multiple threads per request, but for straightforward scenarios the single-threaded style is best.) So although the web application as a whole is able to run multiple concurrent threads, those threads don’t interact.

This approach requires some discipline. There’s nothing in .NET that enforces isolation[43]—you simply have to choose not to share data between threads. In a web application, that’s relatively easy because HTTP naturally discourages stateful communications, although if you start using caching techniques to improve performance, you lose some isolation because all your requests will end up using shared objects in your cache. And any information in a static field (or any object reachable directly or indirectly from a static field) is potentially shared.

Chances are good that most multithreaded applications will have at least some information that needs to be accessed by several threads, so complete isolation may not be realistic. But maximizing isolation is a good idea—keeping as much information as possible local to individual threads means not having to worry about concurrency hazards for any of that information.

Immutability

When you really have to share data, you can often avoid many concurrency hazards by sharing only immutable data, that is, data that cannot be altered. Fields marked with readonly cannot be modified after construction—the C# compiler enforces this—so you don’t have to worry about whether those fields are being changed by other threads as you try to use them. You need to be careful, though—readonly applies only to the field itself, and not the object the field refers to if it’s a reference type. (And even if the field is a value type, if that value itself contains fields of reference types, the objects being referred to are not affected by readonly.) So as with isolation, this is an option that requires some discipline.
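
To illustrate that caveat, here is a hypothetical example. The readonly keyword stops the field from being pointed at a different list after construction, but it does nothing to stop threads from modifying the list the field refers to.

using System.Collections.Generic;

class RecentFiles
{
    // The field itself cannot be reassigned after construction...
    private readonly List<string> files = new List<string>();

    public void Remember(string path)
    {
        // ...but the list it refers to is still mutable, so concurrent
        // calls to this method can still race. readonly alone does not
        // make this class safe to share between threads.
        files.Add(path);
    }
}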

Synchronization

If you’re writing multithreaded code, sooner or later you will probably need to have at least some information that is accessible to multiple threads, and which occasionally needs to be changed—isolation and immutability are sometimes simply not options. In this case, you’ll need to synchronize access to the data—for example, anytime shared information is being modified, you’ll need to make sure no other threads are trying to read or write that information. This requires the most discipline of any of the solutions described here, and is likely to be the most complex, but it offers the most flexibility.

The .NET Framework provides a wide range of features to help you synchronize the way your threads access shared information, and these are the topic of the next section.

Synchronization Primitives

There are two important ways in which the operations of multiple threads may need to be coordinated. When you have shared modifiable data, it needs to be possible to make threads take it in turns to access that data. But it’s also often important for threads to be able to discover when something has happened—a thread might want to enter a blocking state until such time as it has useful work to do, for example. So some synchronization primitives provide notification rather than exclusive access. Some offer a combination of the two.

Monitor

The most widely used synchronization primitive in .NET is the monitor. It is supported directly by the .NET Framework—any object can be used with this facility—and also by C#, which provides a special keyword for working with monitors. Monitors offer both mutual exclusion and notification.

The simplest use of a monitor is to ensure that threads take it in turns to access shared state. Example 16-9 shows some code that would need the kind of protection a monitor can provide before we could use it from multiple threads. It is designed for handling lists of recently used strings—you might use this sort of code to provide a recently used file list on an application’s File menu. This code makes no attempt to protect itself in the face of multithreading.

Example 16-9. Code unsuitable for multithreading

class MostRecentlyUsed
{
    private List<string> items = new List<string>();
    private int maxItems;

    public MostRecentlyUsed(int maximumItemCount)
    {
        maxItems = maximumItemCount;
    }

    public void UseItem(string item)
    {
        // If the item was already in the list, and isn't the first
        // item, remove it from its current position, since we're
        // about to make it the first item.
        int itemIndex = items.IndexOf(item);
        if (itemIndex > 0)
        {
            items.RemoveAt(itemIndex);
        }

        // If the item's already the first, we don't need to do anything.
        if (itemIndex != 0)
        {
            items.Insert(0, item);

            // Ensure we have no more than the maximum specified
            // number of items.
            if (items.Count > maxItems)
            {
                items.RemoveAt(items.Count - 1);
            }
        }
    }

    public IEnumerable<string> GetItems()
    {
        return items.ToArray();
    }
}

Example 16-10 is some test code to exercise the class.

Example 16-10. Testing the MostRecentlyUsed class

const int Iterations = 10000;

static void TestMru(MostRecentlyUsed mru)
{
    // Initializing random number generator with thread ID ensures
    // each thread provides different data. (Although it also makes
    // each test run different, which may not be ideal.)
    Random r = new Random(Thread.CurrentThread.ManagedThreadId);
    string[] items = { "One", "Two", "Three", "Four", "Five",
                       "Six", "Seven", "Eight" };
    for (int i = 0; i < Iterations; ++i)
    {
        mru.UseItem(items[r.Next(items.Length)]);
    }
}

Example 16-10 just feeds in strings from a fixed set of items at random. Calling this test function from just one thread produces the expected results: at the end, the MostRecentlyUsed object just returns the most recent items put into it by this test. However, the multithreaded test in Example 16-11 causes something quite different to happen.

Example 16-11. Executing a multithreaded test

MostRecentlyUsed mru = new MostRecentlyUsed(4);

const int TestThreadCount = 2;
List<Thread> threads = (from i in Enumerable.Range(0, TestThreadCount)
                        select new Thread(() => TestMru(mru))).ToList();

threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

foreach (string item in mru.GetItems())
{
    Console.WriteLine(item);
}

This example crashes on a multicore machine—after a while, it throws an ArgumentOutOfRangeException. It doesn’t crash at the same place on every run; it crashes inside either of the two calls to the List<T> class’s RemoveAt method.

The exceptions occur due to races. For instance, consider this line of code from Example 16-9:

items.RemoveAt(items.Count - 1);

This reads the value of the Count property, then subtracts 1 to get the index of the last item in the list, and then removes that last item. The race here is that some other thread may manage to remove an item from the list in between this thread reading the Count property and calling RemoveAt. This causes the method to throw an ArgumentOutOfRangeException, because we end up asking it to remove an item at an index that’s after the final item.

In fact, we’re lucky we got an exception at all. The List<T> class makes no guarantees when we use it from multiple threads. Here’s what the documentation for the class says in the Thread Safety section:

Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

This means that it’s our problem to make sure we never try to use a List<T> instance from more than one thread at a time. It could fail in subtler ways than crashing—it could corrupt data, for example.

Note

List<T> is not unusual. Most types in the .NET Framework class library make no guarantees of thread safety for instance members.

We could add similar documentation to our MostRecentlyUsed class, declaring that it does not make any guarantees either. In fact, that might well be the best option—it’s very difficult for an individual class to guarantee to work correctly in all possible multithreading scenarios. Only the application that uses the class really knows what constitutes correct behavior. For example, it might be necessary for a MostRecentlyUsed object to be kept in sync with some other object, in which case the application is going to have to manage all synchronization itself, and there’s nothing useful that our class could do on its own. This is one reason why the lack of thread safety guarantees is so widespread in the class libraries—there isn’t a good general-purpose definition of thread-safe for individual types.

If we decide to make it the application’s problem, how would that look? We don’t have a real application here, only a test, so our test code would need to synchronize its calls into our object. Example 16-12 shows a suitably modified version of the test method from Example 16-10. (Note that Example 16-11 adds code that also uses the same object, so you might think we need to make a similar modification there. However, it waits until all the test threads have finished before touching the object, so its reads won’t overlap with their writes, making locking superfluous. Therefore, Example 16-12 is sufficient in this case.)

Example 16-12. Synchronization in the calling code

static void TestMru(MostRecentlyUsed mru)
{
    Random r = new Random(Thread.CurrentThread.ManagedThreadId);
    string[] items = { "One", "Two", "Three", "Four", "Five",
                       "Six", "Seven", "Eight" };
    for (int i = 0; i < Iterations; ++i)
    {
        lock (mru)
        {
            mru.UseItem(items[r.Next(items.Length)]);
        }
    }
}

The only modification here is to wrap the call to the MostRecentlyUsed type’s UseItem method with a lock block. The C# lock syntax generates some code that uses the Monitor class, along with some exception handling. Here’s what the lock block in Example 16-12 is equivalent to:

MostRecentlyUsed referenceToLock = mru;
bool lockAcquired = false;
try
{
    Monitor.Enter(referenceToLock, ref lockAcquired);
    mru.UseItem(items[r.Next(items.Length)]);
}
finally
{
    if (lockAcquired)
    {
        Monitor.Exit(referenceToLock);
    }
}

(This is what C# 4.0 generates. Older versions do something slightly simpler that mishandles an obscure and unusual failure mode. But the basic idea is the same in either case. The generated code copies the mru reference into a separate variable to ensure correct operation even if the code inside the lock block were to change mru.)

The documentation says that Monitor.Enter acquires an exclusive lock on the object passed as the first argument, but what exactly does that mean? Well, the first thread to do this will find that Monitor.Enter returns immediately. Any other threads that try to make the same call on the same object will be made to wait—Monitor.Enter on those other threads will not return until the thread that currently owns the lock releases it by calling Monitor.Exit. Only one thread can hold the lock at any time, so if multiple other threads are waiting for the same object’s lock in Monitor.Enter, the .NET Framework will pick just one as the next owner of the lock, and the other threads will continue to be blocked.

Note

Holding a lock on an object has only one effect: it prevents any other thread from acquiring a lock on that object. It does nothing else. In particular, it does not prevent other threads from using that object. So it would be a mistake to think that acquiring the lock on an object means you have locked the object. That may sound like splitting hairs, but it’s the difference between working and broken code.

Use of monitors is entirely a matter of convention—it is up to you to decide which objects’ locks you use to protect which information. Example 16-12 happens to acquire a lock on the very object whose state is being protected, but in fact, it’s quite common—preferable, even—to create separate objects whose only job is to be the thing on which you acquire a lock. There are a couple of reasons for this. First, you’ll often want multiple pieces of data to fall under the protection of a single lock—perhaps updates to our MostRecentlyUsed object need to be done in conjunction with changes to other state within the application, such as a history-tracking service. When multiple objects are involved, arbitrarily choosing one of those objects to act as the lock target is likely to make your code harder to follow, because it may not be clear to someone reading your code that you’re using that object’s lock to protect multiple objects rather than just the one whose lock you’re acquiring. If you create a special object whose only purpose is locking, this makes it clear to anyone reading your code that she needs to think about what state that lock protects.

The other reason to avoid acquiring a lock on the object you want to synchronize access to is that you can’t always be sure that the object doesn’t acquire a lock on itself—some developers write lock(this) inside instance methods when trying to make thread-safe objects (for whatever definition of thread-safe they have chosen). It is a bad practice to acquire a lock on your this reference, of course, because you can’t necessarily know whether someone using your object will decide to try to acquire a lock on it for his own purposes—internal locking is an implementation detail, but your this reference is public, and you don’t usually want implementation details to be public.

So in short, you shouldn’t try to acquire a lock on any object you are trying to synchronize access to.

Bearing all that in mind, what should we do if we want to try to make our MostRecentlyUsed class more robust in multithreaded environments? First, we need to decide what multithreading scenarios we want to support. Simply declaring that we want the type to be thread-safe is meaningless.

So let’s say that we want to allow multiple threads to call UseItem and GetItems simultaneously without causing exceptions. That’s a pretty weak guarantee—notice we’ve said nothing about what state the object will be in afterward, merely that it won’t actually blow up. Surely it would be better to guarantee to handle the calls in the order in which they were made. Unfortunately, we can’t do this if the locking logic lives entirely inside the class. The OS scheduler might decide to preempt a thread moments after it called UseItem, and before it has had a chance to get to any of our synchronization code.

For example, consider what could happen if Thread A calls UseItem, and then before that call returns, Thread B also calls UseItem, and before either returns, Thread C calls GetItems. It’s fairly easy to think of at least five reasonable outcomes. GetItems might return neither of the items passed in by A and B. It might return both, and there are two ways it could do this—GetItems returns an ordered list, and either A or B might come first. Or it could return just one—either the one passed by A or the one passed by B. If you need coordination across multiple calls like this, it’s not going to be possible to provide it inside MostRecentlyUsed, because you only get the opportunity to start synchronization work once calls are already underway. This is another reason why synchronization code usually belongs at the application level and not in the individual objects. So about the best we can hope to achieve within this class is to prevent the exceptions from occurring when it’s used from multiple threads. Example 16-13 does this.

Example 16-13. Adding locking to a class

class MostRecentlyUsed
{
    private List<string> items = new List<string>();
    private int maxItems;
    private object lockObject = new object();

    public MostRecentlyUsed(int maximumItemCount)
    {
        maxItems = maximumItemCount;
    }

    public void UseItem(string item)
    {
        lock (lockObject)
        {
            // If the item was already in the list, and isn't the first item,
            // remove it from its current position, since we're about to make
            // it the first item.
            int itemIndex = items.IndexOf(item);
            if (itemIndex > 0)
            {
                items.RemoveAt(itemIndex);
            }

            // If the item's already the first, we don't need to do anything.
            if (itemIndex != 0)
            {
                items.Insert(0, item);

                // Ensure we have no more than the maximum specified
                // number of items.
                if (items.Count > maxItems)
                {
                    items.RemoveAt(items.Count - 1);
                }
            }
        }
    }

    public IEnumerable<string> GetItems()
    {
        lock (lockObject)
        {
            return items.ToArray();
        }
    }
}

Notice that we added a new field, lockObject, which holds a reference to an object whose only job is to be the thing on which we acquire a lock. And we simply acquire this lock inside the methods that work with the list of items. We have to hold the lock for the whole of the UseItem method, because the code looks at the state of the items list right at the start, and then the rest of its operation is guided by what it found. The code simply won’t work if the items list changes halfway through, and so we hold on to the lock for the duration.

In this particular case, holding the lock for the whole method is unlikely to cause problems because this method won’t take long to run. But as a general rule, you want to avoid holding locks for any longer than necessary. The longer you hold a lock, the greater the chances of some other thread wanting to acquire the same lock, and being forced to wait. It’s a particularly bad idea to call code that might make a request over a network and wait for a response while you’re holding a lock (e.g., holding a lock while making a request to a database).

Warning

Be particularly wary of acquiring multiple locks—holding on to one lock while attempting to acquire another is a recipe for deadlock. Sometimes that’s unavoidable, in which case you need a strategy for avoiding deadlocks. That’s beyond the scope of this book, but if you find yourself in this situation, lock leveling is a well-established solution—searching the Web for “lock leveling for multithreading” is a good place to start.

As we mentioned several pages ago, the Monitor class isn’t just about locking. It also provides a form of notification.

Notification

Suppose we want to write some code that tests our MostRecentlyUsed class in multithreaded scenarios. Even relatively simple tests pose a challenge: for example, what if we want to verify that after a call to UseItem has returned on one thread, the item it passed in becomes visible as the first item returned if some different thread calls GetItems? We’re not testing concurrent use—we’re just testing sequential operations, where one thing happens on one thread and then something else happens on another. How would we write a test that coordinated these steps across threads? We need one thread to wait until the other has done something. We could just use the Thread class’s Join method again, waiting for the first thread to exit. But what if we don’t want to let it exit? We might want to perform a sequence of operations, with each thread taking it in turns.

Monitors can help with this—as well as protecting shared state, they also provide a way to discover when that state may have changed. The Monitor class provides a Wait method that operates in conjunction with either a method called Pulse or the related PulseAll. A thread that is waiting for something to change can call Wait, which will block until some other thread calls Pulse or PulseAll. You must already hold the lock on the object you pass as an argument to Wait, Pulse, or PulseAll. Calling them without possessing the lock will result in an exception.

Example 16-14 uses this mechanism to provide the ability for one thread to wait for a second thread to do something. The class’s only interesting state is a single bool field, canGo, which is initially false but will be set to true when the second thread does whatever we’re waiting for—that thread will call GoNow to indicate this. Since this field is going to be used from multiple threads, we need synchronization, so WaitForIt also has a lockObject field which refers to an object whose only job is to be the object for which we acquire a lock in order to protect access to canGo.

Warning

You should never attempt to acquire a lock directly on a bool, or on any other value type. You can acquire a lock only on a reference type, so if you attempt to pass a bool to Monitor.Enter, the C# compiler will do what it always does when you pass a value to a method that expects an object: it will create code that generates a box for the bool, as we saw in Chapter 4. You would be acquiring a lock on that box, not on the bool itself. That’s a problem, because you get a new box every time, and so your locking would do nothing useful.

The lock keyword in C# prevents you from trying to acquire a lock on a value—you’ll get a compiler error if you try. But if you call the Monitor class’s methods directly, C# will not prevent you from making this mistake. So this is another good reason to get into the habit of creating an object that is separate from the state it protects, and acquiring locks on that object.

Example 16-14. Coordinating threads with Monitor

class WaitForIt
{
    private bool canGo;
    private object lockObject = new object();

    public void WaitUntilReady()
    {
        lock (lockObject)
        {
            while (!canGo)
            {
                Monitor.Wait(lockObject);
            }
        }
    }

    public void GoNow()
    {
        lock (lockObject)
        {
            canGo = true;
            // Wake me up, before you go go.
            Monitor.PulseAll(lockObject);
        }
    }
}

Both methods in this example acquire the lock before doing anything, because both inspect the canGo field, and we expect these to be called on different threads. WaitUntilReady then sits in a loop until that field is true. Each time it goes around the loop, it calls Monitor.Wait. This has three effects: first, it relinquishes the lock—that’s important, because otherwise, the thread that called GoNow would never get as far as setting the canGo field; second, it makes the thread calling WaitUntilReady block until some other thread calls either Pulse or PulseAll for lockObject; third, when Wait returns, it reacquires the lock.

Note

Why use a loop here? Wouldn’t an if statement followed by a single call to Wait work? In this case it would, but in general it’s surprisingly easy to end up generating spurious notifications. Suppose we modified this example so that as well as offering a GoNow, we had a third method called OhHangOnAMinute which put the canGo field back to false—the class becomes a gate which can open and close. It would be possible that by the time WaitUntilReady is woken up after a call to GoNow, the field had already transitioned back to false because of a call to OhHangOnAMinute.

And while that can’t happen with this simpler example, in general it’s good to get in the habit of checking to see if the condition you were waiting for really holds when you come out of a wait, and be prepared to wait again if it doesn’t.

The GoNow method acquires the lock to make sure it’s safe to modify the canGo field, which it sets to true. Then it calls PulseAll—this tells the .NET Framework to wake up all threads currently waiting on lockObject as soon as we release the lock. (Pulse would just release a single thread, but since our WaitForIt class just has two states—not ready and ready—it needs to release all waiting threads when it becomes ready.) GoNow then returns, releasing the lock as the flow of execution leaves the lock block, which means that any threads waiting inside WaitUntilReady are now no longer blocked waiting for the pulse.

However, if multiple threads are waiting, they won’t all start running at once, because Monitor.Wait reacquires the lock before returning. It relinquishes the lock only temporarily while it waits—it insists that we hold the lock before calling it, and we will be holding the lock again when it returns. Consequently, if PulseAll happened to release multiple threads, they still have to take it in turns as they come out of Wait.

When WaitUntilReady gets to proceed, the loop will check canGo again, and this time it will be true and the loop will finish. The code will then leave the lock block, releasing the lock on lockObject, enabling the next waiting thread (if there are any) to do the same thing—and so all waiting threads will become unblocked one after another.

Note

The monitor’s close integration between locking and notification may seem a little odd—it’s even getting in our way here. This example would work perfectly well if all waiting threads were released simultaneously, instead of having to wait while they acquire the lock in turn. But in fact, the combined locking and notification is critical to most uses of Pulse and Wait. Notifications concern a change to shared state, so it’s vital that code that raises notifications be in possession of the lock, and also that when code discovers a notification, it is in possession of the lock so that it can look at the modified state immediately. Without this, all sorts of subtle races can occur in the gap between notification and lock acquisition or the gap between releasing a lock and waiting for notification.

Example 16-15 shows a simple program that uses the WaitForIt class from Example 16-14. It creates a thread that waits for a while and then calls the GoNow method. The main thread waits for that to happen by calling the WaitUntilReady method after starting the thread.

Example 16-15. Using WaitForIt

class Program
{
    static void Main(string[] args)
    {
        WaitForIt waiter = new WaitForIt();

        ThreadStart twork = delegate
        {
            Console.WriteLine("Thread running...");
            Thread.Sleep(1000);
            Console.WriteLine("Notifying");
            waiter.GoNow();
            Console.WriteLine("Notified");
            Thread.Sleep(1000);
            Console.WriteLine("Thread exiting...");
        };

        Thread t = new Thread(twork);
        Console.WriteLine("Starting new thread");
        t.Start();
        Console.WriteLine("Waiting for thread to get going");
        waiter.WaitUntilReady();
        Console.WriteLine("Wait over");

    }
}

The output shows why this sort of coordination is often necessary:

Starting new thread
Waiting for thread to get going
Thread running...
Notifying
Notified
Wait over
Thread exiting...

Notice that the new thread didn’t start up immediately—the main thread prints its “Waiting for thread to get going” message after calling Start to run the thread, but this message appears before the new thread prints “Thread running...” which is the very first thing that thread does. In other words, just because the Thread class’s Start method has returned, you have no guarantee that the newly created thread has actually done anything yet. Only through the use of coordination mechanisms such as Wait and Pulse can we impose some kind of order across multiple threads.

Warning

Never use Thread.Sleep to try to solve ordering problems in production code—it’s not a dependable or efficient technique. Example 16-15 uses it to make the coordination problems more visible, but while it can be used to amplify or explore problems in examples, you cannot rely on it, because it makes no guarantees—making one thread sleep for a while to give another thread a chance to catch up does not guarantee that the other thread will catch up, particularly on systems that experience heavy load.

The main thread happens not to get to run immediately after the other thread calls GoNow. (Or at least, if it did run, it didn’t run for long enough to print out its “Wait over” message—the other thread got in there first with its “Notified” message.) You might see slightly different results each time you run. Even though we can impose a little bit of order there’s still going to be quite a lot of unpredictability in the exact order of events. As you design multithreaded code, it’s important to be very clear about how much order you are imposing with locking and notifications—in this case, we are guaranteeing that our main thread cannot possibly get to the line that prints out “Wait over” before the second thread has reached the line that calls GoNow, but that’s the only constraint—the progress of the two threads could still be interleaved in numerous different ways. You should never assume that the detailed order of events you observe in practice will necessarily always happen.

While the Monitor class and the C# lock keyword are the most widely used synchronization mechanisms, there are some alternatives.

Other Lock Types

Monitors are very useful and are typically a good first choice for locking, but there are some scenarios in which more specialized alternatives might offer slightly better performance or greater flexibility. Since it’s relatively unusual to use these alternatives, we won’t illustrate them in detail—this section will just describe what’s there and when it might be useful to you.

To understand why the alternatives exist, it’s useful to know something more about the capabilities and limitations of monitors. Monitors are designed for use within a single appdomain—you cannot use them to synchronize operations between processes, nor between appdomains sharing a process. Cross-process coordination is possible in Windows, but you need to use other mechanisms to do that. One reason for this is that monitors try to avoid getting the OS scheduler involved in locking where possible.

If your code hits a lock statement (or calls Monitor.Enter directly) for an object on which no other thread currently has a lock, the .NET Framework is able to handle this situation efficiently. It does not need to make calls into the operating system. Monitors can do this because they’re appdomain-local; to ensure cross-appdomain synchronization typically means getting help from the OS, and once you need to call into the OS, lock acquisition becomes many times more expensive. So when there’s no contention, monitors work really well. But once blocking occurs—either due to contention or because of an explicit call to Wait—the OS scheduler has to get involved because it’s the only thing that can move a thread between runnable and blocked states. This is usually a good thing, because it means the thread can wait efficiently; once a thread becomes blocked it doesn’t consume CPU cycles, and the CPU is free to get on with other useful work, or it may be able to go into its power-efficient idle state when no threads need to run, which is particularly important for laptops running on battery power. However, there are some situations where the cost of getting the OS involved outweighs the benefits. This brings us to the first of the specialized lock types.

SpinLock

SpinLock, which is new in .NET 4, provides similar locking functionality to monitors, but when contention occurs it will just sit in a loop checking and checking and checking again to see if the lock has become free yet—the thread will consume CPU cycles while it does this. The “spin” in SpinLock refers to the code spinning around in this loop waiting for the lock.

That might sound like an awful idea compared to the nice power-efficient wait state that a thread can enter into when blocking on a monitor. And often, it’s exactly as bad an idea as it sounds. The majority of the time you won’t want to use SpinLock. But it offers one possible advantage: because it never falls back onto the OS scheduler, it’s a more lightweight construct than a monitor, and so if you very rarely encounter contention for a particular lock, it might be cheaper. (And if the contention does occur but is extremely short-lived, it’s possible that on a multicore system the cost of spinning very briefly might actually be lower than the cost of getting the scheduler involved. In general, spinlock contention on a single-processor system is bad, although the SpinLock implementation mitigates this a little by yielding[44] when it fails to acquire the lock on a single-processor machine.)

Note

SpinLock is a value type—a struct. This contributes to its lightweight nature, as it can live inside other objects, rather than needing its own space on the heap. Of course, this also means you need to be careful not to assign a SpinLock into a local variable, because you’d end up making a copy, and locking that copy would not be useful.
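
If you do end up using it, the pattern looks like the following sketch. The class and member names are made up, but it shows the Enter/Exit usage and why the lock needs to live in a field rather than a copied local variable.

using System.Threading;

class SpinCounter
{
    // Keep the SpinLock in a field; copying it into a local variable
    // would mean locking the copy rather than the shared lock.
    private SpinLock spinner = new SpinLock();
    private int count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            spinner.Enter(ref lockTaken);
            count += 1;            // keep the critical section very short
        }
        finally
        {
            if (lockTaken)
            {
                spinner.Exit();
            }
        }
    }
}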

Never use SpinLock without comparative performance tests: test all the performance metrics you care about using ordinary monitors and compare that against how the same test suite works when you replace a particular lock with a SpinLock, and consider switching only if the tests demonstrate a clear benefit. If you don’t have a test infrastructure capable of verifying that you meet all of your performance requirements, or if you don’t have quantitative, clearly specified performance requirements, your project is not ready for SpinLock.

Warning

For some reason, a lot of developers just love to be armchair performance tuners. An astounding amount of time and effort is wasted on mailing lists, on web forums, and in internal company meetings on heated debates over which constructs are theoretically faster than others. Sadly, empirical testing of any kind rarely enters into the equation in such discussions.

If someone tries to claim by logical argument alone that a particular technique is faster, be highly suspicious of her claims. Exotic and specialized synchronization primitives such as SpinLock seem to bring out the worst in these people. (And that’s the main reason we’re even mentioning it—sooner or later you’ll have to deal with someone who has become obsessed with finding a way to use SpinLock.) Testing and measurement is the only reliable path to performance.

Reader/writer locks

Earlier in this chapter, we suggested immutability as a way to avoid having to synchronize access to your data—in the .NET Framework, it’s safe for any number of threads to read the same data simultaneously as long as no threads are trying to modify that data at the same time. Sometimes you may find yourself in the frustrating situation of having data that is almost read-only—perhaps a website contains a message of the day which, presumably, changes only once a day, but which you may need to incorporate into each of the hundreds of web pages your busy website serves up every second of the day.

The monitor’s mutually exclusive locking—where only one thread at a time gets to acquire a lock—seems ill-suited to this scenario. You could find that this shared data becomes a bottleneck—all threads are taking it in turns to access it, even though that really needs to happen only once a day. This is where reader/writer locks come in.

The idea behind a reader/writer lock is that when you acquire the lock, you declare whether you need to modify the data or just read it. The lock will allow any number of threads to get a read lock simultaneously, but if a thread tries to get a write lock, it’ll have to wait until all the current readers are done, and then, once a thread has a write lock, other threads attempting to get a lock of any kind will be made to wait until the write lock has been released. In other words, this kind of lock supports any number of simultaneous readers, but writers get exclusive access. (The precise details are, as ever, a little more complex, as a lock of this kind has to avoid the scenario where a never-ending stream of readers causes a writer to wait forever. New readers may be made to wait even when other readers are currently active simply to clear the decks for a waiting writer.)

While this sounds good in theory, the practical benefits can sometimes fall short of what theory might suggest. You shouldn’t even contemplate using a reader/writer lock unless you are seeing performance problems with a simpler monitor-based solution. This kind of lock is more complex, and so it’s quite possible you’ll end up making things slower, particularly in cases where you weren’t seeing much contention. (The example we gave of a website with a message of the day is quite likely to fall into that category. If the message is just a string, how long does it take to get hold of a reference to a string, really? Even with hundreds of requests per second, the chances of contention are probably pretty small.)

It doesn’t help that the first implementation offered by the .NET Framework—the ReaderWriterLock—was, frankly, not very good. Your monitor-based solution had to be in a world of pain before ReaderWriterLock looked preferable. Unfortunately, some of the problems of this lock couldn’t be fixed without risking breaking existing code, so .NET 3.5 introduced a much better replacement, ReaderWriterLockSlim. If you need reader/writer locking, you should use this newer variant unless you absolutely have to support older versions of .NET. Be aware that unlike ReaderWriterLock, ReaderWriterLockSlim implements IDisposable, so you need to arrange for it to be disposed at the right time. This means that if you use it as an implementation detail of a class, your class will probably need to implement IDisposable too.
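
As a rough sketch of how the message-of-the-day scenario might use ReaderWriterLockSlim (the class and member names here are invented):

using System;
using System.Threading;

class MessageOfTheDay : IDisposable
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private string message = "Welcome!";

    public string Read()
    {
        rwLock.EnterReadLock();         // any number of concurrent readers
        try { return message; }
        finally { rwLock.ExitReadLock(); }
    }

    public void Update(string newMessage)
    {
        rwLock.EnterWriteLock();        // writers get exclusive access
        try { message = newMessage; }
        finally { rwLock.ExitWriteLock(); }
    }

    public void Dispose()
    {
        // ReaderWriterLockSlim implements IDisposable, so this class
        // passes that obligation on to its users.
        rwLock.Dispose();
    }
}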

Mutexes

The Mutex class provides a similar style of locking to the monitor. The name is short for mutually exclusive, indicating that only one thread can acquire the lock at any time. A mutex is significantly more expensive than a monitor because it always gets the OS scheduler involved. And it does that because mutexes work across process boundaries—you can create a Mutex object with a name, and if another process in the same Windows login session creates another Mutex object with the same name, that object refers to the same underlying Windows synchronization object. So when you acquire a Mutex, you don’t merely become the only thread in your application in possession of the lock; you become the only thread in the whole login session in possession of the lock. (You can even make a global mutex that spans all login sessions, meaning that yours will be the only thread on the entire machine in possession of the lock.)

If you create a mutex without a name, it will be local to the process, but it still relies on the OS because a Mutex is essentially a wrapper around a mutex object provided by the Windows kernel.
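
Here is a sketch of the typical usage. The mutex name is made up; any process in the same login session that creates a Mutex with the same name shares the same underlying lock.

using System.Threading;

class CrossProcessWorker
{
    static void DoExclusiveWork()
    {
        using (Mutex mutex = new Mutex(false, "MyCompany.MyApp.WorkMutex"))
        {
            mutex.WaitOne();            // blocks until we own the mutex
            try
            {
                // ... work that must not run concurrently with other
                //     processes using the same mutex name ...
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}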

Other Coordination Mechanisms

Monitors are not just for locking, of course—they offer coordination facilities through pulsing and waiting. And the .NET Framework offers some slightly more specialized types for coordination too.

Events

Events provide a very similar service to the WaitForIt class we built in Example 16-14—an event is effectively a Boolean variable you can wait on. And rather than being a simple one-shot mechanism as in Example 16-14, an event can go back and forth between its two states.

.NET offers ManualResetEvent and AutoResetEvent classes. The latter automatically reverts to its default state when letting waiting threads go, whereas the manual one remains in its so-called signaled state until you explicitly reset it.

Warning

AutoResetEvent can be problematic. There isn’t necessarily any correspondence between the number of times you signal it and the number of times it releases threads. If you signal it twice in a row when no threads are waiting, it doesn’t keep count—it’s just as signaled after the second signal as it was after the first one. This can lead to bugs where you occasionally miss signals, and your code can grind to a halt. Approach with caution.

These types are wrappers around the underlying Windows event synchronization primitive, so as with Mutex, you can use events for cross-process coordination. And of course, this also means that you incur the cost of getting the OS scheduler involved.

.NET 4 introduces an alternative called ManualResetEventSlim. This will use busy-waiting techniques similar to a spinlock for short waits. So just like the Monitor, it gets the scheduler involved only when a wait is necessary. Therefore, unless you really need the extra features available from the nonslim version (e.g., you need cross-process synchronization) the ManualResetEventSlim class is a better choice than ManualResetEvent if you’re using .NET 4 or later.
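
As a rough illustration, here is what an event-based version of the earlier WaitForIt class might look like with ManualResetEventSlim; the class name is invented.

using System.Threading;

class WaitForItWithEvent
{
    private readonly ManualResetEventSlim ready = new ManualResetEventSlim(false);

    public void WaitUntilReady()
    {
        ready.Wait();       // blocks until some thread calls Set
    }

    public void GoNow()
    {
        ready.Set();        // releases all current (and future) waiters
    }
}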

Countdown

.NET 4 introduces the CountdownEvent class, which provides a handy solution to a fairly common problem: knowing when you’re done. Remember back in Example 16-6, we ran into an issue with the thread pool. We queued up a couple of pieces of work, but we had no way of knowing when they were done. One solution to that would be to use the Task Parallel Library, which we’ll get to shortly, but an alternative would have been to use the CountdownEvent class.

CountdownEvent is very simple. For each piece of work you start, you call AddCount (or if you know how many pieces of work there will be up front, you can pass that number into the constructor). For each piece of work that completes you call Signal. And if you need to wait for all outstanding work to complete (e.g., before your program exits), just call Wait.
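
As a sketch of how that might look with two thread pool work items (the work itself is a placeholder here):

using (CountdownEvent done = new CountdownEvent(2))
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        // ... first piece of work ...
        done.Signal();
    });

    ThreadPool.QueueUserWorkItem(state =>
    {
        // ... second piece of work ...
        done.Signal();
    });

    done.Wait();            // blocks until both work items have signaled
}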

BlockingCollection

The System.Collections.Concurrent namespace provides various collection classes that are designed to be used in multithreaded environments. They look a little different from the normal collection classes because they are designed to be used without needing any locking, which means they can’t offer features that rely on things staying consistent from one moment to the next. Numerical indexing is out, for example, because the number of items in the collection may change, as we saw when trying to use List<T> in a multithreaded fashion in Example 16-11. So these are not thread-safe versions of normal collection classes—they are collections whose APIs are designed to support multithreaded use without the caller needing to use locks.

BlockingCollection is not just a multithreaded collection; it also offers associated coordination. It provides a way for threads to sit and wait for items to become available in the collection. Its Take method will block if the collection is empty. Once data becomes available, Take will return one item. Any number of threads may be waiting inside Take at any time, and other threads are free to call Add. If you Add enough items that all the threads waiting to Take are satisfied, and then you keep calling Add, that’s when items start to get added to the collection. And if the collection is nonempty, calls to Take will return immediately.

This allows you to have one or more threads dedicated to processing work items generated by other threads. The BlockingCollection acts as a kind of buffer—if you generate items faster than you process them, they will queue up in the BlockingCollection, and if the processing threads catch up, they will block efficiently until more items come along.

You could use this in a WPF application that needs to do slow work in the background—the UI thread could add work into a blocking collection, and then one or more worker threads could take items from the collection and process them. This is not hugely different from using the thread pool, but it gives you the opportunity to limit the number of worker threads—if you had just a single thread that performs background work, you might be able to get away with much simpler synchronization code, because all your background work is always done by the same thread.
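
Here is a minimal sketch of that kind of arrangement; the strings stand in for whatever work item type your application uses, and this assumes the System.Collections.Concurrent and System.Threading namespaces.

BlockingCollection<string> workItems = new BlockingCollection<string>();

// A single dedicated worker thread processes items as they arrive.
Thread worker = new Thread(() =>
{
    while (true)
    {
        string item = workItems.Take();  // blocks while the collection is empty
        // ... process the item ...
    }
});
worker.IsBackground = true;
worker.Start();

// Any other thread (the UI thread, say) can queue up work.
workItems.Add("first job");
workItems.Add("second job");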

We’ve looked at how to create threads explicitly and at the tools available for ensuring that our programs function correctly in a multithreaded world. Next we’re going to look at asynchronous programming models, where we don’t explicitly create new threads. We will need the locking and synchronization techniques we just explored because we are still working in a concurrent world; it’s just a slightly different programming style.

Asynchronous Programming

Some things are intrinsically slow. Reading all of the audio data off a CD, downloading a large file from a server at the end of a low-bandwidth connection on the opposite side of the world, or playing a sound—all of these processes have constraints that mean they’ll take a long time to complete, maybe seconds, minutes, or even hours. How should these sorts of operations look to the programmer?

One simple answer is that they don’t have to look different than faster operations. Our code consists of a sequence of statements—one thing after another—and some statements take longer than others. This has the useful property of being easy to understand. For example, if our code calls the WebClient class’s DownloadString method, our program doesn’t move on to the next step until the download is complete, and so we can know not just what our code does, but also the order in which it does it.

This style of API is sometimes described as synchronous—the time at which the API returns is determined by the time at which the operation finishes; execution progresses through the code in sync with the work being done. These are also sometimes known as blocking APIs, because they block the calling thread from further progress until work is complete.

Blocking APIs are problematic for user interfaces because the blocked thread can’t do anything else while slow work is in progress. Thread affinity means that code which responds to user input has to run on the UI thread, so if you’re keeping that thread busy, the UI will become unresponsive. It’s really annoying to use programs that stop responding to user input when they’re working—these applications seem to freeze anytime something takes too long, making them very frustrating to use. Failing to respond to user input within 100 ms is enough to disrupt the user’s concentration. (And it gets worse if your program’s user interface uses animation—the occasional glitch of just 15 ms is enough to make a smooth animation turn into something disappointingly choppy.)

Threads offer one solution to this: if you do all your potentially slow work on threads that aren’t responsible for handling user input, your application can remain responsive. However, this can sometimes seem like an overcomplicated solution—in a lot of cases, slow operations don’t work synchronously under the covers. Take fundamental operations such as reading and writing data from and to devices such as network cards or disks, for example. The kernel-mode device drivers that manage disk and network I/O are instructed by the operating system to start doing some work, and the OS expects the driver to configure the hardware to perform the necessary work and then return control to the operating system almost immediately—on the inside, Windows is built around the assumption that most slow work proceeds asynchronously, that there’s no need for code to progress strictly in sync with the work.

This asynchronous model is not limited to the internals of Windows—there are asynchronous public APIs. These typically return very quickly, long before the work in question is complete, and you then use either a notification mechanism or polling to discover when the work is finished. The exact details vary from one API to another, but these basic principles are universal. Many synchronous APIs really are just some code that starts an asynchronous operation and then makes the thread sleep until the operation completes.

An asynchronous API sounds like a pretty good fit for what we need to build responsive interactive applications.[45] So it seems somewhat ludicrous to create multiple threads in order to use synchronous APIs without losing responsiveness, when those synchronous APIs are just wrappers on top of intrinsically asynchronous underpinnings. Rather than creating new threads, we may as well just use asynchronous APIs directly where they are available, cutting out the middle man.

.NET defines two common patterns for asynchronous operations. There’s a low-level pattern which is powerful and corresponds efficiently to how Windows does things under the covers. And then there’s a slightly higher-level pattern which is less flexible but considerably simpler to use in GUI code.

The Asynchronous Programming Model

The Asynchronous Programming Model (APM) is a pattern that many asynchronous APIs in the .NET Framework conform to. It defines common mechanisms for discovering when work is complete, for collecting the results of completed work, and for reporting errors that occurred during the asynchronous operation.

APIs that use the APM offer pairs of methods, starting with Begin and End. For example, the Socket class in the System.Net.Sockets namespace offers numerous instances of this pattern: BeginAccept and EndAccept, BeginSend and EndSend, BeginConnect and EndConnect, and so on.

The exact signature of the Begin method depends on what it does. For example, a socket’s BeginConnect needs the address to which you’d like to connect, whereas BeginReceive needs to know where you’d like to put the data and how much you’re ready to receive. But the APM requires all Begin methods to have the same final two parameters: the method must take an AsyncCallback delegate and an object. And it also requires the method to return an implementation of the IAsyncResult interface. Here’s an example from the Dns class in System.Net:

public static IAsyncResult BeginGetHostEntry(
    string hostNameOrAddress,
    AsyncCallback requestCallback,
    object stateObject
)

Callers may pass a null AsyncCallback. But if they pass a non-null reference, the type implementing the APM is required to invoke the callback once the operation is complete. The AsyncCallback delegate signature requires the callback method to accept an IAsyncResult argument—the APM implementation will pass in the same IAsyncResult to this completion callback as it returns from the Begin method. This object represents an asynchronous operation in progress—many classes can have multiple operations in progress simultaneously, and the IAsyncResult distinguishes between them.

Example 16-16 shows one way to use this pattern. It calls the asynchronous BeginGetHostEntry method provided by the Dns class. This looks up the IP address for a computer, so it takes a string—the name of the computer to find. And then it takes the two standard final APM arguments—a delegate and an object. We can pass anything we like as the object—the function we call doesn’t actually use it, it just hands it back to us later. We could pass null because our example doesn’t need the argument, but we’re passing a number just to demonstrate where it comes out. The reason the APM offers this argument is so that if you have multiple simultaneous asynchronous operations in progress at once, you have a convenient way to associate information with each operation. (This mattered much more in older versions of C#, which didn’t offer anonymous methods or lambdas—back then this argument was the easiest way to pass data into the callback.)

Example 16-16. Using the Asynchronous Programming Model

class Program
{
    static void Main(string[] args)
    {
        Dns.BeginGetHostEntry("oreilly.com", OnGetHostEntryComplete, 42);

        Console.ReadKey();
    }

    static void OnGetHostEntryComplete(IAsyncResult iar)
    {
        IPHostEntry result = Dns.EndGetHostEntry(iar);
        Console.WriteLine(result.AddressList[0]);
        Console.WriteLine(iar.AsyncState);
    }
}

The Main method waits until a key is pressed—much like with work items in the thread pool, having active asynchronous requests will not keep the process alive, so the program would exit before finishing its work without that ReadKey. (A more robust approach for a real program that needed to wait for work to complete would be to use the CountdownEvent described earlier.)

The Dns class will call the OnGetHostEntryComplete method once it has finished its lookup. Notice that the first thing we do is call the EndGetHostEntry method—the other half of the APM. The End method always takes the IAsyncResult object corresponding to the call—recall that this identifies the call in progress, so this is how EndGetHostEntry knows which particular lookup operation we want to get the results for.

Note

The APM says nothing about which thread your callback will be called on. In practice, it’s often a thread pool thread, but not always. Some individual implementations might make guarantees about what sort of thread you’ll be called on, but most don’t. And since you don’t usually know what thread the callback occurred on, you will need to take the same precautions you would when writing multithreaded code where you explicitly create new threads. For example, in a WPF or Windows Forms application, you’d need to use the SynchronizationContext class or an equivalent mechanism to get back to a UI thread if you wanted to make updates to the UI when an asynchronous operation completes.

The End method in the APM returns any data that comes out of the operation. In this case, there’s a single return value of IPHostEntry, but some implementations may return more by having out or ref arguments. Example 16-16 then prints the results, and finally prints the AsyncState property of the IAsyncResult, which will be 42—this is where the value we passed as the final argument to BeginGetHostEntry pops out.

This is not the only way to use the Asynchronous Programming Model—you are allowed to pass null as the delegate argument. You have three other options, all revolving around the IAsyncResult object returned by the Begin call. You can poll the IsCompleted property to test for completion. You can call the End method at any time—if the work is not finished this will block until it completes.[46] Or you can use the AsyncWaitHandle property—this returns an object that is a wrapper around a Win32 synchronization handle that will become signaled when the work is complete. (That last one is rarely used, and has some complications regarding ownership and lifetime of the handle, which are described in the MSDN documentation. We mention this technique only out of a pedantic sense of duty to completeness.)
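
For example, here’s a minimal sketch of the simplest of those alternatives (passing null for the callback and then just calling the End method), reusing the Dns lookup from Example 16-16:

IAsyncResult iar = Dns.BeginGetHostEntry("oreilly.com", null, null);

// We could poll iar.IsCompleted here if we had other work to get on with.
// Calling EndGetHostEntry blocks until the lookup has finished.
IPHostEntry entry = Dns.EndGetHostEntry(iar);
Console.WriteLine(entry.AddressList[0]);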

Note

You are required to call the End method at some point, no matter how you choose to wait for completion. Even if you don’t care about the outcome of the operation you must still call the End method. If you don’t, the operation might leak resources.

Asynchronous operations can throw exceptions. If the exception is the result of bad input, such as a null reference where an object is required, the Begin method will throw an exception. But it’s possible that something failed while the operation was in progress—perhaps we lost network connectivity partway through some work. In this case, the End method will throw an exception.

The Asynchronous Programming Model is widely used in the .NET Framework class library, and while it is an efficient and flexible way to support asynchronous operations, it’s slightly awkward to use in user interfaces. The completion callback typically happens on some random thread, so you can’t update the UI in that callback. And the support for multiple simultaneous operations, possible because each operation is represented by a distinct IAsyncResult object, may be useful in server environments, but it’s often just an unnecessary complication for client-side code. So there’s an alternative pattern better suited to the UI.

The Event-Based Asynchronous Pattern

Some classes offer an alternative pattern for asynchronous programming. You start an operation by calling a method whose name typically ends in Async; for example, the WebClient class’s DownloadDataAsync method. And unlike the APM, you do not pass a delegate to the method. Completion is indicated through an event, such as the DownloadDataCompleted event. Classes that implement this pattern are required to use the SynchronizationContext class (or the related AsyncOperationManager) to ensure that the event is raised in the same context in which the operation was started. So in a user interface, this means that completion events are raised on the UI thread.
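
Here’s a minimal sketch of the shape of the pattern, using WebClient’s DownloadDataAsync; the completion event’s arguments carry the result, along with any error or cancellation information:

WebClient client = new WebClient();
client.DownloadDataCompleted += (sender, e) =>
{
    // In a UI application this handler runs on the thread that called
    // DownloadDataAsync.
    if (e.Error == null && !e.Cancelled)
    {
        Console.WriteLine("Downloaded {0} bytes", e.Result.Length);
    }
};
client.DownloadDataAsync(new Uri("http://oreilly.com/"));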

This is, in effect, a single-threaded asynchronous model. You have the responsiveness benefits of asynchronous handling of slow operations, with fewer complications than multithreaded code. So in scenarios where this pattern is an option, it’s usually the best choice, as it is far simpler than the alternatives. It’s not always available, because some classes offer only the APM. (And some don’t offer any kind of asynchronous API, in which case you’d need to use one of the other multithreading mechanisms in this chapter to maintain a responsive UI.)

Warning

Single-threaded asynchronous code is more complex than sequential code, of course, so there’s still scope for trouble. For example, you need to be careful not to start multiple asynchronous operations that might conflict with one another. Also, components that implement this pattern call you back on the right thread only if you use them from the right thread in the first place—if you use a mixture of this pattern and other multithreading mechanisms, be aware that operations you kick off from worker threads will not complete on the UI thread.

There are two optional features of the event-based asynchronous model. Some classes also offer progress change notification events, such as the WebClient class’s DownloadProgressChanged event. (Such events are also raised on the original thread.) And there may be cancellation support. For example, WebClient offers a CancelAsync method.
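
Continuing the WebClient sketch shown earlier, using these optional features might look like this:

client.DownloadProgressChanged += (sender, e) =>
{
    // Progress events are raised in the original context, so in a UI
    // application it would be safe to update a progress bar here.
    Console.WriteLine("{0}% complete", e.ProgressPercentage);
};

// Typically called from somewhere else, such as a Cancel button's Click
// handler. The completion event then reports e.Cancelled as true.
client.CancelAsync();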

Ad Hoc Asynchrony

There’s no fundamental need for code to use either the APM or the event-based asynchronous pattern. These are just conventions. You will occasionally come across code that uses its own unusual solution for asynchronous operation. This can happen when the design of the code in question is constrained by external influences—for example, the System.Threading namespace defines an Overlapped class that provides a managed representation of a Win32 asynchronous mechanism. Win32 does not have any direct equivalent to either of the .NET asynchronous patterns, and just tends to use function pointers for callbacks. .NET’s Overlapped class mimics this by accepting a delegate as an argument to a method. Conceptually, this isn’t very different from the APM; it just happens not to conform exactly to the pattern.

The standard asynchronous patterns are useful, but they are somewhat low-level. If you need to coordinate multiple operations, they leave you with a lot of work to do, particularly when it comes to robust error handling or cancellation. The Task Parallel Library provides a more comprehensive scheme for working with multiple concurrent operations.

The Task Parallel Library

.NET 4 introduces the Task Parallel Library (TPL), a set of classes in the System.Threading.Tasks namespace that help coordinate concurrent work. The TPL superficially resembles the thread pool, in that you submit small work items (or tasks) and the TPL takes care of working out how many threads to use at once in order to run your work. But the TPL provides various services not available through direct use of the thread pool, especially in the areas of error handling, cancellation, and managing relationships between tasks.

You can associate tasks with one another—for example, tasks can have a parent-child relationship, which provides a way to wait until all of the tasks related to some higher-level operation are complete. You can also arrange for the completion of one task to kick off another related task.

Error handling gets tricky with asynchronous and concurrent code—what do you do if you’ve built an operation out of 20 related concurrent tasks, and just one of them fails while the rest are either in progress, already complete, or not yet started? The TPL provides a system for bringing work to an orderly halt, and collecting in a single place all of the errors that occurred.

The mechanisms required to halt work in the event of an error are also useful if you want to be able to stop work in progress for some other reason, such as when the user clicks a Cancel button.

We’ll start by looking at the most important concept in the TPL, which is, unsurprisingly, the task.

Tasks

A task is some work your program needs to do. It is represented by an instance of the Task class, defined in the System.Threading.Tasks namespace. This does not define exactly how the work will be done. The work for a task might be a method to be executed, but a task could also involve asynchronous work that executes without needing to tie up a thread—the TPL has support for creating task objects that work with APM implementations, for instance. Example 16-17 shows how to create new tasks that execute code.

Example 16-17. Executing code with tasks

using System;
using System.Threading.Tasks;

namespace TplExamples
{
    class Program
    {
        static void Main(string[] args)
        {
            Task.Factory.StartNew(Go, "One");
            Task.Factory.StartNew(Go, "Two");

            Console.ReadKey();
        }

        static void Go(object name)
        {
            for (int i = 0; i < 100; ++i)
            {
                Console.WriteLine("{0}: {1}", name, i);
            }
        }
    }
}

The Task class provides a static Factory property that returns a TaskFactory object, which can be used to create new tasks. The TPL defines the TaskFactory abstraction so that it’s possible to plug in different task creation strategies. The default factory returned by Task.Factory creates new tasks that execute code on the thread pool, but it’s possible to create factories that do something else. For example, you could make a task factory that creates tasks that will run on a UI thread.

A factory’s StartNew method creates a code-based task. You pass it a delegate—it’ll accept either a method with no arguments or a method that takes a single object as an argument. If you want to pass more arguments, you can use the same lambda-based trick we saw in Example 16-4. Example 16-18 uses this to pass two arguments to Go, while using the overload of StartNew that takes a zero-argument method. (The empty () tells C# to build a zero-argument lambda, which becomes the method StartNew invokes.)

Example 16-18. Passing more arguments with lambdas

static void Main(string[] args)
{
    Task.Factory.StartNew(() => Go("One", 100));
    Task.Factory.StartNew(() => Go("Two", 500));

    Console.ReadKey();
}

static void Go(string name, int iterations)
{
    for (int i = 0; i < iterations; ++i)
    {
        Console.WriteLine("{0}: {1}", name, i);
    }
}

These last two examples look pretty similar to the thread pool examples from earlier, and they suffer from the same problem: they don’t know when the work is complete, so we’ve used the dubious solution of waiting for the user to press a key so that the program doesn’t exit until the work is done. Fortunately, tasks provide a much better solution to this: we can wait until they are finished. Task provides a Wait method that blocks until the task is complete. This is an instance method, so we’d call it once for each task. There’s also a static WaitAll method that takes an array of Task objects and blocks until they are all complete, illustrated in Example 16-19. (This method uses the params modifier on its one argument, so we can just pass each task as though it were a separate argument. The C# compiler will take the two tasks Example 16-19 passes to WaitAll and wrap them in an array for us.)

Example 16-19. Task.WaitAll

static void Main(string[] args)
{
    Task t1 = Task.Factory.StartNew(() => Go("One", 100));
    Task t2 = Task.Factory.StartNew(() => Go("Two", 500));

    Task.WaitAll(t1, t2);
}

Alternatively, we could create a parent task that contains both of these tasks as children.

Parent-child relationships

If you write a code-based task that creates new tasks from within an existing task, you can make those new tasks children of the task in progress. Example 16-20 creates the same two tasks as the previous examples, but does so inside another task, passing in the AttachedToParent member of the TaskCreationOptions enumeration to establish the parent-child relationship.

Example 16-20. Task with children

static void Main(string[] args)
{
    Task t = Task.Factory.StartNew(() =>
    {
        Task.Factory.StartNew(() => Go("One", 100),
                              TaskCreationOptions.AttachedToParent);
        Task.Factory.StartNew(() => Go("Two", 500),
                              TaskCreationOptions.AttachedToParent);
    });

    t.Wait();
}

Notice that this example calls Wait only on the parent task. For a task to be considered complete, not only must it have finished running, but so must all of its children. (And that means that if the children have children of their own, those all have to complete too.) So there’s no need to list all the tasks and pass them to WaitAll if there’s a single top-level task that all the rest descend from.

Fine-grained concurrency

Although code-based tasks are superficially similar to thread pool work items, the TPL is designed to let you use much smaller tasks than would work efficiently when using the thread pool directly. The TPL encourages fine-grained concurrency—the idea is that you provide it with a large number of small work items, which gives it plenty of freedom to work out how to allocate that work across logical processors. This is sometimes described as overexpression of concurrency. The theory is that as newer computers come out with more and more logical processors, code that overexpresses its concurrency will be able to take advantage of the higher logical processor count.

The TPL uses the CLR thread pool internally, so it might seem surprising that the TPL is able to handle small work items more efficiently, but the TPL provides access to some features added to the thread pool in .NET 4, which you can’t use with the ThreadPool class. The ThreadPool class typically starts work in the order you queued it up, so it’s a FIFO (first in, first out) queue. (This is absolutely not guaranteed by the documentation, but the fact that the ThreadPool has behaved this way for years means that changing this behavior would doubtless break lots of code.) But when you set up work as Task objects, the thread pool works differently. Each logical processor gets its own separate queue, and typically processes tasks in its queue in LIFO (last in, first out) order. This turns out to be far more efficient in a lot of scenarios, particularly when the work items are small. This ordering is not strict, by the way; if one logical processor manages to empty its work queue while others still have plenty to do, the idle processor may steal work from another processor, and will do so from the back end of that processor’s queue. (If you’re wondering about the rationale behind how the thread pool orders tasks, see the sidebar below.)

The examples we’ve seen so far simply perform work and return no results. But a task can produce a result.

Tasks with results

The Task<TResult> class derives from Task and adds a Result property which, once the task is complete, contains the result produced by the task. Task<TResult> represents a concept known in some concurrent programming literature as a future—it represents some work that will produce a result at some point.

The TaskFactory.StartNew method we’ve already used can create either kind of task—it has overloads that accept methods that return a result. (So you can pass a Func<TResult> or Func<object, TResult>, instead of the Action or Action<object> passed in the previous examples.) These overloads return a Task<TResult>. (Alternatively, you can call StartNew on the Task<TResult>.Factory static property.)
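
For example, here’s a minimal sketch of a task that computes a number, and a caller that reads the result:

Task<int> sumTask = Task.Factory.StartNew(() =>
{
    int total = 0;
    for (int i = 1; i <= 1000; ++i) { total += i; }
    return total;
});

// Reading Result blocks until the task has finished.
Console.WriteLine(sumTask.Result);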

You can start a Task<TResult> and then call Wait to wait for the result, or you can read the Result property, which will call Wait for you if the result is not yet available. But blocking until the task is complete may not be especially useful—it’s just a very roundabout way of invoking the code synchronously. In fact, you might sometimes want to do this—you might create multiple child tasks and then wait for all of them to complete, and you’d be able to take advantage of the TPL’s common exception-handling framework to manage any errors. However, it’s often useful to be able to provide some sort of callback method to be invoked once the task completes, rather than blocking. You can do this with a continuation.

Continuations

A continuation is a task that gets invoked when another task completes.[47] The Task class provides a ContinueWith method that lets you provide the code for that continuation task. It requires a delegate that takes as its single argument the task that just completed. ContinueWith offers overloads that allow the delegate to return a value (in which case the continuation task will be another Task<TResult>), or not to return a value (in which case the continuation task will just be a Task). ContinueWith returns the Task object that represents the continuation. So you can string these things together:

static void Main(string[] args)
{
    Task t = Task.Factory.StartNew(() => Go("One", 100))
                     .ContinueWith(t1 => Go("Two", 500))
                     .ContinueWith(t2 => Go("Three", 200));
    t.Wait();
}

This will execute the three tasks one after another. Notice that the t variable here refers to the third task—the final continuation. So t.Wait() will wait until all the tasks are complete—it doesn’t need to wait for the first two because the final task can’t even start until the others are finished; waiting for the last task implicitly means waiting for all three here.

Continuations are slightly more interesting when the initial task produces a result—the continuation can then do something with the output. For example, you might have a task that fetches some data from a server and then have a continuation that puts the result into the user interface. Of course, we need to be on the correct thread to update the UI, but the TPL can help us with this.

Schedulers

The TaskScheduler class is responsible for working out when and how to execute tasks. If you don’t specify a scheduler, you’ll end up with the default one, which uses the thread pool. But you can provide other schedulers when creating tasks—both StartNew and ContinueWith offer overloads that accept a scheduler. The TPL offers a scheduler that uses the SynchronizationContext, which can run tasks on the UI thread. Example 16-21 shows how to use this in an event handler in a WPF application.

Example 16-21. Continuation on a UI thread

void OnButtonClick(object sender, RoutedEventArgs e)
{
    TaskScheduler uiScheduler =
        TaskScheduler.FromCurrentSynchronizationContext();
    Task<string>.Factory.StartNew(GetData)
                    .ContinueWith((task) => UpdateUi(task.Result),
                                  uiScheduler);
}

string GetData()
{
    WebClient w = new WebClient();
    return w.DownloadString("http://oreilly.com/");
}

void UpdateUi(string info)
{
    myTextBox.Text = info;
}

This example creates a task that returns a string, using the default scheduler. This task will invoke the GetData function on a thread pool thread. But it also sets up a continuation using a TaskScheduler that was obtained by calling FromCurrentSynchronizationContext. This grabs the SynchronizationContext class’s Current property and returns a scheduler that uses that context to run all tasks. Since the continuation specifies that it wants to use this scheduler, it will run the UpdateUi method on the UI thread.

The upshot is that GetData runs on a thread pool thread, and then its return value is passed into UpdateUi on the UI thread.

We could use a similar trick to work with APM implementations, because task factories provide methods for creating APM-based tasks.

Tasks and the Asynchronous Programming Model

TaskFactory and TaskFactory<TResult> provide various overloads of a FromAsync method. You can pass this the Begin and End methods from an APM implementation, along with the arguments you’d like to pass, and it will return a Task or Task<TResult> that executes the asynchronous operation, instead of one that invokes a delegate. Example 16-22 uses this to wrap the asynchronous methods we used from the Dns class in earlier examples in a task.

Example 16-22. Creating a task from an APM implementation

TaskScheduler uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();
Task<IPHostEntry>.Factory.FromAsync(
                    Dns.BeginGetHostEntry, Dns.EndGetHostEntry,
                    "oreilly.com", null)
    .ContinueWith((task) => UpdateUi(task.Result.AddressList[0].ToString()),
                  uiScheduler);

FromAsync offers overloads for versions of the APM that take zero, one, two, or three arguments, which covers the vast majority of APM implementations. As well as passing the Begin and End methods, we also pass the arguments, and the additional object argument that all APM Begin methods accept. (For the minority of APM implementations that either require more arguments or have out or ref parameters, there’s an overload of FromAsync that accepts an IAsyncResult instead. This requires slightly more code, but enables you to wrap any APM implementation as a task.)
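
To sketch the mechanics of that IAsyncResult-based overload, here’s how it could wrap the same Dns lookup (which doesn’t actually need it, since its Begin method takes only one argument):

// Start the operation ourselves, then hand the IAsyncResult and the End
// method to FromAsync to get a Task<IPHostEntry> representing it.
IAsyncResult iar = Dns.BeginGetHostEntry("oreilly.com", null, null);
Task<IPHostEntry> lookupTask =
    Task<IPHostEntry>.Factory.FromAsync(iar, Dns.EndGetHostEntry);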

We’ve seen the main ways to create tasks, and to set up associations between them either with parent-child relationships or through continuations. But what happens if you want to stop some work after you’ve started it? Neither the thread pool nor the APM supports cancellation, but the TPL does.

Cancellation

Cancellation of asynchronous operations is surprisingly tricky. There are lots of awkward race conditions to contend with. The operation you’re trying to cancel might already have finished by the time you try to cancel it. Or if it hasn’t, it might have gotten beyond the point where it is able to stop, in which case cancellation is doomed to fail. Or work might have failed, or be about to fail when you cancel it. And even when cancellation is possible, it might take a while to complete. Handling and testing every possible combination is difficult enough when you have just one operation, but if you have multiple related tasks, it gets a whole lot harder.

Fortunately, .NET 4 introduces a new cancellation model that provides a well thought out and thoroughly tested solution to the common cancellation problems. This cancellation model is not limited to the TPL—you are free to use it on its own, and it also crops up in other parts of the .NET Framework. (The data parallelism classes we’ll be looking at later can use it, for example.)

If you want to be able to cancel an operation, you must pass it a CancellationToken. A cancellation token lets the operation discover whether cancellation has been requested—it provides an IsCancellationRequested property—and it’s also possible to pass a delegate to its Register method in order to be called back if cancellation happens.

CancellationToken only provides facilities for discovering that cancellation has been requested. It does not provide the ability to initiate cancellation. That is provided by a separate class called CancellationTokenSource. The reason for splitting the discovery and control of cancellation across two types is that it would otherwise be impossible to provide a task with cancellation notifications without also granting that task the capability of initiating cancellation. CancellationTokenSource is a factory of cancellation tokens—you ask it for a token and then pass that into the operation you want to be able to cancel. Example 16-23 is similar to Example 16-21, but it passes a cancellation token to StartNew, and then uses the source to cancel the operation if the user clicks a Cancel button.

Example 16-23. Ineffectual cancellation

private CancellationTokenSource cancelSource;

void OnButtonClick(object sender, RoutedEventArgs e)
{
    cancelSource = new CancellationTokenSource();

    TaskScheduler uiScheduler =
        TaskScheduler.FromCurrentSynchronizationContext();
    Task<string>.Factory.StartNew(GetData, cancelSource.Token)
                    .ContinueWith((task) => UpdateUi(task.Result),
                                  uiScheduler);
}

void OnCancelClick(object sender, RoutedEventArgs e)
{
    if (cancelSource != null)
    {
        cancelSource.Cancel();
    }
}


string GetData()
{
    WebClient w = new WebClient();
    return w.DownloadString("http://oreilly.com/");
}

void UpdateUi(string info)
{
    cancelSource = null;
    myTextBox.Text = info;
}

In fact, cancellation isn’t very effective in this example because this particular task consists of code that makes a single blocking method call. Cancellation will usually do nothing here in practice—the only situation in which it would have an effect is if the user managed to click Cancel before the task had even begun to execute. This illustrates an important issue: cancellation is never forced—it uses a cooperative approach, because the only alternative is killing the thread executing the work. And while that would be possible, forcibly terminating threads tends to leave the process in an uncertain state—it’s usually impossible to know whether the thread you just zapped happened to be in the middle of modifying some shared state. Since this leaves your program’s integrity in doubt, the only thing you can safely do next is kill the whole program, which is a bit drastic. So the cancellation model requires cooperation on the part of the task in question.

If you have divided your work into numerous relatively short tasks, cancellation is more useful—if you cancel tasks that have been queued up but not yet started, they will never run at all. Tasks already in progress will continue to run, but if all your tasks are short, you won’t have to wait long. If you have long-running tasks, however, you will need to be able to detect cancellation and act on it if you want to handle cancellation swiftly. This means you will have to arrange for the code you run as part of the tasks to have access to the cancellation token, and that code must test the IsCancellationRequested property from time to time.
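
For example, here’s a sketch of a code-based task that cooperates with cancellation. DoSomeWorkChunk stands in for some hypothetical unit of work, and cancelSource is a CancellationTokenSource like the one in Example 16-23:

CancellationToken token = cancelSource.Token;
Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 10000; ++i)
    {
        if (token.IsCancellationRequested)
        {
            // Acknowledging cancellation by throwing via the token puts
            // the task into the Canceled state rather than Faulted.
            token.ThrowIfCancellationRequested();
        }
        DoSomeWorkChunk(i);
    }
}, token);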

Cancellation isn’t the only reason a task or set of tasks might stop before finishing—things might be brought to a halt by exceptions.

Error Handling

A task can complete in one of three ways: it can run to completion, it can be canceled, or it can fault. The Task object’s TaskStatus property reflects this through RanToCompletion, Canceled, and Faulted values, respectively, and if the task enters the Faulted state, its IsFaulted property also becomes true. A code-based task will enter the Faulted state if its method throws an exception. You can retrieve the exception information from the task’s Exception property. This returns an AggregateException, which contains a list of exceptions in its InnerExceptions property. It’s a list because certain task usage patterns can end up hitting multiple exceptions; for example, you might have multiple failing child tasks.

If you don’t check the IsFaulted property and instead just attempt to proceed, either by calling Wait or by attempting to fetch the Result of a Task<TResult>, the AggregateException will be thrown back into your code.
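
For example, here’s a minimal sketch of observing a failure through Wait:

Task t = Task.Factory.StartNew(() =>
{
    throw new InvalidOperationException("Something went wrong");
});

try
{
    t.Wait();
}
catch (AggregateException ex)
{
    // The original exception (or exceptions) appear in InnerExceptions.
    foreach (Exception inner in ex.InnerExceptions)
    {
        Console.WriteLine(inner.Message);
    }
}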

It’s possible to write code that never looks for the exception. Example 16-17 starts two tasks, and since it ignores the Task objects returned by StartNew, it clearly never does anything more with the tasks. If they were children of another task that wouldn’t matter—if you ignore exceptions in child tasks they end up causing the parent task to fault. But these are not child tasks, so if exceptions occur during their execution, the program won’t notice. However, the TPL tries hard to make sure you don’t ignore such exceptions—it uses a feature of the garbage collector called finalization to discover when a Task that faulted is about to be collected without your program ever having noticed the exception. When it detects this, it throws the AggregateException, which will cause your program to crash unless you’ve configured your process to deal with unhandled exceptions. (The .NET Framework runs all finalizers on a dedicated thread, and it’s this thread that the TPL throws the exception on.) The TaskScheduler class offers an UnobservedTaskException event that lets you customize the way these unhandled exceptions are dealt with.

The upshot is that you should write error handling for any nonchild tasks that could throw. One way to do this is to provide a continuation specifically for error handling. The ContinueWith method takes an optional argument whose type is the TaskContinuationOptions enumeration, which has an OnlyOnFaulted value—you could use this to build a continuation that will run only when an unanticipated exception occurs. (Of course, unanticipated exceptions are always bad news because, by definition, you weren’t expecting them and therefore have no idea what state your program is in. So you probably need to terminate the program, which is what would have happened anyway if you hadn’t written any error handling. However, you do get to write errors to your logs, and perhaps make an emergency attempt to write out unsaved data somewhere in the hope of recovering it when the program restarts.) But in general, it’s preferable to handle errors by putting normal try...catch blocks inside your code so that the exceptions never make it out into the TPL in the first place.
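
To make that OnlyOnFaulted option concrete, here’s a minimal sketch in which DoWork stands in for some hypothetical method that might throw:

Task work = Task.Factory.StartNew(DoWork);
work.ContinueWith(t =>
    {
        // Reading t.Exception also marks the failure as observed, so the
        // finalizer-based escalation described above won't kick in.
        Console.WriteLine("Task failed: {0}",
                          t.Exception.InnerExceptions[0].Message);
    },
    TaskContinuationOptions.OnlyOnFaulted);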

Data Parallelism

The final concurrency feature we’re going to look at is data parallelism. This is where concurrency is driven by having lots of data items, rather than by explicitly creating numerous tasks or threads. It can be a simple approach to parallelism because you don’t have to tell the .NET Framework anything about how you want it to split up the work.

With tasks, the .NET Framework has no idea how many tasks you plan to create when you create the first one, but with data parallelism, it has the opportunity to see more of the problem before deciding how to spread the load across the available logical processors. So in some scenarios, it may be able to make more efficient use of the available resources.

Parallel For and ForEach

The Parallel class provides a couple of methods for performing data-driven parallel execution. Its For and ForEach methods are similar in concept to C# for and foreach loops, but rather than iterating through collections one item at a time, on a system with multiple logical processors available it will process multiple items simultaneously.

Example 16-24 uses Parallel.For. This code calculates pixel values for a fractal known as the Mandelbrot set, a popular parallelism demonstration because each pixel value can be calculated entirely independently of all the others, so the scope for parallel execution is effectively endless (unless machines with more logical processors than pixels become available). And since it’s a relatively expensive computation, the benefits of parallel execution are easy to see. Normally, this sort of code would contain two nested for loops, one to iterate over the rows of pixels and one to iterate over the columns in each row. In this example, the outer loop has been replaced with a Parallel.For. (So this particular code cannot exploit more processors than it calculates lines of pixels—therefore, we don’t quite have scope for per-pixel parallelism, but since you would typically generate an image a few hundred pixels tall, there is still a reasonable amount of scope for concurrency here.)

Example 16-24. Parallel.For

static int[,] CalculateMandelbrotValues(int pixelWidth, int pixelHeight,
    double left, double top, double width, double height, int maxIterations)
{
    int[,] results = new int[pixelWidth, pixelHeight];

    // Non-parallel version of following line would have looked like this:
    // for(int pixelY = 0; pixelY < pixelHeight; ++pixelY)
    Parallel.For(0, pixelHeight, pixelY =>
    {
        double y = top + (pixelY * height) / (double) pixelHeight;
        for (int pixelX = 0; pixelX < pixelWidth; ++pixelX)
        {
            double x = left + (pixelX * width) / (double) pixelWidth;

            // Note: this lives in the System.Numerics namespace in the
            // System.Numerics assembly.
            Complex c = new Complex(x, y);
            Complex z = new Complex();

            int iter;
            for (iter = 1; z.Magnitude < 2 && iter < maxIterations; ++iter)
            {
                z = z * z + c;
            }
            if (iter == maxIterations) { iter = 0; }
            results[pixelX, pixelY] = iter;
        }
    });

    return results;
}

This structure, seen in the preceding code:

Parallel.For(0, pixelHeight, pixelY =>
{
    ...
});

iterates over the same range as this:

for (int pixelY = 0; pixelY < pixelHeight; ++pixelY)
{
    ...
}

The syntax isn’t identical because Parallel.For is just a method, not a language feature. The first two arguments indicate the range—the start value is inclusive (i.e., it will start from the specified value), but the end value is exclusive (it stops one short of the end value). The final argument to Parallel.For is a delegate that takes the iteration variable as its argument. Example 16-24 uses a lambda, whose minimal syntax introduces the least possible extra clutter over a normal for loop.

Parallel.For will attempt to execute the delegate on multiple logical processors simultaneously, using the thread pool to attempt to make full, efficient use of the available processors. The way it distributes the iterations across logical processors may come as a surprise, though. It doesn’t simply give the first row to the first logical processor, the second row to the second logical processor, and so on. It carves the available rows into chunks, and so the second logical processor will find itself starting several rows ahead of the first. And it may decide to subdivide further depending on the progress your code makes. So you must not rely on the iteration being done in any particular order. It does this chunking to avoid subdividing the work into pieces that are too small to handle efficiently. Ideally, each CPU should be given work in lumps that are large enough to minimize context switching and synchronization overheads, but small enough that each CPU can be kept busy while there’s work to be done. This chunking is one reason why data parallelism can sometimes be more efficient than using tasks directly—the parallelism gets to be exactly as fine-grained as necessary and no more so, minimizing overheads.

Arguably, calling Example 16-24 data parallelism is stretching a point—the “data” here is just the numbers being fed into the calculations. Parallel.For is no more or less data-oriented than a typical for loop with an int loop counter—it just iterates a numeric variable over a particular range. However, you could use exactly the same construct to iterate over a range of data instead of a range of numbers. Alternatively, there’s Parallel.ForEach, which is very similar in use to Parallel.For, except, as you’d expect, it iterates over any IEnumerable<T> like a C# foreach loop, instead of using a range of integers. It reads ahead into the enumeration to perform chunking. (And if you provide it with an IList<T> it will use the list’s indexer to implement a more efficient partitioning strategy.)
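
For example, a minimal sketch of Parallel.ForEach might look like this, where GetListOfFiles and ProcessFile stand in for hypothetical code that produces and processes the items:

List<string> files = GetListOfFiles();
Parallel.ForEach(files, file =>
{
    // Each item may be processed on a different thread pool thread.
    ProcessFile(file);
});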

There’s another way to perform parallel iteration over enumerable data: PLINQ.

PLINQ: Parallel LINQ

Parallel LINQ (PLINQ) is a LINQ provider that enables any IEnumerable<T> to be processed using normal LINQ query syntax, but in a way that works in parallel. On the face of it, it’s deceptively simple. This:

var pq = from x in someList
         where x.SomeProperty > 42
         select x.Frob(x.Bar);

will use LINQ to Objects, assuming that someList implements IEnumerable<T>. Here’s the PLINQ version:

var pq = from x in someList.AsParallel()
         where x.SomeProperty > 42
         select x.Frob(x.Bar);

The only difference here is the addition of a call to AsParallel, an extension method that the ParallelEnumerable class makes available on all IEnumerable<T> implementations. It’s available to any code that has brought the System.Linq namespace into scope with a suitable using declaration. AsParallel returns a ParallelQuery<T>, which means that the normal LINQ to Objects implementation of the standard LINQ operators no longer applies. All the same operators are available, but now they’re supplied by ParallelEnumerable, which is able to execute certain operators in parallel.

Note

Not all queries will execute in parallel. Some LINQ operators essentially force things to be done in a certain order, so PLINQ will inspect the structure of your query to decide which parts, if any, it can usefully run in parallel.

Iterating over the results with foreach can restrict the extent to which the query can execute in parallel, because foreach asks for items one at a time—upstream parts of the query may still be able to execute concurrently, but the final results will be sequential. If you’d like to execute code for each item and to allow work to proceed in parallel even for this final processing step, PLINQ offers a ForAll operator:

pq.ForAll(x => x.DoSomething());

This will execute the delegate once for each item the query returns, and can do so in parallel—it will use as many logical processors concurrently as possible to evaluate the query and to call the delegate you provide.

This means that all the usual multithreading caveats apply for the code you run from ForAll. In fact, PLINQ can be a little dangerous as it’s not that obvious that your code is going to run on multiple threads—it manages to make parallel code look just a bit too normal. This is not always a problem—LINQ tends to encourage a functional style of programming in its queries, meaning that most of the data involved will be used in a read-only fashion, which makes dealing with threading much simpler. But code executed by ForAll is useful only if it has side effects, so you need to be careful with whatever you put in there.

Summary

To exploit the potential of multicore CPUs, you’ll need to run code on multiple threads. Threads can also be useful for keeping user interfaces responsive in the face of slow operations, although asynchronous programming techniques can be a better choice than creating threads explicitly. While you can create threads explicitly, the thread pool—used either directly or through the Task Parallel Library—is often preferable because it makes it easier for your code to adapt to the available CPU resources on the target machine. For code that needs to process large collections of data or perform uniform calculations across large ranges of numbers, data parallelism can help parallelize your execution without adding too much complication to your code.

No matter what multithreading mechanisms you use, you are likely to need the synchronization and locking primitives to ensure that your code avoids concurrency hazards such as races. The monitor facility built into every .NET object, and exposed through the Monitor class and C# lock keyword, is usually the best mechanism to use, but some more specialized primitives are available that can work better if you happen to find yourself in one of the scenarios for which they are designed.



[40] In fact, the CLR creates some utility threads for various purposes, so if you inspect the process’s thread count, you’ll see more than one.

[41] Always remember that even if you have not created any threads explicitly, that doesn’t mean you’re necessarily writing single-threaded code. Some .NET Framework classes will bring extra threads into play implicitly. For example, the CLR’s garbage collector runs finalizers on a distinct thread.

[42] More accurately, it’s whichever thread acquires the lock Console.WriteLine uses internally to serialize access to the console.

[43] .NET does have an isolation mechanism: you can divide code into so-called appdomains. But this adds its own complications and is designed for slightly more coarse-grained divisions, and it’s really not well suited to this problem. ASP.NET can use this to isolate multiple web applications sharing a process, but does not use it to isolate individual requests.

[44] Yielding is when a thread tells the OS scheduler that it wants to give other threads a turn using the CPU now, rather than waiting to be preempted. If there are no other runnable threads, yielding does nothing, and the thread will continue to run.

[45] Asynchronous APIs tend to be used slightly differently in server-side code in web applications. There, they are most useful for when an application needs to communicate with multiple different external services to handle a single request.

[46] This isn’t always supported. For example, if you attempt such an early call on an End method for a networking operation on the UI thread in a Silverlight application, you’ll get an exception.

[47] In case you’ve come across continuations in the sense meant by languages such as Scheme that offer call with current continuation, be aware that this is not the same idea. There’s a tenuous connection in the sense that both represent the ability to continue some work sometime later, but they’re really quite different.
