31. Producer-Consumer Queues
This particular class has a few limitations. For example, there is no way to
signal to all the consumers to stop waiting for the producer(s) to insert more data.
One way to handle this is to add a manual-reset event that is initially non-
signaled and to call the WaitForMultipleObjects() function on both the stop
event and the semaphore, with the bWaitAll parameter set to FALSE. After a
SetEvent() call, all consumers then wake up, and the return value of the
WaitForMultipleObjects() function indicates whether it was the stop event
that signaled.
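The same wake-all-on-shutdown idea can be sketched portably in standard C++. The original uses a Win32 manual-reset event and WaitForMultipleObjects(); in the hypothetical analogue below, a stopped flag guarded by the queue mutex plays the role of the event, and notify_all() plays the role of SetEvent() waking every consumer (class and method names are illustrative, not the book's code):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Portable analogue of the stop-event pattern described above.
class StoppableQueue {
public:
    void Push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(item);
        }
        m_cond.notify_one();   // analogous to releasing the semaphore
    }

    // Blocks until an item arrives or Stop() is called.
    // Returns false when the queue has been stopped and drained,
    // mirroring the "which handle signaled?" check after
    // WaitForMultipleObjects() returns.
    bool Pop(int& item) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_stopped || !m_queue.empty(); });
        if (m_queue.empty())
            return false;      // woken by Stop(), nothing left to consume
        item = m_queue.front();
        m_queue.pop();
        return true;
    }

    void Stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopped = true;
        }
        m_cond.notify_all();   // analogous to SetEvent() on the stop event
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<int> m_queue;
    bool m_stopped = false;
};
```

A consumer loop would simply run `while (q.Pop(item)) { /* process item */ }` and exit cleanly once Stop() has been called and the queue is empty.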
Finally, the semaphore object maintains an exact physical count of the objects,
but this count is internal to the OS. To let the consumer query the number of
items currently in the queue, we can add a member variable that is incremented
or decremented whenever the queue is locked. However, every time the count is
read, it is only a logical snapshot at that point in time, since another thread
could insert or remove an item immediately afterward. Such a snapshot can be
useful for gauging a running count to see whether the producer is producing data
too quickly (or the consumer is consuming it too quickly), but the only way to
ensure the count remains accurate is to lock the whole queue before checking it.
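A minimal sketch of that counted queue, assuming a hypothetical class layout (the member names are illustrative): the counter is updated only while the queue is locked, and Count() returns the logical snapshot discussed above.

```cpp
#include <atomic>
#include <mutex>
#include <queue>

// Hypothetical counted queue: m_count mirrors the number of queued items.
// Count() is only a snapshot; another thread may change the queue right
// after it returns, so it is useful for gauging throughput, not exactness.
class CountedQueue {
public:
    void Push(int item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(item);
        ++m_count;                 // updated while the queue is locked
    }

    bool Pop(int& item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
            return false;
        item = m_queue.front();
        m_queue.pop();
        --m_count;                 // updated while the queue is locked
        return true;
    }

    // Logical snapshot only; may be stale by the time the caller uses it.
    int Count() const { return m_count.load(); }

private:
    std::mutex m_mutex;
    std::queue<int> m_queue;
    std::atomic<int> m_count{0};
};
```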
31.4 A Second Approach: Lock-Free Algorithms
The goal of every multithreaded algorithm is to maximize parallelism. However,
once a resource is requested for exclusive access, all other threads must stall
while waiting for that resource to become available, causing contention. Also,
many OS locks (such as mutexes) require a switch into kernel mode and can take
hundreds of clock cycles or more to acquire. To improve performance, some
multithreaded algorithms can be designed to avoid OS locks entirely. These are
referred to as lock-free algorithms.
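The core idea can be illustrated with a small, hedged example (a hypothetical counter, not a full lock-free queue): instead of taking an OS lock, the update is retried with an atomic compare-and-swap until it succeeds, so no thread ever blocks in the kernel.

```cpp
#include <atomic>

// Lock-free add: read the current value, then attempt to publish the new
// value with a compare-and-swap. If another thread changed the counter in
// between, the CAS fails, 'old' is refreshed, and the loop retries.
int LockFreeAdd(std::atomic<int>& counter, int delta) {
    int old = counter.load();
    while (!counter.compare_exchange_weak(old, old + delta)) {
        // CAS failed (or failed spuriously); 'old' now holds the freshly
        // observed value, so the next iteration retries with old + delta.
    }
    return old + delta;   // the value this thread successfully installed
}
```

No thread holds a lock at any point; a thread that loses a CAS race simply retries with the newly observed value.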
Lock-free algorithms have been around for decades, but these techniques are
seeing more and more exposure in user-mode code as multicore architectures
become commonplace and developers seek to improve parallelism. For example,
Valve’s Source Engine internally uses lock-free algorithms for many of the
multithreaded components in their game engine [Leonard 2007].
The design of new lock-free algorithms is notoriously difficult and should be
based on published, peer-reviewed techniques. Most lock-free algorithms are
engineered around modern x86 and x64 architectures, which requires compiler
and processor support for a specific multiprocessor memory model. Despite the
complexity, many algorithms with high thread contention or low-latency
requirements can achieve measurable performance increases when switching to a
lock-free algorithm. Lock-free algorithms, when implemented correctly, general-