Dealing with race conditions

Now that we have the tools to start threads and run them, let's simulate a race condition such as the one we discussed earlier:

# race.py
import threading
from time import sleep
from random import random

counter = 0
randsleep = lambda: sleep(0.1 * random())

def incr(n):
    global counter
    for count in range(n):
        current = counter
        randsleep()
        counter = current + 1
        randsleep()

n = 5
t1 = threading.Thread(target=incr, args=(n,))
t2 = threading.Thread(target=incr, args=(n,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f'Counter: {counter}')

In this example, we define the incr function, which takes a number n as input and loops n times. In each cycle, it reads the value of the counter, sleeps for a random amount of time (between 0 and 0.1 seconds) by calling randsleep, a tiny lambda function I wrote to improve readability, then sets the counter to the value it read plus 1, and finally sleeps again.

I chose to use global in order to have read/write access to counter, but the shared state could be anything really, so feel free to experiment with that yourself.
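For instance, the shared state could live on a small object passed to both threads rather than in a module-level global. Here is one possible variation (a sketch, reusing randsleep and the thread setup from race.py):

# a variation of race.py (sketch): share an object instead of a global
class Counter:
    def __init__(self):
        self.value = 0

def incr(counter_obj, n):
    for count in range(n):
        current = counter_obj.value
        randsleep()
        counter_obj.value = current + 1
        randsleep()

c = Counter()
t1 = threading.Thread(target=incr, args=(c, 5))
t2 = threading.Thread(target=incr, args=(c, 5))

The race condition is exactly the same; only where the shared value lives changes.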

Back in race.py, the script basically starts two threads, each of which runs the same function with n = 5. Notice how we need to join both threads at the end, to make sure that when we print the final value of the counter (last line), both threads are done with their work.

When we print the final value, we would expect the counter to be 10, right? Two threads, five loops each, that makes 10. However, we almost never get 10 when we run this script. I ran it myself many times, and it always seems to land somewhere between 5 and 7. The reason this happens is that there is a race condition in this code, and the random sleeps I added are there to exacerbate it. If you removed them, there would still be a race condition, because the counter is increased in a non-atomic way (that is, an operation that can be broken down into multiple steps, and therefore paused in between). However, the likelihood of that race condition showing up would be really low, so adding the random sleeps helps expose it.
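If you want to convince yourself that even a one-line increment is not atomic, one quick check (a small sketch, separate from race.py) is to disassemble it with the dis module from the standard library:

# dis_sketch.py (a small, separate sketch)
import dis

# The read of counter, the addition, and the write back are separate
# bytecode instructions (exact opcode names vary across Python versions),
# so a thread can be suspended between them.
dis.dis('counter = counter + 1')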

Let's analyze the code. t1 gets the current value of the counter, say, 3. t1 then sleeps for a moment. If the scheduler switches context in that moment, pausing t1 and starting t2, t2 will read the same value, 3. Whatever happens afterward, we know that both threads will update the counter to 4, which is incorrect, as after two increments it should have become 5. Adding the second random sleep call, after the update, helps the scheduler switch more frequently, and makes it easier to show the race condition. Try commenting out one of the sleeps and see how the result changes (it will, dramatically).

Now that we have identified the issue, let's fix it by using a lock. The code is basically the same, so I'll show you only what changes:

# race_with_lock.py
incr_lock = threading.Lock()

def incr(n):
    global counter
    for count in range(n):
        with incr_lock:
            current = counter
            randsleep()
            counter = current + 1
            randsleep()

This time we have created a lock, from the threading.Lock class. We could call its acquire and release methods manually, or we can be Pythonic and use it as a context manager, which looks much nicer and does the whole acquire/release business for us. Notice that I left the random sleeps in the code; however, every time you run it, the script will now return 10.
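For comparison, here is roughly what the body of the loop would look like with explicit calls (a sketch, not the version used in race_with_lock.py); the try/finally ensures the lock is released even if the code inside raises an exception:

# equivalent of the with block, using explicit acquire/release
incr_lock.acquire()
try:
    current = counter
    randsleep()
    counter = current + 1
    randsleep()
finally:
    incr_lock.release()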

The difference is this: when the first thread acquires the lock, it doesn't matter that, a moment later, while it's sleeping, the scheduler switches the context. The second thread will try to acquire the lock, and Python will answer with a resounding no. So, the second thread will just sit and wait until that lock is released. As soon as the scheduler switches back to the first thread and the lock is released, the other thread gets a chance (if it gets there first, which is not guaranteed) to acquire the lock and update the counter. Try adding some prints to that logic to see whether the threads alternate perfectly or not. My guess is that they won't, at least not every time. Remember the threading.current_thread function, which lets you see which thread is actually printing the information.
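As a hint for that experiment, a print along these lines inside the with block, right after the update, is enough (the message format is just a suggestion):

# inside the with block, right after updating counter
name = threading.current_thread().name
print(f'{name} set counter to {counter}')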

Python offers several synchronization primitives in the threading module: Lock, RLock, Condition, Semaphore, Event, Timer, and Barrier. I won't be able to show you all of them, because unfortunately I don't have the room to explain all the use cases, but reading the documentation of the threading module (https://docs.python.org/3.7/library/threading.html) is a good place to start understanding them.
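Just to give you a taste of one of them, here is a minimal sketch (not one of this chapter's examples) showing how an Event lets one thread signal another:

# event_sketch.py (illustrative only)
import threading

data_ready = threading.Event()

def worker():
    print('worker: waiting for the event')
    data_ready.wait()    # blocks until the event is set
    print('worker: event received, proceeding')

t = threading.Thread(target=worker)
t.start()
print('main: doing some preparation work')
data_ready.set()         # wakes up the waiting thread
t.join()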

Let's now see an example of thread-local data.
