Chapter 13. Multiprocessing – When a Single CPU Core Is Not Enough

In the previous chapter, we discussed factors that influence performance and some methods to increase it. This chapter can be seen as an extension of that list of performance tips. Here we will discuss the multiprocessing module, which makes it easy to run your code on multiple CPU cores and even on multiple machines. It is also a straightforward way to work around the Global Interpreter Lock (GIL) that was discussed in the previous chapter.

To summarize, this chapter will cover:

  • Local multiprocessing
  • Remote multiprocessing
  • Data sharing and synchronization between processes

Multithreading versus multiprocessing

Within this book we haven't really covered multithreading yet, but you have probably seen multithreaded code in the past. The big difference between multithreading and multiprocessing is that with multithreading everything is still executed within a single process. That effectively limits your performance to a single CPU core. It actually limits you even further because the code has to deal with the GIL limitations of CPython.

Note

The GIL is the global lock that Python uses for safe memory access. It is discussed in more detail in Chapter 12, Performance – Tracking and Reducing Your Memory and CPU Usage, about performance.

To illustrate that multithreaded code doesn't always help performance and can actually be slightly slower than single-threaded code, look at this example:

import datetime
import threading


def busy_wait(n):
    while n > 0:
        n -= 1


if __name__ == '__main__':
    n = 10000000
    start = datetime.datetime.now()
    for _ in range(4):
        busy_wait(n)
    end = datetime.datetime.now()
    print('The single threaded loops took: %s' % (end - start))

    start = datetime.datetime.now()
    threads = []
    for _ in range(4):
        thread = threading.Thread(target=busy_wait, args=(n,))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    end = datetime.datetime.now()
    print('The multithreaded loops took: %s' % (end - start))

With Python 3.5, which uses the new and improved GIL implementation (introduced in Python 3.2), the two variants perform comparably, but threading offers no improvement:

# python3 test_multithreading.py
The single threaded loops took: 0:00:02.623443
The multithreaded loops took: 0:00:02.597900

With Python 2.7, which still has the old GIL, the single-threaded variant is considerably faster:

# python2 test_multithreading.py
The single threaded loops took: 0:00:02.010967
The multithreaded loops took: 0:00:03.924950

From this test we might conclude that Python 2 is faster in some cases while Python 3 is faster in others. The real takeaway, however, is that raw performance alone is no reason to choose between Python 2 and Python 3. Just note that Python 3 is at least as fast as Python 2 in most cases, and where it is not, the gap is being actively closed.

Regardless, for CPU-bound operations, threading does not offer any performance benefit since it executes on a single processor core. For I/O-bound operations, however, the threading library does offer a clear benefit, although in that case I would recommend trying asyncio instead. The biggest problem with threading is that a thread that blocks while holding the GIL (for example, a CPU-bound loop, or a C extension that does not release the lock) stalls every other thread in the process.
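To see why threading does help for I/O-bound work, here is a minimal sketch (not from the original text) that simulates blocking I/O with time.sleep, which releases the GIL just as real I/O would. Four threads each "wait on I/O" for 0.2 seconds, yet the total wall-clock time stays close to 0.2 seconds because the waits overlap:

```python
import threading
import time


def fake_io(results, i):
    # time.sleep releases the GIL, just like blocking I/O would,
    # so other threads run while this one waits.
    time.sleep(0.2)
    results[i] = i


results = [None] * 4
start = time.monotonic()
threads = [threading.Thread(target=fake_io, args=(results, i))
           for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.monotonic() - start
print('4 x 0.2s of simulated I/O took %.2f seconds' % elapsed)
```

Had the four waits run sequentially, the loop would have taken roughly 0.8 seconds; with threads the waits overlap, which is exactly the benefit threading retains despite the GIL.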

The multiprocessing library offers an API that is very similar to the threading library but utilizes multiple processes instead of multiple threads. The advantages are that the GIL is no longer an issue and that multiple processor cores and even multiple machines can be used for processing.

To illustrate the performance difference, let's repeat the test while using the multiprocessing module instead of threading:

import datetime
import multiprocessing


def busy_wait(n):
    while n > 0:
        n -= 1


if __name__ == '__main__':
    n = 10000000
    start = datetime.datetime.now()

    processes = []
    for _ in range(4):
        process = multiprocessing.Process(
            target=busy_wait, args=(n,))
        process.start()
        processes.append(process)

    for process in processes:
        process.join()

    end = datetime.datetime.now()
    print('The multiprocessed loops took: %s' % (end - start))

When running it, we see a huge improvement:

# python3 test_multiprocessing.py
The multiprocessed loops took: 0:00:00.671249

Note that this was run on a quad core processor, which is why I chose four processes. The multiprocessing library defaults to multiprocessing.cpu_count(), which counts the logical CPU cores. Since that count includes hyper-threads, it would return 8 in my case, which is why I hardcoded it to 4 instead.
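The distinction between logical and physical cores can be checked directly. This sketch (my own illustration, not from the original text) shows the relevant calls; the halving heuristic is only a rough guess that assumes two hyper-threads per physical core:

```python
import multiprocessing
import os

# cpu_count() reports *logical* cores, so a quad core processor with
# hyper-threading reports 8 rather than 4.
logical = multiprocessing.cpu_count()
assert logical == os.cpu_count()  # the equivalent call since Python 3.4

# For CPU-bound work the physical core count is often the better limit;
# halving is only a rough heuristic assuming two hyper-threads per core.
workers = max(logical // 2, 1)
print('Using %d worker processes out of %d logical cores'
      % (workers, logical))
```

The standard library has no call that returns the physical core count; third-party libraries such as psutil expose one if the heuristic is not good enough.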

Note

It's important to note that because the multiprocessing library uses multiple processes, the code needs to be importable by the subprocesses. The result is that the multiprocessing library does not work within the Python or IPython shells. As we will see later in this chapter, IPython has its own provisions for multiprocessing.
