Running multiple threads with the threading module

A computer process is an instance of a running program. Processes are relatively heavyweight, so we may prefer threads, which are lighter. Threads are usually subunits of a process: processes are isolated from each other, while the threads within a process can share instructions and data.
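As a minimal sketch of what "sharing data" means in practice, the following standard-library example starts four threads that all append to one list. The names `shared` and `record` are illustrative only and are not part of the recipe.

```python
import os
import threading

shared = []  # one list object, visible to every thread in this process


def record(label):
    # Each thread appends to the same list and notes the process it runs in.
    shared.append((label, os.getpid()))


threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

labels = sorted(label for label, _ in shared)
pids = {pid for _, pid in shared}
print(labels, len(pids))
```

All four entries land in the same list and carry the same process ID, because the threads live inside a single process.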

Operating systems typically run threads on separate cores when more than one core is available, or switch between threads periodically; the latter is called time slicing. Threads, like processes, can have different priorities, and the operating system keeps daemon threads running in the background with very low priority.

It's easier to switch between threads than between processes; however, because threads share information, they are more dangerous to use. For instance, if multiple threads can increment a counter at the same time, the result becomes nondeterministic and potentially incorrect. One way to minimize the risk is to make sure that only one thread can access a shared variable or shared function at a time. CPython applies a coarse version of this strategy through the global interpreter lock (GIL), which lets only one thread execute Python bytecode at a time. The GIL does not make compound operations such as incrementing a counter atomic, so shared state may still need explicit locking.
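The counter scenario can be sketched with the standard library's `threading.Lock`. This is an illustration of the "one thread at a time" strategy, not code from the recipe; the names `counter`, `lock`, and `increment` are assumptions made for the example.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write sequence; holding the lock
        # prevents another thread from interleaving between the steps.
        with lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock in place, every one of the 4 x 10,000 increments is counted. Without it, updates can be lost depending on how the interpreter schedules the threads.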

How to do it...

  1. The imports are as follows:
    import dautil as dl
    import ch12util
    from functools import partial
    from queue import Queue
    from threading import Thread
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import skew
    from IPython.display import HTML
    
    STATS = []
  2. Define the following function to resample:
    def resample(arr):
        sample = ch12util.bootstrap(arr)
        STATS.append((sample.mean(), sample.std(), skew(sample)))
  3. Define the following class to bootstrap:
    class Bootstrapper(Thread):
        def __init__(self, queue, data):
            super().__init__()
            self.queue = queue
            self.data = data
            self.log = dl.log_api.conf_logger(__name__)
    
        def run(self):
            # Loop forever; the worker is started as a daemon thread,
            # so it does not keep the program alive.
            while True:
                index = self.queue.get()
    
                # Log every tenth task
                if index % 10 == 0:
                    self.log.debug('Bootstrap {}'.format(index))
    
                resample(self.data)
                # Tell the queue this task is finished
                self.queue.task_done()
  4. Define the following function to perform serial resampling:
    def serial(arr, n):
        for i in range(n):
            resample(arr)
  5. Define the following function to perform parallel resampling:
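    def threaded(arr, n):
        queue = Queue()
    
        # Start a pool of eight daemon worker threads
        for _ in range(8):
            worker = Bootstrapper(queue, arr)
            worker.daemon = True
            worker.start()
    
        # Enqueue one task per bootstrap sample
        for i in range(n):
            queue.put(i)
    
        # Block until every task has been marked done
        queue.join()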
  6. Plot distributions of moments and execution times:
    sp = dl.plotting.Subplotter(2, 2, context)
    temp = dl.data.Weather.load()['TEMP'].dropna().values
    np.random.seed(26)
    threaded_times = ch12util.time_many(partial(threaded, temp))
    serial_times = ch12util.time_many(partial(serial, temp))
    
    ch12util.plot_times(sp.ax, serial_times, threaded_times)
    
    stats_arr = np.array(STATS)
    ch12util.plot_distro(sp.next_ax(), stats_arr.T[0], temp.mean())
    sp.label()
    
    ch12util.plot_distro(sp.next_ax(), stats_arr.T[1], temp.std())
    sp.label()
    
    ch12util.plot_distro(sp.next_ax(), stats_arr.T[2], skew(temp))
    sp.label()
    
    HTML(sp.exit())

Refer to the following screenshot for the end result:

The code is in the running_threads.ipynb file in this book's code bundle.
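The recipe depends on the `dautil` and `ch12util` helper modules, whose source is not shown here. For readers without the code bundle, the following is a self-contained sketch of the same queue-and-workers pattern using only the standard library. The `bootstrap` function is a hypothetical stand-in for `ch12util.bootstrap` (assumed here to resample with replacement), and `threaded_means` is an illustrative name. Unlike the recipe, this sketch shuts the workers down with a `None` sentinel instead of relying on daemon threads, and it guards the shared result list with an explicit lock.

```python
import random
import statistics
from queue import Queue
from threading import Lock, Thread


def bootstrap(data, rng):
    # Hypothetical stand-in for ch12util.bootstrap:
    # draw len(data) values with replacement.
    return rng.choices(data, k=len(data))


def worker(queue, data, results, lock):
    rng = random.Random()  # each worker gets its own generator
    while True:
        index = queue.get()
        if index is None:  # sentinel: no more work for this worker
            queue.task_done()
            break
        sample = bootstrap(data, rng)
        with lock:  # defensive: serialize writes to the shared list
            results.append((index, statistics.mean(sample)))
        queue.task_done()


def threaded_means(data, n, n_workers=4):
    queue = Queue()
    results = []
    lock = Lock()
    workers = [Thread(target=worker, args=(queue, data, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n):
        queue.put(i)
    for _ in workers:
        queue.put(None)  # one sentinel per worker
    queue.join()
    for w in workers:
        w.join()
    return results


data = [float(x) for x in range(1, 101)]
means = threaded_means(data, 50)
print(len(means))
```

Because CPython's GIL lets only one thread run Python bytecode at a time, a CPU-bound loop like this one will not necessarily beat a serial version; it is worth measuring both, as the recipe does, rather than assuming a speedup.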
