A computer process is an instance of a running program. Processes are relatively heavyweight, so we may prefer threads, which are lighter. Threads are subunits of a process, and every process has at least one thread. Processes are isolated from each other, while the threads of one process share its memory, that is, its instructions and data.
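Because the threads of one process share memory, a change made by one thread is immediately visible to the others. The following minimal sketch uses the standard threading module to show this. The names shared and worker are hypothetical and chosen only for illustration.

```python
import threading

# A module-level list: every thread in this process sees the same object
shared = []


def worker(label):
    # Appending from a worker thread mutates the list the main thread also sees
    shared.append(label)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # [0, 1, 2, 3]
```

Separate processes would each get their own copy of shared, so the parent would not see the children's appends.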
Operating systems typically assign one thread to each core when there is more than one core, or switch between threads periodically; this periodic switching is called time slicing. Like processes, threads can have different priorities, and operating systems typically run daemon (background) threads at very low priority.
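Switching between threads is cheaper than switching between processes. However, threads share data, which makes them more dangerous to use. For instance, if multiple threads can increment a shared counter at the same time, some updates can be lost, and the result becomes nondeterministic and potentially incorrect. One way to minimize this risk is to make sure that only one thread can access a shared variable or function at a time. CPython applies a coarse form of this strategy through the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. The GIL does not make compound operations such as counter += 1 atomic, so shared state still needs explicit locking. It also means that threads running pure-Python CPU-bound code cannot execute in parallel.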
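The standard threading.Lock is one way to enforce the one-thread-at-a-time rule for a shared counter. In this minimal sketch, each increment holds the lock for its whole read-modify-write, so no updates are lost. Counter and hammer are illustrative names and are not part of the recipe.

```python
import threading


class Counter:
    """A counter whose increments are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below
        with self._lock:
            self.value += 1


def hammer(counter, n):
    # Increment the shared counter n times from the calling thread
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 100_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 400000
```

Without the with self._lock: line, the final value may come out lower than 400000 on some runs, because two threads can read the same old value before either writes.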
import dautil as dl
import ch12util
from functools import partial
from queue import Queue
from threading import Thread
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import skew
from IPython.display import HTML

STATS = []

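def resample(arr):
    # Draw one bootstrap sample and record its mean, standard deviation and skewness.
    # Appending from several threads is safe in CPython: list.append is atomic under the GIL.
    sample = ch12util.bootstrap(arr)
    STATS.append((sample.mean(), sample.std(), skew(sample)))
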
class Bootstrapper(Thread):
    def __init__(self, queue, data):
        super().__init__()
        self.queue = queue
        self.data = data
        self.log = dl.log_api.conf_logger(__name__)

    def run(self):
        # Loop forever, taking task indices from the shared queue
        while True:
            index = self.queue.get()

            try:
                if index % 10 == 0:
                    self.log.debug('Bootstrap {}'.format(index))

                resample(self.data)
            finally:
                # Always mark the task done, even on error, so queue.join() cannot hang
                self.queue.task_done()

def serial(arr, n):
    for _ in range(n):
        resample(arr)

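def threaded(arr, n):
    queue = Queue()

    # Start 8 daemon workers; daemon threads do not keep the interpreter alive
    for _ in range(8):
        worker = Bootstrapper(queue, arr)
        worker.daemon = True
        worker.start()

    for i in range(n):
        queue.put(i)

    # Block until every queued index has been processed
    queue.join()
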
# context is assumed to be created earlier in the notebook (not shown here)
sp = dl.plotting.Subplotter(2, 2, context)
temp = dl.data.Weather.load()['TEMP'].dropna().values
np.random.seed(26)

# Time the threaded and serial versions; both append their results to STATS
threaded_times = ch12util.time_many(partial(threaded, temp))
serial_times = ch12util.time_many(partial(serial, temp))
ch12util.plot_times(sp.ax, serial_times, threaded_times)

# Columns of STATS: mean, standard deviation, skewness
stats_arr = np.array(STATS)
ch12util.plot_distro(sp.next_ax(), stats_arr.T[0], temp.mean())
sp.label()

ch12util.plot_distro(sp.next_ax(), stats_arr.T[1], temp.std())
sp.label()

ch12util.plot_distro(sp.next_ax(), stats_arr.T[2], skew(temp))
sp.label()
HTML(sp.exit())
Refer to the following screenshot for the end result:
The code is in the running_threads.ipynb file in this book's code bundle.
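The recipe relies on the dautil and ch12util helpers, which are not shown here. The same worker-queue pattern can be sketched with the standard library alone. This sketch assumes that ch12util.bootstrap draws a resample of the same size with replacement. The bootstrap and threaded_bootstrap functions below are hypothetical stand-ins, not the book's code. statistics.pstdev is the population standard deviation, matching NumPy's default std().

```python
import random
import statistics
import threading
from queue import Queue


def bootstrap(data, rng):
    # Assumed behaviour: resample len(data) points with replacement
    return rng.choices(data, k=len(data))


def threaded_bootstrap(data, n, n_workers=8, seed=26):
    tasks = Queue()
    results = []
    lock = threading.Lock()

    def work(worker_id):
        # Each worker owns its generator, so no RNG state is shared between threads
        rng = random.Random(seed + worker_id)
        while True:
            index = tasks.get()
            try:
                if index is None:  # sentinel: no more work for this worker
                    return
                sample = bootstrap(data, rng)
                with lock:
                    results.append((statistics.fmean(sample),
                                    statistics.pstdev(sample)))
            finally:
                # Runs on return too, so every get() is matched by task_done()
                tasks.task_done()

    workers = [threading.Thread(target=work, args=(i,), daemon=True)
               for i in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n):
        tasks.put(i)
    # One sentinel per worker lets the threads exit instead of waiting forever
    for _ in workers:
        tasks.put(None)
    tasks.join()
    for w in workers:
        w.join()
    return results


# Made-up temperatures, for illustration only
temps = [12.1, 14.3, 9.8, 11.0, 13.5, 10.2, 15.1, 12.8]
stats = threaded_bootstrap(temps, n=200)
print(len(stats))  # 200
```

Unlike the Bootstrapper workers, which block on the queue forever, these workers exit after receiving a sentinel. Repeated timing runs therefore do not leave idle threads behind. As with the original, pure-Python resampling gains little from threads because of the GIL.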