17.3. Threads and Python

17.3.1. Global Interpreter Lock

Execution of Python code is controlled by the Python Virtual Machine (a.k.a. the interpreter main loop), and Python was designed in such a way that only one thread of control may be executing in this main loop at a time, similar to how multiple processes in a system share a single CPU. Many programs may be in memory, but only one is live on the CPU at any given moment. Likewise, although multiple threads may be “running” within the Python interpreter, only one thread is being executed by the interpreter at any given time.

Access to the Python Virtual Machine is controlled by a global interpreter lock (GIL). This lock is what ensures that exactly one thread is running. The Python Virtual Machine executes in the following manner in an MT environment:

  • Set the GIL,

  • Switch in a thread to run,

  • Execute for a specified number of bytecode instructions,

  • Put the thread back to sleep (switch out thread),

  • Unlock the GIL, and

  • Do it all over again (lather, rinse, repeat).
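The cycle above can be imitated with a toy model. This is only a sketch under stated assumptions: the real interpreter does this work in C, and the names run_toy_vm() and INTERVAL, along with the counting of "instructions," are invented here for illustration. An ordinary threading.Lock stands in for the GIL, and each worker holds it for a fixed budget of steps before releasing it so that another worker can take a turn.

```python
import threading

INTERVAL = 100   # hypothetical per-turn "instruction" budget

def run_toy_vm(num_threads=3, total_steps=1000):
    gil = threading.Lock()            # stand-in for the global interpreter lock
    state = {'inside': 0, 'max_inside': 0}
    finished = {}

    def worker(name):
        done = 0
        while done < total_steps:
            gil.acquire()             # set the GIL; this thread is switched in
            try:
                state['inside'] += 1
                state['max_inside'] = max(state['max_inside'], state['inside'])
                # execute for a fixed number of "instructions"
                for _ in range(min(INTERVAL, total_steps - done)):
                    done += 1
                state['inside'] -= 1
            finally:
                gil.release()         # unlock the GIL; another thread may run
        finished[name] = done

    threads = [threading.Thread(target=worker, args=('t%d' % i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished, state['max_inside']
```

Calling run_toy_vm() returns the number of steps each worker completed, together with the largest number of workers ever seen inside the locked region at once. Because of the lock, that second value never exceeds one.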

When a call is made to external code, i.e., any C/C++ extension built-in function, the GIL remains locked until the call has completed (since there are no Python bytecodes to count as the interval). Extension programmers do have the ability to unlock the GIL, however, so as a Python developer you should not have to worry about your Python code locking up in those situations.

As an example, for any Python I/O-oriented routines (which invoke built-in operating system C code), the GIL is released before the I/O call is made, allowing other threads to run while the I/O is being performed. Code that doesn't have much I/O will tend to keep the processor (and the GIL) for the full interval a thread is allowed before it yields. In other words, I/O-bound Python programs stand a much better chance of being able to take advantage of a multithreaded environment than CPU-bound code.
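To see the effect of the GIL being released around a blocking I/O call, consider the following minimal sketch. The helper names blocked_reader() and demo_io_release() are hypothetical, and it uses the threading module rather than thread. One thread blocks in os.read() on an empty pipe while the main thread keeps executing Python code and only afterward writes the data the reader is waiting for. If the blocked thread held the GIL during the read, the main thread could never reach the write.

```python
import os
import threading

def blocked_reader(fd, out):
    # Blocks inside the operating system until data arrives on the pipe;
    # other threads are free to run in the meantime.
    out.append(os.read(fd, 16))

def demo_io_release():
    r, w = os.pipe()
    out = []
    t = threading.Thread(target=blocked_reader, args=(r, out))
    t.start()
    total = sum(range(100000))   # main thread keeps running Python code
    os.write(w, b'done')         # now unblock the reader
    t.join()
    os.close(r)
    os.close(w)
    return total, out[0]
```

The call returns both the main thread's computed total and the bytes the reader received, showing that both threads made progress.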

Those of you interested in the source code, the interpreter main loop, and the GIL can take a look at the eval_code2() routine in the Python/ceval.c file, which is the Python Virtual Machine.

17.3.2. Exiting Threads

When a thread completes execution of the function it was created for, it exits. A thread may also quit by calling an exit function such as thread.exit(), or by any of the standard ways of exiting a Python process, i.e., calling sys.exit() or raising the SystemExit exception.
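Two of these exit paths can be sketched as follows. The sketch uses the higher-level threading module instead of thread.exit(), and the function names are made up for illustration. One thread simply returns from its function. The other calls sys.exit(), which raises SystemExit in that thread only, so the main thread carries on.

```python
import sys
import threading

def returns_normally(log):
    log.append('returned')       # falling off the end of the function exits the thread

def exits_early(log):
    log.append('before exit')
    sys.exit()                   # raises SystemExit in this thread only
    log.append('never reached')  # not executed

def demo_exit():
    log = []
    for target in (returns_normally, exits_early):
        t = threading.Thread(target=target, args=(log,))
        t.start()
        t.join()                 # wait for the thread to finish
        assert not t.is_alive()  # the thread has exited either way
    return log
```

After demo_exit() returns, the log contains only the entries written before each thread ended, and the calling thread is still running.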

There are a variety of ways of managing thread termination. On most systems, when the main thread exits, all other threads die without cleanup, but on some systems they live on. Check your operating system's threaded programming documentation for how it behaves in such situations.

Main threads should always be good managers, though, and perform the task of knowing what needs to be executed by individual threads, what data or arguments each of the spawned threads requires, when they complete execution, and what results they provide. In so doing, those main threads can collate the individual results into a final conclusion.
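A minimal sketch of such a "manager" main thread follows. The names partial_sum() and managed_total() are hypothetical, and the threading module is used for brevity. The main thread decides what each worker gets, waits for all of them, and collates their results.

```python
import threading

def partial_sum(numbers, results, slot):
    # Each worker writes its result into its own slot of the shared list.
    results[slot] = sum(numbers)

def managed_total(chunks):
    results = [None] * len(chunks)
    threads = [threading.Thread(target=partial_sum, args=(chunk, results, i))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()                # hand each worker its data
    for t in threads:
        t.join()                 # know when each worker has completed
    return sum(results)          # collate the individual results
```

For example, managed_total([[1, 2, 3], [4, 5], [6]]) adds the three partial sums 6, 9, and 6.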

17.3.3. Accessing Threads From Python

Python supports multithreaded programming, depending on the operating system that it is running on. It is supported on most versions of Unix, including Solaris and Linux, as well as on Windows. Threads are not currently available on the Macintosh platform. Python uses POSIX-compliant threads, or “pthreads,” as they are commonly known.

By default, threads are not enabled when building Python from source, but are available for Windows platforms automatically from the installer. To tell whether threads are installed, simply attempt to import the thread module from the interactive interpreter. No errors occur when threads are available:

>>> import thread
>>>

If your Python interpreter was not compiled with threads enabled, the module import fails:

>>> import thread
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ImportError: No module named thread

In such cases, you may have to recompile your Python interpreter to get access to threads. This usually involves invoking the configure script with the “--with-thread” option. Check the README file for your distribution for specific instructions on how to compile Python with threads for your system.
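The same check can be made from a script rather than at the interactive prompt. The following is a minimal sketch. The function name threads_available() is made up for illustration, and the fallback to _thread is there because Python 3 renamed the thread module to _thread.

```python
def threads_available():
    """Return True if the low-level thread module can be imported."""
    try:
        import thread            # Python 2 name used in this chapter
    except ImportError:
        try:
            import _thread       # Python 3 name for the same module
        except ImportError:
            return False
    return True
```

A program could call threads_available() at startup and fall back to single-threaded operation when it returns False.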

Due to the brevity of this chapter, we will give you only a quick introduction to threads and MT programming in Python. We refer you to the official documentation for full coverage of all aspects of the threading support that Python has to offer. We also recommend consulting any general operating system textbook for more details on processes, interprocess communication, multithreaded programming, and thread/process synchronization. (Some of these texts are listed in the appendix.)

17.3.4. Life Without Threads

For our first set of examples, we are going to use the time.sleep() function to show how threads work. time.sleep() takes a floating point argument and “sleeps” for the given number of seconds, meaning that execution is temporarily halted for the amount of time specified.

Let us create two “time loops,” one that sleeps for 4 seconds and one that sleeps for 2 seconds, loop0() and loop1(), respectively. (We use the names “loop0” and “loop1” as a hint that we will eventually have a sequence of loops.) If we were to execute loop0() and loop1() sequentially in a one-process or single-threaded program, as onethr.py does in Listing 17.1, the total execution time would be at least 6 seconds. There may or may not be a 1-second gap between the starting of loop0() and loop1(), and other execution overhead may bump the overall time to 7 seconds.

Listing 17.1. Loops Executed by a Single Thread (onethr.py)

Executes two loops consecutively in a single-threaded program. One loop must complete before the other can begin. The total elapsed time is the sum of times taken by each loop.

#!/usr/bin/env python

from time import sleep, time, ctime

def loop0():
    # first time loop: sleeps for 4 seconds
    print 'start loop 0 at:', ctime(time())
    sleep(4)
    print 'loop 0 done at:', ctime(time())

def loop1():
    # second time loop: sleeps for 2 seconds
    print 'start loop 1 at:', ctime(time())
    sleep(2)
    print 'loop 1 done at:', ctime(time())

def main():
    # run the loops one after the other in the same thread
    print 'starting...'
    loop0()
    loop1()
    print 'all DONE at:', ctime(time())

if __name__ == '__main__':
    main()

We can verify this by executing onethr.py, which gives the following output:

% onethr.py
starting...
start loop 0 at: Sun Aug 13 05:03:34 2000
loop 0 done at: Sun Aug 13 05:03:38 2000
start loop 1 at: Sun Aug 13 05:03:38 2000
loop 1 done at: Sun Aug 13 05:03:40 2000
all DONE at: Sun Aug 13 05:03:40 2000

Now, pretend that rather than sleeping, loop0() and loop1() were separate functions that performed individual and independent computations, all working to arrive at a common solution. Wouldn't it be useful to have them run in parallel to cut down on the overall running time? That is the premise behind MT, which we will now introduce.
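As a rough preview of the idea, and not one of this chapter's numbered examples, here is a minimal sketch that runs two time loops in separate threads using the threading module. The function names are hypothetical, and the delays are scaled down to 0.4 and 0.2 seconds so the sketch finishes quickly. Because both loops sleep at the same time, the elapsed time should be close to the longer delay rather than the sum of the two.

```python
import threading
from time import sleep, time

def loop(nsec):
    sleep(nsec)                  # stand-in for independent work

def run_threaded(delays=(0.4, 0.2)):
    threads = [threading.Thread(target=loop, args=(d,)) for d in delays]
    start = time()
    for t in threads:
        t.start()                # both loops begin almost together
    for t in threads:
        t.join()                 # wait for the slower one to finish
    return time() - start        # elapsed wall-clock seconds
```

On an otherwise idle machine, run_threaded() should return a value a little over 0.4, compared with roughly 0.6 for the same two loops run back to back.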

17.3.5. Python Threading Modules

Python provides several modules to support MT programming, including the thread, threading, and Queue modules. The thread and threading modules allow the programmer to create and manage threads. The thread module provides the basic thread and locking support, while threading provides high-level, full-featured thread management. The Queue module allows the user to create a queue data structure that can be shared across multiple threads. We will take a look at these modules individually and present a good number of examples, along with a couple of intermediate-sized applications.
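To give a feel for how a queue is shared between threads, here is a minimal producer/consumer sketch. The function names and the use of None as an end-of-stream marker are assumptions made for illustration. The import is written so that it also works under Python 3, where the module is spelled queue.

```python
import threading
try:
    import Queue as queue_mod    # Python 2 name used in this chapter
except ImportError:
    import queue as queue_mod    # Python 3 spelling of the same module

def producer(q, items):
    for item in items:
        q.put(item)              # hand each item to the other thread
    q.put(None)                  # hypothetical end-of-stream marker

def consumer(q, out):
    while True:
        item = q.get()           # blocks until an item is available
        if item is None:
            break
        out.append(item * 2)

def run_pipeline(items):
    q = queue_mod.Queue()
    out = []
    threads = [threading.Thread(target=producer, args=(q, items)),
               threading.Thread(target=consumer, args=(q, out))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

With a single producer and a single consumer, items come out in the order they went in, so run_pipeline([1, 2, 3]) yields the doubled values in the same order.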

CORE TIP: Avoid use of thread module

We recommend avoiding the thread module for many reasons. The first is that the high-level threading module is more contemporary, its thread support is much improved, and using attributes of the thread module may conflict with using the threading module. Another reason is that the lower-level thread module has few synchronization primitives (actually only one), while threading has many.

However, in the interest of learning Python and threading in general, we do present some code that uses the thread module. These pieces of code should be used for learning purposes only and will give you much better insight into why you would want to avoid using the thread module. These examples also show how our applications and thread programming improve as we migrate to more appropriate tools such as those available in the threading and Queue modules.

Use of the thread module is recommended only for experts desiring lower-level thread access. Those of you new to threads should look at the code samples to see how we can overlay threads onto our time loop application and to gain a better understanding as to how these first examples evolve to the main code samples of this chapter. Your first multithreaded application should utilize threading and perhaps other high-level thread modules, if applicable.
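As a small taste of the higher-level module, the following sketch uses threading.Lock, one of several synchronization primitives that threading offers, to protect a shared counter. The function name count_with_lock() and its parameters are made up for illustration.

```python
import threading

def count_with_lock(num_threads=4, increments=10000):
    lock = threading.Lock()
    counter = [0]                # a list so the nested function can update it

    def bump():
        for _ in range(increments):
            with lock:           # only one thread updates the counter at a time
                counter[0] += 1

    threads = [threading.Thread(target=bump) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Because every update happens while the lock is held, the final count is always num_threads times increments.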

