Using a message queue to transmit objects

The multiprocessing module handles both the serialization and transmission of objects. We can use queues and pipes to serialize objects and transmit them to other processes. There are numerous external projects that provide sophisticated message queue processing. We'll focus on the multiprocessing queue because it's built into Python and works nicely.

For high-performance applications, a faster message queue may be necessary. It may also be necessary to use a faster serialization technique than pickling. For this chapter, we'll focus only on the Python design issues. The multiprocessing module relies on pickle to encode objects. See Chapter 10, Serializing and Saving – JSON, YAML, Pickle, CSV, and XML, for more information. We can't easily provide a restricted unpickler; therefore, we need some relatively simple security measures in place to prevent unpickling problems.

There is one important design consideration when using multiprocessing: it's generally best to avoid having multiple processes (or multiple threads) attempting to update shared objects. The synchronization and locking issues are so profound (and easy to get wrong) that the standard joke goes as follows:

When confronted with a problem, the programmer thinks, "I'll use multiple threads."

problems Now. two programmer the has

It's very easy for locking and buffering to make a mess of multithreaded processing.

Using process-level synchronization via RESTful web services or multiprocessing can prevent synchronization issues because there are no shared objects. The essential design principle is to look at the processing as a pipeline of discrete steps. Each processing step will have an input queue and an output queue; the step will fetch an object from its input queue, perform some processing, and write the resulting object to its output queue.

The multiprocessing philosophy matches the POSIX concept of a shell pipeline, written as process1 | process2 | process3. This kind of shell pipeline involves three concurrent processes interconnected with pipes. The important difference is that we don't need to use STDIN, STDOUT, or an explicit serialization of the objects. We can trust the multiprocessing module to handle the operating system (OS)-level infrastructure.

The POSIX shell pipelines are limited in that each pipe has a single producer and a single consumer. The Python multiprocessing module allows us to create message queues that include multiple consumers. This allows us to have a pipeline that fans out from one source process to multiple sink processes. A queue can also have multiple producers, which allows us to build a pipeline where the results of multiple source processes are combined by a single sink process.

To maximize throughput on a given computer system, we need to have enough work pending so that no processor or core is ever left with nothing useful to do. When any given OS process is waiting for a resource, at least one other process should be ready to run.

When looking at our simulations, for example, we need to gather statistically significant simulation data by exercising a player strategy or betting strategy (or both) a number of times. The idea is to create a queue of processing requests so that our computer's processors (and cores) are fully engaged in processing our simulations.

Each processing request can be a Python object. The multiprocessing module will pickle that object so that it is transmitted via the queue to another process.

We'll revisit this in Chapter 16, The Logging and Warning Modules, when we look at how the logging module can use multiprocessing queues to provide a single, centralized log for separate producer processes. In these examples, the objects transmitted from process to process will be logging.LogRecord instances.

In the next section, we'll learn how to define processes.
