Remote processes

So far, we have only executed our scripts on multiple local processors, but we can expand this further. With the multiprocessing library it is fairly easy to execute jobs on remote servers, although the documentation is still a bit cryptic. There are a few ways of executing processes in a distributed way, and the most obvious one isn't the easiest. The multiprocessing.connection module has the Client and Listener classes, which facilitate secure communication between clients and servers in a simple way. Communication is not the same as process management and queue management, however; those features require some extra effort. The multiprocessing library is still a bit bare in this regard, but it is certainly possible given a few different processes.
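To make the Client and Listener classes a little more concrete, here is a minimal sketch of a single authenticated round trip. The AUTHKEY value and the serve_once and round_trip helpers are made up for this illustration, and the listener runs in a thread only so that the example fits in one script. Note that the authkey authenticates both sides through an HMAC challenge; it does not encrypt the traffic.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b'some secret password'  # hypothetical shared secret


def serve_once(listener):
    # Accept a single authenticated connection and reply with double the input.
    with listener.accept() as conn:
        conn.send(conn.recv() * 2)


def round_trip(value):
    # Port 0 lets the OS pick a free port; listener.address reports the real one.
    with Listener(('localhost', 0), authkey=AUTHKEY) as listener:
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        with Client(listener.address, authkey=AUTHKEY) as conn:
            conn.send(value)
            reply = conn.recv()
        worker.join()
    return reply


if __name__ == '__main__':
    print(round_trip(21))
```

On real machines the listener and the client would of course live in separate scripts, with the client connecting to the hostname and port of the server.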

Distributed processing using multiprocessing

First of all, we will start with a module containing a few constants which should be shared between all clients and the server, so that the secret password and the hostname of the server are available to all. In addition to that, we will add our prime calculation functions, which we will be using later. The imports in the following modules expect this file to be stored as constants.py (matching the import constants line in the server script), but feel free to call it anything you like as long as you modify the imports and references:

host = 'localhost'
port = 12345
password = b'some secret password'

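def primes(n):
    # Return the n-th prime, counting from zero (primes(0) == 2)
    for i, prime in enumerate(prime_generator()):
        if i == n:
            return prime

def prime_generator():
    n = 2
    primes = set()
    while True:
        for p in primes:
            if n % p == 0:
                # n has a prime divisor, so it is not prime
                break
        else:
            # No known prime divides n, so n itself is prime
            primes.add(n)
            yield n
        n += 1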
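Two details of these helpers are easy to miss: primes(n) counts from zero, and every call restarts the generator from 2. The sketch below repeats a compact variant of the functions so that it runs on its own; first_primes is a hypothetical helper, not part of the shared module, which collects several primes in a single pass.

```python
from itertools import islice


def prime_generator():
    n = 2
    found = set()
    while True:
        # A number is prime when no previously found prime divides it.
        if all(n % p for p in found):
            found.add(n)
            yield n
        n += 1


def primes(n):
    # n is zero-based: primes(0) is the first prime.
    for i, prime in enumerate(prime_generator()):
        if i == n:
            return prime


def first_primes(count):
    # Hypothetical helper: take `count` primes from one generator run.
    return list(islice(prime_generator(), count))


if __name__ == '__main__':
    print(primes(0), primes(4))
    print(first_primes(5))
```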

Now it's time to create the actual server which links the functions and the job queue:

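import constants
import multiprocessing
from multiprocessing import managers

queue = multiprocessing.Queue()
# An empty host string makes the server listen on all interfaces
manager = managers.BaseManager(address=('', constants.port),
                               authkey=constants.password)

# Expose the shared queue and the prime function to connecting clients
manager.register('queue', callable=lambda: queue)
manager.register('primes', callable=constants.primes)

server = manager.get_server()
# Block and serve client requests until the process is stopped
server.serve_forever()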

After creating the server, we need a script that sends the jobs, which is actually just a regular client. A regular client can also function as a processor, but to keep things sensible we will use separate scripts. The following script adds 0 to 999 to the queue for processing:

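from multiprocessing import managers
import constants

manager = managers.BaseManager(
    address=(constants.host, constants.port),
    authkey=constants.password)
# Clients register only the name; the callable lives on the server
manager.register('queue')
manager.connect()

queue = manager.queue()
for i in range(1000):
    queue.put(i)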

Lastly, we need to create a client to actually process the queue:

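from multiprocessing import managers
import constants

manager = managers.BaseManager(
    address=(constants.host, constants.port),
    authkey=constants.password)
manager.register('queue')
manager.register('primes')
manager.connect()

queue = manager.queue()
while not queue.empty():
    # The call is forwarded to the server; printing the proxy shows the result
    print(manager.primes(queue.get()))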

From the preceding code you can see how we pass along functions: the manager allows registering functions and classes which can then be called from the clients as well. With that, we pass along a queue from the multiprocessing module which is safe for both multithreading and multiprocessing. Now we need to start the processes themselves. First the server, which keeps on running:

# python3

After that, run the producer to generate the prime generation requests:

# python3

And now we can run multiple clients on multiple machines to get the first 1000 primes. Since these clients now print the first 1000 primes, the output is a bit too lengthy to show here, but you can simply run this in parallel on multiple machines to generate your output:

# python3
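Before involving several machines, it can help to try the three roles in a single process. The following is only a sketch under a few assumptions: the server runs in a background thread, square is a hypothetical stand-in for the prime function, a plain queue.Queue replaces the multiprocessing queue, and port 0 lets the operating system choose a free port. It also drains the queue with get_nowait() instead of checking empty() first, since with several consumers another client could take the last job between the two calls.

```python
import queue
import threading
from multiprocessing import managers

PASSWORD = b'some secret password'  # plays the role of the shared password


def square(n):
    # Hypothetical stand-in for the prime function, to keep the sketch short.
    return n * n


class ClientManager(managers.BaseManager):
    pass


# Clients register names only; the callables live on the server side.
ClientManager.register('queue')
ClientManager.register('square')


def start_server():
    jobs = queue.Queue()

    # A separate subclass keeps the server registry apart from the client one,
    # which matters only because both live in the same process here.
    class ServerManager(managers.BaseManager):
        pass

    ServerManager.register('queue', callable=lambda: jobs)
    ServerManager.register('square', callable=square)
    # Port 0 asks the OS for a free port; server.address holds the real one.
    manager = ServerManager(address=('localhost', 0), authkey=PASSWORD)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address):
    manager = ClientManager(address=address, authkey=PASSWORD)
    manager.connect()
    return manager


def run_demo(count=5):
    address = start_server()

    # Producer: put the job numbers on the shared queue.
    producer_jobs = connect(address).queue()
    for i in range(count):
        producer_jobs.put(i)

    # Consumer: take jobs until the queue reports that it is empty.
    consumer = connect(address)
    consumer_jobs = consumer.queue()
    results = []
    while True:
        try:
            n = consumer_jobs.get_nowait()
        except queue.Empty:
            break
        # The call runs in the server; _getvalue() copies the result back.
        results.append(consumer.square(n)._getvalue())
    return results


if __name__ == '__main__':
    print(run_demo())
```

Keep in mind that a callable registered on the manager is executed inside the server process. If the heavy lifting should happen on the client machines, one option is to import the function in the client and call it directly on each value taken from the queue.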

Instead of printing, you can obviously use queues or pipes to send the output to a different process if you'd like. As you can see, though, it's still a bit of work to process things in parallel and it requires some code synchronization to work. There are a few alternatives available, such as ØMQ, Celery, and IPyparallel. Which of these is the best and most suitable depends on your use case. If you are simply looking for processing tasks on multiple CPUs, then multiprocessing and IPyparallel are probably your best choices. If you are looking for background processing and/or easy offloading to multiple machines, then ØMQ and Celery are better choices.

Distributed processing using IPyparallel

The IPyparallel module (previously, IPython Parallel) is a module that makes it really easy to process code on multiple computers at the same time. The library supports more features than you are likely to need, but the basic usage is important to know just in case you need to do heavy calculations which can benefit from multiple computers. First let's start with installing the latest IPyparallel package and all the IPython components:

pip install -U ipython[all] ipyparallel


Especially on Windows, it might be easier to install IPython using Anaconda instead, as it includes binaries for many science, math, engineering, and data analysis packages. To get a consistent installation, the Anaconda installer is also available for OS X and Linux systems.

Secondly, we need a cluster configuration. Technically this is optional, but since we are going to create a distributed IPython cluster, it is much more convenient to configure everything using a specific profile:

# ipython profile create --parallel --profile=mastering_python
[ProfileCreate] Generating default config file: '~/.ipython/profile_mastering_python/'
[ProfileCreate] Generating default config file: '~/.ipython/profile_mastering_python/'
[ProfileCreate] Generating default config file: '~/.ipython/profile_mastering_python/'
[ProfileCreate] Generating default config file: '~/.ipython/profile_mastering_python/'
[ProfileCreate] Generating default config file: '~/.ipython/profile_mastering_python/'

These configuration files contain a huge number of options, so I recommend searching for a specific section instead of walking through them. A quick listing gave me about 2500 lines of configuration in total for these five files. The filenames already provide a hint about the purpose of the configuration files, but we'll explain them in a little more detail since they are still a tad confusing.

This is the generic IPython configuration file; you can customize pretty much everything about your IPython shell here. It defines how your shell should look, which modules should be loaded by default, whether or not to load a GUI, and quite a bit more. For the purpose of this chapter it is not all that important, but it's definitely worth a look if you're going to use IPython more often. One of the things you can configure here is the automatic loading of extensions, such as line_profiler and memory_profiler discussed in the previous chapter. For example:

c.InteractiveShellApp.extensions = [
    'line_profiler',
    'memory_profiler',
]

This file configures your IPython kernel and allows you to overwrite or extend the generic IPython configuration. To understand its purpose, it's important to know what an IPython kernel is. The kernel, in this context, is the program that runs and introspects the code. By default this is IPyKernel, which is a regular Python interpreter, but there are also other options such as IRuby or IJavascript to run Ruby or JavaScript respectively.

One of the more useful options is the possibility to configure the listening port(s) and IP addresses for the kernel. By default the ports are all set to use a random number, but it is important to note that if someone else has access to the same machine while you are running your kernel, they will be able to connect to your IPython kernel which can be dangerous on shared machines.

ipcontroller is the master process of your IPython cluster. It controls the engines and the distribution of tasks, and takes care of tasks such as logging.

The most important parameter in terms of performance is the TaskScheduler setting. By default, the c.TaskScheduler.scheme_name setting is set to use the Python leastload scheduler (the one reported in the controller's startup log), but depending on your workload, others such as lru and weighted might be better. And if you have to process so many tasks on such a large cluster that the scheduler becomes the bottleneck, there is also the plainrandom scheduler, which works surprisingly well if all your machines have similar specs and the tasks have similar durations.
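To get a feeling for the trade-off between a load-aware scheme such as leastload and a random scheme such as plainrandom, the following toy simulation can help. It is not how IPyparallel implements its schedulers; the makespan, least_loaded, and make_random_picker names are hypothetical, and "load" is simply assumed to be the total duration assigned to an engine.

```python
import random


def makespan(durations, engines, pick):
    # Assign each task to the engine chosen by `pick` and return the
    # total work of the busiest engine.
    load = [0.0] * engines
    for duration in durations:
        load[pick(load)] += duration
    return max(load)


def least_loaded(load):
    # Load-aware choice: the engine with the least work assigned so far.
    return load.index(min(load))


def make_random_picker(seed=0):
    # Random choice: no bookkeeping needed, but the balance is only statistical.
    rng = random.Random(seed)
    return lambda load: rng.randrange(len(load))


if __name__ == '__main__':
    tasks = [1.0] * 1000  # many tasks with identical durations
    print('least loaded:', makespan(tasks, 8, least_loaded))
    print('random:      ', makespan(tasks, 8, make_random_picker()))
```

With identical task durations the load-aware choice reaches a perfectly even split, while the random choice ends up somewhat above it. In exchange, the random picker needs no knowledge about the state of the engines, which is what makes such a scheme attractive when the scheduler itself is the bottleneck.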

For the purpose of our test we will set the IP of the controller to *, which means that connections on every network interface and from every IP address will be accepted. If you are in an unsafe environment or network, or don't have a firewall that lets you selectively allow certain IP addresses, then this method is not recommended! In such cases, I recommend launching through more secure options, such as SSHEngineSetLauncher or WindowsHPCEngineSetLauncher, instead.

But, assuming your network is indeed safe, set the factory IP to all the local addresses:

c.HubFactory.client_ip = '*'
c.RegistrationFactory.ip = '*'

Now start the controller:

# ipcontroller --profile=mastering_python
[IPControllerApp] Hub listening on tcp://*:58412 for registration.
[IPControllerApp] Hub listening on tcp:// for registration.
[IPControllerApp] Hub using DB backend: 'NoDB'
[IPControllerApp] hub::created hub
[IPControllerApp] writing connection info to ~/.ipython/profile_mastering_python/security/ipcontroller-client.json
[IPControllerApp] writing connection info to ~/.ipython/profile_mastering_python/security/ipcontroller-engine.json
[IPControllerApp] task::using Python leastload Task scheduler
[IPControllerApp] Heartmonitor started
[IPControllerApp] Creating pid file: .ipython/profile_mastering_python/pid/
[scheduler] Scheduler started [leastload]
[IPControllerApp] client::client b'\x00\x80\x00A\xa7' requested 'connection_request'
[IPControllerApp] client::client [b'\x00\x80\x00A\xa7'] connected

Pay attention to the files that were written to the security directory of the profile directory. They hold the authentication information which is used by ipengine to find ipcontroller: the ports, encryption keys, and IP address.

ipengine is the actual worker process. These processes run the actual calculations, so to speed up the processing you will need these on as many machines as you have available. You probably won't need to change this file, but it can be useful if you want to configure centralized logging or need to change the working directory. Generally, you don't want to start the ipengine process manually since you will most likely want to launch multiple processes per computer. That's where our next command comes in, the ipcluster command.

The ipcluster command is actually just an easy shorthand to start a combination of ipcontroller and ipengine at the same time. For a simple local processing cluster, I recommend using this, but when starting a distributed cluster, it can be useful to have the control that the separate use of ipcontroller and ipengine offers. In most cases the command offers enough options, so you might have no need for the separate commands.

The most important configuration option is c.IPClusterEngines.engine_launcher_class, as this controls the communication method between the engines and the controller. Along with that, it is also the most important component for secure communication between the processes. By default it's set to ipyparallel.apps.launcher.LocalEngineSetLauncher, which is designed for local processes (it is the launcher named in the ipcluster engines output), but ipyparallel.apps.launcher.SSHEngineSetLauncher is also an option if you want to use SSH to communicate with the clients, as is ipyparallel.apps.launcher.WindowsHPCEngineSetLauncher for Windows HPC.

Before we can create the cluster on all machines, we need to transfer the configuration files. Your options are to transfer all the files or to simply transfer the files in your IPython profile's security directory.

Now it's time to start the cluster. Since we already started the ipcontroller separately, we only need to start the engines. On the local machine we simply need to start them, but the other machines don't have the configuration yet. One option is copying the entire IPython profile directory, but the only file that really needs copying is security/ipcontroller-engine.json, once the profile has been created with the profile creation command. So unless you are going to copy the entire IPython profile directory, you need to execute the profile creation command again:

# ipython profile create --parallel --profile=mastering_python

After that, simply copy the ipcontroller-engine.json file and you're done. Now we can start the actual engines:

# ipcluster engines --profile=mastering_python -n 4
[IPClusterEngines] IPython cluster: started
[IPClusterEngines] Starting engines with [daemon=False]
[IPClusterEngines] Starting 4 Engines with LocalEngineSetLauncher

Note that the 4 here was chosen for a quad-core processor, but any number would do. The default will use the number of logical processor cores, but depending on the workload it might be better to match the number of physical processor cores instead.
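If you script the start of your engines, the number of logical cores is available from the standard library. The sketch below only builds the command shown above; engines_command is a hypothetical helper and nothing is actually launched. The standard library does not report physical cores, so for that you would need something like the third-party psutil package (psutil.cpu_count(logical=False)).

```python
import os


def engines_command(profile, engines=None):
    # os.cpu_count() reports logical cores and may return None.
    if engines is None:
        engines = os.cpu_count() or 1
    return ['ipcluster', 'engines', f'--profile={profile}', '-n', str(engines)]


if __name__ == '__main__':
    print(' '.join(engines_command('mastering_python')))
    print(' '.join(engines_command('mastering_python', engines=4)))
```

The resulting list can be passed to subprocess.run() as is, should you want to start the engines from Python.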

Now we can run some parallel code from our IPython shell. To demonstrate the performance difference, we will use a simple sum of all the numbers from 0 to 10,000,000. Not an extremely heavy task, but when performed 10 times in succession, a regular Python interpreter takes a while:

In [1]: %timeit for _ in range(10): sum(range(10000000))
1 loops, best of 3: 2.27 s per loop

This time, however, we will run it 100 times to demonstrate how fast a distributed cluster is. Note that this is a cluster of only three machines, but it's still quite a bit faster:

In [1]: import ipyparallel

In [2]: client = ipyparallel.Client(profile='mastering_python')

In [3]: view = client.load_balanced_view()

In [4]: %timeit view.map(lambda _: sum(range(10000000)), range(100)).wait()
1 loop, best of 3: 909 ms per loop
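Because the two measurements use a different number of runs, a small calculation makes them comparable. This is only a rough estimate, assuming that the serial time grows linearly with the number of sums:

```python
serial_10_runs = 2.27      # seconds, from the first %timeit (10 sums)
cluster_100_runs = 0.909   # seconds, from the second %timeit (100 sums)

# Assumption: the serial time grows linearly with the number of sums.
serial_100_runs = serial_10_runs * 10
speedup = serial_100_runs / cluster_100_runs

print(f'estimated serial time for 100 sums: {serial_100_runs:.1f} s')
print(f'estimated speedup: {speedup:.1f}x')
```

So the cluster finishes in under a second what would take the single interpreter roughly 23 seconds, a speedup of about 25 times.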

More fun however is the definition of parallel functions in IPyParallel. With just a simple decorator, a function is marked as parallel:

In [1]: import ipyparallel

In [2]: client = ipyparallel.Client(profile='mastering_python')

In [3]: view = client.load_balanced_view()

In [4]: @view.parallel()
   ...: def loop(_):
   ...:     return sum(range(10000000))

In [5]: loop.map(range(10))
Out[5]: <AsyncMapResult: loop>

The IPyParallel library offers many more useful features, but those are outside the scope of this book. Even though IPyParallel is a separate entity from the rest of Jupyter/IPython, it does integrate well, which makes combining them easy enough.

One of the most convenient ways of using IPyParallel is through the Jupyter/IPython Notebooks. To demonstrate, we first have to make sure to enable the parallel processing in the Jupyter Notebook since IPython notebooks execute single threaded by default:

ipcluster nbextension enable

After that we can start the notebook and see what it's all about:

# jupyter notebook
Unrecognized JSON config file version, assuming version 1
Loading IPython parallel extension
Serving notebooks from local directory: ./
0 active kernels
The Jupyter Notebook is running at: http://localhost:8888/
Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

With the Jupyter Notebook you can create scripts in your web browser which can easily be shared with others later. It is very useful for sharing scripts and debugging your code, especially since web pages (as opposed to command-line environments) can display images easily. This helps a lot with graphing data. Here's a screenshot of our Notebook:
