Distributed processing with execnet

The execnet module has a share-nothing model and uses channels for communication. Channels in this context are software abstractions used to send and receive messages between (distributed) computer processes. execnet is most useful for combining heterogeneous computing environments with different Python interpreters and installed software. The environments can have different operating systems and Python implementations (CPython, Jython, PyPy, or others).

In a share-nothing architecture, computing nodes don't share memory or files. The architecture is therefore fully decentralized, with completely independent nodes. The obvious advantage is that there is no single node on which all the others depend.
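As a rough illustration of the share-nothing idea (not of how execnet is implemented), the following sketch uses only the Python standard library. The parent starts a second interpreter and exchanges serialized messages with it over pipes; nothing is shared in memory. The names WORKER_SOURCE and remote_mean are made up for this example.

```python
import json
import subprocess
import sys

# Source code for the worker. It runs in a separate interpreter and
# sees only what arrives on its standard input.
WORKER_SOURCE = """
import json
import sys

numbers = json.load(sys.stdin)
json.dump(sum(numbers) / len(numbers), sys.stdout)
"""


def remote_mean(numbers):
    """Compute a mean in a child interpreter via serialized messages."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(numbers),  # request message
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)  # reply message


if __name__ == "__main__":
    print(remote_mean([1, 2, 3, 4]))  # prints 2.5
```

Because the data crosses the process boundary only as JSON text, the worker could just as well run under a different Python implementation or on another machine, provided some transport carries the messages.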

Getting ready

Install execnet with either of the following commands:

$ pip install execnet
$ conda install execnet

I tested the code with execnet 1.3.0.

How to do it...

  1. The imports are as follows:
    import dautil as dl
    import ch12util
    from functools import partial
    import matplotlib.pyplot as plt
    import numpy as np
    import execnet
    
    STATS = []
  2. Define the following helper function:
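    def run(channel, data=[]):
        # Runs in the remote interpreter: answer each received index
        # with the mean of data that leaves out data[index].
        while not channel.isclosed():
            index = channel.receive()
    
            # Report progress for every tenth index.
            if index % 10 == 0:
                print('Bootstrap {}'.format(index))
    
            # The total is recomputed for every request.
            total = 0
    
            for x in data:
                total += x
    
            channel.send((total - data[index])/(len(data) - 1))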
  3. Define the following function to perform serial resampling:
    def serial(arr, n):
        # Append the leave-one-out mean for each of the first n values.
        for i in range(n):
            total = 0
    
            for x in arr:
                total += x
    
            STATS.append((total - arr[i])/(len(arr) - 1))
  4. Define the following function to perform parallel resampling:
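    def parallel(arr, n):
        # With no arguments, makegateway() starts a local subprocess.
        gw = execnet.makegateway()
        # Channels carry built-in types, so pass a list, not an array.
        channel = gw.remote_exec(run, data=arr.tolist())
    
        # Send each index and wait for the matching result.
        for i in range(n):
            channel.send(i)
            STATS.append(channel.receive())
    
        gw.exit()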
  5. Plot distributions of means and execution times:
    ws = dl.data.Weather.load()['WIND_SPEED'].dropna().values
    np.random.seed(33)
    parallel_times = ch12util.time_many(partial(parallel, ws))
    serial_times = ch12util.time_many(partial(serial, ws))
    
    %matplotlib inline
    dl.options.mimic_seaborn()
    ch12util.plot_times(plt.gca(), serial_times, parallel_times)
    plt.legend(loc='best')
    
    plt.figure()
    STATS = np.array(STATS)
    ch12util.plot_distro(plt.gca(), STATS, ws.mean())
    plt.title('Distribution of the Means')
    plt.legend(loc='best')

The end result consists of two plots (screenshot not reproduced here): the serial and parallel execution times, and the distribution of the means.

The code is in the distributing_execnet.ipynb file in this book's code bundle.
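Steps 2 and 3 compute, for each index, the mean of the data with one value left out, using the identity (total - x_i)/(n - 1). The following sketch works through that arithmetic on a small list. The function names are hypothetical and not part of the code bundle. Unlike the recipe, which sums the data again for every index, the sketch computes the total once and compares the result with a direct recomputation.

```python
def leave_one_out_means(values):
    """Mean of the data with each value left out in turn."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(values)  # computed once, reused for every index
    return [(total - x) / (n - 1) for x in values]


def leave_one_out_means_naive(values):
    """Same result, recomputing each sum from the remaining values."""
    n = len(values)
    return [sum(values[:i] + values[i + 1:]) / (n - 1) for i in range(n)]


if __name__ == "__main__":
    data = [1, 2, 3, 4]  # total is 10
    means = leave_one_out_means(data)
    print(means[0])   # (10 - 1) / 3, prints 3.0
    print(means[-1])  # (10 - 4) / 3, prints 2.0
    print(means == leave_one_out_means_naive(data))  # prints True
```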
