Just-in-time compiling with Numba

Numba performs just-in-time (JIT) compilation through special function decorators, automatically producing native machine code. The generated code can run on CPUs and GPUs. Numba's main use case is math-heavy code that operates on NumPy arrays.

We compile code with the @numba.jit decorator, optionally supplying a function signature (for instance, int32(int32)); the types correspond to the similarly named NumPy types. Numba operates in two modes: nopython and object. The nopython mode is faster but more restricted. We can also release the Global Interpreter Lock (GIL) with the nogil option, and cache compilation results on disk by passing the cache argument.

The @vectorize decorator converts functions that take scalar arguments into NumPy universal functions (ufuncs). Vectorization brings extra advantages, such as automatic broadcasting, and can target a single core, multiple cores in parallel, or a GPU.

Getting ready

Install Numba with the following command:

$ pip install numba

Or, if you use Anaconda:

$ conda install numba

I tested the code with Numba 0.22.1.

How to do it...

  1. The imports are as follows:
    from numba import vectorize
    from numba import jit
    import numpy as np
  2. Define the following function to use the @vectorize decorator:
    @vectorize
    def vectorize_version(x, y, z):
        return x ** 2 + y ** 2 + z ** 2
  3. Define the following function to use the @jit decorator:
    @jit(nopython=True)
    def jit_version(x, y, z):
        return x ** 2 + y ** 2 + z ** 2
  4. Define some random arrays as follows:
    np.random.seed(36)
    x = np.random.random(1000)
    y = np.random.random(1000)
    z = np.random.random(1000)
  5. Measure the time it takes to sum the squares of the arrays:
    %timeit x ** 2 + y ** 2 + z ** 2
    %timeit vectorize_version(x, y, z)
    %timeit jit_version(x, y, z)
    jit_version.inspect_types()

Refer to the following screenshot for the end result:

The code is in the compiling_numba.ipynb file in this book's code bundle.

How it works...

The best time measured on my machine is 1.82 microseconds, which is significantly faster than the time measured for the plain NumPy expression. The end of the screenshot shows the output of inspect_types(), with the last part omitted because it is long and hard to read. We also get warnings, most likely caused by CPU caching. I left them in on purpose, but you may be able to get rid of them by using much larger arrays that don't fit in the cache.

See also
