Numba is a just-in-time (JIT) compiler for Python that you drive with special function decorators. It automatically generates native machine code, which can run on CPUs and GPUs. Numba is aimed mainly at math-heavy code that works with NumPy arrays.
We can compile code with the @numba.jit decorator, optionally passing a function signature (for instance, int32(int32)). The types correspond to similar NumPy types. Numba operates in two modes, nopython and object. The nopython mode is faster but more restricted. We can also release the Global Interpreter Lock (GIL) with the nogil option. To cache compilation results on disk, pass the cache argument.
The @vectorize decorator turns functions with scalar arguments into NumPy ufuncs. Vectorization brings extra advantages, such as automatic broadcasting, and the result can run on a single core, on multiple cores in parallel, or on a GPU.
Install Numba with the following command:
$ pip install numba
or
$ conda install numba
I tested the code with Numba 0.22.1.
Start with the imports:
```python
from numba import vectorize
from numba import jit
import numpy as np
```
Define a function that sums the squares of its three arguments, using the @vectorize decorator:
```python
@vectorize
def vectorize_version(x, y, z):
    return x ** 2 + y ** 2 + z ** 2
```
Define the same function with the @jit decorator in nopython mode:
```python
@jit(nopython=True)
def jit_version(x, y, z):
    return x ** 2 + y ** 2 + z ** 2
```
Create three arrays of 1,000 random numbers with a fixed seed:
```python
np.random.seed(36)
x = np.random.random(1000)
y = np.random.random(1000)
z = np.random.random(1000)
```
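In IPython or Jupyter, time the plain NumPy expression and the two compiled functions with %timeit. Then display the types that Numba inferred for jit_version:
```python
%timeit x ** 2 + y ** 2 + z ** 2
%timeit vectorize_version(x, y, z)
%timeit jit_version(x, y, z)
jit_version.inspect_types()
```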
Refer to the following screenshot for the end result:
The code is in the compiling_numba.ipynb file in this book's code bundle.
The best time measured is 1.82 microseconds on my machine. That is significantly faster than the time measured for the plain NumPy expression. The end of the screenshot shows the output of inspect_types(), which annotates the source with the types Numba inferred. I omitted the last part because it is too long and hard to read. We also get warnings, which are most likely caused by CPU caching. I left them in on purpose, but you may be able to get rid of them by using much larger arrays that don't fit in the cache. Also bear in mind that the first call to a lazily compiled function includes the compilation time, which can make one run much slower than the others.