Writing a reduction kernel by hand with PyCUDA can be quite complex. To simplify this, Numba provides the @cuda.reduce decorator (referred to here as @reduce), which converts a simple binary operation into a reduction kernel.
Reduction operations reduce a set of values to a single value. A typical example of a reduction operation is to calculate the sum of all the elements of an array. As an example, consider the following array of elements: 1, 2, 3, 4, 5, 6, 7, 8.
The sequential algorithm adds the elements of the array one after the other, keeping a running total:
((((((1 + 2) + 3) + 4) + 5) + 6) + 7) + 8 = 36
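A parallel algorithm instead adds the elements in pairs, level by level:
Level 1: 1 + 2, 3 + 4, 5 + 6, 7 + 8 gives 3, 7, 11, 15
Level 2: 3 + 7, 11 + 15 gives 10, 26
Level 3: 10 + 26 gives 36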
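The latter has the advantage of a shorter execution time, provided that the additions within each level run concurrently: the eight elements are reduced in three parallel steps (log2 of 8) instead of seven additions performed one after the other.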
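The difference between the two schemes can be sketched in plain Python on the CPU. This is only an illustration of the step counts, not how Numba implements its kernel; the function names sequential_sum and tree_sum are hypothetical and chosen for this example.

```python
def sequential_sum(values):
    """Add the elements one after the other; return (total, additions performed)."""
    total = values[0]
    steps = 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_sum(values):
    """Add the elements pairwise, level by level; return (total, number of levels).

    On a GPU, all additions within one level could run at the same time,
    so the number of levels approximates the number of parallel steps.
    """
    level = list(values)
    steps = 0
    while len(level) > 1:
        # Combine neighbouring pairs (0,1), (2,3), ...
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        # An unpaired last element is carried over to the next level
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(sequential_sum(data))  # (36, 7)
print(tree_sum(data))        # (36, 3)
```

Both functions return the same total, but the pairwise version needs only three levels for eight elements, which is where the parallel speed-up comes from.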
By using Numba and the @reduce decorator, we can write an algorithm, in a few lines of code, for the parallel sum on an array of integers ranging from 1 to 10,000:
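import numpy
from numba import cuda

# Binary operation that Numba compiles into a CUDA reduction kernel
@cuda.reduce
def sum_reduce(a, b):
    return a + b

# Integers from 1 to 10,000
A = numpy.arange(10000, dtype=numpy.int64) + 1
print("vector to reduce =", A)

# Run the reduction on the GPU
got = sum_reduce(A)
print("result =", got)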
The previous example can be run by typing the following command:
(base) C:>python reduceNumba.py
The following result is provided:
vector to reduce = [ 1 2 3 ... 9998 9999 10000]
result = 50005000
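The result can be checked without a GPU. The following minimal sketch, which assumes only that NumPy is installed, compares the CPU sum of the same array with the closed-form value n(n + 1)/2 for the sum of the integers from 1 to n; the variable names expected and cpu_sum are chosen for this example.

```python
import numpy

n = 10000
A = numpy.arange(n, dtype=numpy.int64) + 1  # same array as in the Numba example

expected = n * (n + 1) // 2  # closed-form sum of 1..n
cpu_sum = int(A.sum())       # reference sum computed on the CPU

print(expected)  # 50005000
print(cpu_sum)   # 50005000
```

Both values agree with the result printed by the reduction kernel.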