In the following lines, after the relevant imports, we define the input vectors:
import numpy as np
import numpy.linalg as la
import pyopencl as cl

vector_dimension = 100
vector_a = np.random.randint(vector_dimension, size=vector_dimension)
vector_b = np.random.randint(vector_dimension, size=vector_dimension)
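Each vector contains 100 integers, drawn at random with NumPy's randint function, called here in the form:
np.random.randint(high, size=n)
where high is the exclusive upper bound (the values fall between 0 and high - 1) and size is the length of the vector.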
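One detail is worth checking here. The kernel defined later declares its arguments as OpenCL C `int`, which is always 32 bits. NumPy's default integer type for `randint`, however, depends on the platform: it is 64-bit on Linux and macOS, and also on Windows since NumPy 2.0. The following minimal sketch assumes you want the host arrays to match the kernel's `int`, so it casts them explicitly to `np.int32`. The seed value is arbitrary and only makes runs repeatable.

```python
import numpy as np

vector_dimension = 100
np.random.seed(0)  # arbitrary seed, only for repeatable runs

# randint(100, size=100): 100 integers drawn from the half-open range [0, 100)
vector_a = np.random.randint(vector_dimension, size=vector_dimension).astype(np.int32)
vector_b = np.random.randint(vector_dimension, size=vector_dimension).astype(np.int32)

# OpenCL C `int` is 32 bits, so each element should occupy 4 bytes on the host
print(vector_a.dtype, vector_a.itemsize, vector_a.nbytes)  # prints: int32 4 400
```

Without the cast, a 64-bit host array passed to an `int *` kernel argument would be read with the wrong element size, and the sums would be wrong.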
Then, we select the platform on which to run the computation by using the get_platforms() function, which returns the list of available OpenCL platforms:
platform = cl.get_platforms()[1]
Then, we select the corresponding device. On the machine used for this example, platform.get_devices()[0] corresponds to the Intel(R) HD Graphics 5500 graphics card:
device = platform.get_devices()[0]
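Indexing with `[1]` and `[0]` only works on the author's machine, because the order of platforms and devices differs from system to system. A more portable alternative is to search by name. The sketch below uses a helper, `find_device`, which is hypothetical and not part of PyOpenCL. It relies only on the `name` attribute and the `get_devices()` method that PyOpenCL's `Platform` and `Device` objects provide. The stand-in objects let you try the search logic without an OpenCL runtime, and their first platform and device names are invented placeholders.

```python
from types import SimpleNamespace


def find_device(platforms, keyword):
    """Return (platform, device) for the first device whose name contains keyword.

    Hypothetical helper: `platforms` is expected to be the list returned by
    cl.get_platforms(); each platform must expose get_devices(), and each
    device a `name` attribute, as PyOpenCL objects do.
    """
    keyword = keyword.lower()
    for platform in platforms:
        for device in platform.get_devices():
            if keyword in device.name.lower():
                return platform, device
    raise LookupError(f"no OpenCL device matching {keyword!r}")


# Stand-ins for PyOpenCL objects, so the search can be tried without a GPU.
def fake_platform(name, device_names):
    devices = [SimpleNamespace(name=n) for n in device_names]
    return SimpleNamespace(name=name, get_devices=lambda: devices)


platforms = [
    fake_platform("Example CPU platform", ["Example CPU device"]),
    fake_platform("Intel(R) OpenCL", ["Intel(R) HD Graphics 5500"]),
]
platform, device = find_device(platforms, "HD Graphics")
print(platform.name, "/", device.name)
```

With PyOpenCL installed, the same helper would be called as `find_device(cl.get_platforms(), "HD Graphics")`.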
In the following steps, the context and the queue are defined. PyOpenCL provides the Context class, which is created from a list of selected devices, and the CommandQueue class, which is created from a context:
context = cl.Context([device])
queue = cl.CommandQueue(context)
In order to perform the computation on the selected device, the input vectors are copied into OpenCL buffers. The COPY_HOST_PTR flag copies the host data when each buffer is created, and READ_ONLY tells OpenCL that the kernel will only read from it:
mf = cl.mem_flags
a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a)
b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b)
Then, we prepare the buffer for the resulting vector:
res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes)
Here, the kernel code is defined:
program = cl.Program(context, """
__kernel void vectorSum(__global const int *a_g,
                        __global const int *b_g,
                        __global int *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()
vectorSum is the name of the kernel. Its parameter list declares three pointers to global memory: the two input vectors a_g and b_g (const int, since they are only read) and the output vector res_g (int). A kernel always returns void, so the result is written into res_g rather than returned. Inside the kernel body, each work-item computes one element of the sum in two steps:
- Get the index of the element handled by this work-item: int gid = get_global_id(0).
- Add the corresponding components: res_g[gid] = a_g[gid] + b_g[gid].
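To see what each work-item does, the sketch below emulates the kernel sequentially in plain Python. The loop variable plays the role of `get_global_id(0)`, and every iteration stands for one independent work-item. This only illustrates the semantics. On the device, the work-items run in parallel and in no guaranteed order, which is safe here because each one writes a different element of `res_g`. The function name `vector_sum_emulated` is hypothetical.

```python
def vector_sum_emulated(a_g, b_g):
    """Sequential emulation of the vectorSum kernel (illustration only)."""
    global_size = len(a_g)          # one work-item per vector element
    res_g = [0] * global_size
    for gid in range(global_size):  # gid stands in for get_global_id(0)
        # body of the kernel: each work-item writes exactly one element
        res_g[gid] = a_g[gid] + b_g[gid]
    return res_g


# First four elements of the input vectors from the sample output
print(vector_sum_emulated([45, 46, 0, 97], [25, 8, 76, 57]))  # prints: [70, 54, 76, 154]
```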
In OpenCL (and hence in PyOpenCL), buffers are attached to a context (https://documen.tician.de/pyopencl/runtime.html#pyopencl.Context) rather than to a specific device; a buffer is moved to a device once it is used on that device.
Finally, we execute vectorSum on the device:
program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g)
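In this call, `vector_a.shape` (here `(100,)`) is the global size, so 100 work-items are launched. `None` leaves the local (work-group) size to the OpenCL implementation. The sketch below shows how the ID returned by `get_global_id(0)` relates to the work-group and local IDs. It assumes a 1-D range with no global offset and a global size that the chosen local size divides evenly. The local size of 10 is an arbitrary value chosen for illustration, and the helper `global_ids` is hypothetical.

```python
def global_ids(global_size, local_size):
    """Enumerate get_global_id(0) for a 1-D NDRange with zero global offset.

    Assumes global_size is a multiple of local_size (required in OpenCL 1.x
    when a local size is given explicitly).
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the local size")
    ids = []
    for group_id in range(global_size // local_size):    # get_group_id(0)
        for local_id in range(local_size):                # get_local_id(0)
            ids.append(group_id * local_size + local_id)  # get_global_id(0)
    return ids


ids = global_ids(100, 10)
print(len(ids), ids[:3], ids[-1])  # prints: 100 [0, 1, 2] 99
```

Every index from 0 to 99 appears exactly once, so each element of the vectors is handled by exactly one work-item, whatever local size the implementation picks.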
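Before checking the result, the output buffer is copied back from the device into a host array:
res_np = np.empty_like(vector_a)
cl.enqueue_copy(queue, res_np, res_g)
To check the result, we use the assert statement, which raises an AssertionError if the condition is false:
assert la.norm(res_np - (vector_a + vector_b)) < 1e-5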
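Because both vectors hold integers, the floating-point tolerance in the norm test is not strictly needed, and an exact element-wise comparison works just as well. The following minimal sketch uses NumPy. Here `res_np` is a stand-in for the array read back from the device with `cl.enqueue_copy`, filled in by hand with the first four results from the sample output so that the example runs on its own.

```python
import numpy as np
import numpy.linalg as la

vector_a = np.array([45, 46, 0, 97], dtype=np.int32)
vector_b = np.array([25, 8, 76, 57], dtype=np.int32)
# Stand-in for the array copied back from res_g with cl.enqueue_copy
res_np = np.array([70, 54, 76, 154], dtype=np.int32)

# Check used in the text: Euclidean norm of the difference below a tolerance
assert la.norm(res_np - (vector_a + vector_b)) < 1e-5
# Exact alternative for integer data
assert np.array_equal(res_np, vector_a + vector_b)
print("results match")
```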
The output should be as follows:
(base) C:\>python vectorSumPyopencl.py
PyOPENCL SUM OF TWO VECTORS
Platform Selected = Intel(R) OpenCL
Device Selected = Intel(R) HD Graphics 5500
VECTOR LENGTH = 100
INPUT VECTOR A
[45 46 0 97 96 98 83 7 51 21 72 70 59 65 79 92 98 24 56 6 70 64 59 0
96 78 15 21 4 89 14 66 53 20 34 64 48 20 8 53 82 66 19 53 11 17 39 11
89 97 51 53 7 4 92 82 90 78 31 18 72 52 44 17 98 3 36 69 25 87 86 68
85 16 58 4 57 64 97 11 81 36 37 21 51 22 17 6 66 12 80 50 77 94 6 70
21 86 80 69]
INPUT VECTOR B
[25 8 76 57 86 96 58 89 26 31 28 92 67 47 72 64 13 93 96 91 91 36 1 75
2 40 60 49 24 40 23 35 80 60 61 27 82 38 66 81 95 79 96 23 73 19 5 43
2 47 17 88 46 76 64 82 31 73 43 17 35 28 48 89 8 61 23 17 56 7 84 36
95 60 34 9 4 5 74 59 6 89 84 98 25 50 38 2 3 43 64 96 47 79 12 82
72 0 78 5]
OUTPUT VECTOR RESULT A + B
[70 54 76 154 182 194 141 96 77 52 100 162 126 112 151 156 111 117 152
97 161 100 60 75 98 118 75 70 28 129 37 101 133 80 95 91 130 58 74 134
177 145 115 76 84 36 44 54 91 144 68 141 53 80 156 164 121 151 74 35
107 80 92 106 106 64 59 86 81 94 170 104 80 76 92 13 61 69 171 70 87
125 121 119 76 72 55 8 69 55 144 146 124 173 18 152 93 86 158 74]