The following example shows the basic steps for building an application with PyOpenCL. The task is to add two vectors. To keep the output readable, each vector has 100 elements, and the i-th element of the resulting vector is the sum of the i-th element of vector_a and the i-th element of vector_b:
- Let's start by importing all the necessary libraries:
import numpy as np
import pyopencl as cl
import numpy.linalg as la
- We define the size of the vectors to be added, as follows:
vector_dimension = 100
- Here, the input vectors, vector_a and vector_b, are defined:
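# The kernel reads 32-bit ints, so the host arrays must use np.int32.
vector_a = np.random.randint(vector_dimension, size=vector_dimension, dtype=np.int32)
vector_b = np.random.randint(vector_dimension, size=vector_dimension, dtype=np.int32)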
- In sequence, we define platform, device, context, and queue:
# Index 1 selects the second installed platform; use 0 if only one is available.
platform = cl.get_platforms()[1]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
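Hard-coding a platform index assumes that the platform exists on the machine running the script. The following is a minimal sketch of a more defensive choice. `choose_platform` and its `preferred_index` parameter are hypothetical names, not part of PyOpenCL; the function only assumes that its argument is a list, such as the one returned by `cl.get_platforms()`.

```python
def choose_platform(platforms, preferred_index=1):
    """Return platforms[preferred_index] if it exists, else the first platform.

    Hypothetical helper (not a PyOpenCL function). `platforms` is any list,
    for example the result of cl.get_platforms().
    """
    if not platforms:
        raise RuntimeError("No OpenCL platform found")
    if 0 <= preferred_index < len(platforms):
        return platforms[preferred_index]
    # Fall back to the first platform when the preferred one is missing.
    return platforms[0]


# Stand-in strings are used here so the sketch runs without an OpenCL driver.
print(choose_platform(["Platform A", "Platform B"]))
print(choose_platform(["Platform A"]))
```

In the recipe, it could be called as `platform = choose_platform(cl.get_platforms())`.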
- Now, it's time to organize the memory areas that will contain the input vectors:
mf = cl.mem_flags
a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a)
b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b)
- Next, we build the application kernel by using the Program class:
program = cl.Program(context, """
__kernel void vectorSum(__global const int *a_g,
                        __global const int *b_g,
                        __global int *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()
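The kernel declares its arguments as `int` pointers, and `int` is a 32-bit type in OpenCL C. A buffer created with COPY_HOST_PTR holds the raw bytes of the host array, so the host data should also be stored as 32-bit integers. The sketch below uses only the standard `struct` module and assumes little-endian byte order; it illustrates what a 32-bit reader would see if the host data were 64-bit instead.

```python
import struct

# Two values stored as 64-bit little-endian integers (8 bytes each).
raw = struct.pack("<2q", 7, 9)

# Reading the same 16 bytes as 32-bit integers yields four values:
# each 64-bit number is split into a low word and a high word.
as_int32 = struct.unpack("<4i", raw)
print(as_int32)  # (7, 0, 9, 0)

# Stored as 32-bit integers, the bytes match what the kernel expects.
raw32 = struct.pack("<2i", 7, 9)
print(struct.unpack("<2i", raw32))  # (7, 9)
```

This is why a mismatch between the NumPy dtype and the kernel's argument type tends to produce interleaved zeros or otherwise wrong sums rather than an error message.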
- Then, we allocate the memory for the resulting vector:
res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes)
- Then, we call the kernel function:
program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g)
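The call launches one work-item per element: the global size is vector_a.shape, and passing None as the local size leaves the choice of work-group size to the OpenCL implementation. As a rough mental model, the following pure-Python sketch runs the same kernel body sequentially. `vector_sum_work_item` and `run_ndrange` are hypothetical names; on a real device the work-items run in parallel and in no guaranteed order.

```python
def vector_sum_work_item(gid, a, b, res):
    """Body of the vectorSum kernel for the work-item with global ID `gid`."""
    res[gid] = a[gid] + b[gid]


def run_ndrange(kernel, global_size, *args):
    """Sequential stand-in for a 1-D kernel launch (hypothetical helper)."""
    for gid in range(global_size):
        kernel(gid, *args)


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
res = [0] * len(a)
run_ndrange(vector_sum_work_item, len(a), a, b, res)
print(res)  # [11, 22, 33, 44]
```

Each work-item writes only to its own index, so no synchronization between work-items is needed for this kernel.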
- The memory space used to store the result is allocated in the host memory area (res_np):
res_np = np.empty_like(vector_a)
- Then, we copy the result of the computation from the device buffer into the host memory area just created:
cl.enqueue_copy(queue, res_np, res_g)
- Finally, we print the results:
print("PyOPENCL SUM OF TWO VECTORS")
print("Platform Selected = %s" % platform.name)
print("Device Selected = %s" % device.name)
print("VECTOR LENGTH = %s" % vector_dimension)
print("INPUT VECTOR A")
print(vector_a)
print("INPUT VECTOR B")
print(vector_b)
print("OUTPUT VECTOR RESULT A + B ")
print(res_np)
- Then, we perform a simple check in order to verify that the sum operation is correct:
assert la.norm(res_np - (vector_a + vector_b)) < 1e-5
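Because the vectors hold integers, the comparison can also be exact, and reporting the first differing index is often more useful than a norm when debugging. A minimal sketch follows; `first_mismatch` is a hypothetical helper that works on any indexable sequences, including NumPy arrays.

```python
def first_mismatch(result, a, b):
    """Return the first index i where result[i] != a[i] + b[i], or None."""
    for i in range(len(result)):
        if result[i] != a[i] + b[i]:
            return i
    return None


# Plain lists stand in for res_np, vector_a, and vector_b.
print(first_mismatch([5, 7, 9], [1, 2, 3], [4, 5, 6]))  # None
print(first_mismatch([5, 0, 9], [1, 2, 3], [4, 5, 6]))  # 1
```

In the recipe, this could be used as `assert first_mismatch(res_np, vector_a, vector_b) is None`.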