The CUDA programming model (and consequently PyCUDA, which is a Python wrapper) is implemented through specific extensions to the standard library of the C language. These extensions have been created just like function calls in the standard C library, allowing a simple approach to a heterogeneous programming model that includes the host and device code. The management of the two logical parts is done by the nvcc compiler.
Here is a brief description of how this works:
- Separate device code from the host code.
- Invoke a default compiler (for example, GCC) to compile the host code.
- Build the device code in binary form (.cubin objects) or in assembly form (PTX objects):
PyCUDA execution model
All the preceding steps are performed by PyCUDA during execution, with an increase in the application loading time compared to a CUDA application.