The CUDA architecture

In 2006, NVIDIA presented the GeForce 8800 GTX, the first GPU to support DirectX 10 and also the first built on the CUDA architecture. This architecture included several new components designed specifically for GPU computing, aimed at removing the limitations that had prevented earlier GPUs from being used for non-graphical computation. In particular, the execution units on the GPU could read and write arbitrary memory locations and could access a software-managed cache called shared memory. These architectural features made a CUDA GPU excel at general-purpose computation as well as at traditional graphics tasks.
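The two capabilities just mentioned, arbitrary reads and writes to global memory and the software-managed shared memory, can be illustrated with a minimal kernel sketch. The kernel name and sizes below are illustrative assumptions, not taken from the text; the API calls are the standard CUDA runtime:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel: load a block of data into shared memory,
// then write it back to global memory in reverse order.
__global__ void reverseBlock(int *data)
{
    __shared__ int tile[256];   // the software-managed on-chip cache
    int t = threadIdx.x;
    tile[t] = data[t];          // arbitrary read from global memory
    __syncthreads();            // wait until the whole tile is loaded
    data[t] = tile[255 - t];    // arbitrary write back to global memory
}

int main(void)
{
    int h[256], *d;
    for (int i = 0; i < 256; ++i) h[i] = i;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverseBlock<<<1, 256>>>(d);           // one block of 256 threads
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    printf("%d %d\n", h[0], h[255]);       // reversed: 255 0
    cudaFree(d);
    return 0;
}
```

Compiled with `nvcc` and run on any CUDA-capable GPU, this prints the first and last elements of the reversed array.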

The following figure summarizes the division of space between the various components of a graphics processing unit (GPU) and a central processing unit (CPU). As you can see, a GPU devotes more transistors to data processing than a CPU does; it is a highly parallel, multithreaded, many-core processor:

CPU versus GPU architecture
Almost all the space on the GPU chip is dedicated to ALUs, with relatively little devoted to cache and control, making it well suited to repetitive calculations on large amounts of data. The GPU accesses its own local memory and is connected to the system, that is, to the CPU, via a bus, currently Peripheral Component Interconnect Express (PCI Express).

The graphics chip consists of a series of multiprocessors, called Streaming Multiprocessors (SMs).

The number of these multiprocessors depends on the specific characteristics and performance class of each GPU.

Each multiprocessor is in turn made up of stream processors (or cores). Each of these cores can perform basic arithmetic operations on integers or on single- and double-precision floating-point numbers.
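Because the SM count varies from one GPU to another, the CUDA runtime lets a program query it at run time. A minimal sketch using the standard `cudaGetDeviceProperties` call (the fields read below are real members of `cudaDeviceProp`; device 0 is assumed):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    // Query the properties of device 0 (assumed to be the GPU in use)
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }
    printf("GPU: %s\n", prop.name);
    printf("Streaming Multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    return 0;
}
```

The number of cores per SM is fixed by the compute capability reported here, so the total core count is the SM count multiplied by a per-architecture factor.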
