22.4 CPU Implementation
We first describe the implementation of the algorithm for the CPU as a reference
for the implementations on the GPU described in the following sections.
When designing an algorithm for the GPU, it is critical to minimize the
amount of data that travels on the main memory bus. Time spent on the bus is
one of the primary bottlenecks that strongly penalize performance [Nvidia 2010].
The transfer bandwidth of a standard PCI Express bus is 2 to 8 GB per
second, whereas the internal bus bandwidth of a modern GPU is approximately
100 to 150 GB per second. It is therefore very important to keep the data
on the GPU as much as possible.
In the case of cloth simulation, only the current and previous positions of
the particles are needed on the GPU. The rest distances of the springs and the
pairs of particles they connect are computed directly on the GPU.
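To put the bandwidth figures quoted above in perspective, the following C++ sketch estimates the idealized time needed to move a payload over a bus. The function name `transfer_time_ms` is hypothetical, latency and driver overhead are ignored, and 1 GB is assumed to mean 10^9 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Idealized transfer time in milliseconds for `bytes` of data moved over a
// bus with the given bandwidth. Assumptions: 1 GB = 1e9 bytes, and no
// latency or per-transfer overhead is modeled.
inline double transfer_time_ms(std::size_t bytes, double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    const double bytes_per_second = bandwidth_gb_per_s * 1.0e9;
    return static_cast<double>(bytes) / bytes_per_second * 1.0e3;
}
```

Under these assumptions, a hypothetical payload of 64,000,000 bytes takes about 32 ms at 2 GB per second and 8 ms at 8 GB per second over PCI Express, against roughly 0.64 ms at 100 GB per second inside the GPU.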
The state of each particle is represented by the following attributes:
1. The current position (four floating-point values).
2. The previous position (four floating-point values).
3. The current normal vector (four floating-point values).
Even though the normal vector is computed during the simulation, it is used
only for rendering and does not affect the simulation dynamics. Here,
the normal vector of a particle is defined as the average of the normal vectors
of the triangulated faces to which the particle belongs. A separate array is created
for each attribute: current positions, previous positions, and normal vectors. As
explained in later sections of this chapter, for the GPU implementation, these
attributes are loaded as textures or buffers into video memory. Each array stores
one attribute for all the particles, so the size of each array is equal to the size of an
attribute (four floating-point values) multiplied by the number of particles. For
example, the position of the i-th particle, $p_i$, is stored in the positions array and
accessed as follows:
pos_i = vec4(in_pos[i * 4], in_pos[i * 4 + 1], in_pos[i * 4 + 2],
             in_pos[i * 4 + 3])
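The storage layout and the normal definition described above can be sketched in C++ as follows. This is a minimal illustration, not the chapter's implementation: the names `ClothState`, `load_attribute`, and `compute_normals` are hypothetical, the triangle list is assumed to hold three particle indices per face, and the averaged normal is additionally normalized with its fourth component set to zero, which the text does not specify.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;

// Hypothetical container: one flat array per attribute, four floats per particle.
struct ClothState {
    std::vector<float> positions;       // current positions
    std::vector<float> prev_positions;  // previous positions
    std::vector<float> normals;         // used for rendering only

    explicit ClothState(std::size_t particle_count)
        : positions(particle_count * 4, 0.0f),
          prev_positions(particle_count * 4, 0.0f),
          normals(particle_count * 4, 0.0f) {}
};

// Read the four floats of particle i from a flat attribute array.
inline Vec4 load_attribute(const std::vector<float>& data, std::size_t i) {
    assert(i * 4 + 3 < data.size());
    return Vec4{data[i * 4], data[i * 4 + 1], data[i * 4 + 2], data[i * 4 + 3]};
}

// Recompute every particle normal from the triangle list (three particle
// indices per face). Unit face normals are summed per particle and the sum is
// normalized, which yields the direction of their average.
inline void compute_normals(ClothState& state,
                            const std::vector<std::size_t>& triangles) {
    assert(triangles.size() % 3 == 0);
    std::fill(state.normals.begin(), state.normals.end(), 0.0f);

    for (std::size_t t = 0; t + 2 < triangles.size(); t += 3) {
        const Vec4 a = load_attribute(state.positions, triangles[t]);
        const Vec4 b = load_attribute(state.positions, triangles[t + 1]);
        const Vec4 c = load_attribute(state.positions, triangles[t + 2]);

        // Face normal = (b - a) x (c - a); only x, y, z take part.
        const float e1[3] = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
        const float e2[3] = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
        const float n[3] = {e1[1] * e2[2] - e1[2] * e2[1],
                            e1[2] * e2[0] - e1[0] * e2[2],
                            e1[0] * e2[1] - e1[1] * e2[0]};
        const float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        if (len == 0.0f) continue;  // skip degenerate faces

        // Accumulate the unit face normal on each of the three particles.
        for (std::size_t v = 0; v < 3; ++v) {
            const std::size_t base = triangles[t + v] * 4;
            for (std::size_t k = 0; k < 3; ++k) state.normals[base + k] += n[k] / len;
        }
    }

    // Normalize each accumulated sum; particles with no faces keep a zero normal.
    for (std::size_t base = 0; base + 3 < state.normals.size(); base += 4) {
        float* n = &state.normals[base];
        const float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        if (len > 0.0f) { n[0] /= len; n[1] /= len; n[2] /= len; }
        n[3] = 0.0f;  // fourth component unused here (assumption)
    }
}
```

Keeping each attribute in its own flat array of four floats per particle mirrors the layout that is later uploaded as textures or buffers, so the same indexing `i * 4 + k` applies on both the CPU and the GPU side.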
The cloth is built as a grid of $n \times n$ particles, where $n$ is the number of
particles composing one side of the grid. Regardless of the value of $n$, the horizontal
and the vertical spatial dimensions of the grid are always normalized to the range