22.9 Results
The described method has been implemented and tested on two different machines:
■ A desktop PC with an Nvidia GeForce GTS 250 (1 GB VRAM) and an Intel Core i5 processor.
■ A laptop PC with an Nvidia Quadro FX 360M (128 MB VRAM) and an Intel Core 2 Duo processor.
We collected performance times for each GPU computing platform, varying the
numbers of particles and springs, from a grid resolution of 32 × 32 (1,024 particles
and 11,412 springs) to 256 × 256 (65,536 particles and approximately 700,000
springs). Numerical results are collected in the plots in Figures 22.5 and 22.6.
From the data plotted in Figures 22.5 and 22.6, the computing superiority of
the GPU over the CPU is evident. This is mainly because this cloth simulation
algorithm, like most particle-based approaches, is strongly parallelizable. While
the computational cost on the CPU grows linearly with the number of particles,
the computation time on the GPU remains relatively low because the particle
dynamics are computed in parallel. On the GTS 250 device, this leads to a
performance gain ranging from 10 to 40 times, depending on the number of
particles.
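The per-particle parallelism behind this scaling can be sketched as a CUDA kernel in which each thread integrates one particle independently. This is a minimal illustrative sketch assuming explicit Verlet integration with unit mass; the kernel and buffer names are hypothetical and not taken from the chapter's implementation:

```cuda
// One thread per particle: position update via Verlet integration.
// pos, prevPos, and force are device buffers of length numParticles.
__global__ void integrateParticles(float4 *pos, float4 *prevPos,
                                   const float4 *force,
                                   int numParticles, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;

    float4 p  = pos[i];
    float4 pp = prevPos[i];

    // x(t+dt) = 2 x(t) - x(t-dt) + a dt^2  (unit mass assumed)
    float4 next;
    next.x = 2.0f * p.x - pp.x + force[i].x * dt * dt;
    next.y = 2.0f * p.y - pp.y + force[i].y * dt * dt;
    next.z = 2.0f * p.z - pp.z + force[i].z * dt * dt;
    next.w = 1.0f;

    prevPos[i] = p;   // current position becomes the previous one
    pos[i]     = next;
}

// Launch with one thread per particle, e.g.:
// integrateParticles<<<(n + 255) / 256, 256>>>(pos, prev, force, n, dt);
```

Because every particle is handled by its own thread, adding particles mostly adds more concurrent work rather than more sequential steps, which is why the GPU curves in the plots grow far more slowly than the CPU curve.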
It is interesting to note that, in this case, GLSL performs much better than
CUDA. This can be explained by considering how memory is accessed by the
GPU kernels. In the GLSL fragment program, images are employed to store
particle data in texture memory, while in CUDA and OpenCL this data is stored
in the global memory of the device. Texture memory has two
main advantages [Nvidia 2010]. First, it is cached, and thus, video memory is
accessed only if there is a cache miss. Second, it is built to optimize access to
2D local data, which is exactly the access pattern here: each particle corresponds
to a pixel and must read the positions of its neighbors, which are stored in the
immediately adjacent texture pixels. Furthermore, the results in
GLSL are stored in the color render targets that are then directly mapped to
VBOs and drawn on the screen. The data resides in video memory and does not
need to be copied between different memory areas. This makes the entire process
extremely fast compared with the other approaches.
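The difference between the two access paths can be illustrated with a short CUDA fragment that reads a grid neighbor either from global memory or through the texture cache. This is a hedged sketch: the function names are illustrative, and the texture-object API shown (`tex2D<float4>` on a `cudaTextureObject_t`) requires Kepler-class or newer hardware; on older devices such as the GTS 250, the equivalent mechanism was texture references, but the caching behavior is the same:

```cuda
// Reading a particle's right-hand neighbor in a W x H particle grid.

// Global-memory version: the fetch goes straight to device memory.
// Vertical neighbors are W elements apart, so a stencil of reads
// around each particle exploits no cache on this path.
__device__ float4 neighborGlobal(const float4 *pos, int x, int y, int W)
{
    return pos[y * W + (x + 1)];
}

// Texture version: tex2D goes through the texture cache, which is
// optimized for 2D-local access, matching the way each particle
// reads its immediately adjacent grid neighbors.
__device__ float4 neighborTexture(cudaTextureObject_t posTex, int x, int y)
{
    // +0.5f offsets sample texel centers with unnormalized coordinates.
    return tex2D<float4>(posTex, (x + 1) + 0.5f, y + 0.5f);
}
```

In the GLSL path this 2D-cached access comes for free, since particle data lives in textures and the fragment shader samples them directly, with the render-target output then mapped to VBOs without any extra copy.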
The plots also highlight the lower performance of OpenCL compared with
CUDA. This difference arises because it has been rather difficult to tune the
number of global and local work items due to causes requiring further