GPU versus CPU

One of the reasons for the popularity of deep learning today is the drastically increased processing capacity of GPUs (Graphics Processing Units). Architecturally, a CPU (Central Processing Unit) is composed of a few cores that can handle a few threads at a time, while a GPU is composed of hundreds of cores that can handle thousands of threads at the same time. A GPU is a highly parallel device, whereas a CPU is mainly a serial one.

DNNs are composed of several layers, and each layer contains neurons that all behave in the same manner. Moreover, we have discussed how the activity value for each neuron is the weighted sum of its inputs, a_i = Σ_j w_ij x_j, or, if we express it in matrix form, a = wx, where a and x are vectors and w is a matrix. All activation values are calculated in the same way across the network. CPUs and GPUs have different architectures; in particular, they are optimized differently: CPUs are latency-optimized while GPUs are bandwidth-optimized. In a deep neural network with many layers and a large number of neurons, bandwidth, not latency, becomes the bottleneck, and this is why GPUs perform so much better. In addition, the L1 cache of the GPU is much faster than the L1 cache of the CPU, and it is also larger.

The L1 cache stores the information that the program is likely to need next, and keeping this data close to the processor speeds up computation. Much of the memory gets reused in deep neural networks, which is why L1 cache memory is important. Using GPUs, your program can run up to an order of magnitude faster than it would on CPUs alone, and this speed-up is behind much of the recent progress in speech and image processing with deep neural networks, an increase in computing power that was not available a decade ago.
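As a minimal NumPy sketch of the point made above about a = wx (the layer sizes here are arbitrary and chosen only for illustration), the activations of an entire layer reduce to a single matrix product, exactly the kind of uniform, highly parallel operation that a GPU executes efficiently:

import numpy as np

n_inputs, n_neurons = 1024, 512          # hypothetical layer dimensions
x = np.random.rand(n_inputs)             # input vector
w = np.random.rand(n_neurons, n_inputs)  # weight matrix, one row per neuron

# Each neuron i computes a_i = sum_j w[i, j] * x[j]; in matrix form, a = wx.
a = w @ x                                # shape: (n_neurons,)

# Check the matrix form against the per-neuron weighted sum for neuron 0.
assert np.isclose(a[0], np.sum(w[0] * x))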


In addition to being faster for DNN training, GPUs are also more efficient at running DNN inference. Inference is the post-training phase in which we deploy our trained DNN. In a whitepaper published by GPU vendor Nvidia titled GPU-Based Deep Learning Inference: A Performance and Power Analysis, available online at http://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf, the efficiency of GPUs and CPUs is compared on the AlexNet network (a DNN with several convolutional layers), and the results are summarized in the following table:

Network: AlexNet

Batch size 1
                           Tegra X1 (FP32)    Tegra X1 (FP16)    Core i7 6700K (FP32)
  Inference performance    47 img/sec         67 img/sec         62 img/sec
  Power                    5.5 W              5.1 W              49.7 W
  Performance/Watt         8.6 img/sec/W      13.1 img/sec/W     1.3 img/sec/W

Batch size 128 (Tegra X1) / 48 (Core i7)
                           Tegra X1 (FP32)    Tegra X1 (FP16)    Core i7 6700K (FP32)
  Inference performance    155 img/sec        258 img/sec        242 img/sec
  Power                    6.0 W              5.7 W              62.5 W
  Performance/Watt         25.8 img/sec/W     45 img/sec/W       3.9 img/sec/W

The results show that inference on the Tegra X1 can be up to an order of magnitude more energy-efficient than CPU-based inference, while achieving comparable performance levels.
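A quick arithmetic check of this claim, using only the numbers from the table above at the larger batch size:

# Performance per watt and the GPU-to-CPU efficiency ratio (figures from the table).
tegra_fp16 = 258 / 5.7    # ~45.3 img/sec/W (Tegra X1, FP16, batch 128)
core_i7    = 242 / 62.5   # ~3.9 img/sec/W  (Core i7 6700K, FP32, batch 48)

print(round(tegra_fp16, 1), round(core_i7, 1), round(tegra_fp16 / core_i7, 1))
# 45.3, 3.9, and a ratio of roughly 11.7x -- about an order of magnitude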

Writing code that targets the GPU directly rather than the CPU is not easy, which is why most popular open source libraries, such as Theano or TensorFlow, let you switch between the two with a simple setting. Using these libraries does not require writing specialized code: the same code can run on both the CPU and the GPU, if one is available. How the switch is made depends on the library, but typically it is done by setting certain environment variables or by creating a specialized resource (.rc) file that is read by the particular open source library chosen.
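As a concrete sketch of such a switch in Theano (the exact flag names depend on the installed version; older releases accept device=gpu, newer ones expect device=cuda):

import os

# Option 1: set the environment variable before Theano is imported.
os.environ.setdefault("THEANO_FLAGS", "device=gpu,floatX=float32")
import theano
print(theano.config.device)  # reports which device Theano will use

# Option 2: the same settings can live in a ~/.theanorc file instead:
#   [global]
#   device = gpu
#   floatX = float32

# In TensorFlow, the equivalent is explicit device placement, for example:
#   with tf.device('/gpu:0'): ...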
