The TensorFlow GPU setup

To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit.

Once the CUDA Toolkit is installed, you must download the cuDNN v5.1 library for Linux from https://developer.nvidia.com/cudnn.

cuDNN is a library that helps accelerate deep learning frameworks, such as TensorFlow and Theano. Here's a brief explanation from the NVIDIA website:

"The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK."

Before installing it, you'll need to register on NVIDIA's Accelerated Computing Developer Program. Once you're registered, log in and download cuDNN 5.1 to your local computer.

Once it is downloaded, decompress the archive so that its contents end up in the CUDA Toolkit directory (we've assumed here that the toolkit is installed under /usr/local/cuda/):

$ sudo tar -xvf cudnn-8.0-linux-x64-v5.1-rc.tgz -C /usr/local

Update TensorFlow

We're assuming you'll be using TensorFlow to build your deep neural network models. To switch to a GPU-enabled build, simply update TensorFlow via pip with the --upgrade flag.

For example, the following command installs the 0.10.0rc0 GPU-enabled wheel for Python 2.7 on Linux (substitute the URL of the release you are targeting):

pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl

Now you should have everything you need to run a model using your GPU.
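
As a quick sanity check, you can run a trivial operation with device placement logging turned on. This is a minimal sketch (the constant values are arbitrary); if the GPU-enabled build is working, the placement log printed by TensorFlow should show the matmul op assigned to a GPU device:

import tensorflow as tf

# Build a trivial graph and ask TensorFlow to log where each op is placed.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
c = tf.matmul(a, b, name='c')

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))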

GPU representation

In TensorFlow, the supported devices are represented as strings:

  • "/cpu:0": The CPU of your machine
  • "/gpu:0": The GPU of your machine, if you have one
  • "/gpu:1": The second GPU of your machine, and so on

If an operation has both CPU and GPU implementations, the GPU device is given priority when the operation is assigned to a device.
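
To see which device strings are actually available on your machine, you can list the local devices. The following is a minimal sketch that relies on the internal device_lib helper, whose location may change between TensorFlow versions:

from tensorflow.python.client import device_lib

# Print the name (e.g. "/cpu:0", "/gpu:0") and type of every visible device.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)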

Using a GPU

To use a GPU in your TensorFlow program, just type the following:

with tf.device("/gpu:0"):

Inside this block you then define the operations to run. The with statement creates a context manager, telling TensorFlow to perform those operations on the GPU.
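
For instance, here is a minimal, self-contained sketch (the constant matrices are purely illustrative) that pins a small multiplication to the first GPU; a fuller example follows below:

import tensorflow as tf

# Pin the construction of a small matrix multiplication to the first GPU.
with tf.device("/gpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    product = tf.matmul(a, b)

# Log the placement so we can confirm where the op actually ran.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(product))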

Let's consider the following example, in which we want to compute the sum of two large matrix powers, A^n + B^n.

Define the basic imports:

import numpy as np
import tensorflow as tf
import datetime

We can configure a TensorFlow program to find out which devices our operations and tensors are assigned to. To do this, we'll create a session with the log_device_placement parameter set to True:

log_device_placement = True

Then we set the n parameter, which is the number of multiplications to perform:

n = 10

Then we build two large random matrices, using NumPy's rand function:

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

A and B will each be of size 10000x10000.

The following lists will be used to store the results:

c1 = []
c2 = []

Next, we define the matpow kernel function, which computes the n-th power of a matrix by repeated multiplication and will be executed on the GPU:

def matpow(M, n):
    if n == 1:
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))

As we previously explained, we must configure both the GPU and the CPU with the operations to perform.

The GPU will compute the A^n and B^n operations and store the results in c1:

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

The addition of all the elements in c1 (A^n + B^n) is performed by the CPU, so we define the following:

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

The datetime class allows us to evaluate the computational time:

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(
        log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a: A, b: B})
t2_1 = datetime.datetime.now()

The computational time is then displayed:

print("GPU computation time: " + str(t2_1-t1_1))

On my laptop, using a GeForce 840M graphics card, the result is as follows:

GPU computation time: 0:00:13.816644
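
As a side note, the c2 list declared earlier is not used in the GPU run. As a hedged sketch, you could reuse it to build a CPU-only baseline of the same computation and compare the two timings (the a2, b2, and sum_cpu names are introduced here purely for illustration):

# Same computation pinned entirely to the CPU, collected in c2.
with tf.device('/cpu:0'):
    a2 = tf.placeholder(tf.float32, [10000, 10000])
    b2 = tf.placeholder(tf.float32, [10000, 10000])
    c2.append(matpow(a2, n))
    c2.append(matpow(b2, n))
    sum_cpu = tf.add_n(c2)

t1_2 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(
        log_device_placement=log_device_placement)) as sess:
    sess.run(sum_cpu, {a2: A, b2: B})
t2_2 = datetime.datetime.now()

print("CPU computation time: " + str(t2_2 - t1_2))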

GPU memory management

In some cases, it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two config options on the session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, we extend the amount of GPU memory needed by the TensorFlow process.

Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set the option in ConfigProto as follows:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU as follows:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly limit the amount of GPU memory available to the TensorFlow process.

Assigning a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run your session on a different GPU, you will need to specify the preference explicitly.

For example, we can try to change the GPU assignment in the previous code:

with tf.device('/gpu:1'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

In this way, we are telling TensorFlow to execute the kernel function on the second GPU (gpu:1). If the device we have specified does not exist (as in my case), you will get an InvalidArgumentError:

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Placeholder_1': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
     [[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[100,100], _device="/device:GPU:1"]()]]

If you would like TensorFlow to automatically choose an existing and supported device to run the operations if the specified one doesn't exist, you can set allow_soft_placement to True in the configuration option when creating the session.

Again, we set '/gpu:1' for the following node:

with tf.device('/gpu:1'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

Then we build a Session with the allow_soft_placement parameter set to True:

with tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=log_device_placement)) as sess:

In this way, when we run the session, no InvalidArgumentError will be displayed. We'll get a correct result, in this case, with a little delay:

GPU computation time: 0:00:15.006644

The source code for GPU with soft placement

Here's the complete source code, just for clarity:

import numpy as np
import tensorflow as tf
import datetime

log_device_placement = True
n = 10

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

c1 = []
c2 = []


def matpow(M, n):
    if n == 1: 
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1) 

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a: A, b: B})
t2_1 = datetime.datetime.now()

print("GPU computation time: " + str(t2_1 - t1_1))

Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model by assigning a specific chunk of code to a GPU. For example, if we have two GPUs, we can split the previous code as follows, assigning the first matrix computation to the first GPU:

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))

The second matrix computation is assigned to the second GPU:

with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(b, n))

The CPU will manage the results. Note that we use the shared c1 list to collect them:

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

In the following code snippet, we provide a concrete example of management of two GPUs:

import numpy as np
import tensorflow as tf
import datetime

log_device_placement = True
n = 10

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

c1 = []

def matpow(M, n):
    if n == 1:  
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))

#FIRST GPU
with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    
#SECOND GPU
with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(b, n))


with tf.device('/cpu:0'):
    sum = tf.add_n(c1) 

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a: A, b: B})
t2_1 = datetime.datetime.now()

print("GPU computation time: " + str(t2_1 - t1_1))