The TensorFlow library and digit recognition

For the exercises in this chapter, we will be using the TensorFlow library open-sourced by Google (available at https://www.tensorflow.org/). Installation instructions vary by operating system. Additionally, for Linux systems, it is possible to leverage both the CPU and graphics processing unit (GPU) on your computer to run deep learning models. Because many of the steps in training (such as the multiplications required to update a grid of weight values) involve matrix operations, they can be readily parallelized (and thus accelerated) by using a GPU. However, the TensorFlow library will work on CPU as well, so don't worry if you don't have access to an Nvidia GPU card.
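If a GPU is available, TensorFlow also lets you pin individual operations to it explicitly. The following is a minimal sketch (the device string is an assumption that depends on your installation; swap in '/cpu:0' on machines without a GPU):

>>> import tensorflow as tf
>>> with tf.device('/gpu:0'):        # assumed device name; use '/cpu:0' if no GPU is present
…       a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
…       b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
…       product = tf.matmul(a, b)    # a matrix multiplication, the kind of operation a GPU parallelizes well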

The MNIST data

The data we will be examining in this exercise is a set of images of hand-drawn numbers from 0 to 9 from the Modified National Institute of Standards and Technology (MNIST) database (LeCun, Yann, Corinna Cortes, and Christopher JC Burges. The MNIST database of handwritten digits. (1998)). Similar to the Hello World! program used to introduce basic programming techniques, or the word count example used for demonstrating distributed computing frameworks, the MNIST data is a common example used to demonstrate the functions of neural network libraries. The prediction task associated with this data is to assign a label (a digit from 0 to 9) to an image, given only the input pixels.

The TensorFlow library provides a convenient function to load this data using the following commands:

>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Note that along with specifying that we wish to load the MNIST data, we have indicated that the target variable (the digit represented by the image) should be encoded as a binary vector (for example, the number 3 is indicated by placing a 1 in the fourth element of this vector, since the first element encodes the digit 0); a quick sketch of this encoding appears after the commands below. Once we have loaded the data, we can see that it has already been conveniently divided into a training set and a smaller test set by examining the length of each using the commands:

>>> len(mnist.train.images)
>>> len(mnist.test.images)
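As an illustration of the one-hot encoding described above, the following numpy sketch (independent of the TensorFlow loader) builds the target vector for the digit 3:

>>> import numpy as np
>>> label = 3
>>> one_hot = np.zeros(10)     # one slot per digit, 0 through 9
>>> one_hot[label] = 1.0       # set the element corresponding to the digit
>>> one_hot
array([ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])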

Each of these images is a 28 × 28 pixel image. In the data, each is stored as a one-dimensional vector of length 784, but we can use the skimage library from the previous chapter to visualize the images once we have reshaped the array into its original dimensions, using the following commands:

>>> import numpy as np
>>> from skimage import io
>>> io.imshow(np.reshape(mnist.train.images[0], (28, 28)))

This displays the first image in the set:

(Figure: the first image in the MNIST training set)

This looks like the number 7. To check the label assigned to this image, we can examine the labels element of the train object using:

>>> mnist.train.labels[0]

which gives

array([ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])

The label is a 10-element vector, representing (from left to right) the digits 0–9, with 1 in the position associated with the label for an image. Indeed, the label assigned to this image is 7. Note that the label array takes the same shape as the final layer of the neural network algorithms we have been examining, giving the convenient ability to directly compare the label to the output once we have calculated the prediction.
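Once the network produces its own 10-element output vector, the predicted and true digits can be compared directly by taking the position of the largest element in each; a minimal numpy sketch (the predicted vector here is invented purely for illustration) is:

>>> import numpy as np
>>> true_label = mnist.train.labels[0]
>>> predicted = np.array([0.01, 0.0, 0.02, 0.01, 0.0, 0.01, 0.0, 0.9, 0.03, 0.02])  # hypothetical softmax output
>>> np.argmax(predicted) == np.argmax(true_label)   # do the predicted and true digits match?
True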

Now that we have examined the data, let us use the other utilities of the TensorFlow library to develop a neural network that can predict the label of an image.

Constructing the network

As you can probably appreciate by now, the structure of deep neural networks can be extremely complex. If we were to define variables for each layer of the network by hand, we would end up with a long block of code that would need to be modified every time we changed the structure of the network. Because in practical applications we often want to experiment with many different variations of depth, layer size, and connectivity, in this exercise we instead show how to make this structure generic and reusable. The key ingredients are functions that produce the layers, a list of the desired layers specified using these generator functions, and an outer process that links the generated layers together:

>>> import tensorflow as tf

>>> def weight_variable(dimensions, stddev):
…       return tf.Variable(tf.truncated_normal(dimensions, stddev=stddev))

>>> def bias_variable(dimensions, constant):
…       return tf.Variable(tf.constant(constant, shape=dimensions))

>>> def two_dimensional_convolutional_layer(x, W, strides, padding):
…       return tf.nn.conv2d(x, W, strides=strides, padding=padding)

>>> def max_pooling(x, strides, ksize, padding):
…       return tf.nn.max_pool(x, ksize=ksize, strides=strides, padding=padding)

>>> def generate_network(weight_variables,
                      bias_variables,
                      relu_layers,
                      pooling_layers,
                      fully_connected_layers,
                      inputs,
                      conv_strides,
                      pool_strides,
                      ksize,
                      output_channels,
                      conv_field_sizes,
                      conv_field_depths,
                      sd_weights,
                      bias_mean,
                      padding,
                      conv_layers,
                      fc_layers,
                      fc_shape,
                      keep_prob,
                      class_num,
                      dropouts):

        # add convolutional layers, each followed by a max pooling layer
        for k in range(conv_layers):
            weight_variables.append(weight_variable([conv_field_sizes[k], conv_field_sizes[k], conv_field_depths[k], output_channels[k]], sd_weights))
            bias_variables.append(bias_variable([output_channels[k]], bias_mean))
            relu_layers.append(tf.nn.relu(two_dimensional_convolutional_layer(inputs[k], weight_variables[k], conv_strides, padding) + bias_variables[k]))
            pooling_layers.append(max_pooling(relu_layers[k], pool_strides, ksize, padding))
            inputs.append(pooling_layers[k])

        # then add fully connected layers at the end, with dropout
        for r in range(fc_layers):
            weight_variables.append(weight_variable(fc_shape, sd_weights))
            bias_variables.append(bias_variable([fc_shape[1]], bias_mean))
            pooling_layers.append(tf.reshape(pooling_layers[-1], [-1, fc_shape[0]]))
            fully_connected_layers.append(tf.nn.relu(tf.matmul(pooling_layers[-1], weight_variables[-1]) + bias_variables[-1]))
            dropouts.append(tf.nn.dropout(fully_connected_layers[-1], keep_prob))

        # output layer: softmax over the class scores
        weight_variables.append(weight_variable([fc_shape[1], class_num], sd_weights))
        bias_variables.append(bias_variable([class_num], bias_mean))
        return tf.nn.softmax(tf.matmul(dropouts[-1], weight_variables[-1]) + bias_variables[-1])

This function constructs a series of convolutional/max pooling layers, followed by one or more fully connected layers with dropout, whose output is used to generate a prediction; at the end, we simply return the output of the softmax function applied to the final layer as the prediction. Templating the construction of the network in this way makes it easy to reconfigure and reuse. We can now configure a network by setting a few parameters:

>>> X = tf.placeholder("float", shape=[None, 784])
>>> observed = tf.placeholder("float", shape=[None, 10])
>>> images = tf.reshape(X, [-1,28,28,1])

# shape variables
>>> sd_weights = 0.1
>>> bias_mean = 0.1
>>> padding = 'SAME'
>>> conv_strides = [1,1,1,1]
>>> pool_strides = [1,2,2,1]
>>> ksize = [1,2,2,1]
>>> output_channels = [32,64]
>>> conv_field_sizes = [5,5]
>>> conv_field_depths = [1,32]
>>> fc_shape = [7*7*64,1024]
>>> keep_prob = tf.placeholder("float")
>>> class_num = 10
>>> conv_layers = 2
>>> fc_layers = 1

# layers variables
>>> weight_variables = []
>>> bias_variables = []
>>> relu_layers = []
>>> pooling_layers = []
>>> inputs = [images]
>>> fully_connected_layers = []
>>> dropouts = []

>>> prediction = generate_network(weight_variables,
                      bias_variables,
                      relu_layers,
                      pooling_layers,
                      fully_connected_layers,
                      inputs,
                      conv_strides,
                      pool_strides,
                      ksize,
                      output_channels,
                      conv_field_sizes,
                      conv_field_depths,
                      sd_weights,
                      bias_mean,
                      padding,
                      conv_layers,
                      fc_layers,
                      fc_shape,
                      keep_prob,
                      class_num,
                      dropouts)
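Because the structure of the network is driven entirely by these parameter lists, experimenting with a deeper variant only requires extending them. For example, a hypothetical three-convolutional-layer configuration (not used in the rest of this exercise) might look like:

>>> output_channels = [32, 64, 128]
>>> conv_field_sizes = [5, 5, 3]
>>> conv_field_depths = [1, 32, 64]
>>> conv_layers = 3
>>> fc_shape = [4*4*128, 1024]   # 28 -> 14 -> 7 -> 4 after three 2x2 max pooling steps with 'SAME' padding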

Note that the input (X) and the true labels (observed) are both placeholders, as is the probability of retaining a unit in the dropout layers (keep_prob): they do not contain actual values, but will be filled in as the network is trained and we submit batches of data to the algorithm.
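To see how a placeholder receives concrete values only when the graph is run, here is a minimal standalone sketch (the tensors here are illustrative and not part of the network above):

>>> a = tf.placeholder("float", shape=[None, 2])
>>> doubled = a * 2
>>> with tf.Session() as sess:
…       print(sess.run(doubled, feed_dict={a: [[1.0, 2.0]]}))
[[ 2.  4.]]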

Now all we need to do is initialize a session and begin submitting batches of data using the following code:

>>> my_session = tf.InteractiveSession()
>>> squared_error = tf.reduce_sum(tf.pow(tf.sub(observed, prediction), 2))
>>> train_step = tf.train.GradientDescentOptimizer(0.01).minimize(squared_error)
>>> correct_prediction = tf.equal(tf.argmax(prediction,1), tf.argmax(observed,1))
>>> accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
>>> my_session.run(tf.initialize_all_variables())

>>> for i in range(20000):
…       batch = mnist.train.next_batch(50)
…       if i % 1000 == 0:
…           train_accuracy = accuracy.eval(feed_dict={X: batch[0], observed: batch[1], keep_prob: 1.0})
…           print("step %d, training accuracy %g" % (i, train_accuracy))
…       train_step.run(feed_dict={X: batch[0], observed: batch[1], keep_prob: 0.5})

>>> print("test accuracy %g" % accuracy.eval(feed_dict={X: mnist.test.images, observed: mnist.test.labels, keep_prob: 1.0}))

We can observe the progress of the algorithm as it trains, with the training accuracy printed to the console every 1,000 iterations and the accuracy on the held-out test set reported once training is complete.
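Once training has finished, the same interactive session can be used to classify an individual image; a minimal sketch, reusing the placeholders defined above (the choice of the first test image is arbitrary), is:

>>> single_image = mnist.test.images[0:1]                                # slice keeps the batch dimension
>>> predicted_vector = prediction.eval(feed_dict={X: single_image, keep_prob: 1.0})
>>> np.argmax(predicted_vector)                                          # the predicted digit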
