Let us start with the recipe:
- Import tensorflow, matplotlib, random, and numpy. Then, import the minst data and perform one-hot encoding. Note the TensorFlow has some built-in libraries to deal with MNIST and we are going to use them:
from __future__ import division, print_function import tensorflow as tf import matplotlib.pyplot as plt import numpy as np # Import MNIST data from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
- Introspect some data to understand what MNIST is. This gives us an idea of how many images are in the training dataset and how many are in the test dataset. We are also going to visualize a few numbers just to understand how they are represented. The multi-flat output can give us a visual sense of how difficult it can be to recognize a handwritten number, even for humans.
def train_size(num): print ('Total Training Images in Dataset = ' + str(mnist.train.images.shape)) print ('--------------------------------------------------') x_train = mnist.train.images[:num,:] print ('x_train Examples Loaded = ' + str(x_train.shape)) y_train = mnist.train.labels[:num,:] print ('y_train Examples Loaded = ' + str(y_train.shape)) print('') return x_train, y_train def test_size(num): print ('Total Test Examples in Dataset = ' + str(mnist.test.images.shape)) print ('--------------------------------------------------') x_test = mnist.test.images[:num,:] print ('x_test Examples Loaded = ' + str(x_test.shape)) y_test = mnist.test.labels[:num,:] print ('y_test Examples Loaded = ' + str(y_test.shape)) return x_test, y_test def display_digit(num): print(y_train[num]) label = y_train[num].argmax(axis=0) image = x_train[num].reshape([28,28]) plt.title('Example: %d Label: %d' % (num, label)) plt.imshow(image, cmap=plt.get_cmap('gray_r')) plt.show() def display_mult_flat(start, stop): images = x_train[start].reshape([1,784]) for i in range(start+1,stop): images = np.concatenate((images, x_train[i].reshape([1,784]))) plt.imshow(images, cmap=plt.get_cmap('gray_r')) plt.show() x_train, y_train = train_size(55000) display_digit(np.random.randint(0, x_train.shape[0])) display_mult_flat(0,400)
Let us look at the output of the preceding code:
An example of MNIST handwritten numbers
- Set up the learning parameter, the batch_size, and the display_step. In addition, set up n_input = 784 given that the MNIST images share 28 x 28 pixels, output n_classes = 10 representing the output digits [0-9], and the dropout probability = 0.85:
# Parameters learning_rate = 0.001 training_iters = 500 batch_size = 128 display_step = 10 # Network Parameters n_input = 784
# MNIST data input (img shape: 28*28) n_classes = 10
# MNIST total classes (0-9 digits) dropout = 0.85
# Dropout, probability to keep units
- Set up the TensorFlow computation graph input. Let's define two placeholders to store predictions and true labels:
x = tf.placeholder(tf.float32, [None, n_input]) y = tf.placeholder(tf.float32, [None, n_classes]) keep_prob = tf.placeholder(tf.float32)
- Define a convolutional layer with input x, weight W, bias b, and a given stride. The activation function is ReLU and padding is SAME:
def conv2d(x, W, b, strides=1): x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME') x = tf.nn.bias_add(x, b) return tf.nn.relu(x)
- Define a maxpool layer with input x, with ksize and padding SAME:
def maxpool2d(x, k=2): return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
- Define convnet with two convolutional layers followed by a fully connected layer, a dropout layer, and a final output layer:
def conv_net(x, weights, biases, dropout): # reshape the input picture x = tf.reshape(x, shape=[-1, 28, 28, 1]) # First convolution layer conv1 = conv2d(x, weights['wc1'], biases['bc1']) # Max Pooling used for downsampling conv1 = maxpool2d(conv1, k=2) # Second convolution layer conv2 = conv2d(conv1, weights['wc2'], biases['bc2']) # Max Pooling used for downsampling conv2 = maxpool2d(conv2, k=2) # Reshape conv2 output to match the input of fully connected layer fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]]) # Fully connected layer fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1']) fc1 = tf.nn.relu(fc1) # Dropout fc1 = tf.nn.dropout(fc1, dropout) # Output the class prediction out = tf.add(tf.matmul(fc1, weights['out']), biases['out']) return out
- Define the layers weights and the bias. The first conv layer has a 5 x 5 convolution, 1 input, and 32 outputs. The second conv layer has 5 x 5 convolutions, 32 inputs, and 64 outputs. The fully connected layer has 7 x 7 x 64 inputs and 1,024 outputs, while the second layer has 1,024 inputs and 10 outputs corresponding to the final digit classes. All the weights and bias are initialized with randon_normal distribution:
weights = { # 5x5 conv, 1 input, and 32 outputs 'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])), # 5x5 conv, 32 inputs, and 64 outputs 'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])), # fully connected, 7*7*64 inputs, and 1024 outputs 'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])), # 1024 inputs, 10 outputs for class digits 'out': tf.Variable(tf.random_normal([1024, n_classes])) } biases = { 'bc1': tf.Variable(tf.random_normal([32])), 'bc2': tf.Variable(tf.random_normal([64])), 'bd1': tf.Variable(tf.random_normal([1024])), 'out': tf.Variable(tf.random_normal([n_classes])) }
- Build the convnet with the given weights and bias. Define the loss function based on cross_entropy with logits and use the Adam optimizer for cost minimization. After optimization, compute the accuracy:
pred = conv_net(x, weights, biases, keep_prob) cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)) optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) init = tf.global_variables_initializer()
- Launch the graph and iterate for training_iterats times where each time a batch_size is taken in the input to run the optimizer. Note that we train by using the mnist.train data, which is separated from minst. Each display_step, the current partial accuracy is computed. At the end, the accuracy is computed on 2,048 test images with no dropout.
train_loss = [] train_acc = [] test_acc = [] with tf.Session() as sess: sess.run(init) step = 1 while step <= training_iters: batch_x, batch_y = mnist.train.next_batch(batch_size) sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout}) if step % display_step == 0: loss_train, acc_train = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y, keep_prob: 1.}) print "Iter " + str(step) + ", Minibatch Loss= " + "{:.2f}".format(loss_train) + ", Training Accuracy= " + "{:.2f}".format(acc_train) # Calculate accuracy for 2048 mnist test images. # Note that in this case no dropout acc_test = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.}) print "Testing Accuracy:" + "{:.2f}".format(acc_train) train_loss.append(loss_train) train_acc.append(acc_train) test_acc.append(acc_test) step += 1
- Plot the Softmax loss per iteration, together with train and test accuracy:
eval_indices = range(0, training_iters, display_step) # Plot loss over time plt.plot(eval_indices, train_loss, 'k-') plt.title('Softmax Loss per iteration') plt.xlabel('Iteration') plt.ylabel('Softmax Loss') plt.show() # Plot train and test accuracy plt.plot(eval_indices, train_acc, 'k-', label='Train Set Accuracy') plt.plot(eval_indices, test_acc, 'r--', label='Test Set Accuracy') plt.title('Train and Test Accuracy') plt.xlabel('Generation') plt.ylabel('Accuracy') plt.legend(loc='lower right') plt.show()
Following is the output of the preceding code. We first look at Softmax per iteration:
An example of loss decrease
We look at the train and text accuracy next:
An example of train and test accuracy increase