An image classifier with RNNs

At this point, we introduce our implementation of a recurrent model with LSTM blocks for an image classification problem. The dataset we use is the well-known MNIST.

The implemented model is composed of a single LSTM layer followed by a reduce mean operation and a softmax layer, as illustrated in the following figure:

Dataflow in an RNN architecture
The following function computes the mean of elements across dimensions of a tensor, reducing input_tensor along the dimensions given in axis. Unless keep_dims is true, the rank of the tensor is reduced by 1 for each entry in axis; if keep_dims is true, the reduced dimensions are retained with length 1:
tf.reduce_mean(input_tensor, axis=None,
keep_dims=False, name=None, reduction_indices=None)
If axis has no entries, all dimensions are reduced, and a tensor with a single element is returned.
For example:
# 'x' is [[1., 1.]
#         [2., 2.]]
tf.reduce_mean(x) ==> 1.5
tf.reduce_mean(x, 0) ==> [1.5, 1.5]
tf.reduce_mean(x, 1) ==> [1., 2.]

Thus, starting from an input sequence x0, x1, ..., xn, the memory cells in the LSTM layer will produce a representation sequence h0, h1, ..., hn.

This representation sequence is then averaged over all time steps, resulting in the final representation h. Finally, this representation is fed to a softmax layer whose target is the class label associated with the input sequence.
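As a minimal sketch of this averaging step (illustrative only, with hypothetical tensor names; it assumes the per-timestep outputs have been stacked into a single outputs tensor of shape (batch_size, n_steps, n_hidden)):

# Illustrative sketch (hypothetical names): average the LSTM outputs over the
# time axis to obtain the final representation h, then feed h to a softmax layer.
# outputs has shape (batch_size, n_steps, n_hidden)
h = tf.reduce_mean(outputs, axis=1)              # (batch_size, n_hidden)
W = tf.Variable(tf.random_normal([n_hidden, n_classes]))
b = tf.Variable(tf.random_normal([n_classes]))
probs = tf.nn.softmax(tf.matmul(h, W) + b)       # class probabilities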

Let's begin the implementation, with the usual importing of all dependencies:

import tensorflow as tf 
from tensorflow.contrib import rnn

The imported rnn module is described as follows:

The rnn module provides a number of basic, commonly used RNN cells, such as the LSTM cell, and a number of operators that allow us to add dropout, projections, or embeddings to the inputs.
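For example, a cell can be wrapped with dropout via rnn.DropoutWrapper (an illustrative snippet, not part of the classifier built below):

# Illustrative only: an LSTM cell wrapped with output dropout
cell = rnn.BasicLSTMCell(128, forget_bias=1.0)
cell = rnn.DropoutWrapper(cell, output_keep_prob=0.75)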

Then we load the MNIST dataset using the following library:

from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

It might take a few minutes, as it downloads the dataset from the Internet.

To classify the images using a Recurrent Neural Network, we treat every image row as a sequence of pixels. Because the MNIST image shape is 28×28 pixels, each sample becomes a sequence of 28 time steps with 28 features each.
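As a quick illustration of this row-as-timestep view (a hypothetical snippet using the mnist loader defined above), a batch of flattened 784-pixel images can be reshaped into 28 steps of 28 features:

# Illustrative only: view each flattened 28x28 image as 28 rows of 28 pixels
batch_x, batch_y = mnist.train.next_batch(4)   # batch_x has shape (4, 784)
batch_x = batch_x.reshape((-1, 28, 28))        # now (4, 28, 28): 28 steps of 28 features
print(batch_x.shape, batch_y.shape)            # (4, 28, 28) (4, 10)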

We then define the following parameters:

# MNIST data input (image shape: 28×28)
n_input = 28
# The timesteps
n_steps = 28
# The number of features in the hidden layer
n_hidden = 128
# MNIST total classes (0-9 digits)
n_classes = 10

Here we define the parameters that we will use in the learning process:

learning_rate = 0.001 
training_iters = 100000
batch_size = 128
display_step = 10

Define our input data (the images) as x. The datatype for this tensor is set to float and the shape is set to [None, n_steps, n_input]. The None parameter means that the tensor may hold an arbitrary number of images:

x = tf.placeholder("float", [None, n_steps, n_input]) 

Then we have the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. The shape of this placeholder variable is [None, n_classes], which means it may hold an arbitrary number of labels and each label is a vector of length n_classes, which is 10 in this case:

y = tf.placeholder("float", [None, n_classes])
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

We define the network through the RNN function:

def RNN(x, weights, biases): 

We set the input x data shape to correctly match the RNN function requirements. Notice the following:

  • The current input data will be (batch_size, n_steps, n_input)
  • The required shape is a list of n_steps tensors, each of shape (batch_size, n_input)

In order to do this, we must perform some transformations on the x input tensor. The first operation is a transposition of the current input data, swapping the batch and time dimensions:

    x = tf.transpose(x, [1, 0, 2]) 

This operation returns a (28, 128, 28) tensor, that is, (n_steps, batch_size, n_input), from the current (128, 28, 28) input data, that is, (batch_size, n_steps, n_input). Then, reshape x:

    x = tf.reshape(x, [-1, n_input]) 

It returns a (n_steps * batch_size, n_input) tensor. Then split the x tensor to get the required list of n_steps tensors of shape (batch_size, n_input):

    x = tf.split(axis=0, num_or_size_splits=n_steps, value=x)
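Note that, in TensorFlow 1.x, the same list can be obtained directly from the original (batch_size, n_steps, n_input) tensor with tf.unstack, replacing the three operations above; this is just an equivalent alternative, not the path taken here:

    # Equivalent alternative (TF 1.x), replacing the transpose/reshape/split above:
    # x = tf.unstack(x, n_steps, axis=1)   # list of n_steps tensors of shape (batch_size, n_input)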

To define our Recurrent Neural Network, perform the following steps:

  1. Define a single LSTM cell: The BasicLSTMCell method defines the LSTM recurrent network cell. The forget_bias parameter is set to 1.0 to reduce the scale of forgetting at the beginning of training:
         lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0) 
  2. Build the network: The static_rnn() operation creates the compute nodes for a given number of time steps:
         outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

This operation returns the LSTM cell outputs, where:

  • outputs is a list of length n_steps, containing one output tensor per input time step
  • states is the final state of the cell

The RNN function returns, for each input image, a vector of length 10 (one value per class) that is used to determine which of the 10 classes the image belongs to; here, the output at the last time step is fed to the output layer:

    return tf.matmul(outputs[-1], weights['out']) + biases['out'] 

We now build the predictor, then define the cost function and optimizer:

pred = RNN(x, weights, biases) 

We use softmax_cross_entropy_with_logits as the performance measure and reduce_mean to take the average of the cross-entropy over all the image classifications:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

Then we apply the AdamOptimizer algorithm to minimize the cross-entropy so it gets as close to zero as possible by changing the variables of the network layers:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

We define the accuracy that will be displayed during the computation:

correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

We then initialize the variables:

init = tf.global_variables_initializer()

It's time to begin the training session. First, we create a session and use it to run our computations:

with tf.Session() as sess: 
    sess.run(init)
    step = 1

Build the batch sets until we reach the maximum training iterations:

    while step * batch_size < training_iters: 
        batch_x, batch_y = mnist.train.next_batch(batch_size)

Reshape data to get 28 sequences of 28 elements:

        batch_x = batch_x.reshape((batch_size, n_steps, n_input)) 

To run through our data sequentially, we break it into pieces, each sized by the batch size we defined; we then feed every piece to the optimizer and calculate the accuracy and the error, repeating the process with new chunks. As this process continues, the accuracy improves:

        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y}) 
        if step % display_step == 0:

Compute the accuracy using the following code:

            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y}) 

The loss value, on the other hand, can be calculated as follows:

            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y}) 

Then we can display the accuracy as follows:

       print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +  
"{:.6f}".format(loss) + ", Training Accuracy= " +
"{:.5f}".format(acc))
step += 1
print("Optimization Finished!")

Finally, we test the RNN model on a subset (or batch set) of test images:

    test_len = 128 
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

The resulting output is shown as follows:

 >>>
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Iter 1280, Minibatch Loss= 1.861236, Training Accuracy= 0.35156
Iter 2560, Minibatch Loss= 1.457468, Training Accuracy= 0.51562
Iter 3840, Minibatch Loss= 1.092437, Training Accuracy= 0.64062
Iter 5120, Minibatch Loss= 0.857512, Training Accuracy= 0.73438
Iter 6400, Minibatch Loss= 0.678605, Training Accuracy= 0.78125
Iter 7680, Minibatch Loss= 1.139174, Training Accuracy= 0.61719
Iter 8960, Minibatch Loss= 0.797665, Training Accuracy= 0.75781
Iter 10240, Minibatch Loss= 0.640586, Training Accuracy= 0.81250
Iter 11520, Minibatch Loss= 0.379285, Training Accuracy= 0.90625
Iter 12800, Minibatch Loss= 0.694143, Training Accuracy= 0.72656
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Iter 85760, Minibatch Loss= 0.110027, Training Accuracy= 0.96094
Iter 87040, Minibatch Loss= 0.042054, Training Accuracy= 0.98438
Iter 88320, Minibatch Loss= 0.110460, Training Accuracy= 0.96875
Iter 89600, Minibatch Loss= 0.098120, Training Accuracy= 0.97656
Iter 90880, Minibatch Loss= 0.081780, Training Accuracy= 0.96875
Iter 92160, Minibatch Loss= 0.064964, Training Accuracy= 0.97656
Iter 93440, Minibatch Loss= 0.077182, Training Accuracy= 0.96094
Iter 94720, Minibatch Loss= 0.187053, Training Accuracy= 0.95312
Iter 96000, Minibatch Loss= 0.128569, Training Accuracy= 0.96094
Iter 97280, Minibatch Loss= 0.125085, Training Accuracy= 0.96094
Iter 98560, Minibatch Loss= 0.102962, Training Accuracy= 0.96094
Iter 99840, Minibatch Loss= 0.063063, Training Accuracy= 0.98438
Optimization Finished! Testing Accuracy: 0.960938
>>>