A convolutional layer example with Keras to recognize digits

In the third chapter, we introduced a simple neural network to classify digits using Keras and reached about 94% accuracy. In this chapter, we will work to push that value above 99% using convolutional networks. Actual values may vary slightly due to variability in initialization.

First of all, we can start by improving the neural network we had defined, using 400 hidden neurons and running it for 30 epochs; that alone should already get us up to around 96.5% accuracy:

    hidden_neurons = 400
    epochs = 30
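
For reference, here is a minimal sketch of the kind of feed-forward model these changes apply to, assuming the chapter 3 setup (one dense hidden layer with sigmoid activations, a softmax output, and the sgd optimizer); the exact code from that chapter may differ in details:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import np_utils

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)    # flatten each 28 x 28 image to 784 values
X_test = X_test.reshape(10000, 784)
Y_train = np_utils.to_categorical(Y_train, 10)   # one-hot encode the labels
Y_test = np_utils.to_categorical(Y_test, 10)

model = Sequential()
model.add(Dense(hidden_neurons, input_dim=784))  # hidden_neurons = 400, as set above
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='sgd')
model.fit(X_train, Y_train, batch_size=100, epochs=epochs, verbose=1)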

Next, we could try scaling the input. Images are made up of pixels, and each pixel has an integer value between 0 and 255. We can convert those values to floats and scale them between 0 and 1 by adding these four lines of code right after we define our input:

X_train = X_train.astype('float32')     
X_test = X_test.astype('float32')     
X_train /= 255     
X_test /= 255

If we run our network now, we get a poorer accuracy, just above 92%, but we need not worry. By rescaling, we have in fact changed the values of the gradients of our function, which will make training converge much more slowly, but there is an easy work-around. In our code, inside the model.compile function, we defined the optimizer as "sgd". That is standard stochastic gradient descent, which uses the gradient to converge to a minimum. However, Keras allows other choices, in particular "adadelta", which automatically uses momentum and adjusts the learning rate depending on the gradient, making the step larger or smaller inversely to the size of the gradient, so that the network neither learns too slowly nor skips minima by taking too large a step. With adadelta, the optimization parameters are thus adjusted dynamically over time (see also: Matthew D. Zeiler, ADADELTA: An Adaptive Learning Rate Method, arXiv:1212.5701 (https://arxiv.org/pdf/1212.5701v1.pdf)).

Inside the main function, we are now going to change our compile function and use:

model.compile(loss='categorical_crossentropy', 
              metrics=['accuracy'], optimizer='adadelta')
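
The string 'adadelta' uses Keras's default settings for the optimizer. As a sketch, if we want to experiment with the behavior described above, we can pass an optimizer object instead of the string; the lr and rho values below are simply the library defaults, shown for illustration only:

from keras.optimizers import Adadelta

# rho controls the decay of the running average of squared gradients;
# lr is an overall scaling factor on the adaptive step
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'],
              optimizer=Adadelta(lr=1.0, rho=0.95))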

If we run our algorithm again, we are now at about 98.25% accuracy. Finally, let's modify our first dense (fully connected) layer and use the relu activation function instead of the sigmoid:

model.add(Activation('relu'))

This will now give around 98.4% accuracy. The problem is that it now becomes increasingly difficult to improve our results with a classical feed-forward architecture, due to overfitting; increasing the number of epochs or the number of hidden neurons will not bring any added benefit, as the network will simply learn to overfit the data rather than generalize better. We are therefore now going to introduce convolutional layers into the example.

To do this, we keep our input scaled between 0 and 1, but we reshape the data to a volume of size (28, 28, 1) = (width of image, height of image, number of channels) so that it can be used by a convolutional layer, and we bring the number of hidden neurons down to 200. We then add a simple convolutional layer at the beginning, with 32 filters of size 3 x 3, no padding, and stride 1, followed by a max-pooling layer of size 2 and stride 2. In order to pass the output to the dense layer, we need to flatten the volume (the output of a convolutional layer is a volume) before the regular dense layer with its 200 hidden neurons, using the following code:

from keras.layers import Convolution2D, MaxPooling2D, Flatten
hidden_neurons = 200
X_train = X_train.reshape(60000, 28, 28, 1)   # (samples, width, height, channels)
X_test = X_test.reshape(10000, 28, 28, 1)
model.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))  # 32 filters, 3 x 3, stride 1, no padding
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))     # 2 x 2 max-pooling, stride 2
model.add(Flatten())                          # flatten the volume for the dense layer
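
Once the full model is assembled (this convolutional front end followed by the dense layers as before), we can quickly check how the shapes change by printing a summary; the values below refer to this single-convolutional-layer version and follow from the 3 x 3 filter with no padding and the 2 x 2 pooling:

model.summary()
# Convolution2D output: (None, 26, 26, 32)   28 - 3 + 1 = 26 per dimension, 32 filters
# MaxPooling2D output:  (None, 13, 13, 32)   2 x 2 pooling halves width and height
# Flatten output:       (None, 5408)         13 * 13 * 32 = 5408 values for the dense layer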

We can also reduce the number of epochs to just 8, and we will get an accuracy of around 98.55%. It is common to use pairs of convolutional layers, so we add a second one, similar to the first, before the pooling layer:

model.add(Convolution2D(32, (3, 3))) 
model.add(Activation('relu'))

With this change, we will now be at about 98.9% accuracy.

In order to get to 99%, we add a dropout layer as we have discussed. This does not add any new parameters, but helps prevent overfitting, and we add it right before the flatten layer:

from keras.layers import Dropout
model.add(Dropout(0.25))

In this example we use a dropout rate of 25%, so during training each neuron is randomly dropped, on average, once every four updates.
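
As a toy illustration of the mechanism (not part of the chapter's code), during training each activation is kept with probability 0.75 and the kept values are rescaled; this is the inverted-dropout scheme also used by Keras, so nothing needs to change at test time:

import numpy as np

rng = np.random.RandomState(0)
activations = rng.rand(8)                          # a small vector of activations
keep_prob = 0.75                                   # Dropout(0.25) keeps 75% of the units
mask = rng.binomial(1, keep_prob, size=activations.shape)
print(activations * mask / keep_prob)              # ~25% are zeroed, the rest are rescaled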

This will take us above 99%. If we want to improve further (accuracy may vary due to differences in initialization), we can also add more dropout layers, for example after the hidden dense layer, and increase the number of epochs; a sketch of this extra dropout is shown after the listing below. This would force the neurons of the final dense layers, which are the most prone to overfitting, to be dropped randomly. Our final code looks like this:

import numpy as np
np.random.seed(0)  # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution2D, MaxPooling2D, Flatten, Dropout
from keras.utils import np_utils

batch_size = 100
hidden_neurons = 200
classes = 10
epochs = 8

# load MNIST, reshape to (samples, width, height, channels), and scale to [0, 1]
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# one-hot encode the labels
Y_train = np_utils.to_categorical(Y_train, classes)
Y_test = np_utils.to_categorical(Y_test, classes)

model = Sequential()
# two 3 x 3 convolutional layers, followed by 2 x 2 max-pooling and dropout
model.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# flatten the volume and classify with fully connected layers
model.add(Flatten())
model.add(Dense(hidden_neurons))
model.add(Activation('relu'))
model.add(Dense(classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adadelta')
model.fit(X_train, Y_train, batch_size=batch_size,
          epochs=epochs, validation_split=0.1, verbose=1)

score = model.evaluate(X_train, Y_train, verbose=1)
print('Train accuracy:', score[1])
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test accuracy:', score[1])

It is possible to optimize this network further, but the point here is not to get an award-winning score; it is to understand the process and how each step we have taken improved performance. It is also important to understand that by using convolutional layers, we have also helped avoid overfitting the network, since their shared weights mean far fewer parameters than a comparable fully connected layer.
