Popular open source libraries – an introduction

There are several open source libraries available that allow the creation of deep neural nets in Python without having to write the code explicitly from scratch. The most commonly used are Keras, Theano, TensorFlow, Caffe, and Torch. In this book we will provide examples using the first three, which can all be used from Python. The reason for this choice is that Torch is not based on Python but on a different language, Lua, while Caffe is used mainly for image recognition. For these libraries, we will quickly describe how to turn on the GPU switch discussed in the previous paragraph. Much of the code in this book can then be run on a CPU or a GPU, depending on the hardware available to the reader.

Theano

Theano (http://deeplearning.net/software/theano/) is an open source library written in Python that implements many features making it easy to write code for neural networks. In addition, Theano makes it very easy to take advantage of GPU acceleration. Without going into the details of how Theano works, the key idea is that it uses symbolic variables and functions. Among its many appealing features, Theano makes back-propagation very easy by calculating all the derivatives for us.
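As a brief illustration of this symbolic style (a minimal sketch based on the standard Theano API, not code taken from this book), the following lines define a symbolic variable, build an expression from it, and let Theano compute the derivative for us:

import theano
import theano.tensor as T

# define a symbolic scalar and an expression built from it
x = T.dscalar('x')
y = x ** 2

# Theano derives dy/dx symbolically for us
dy_dx = T.grad(y, x)

# compile a callable function and evaluate the derivative at x = 3
f = theano.function([x], dy_dx)
print(f(3.0))  # prints 6.0

The same mechanism, applied to the cost function of a neural network, is what makes back-propagation essentially automatic.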

As mentioned earlier, Theano also makes it very easy to utilize the GPU on your machine. There are several ways to do this, but the simplest is to create a resource file called .theanorc with the following lines:

[global]
device = gpu  
floatX = float32
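Alternatively (this is standard Theano configuration behavior rather than something specific to this chapter), the same settings can be passed through the THEANO_FLAGS environment variable when launching a script; the script name here is just a placeholder:

THEANO_FLAGS=device=gpu,floatX=float32 python my_script.py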

It is easy to check whether Theano is configured to use your GPU by simply typing:

import theano
print(theano.config.device)  # 'gpu' if the GPU is in use, 'cpu' otherwise

We refer the reader to the Theano documentation for the first steps in using Theano, and we will implement some test code examples using Theano for deep learning later in this book.

TensorFlow

TensorFlow (https://www.tensorflow.org) works very similarly to Theano, and in TensorFlow computations are also represented as graphs. A TensorFlow graph is therefore a description of computations. With TensorFlow, you do not need to explicitly request the use of your GPU: TensorFlow will automatically try to use a GPU if you have one. However, if you have more than one GPU, you must assign operations to each GPU explicitly, or only the first one will be used. To do this, you simply need to type the following line (a complete sketch follows the device list below):

with tensorflow.device("/gpu:1"):

Here, the following devices can be defined:

  • "/cpu:0": main CPU of your machine
  • "/gpu:0": first GPU of your machine, if one exists
  • "/gpu:1": second GPU of your machine, if it exists
  • "/gpu:2": third GPU of your machine, if it exists, and so on
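To make the device assignment concrete, here is a minimal sketch using the TensorFlow 1.x graph API; the constants and names are purely illustrative, and allow_soft_placement lets TensorFlow fall back to an available device if the requested one does not exist:

import tensorflow as tf

# pin these operations to the second GPU, if present
with tf.device("/gpu:1"):
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    c = a * b

# log_device_placement prints which device each operation runs on
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))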

Once again, we refer the reader to the TensorFlow documentation for the first steps in using TensorFlow; test code examples using TensorFlow will be implemented later in the book.

Keras

Keras (http://keras.io) is a neural net Python library that can run on top of either Theano or TensorFlow, although by default it runs on TensorFlow. Instructions for switching the backend are provided online at http://keras.io/backend/ (a short sketch is also given after the installation instructions below). Keras can run on a CPU or a GPU; if you run it on top of Theano, you will need to set up a .theanorc file as described earlier. Keras creates deep neural networks by means of a model: the main type is the Sequential model, which creates a linear stack of layers, and new layers are added by simply calling the add function. In the coming section, we will create a few examples using Keras. Keras can be easily installed with the following simple command:

pip install keras

It can also be installed by cloning its Git repository and then running its setup script:

git clone https://github.com/fchollet/keras.git
cd keras
python setup.py install

However, we refer the reader to the online documentation for further information.
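As mentioned above, the backend Keras runs on can be switched between TensorFlow and Theano. A minimal sketch of the two standard mechanisms described in the Keras backend documentation follows; the exact fields in keras.json may vary slightly between versions. The configuration file ~/.keras/keras.json selects the backend:

{
    "backend": "theano",
    "floatx": "float32",
    "epsilon": 1e-07
}

Alternatively, for a single run, the backend can be overridden with an environment variable (the script name is a placeholder):

KERAS_BACKEND=theano python my_script.py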

Sample deep neural net code using Keras

In this section, we will introduce some simple code that uses Keras for the correct classification of digits using the popular MNIST dataset. MNIST consists of 70,000 examples of handwritten digits written by many different people. The first 60,000 are typically used for training and the remaining 10,000 for testing.

Figure: a sample of digits taken from the MNIST dataset

One of the advantages of Keras is that it can import this dataset for you without the need to explicitly download it from the web; Keras will download it the first time it is needed. This can be achieved with one simple line of code:

from keras.datasets import mnist

There are a few classes we need to import from Keras to use a classical deep neural net, and these are:

from keras.models import Sequential 
from keras.layers.core import Dense, Activation
from keras.utils import np_utils

We are now ready to start writing our code to import the data, and we can do this with just one line:

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

This imports the training data and the testing data; both datasets are divided into two subsets: one containing the actual images and the other containing the labels. We need to slightly modify the data before we can use it: X_train contains 60,000 images of 28 x 28 pixels each, and X_test contains 10,000, but we want to reshape each sample into a 784-pixel-long vector rather than a (28, 28) two-dimensional matrix. This can easily be accomplished with these two lines:

X_train = X_train.reshape(60000, 784)     
X_test = X_test.reshape(10000, 784)
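As a quick sanity check (not part of the original listing), we can print the array shapes after the reshape:

print(X_train.shape)  # (60000, 784)
print(X_test.shape)   # (10000, 784)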

Similarly, the labels indicate the value of the digit depicted in each image, and we want to convert each of them into a 10-entry vector of zeroes with a single 1 in the entry corresponding to the digit; for example, 4 is mapped to [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].

classes = 10
Y_train = np_utils.to_categorical(Y_train, classes)     
Y_test = np_utils.to_categorical(Y_test, classes)
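To see the encoding described above on a single label (a small illustrative check, not part of the main code), we can call np_utils.to_categorical directly:

# the digit 4 becomes a 10-entry vector with a 1 in position 4
print(np_utils.to_categorical([4], classes))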

Finally, before calling our main function, we just need to set the size of our input (the size of the MNIST images), how many neurons our hidden layer has, for how many epochs we want to train our network, and the batch size for our training:

input_size = 784
batch_size = 100
hidden_neurons = 100
epochs = 15
main(X_train, X_test, Y_train, Y_test)  # the body of main() is described in the following code blocks

We are now ready to write the code for our main function. Keras works by defining a model; we will use the Sequential model and then add layers (in this case regular dense layers, not sparse ones), specifying the number of input and output neurons. For each layer, we specify the activation function of its neurons:

model = Sequential()     
model.add(Dense(hidden_neurons, input_dim=input_size)) 
model.add(Activation('sigmoid'))     
model.add(Dense(classes, input_dim=hidden_neurons)) 
model.add(Activation('softmax'))

Keras now provides a simple way to specify the cost function (the loss) and its optimization (learning rate, momentum, and so on). We are not going to modify the default values, so we can simply pass:

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='sgd')
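If we did want to change those defaults (learning rate, momentum, and so on), a minimal sketch using Keras's SGD optimizer class is shown below; the numeric values are purely illustrative:

from keras.optimizers import SGD

# an explicit optimizer object instead of the string 'sgd'; values are illustrative
sgd = SGD(lr=0.1, momentum=0.9)
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=sgd)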

In this example, the optimizer is sgd, which stands for stochastic gradient descent. At this point, we need to train the network; similarly to scikit-learn, this is done by calling a fit function. We will use the verbose parameter so that we can follow the process:

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=epochs, verbose=1)

All that is left to do is to add code to evaluate our network on the test data and print the accuracy result, which is done simply by:

score = model.evaluate(X_test, Y_test, verbose=1)
print('Test accuracy:', score[1])  # score[0] is the test loss, score[1] the test accuracy

And that's it. All that remains is to run the code. The test accuracy will be about 94%, which is not a great result, but this example runs in less than 30 seconds on a CPU and is an extremely simple implementation. There are simple improvements that could be made, for example selecting a larger number of hidden neurons or a larger number of epochs, and we leave those simple changes to the reader as a way to become familiar with the code.

Keras also allows us to look at the weight matrix it creates. To do that, it is enough to type the following line:

weights = model.layers[0].get_weights()  # a list [W, b]: the weight matrix and bias vector of the first layer

By adding the following lines to our previous code, we can look at what the hidden neurons have learned:

import numpy
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# each row of w holds the 784 input weights of one hidden neuron,
# which we reshape back into a 28 x 28 image
w = weights[0].T
for neuron in range(hidden_neurons):
    plt.imshow(numpy.reshape(w[neuron], (28, 28)), cmap=cm.Greys_r)
    plt.show()

To get a clearer image, we have increased the number of epochs to 100 to get the following figure:

Figure: composite of what was learned by all the hidden neurons

For simplicity, we have aggregated the images for all the neurons into a single composite figure. Since the initial images are very small and do not have many details (they are just digits), the features learned by the hidden neurons are not all that interesting, but it is already clear that each neuron is learning a different "shape".

The drawing code above should be immediately clear; we simply note that the following line imports cm:

import matplotlib.cm as cm 

This allows for a grayscale representation of the neuron weights, and it is used inside the imshow() call by passing the option cmap=cm.Greys_r. This is because the MNIST images are not color images but grayscale images.

The beauty of Keras is that it makes it easy to create neural nets, but it also makes it easy to download test datasets. Let's try to use the cifar10 dataset instead of the mnist dataset. Instead of digits, the cifar10 dataset contains 10 classes of objects: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. To use the cifar10 dataset, it is enough to write:

from keras.datasets import cifar10

In place of the preceding code line:

from keras.datasets import mnist

Then, we need to make these changes to the code we wrote above:

(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
X_train = X_train.reshape(50000, 3072)     
X_test = X_test.reshape(10000, 3072)
input_size = 3072

This is because there are only 50,000 training images (instead of 60,000) and because the images are colored (RGB) 32 x 32 pixel images, so each of them has size 3 x 32 x 32 = 3,072. We can keep everything else as before for now; however, if we run this example, we can see that our performance is now very poor, just around 20%. This is because the data is much more complex and requires a more complex neural network. In fact, most neural networks implemented for the classification of images use some basic convolutional layers, which will be discussed only in Chapter 5, Image Recognition. For now, though, we can try raising the number of hidden neurons to 3,000 and adding a second hidden layer with 2,000 neurons. We are also going to use the ReLU activation function in the first hidden layer.

To do this we simply need to write the following lines defining the model, instead of what we had before:

model = Sequential()
model.add(Dense(3000, input_dim=input_size))
model.add(Activation('relu'))
model.add(Dense(2000, input_dim=3000))
model.add(Activation('sigmoid'))
model.add(Dense(classes, input_dim=2000))
model.add(Activation('softmax'))

If we run this code, we will see that it takes much longer to train; at the end, however, we will have about a 60% accuracy rate on the training set, but only about 50% accuracy on the test data. The much poorer accuracy, with respect to the much simpler mnist dataset, despite the larger network and the much longer training time, is due to the higher complexity of the data. In addition, by substituting the line where we fit the network with the following line:

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=epochs, validation_split=0.1, verbose=1)

we can also output, during training, how the accuracy improves on a 90/10 split of the training data. This also shows that while the accuracy on the training data keeps increasing during training, the accuracy on the validation set plateaus at some point, showing that the network starts to overfit and to saturate some parameters.

While this may seem like a failure of deep networks to deliver good accuracy on richer datasets, we will see that there are, in fact, ways around this problem that will allow us to get better performance even on much more complex and larger datasets.
