Trying something new – CNNs using Keras with TensorFlow

Keras (https://keras.io/) is a high-level deep learning API written in Python that can run on top of any of three deep learning frameworks: TensorFlow (from Google), CNTK (from Microsoft), and Theano (from the Montreal Institute for Learning Algorithms, Université de Montréal, Canada). Solving a machine learning problem efficiently depends on being able to prototype ideas quickly, and Keras was originally developed to support that kind of fast experimentation through the following key features:

  • User-friendly API built on top of multiple powerful backends, including TensorFlow, CNTK, and Theano.
  • Built-in CNN, RNN, and autoencoder models as well as support classes and methods (metrics, optimizers, regularizers, visualization, and so on), which enable easy and fast prototyping.
  • Excellent modularity and extensibility, which allow for customized network architectures: multiple inputs, multiple outputs, layer sharing, model sharing, memory-based networks, and so on.
  • The same code runs seamlessly on CPU and GPU.

For R users, the R interface to Keras (https://keras.rstudio.com/) was developed in 2017, and its adoption by the community has gradually grown. Let's first install the keras R package from GitHub as follows:

> if (!require("keras"))  
+     devtools::install_github("rstudio/keras") 
> library(keras) 

The installation is not finished yet. We also need to install the backend that Keras connects to. By default, Keras uses TensorFlow as its backend engine, which we can install with the following function:

> install_keras() 
Using existing virtualenv at  ~/.virtualenvs/r-tensorflow  
Upgrading pip ... 
...... 
Installation complete. 

While we are waiting for the installation, let's learn a bit more about TensorFlow.

TensorFlow (https://www.tensorflow.org/) is an open source machine learning framework created by Google. It is best known for designing, building, and training deep learning models, but it can also be used for general numerical computation. In TensorFlow, computation is described using data flow graphs, where each node in a graph represents an instance of a mathematical operation and each edge represents a multidimensional data array (the so-called tensor, which can hold a matrix, vector, or scalar) on which the operations are performed. Such a flexible architecture allows us to efficiently perform data-crunching machine learning operations, such as computing derivatives on huge matrices. Here is an example of a data flow graph:

Now we see where the name TensorFlow comes from: tensors flowing through a network.
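To make the idea of nodes and edges concrete, here is a toy data flow graph written in plain base R. It is only an illustration of the concept and does not use TensorFlow at all; the graph list, the op names, and the evaluate() helper are all hypothetical names invented for this sketch.

```r
# Toy data flow graph in base R (not TensorFlow).
# Each node is an operation; each edge carries an array (a "tensor")
# from the node that produced it to the node that consumes it.
graph <- list(
  a   = list(op = "const", value = matrix(1:4, nrow = 2)),
  b   = list(op = "const", value = diag(2)),
  mul = list(op = "matmul", inputs = c("a", "b")),
  out = list(op = "sum", inputs = "mul")
)

# Evaluate a node by first evaluating the nodes that feed into it.
evaluate <- function(graph, name) {
  node <- graph[[name]]
  if (node$op == "const") return(node$value)
  args <- lapply(node$inputs, function(n) evaluate(graph, n))
  switch(node$op,
    matmul = args[[1]] %*% args[[2]],
    sum    = sum(args[[1]]),
    stop("unknown op: ", node$op)
  )
}

result <- evaluate(graph, "out")
result  # 10: a %*% identity is a itself, and 1 + 2 + 3 + 4 = 10
```

Describing the computation as a graph first and evaluating it afterwards is what lets a framework analyze the whole computation, for example to derive gradients or to place operations on different devices.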

By now, the core Keras library and the TensorFlow backend should be installed. We continue with our Keras-based solution for traffic sign classification.

First, prepare the input data for Keras modeling by reshaping the training and testing feature matrices into four-dimensional arrays of shape (samples, height, width, channels):

> x_train <- data_train.x 
> dim(x_train) <- c(nrow(data_train.x), 32, 32, 1) 
> x_test <- data_test.x 
> dim(x_test) <- c(nrow(data_test.x), 32, 32, 1) 
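It helps to see what assigning to dim() does to the data. The sketch below uses a tiny made-up matrix (flat, a hypothetical stand-in for data_train.x with two samples of 3 x 3 pixels each) rather than the real traffic sign data.

```r
# Hypothetical stand-in for data_train.x: 2 samples, each a flattened 3 x 3 image
flat <- rbind(1:9, 11:19)

imgs <- flat
dim(imgs) <- c(nrow(flat), 3, 3, 1)   # (samples, height, width, channels)

dim(imgs)          # 2 3 3 1
imgs[1, , , 1]     # pixels of the first sample as a 3 x 3 matrix
imgs[2, , , 1]     # pixels of the second sample
```

Each row of the original matrix ends up as one image, and R fills that image column by column (the first three values of a row become the first column). Whether this matches the original orientation of the images depends on how they were flattened earlier, which is worth checking by plotting one sample.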

Again, the input pixel values already range from 0 to 1, so no rescaling is needed.

We also convert the training and testing target vectors (integers from 0 to 42) into binary class matrices (one-hot encoded), as required by the Keras classification models:

> y_train <- to_categorical(data_train.y, num_classes = 43) 
> y_test <- to_categorical(data_test.y, num_classes = 43) 
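For intuition, one-hot encoding can be sketched in a few lines of base R. The one_hot() helper below is a hypothetical illustration of the idea, not the implementation behind to_categorical().

```r
# Minimal one-hot encoder: row i gets a single 1 in the column of its class.
one_hot <- function(labels, num_classes) {
  m <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Labels are 0-based (0 to num_classes - 1); R columns are 1-based.
  m[cbind(seq_along(labels), labels + 1)] <- 1
  m
}

y_demo <- one_hot(c(0, 2, 42), num_classes = 43)
dim(y_demo)                   # 3 43
which(y_demo[3, ] == 1) - 1   # 42: recovers the original label of the third sample
```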

Before we start modeling, there is a trick for obtaining reproducible results with Keras in R: specify a random seed before building the model, using the following function:

> use_session_with_seed(42) 

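This function sets the random seeds for R, Python, NumPy, and TensorFlow. By default, it also disables GPU computation and CPU parallelism, since both can be a source of non-deterministic results, so training may run more slowly.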

Time to define the model after data preparation!

We begin by initializing the Keras sequential model as follows:

> model <- keras_model_sequential() 

Then we add the first set of layers (a convolutional layer, a ReLU nonlinearity, and a pooling layer) with the same parameters used in the previous MXNet solution (the same parameters are used for the rest of the network):

> model %>% 

Start with a hidden 2D convolutional layer that takes 32 x 32-pixel, single-channel images as input:

+   layer_conv_2d( 
+     filters = 32, kernel_size = c(5,5),  
+     input_shape = c(32, 32, 1) 
+   ) %>% 
+   layer_activation("relu") %>% 
+   layer_max_pooling_2d(pool_size = c(2,2)) %>% 

Note that we use the pipe (%>%) operator to add layers to the Keras sequential model.

Next comes the second set of convolutional, ReLU, and pooling layers:

+   # Second hidden convolutional layer 
+   layer_conv_2d(filters = 64, kernel_size = c(5,5)) %>% 
+   layer_activation("relu") %>% 
+   layer_max_pooling_2d(pool_size = c(2,2)) %>% 

Flatten the resulting feature maps from the previous convolution layers:

+   layer_flatten() %>% 

Feed the flattened vector into a dense layer with 1,000 units:

+   layer_dense(1000) %>% 
+   layer_activation("relu") %>% 

Finally, connect to a softmax layer containing 43 output units:

+   layer_dense(43) %>% 
+   layer_activation("softmax") 

We can use the summary() function to view the details of the model:

> summary(model)

Depending on when the model is constructed, the names of the layers may have different suffixes (_1, _2 for example).
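As a sanity check on the architecture, the layer output sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults that the code above does not override: "valid" (no) padding and a stride of 1 for the convolutions, and a pooling stride equal to the pool size. The helper names conv_out and pool_out are ours.

```r
# Spatial size after a convolution with "valid" padding and stride 1
conv_out <- function(size, kernel) size - kernel + 1
# Spatial size after non-overlapping max pooling
pool_out <- function(size, pool) size %/% pool

s1 <- pool_out(conv_out(32, 5), 2)   # 32 -> 28 -> 14
s2 <- pool_out(conv_out(s1, 5), 2)   # 14 -> 10 -> 5
flat_units <- s2 * s2 * 64           # 5 * 5 * 64 = 1600 values after flattening

# Weights plus biases for each trainable layer
params <- c(
  conv1  = 5 * 5 * 1 * 32 + 32,        # 832
  conv2  = 5 * 5 * 32 * 64 + 64,       # 51264
  dense1 = flat_units * 1000 + 1000,   # 1601000
  dense2 = 1000 * 43 + 43              # 43043
)
params
sum(params)   # 1696139
```

Under these assumptions, the total should agree with the number of trainable parameters reported by summary(model), and almost all of the parameters sit in the first dense layer.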

All the pieces of the CNN model are now assembled. Before compiling the model, we need to explicitly specify its optimizer. In MXNet, the optimizer is a parameter in the mx.model.FeedForward.create method with stochastic gradient descent (SGD) as the default value. In Keras, we use the same optimizer with the same learning rate and momentum:

> opt <- optimizer_sgd(lr = 0.005, momentum = 0.9)
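To see what the learning rate and momentum do, here is a minimal base R sketch of the classical momentum update, applied to the toy function f(w) = w^2. We assume this is the form used by SGD with momentum when Nesterov momentum is not enabled; the sgd_momentum_step() helper is a hypothetical name, not part of Keras.

```r
# One step of SGD with classical momentum:
#   velocity <- momentum * velocity - lr * gradient
#   weight   <- weight + velocity
sgd_momentum_step <- function(w, v, grad, lr = 0.005, momentum = 0.9) {
  v <- momentum * v - lr * grad
  list(w = w + v, v = v)
}

# Minimize f(w) = w^2, whose gradient is 2 * w, starting from w = 1
state <- list(w = 1, v = 0)
for (i in 1:500) {
  state <- sgd_momentum_step(state$w, state$v, grad = 2 * state$w)
}
state$w   # very close to 0, the minimizer of f
```

The velocity term accumulates past gradients, so consecutive steps in the same direction grow larger, while the small learning rate keeps each individual gradient contribution modest.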

We then compile the CNN model with the optimizer we just defined, cross entropy as the loss function, and classification accuracy as the metric:

> model %>% compile( 
+   loss = "categorical_crossentropy", 
+   optimizer = opt, 
+   metrics = "accuracy" 
+ ) 
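The loss and the metric named above are easy to compute by hand on a small example. The following base R sketch uses two made-up samples with three classes; the function names and the clipping constant eps are our own choices for illustration, not Keras internals.

```r
# Mean categorical cross entropy over samples (rows).
# Predictions are clipped away from 0 and 1 to avoid log(0).
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)
  mean(-rowSums(y_true * log(y_pred)))
}

# Fraction of samples whose highest-probability class is the true class
accuracy <- function(y_true, y_pred) {
  mean(max.col(y_true, ties.method = "first") ==
       max.col(y_pred, ties.method = "first"))
}

y_true <- rbind(c(1, 0, 0), c(0, 1, 0))              # one-hot targets
y_pred <- rbind(c(0.7, 0.2, 0.1), c(0.6, 0.3, 0.1))  # softmax-like outputs

categorical_crossentropy(y_true, y_pred)   # mean of -log(0.7) and -log(0.3)
accuracy(y_true, y_pred)                   # 0.5: only the first sample is correct
```

Cross entropy penalizes the second sample heavily because the model assigns only 0.3 to the true class, which is the signal the optimizer uses to adjust the weights.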

We can now kick off model training. Again, the same hyperparameters are used, including the batch size and the number of epochs. Note that the testing dataset is used for model validation, so the classification performance of the current CNN model on it is computed at the end of each epoch. Last but not least, shuffling is not necessary in our case, as the training data was already shuffled when the raw data was split:

> model %>% fit( 
+   x_train, y_train, 
+   batch_size = 100, 
+   epochs = 30, 
+   validation_data = list(x_test, y_test), 
+   shuffle = FALSE 
+ ) 
Train on 29409 samples, validate on 9800 samples 
Epoch 1/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 2.8031 - acc: 0.2823 - val_loss: 1.1719 - val_acc: 0.6733 
Epoch 2/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.6438 - acc: 0.8372 - val_loss: 0.4079 - val_acc: 0.8891 
Epoch 3/30 
29409/29409 [==============================] - 110s 4ms/step - loss: 0.3154 - acc: 0.9217 - val_loss: 0.2623 - val_acc: 0.9336 
Epoch 4/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.1969 - acc: 0.9533 - val_loss: 0.2096 - val_acc: 0.9483 
Epoch 5/30 
29409/29409 [==============================] - 24703s 840ms/step - loss: 0.1410 - acc: 0.9682 - val_loss: 0.1715 - val_acc: 0.9604 
Epoch 6/30 
29409/29409 [==============================] - 1076s 37ms/step - loss: 0.1055 - acc: 0.9761 - val_loss: 0.1363 - val_acc: 0.9690 
Epoch 7/30 
29409/29409 [==============================] - 34344s 1s/step - loss: 0.0860 - acc: 0.9806 - val_loss: 0.1147 - val_acc: 0.9742 
Epoch 8/30 
29409/29409 [==============================] - 104s 4ms/step - loss: 0.0698 - acc: 0.9841 - val_loss: 0.1065 - val_acc: 0.9756 
Epoch 9/30 
29409/29409 [==============================] - 108s 4ms/step - loss: 0.0535 - acc: 0.9874 - val_loss: 0.1015 - val_acc: 0.9780 
Epoch 10/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0414 - acc: 0.9913 - val_loss: 0.0927 - val_acc: 0.9801 
Epoch 11/30 
29409/29409 [==============================] - 108s 4ms/step - loss: 0.0415 - acc: 0.9917 - val_loss: 0.0912 - val_acc: 0.9807 
Epoch 12/30 
29409/29409 [==============================] - 106s 4ms/step - loss: 0.0341 - acc: 0.9933 - val_loss: 0.1054 - val_acc: 0.9769 
Epoch 13/30 
29409/29409 [==============================] - 108s 4ms/step - loss: 0.0266 - acc: 0.9946 - val_loss: 0.0811 - val_acc: 0.9842 
Epoch 14/30 
29409/29409 [==============================] - 106s 4ms/step - loss: 0.0207 - acc: 0.9965 - val_loss: 0.0790 - val_acc: 0.9845 
Epoch 15/30 
29409/29409 [==============================] - 106s 4ms/step - loss: 0.0221 - acc: 0.9955 - val_loss: 0.0780 - val_acc: 0.9841 
Epoch 16/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0169 - acc: 0.9974 - val_loss: 0.0753 - val_acc: 0.9854 
Epoch 17/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0137 - acc: 0.9982 - val_loss: 0.0777 - val_acc: 0.9863 
Epoch 18/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0114 - acc: 0.9986 - val_loss: 0.0757 - val_acc: 0.9863 
Epoch 19/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0101 - acc: 0.9990 - val_loss: 0.0775 - val_acc: 0.9867 
Epoch 20/30 
29409/29409 [==============================] - 108s 4ms/step - loss: 0.0086 - acc: 0.9993 - val_loss: 0.0786 - val_acc: 0.9862 
Epoch 21/30 
29409/29409 [==============================] - 110s 4ms/step - loss: 0.0077 - acc: 0.9994 - val_loss: 0.0776 - val_acc: 0.9859 
Epoch 22/30 
29409/29409 [==============================] - 110s 4ms/step - loss: 0.0071 - acc: 0.9995 - val_loss: 0.0774 - val_acc: 0.9862 
Epoch 23/30 
29409/29409 [==============================] - 109s 4ms/step - loss: 0.0066 - acc: 0.9996 - val_loss: 0.0779 - val_acc: 0.9862 
Epoch 24/30 
29409/29409 [==============================] - 110s 4ms/step - loss: 0.0062 - acc: 0.9997 - val_loss: 0.0783 - val_acc: 0.9860 
Epoch 25/30 
29409/29409 [==============================] - 114s 4ms/step - loss: 0.0059 - acc: 0.9997 - val_loss: 0.0786 - val_acc: 0.9859 
Epoch 26/30 
29409/29409 [==============================] - 115s 4ms/step - loss: 0.0056 - acc: 0.9998 - val_loss: 0.0791 - val_acc: 0.9861 
Epoch 27/30 
29409/29409 [==============================] - 117s 4ms/step - loss: 0.0053 - acc: 0.9998 - val_loss: 0.0793 - val_acc: 0.9860 
Epoch 28/30 
29409/29409 [==============================] - 115s 4ms/step - loss: 0.0051 - acc: 0.9998 - val_loss: 0.0794 - val_acc: 0.9862 
Epoch 29/30 
29409/29409 [==============================] - 114s 4ms/step - loss: 0.0050 - acc: 0.9998 - val_loss: 0.0795 - val_acc: 0.9864 
Epoch 30/30 
29409/29409 [==============================] - 113s 4ms/step - loss: 0.0048 - acc: 0.9998 - val_loss: 0.0796 - val_acc: 0.9865  
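The training log above can be summarized with a few lines of base R. The val_acc and val_loss vectors below are copied by hand from the log, one value per epoch; the variable names are ours.

```r
# Number of weight updates (batches) per epoch for 29409 samples, batch size 100
batches_per_epoch <- ceiling(29409 / 100)
batches_per_epoch   # 295

# Validation metrics per epoch, copied from the log above
val_acc <- c(0.6733, 0.8891, 0.9336, 0.9483, 0.9604, 0.9690, 0.9742, 0.9756,
             0.9780, 0.9801, 0.9807, 0.9769, 0.9842, 0.9845, 0.9841, 0.9854,
             0.9863, 0.9863, 0.9867, 0.9862, 0.9859, 0.9862, 0.9862, 0.9860,
             0.9859, 0.9861, 0.9860, 0.9862, 0.9864, 0.9865)
val_loss <- c(1.1719, 0.4079, 0.2623, 0.2096, 0.1715, 0.1363, 0.1147, 0.1065,
              0.1015, 0.0927, 0.0912, 0.1054, 0.0811, 0.0790, 0.0780, 0.0753,
              0.0777, 0.0757, 0.0775, 0.0786, 0.0776, 0.0774, 0.0779, 0.0783,
              0.0786, 0.0791, 0.0793, 0.0794, 0.0795, 0.0796)

best_acc_epoch  <- which.max(val_acc)    # 19
best_loss_epoch <- which.min(val_loss)   # 16
c(best_acc_epoch, best_loss_epoch)
```

Validation loss reaches its minimum at epoch 16 and drifts slightly upward afterwards while the training loss keeps falling, which hints at mild overfitting in the later epochs, although validation accuracy stays essentially flat.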

After 30 epochs, the model is well trained and achieves 98.65% accuracy on the testing set. In the RStudio viewer pane, we can also follow the classification performance for each epoch in real time.
