First solution – convolutional neural networks using MXNet

We start with a solution similar to the one we developed at the end of the previous chapter: CNNs built with MXNet.

Again, we first split the dataset into two subsets for training (75%) and testing (25%) using the caret package:

> if (!require("caret"))
+     install.packages("caret")
> library(caret)
> set.seed(42)
> train_perc <- 0.75
> # stratified split: sampling is done within each class label
> train_index <- createDataPartition(data.y, p=train_perc, list=FALSE)
> # shuffle the selected training indices
> train_index <- train_index[sample(nrow(train_index)),]
> data_train.x <- data.x[train_index,]
> data_train.y <- data.y[train_index]
> data_test.x <- data.x[-train_index,]
> data_test.y <- data.y[-train_index]
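
Since createDataPartition samples within each class, the class proportions are preserved on both sides of the split. We can spot-check this if we wish (an optional sanity check, not part of the original pipeline):

> # the proportions of the first few classes should match closely
> round(prop.table(table(data_train.y))[1:5], 3)
> round(prop.table(table(data_test.y))[1:5], 3)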

Don't forget to set a particular random seed for reproducible work. We normally normalize the data before applying CNNs. In our case, the raw pixels are already loaded in the range of 0 to 1, and after the Y' brightness conversion the resulting pixels remain in the range of 0 to 1.

In general, normalizing the data is necessary before feeding it into a CNN, or indeed any neural network or gradient-descent-based model. For image inputs, we usually scale the pixels to the range of 0 to 1.
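
If the pixels had instead been loaded as 8-bit integers between 0 and 255, a simple rescaling would do the job (a minimal sketch; raw.x stands for a hypothetical matrix of raw integer intensities):

> # hypothetical: rescale 0-255 integer intensities to the [0, 1] range
> data.x <- raw.x / 255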

After preparing the training and testing sets, we structure our CNN model as follows.

The network starts with the first set of convolutional, ReLU, and pooling layers. Here, we use 32 convolutional filters of size 5 x 5 and a 2 x 2 max pooling filter:

> require(mxnet)
> data <- mx.symbol.Variable("data")
> # first convolution
> conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5),
+                                num_filter=32)
> act1 <- mx.symbol.Activation(data=conv1, act_type="relu")
> pool1 <- mx.symbol.Pooling(data=act1, pool_type="max",
+                            kernel=c(2,2), stride=c(2,2))

This is followed by the second set of convolutional, ReLU, and pooling layers, where 64 convolutional filters of size 5 x 5 and another 2 x 2 max pooling filter are used:

> # second convolution
> conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5),
+                                num_filter=64)
> act2 <- mx.symbol.Activation(data=conv2, act_type="relu")
> pool2 <- mx.symbol.Pooling(data=act2, pool_type="max",
+                            kernel=c(2,2), stride=c(2,2))

Now that the convolutional layers have extracted rich representations of the input images by detecting edges, curves, and shapes, we move on to the fully connected layers. Before doing so, though, we need to flatten the feature maps that come out of the last pooling layer:

> flatten <- mx.symbol.Flatten(data=pool2) 
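
It is worth tracing the shapes here. With MXNet's default convolution settings (no padding, stride 1), each 32 x 32 input shrinks to 28 x 28 after the first 5 x 5 convolution, to 14 x 14 after 2 x 2 pooling, to 10 x 10 after the second convolution, and to 5 x 5 after the second pooling. Flattening the 64 resulting feature maps therefore yields 5 * 5 * 64 = 1,600 features per image.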

In the fully connected section, we apply a hidden layer of 1,000 units with ReLU activation, followed by an output layer of 43 units, one per class of sign:

> # first fully connected layer 
> fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=1000) 
> act3 <- mx.symbol.Activation(data=fc1, act_type="relu") 
> # second fully connected layer 
> fc2 <- mx.symbol.FullyConnected(data=act3, num_hidden=43) 

Finally, we attach the softmax layer, which turns the 43 raw scores into probabilities, one per class (during training, SoftmaxOutput also pairs the softmax with a cross-entropy loss):

> # softmax output 
> softmax <- mx.symbol.SoftmaxOutput(data=fc2, name="sm") 

All the pieces of the network are now assembled. Before we start training the model, we specify the random seed and the training devices, and reshape the matrix data_train.x into an array of the form required by the convolutional layers in MXNet. Note that mx.set.seed controls MXNet's own random number generator, which is used for weight initialization, independently of R's set.seed:

> devices <- mx.cpu()
> # transpose so each column holds one image, then reshape to
> # width x height x channels x number of samples
> train.array <- t(data_train.x)
> dim(train.array) <- c(32, 32, 1, nrow(data_train.x))
> mx.set.seed(42)

Time for model training:

> model_cnn <- mx.model.FeedForward.create(softmax, X=train.array,
+                  y=data_train.y, ctx=devices, num.round=30,
+                  array.batch.size=100, learning.rate=0.05, momentum=0.9,
+                  wd=0.00001, initializer=mx.init.uniform(0.1),
+                  eval.metric=mx.metric.accuracy,
+                  epoch.end.callback=mx.callback.log.train.metric(100))

Start training with one device:

[1] Train-accuracy=0.184965986394558 
[2] Train-accuracy=0.824610169491525 
[3] Train-accuracy=0.949389830508475 
[4] Train-accuracy=0.968305084745763 
[5] Train-accuracy=0.983050847457628 
[6] Train-accuracy=0.988372881355934 
[7] Train-accuracy=0.990745762711866 
[8] Train-accuracy=0.993152542372882 
[9] Train-accuracy=0.992576271186442 
[10] Train-accuracy=0.994372881355933 
[11] Train-accuracy=0.99542372881356 
[12] Train-accuracy=0.995118644067798 
[13] Train-accuracy=0.99671186440678 
[14] Train-accuracy=0.999830508474576 
[15] Train-accuracy=0.999932203389831 
[16] Train-accuracy=1 
[17] Train-accuracy=1 
[18] Train-accuracy=1 
[19] Train-accuracy=1 
[20] Train-accuracy=1 
[21] Train-accuracy=1 
[22] Train-accuracy=1 
[23] Train-accuracy=1 
[24] Train-accuracy=1 
[25] Train-accuracy=1 
[26] Train-accuracy=1 
[27] Train-accuracy=1 
[28] Train-accuracy=1 
[29] Train-accuracy=1 
[30] Train-accuracy=1 

We have just fit our model with the following hyperparameters:

  • num.round=30: The maximum number of training iterations
  • array.batch.size=100: The batch size for mini-batch gradient descent
  • learning.rate=0.05: The learning rate
  • momentum=0.9: The momentum factor, which determines how much of the previous velocity is incorporated into the current update (see the sketch after this list)
  • eval.metric=mx.metric.accuracy: This uses classification accuracy as the evaluation metric
  • initializer=mx.init.uniform(0.1): Initial weights are randomly drawn from the uniform distribution between -0.1 and 0.1, which lowers the chances of the weights exploding or vanishing in the deep network
  • wd=0.00001: The weight decay for L2 regularization, which penalizes large weights in order to avoid overfitting
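
For intuition, the update that SGD with momentum and weight decay applies to each weight can be sketched in plain R (a conceptual illustration only, not MXNet's internal implementation; weight, grad, and velocity stand for hypothetical parameter, gradient, and velocity vectors):

> lr <- 0.05; mom <- 0.9; wd <- 0.00001
> # grad, weight, velocity are assumed to be pre-existing numeric vectors;
> # L2 weight decay adds wd * weight to the gradient
> velocity <- mom * velocity - lr * (grad + wd * weight)
> weight <- weight + velocity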

We can visualize the structure of the model with:

> graph.viz(model_cnn$symbol) 
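
At this point, we may also want to persist the trained model to disk (an optional step; mx.model.save writes a symbol JSON file and a parameter file under the given prefix, here the hypothetical prefix traffic_sign_cnn):

> # save the trained model under the prefix; writes -symbol.json
> # and .params files that mx.model.load can restore later
> mx.model.save(model_cnn, "traffic_sign_cnn", iteration=30)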

The trained model is then applied to the testing set:

> test.array <- t(data_test.x)
> dim(test.array) <- c(32, 32, 1, nrow(data_test.x))
> prob_cnn <- predict(model_cnn, test.array)
> # predict returns a 43 x n matrix of class probabilities;
> # pick the most likely class and shift to the 0-42 label range
> prediction_cnn <- max.col(t(prob_cnn)) - 1

We compute the confusion matrix and classification accuracy as follows:

> cm_cnn <- table(data_test.y, prediction_cnn)
> cm_cnn 

(The resulting 43 x 43 confusion matrix, originally displayed in two halves, is omitted here.)

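Since the full matrix is unwieldy, one handy way to inspect it is to list the off-diagonal cells with the largest counts (a small optional helper, not part of the original code):

> # list the most frequent true-label/predicted-label confusions
> cm_df <- as.data.frame(cm_cnn)
> mistakes <- cm_df[as.character(cm_df$data_test.y) !=
+                   as.character(cm_df$prediction_cnn) & cm_df$Freq > 0, ]
> head(mistakes[order(-mistakes$Freq), ])
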
Misclassification occurs in only a handful of cases, despite the large variations in the appearance of the images caused by illumination changes, partial occlusions, rotations, weather conditions, and so on:

> accuracy_cnn <- mean(prediction_cnn == data_test.y)
> accuracy_cnn 
[1] 0.9930612 
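
Beyond the overall accuracy, per-class recall follows directly from the confusion matrix (an optional check, assuming every class appears among both the labels and the predictions so that the table is square):

> # fraction of each true class that is predicted correctly
> recall_cnn <- diag(cm_cnn) / rowSums(cm_cnn)
> summary(as.numeric(recall_cnn))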

We have just built a robust CNN model that correctly classifies more than 99.3% of the testing signs. The CNN efficiently learns representations by deriving both low-level and high-level features, making hand-crafted features obsolete: it captures the important and distinguishing characteristics of the sign images by itself.

Now that we have achieved great success with our favorite (so far) deep learning tool, MXNet, why don't we explore other tools that are just as powerful? In fact, another deep learning API called Keras has been gaining popularity recently; its backend, TensorFlow, is probably the best-known deep learning framework.
