3.1 Introduction
Finding circles is a classification problem. Given a set of geometric shapes, we want the deep learning system to classify a shape as either a circle or something else. This is much simpler than classifying faces or digits, and it is a good way to determine how well your classification system works. We will apply a convolutional network to the problem, as it is the most appropriate architecture for classifying image data.
In this chapter, we will first generate a set of image data: a set of ellipses, a subset of which will be circles. Then we will build the neural net, using convolution, and train it to identify the circles. Finally, we will test the neural network and try some different training options and layer architectures.
3.2 Structure
The convolutional network consists of multiple layers. Each layer has a specific purpose. The layers may be repeated with different parameters as part of the convolutional network. The layer types we will use are
1. imageInputLayer
2. convolution2dLayer
3. batchNormalizationLayer
4. reluLayer
5. maxPooling2dLayer
6. fullyConnectedLayer
7. softmaxLayer
8. classificationLayer
You can have multiple layers of each type; some convolutional nets have hundreds of layers. Krizhevsky [22] and Bai [2] give guidelines for organizing the layers. Studying the loss on the training and validation sets can guide you in improving your neural network.
3.2.1 imageInputLayer
imageInputLayer defines the size of the input images. For example, imageInputLayer([28 28 3]) says the image is RGB and 28 by 28 pixels.
3.2.2 convolution2dLayer
Some well-known filters (masks) used in image processing are:

1. Blurring filter – ones(3,3)/9
2. Sharpening filter – [0 -1 0;-1 5 -1;0 -1 0]
3. Horizontal Sobel filter for edge detection – [-1 -2 -1; 0 0 0; 1 2 1]
4. Vertical Sobel filter for edge detection – [-1 0 1;-2 0 2;-1 0 1]
We create an n by n mask that we apply to an m by m matrix of data, where m is greater than n. We start in the upper-left corner of the matrix, as shown in Figure 3.1. We multiply the mask element-wise with the corresponding elements of the input matrix and sum all of the products using two calls to sum. That is the first element of the convolved output. We then slide the mask column by column until its last column is aligned with the last column of the input matrix. We then return it to the first column and increment the row. We continue until we have traversed the entire input matrix and the mask is aligned with the last row and last column.
The mask represents a feature. In effect, we are seeing if the feature appears in different areas of the image. Here is an example: we have a 2 by 2 mask with an L. The script Convolution.m demonstrates convolution.
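The sliding-mask procedure described above can be sketched directly in MATLAB. This is a minimal illustration of the idea, not the code in Convolution.m; the input matrix and mask here are arbitrary examples.

```matlab
a    = magic(5);                 % example 5 by 5 input matrix
mask = [-1 -2 -1; 0 0 0; 1 2 1]; % horizontal Sobel filter
n    = size(mask,1);             % mask size
m    = size(a,1);                % input size
c    = zeros(m-n+1);             % convolved output
for j = 1:m-n+1                  % rows
  for k = 1:m-n+1                % columns
    % Element-wise product of the mask and the subimage,
    % summed with two calls to sum
    c(j,k) = sum(sum(mask.*a(j:j+n-1,k:k+n-1)));
  end
end
% filter2 performs the same sliding correlation over the valid region
assert(isequal(c, filter2(mask, a, 'valid')))
```

Note that this sliding sum of products is what filter2 computes; MATLAB's conv2 additionally rotates the mask by 180 degrees, which makes no difference for symmetric masks.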
We can have multiple masks. There is one weight for each element of the mask and one shared bias for each filter. In this case, the convolution works on the image itself. Convolutions can also be applied to the output of other convolutional layers or of pooling layers, which further condense the data. In deep learning, the masks are not designed by hand; their weights and biases are determined as part of the learning process. Convolution should highlight the important features in the data, and subsequent convolution layers narrow down the features. The MATLAB function convolution2dLayer has two required inputs: filterSize, specifying the height and width of the filters as either a scalar or an array [h w], and numFilters, the number of filters.
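For example, a convolutional layer with sixteen 3 by 3 filters would be created as shown below; the number of filters is an illustrative choice.

```matlab
layer = convolution2dLayer(3,16);     % 16 square filters, each 3 by 3
layer = convolution2dLayer([3 5],16); % 16 rectangular 3 by 5 filters
```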
3.2.3 batchNormalizationLayer
A batch normalization layer normalizes each input channel across a mini-batch. The training process divides the training data into mini-batches, which are subsets of the entire data set, and the layer normalizes the activations of each channel over each mini-batch. This reduces the sensitivity of training to the network initialization.
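The normalization itself is simple: each channel is shifted to zero mean and unit variance over the mini-batch, then scaled and shifted by learned parameters. A sketch, with an arbitrary example batch:

```matlab
x       = randn(1,8) + 3;        % example activations for one channel
epsilon = 1e-5;                  % small offset to avoid division by zero
xHat    = (x - mean(x))./sqrt(var(x,1) + epsilon); % zero mean, unit variance
gamma   = 1; beta = 0;           % learned scale and offset (initial values)
y       = gamma*xHat + beta;     % layer output
```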
3.2.4 reluLayer
Figure 3.2 shows the ReLU activation function, which is zero for negative inputs and equal to the input for positive inputs. An alternative is a leaky reluLayer, where the output is not zero for negative input values.
Figure 3.3 shows the leaky function. Below zero, it has a slight slope.
A leaky ReLU layer addresses the dying ReLU problem, in which parts of the network stop learning because the inputs to the activation function are always below zero, or whatever the threshold might be, so the gradient there is zero. It should let you worry a bit less about how you initialize the network.
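Both activation functions are one-liners; the sketch below compares them. The leak factor of 0.01 is a typical example value.

```matlab
x     = -2:0.5:2;        % example inputs
relu  = max(0, x);       % standard ReLU: zero below zero
scale = 0.01;            % leak factor (example value)
leaky = max(scale*x, x); % leaky ReLU: slight slope below zero
```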
3.2.5 maxPooling2dLayer
maxPooling2dLayer creates a layer that breaks the 2D input into rectangular pooling regions and outputs the maximum value of each region. The input poolSize specifies the width and height of a pooling region. poolSize can have one element (for square regions) or two (for rectangular regions). This is a way to reduce the number of inputs that need to be evaluated. Typical images can be a megapixel or more, and it is not practical to use every pixel as an input. Furthermore, most images, or two-dimensional entities of any sort, don't really have enough information to require finely divided regions. You can experiment with pooling and see what works for your application. An alternative is averagePooling2dLayer, which outputs the average of each region instead of the maximum.
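For example, 2 by 2 max pooling with a stride of 2 reduces a 4 by 4 input to 2 by 2. A sketch of the operation, with an arbitrary input:

```matlab
a = [1 2 5 6; 3 4 7 8; 9 10 13 14; 11 12 15 16]; % example 4 by 4 input
p = zeros(2);                          % pooled output
for j = 1:2
  for k = 1:2
    region = a(2*j-1:2*j, 2*k-1:2*k);  % 2 by 2 pooling region
    p(j,k) = max(region(:));           % maximum of the region
  end
end
% p is [4 8; 12 16]
```

The equivalent layer is created with maxPooling2dLayer(2,'Stride',2).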
3.2.6 fullyConnectedLayer
3.2.7 softmaxLayer
The maximum of the softmax output is in the same position as the maximum of the input. This is just a method of smoothing the inputs. Softmax is used for multiclass classification because it guarantees a well-behaved probability distribution. Well behaved means that the sum of the probabilities is one.
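Softmax exponentiates the inputs and normalizes by the sum, so the outputs are positive and sum to one. A sketch with example inputs:

```matlab
q = [1 2 3 4];          % example inputs
p = exp(q)/sum(exp(q)); % softmax output
% sum(p) is one, and the largest output is in the same
% position as the largest input (element four here)
```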
3.2.8 classificationLayer
classificationLayer computes the loss of the classification, which is the cross-entropy between the predicted and true labels. For regression, the loss is the residual sum of squares. A high loss means a bad fit.
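For classification, the loss for a single sample is the cross-entropy between the predicted probabilities and the one-hot true label. A sketch with example values:

```matlab
% Cross-entropy loss for one sample whose true class is 2
p    = [0.1 0.7 0.2];   % predicted class probabilities (example)
t    = [0 1 0];         % one-hot true label
loss = -sum(t.*log(p)); % cross-entropy: -log(0.7), about 0.357
```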
3.2.9 Structuring the Layers
For our first net to identify circles, we will use the following set of layers. The first layer is the input layer for the 32 × 32 images. These are relatively low-resolution images, but you can visually determine which shapes are ellipses and which are circles, so we would expect the neural network to be able to do the same. Nonetheless, the size of the input images is an important consideration. In our case, the images are tightly cropped around the shape. In a more general problem, the subject of interest, a cat, for example, might appear anywhere in a larger scene.
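One possible arrangement of the layers for 32 by 32 grayscale images is sketched below; the filter and pool sizes are illustrative choices, not necessarily those used in EllipsesNeuralNet.m.

```matlab
layers = [
  imageInputLayer([32 32 1])      % 32 by 32 grayscale images
  convolution2dLayer(3,8)         % 8 filters, each 3 by 3
  batchNormalizationLayer
  reluLayer
  maxPooling2dLayer(2,'Stride',2) % halve the spatial resolution
  fullyConnectedLayer(2)          % two classes: circle and ellipse
  softmaxLayer
  classificationLayer];
```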
3.3 Generating Data: Ellipses and Circles
3.3.1 Problem
We want to generate images of ellipses and circles of arbitrary sizes and with different thicknesses in MATLAB.
3.3.2 Solution
Write a MATLAB function to draw circles and ellipses and extract image data from the figure window. Our function will create a set of ellipses and a fixed number of perfect circles as specified by the user. The actual plot and the resulting downsized image will both be shown in a figure window so you can track progress and verify that the images look as expected.
3.3.3 How It Works
The image data is extracted from the figure window in three steps:

1. Get the frame with frame2im
2. Resize to nP by nP using imresize
3. Convert to grayscale using rgb2gray
Note that the image data originally ranges from 0 (black) to 255 (white), but is averaged to lighter gray pixels during the resize operation. The colorbar in the progress window shows you the span of the output image. The image looks black as before since it is plotted with imagesc, which automatically scales the image to use the entire colormap – in this case, the gray colormap.
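A minimal sketch of the three steps, assuming the shape has just been plotted in the current axes and nP is the desired image size:

```matlab
nP = 32;                    % output image size in pixels
f  = getframe(gca);         % capture the axes as a movie frame
im = frame2im(f);           % convert the frame to RGB image data
im = imresize(im, [nP nP]); % resize to nP by nP
im = rgb2gray(im);          % convert to grayscale
```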
Figure 3.4 shows the generated ellipses and the first image displayed.
The graphical output is shown in Figure 3.5. It first displays the 100 circles and then the 100 ellipses. It takes some time for the script to generate all the images.
If you open the resulting jpegs, you will see that they are in fact 32 × 32 images with gray circles and ellipses.
3.4 Training and Testing
3.4.1 Problem
We want to train and test our deep learning algorithm on a wide range of ellipses and circles.
3.4.2 Solution
The script that creates, trains, and tests the net is EllipsesNeuralNet.m.
3.4.3 How It Works
Figure 3.6 shows some of the ellipses used in the testing and training. They were obtained randomly from the set using randi.
There are three options for the training solver:

1. 'sgdm' – Stochastic gradient descent with momentum
2. 'adam' – Adaptive moment estimation (ADAM)
3. 'rmsprop' – Root mean square propagation (RMSProp)
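A typical call to trainingOptions might look like the following; the particular values are illustrative and not necessarily those used in EllipsesNeuralNet.m.

```matlab
% Example training options; the specific values are illustrative
options = trainingOptions('adam', ...
  'InitialLearnRate', 0.01, ...
  'MaxEpochs',        5, ...
  'Shuffle',          'every-epoch', ...
  'Plots',            'training-progress');
```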
The training window runs in real time with the training process. The window is shown in Figure 3.7. Our network starts with a 50% training accuracy (blue line) since we only have two classes, circles and ellipses. Our accuracy approaches 100% in just five epochs, indicating that our classes of images are readily distinguishable.
The loss plot shows how well we are doing. The lower the loss, the better the neural net. The loss plot approaches zero as the accuracy approaches 100%. In this case, the validation data loss (black dashed line) and the training data loss (red line) are about the same. This indicates a good fit of the neural net to the data. If the validation loss is greater than the training loss, the neural net is overfitting the data. Overfitting happens when you have an overly complex neural network: you can fit the training data, but the network may not perform very well with new data, such as the validation data. For example, if you have a system that really is linear and you fit it with a cubic equation, the cubic might fit the training data well but it doesn't model the real system. If the training loss is greater than the validation loss, your neural net is underfitting. Underfitting happens when your neural net is too simple. The goal is to drive both losses to zero.
The one-set network is short enough that the whole thing can be visualized inside the window of analyzeNetwork, as in Figure 3.10. This function checks your layer architecture before you start training and alerts you to any errors. The sizes of the activations and learnable parameters ("Learnables") are displayed explicitly.
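Assuming layers holds the layer array, the analyzer is invoked with a single call:

```matlab
analyzeNetwork(layers) % opens the Deep Learning Network Analyzer window
```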