3.1 Introduction
Finding circles is a classification problem. Given a set of geometric shapes, we want the deep learning system to classify a shape as either a circle or something else. This is much simpler than classifying faces or digits, and it is a good way to determine how well your classification system works. We will apply a convolutional network to the problem, as it is the most appropriate architecture for classifying image data.
In this chapter, we will first generate a set of image data: a set of ellipses, a subset of which will be circles. Then we will build the neural net, using convolution, and train it to identify the circles. Finally, we will test the neural network and try some different training options and layer architectures.
3.2 Structure
The convolutional network consists of multiple layers. Each layer has a specific purpose. The layers may be repeated with different parameters as part of the convolutional network. The layer types we will use are
1. imageInputLayer
2. convolution2dLayer
3. batchNormalizationLayer
4. reluLayer
5. maxPooling2dLayer
6. fullyConnectedLayer
7. softmaxLayer
8. classificationLayer
You can have multiple layers of each type; some convolutional nets have hundreds of layers. Krizhevsky [22] and Bai [2] give guidelines for organizing the layers. Studying the loss on the training and validation sets can guide you in improving your neural network.
3.2.1 imageInputLayer
imageInputLayer defines the size of the input images. For example, imageInputLayer([28 28 3]) says the image is RGB and 28 by 28 pixels.
3.2.2 convolution2dLayer
Some well-known filters (masks) used in image processing are:

1. Blurring filter – ones(3,3)/9
2. Sharpening filter – [0 -1 0;-1 5 -1;0 -1 0]
3. Horizontal Sobel filter for edge detection – [-1 -2 -1; 0 0 0; 1 2 1]
4. Vertical Sobel filter for edge detection – [-1 0 1;-2 0 2;-1 0 1]
We create an n by n mask that we apply to an m by m matrix of data, where m is greater than n. We start in the upper-left corner of the matrix, as shown in Figure 3.1. We multiply the mask element-wise with the corresponding elements of the input matrix and sum all of the products using two calls to sum. That is the first element of the convolved output. We then slide the mask column by column until its last column is aligned with the last column of the input matrix. We then return it to the first column and increment the row. We continue until we have traversed the entire input matrix and the mask is aligned with the last row and last column.
The mask represents a feature. In effect, we are seeing if the feature appears in different areas of the image. Here is an example: we have a 2 by 2 mask with an L. The script Convolution.m demonstrates convolution.
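The sliding-mask procedure described above can be sketched directly in MATLAB. This is a minimal illustration of the idea, not the code in Convolution.m; the input matrix and mask here are arbitrary examples.

```matlab
a    = magic(5);                 % example 5 by 5 input matrix
mask = [-1 -2 -1; 0 0 0; 1 2 1]; % horizontal Sobel filter
n    = size(mask,1);             % mask size
m    = size(a,1);                % input size
c    = zeros(m-n+1);             % convolved output
for j = 1:m-n+1                  % rows
  for k = 1:m-n+1                % columns
    % Element-wise product of the mask and the subimage,
    % summed with two calls to sum
    c(j,k) = sum(sum(mask.*a(j:j+n-1,k:k+n-1)));
  end
end
% filter2 performs the same sliding correlation over the valid region
assert(isequal(c, filter2(mask, a, 'valid')))
```

Note that this sliding sum of products is what filter2 computes; MATLAB's conv2 additionally rotates the mask by 180 degrees, which makes no difference for symmetric masks.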
We can have multiple masks. There is one weight for each element of the mask and one shared bias for each filter. In this case, the convolution works on the image itself. Convolutions can also be applied to the output of other convolutional layers or of pooling layers, which further condense the data. In deep learning, the masks are not designed by hand; their weights and biases are determined as part of the learning process. Convolution should highlight the important features in the data, and subsequent convolution layers narrow down the features. The MATLAB function convolution2dLayer has two required inputs: filterSize, specifying the height and width of the filters as either a scalar or an array [h w], and numFilters, the number of filters.
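For example, a convolutional layer with sixteen 3 by 3 filters would be created as shown below; the number of filters is an illustrative choice.

```matlab
layer = convolution2dLayer(3,16);     % 16 square filters, each 3 by 3
layer = convolution2dLayer([3 5],16); % 16 rectangular 3 by 5 filters
```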
3.2.3 batchNormalizationLayer
A batch normalization layer normalizes each input channel across a mini-batch. The training process divides the training data into mini-batches, which are subsets of the entire data set, and the layer normalizes the activations of each channel over each mini-batch. This reduces the sensitivity of training to the network initialization.
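The normalization itself is simple: each channel is shifted to zero mean and unit variance over the mini-batch, then scaled and shifted by learned parameters. A sketch, with an arbitrary example batch:

```matlab
x       = randn(1,8) + 3;        % example activations for one channel
epsilon = 1e-5;                  % small offset to avoid division by zero
xHat    = (x - mean(x))./sqrt(var(x,1) + epsilon); % zero mean, unit variance
gamma   = 1; beta = 0;           % learned scale and offset (initial values)
y       = gamma*xHat + beta;     % layer output
```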
3.2.4 reluLayer
Figure 3.2 shows the ReLU activation function, which is zero for negative inputs and equal to the input for positive inputs. An alternative is a leaky reluLayer, where the output is not zero for negative input values.
Figure 3.3 shows the leaky function. Below zero, it has a slight slope.
A leaky ReLU layer addresses the dying ReLU problem, in which parts of the network stop learning because the inputs to the activation function are always below zero, or whatever the threshold might be, so the gradient there is zero. It should let you worry a bit less about how you initialize the network.
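Both activation functions are one-liners; the sketch below compares them. The leak factor of 0.01 is a typical example value.

```matlab
x     = -2:0.5:2;        % example inputs
relu  = max(0, x);       % standard ReLU: zero below zero
scale = 0.01;            % leak factor (example value)
leaky = max(scale*x, x); % leaky ReLU: slight slope below zero
```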
3.2.5 maxPooling2dLayer
maxPooling2dLayer creates a layer that breaks the 2D input into rectangular pooling regions and outputs the maximum value of each region. The input poolSize specifies the width and height of a pooling region. poolSize can have one element (for square regions) or two (for rectangular regions). This is a way to reduce the number of inputs that need to be evaluated. Typical images can be a megapixel or more, and it is not practical to use every pixel as an input. Furthermore, most images, or two-dimensional entities of any sort, don't really have enough information to require finely divided regions. You can experiment with pooling and see what works for your application. An alternative is averagePooling2dLayer, which outputs the average of each region instead of the maximum.
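For example, 2 by 2 max pooling with a stride of 2 reduces a 4 by 4 input to 2 by 2. A sketch of the operation, with an arbitrary input:

```matlab
a = [1 2 5 6; 3 4 7 8; 9 10 13 14; 11 12 15 16]; % example 4 by 4 input
p = zeros(2);                          % pooled output
for j = 1:2
  for k = 1:2
    region = a(2*j-1:2*j, 2*k-1:2*k);  % 2 by 2 pooling region
    p(j,k) = max(region(:));           % maximum of the region
  end
end
% p is [4 8; 12 16]
```

The equivalent layer is created with maxPooling2dLayer(2,'Stride',2).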
3.2.6 fullyConnectedLayer
3.2.7 softmaxLayer
The maximum of the softmax output is in the same position as the maximum of the input. This is just a method of smoothing the inputs. Softmax is used for multiclass classification because it guarantees a well-behaved probability distribution. Well behaved means that the sum of the probabilities is one.
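Softmax exponentiates the inputs and normalizes by the sum, so the outputs are positive and sum to one. A sketch with example inputs:

```matlab
q = [1 2 3 4];          % example inputs
p = exp(q)/sum(exp(q)); % softmax output
% sum(p) is one, and the largest output is in the same
% position as the largest input (element four here)
```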
3.2.8 classificationLayer
classificationLayer computes the loss of the classification, which is the cross-entropy between the predicted and true labels. For regression, the loss is the residual sum of squares. A high loss means a bad fit.
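For classification, the loss for a single sample is the cross-entropy between the predicted probabilities and the one-hot true label. A sketch with example values:

```matlab
% Cross-entropy loss for one sample whose true class is 2
p    = [0.1 0.7 0.2];   % predicted class probabilities (example)
t    = [0 1 0];         % one-hot true label
loss = -sum(t.*log(p)); % cross-entropy: -log(0.7), about 0.357
```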
3.2.9 Structuring the Layers
For our first net to identify circles, we will use the following set of layers. The first layer is the input layer for the 32 × 32 images. These are relatively low-resolution images, but you can visually determine which shapes are ellipses and which are circles, so we would expect the neural network to be able to do the same. Nonetheless, the size of the input images is an important consideration. In our case, the images are tightly cropped around the shape. In a more general problem, the subject of interest, a cat, for example, might appear anywhere in a larger scene.
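One possible arrangement of the layers for 32 by 32 grayscale images is sketched below; the filter and pool sizes are illustrative choices, not necessarily those used in EllipsesNeuralNet.m.

```matlab
layers = [
  imageInputLayer([32 32 1])      % 32 by 32 grayscale images
  convolution2dLayer(3,8)         % 8 filters, each 3 by 3
  batchNormalizationLayer
  reluLayer
  maxPooling2dLayer(2,'Stride',2) % halve the spatial resolution
  fullyConnectedLayer(2)          % two classes: circle and ellipse
  softmaxLayer
  classificationLayer];
```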
3.3 Generating Data: Ellipses and Circles
3.3.1 Problem
We want to generate images of ellipses and circles of arbitrary sizes and with different thicknesses in MATLAB.
3.3.2 Solution
Write a MATLAB function to draw circles and ellipses and extract image data from the figure window. Our function will create a set of ellipses and a fixed number of perfect circles as specified by the user. The actual plot and the resulting downsized image will both be shown in a figure window so you can track progress and verify that the images look as expected.
3.3.3 How It Works
The image data is extracted from the figure window in three steps:

1. Get the frame with frame2im
2. Resize to nP by nP using imresize
3. Convert to grayscale using rgb2gray
Note that the image data originally ranges from 0 (black) to 255 (white), but is averaged to lighter gray pixels during the resize operation. The colorbar in the progress window shows you the span of the output image. The image looks black as before since it is plotted with imagesc, which automatically scales the image to use the entire colormap – in this case, the gray colormap.
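A minimal sketch of the three steps, assuming the shape has just been plotted in the current axes and nP is the desired image size:

```matlab
nP = 32;                    % output image size in pixels
f  = getframe(gca);         % capture the axes as a movie frame
im = frame2im(f);           % convert the frame to RGB image data
im = imresize(im, [nP nP]); % resize to nP by nP
im = rgb2gray(im);          % convert to grayscale
```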
Figure 3.4 shows the generated ellipses and the first image displayed.
The graphical output is shown in Figure 3.5. It first displays the 100 circles and then the 100 ellipses. It takes some time for the script to generate all the images.
If you open the resulting jpegs, you will see that they are in fact 32 × 32 images with gray circles and ellipses.
3.4 Training and Testing
3.4.1 Problem
We want to train and test our deep learning algorithm on a wide range of ellipses and circles.
3.4.2 Solution
The script that creates, trains, and tests the net is EllipsesNeuralNet.m.
3.4.3 How It Works
Figure 3.6 shows some of the ellipses used in the testing and training. They were obtained randomly from the set using randi.
There are three options for the training solver:

1. 'sgdm' – Stochastic gradient descent with momentum
2. 'adam' – Adaptive moment estimation (ADAM)
3. 'rmsprop' – Root mean square propagation (RMSProp)
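A typical call to trainingOptions might look like the following; the particular values are illustrative and not necessarily those used in EllipsesNeuralNet.m.

```matlab
% Example training options; the specific values are illustrative
options = trainingOptions('adam', ...
  'InitialLearnRate', 0.01, ...
  'MaxEpochs',        5, ...
  'Shuffle',          'every-epoch', ...
  'Plots',            'training-progress');
```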
The training window runs in real time with the training process. The window is shown in Figure 3.7. Our network starts with a 50% training accuracy (blue line) since we only have two classes, circles and ellipses. Our accuracy approaches 100% in just five epochs, indicating that our classes of images are readily distinguishable.
The loss plot shows how well we are doing. The lower the loss, the better the neural net. The loss plot approaches zero as the accuracy approaches 100%. In this case, the validation data loss (black dashed line) and the training data loss (red line) are about the same. This indicates a good fit of the neural net to the data. If the validation loss is greater than the training loss, the neural net is overfitting the data. Overfitting happens when you have an overly complex neural network: you can fit the training data, but the network may not perform very well with new data, such as the validation data. For example, if you have a system that really is linear and you fit it with a cubic equation, the cubic might fit the training data well but it doesn't model the real system. If the training loss is greater than the validation loss, your neural net is underfitting. Underfitting happens when your neural net is too simple. The goal is to drive both losses to zero.
The one-set network is short enough that the whole thing can be visualized inside the window of analyzeNetwork, as in Figure 3.10. This function checks your layer architecture before you start training and alerts you to any errors. The sizes of the activations and learnable parameters ("Learnables") are displayed explicitly.
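Assuming layers holds the layer array, the analyzer is invoked with a single call:

```matlab
analyzeNetwork(layers) % opens the Deep Learning Network Analyzer window
```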