© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
M. Paluszek et al., Practical MATLAB Deep Learning, https://doi.org/10.1007/978-1-4842-7912-0_11

11. Image Classification

Michael Paluszek (1), Stephanie Thomas (2) and Eric Ham (2)
(1) Plainsboro, NJ, USA
(2) Princeton, NJ, USA

11.1 Introduction

Image classification can be done with pretrained networks. MATLAB makes it easy to access and use these networks. This chapter shows you two examples. First, we will use AlexNet, then GoogLeNet.

11.2 Using AlexNet

11.2.1 Problem

We want to use the pretrained network AlexNet for image classification.

11.2.2 Solution

Depending on your version of MATLAB, install AlexNet from the Add-On Explorer or download the AlexNet support package. Load some images and test. These are classification networks, so we will use classify to run them.

11.2.3 How It Works

First, we need to download the support packages with the Add-On Explorer. If you attempt to run alexnet or googlenet without having them installed, you will get a link directly to the package in the Add-On Explorer. You will need your MathWorks password.

AlexNet is a pretrained convolutional neural network (CNN) that has been trained on approximately 1.2 million images from the ImageNet Dataset (http://image-net.org/index). The model has 25 layers and can classify images into 1000 object categories. It can be used for all sorts of object classification. However, if an object was not in the training set, the network won't be able to identify it. If bananas were in the training set, you could expect the CNN to correctly identify a new picture of a banana. But if you gave it a picture of a plantain, and plantains were NOT in the training set, then it might not find a match or, more likely, it might incorrectly classify it as a banana.
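The layer count, input size, and number of categories quoted above can be read directly from the loaded network. This is a minimal sketch, assuming the AlexNet support package is installed and a release in which alexnet returns a network with a Layers array, as in this chapter; the variable names are our own.

```matlab
% Load the pretrained AlexNet (requires the AlexNet support package)
net = alexnet;

% The first layer is the image input layer; it fixes the input size
inputSize = net.Layers(1).InputSize;

% The last layer (the classification output layer) stores the category names
classes = net.Layers(end).Classes;

fprintf('Layers:     %d\n', numel(net.Layers));
fprintf('Input size: %d x %d x %d\n', inputSize);
fprintf('Categories: %d\n', numel(classes));
```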
Displaying the Layers property of the loaded network prints the full list of its layers.

There are many layers in this convolutional network. ReLU and softmax are the activation functions. The first layer uses “zero center” normalization: the average image is subtracted from each input, so the inputs have a mean of zero. Unlike z-score normalization, it does not also divide by the standard deviation. Two layer types are new: cross channel normalization and grouped convolution. Filter groups, also known as grouped convolution, were introduced with AlexNet in 2012. You can think of the output of each filter as a channel, and of filter groups as groups of these channels; each group of filters sees only its own subset of the input channels. Filter groups allowed the network to be split efficiently across GPUs, and they also improved performance. Cross channel normalization normalizes each value using the values at the same location in neighboring channels, instead of treating one channel at a time. We've discussed convolution in Chapter 3. The weights in each filter are determined during training. Dropout is a layer that randomly ignores nodes while the weights are being trained. This discourages nodes from depending on one another and reduces overfitting.
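The difference between zero-centering and z-score normalization can be seen on a toy batch of images in base MATLAB. This is an illustrative sketch on random data, not AlexNet's stored mean image; the batch size and image size are arbitrary choices.

```matlab
% Toy batch: ten 4 x 4 RGB "images" with random values in [0, 1]
rng(0);
X = rand(4, 4, 3, 10);

% Zero-center: subtract the average image (mean over the 4th, batch, dimension)
mu = mean(X, 4);
Xc = X - mu;                      % implicit expansion (R2016b or later)

% Z-score: additionally divide by the per-pixel standard deviation
sigma = std(X, 0, 4);
Xz = (X - mu) ./ sigma;

fprintf('max |mean| after zero-centering: %g\n', max(abs(mean(Xc, 4)), [], 'all'));
fprintf('std of one pixel, zero-centered: %g\n', std(Xc(1,1,1,:)));  % same as before centering
fprintf('std of one pixel, z-scored:      %g\n', std(Xz(1,1,1,:)));  % 1 by construction
```

Zero-centering only shifts the data, so the spread of each pixel is unchanged; z-scoring also rescales it.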

For our first example, we load an image that comes with MATLAB of a set of peppers. This image is larger than the input size of the network, so we use the top-left corner of the image. Note that each pretrained network has a fixed input image size that we can determine from the first layer.
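The crop-and-classify step can be sketched as follows. This assumes the AlexNet support package is installed; peppers.png is an image that ships with MATLAB, and the title format is our own choice.

```matlab
net = alexnet;
inputSize = net.Layers(1).InputSize;         % [227 227 3] for AlexNet

% peppers.png ships with MATLAB and is larger than the network input
I = imread('peppers.png');

% Keep the top-left corner so the image matches the input size exactly
I = I(1:inputSize(1), 1:inputSize(2), :);

% classify returns the predicted label and a score for every category
[label, scores] = classify(net, I);

figure
imshow(I)
title(sprintf('%s, score %.2f', char(label), max(scores)))
```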
The images and results for the AlexNet example are shown in Figure 11.1. The pepper scores are tightly clustered.
Figure 11.1

Test image labeled with the classification and the scores. The image is classified as a “bell pepper”.

For fun, and to learn more about this network, we print out the categories that had the next highest scores, sorted from high to low. The categories are stored in the Classes property of the last layer of the net.
The results show that the other high-scoring categories are all fruits and vegetables. The Granny Smith had the next highest score, followed by cucumber, while the fig and lemon had much smaller scores. This makes sense since Granny Smiths and cucumbers are also usually green.
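Listing the top categories amounts to sorting the score vector and indexing into Classes. A minimal sketch, assuming the AlexNet support package; it repeats the peppers setup so it runs on its own, and showing five categories is an arbitrary choice.

```matlab
net = alexnet;
inputSize = net.Layers(1).InputSize;
I = imread('peppers.png');
I = I(1:inputSize(1), 1:inputSize(2), :);
[~, scores] = classify(net, I);

% Category names are stored in the output layer
classes = net.Layers(end).Classes;

% Sort the scores from high to low and list the top few categories
[sortedScores, idx] = sort(scores, 'descend');
nTop = 5;
for j = 1:nTop
    fprintf('%-20s %8.4f\n', char(classes(idx(j))), sortedScores(j));
end
```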
We also test two of our own images, one of a cat and one of a metal box, shown in Figure 11.2.
Figure 11.2

Raw test images Cat.png and Box.jpg.

The selected label for the cat is tabby. The net recognizes that the photo is of a cat, as the other highest-scoring categories are also kinds of cats. What distinguishes a tiger cat from a tabby, however, we can't say…

The metal box proves the biggest challenge to the net. We look only at the categories with scores above 0.05; the images with their labels are shown in Figure 11.3.
In this case, hard disc has by far the highest score, but that score is much lower than the tabby cat's: roughly 0.3 vs. 0.8.
Figure 11.3

Test images and the classification by AlexNet. They are classified as “tabby” and “hard disc”.
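Keeping only the categories above a score threshold is a one-line find followed by a sort. A minimal sketch, assuming the AlexNet support package and that the chapter's Box.jpg is on the MATLAB path; resizing (rather than cropping) the box image and the grayscale guard are our assumptions, and imresize requires the Image Processing Toolbox.

```matlab
net = alexnet;
inputSize = net.Layers(1).InputSize;
classes = net.Layers(end).Classes;

I = imread('Box.jpg');                 % the chapter's test image (path assumed)
if size(I, 3) == 1
    I = repmat(I, 1, 1, 3);            % grayscale to RGB
end
I = imresize(I, inputSize(1:2));       % assumption: resize to 227 x 227

[label, scores] = classify(net, I);

% Keep only categories whose score exceeds 0.05, highest first
k = find(scores > 0.05);
[s, order] = sort(scores(k), 'descend');
k = k(order);
for j = 1:numel(k)
    fprintf('%-20s %6.3f\n', char(classes(k(j))), s(j));
end
```

Resizing distorts the aspect ratio of non-square images; cropping, as done for the peppers, avoids that but may cut off part of the object.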

11.3 Using GoogLeNet

11.3.1 Problem

Now let's compare these results to GoogLeNet. GoogLeNet is a pretrained model that has also been trained on the subset of the ImageNet database used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The model is trained on more than a million images, has 144 layers (many more than AlexNet's 25), and can classify images into 1000 object categories.

11.3.2 Solution

Install GoogLeNet from the Add-On Explorer (try running it from the command line to get an install link). Load some images and test using classify.

11.3.3 How It Works

First, we load the pretrained network as in the previous recipe.
GoogLeNet is a different type of network than AlexNet: it is a DAGNetwork rather than a SeriesNetwork. Its layers are arranged as a directed acyclic graph, so a layer can take inputs from, and send outputs to, multiple layers.
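The graph structure can be inspected from the network object. A minimal sketch, assuming the GoogLeNet support package is installed; showing the first eight connections is an arbitrary choice.

```matlab
net = googlenet;
disp(class(net))                        % DAGNetwork
fprintf('Layers: %d\n', numel(net.Layers));
disp(net.Layers(1).InputSize)           % input size of the image input layer

% Connections is a table of Source -> Destination edges in the graph
disp(net.Connections(1:8, :))

% Plotting the layer graph shows the parallel branches
lgraph = layerGraph(net);
figure
plot(lgraph)
```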
Next, we test it on the image of peppers.
As before, the image is correctly identified as containing a bell pepper, and the score is similar to AlexNet's. However, the remaining categories are a little different. In this case, cucumber and, for some reason, maraca scored higher than Granny Smith. Maracas are also rounded and oblong.
We also test this net on the images of the cat and the box. The input size for this network is 224 × 224, compared with 227 × 227 for AlexNet. The top categories for the cat are the same as for AlexNet, with the addition of lynx; note, however, that the tabby score is significantly lower than for AlexNet.
The box scores are the most interesting. While hard disc is among the highest-scoring categories, this time the net returns iPod, and cellular telephone is added to the mix. The net appears to recognize a rectangular metal object, but beyond that, the scores give no clear evidence for one category over another.
The GoogLeNet score arrays for Cat.png and Box.png are shown in Figure 11.4. The box scores are visibly spread across many categories. This reinforces that the choice of “iPod” is less certain than the pepper or cat labels, and shows that even highly trained networks are not necessarily reliable when the input strays too far from the images they were trained on.
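Running both test images through GoogLeNet differs from the AlexNet case only in the network and its 224 × 224 input size. A minimal sketch, assuming the GoogLeNet support package and that the chapter's Cat.png and Box.jpg are on the MATLAB path; resizing to the input size and printing the top three categories are our choices.

```matlab
net = googlenet;
inputSize = net.Layers(1).InputSize;     % [224 224 3]
classes = net.Layers(end).Classes;
files = {'Cat.png', 'Box.jpg'};          % the chapter's test images (paths assumed)
nTop = 3;

for j = 1:numel(files)
    I = imread(files{j});
    if size(I, 3) == 1
        I = repmat(I, 1, 1, 3);          % grayscale to RGB
    end
    I = imresize(I, inputSize(1:2));     % resize to the 224 x 224 input
    [label, scores] = classify(net, I);

    % Print the predicted label and the top few categories with their scores
    [s, idx] = sort(scores, 'descend');
    fprintf('%s -> %s\n', files{j}, char(label));
    for m = 1:nTop
        fprintf('   %-20s %6.3f\n', char(classes(idx(m))), s(m));
    end
end
```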
Figure 11.4

GoogLeNet scores for Cat.png and Box.png.
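One way to put a number on how “spread out” a score array is, beyond inspecting the plot, is the normalized entropy of the scores: 0 when one category takes all the probability and 1 when the scores are uniform. This measure is our addition, not part of the chapter. The sketch assumes the GoogLeNet support package and the chapter's image files on the path.

```matlab
net = googlenet;
inputSize = net.Layers(1).InputSize;
files = {'Cat.png', 'Box.jpg'};          % the chapter's test images (paths assumed)

figure
for j = 1:numel(files)
    I = imread(files{j});
    if size(I, 3) == 1
        I = repmat(I, 1, 1, 3);          % grayscale to RGB
    end
    I = imresize(I, inputSize(1:2));
    [~, scores] = classify(net, I);

    % Normalized entropy; zero scores are skipped since 0*log(0) is taken as 0
    p = double(scores(scores > 0));
    H = -sum(p .* log(p)) / log(numel(scores));

    % Bar plot of all 1000 category scores
    subplot(numel(files), 1, j)
    bar(scores)
    xlim([1 numel(scores)])
    title(sprintf('%s: max %.2f, normalized entropy %.2f', files{j}, max(scores), H))
end
```

A higher normalized entropy for the box than for the cat would match the visual impression from the score plots.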

We can also grab random images from the Internet. The site http://picsum.photos calls itself the “Lorem Ipsum” for photos and provides a random photo with every call to the URL.
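Fetching and classifying one random photo can be sketched as follows. This assumes the GoogLeNet support package, Internet access, and that a URL of the form https://picsum.photos/224 returns a square 224-pixel JPEG; the extra imresize guards against that assumption being wrong.

```matlab
net = googlenet;
inputSize = net.Layers(1).InputSize;     % [224 224 3]

% Each request returns a different random photo (URL form is an assumption)
url = sprintf('https://picsum.photos/%d', inputSize(1));
I = webread(url);                        % image content is returned as an array

% Ensure RGB at the network input size, whatever the service returned
if size(I, 3) == 1
    I = repmat(I, 1, 1, 3);
end
I = imresize(I, inputSize(1:2));

[label, scores] = classify(net, I);
figure
imshow(I)
title(sprintf('%s (%.2f)', char(label), max(scores)))
```

Because the photo changes on every call, rerunning the block gives a different image and label each time.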

We got some interesting results using this website. The net produces good results for some landscape photos, but at other times it sees objects that are not there. Figure 11.5 shows four examples. An overexposed image of a sunset over a train yard is identified as a “volcano.” Two landscapes are appropriately labeled as “lakeside” and “seashore.” However, in the last image, a person on a bench gazing at a beach or desert landscape is inexplicably identified as “geyser.” This may have to do with the shape of the sky or clouds.

These nets are not trained on people; however, it can be interesting to test them on images of people. We tested GoogLeNet on our author headshots, Figure 11.6. In both cases, it identified our clothing fairly accurately!
Figure 11.5

GoogLeNet identification results of random images from picsum.photos.

Figure 11.6

Author headshots with GoogLeNet labels.

While these nets perform very well on images of objects in the categories they were trained on, from lions to landscapes, it is important to remember that they are limited in application. Results can be unexpected and even silly.
