11.1 Introduction
Image classification can be done with pretrained networks, and MATLAB makes it easy to access and use them. This chapter works through two examples: first AlexNet, then GoogLeNet.
11.2 Using AlexNet
11.2.1 Problem
We want to use the pretrained network AlexNet for image classification.
11.2.2 Solution
Depending on your version of MATLAB, install AlexNet from the Add-On Explorer or download its support package. Load some images and test. AlexNet is a classification network, as is GoogLeNet in the next recipe, so we will use classify to run it.
11.2.3 How It Works
First, we need to download the support packages with the Add-On Explorer. If you attempt to run alexnet or googlenet without having them installed, you will get a link directly to the package in the Add-On Explorer. You will need to sign in with your MathWorks account.
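Once the support package is installed, loading and inspecting the network takes only a few lines. The following is a minimal sketch, assuming the Deep Learning Toolbox and the AlexNet support package are available; the variable names are our own, and newer MATLAB releases may recommend other functions for loading pretrained networks.

```matlab
% Load the pretrained network. If the support package is missing,
% this call errors with a link to the package in the Add-On Explorer.
net = alexnet;

% List the layers of the network in the Command Window.
disp(net.Layers)

% The first layer is the image input layer; it stores the image size
% the network expects (227-by-227 RGB images for AlexNet).
inputSize = net.Layers(1).InputSize;
fprintf('AlexNet has %d layers and expects %dx%dx%d images.\n', ...
    numel(net.Layers), inputSize);
```

Images passed to classify must match this input size, so they are usually resized first.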
There are many layers in this convolutional network. ReLU and softmax are the activation functions. In the first layer, “zero center” normalization is used. This means the mean of the training images is subtracted from each input image, so that the data has a mean of zero. The data is not rescaled to a standard deviation of one; that would be z-score normalization.
Two layer types are new: Cross Channel Normalization and Grouped Convolution. Filter groups, also known as grouped convolution, were introduced with AlexNet in 2012. You can think of the output of each filter as a channel and filter groups as groups of the channels. Filter groups allowed more efficient parallelization across GPUs. They also improved performance. Cross Channel Normalization normalizes each element using the values in neighboring channels, instead of one channel at a time.
We discussed convolution in Chapter 3. The weights in each filter are determined during training. Dropout is a layer that randomly ignores nodes while the weights are being trained. This discourages nodes from becoming dependent on one another, which helps reduce overfitting.
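To make two of these ideas concrete, here is a small sketch in plain MATLAB that needs no toolbox. It contrasts zero-center normalization (subtract the mean only) with z-score normalization (also divide by the standard deviation), and then applies a dropout mask. The toy data, the dropout probability, and all variable names are assumptions for illustration; in a real network the mean comes from the training set, and this is not the toolbox's internal implementation.

```matlab
% Toy 2x2 single-channel "image" (hypothetical values).
X = [10 20; 30 40];
mu = mean(X(:));        % in a real network: mean of the training images
sigma = std(X(:));

Xzero = X - mu;             % zero-center: subtract the mean only
Xz    = (X - mu) / sigma;   % z-score: also scale to unit standard deviation

% Dropout at training time: zero each activation with probability p and
% rescale the survivors by 1/(1-p) so the expected value is unchanged.
rng(0);                     % fixed seed so the sketch is repeatable
p = 0.5;                    % dropout probability (assumed)
a = ones(1, 8);             % hypothetical activations
keep = rand(size(a)) >= p;  % true where the activation is kept
aDrop = a .* keep / (1 - p);
```

After zero-centering, the data has zero mean but keeps its original spread; only the z-score version has a standard deviation of one. Dropout is applied only during training; at prediction time every node is used.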
The selected label is tabby. The net recognizes that the photo is of a cat, since the other highest-scoring categories are also kinds of cats. What distinguishes a tiger cat from a tabby, though, we can’t say.
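A ranking of the highest-scoring categories like the one described here can be produced with a sketch along the following lines. It assumes the AlexNet support package is installed and uses peppers.png, a sample image that ships with MATLAB, as a stand-in for the cat photo. The variable names are illustrative, and the Classes property of the output layer is assumed to be available in your release.

```matlab
net = alexnet;

% Resize the test image to the input size the network expects.
inputSize = net.Layers(1).InputSize;
I = imread('peppers.png');          % stand-in image; substitute your own
I = imresize(I, inputSize(1:2));

% classify returns the winning label and one score per category.
[label, scores] = classify(net, I);

% Sort the scores and print the five most likely categories.
classNames = net.Layers(end).Classes;
[sortedScores, idx] = sort(scores, 'descend');
for k = 1:5
    fprintf('%-20s %6.2f%%\n', char(classNames(idx(k))), 100*sortedScores(k));
end
```

Looking at the runners-up, not just the winning label, is a quick way to judge whether the network has the right general idea about an image.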
11.3 Using GoogLeNet
11.3.1 Problem
Now let’s compare these results to GoogLeNet. GoogLeNet is a pretrained model that was also trained on the subset of the ImageNet database used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The model was trained on more than a million images, has 144 layers (far more than AlexNet), and can classify images into 1000 object categories.
11.3.2 Solution
Install GoogLeNet from the Add-On Explorer (running googlenet from the command line will give you an install link). Load some images and test using classify.
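The steps above can be sketched as follows, assuming the GoogLeNet support package is installed. The image peppers.png ships with MATLAB and stands in for your own photos; the variable names are our own. Note that GoogLeNet expects a different input size than AlexNet, so the size is read from the network and not hard-coded.

```matlab
net = googlenet;

% GoogLeNet expects 224-by-224 RGB images; read the size from the network.
inputSize = net.Layers(1).InputSize;

% Load a test image and resize it to match.
I = imread('peppers.png');          % stand-in image; substitute your own
I = imresize(I, inputSize(1:2));

% Classify and report the winning label with its score.
[label, scores] = classify(net, I);
fprintf('Predicted label: %s (%.1f%%)\n', char(label), 100*max(scores));
```

The same code works for AlexNet if the first line is changed to load alexnet, which makes it easy to compare the two networks on the same images.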
11.3.3 How It Works
We got some interesting results using this network. It produces good results for some landscape photos, but at other times it sees objects that are not there. Figure 11.5 shows four examples. An overexposed image of a sunset over a train yard is identified as a “volcano.” Two landscapes are appropriately labeled as “lakeside” and “seashore.” However, in the last image, a person on a bench gazing at a beach or desert landscape is inexplicably identified as “geyser.” This may have to do with the shape of the sky or clouds.
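While these nets perform very well on images that resemble those in their training database, from lions to landscapes, it is important to remember that they are limited in application. They can only choose among the 1000 categories they were trained on, so results for other kinds of images can be unexpected and even silly.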