A general neural net is shown in Figure 7.1. This is a “deep learning” neural net because it has multiple internal layers.
Figure 7.1 Deep learning neural net.
A convolutional neural network is a pipeline with multiple stages. The images go into one end and the probability that the image is a cat comes out the other. There are three types of layers:
Convolutional layers (hence the name)
Pooling layers
Fully connected layers
A convolutional neural net is shown in Figure 7.2. This is also a “deep learning” neural net because it has multiple internal layers, but now the layers are of the three types described above.
Figure 7.2 Deep learning convolutional neural net [1].
We can have as many layers as we want. A neuron in a neural net computes

y = σ(wx + b)  (7.1)
where w is a weight, b is a bias, and σ() is the nonlinear activation function that operates on the input wx + b. There are many possible activation functions; a sigmoid or a hyperbolic tangent is often used. The function Activation generates activation functions.
Figure 7.3 shows the three activation functions with k = 1. The third is the rectified linear output function, or

σ(x) = max(0, x)  (7.2)

Figure 7.3 Activation functions.

A rectified linear output seems a bit strange for an image-processing network, where the inputs are all positive. However, the bias term can make the argument negative, and previous layers may also change the sign.
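These activation functions can be sketched in Python (the book's code is MATLAB; the slope parameter k and the exact forms used by Activation are common definitions assumed here):

```python
import numpy as np

def sigmoid(x, k=1.0):
    """Logistic sigmoid with slope parameter k: 1 / (1 + exp(-k*x))."""
    return 1.0 / (1.0 + np.exp(-k * x))

def tanh_act(x, k=1.0):
    """Hyperbolic tangent activation with slope parameter k."""
    return np.tanh(k * x)

def relu(x):
    """Rectified linear unit, Eq. (7.2): max(0, x)."""
    return np.maximum(0.0, x)

def neuron(x, w, b, sigma=np.tanh):
    """A single neuron, Eq. (7.1): y = sigma(w*x + b)."""
    return sigma(w * x + b)
```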
The following recipes will detail each step in the chain. We will start with gathering image data. We will then describe the convolution process. The next recipe will implement pooling. We will show a recipe for Softmax. We will then demonstrate the full network using random weights. Finally, we will train the network using a subset of the images and see if we can identify the other images.
7.1 Obtain Data Online: For Training a Neural Network
7.1.1 Problem
We want to find photographs online for training a face recognition neural net.
7.1.2 Solution
Go to ImageNet to find images.
7.1.3 How It Works
ImageNet, http://www.image-net.org, is an image database organized according to the WordNet hierarchy. Each meaningful concept in WordNet is called a “synonym set,” or “synset.” There are more than 100,000 sets and 14 million images in ImageNet. For example, type in “Siamese cat” and click on the link. You will see 445 images, with a wide variety of shots from many angles and a wide range of distances.
Synset: Siamese cat, Siamese
Definition: a slender, short-haired, blue-eyed breed of cat having a pale coat with dark ears, paws, face, and tail tip.
Popularity percentile: 57%
Depth in WordNet: 8
This is a great resource! However, we are going to instead use pictures of our cats for our test to avoid copyright issues.
7.2 Generating Data for Training a Neural Net
7.2.1 Problem
We want grayscale photographs for training a face recognition neural net.
7.2.2 Solution
Take photographs using a digital camera.
7.2.3 How It Works
We first take pictures of several cats to use in training the net. The photos were taken with an iPhone 6. To make the problem easier, we limit the photos to facial shots of the cats. We frame the shots so that they are reasonably consistent in size and minimize the background. We then convert them to grayscale.
We use the function ImageArray to read in the images. It takes a path to a folder containing the images to be processed.
The function has a demo with our local folder of cat images.
ImageArray averages the three color channels to convert the color images to grayscale. It flips the images upside down, since the image coordinate convention is opposite that of MATLAB. We used GraphicConverter 10™ to crop the images around the cat's face and make them all 1024 × 1024 pixels. One of the challenges of image matching is to do this process automatically. Also, training typically uses thousands of images. We are using just a few to see if our neural net can determine if the test image is a cat, or even one we have used in training! ImageArray scales the image using the function ScaleImage.
Notice that it creates the new image array as uint8. Figure 7.4 shows the results of scaling.
Figure 7.4 Image scaled from 1024 × 1024 to 256 × 256.
The images are shown in Figure 7.5.
Figure 7.5 (64 × 64)-pixel grayscale cat images.
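The conversion and scaling steps can be sketched in Python; downscaling by averaging 2 × 2 blocks per halving step is an assumption about how ScaleImage works, not the book's exact implementation:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (m, n, 3) RGB array to grayscale by averaging the channels."""
    return rgb.mean(axis=2)

def scale_image(img, n_halvings):
    """Downscale a square image by averaging 2x2 blocks, n_halvings times.
    The output is uint8, as in ImageArray."""
    for _ in range(n_halvings):
        m = img.shape[0] // 2
        img = img.reshape(m, 2, m, 2).mean(axis=(1, 3))
    return img.astype(np.uint8)

# Toy demos: a white 2x2 RGB patch, and a 4x4 ramp halved once
g = to_grayscale(np.ones((2, 2, 3)))
small = scale_image(np.arange(16, dtype=float).reshape(4, 4), 1)
# Scaling 1024x1024 to 256x256, as in Figure 7.4, would use two halvings
```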
7.3 Convolution
7.3.1 Problem
We want to implement convolution to reduce the number of weights in the network.
7.3.2 Solution
Implement convolution using MATLAB matrix operations.
7.3.3 How It Works
We create an n × n mask that we apply to the input matrix, whose dimensions are m × m with m greater than n. We start in the upper-left corner of the matrix, multiply the mask element by element with the corresponding elements of the input matrix, and take the double sum. That is the first element of the convolved output. We then move the mask column by column until its last column is aligned with the last column of the input matrix, return it to the first column, and increment the row. We continue until we have traversed the entire input matrix and the mask is aligned with the last row and last column.
The mask represents a feature. In effect, we are checking whether the feature appears in different areas of the image. We can have multiple masks. There is one bias and one weight for each element of the mask for each feature. In this case, instead of 16 sets of weights and biases, we have only 4. For large images, the savings can be substantial. Here the convolution works on the image itself, but convolutions can also be applied to the output of other convolutional layers or of pooling layers, as shown in Figure 7.6.
Figure 7.6 Convolution process showing the mask at the beginning and end of the process.
Convolution is implemented in Convolve.m.
The demo produces the following results.
>> Convolve
a =
1 0 1
0 1 0
1 0 1
b =
1 1 1 0 0 0
0 1 1 1 0 1
0 0 1 1 1 0
0 0 1 1 0 1
0 1 1 0 0 1
0 1 1 0 0 1
ans =
4 3 4 1
2 4 3 5
2 3 4 2
3 3 2 3
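The same operation can be sketched in Python; run on the demo's a and b, it reproduces the 4 × 4 result above:

```python
import numpy as np

def convolve(mask, mat):
    """Slide an n x n mask over an m x m matrix; each output element is the
    double sum of the elementwise product (a 'valid' correlation)."""
    n = mask.shape[0]
    m = mat.shape[0]
    out = np.zeros((m - n + 1, m - n + 1))
    for i in range(m - n + 1):
        for j in range(m - n + 1):
            out[i, j] = np.sum(mask * mat[i:i + n, j:j + n])
    return out

a = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])
b = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 1],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 1],
              [0, 1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0, 1]])
c = convolve(a, b)  # first row: 4 3 4 1, matching the demo
```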
7.4 Convolution Layer
7.4.1 Problem
We want to implement a convolution connected layer.
7.4.2 Solution
Use code from Convolve to implement the layer.
7.4.3 How It Works
The “convolution” neural net scans the input with the mask. Each input to the mask passes through an activation function that is identical for a given mask. This reduces the number of weights.
Figure 7.7 shows the inputs and outputs from the demo. The tanh activation function is used in this demo. The weights and biases are random.
Figure 7.7 Inputs and outputs for the convolution layer.
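A minimal Python sketch of such a layer, assuming a single shared bias per mask (the book's ConvolutionLayer may organize its weights differently):

```python
import numpy as np

def convolution_layer(img, mask, bias=0.0, act=np.tanh):
    """Scan img with mask, add the shared bias, and pass each element of
    the result through the activation function."""
    n = mask.shape[0]
    m = img.shape[0]
    out = np.zeros((m - n + 1, m - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = act(np.sum(mask * img[i:i + n, j:j + n]) + bias)
    return out

# Toy demo: a 2x2 half-identity mask over a 4x4 image of ones
layer_out = convolution_layer(np.ones((4, 4)), np.eye(2) * 0.5)
```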
7.5 Pooling
7.5.1 Problem
We want to pool the outputs of the convolution layer to reduce the number of points we need to process.
7.5.2 Solution
Implement a function to take the output of the convolution function.
7.5.3 How It Works
Pooling layers take a subset of the outputs of the convolutional layers and pass it on. They do not have any weights. Pooling layers can take the maximum, median, or mean value of each pool; our pooling function offers all three as options. It divides the input into n × n subregions and returns an n × n matrix.
Pooling is implemented in Pool.m. Notice we use str2func instead of a switch statement.
The built-in demo creates four pools from a 4 × 4 matrix and produces the following results.
>> Pool
a =
0.9031 0.7175 0.5305 0.5312
0.1051 0.1334 0.8597 0.9559
0.7451 0.4458 0.6777 0.0667
0.7294 0.5088 0.8058 0.5415
ans =
0.4648 0.7193
0.6073 0.5229
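A Python sketch of the pooling step; a dictionary lookup plays the role of MATLAB's str2func, and mean pooling reproduces the demo output above:

```python
import numpy as np

def pool(mat, n, func_name='mean'):
    """Divide mat into an n x n grid of subregions and apply the named
    reduction to each, mirroring the str2func lookup in Pool.m."""
    funcs = {'mean': np.mean, 'max': np.max, 'median': np.median}
    f = funcs[func_name]
    s = mat.shape[0] // n   # side length of each subregion
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = f(mat[i * s:(i + 1) * s, j * s:(j + 1) * s])
    return out

a = np.array([[0.9031, 0.7175, 0.5305, 0.5312],
              [0.1051, 0.1334, 0.8597, 0.9559],
              [0.7451, 0.4458, 0.6777, 0.0667],
              [0.7294, 0.5088, 0.8058, 0.5415]])
p = pool(a, 2)  # mean pooling matches the demo output
```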
7.6 Fully Connected Layer
7.6.1 Problem
We want to implement a fully connected layer.
7.6.2 Solution
Use Activation to implement the network.
7.6.3 How It Works
The “fully connected” neural net layer is the traditional neural net where every input is connected to every output as shown in Figure 7.8.
Figure 7.8 Fully connected neural net. This shows only one output.
We implement the fully connected network with n inputs and m outputs. Each path to an output can have a different weight and bias. FullyConnectedNN can handle any number of inputs or outputs.
Figure 7.9 shows the outputs from the demo. The tanh activation function is used in this demo. The weights and biases are random. The change in shape from input to output is the result of the activation function.
Figure 7.9 The two outputs from the demo function are shown vs. the two inputs.
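An assumed minimal form of such a layer in Python: a weight matrix multiply plus a bias vector, followed by the activation, so every input reaches every output through its own weight:

```python
import numpy as np

def fully_connected(x, w, b, act=np.tanh):
    """n inputs -> m outputs: w is m x n (one weight per input-output pair)
    and b is the m-element bias vector."""
    return act(w @ x + b)

# Demo with random weights and biases, as in FullyConnectedNN's demo
rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # 4 inputs
w = rng.standard_normal((2, 4))     # 2 outputs
b = rng.standard_normal(2)
y = fully_connected(x, w, b)        # 2 outputs, each in (-1, 1)
```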
7.7 Determining the Probability
7.7.1 Problem
We want to get a probability from neural net outputs.
7.7.2 Solution
Implement the Softmax function. This will be used for the output nodes of our network.
7.7.3 How It Works
Given a set of inputs, the Softmax function, a generalization of the logistic function, calculates a set of positive values p that sum to 1. It is

p_k = e^(q_k) / Σ_{j=1}^{N} e^(q_j)  (7.3)

where q are the inputs and N is the number of inputs.
The function is implemented in Softmax.m.
The results of the demo are
>> Softmax
p =
0.0236 0.0643 0.1747 0.4748 0.0236 0.0643 0.1747
pMax =
0.4748
kMax =
4
ans =
1.0000
The last number is the sum of p, which should be (and is) 1.
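A Python sketch of Eq. (7.3). The demo's inputs are not shown; q = [1, 2, 3, 4, 1, 2, 3] reproduces the values above, but that is an assumption:

```python
import numpy as np

def softmax(q):
    """Eq. (7.3): p_k = exp(q_k) / sum_j exp(q_j). Shifting by max(q)
    avoids overflow without changing the result."""
    e = np.exp(q - np.max(q))
    return e / e.sum()

q = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0])
p = softmax(q)              # 0.0236 0.0643 0.1747 0.4748 ...
k_max = np.argmax(p) + 1    # 4, in MATLAB's 1-based indexing
```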
7.8 Test the Neural Network
7.8.1 Problem
We want to integrate convolution, pooling, a fully connected layer, and Softmax.
7.8.2 Solution
The solution is to write a convolutional neural net. We integrate the convolution, pooling, fully connected net, and Softmax functions, then test it with randomly generated weights.
7.8.3 How It Works
Figure 7.10 shows the image processing neural network. It has one convolutional layer, one pooling layer, and a fully connected layer, and the final layer is the Softmax.
Figure 7.10 Neural net for the image processing.
>> TestNN
Image IMG_3886.png has a 13.1% chance of being a cat
As expected, the neural net does not identify the cat! The code in ConvolutionNN that performs the test is shown below.
Figure 7.11 shows the output of the various stages.
Figure 7.11 Stages in the convolutional neural net processing.
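The whole chain of Figure 7.10 can be sketched end to end in Python with random weights; the image size, single mask, and two-class output here are illustrative assumptions, not the book's exact network:

```python
import numpy as np

rng = np.random.default_rng(1)

def convolve(mask, mat):
    """'Valid' correlation of mask over mat (as in Convolve.m)."""
    n, m = mask.shape[0], mat.shape[0]
    return np.array([[np.sum(mask * mat[i:i + n, j:j + n])
                      for j in range(m - n + 1)]
                     for i in range(m - n + 1)])

def pool_mean(mat, n):
    """Mean-pool mat into an n x n grid of subregions."""
    s = mat.shape[0] // n
    return np.array([[mat[i * s:(i + 1) * s, j * s:(j + 1) * s].mean()
                      for j in range(n)] for i in range(n)])

def softmax(q):
    e = np.exp(q - np.max(q))
    return e / e.sum()

# Convolution -> pooling -> fully connected -> Softmax, random weights
img    = rng.random((16, 16))            # toy grayscale image
mask   = rng.standard_normal((3, 3))     # one feature mask
conv   = np.tanh(convolve(mask, img))    # convolutional layer, 14 x 14
pooled = pool_mean(conv, 7).ravel()      # pooling layer, 49 values
w = rng.standard_normal((2, 49))         # fully connected: 2 outputs
b = rng.standard_normal(2)
p = softmax(np.tanh(w @ pooled + b))     # p[0] plays the role of P(cat)
```

With untrained random weights, the output probabilities are essentially arbitrary, which is why TestNN fails to identify the cat.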
7.9 Recognizing an Image
7.9.1 Problem
We want to determine if an image is that of a cat.
7.9.2 Solution
We train the neural network with a series of cat images. We then use one picture from the training set and a separate picture and compute the probabilities that they are cats.
7.9.3 How It Works
We run the script TrainNN to see if the input image is a cat.
The script returns that the image is probably a cat.
>> TrainNN
Image IMG_3886.png has a 56.0% chance of being a cat
We can improve the results with
More images
More features (masks)
Changing the connections in the fully connected layer
Adding the ability of ConvolutionalNN to handle RGB images directly
Changing ConvolutionalNN
Summary
This chapter has demonstrated facial recognition using MATLAB. Convolutional neural nets were used to process pictures of cats for learning. When trained, the neural net was asked to identify other pictures to determine if they were pictures of a cat. Table 7.1 lists the code introduced in this chapter.
Table 7.1 Chapter Code Listing
File | Description |
---|---|
Activation | Generate activation functions |
ImageArray | Read in images in a folder and convert to grayscale |
ConvolutionalNN | Implement a convolutional neural net |
ConvolutionLayer | Implement a convolutional layer |
Convolve | Convolve a two-dimensional array using a mask |
Pool | Pool a two-dimensional array |
FullyConnectedNN | Implement a fully connected neural network |
ScaleImage | Scale an image |
Softmax | Implement the Softmax function |
TrainNN | Train the convolutional neural net |
TestNN | Test the convolutional neural net |
TrainingData.mat | Data from TestNN |
[1] Matthijs Hollemans. Convolutional neural networks on the iPhone with VGGNet. http://matthijshollemans.com/2016/08/30/vggnet-convolutional-neural-network-iphone/, 2016.