Chapter 4. Convolutional Neural Networks

In this chapter, we will talk about Convolutional Neural Networks (CNNs), one of the crowning achievements of deep learning. CNNs have achieved excellent results in many practical applications, particularly in the field of object recognition in images. We will explain and implement the LeNet architecture (LeNet-5), which was the first CNN to have great success on the classic MNIST digit classification problem. We will also analyze AlexNet, a deep CNN designed by Alex Krizhevsky, and use these networks to introduce transfer learning, a machine learning technique that reuses a pre-trained neural network for a new task. We will then introduce the VGG architecture, a deep CNN for object recognition developed by Oxford University's renowned Visual Geometry Group (VGG), which performed very well on the ImageNet dataset. This architecture gives us the opportunity to show how a neural network can redraw a picture in a given artistic style (artistic style learning).

We will move on to the Inception-v3 model, which was created for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) using the data from the 2012 competition. This is a standard task in computer vision, in which models try to classify 1.2 million images into 1,000 different categories. We'll demonstrate how to train your own image classifier with Inception in TensorFlow. The last example is taken from the Kaggle platform: the goal is to train a network on a series of facial images to classify the emotion they express. We'll evaluate the accuracy of the model and then test it on a single image that does not belong to the original dataset. The topics covered in this chapter are as follows:

  • Main concepts of CNNs
  • CNNs in action
  • LeNet and the MNIST classification problem
  • AlexNet and transfer learning
  • VGG and artistic style learning
  • Inception-v3 model
  • Emotion recognition

Main concepts of CNNs

Recently, Deep Neural Networks (DNNs) have given fresh impetus to research and are now being used widely. CNNs are a special type of DNN that has been used with great success in image classification problems. Before diving into the implementation of an image classifier based on CNNs, we'll introduce some basic concepts of image recognition, such as feature detection and convolution.

In computer vision, a digital image is represented as a grid composed of a large number of small squares called pixels. The following figure represents a black and white image as a 5×5 grid of pixels:

Figure 1: Pixel view of a black and white image.

Each element in the grid corresponds to a pixel. In the case of a black and white image, a value of 1 is associated with black and a value of 0 is associated with white. Alternatively, for a grayscale image, the allowed values for each grid element are in the range [0, 255], where 0 is associated with black and 255 is associated with white.

Finally, a color image is represented by a group of three matrices, each corresponding to one color channel (red, green, and blue). Each element of each matrix takes a value in the range [0, 255] that specifies the brightness of the corresponding base color. This is shown in the following figure, in which each matrix is 4×4 and the number of color channels is three:

Figure 2: Color image
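
To make these representations concrete, the following is a minimal NumPy sketch of the three cases (the pixel values are arbitrary and purely illustrative): a binary black and white image, a grayscale image, and a three-channel color image:

    import numpy as np

    # 5x5 black and white image: 1 = black, 0 = white (values are illustrative)
    bw_image = np.array([[0, 0, 1, 0, 0],
                         [0, 1, 1, 1, 0],
                         [1, 1, 1, 1, 1],
                         [0, 1, 1, 1, 0],
                         [0, 0, 1, 0, 0]])

    # 5x5 grayscale image: intensities in [0, 255], 0 = black, 255 = white
    gray_image = np.random.randint(0, 256, size=(5, 5), dtype=np.uint8)

    # 4x4 color image: three 4x4 matrices (red, green, and blue channels),
    # each value in [0, 255] giving the brightness of that base color
    color_image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

    print(bw_image.shape)     # (5, 5)
    print(color_image.shape)  # (4, 4, 3) -> height, width, channels

Note that frameworks differ in where they place the channel axis; here we use the channels-last convention (height, width, channels), which is TensorFlow's default.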

Let's now focus on the 5×5 matrix of the black and white image. Suppose we slide a second matrix of smaller dimensions, for example a 3×3 matrix (see the figure below), across the width and height of the image matrix:

Figure 3: Kernel filter

This sliding matrix is called a kernel filter or a feature detector. As the kernel filter moves across the input matrix (the input image), at each position it computes the scalar (dot) product of the kernel values and the values of the image patch it currently covers. The results form a new matrix called a convolution matrix.

The next figure displays the convolution procedure: the convolved feature (the resulting 3×3 matrix) is generated by sliding the kernel filter (the 3×3 matrix) over the input image (the 5×5 matrix). With a stride of 1 and no padding, a 5×5 input convolved with a 3×3 kernel yields an output of (5 - 3 + 1) × (5 - 3 + 1) = 3×3:

Figure 4: The input image (the 5×5 matrix on the left), the kernel filter (the 3×3 matrix over the input image), and the convolved feature (the 3×3 matrix on the right)
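
To make the operation concrete, here is a minimal NumPy sketch of this convolution with a stride of 1 and no padding (a "valid" convolution); the input image and kernel values below are illustrative, not taken from the figures above:

    import numpy as np

    def convolve2d(image, kernel):
        # Slide the kernel over the image (stride 1, no padding) and
        # compute the dot product at each position ("valid" convolution)
        ih, iw = image.shape
        kh, kw = kernel.shape
        oh, ow = ih - kh + 1, iw - kw + 1  # output size: n - k + 1 per axis
        output = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i:i + kh, j:j + kw]
                output[i, j] = np.sum(patch * kernel)
        return output

    image = np.array([[1, 1, 1, 0, 0],
                      [0, 1, 1, 1, 0],
                      [0, 0, 1, 1, 1],
                      [0, 0, 1, 1, 0],
                      [0, 1, 1, 0, 0]])

    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])

    print(convolve2d(image, kernel))  # the 3x3 convolved feature; e.g. entry [0, 0] is 4

Like most deep learning libraries, this sketch does not flip the kernel before applying it, so strictly speaking it computes a cross-correlation; in TensorFlow, the same operation is provided by tf.nn.conv2d.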
