CNN architecture

Taking as an example the input matrix 5x5 as shown earlier, a CNN consists of an input layer consisting of 25 neurons (5x5 = 25) whose task is to acquire the input value corresponding to each pixel and transfer it to the next hidden layer.

In a multilayer network, the outputs of all neurons of the input layer would be connected to each neuron of the hidden layer (fully-connected layer).

In CNN networks, the connection scheme that defines the convolutional layer that we are going to describe is significantly different.

As you can probably guess, this is the main type of layer; the use of one or more of these layers in a CNN is indispensable.

In a convolutional layer, each neuron is connected to a certain region of the input area called the receptive field.

For example, using a 3x3 kernel filter, each neuron will have a bias and 9=3x3 weights connected to a single receptive field. Of course, to effectively recognize an image, we need different kernel filters applied to the same receptive field, because each filter should recognize a different feature's image.

The set of neurons that identify the same feature define a single feature map.

The following figure shows a CNN architecture in action; the input image of a size of 28x28 will be analyzed by a convolutional layer composed of a 32 features map with a size of 28x28. The figure also shows a receptive field and the kernel filter of a 3x3 size:

CNN in action

A CNN may consist of several convolution layers connected in a cascade. The output of each convolution layer is a set of feature maps (each generated by a single kernel filter), and all these matrices define a new input that will be used by the next layer.

Usually in a CNN, each neuron produces an output, after an activation threshold, proportional to the input and not bounded; generally, the activation function used is the ReLU function that we introduced in Chapter 3, Using TensorFlow on a Feed-Forward Neural Network.

CNNs also use pooling layers positioned immediately after the convolutional layers. A pooling layer divides a convolutional region into sub regions and selects a single representative value (max-pooling or average pooling) to reduce the computational time of subsequent layers and increase the robustness of the feature with respect to its spatial position.

The last hidden layer of a convolutional network is generally a fully-connected network with a softmax activation function for the output layer.

Table of Contents for CNN architecture

Create new playlist

Sign In

Sign Up

Table of Contents for
CNN architecture