Convolution neural networks

This section is provided as a brief introduction to convolution neural networks without the Scala implementation.

So far, the layers of perceptrons have been organized as a fully connected network. Clearly, the number of synapses or weights increases significantly as the number and size of the hidden layers increase. For instance, a network with a features set of dimension 6, 3 hidden layers of 64 nodes each, and one output value requires 7*64 + 2*65*64 + 65*1 = 8833 weights!
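As a quick sanity check, the weight count in this example can be computed from the layer sizes, each layer contributing (size + 1 bias) * next size connections. This is a minimal sketch; the layer sizes are taken from the example above:

```scala
// Layer sizes from the example: 6 inputs, 3 hidden layers of 64, 1 output
val layers = List(6, 64, 64, 64, 1)

// Each pair of consecutive layers contributes (in + 1 bias) * out weights
val numWeights = layers.sliding(2).map { case List(in, out) => (in + 1) * out }.sum

println(numWeights)  // 8833
```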

Applications such as image or character recognition require a very large features set, making the training of a fully connected layered perceptron very computationally intensive. Moreover, these applications need to convey spatial information, such as the proximity of pixels, as part of the features vector.

A recent approach, known as convolution neural networks, consists of limiting the number of nodes in the hidden layers that an input node is connected to. In other words, the methodology leverages spatial localization to reduce the complexity of connectivity between the input and the hidden layer [9:15]. The subset of input nodes connected to a single neuron in the hidden layer is known as the local receptive field.

Local receptive fields

The neuron of the hidden layer learns from a local receptive field, a subimage of n by n pixels, each of these pixels being an input value. The next local receptive field, shifted by one pixel in any direction, is connected to the next neuron in the first hidden layer. This first hidden layer is known as the convolution layer. An illustration of the mapping between the input (image) and the first hidden layer (convolution layer) is as follows:


The generation of a convolution layer from an image

It would make sense for each n by n local receptive field to have its own bias element (+1) connected to the hidden neuron. However, the extra complexity does not lead to a more accurate model; therefore, the bias is shared across the neurons of the convolution layer.

Sharing of weights

The local receptive fields representing a small section of the image are generated by shifting the field by one pixel (up, down, left, or right). Therefore, the weights associated with the local receptive fields are also shared across the neurons in the hidden layer. As a matter of fact, the same feature, such as a color or an edge, can be detected at many pixel locations across the image. The maps between the input features and the neurons in the hidden layer, known as features maps, share weights across the convolution layer. The output is computed using the activation function.

Note

Tanh versus the sigmoid activation

The sigmoid is predominantly used in the examples related to the multilayer perceptron as the activation function for the hidden layer. The hyperbolic tangent function is commonly used for convolution networks.

Convolution layers

The output computed from the features maps is expressed as a convolution that is similar to the convolution used in a discrete Fourier transform-based filter (refer to the M11 mathematical expression in the DFT-based filtering section under Fourier analysis in Chapter 3, Data Preprocessing). The activation function that computes the output in the hidden layer has to be modified to take into account the local receptive fields.

Note

The activation of a convolution neural network

M13: The output value z_ij for a shared bias w0, an activation function σ, a local receptive field of n by n pixels, input values x_ij, and shared weights w_uv associated with a features map is given by:

$$ z_{ij} = \sigma\left(w_0 + \sum_{u=0}^{n-1}\sum_{v=0}^{n-1} w_{uv}\, x_{i+u,\,j+v}\right) $$
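The activation of a single neuron in the convolution layer can be sketched as follows. This is a minimal illustration, assuming a tanh activation, a 2 by 2 receptive field, and arbitrary example values for the shared bias, shared weights, and input pixels:

```scala
val n = 2                                          // receptive field of n by n pixels
val w0 = 0.1                                       // shared bias (assumed value)
val w = Array(Array(0.2, -0.1), Array(0.4, 0.3))   // shared weights w_uv (assumed values)
val x = Array(                                     // input pixels x_ij (assumed values)
  Array(1.0, 0.5, 0.2),
  Array(0.3, 0.8, 0.6),
  Array(0.7, 0.1, 0.9)
)

// Output of the hidden neuron attached to the receptive field anchored at (i, j)
def z(i: Int, j: Int): Double = {
  val sum = (for (u <- 0 until n; v <- 0 until n)
    yield w(u)(v) * x(i + u)(j + v)).sum
  math.tanh(w0 + sum)
}

println(z(0, 0))
```

Shifting the receptive field by one pixel (incrementing i or j) reuses the same weights w and bias w0, which is precisely the weight sharing described earlier.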

The next step in building the neural network would be to feed the output of the convolution layer into a fully connected hidden layer. However, the features maps in the convolution layer are usually similar enough that they can be reduced to a smaller set of outputs using an intermediate layer known as the subsampling layer [9:16].

Subsampling layers

Each features map in the convolution layer is reduced or condensed into a smaller features map. The layer composed of these smaller features maps is known as the subsampling layer. The purpose of subsampling is to reduce the sensitivity of the weights to minute, nonsignificant changes between adjacent pixels in the image:


The connectivity between features maps from a convolution layer to a subsampling layer

The subsampling layer is sometimes referred to as the pooling layer.
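A common way to condense a features map is max pooling, in which each small block of the map is replaced by its maximum value. The following is a minimal sketch, assuming 2 by 2 max pooling with a stride of 2 over an arbitrary 4 by 4 features map:

```scala
// Features map produced by the convolution layer (assumed example values)
val featureMap = Array(
  Array(0.1, 0.4, 0.2, 0.8),
  Array(0.5, 0.3, 0.7, 0.6),
  Array(0.9, 0.2, 0.1, 0.3),
  Array(0.4, 0.6, 0.5, 0.2)
)

// Each 2x2 block is condensed into its maximum value (stride of 2)
val pooled = (0 until 4 by 2).map { i =>
  (0 until 4 by 2).map { j =>
    (for (u <- 0 to 1; v <- 0 to 1) yield featureMap(i + u)(j + v)).max
  }
}

println(pooled)  // Vector(Vector(0.5, 0.8), Vector(0.9, 0.5))
```

The 4 by 4 map is condensed into a 2 by 2 map, so small shifts of a feature within a pooling block do not change the output.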

Putting it all together

The last layers of the convolution neural network are a fully connected hidden layer and the output layer, which are subject to the same transformations as the traditional multilayer perceptron. The output values can be computed using a linear product or a softmax function:


An overview of a convolution neural network
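The softmax computation for the output layer can be sketched as follows. This is a minimal illustration, assuming the raw scores (the linear products of the last fully connected layer) are already available; the score values are arbitrary:

```scala
// Raw scores from the last fully connected layer (assumed example values)
val scores = Array(1.0, 2.0, 3.0)

// Softmax: exponentiate (shifted by the max for numerical stability), then normalize
val exps = scores.map(s => math.exp(s - scores.max))
val softmax = exps.map(_ / exps.sum)

println(softmax.toList)  // class probabilities summing to 1
```

The shift by the maximum score leaves the probabilities unchanged but avoids overflow for large scores.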

The error backpropagation algorithm described in the Step 2 – error backpropagation section has to be modified to support the features maps [9:17].

Note

The architecture of convolution networks

Deep convolution neural networks have multiple sequences of convolution and subsampling layers and may have more than one fully connected hidden layer.
