Intuition and justification

We have already mentioned, in Chapter 3, Deep Learning Fundamentals, the paper published in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, titled ImageNet Classification with Deep Convolutional Neural Networks. Though the genesis of convolutional networks may be traced back to the '80s, that was one of the first papers to highlight the deep importance of convolutional networks in image processing and recognition, and today almost no deep neural network used for image recognition works without some convolutional layer.

An important problem we have seen when working with classical feed-forward networks is that they may overfit, especially with medium to large images. This is often due to the fact that neural networks have a very large number of parameters; in classical neural nets, every neuron in a layer is connected to each and every neuron in the next. When the number of parameters is large, overfitting is more likely. Let's look at the following images: we can fit the data by drawing a line that goes exactly through all the points, or, better, a line that does not match the data exactly but is more likely to predict future examples.

[Figure: input data points scattered around a parabola]

The points in the figure represent input data points. Although they clearly follow the shape of a parabola, noise in the data means they do not lie exactly on one.

In the first of the two pictures shown next, we overfit the data. In the second, we have matched our prediction to the data in such a way that it is more likely to predict future data correctly. In the second case, we need just three parameters to describe the curve, y = ax² + bx + c, while in the first case we would need many more parameters to write the equation for that curve. This gives an intuitive explanation of why having too many parameters may not be a good thing and may lead to overfitting. A classical feed-forward network for an image as small as those in the CIFAR-10 examples (CIFAR-10 is an established computer-vision dataset consisting of 60,000 32 x 32 images divided into 10 classes; we will see a couple of examples from this dataset in this chapter) has inputs of size 3 x 32 x 32 = 3,072, which is already about four times as large as a simple MNIST digit image (28 x 28 = 784). A larger image, say 3 x 64 x 64, would have about 16 times as many input neurons, multiplying the number of connection weights accordingly:

[Figure: two fits of the same data, an exact fit on the left and a smooth approximation on the right]

In the left figure we draw a curve that matches the data exactly. In the right figure we draw a curve that approximates the shape of the line connecting the data points, but does not pass exactly through them. The second curve, though less precise on the current input, is more likely to predict future data points correctly than the curve in the first figure.
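
To make this concrete, here is a minimal sketch (not from the original text) using NumPy that fits noisy samples from a parabola with a three-parameter curve and with a ten-parameter curve; the polynomial degrees, noise level, and random seed are arbitrary choices for illustration:

    import numpy as np

    np.random.seed(0)

    # Noisy samples from the parabola y = x^2
    x = np.linspace(-3, 3, 10)
    y = x**2 + np.random.normal(scale=1.0, size=x.shape)

    # Three-parameter fit (degree 2) versus a ten-parameter fit (degree 9)
    p2 = np.polyfit(x, y, 2)   # 3 coefficients: a, b, c
    p9 = np.polyfit(x, y, 9)   # 10 coefficients: passes (almost) exactly
                               # through every training point

    # Evaluate both fits on unseen points from the same parabola
    x_new = np.linspace(-3, 3, 50)
    y_true = x_new**2
    err2 = np.mean((np.polyval(p2, x_new) - y_true) ** 2)
    err9 = np.mean((np.polyval(p9, x_new) - y_true) ** 2)
    print("degree-2 error on new points:", err2)
    print("degree-9 error on new points:", err9)

The exact numbers depend on the random noise, but the degree-9 fit, despite matching the training points almost perfectly, typically shows the larger error on unseen points, which is exactly the overfitting behavior pictured above.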

Convolutional networks reduce the number of parameters needed, since they require neurons to connect only locally to neurons corresponding to neighboring pixels, and therefore help avoid overfitting. In addition, reducing the number of parameters also helps computationally. In the next section, we will introduce some convolutional layer examples to help the intuition, and then we will formally define them.
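
As a rough back-of-the-envelope comparison, the following sketch counts the weights of a fully connected layer versus a convolutional layer on a CIFAR-10-sized input; the hidden-layer size and the filter shape are assumptions made purely for illustration:

    # Input sizes discussed above
    mnist_inputs = 28 * 28        # 784 values for an MNIST digit
    cifar_inputs = 3 * 32 * 32    # 3072 values for a CIFAR-10 image
    large_inputs = 3 * 64 * 64    # 12288 values for a 3 x 64 x 64 image

    print(cifar_inputs / mnist_inputs)  # ~3.9: about four times MNIST
    print(large_inputs / mnist_inputs)  # ~15.7: about 16 times MNIST

    # Fully connected layer: every input connects to every hidden neuron
    hidden_neurons = 1000                       # assumed layer size
    fc_weights = cifar_inputs * hidden_neurons  # 3,072,000 weights

    # Convolutional layer: each neuron sees only a local 5 x 5 patch of
    # the 3-channel input, and the same weights are reused everywhere
    filters = 32                                # assumed filter count
    conv_weights = filters * 3 * 5 * 5          # 2,400 weights

    print(fc_weights, conv_weights)

The gap of three orders of magnitude comes from both local connectivity and weight sharing; we will define both precisely in the next section.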
