Spatial convolution and pooling

Geoffrey Hinton and his team have recently argued that combining pooling with convolution throws away spatial relationships in the image. Hinton instead proposes CapsNet, or Capsule Networks, an architecture designed to preserve the spatial integrity of the data that pooling discards. Now, this may not be a problem in all cases. For handwritten digits, spatial relationships don't matter that much. However, self-driving cars, and networks tasked with spatial problems in general (games being a prime example), often don't perform as well when using pooling. In fact, the team at Unity does not use pooling layers after convolution; let's understand why.

Pooling, or down-sampling, is a way of shrinking the data by collecting its common features together. The problem with this is that any spatial relationship in the data often gets lost entirely. The following diagram demonstrates MaxPooling2D with a pool size of (2, 2) applied to a convolution map:

Max pooling at work

Even in the simple preceding diagram, you can quickly appreciate that pooling loses track of which corner (upper-left, upper-right, lower-left, or lower-right) each pooled value started in. Note that, after a couple of layers of pooling, any sense of spatial relationship will be completely gone.
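
To make this concrete, here is a small, illustrative sketch (it is not part of the chapter's sample code) that max pools two 4 x 4 feature maps whose peak values sit in different corners of each 2 x 2 window. The pooled outputs are identical, which is exactly the positional information that gets thrown away:

import numpy as np

def max_pool_2x2(feature_map):
    # naive 2 x 2 max pooling with a stride of 2 over a 2D array
    h, w = feature_map.shape
    pooled = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

a = np.array([[9, 1, 2, 8],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [7, 3, 4, 6]])

b = np.array([[1, 9, 8, 2],   # the same values, shifted to different corners
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 7, 6, 4]])

print(max_pool_2x2(a))   # [[9. 8.] [7. 6.]]
print(max_pool_2x2(b))   # identical output: [[9. 8.] [7. 6.]]

Both inputs collapse to the same pooled map, so any layer downstream has no way of telling them apart.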

We can test the effect of removing pooling layers from the model by following these steps:

  1. Open the Chapter_2_3.py file and note how we commented out a couple of the pooling layers (you can also just delete those lines), like so:
x = Convolution2D(8, 3, 3)(img_in)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)   # the one pooling layer we keep

x = Convolution2D(16, 3, 3)(x)
x = Activation('relu')(x)
#x = MaxPooling2D(pool_size=(2, 2))(x)  # second pooling layer removed

x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
#x = MaxPooling2D(pool_size=(2, 2))(x)  # third pooling layer removed

  2. Note how we didn't comment out (or delete) all the pooling layers and left one in. In some cases, you may still want to keep a pooling layer or two, perhaps to identify features that are not spatially important. For example, when recognizing digits, position matters less than the overall shape. By contrast, when distinguishing one face from another, the distance between a person's eyes, mouth, and so on is exactly what sets faces apart. However, if you just wanted to detect that something is a face, with eyes, a mouth, and so on, then applying pooling could be quite acceptable.
  3. Next, we also increase the dropout rate on our Dropout layer, like so:
x = Dropout(.5)(x)
  4. We will explore dropout in some detail in the next section. For now, though, just realize that this change will have a positive effect on our model.
  5. Lastly, we bump up the number of epochs to 10 with the following code:
model.fit(train_X, train_Y, batch_size=64, epochs=10, validation_data=(val_X, val_Y), callbacks=callbacks_list)
  6. In our previous run, if you were watching the loss during training, you would have noticed that the last example more or less converged by the fourth epoch. Since dropping the pooling layers removes that down-sampling, the network now has much larger feature maps to work through and converges more slowly, so we also bump up the number of epochs. If you are not training on a GPU, this model will take a while, so be patient.
  7. Finally, run the example again with those minor changes. One of the first things you will notice is that the training time shoots up dramatically. Remember, this is because pooling layers do make training quicker, but at a cost in spatial information. This is one of the reasons we left a single pooling layer in.
  8. When the sample is finished running, compare the results with the Chapter_2_2.py sample we ran earlier. Did it do what you expected it to? If you want to see all of these changes in one place, a consolidated sketch follows these steps.
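
The following is a minimal, self-contained sketch of the modified setup, not the chapter's exact file: it uses the Keras 2 Conv2D naming, an assumed input shape and dense head, and random placeholder data in place of the chapter's dataset and callbacks_list, just so the snippet runs end to end:

import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Activation, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.models import Model

img_in = Input(shape=(64, 64, 1))            # assumed input shape
x = Conv2D(8, (3, 3))(img_in)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)        # the single pooling layer we keep

x = Conv2D(16, (3, 3))(x)
x = Activation('relu')(x)
# x = MaxPooling2D(pool_size=(2, 2))(x)      # removed, as in the exercise

x = Conv2D(32, (3, 3))(x)
x = Activation('relu')(x)
# x = MaxPooling2D(pool_size=(2, 2))(x)      # removed, as in the exercise

x = Flatten()(x)
x = Dropout(.5)(x)                           # the increased dropout rate
out = Dense(1)(x)                            # assumed single-output head

model = Model(inputs=img_in, outputs=out)
model.compile(optimizer='adam', loss='mse')

# Random placeholder data; substitute the chapter's train_X/train_Y, val_X/val_Y,
# and callbacks_list when working with the real Chapter_2_3.py file.
train_X, train_Y = np.random.rand(32, 64, 64, 1), np.random.rand(32, 1)
val_X, val_Y = np.random.rand(8, 64, 64, 1), np.random.rand(8, 1)
model.fit(train_X, train_Y, batch_size=64, epochs=10, validation_data=(val_X, val_Y))
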
We focus on this particular blog post because it is extremely well presented and well written. The author obviously knew his stuff, but this example shows just how important it is to understand the fundamentals of these concepts in as much detail as you can handle. That is no easy task given the flood of information out there, but it also reinforces the fact that developing working deep learning models is not a trivial task, at least not yet.

Now that we understand the cost/penalty of pooling layers, we can move on to the next section, where we jump back to understanding Dropout. It is an excellent tool you will use over and over again.
