In TensorFlow, a subsampling (pooling) layer is normally implemented with a max_pool operation while keeping the layer's initial parameters. In TensorFlow, max_pool has the following signature:
tf.nn.max_pool(value, ksize, strides, padding, data_format, name)
Now let's learn how to create a function that utilizes the preceding signature and returns a tensor with type tf.float32, that is, the max pooled output tensor:
import tensorflow as tf
def maxpool2d(x, k=2):
    # Max pooling with a k x k window and stride k
    return tf.nn.max_pool(x, ksize=[1, k, k, 1],
                          strides=[1, k, k, 1], padding='SAME')
In the preceding code segment, the parameters can be described as follows:
- value: This is a 4D tensor of float32 elements with shape [batch, height, width, channels]
- ksize: A list of integers representing the window size on each dimension
- strides: The step of the moving windows on each dimension
- padding: VALID or SAME
- data_format: NHWC, NCHW, and NCHW_VECT_C are supported
- name: An optional name for the operation
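To make these parameters concrete, here is a minimal pure-Python sketch (a hypothetical helper, not the TensorFlow API) of what a k x k max pool with stride k computes on a single-channel image whose dimensions divide evenly by k:

```python
# Pure-Python sketch of k x k max pooling with stride k, assuming a
# single channel and input dimensions divisible by k.
def maxpool2d_ref(image, k=2):
    rows = len(image) // k
    cols = len(image[0]) // k
    return [[max(image[i * k + di][j * k + dj]
                 for di in range(k) for dj in range(k))
             for j in range(cols)]
            for i in range(rows)]

image = [[2., 4., 6., 8.],
         [10., 12., 14., 16.]]
print(maxpool2d_ref(image))  # [[12.0, 16.0]]
```

Each output element is simply the maximum over one non-overlapping 2 x 2 window of the input.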
However, depending upon the layering structures in a CNN, there are other pooling operations supported by TensorFlow, as follows:
- tf.nn.avg_pool: This returns a reduced tensor with the average of each window
- tf.nn.max_pool_with_argmax: This returns the max_pool tensor and a tensor with the flattened index of max_value
- tf.nn.avg_pool3d: This performs an avg_pool operation with a cubic-like window; the input has an added depth
- tf.nn.max_pool3d: This performs the same operation as tf.nn.avg_pool3d but applies the max operation
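To illustrate the first two operations, here is a pure-Python sketch (hypothetical helpers, not the TensorFlow API) of average pooling and of max pooling with the flattened argmax, over 2 x 2 windows with stride 2 on a single-channel image:

```python
# Pure-Python sketches, assuming a single channel, 2 x 2 windows,
# stride 2, and input dimensions divisible by 2.
def avgpool2d_ref(image, k=2):
    # Average of each k x k window (what tf.nn.avg_pool computes).
    return [[sum(image[i * k + di][j * k + dj]
                 for di in range(k) for dj in range(k)) / (k * k)
             for j in range(len(image[0]) // k)]
            for i in range(len(image) // k)]

def maxpool_with_argmax_ref(image, k=2):
    # Max of each window plus the flattened index of that maximum
    # (the idea behind tf.nn.max_pool_with_argmax).
    w = len(image[0])
    pooled, argmax = [], []
    for i in range(len(image) // k):
        prow, arow = [], []
        for j in range(len(image[0]) // k):
            window = [(image[i * k + di][j * k + dj],
                       (i * k + di) * w + (j * k + dj))
                      for di in range(k) for dj in range(k)]
            val, flat = max(window)
            prow.append(val)
            arow.append(flat)
        pooled.append(prow)
        argmax.append(arow)
    return pooled, argmax

image = [[2., 4., 6., 8.],
         [10., 12., 14., 16.]]
print(avgpool2d_ref(image))            # [[7.0, 11.0]]
print(maxpool_with_argmax_ref(image))  # ([[12.0, 16.0]], [[5, 7]])
```

The flattened indices 5 and 7 point at the values 12 and 16 in the row-major layout of the 2 x 4 input.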
Now let's see a concrete example of how padding works in TensorFlow. Suppose we have an input image x of shape [2, 4] with one channel. We want to see the effect of both VALID and SAME paddings:
- valid_pad: Max pool with 2 x 2 kernel, stride 2, and VALID padding
- same_pad: Max pool with 2 x 2 kernel, stride 2, and SAME padding
Let's see how we can achieve this in Python and TensorFlow. First, we define the single-channel input image of shape [2, 4]:

import tensorflow as tf

x = tf.constant([[2., 4., 6., 8.],
                 [10., 12., 14., 16.]])
Now let's give it a shape accepted by tf.nn.max_pool:
x = tf.reshape(x, [1, 2, 4, 1])
To apply the max pool with a 2 x 2 kernel, stride 2, and VALID padding:
VALID = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
On the other hand, to use the max pool with a 2 x 2 kernel, stride 2, and SAME padding:
SAME = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
For VALID padding, no padding is applied, and since both input dimensions (2 and 4) divide evenly by the stride, the output shape is [1, 2]. For SAME padding, the output shape is also [1, 2]: SAME would pad the image with -inf before applying the max pool, but no padding is actually needed here because the dimensions already divide evenly. Let's validate this:
print(VALID.get_shape())
print(SAME.get_shape())

>>> (1, 1, 2, 1)
>>> (1, 1, 2, 1)
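The shapes above follow TensorFlow's documented output-size rules: for VALID padding, out = ceil((in - k + 1) / stride); for SAME padding, out = ceil(in / stride). A quick check with a hypothetical helper:

```python
import math

def out_size(in_size, k, stride, padding):
    # TensorFlow's output-size rules for pooling and convolution.
    if padding == 'VALID':
        return math.ceil((in_size - k + 1) / stride)
    return math.ceil(in_size / stride)  # SAME

for padding in ('VALID', 'SAME'):
    h = out_size(2, 2, 2, padding)
    w = out_size(4, 2, 2, padding)
    print(padding, (1, h, w, 1))
# VALID (1, 1, 2, 1)
# SAME (1, 1, 2, 1)
```

The two paddings only diverge when a dimension does not divide evenly: for a width-3 input, out_size(3, 2, 2, 'VALID') gives 1 (the last column is dropped), while out_size(3, 2, 2, 'SAME') gives 2 (the input is padded out to width 4).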