Applying pooling operations in TensorFlow

In TensorFlow, a subsampling layer is normally implemented with the max_pool operation, configured with the layer's own parameters (window size, stride, and padding). The operation has the following signature:

tf.nn.max_pool(value, ksize, strides, padding, data_format, name) 

Now let's create a function that wraps this operation and returns the max-pooled output tensor, which has the same type as the input (tf.float32 here):

import tensorflow as tf
def maxpool2d(x, k=2):
    # Pool over k x k windows with stride k; the batch and channel dimensions are left untouched
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

The parameters of tf.nn.max_pool can be described as follows:

  • value: A 4D tensor of float32 elements with shape (batch size, height, width, channels)
  • ksize: A list of integers representing the window size on each dimension
  • strides: A list of integers giving the step of the moving window on each dimension
  • data_format: The ordering of the input dimensions; NHWC, NCHW, and NCHW_VECT_C are supported
  • padding: VALID (no padding; windows that do not fit entirely inside the input are dropped) or SAME (the input is padded so that the output size is the input size divided by the stride, rounded up)
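To make the roles of ksize and strides concrete, the following is a minimal pure-Python sketch of max pooling over a single-channel image with VALID padding. It is not how TensorFlow implements the operation; the function name max_pool_2d_valid and the 4 x 4 sample image are hypothetical and chosen only for illustration.

```python
def max_pool_2d_valid(image, k=2, stride=2):
    """Max-pool a single-channel 2D image (a list of rows) without padding."""
    height, width = len(image), len(image[0])
    # Count only the windows that fit entirely inside the image (VALID behaviour).
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Collect the k x k window whose top-left corner is (top, left).
            window = [image[top + di][left + dj]
                      for di in range(k) for dj in range(k)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 input, used only to show the effect of a 2 x 2 window with stride 2.
image = [[1, 3, 2, 4],
         [5, 6, 7, 8],
         [9, 2, 1, 0],
         [3, 4, 5, 6]]
print(max_pool_2d_valid(image))  # [[6, 8], [9, 6]]
```

Each output value is the maximum of one non-overlapping 2 x 2 block, so a 4 x 4 input shrinks to 2 x 2.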

Depending on the layer structure of the CNN, TensorFlow supports other pooling operations as well:

  • tf.nn.avg_pool: This returns a reduced tensor with the average of each window
  • tf.nn.max_pool_with_argmax: This returns the max_pool tensor and a tensor with the flattened index of max_value
  • tf.nn.avg_pool3d: This performs an avg_pool operation with a cubic-like window; the input has an added depth
  • tf.nn.max_pool3d: This performs the same function as (...) but applies the max operation
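The difference between average pooling and max pooling with argmax can be sketched in plain Python. This is only an illustration under simplifying assumptions: a single-channel image, no padding, and a flattened index computed as y * width + x, which is what the flattened index reduces to for one channel when the batch dimension is not included. The helper names iter_windows, avg_pool, and max_pool_with_argmax are hypothetical and are not the TensorFlow functions themselves.

```python
def iter_windows(image, k=2, stride=2):
    """Yield each k x k window as a list of (value, flat_index) pairs."""
    height, width = len(image), len(image[0])
    for top in range(0, height - k + 1, stride):
        for left in range(0, width - k + 1, stride):
            yield [(image[y][x], y * width + x)
                   for y in range(top, top + k)
                   for x in range(left, left + k)]


def avg_pool(image, k=2, stride=2):
    """Average of every window, as a flat list in row-major window order."""
    return [sum(v for v, _ in w) / len(w) for w in iter_windows(image, k, stride)]


def max_pool_with_argmax(image, k=2, stride=2):
    """Maximum of every window, plus the flattened input index of that maximum."""
    best = [max(w, key=lambda pair: pair[0]) for w in iter_windows(image, k, stride)]
    return [v for v, _ in best], [i for _, i in best]


image = [[2., 4., 6., 8.],
         [10., 12., 14., 16.]]
print(avg_pool(image))              # [7.0, 11.0]
print(max_pool_with_argmax(image))  # ([12.0, 16.0], [5, 7])
```

The argmax indices 5 and 7 point at the values 12 and 16 in the row-major flattening of the 2 x 4 input.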

Now let's see a concrete example of how padding works in TensorFlow. Suppose we have an input image x with shape [2, 4] and one channel, and we want to see the effect of both VALID and SAME padding:

  • valid_pad: Max pool with 2 x 2 kernel, stride 2, and VALID padding
  • same_pad: Max pool with 2 x 2 kernel, stride 2, and SAME padding

Let's see how we can do this in Python and TensorFlow. First, we define the single-channel input image of shape [2, 4]:

import tensorflow as tf 
x = tf.constant([[2., 4., 6., 8.],
                 [10., 12., 14., 16.]]) 

Now let's give it a shape accepted by tf.nn.max_pool:

x = tf.reshape(x, [1, 2, 4, 1]) 

To apply max pooling with a 2 x 2 kernel, stride 2, and VALID padding:

VALID = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') 

On the other hand, max pooling with a 2 x 2 kernel, stride 2, and SAME padding looks like this:

SAME = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME') 

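With VALID padding, nothing is added to the input, and only windows that fit entirely inside the image are used. With SAME padding, the image is padded (with -inf) where necessary so that no input values are dropped. Because this input is 2 x 4 and the stride is 2, two windows fit exactly and SAME padding has nothing to add, so both cases give an output of shape [1, 2]. The two would differ for an input of shape [2, 3]: VALID would drop the last column and produce shape [1, 1], whereas SAME would pad the image to shape [2, 4] and produce shape [1, 2]. Let's validate the shapes for our input: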

print(VALID.get_shape())  
print(SAME.get_shape())  
>>> 
(1, 1, 2, 1) 
(1, 1, 2, 1) 
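The printed shapes can be checked against the usual output-size rules for the two padding modes. The following sketch computes the spatial output size along one dimension; the helper name pooled_size is hypothetical, and the [2, 3] case is included only as an assumed comparison input to show where VALID and SAME start to differ.

```python
import math


def pooled_size(n, k, stride, padding):
    """Output length along one spatial dimension for a pooling window of size k."""
    if padding == 'VALID':
        # Only windows that fit entirely inside the input are kept.
        return (n - k) // stride + 1
    if padding == 'SAME':
        # The input is padded so that every stride position yields a window.
        return math.ceil(n / stride)
    raise ValueError('padding must be VALID or SAME')


# Compare the [2, 4] input used above with an assumed [2, 3] input.
for height, width in [(2, 4), (2, 3)]:
    for padding in ('VALID', 'SAME'):
        shape = (pooled_size(height, 2, 2, padding),
                 pooled_size(width, 2, 2, padding))
        print((height, width), padding, shape)
# (2, 4) VALID (1, 2)
# (2, 4) SAME (1, 2)
# (2, 3) VALID (1, 1)
# (2, 3) SAME (1, 2)
```

For the 2 x 4 input both modes agree, which matches the two identical shapes printed above; only the 2 x 3 input needs padding.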