Convolution operations in TensorFlow

TensorFlow provides a variety of convolution methods. The canonical form is the conv2d operation. Let's have a look at its usage:

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)

The parameters we use are as follows:

  • input: The operation is applied to this original tensor. It must have exactly four dimensions; the default dimension order is NHWC, as described under data_format next.
  • filter: This is a tensor representing the kernel or filter. It has a generic four-dimensional shape: (filter_height, filter_width, in_channels, out_channels).
  • strides: This is a list of four ints that specify the stride of the sliding window for each dimension of the input.
  • padding: This can be SAME or VALID. SAME pads the input so that the output preserves the spatial dimensions of the input tensor (at stride 1), while VALID applies no padding, so each spatial dimension shrinks to floor((input − filter) / stride) + 1; see the sketch after this list. We will see later how to perform padding along with the pooling layers.
  • use_cudnn_on_gpu: This indicates whether to use the cuDNN library (NVIDIA's CUDA deep neural network library) to accelerate the calculations on the GPU.
  • data_format: This specifies the order in which the input data is organized (NHWC or NCHW).
  • dilations: An optional 1D tensor of length 4, defaulting to [1, 1, 1, 1], giving the dilation factor for each dimension of input. If it is set to k > 1, there will be k − 1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format; the dilation factors for the batch and depth dimensions must be 1.
  • name: A name for the operation (optional).
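
To make the effect of padding concrete, here is a minimal sketch (assuming TensorFlow 1.x; the toy shapes are chosen only for illustration) comparing the output shapes produced by SAME and VALID:

import tensorflow as tf

# Toy input: a batch of 1 image, 5 x 5 pixels, 1 channel (NHWC order)
image = tf.ones([1, 5, 5, 1])
# A 3 x 3 kernel with 1 input channel and 1 output channel
kernel = tf.ones([3, 3, 1, 1])

same = tf.nn.conv2d(image, kernel, strides=[1, 1, 1, 1], padding='SAME')
valid = tf.nn.conv2d(image, kernel, strides=[1, 1, 1, 1], padding='VALID')

print(same.shape)   # (1, 5, 5, 1): SAME preserves the spatial size at stride 1
print(valid.shape)  # (1, 3, 3, 1): VALID shrinks it to (5 - 3)/1 + 1 = 3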

The following is an example of a convolutional layer. It chains a convolution with a bias addition and finally returns the activation function we have chosen for the whole layer (in this case, the ReLU operation, which is a frequently used one):

def conv_layer(data, weights, bias, strides=1):
    # Convolve the 4D NHWC input with the kernel, preserving the
    # spatial size via SAME padding
    x = tf.nn.conv2d(data,
                     weights,
                     strides=[1, strides, strides, 1],
                     padding='SAME')
    # Add one bias term per output channel, then apply the activation
    x = tf.nn.bias_add(x, bias)
    return tf.nn.relu(x)
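
As a quick usage sketch (the shapes here are hypothetical, assuming TensorFlow 1.x), the weights and bias must agree with the input's channel count:

data = tf.ones([8, 28, 28, 3])  # a batch of 8 RGB images
weights = tf.Variable(tf.random_normal([5, 5, 3, 16]))  # 16 filters of 5 x 5
bias = tf.Variable(tf.zeros([16]))  # one bias per output channel

out = conv_layer(data, weights, bias, strides=2)
print(out.shape)  # (8, 14, 14, 16): SAME padding with stride 2 halves H and W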

Here, data is the 4D input tensor (batch size, height, width, and channels). TensorFlow also offers a few other kinds of convolutional layers. For example:

  • tf.layers.conv1d() creates a convolutional layer for 1D inputs. This is useful, for example, in NLP, where a sentence may be represented as a 1D array of words, and the receptive field covers a few neighboring words.
  • tf.layers.conv3d() creates a convolutional layer for 3D inputs.
  • tf.nn.atrous_conv2d() creates an à trous convolutional layer (à trous is French for "with holes"). This is equivalent to using a regular convolutional layer with a filter dilated by inserting rows and columns of zeros. For example, a 1 × 3 filter equal to [1, 2, 3] may be dilated with a dilation rate of 4, resulting in the dilated filter [1, 0, 0, 0, 2, 0, 0, 0, 3]. This allows the convolutional layer to have a larger receptive field at no extra computational cost and without extra parameters.
  • tf.layers.conv2d_transpose() creates a transposed convolutional layer, sometimes called a deconvolutional layer, which up-samples an image. It does so by inserting zeros between the inputs, so you can think of this as a regular convolutional layer using a fractional stride.
  • tf.nn.depthwise_conv2d() creates a depth-wise convolutional layer that applies every filter to every individual input channel independently. Thus, if there are fn filters and fn′ input channels, then this will output fn × fn′ feature maps; see the sketch after this list.
  • tf.layers.separable_conv2d() creates a separable convolutional layer that first acts like a depth-wise convolutional layer and then applies a 1 × 1 convolutional layer to the resulting feature maps. This makes it possible to apply filters to arbitrary sets of input channels.
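
To see the depth-wise channel count in action, here is a minimal sketch (again assuming TensorFlow 1.x, with purely illustrative shapes). With 3 input channels and a channel multiplier of 4, the output carries 3 × 4 = 12 feature maps:

x = tf.ones([1, 8, 8, 3])  # NHWC input with 3 channels
# Depth-wise filter shape: (height, width, in_channels, channel_multiplier)
dw_filter = tf.ones([3, 3, 3, 4])
y = tf.nn.depthwise_conv2d(x, dw_filter, strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 8, 8, 12): every filter applied to every input channel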