Building the model

Now, it's time to build the core of the model. The computational graph includes all the layers we mentioned earlier in this chapter. We'll start by defining some functions that will be used to define variables of a specific shape and randomly initialize them:

def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))
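The snippets in this section refer to hyperparameters (image_size, num_channels, filter_size_1, and so on) that were defined earlier in the chapter. For reference, here is a minimal set of values consistent with the tensor shapes printed later in this section; the filter sizes in particular are an assumption, since 'SAME' padding means they don't show up in those shapes:

# Assumed hyperparameter values (defined earlier in the chapter), inferred
# from the tensor shapes printed later in this section.
image_size = 28                            # MNIST images are 28x28 pixels
image_size_flat = image_size * image_size  # 784 values per flattened image
num_channels = 1                           # grayscale images
num_classes = 10                           # digits 0 to 9

filter_size_1 = 5   # assumed: 5x5 filters in the first convolution layer
filters_1 = 16      # 16 filters -> output shape (?, 14, 14, 16)
filter_size_2 = 5   # assumed: 5x5 filters in the second convolution layer
filters_2 = 36      # 36 filters -> output shape (?, 7, 7, 36)

fc_num_neurons = 128  # neurons in the first fully connected layer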

Now, let's define the function that will be responsible for creating a new convolution layer based on an input layer, the number of input channels, the filter size, the number of filters, and whether or not to use pooling:

def conv_layer(input,              # the output of the previous layer
               input_channels,     # number of channels in the input
               filter_size,        # width and height of each filter
               filters,            # number of filters
               use_pooling=True):  # use 2x2 max-pooling

    # Prepare the shape of the filter weights tensor.
    shape = [filter_size, filter_size, input_channels, filters]

    # Create the weights, which means the filters, with the given shape.
    filters_weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    filters_biases = new_biases(length=filters)

    # Call the conv2d function as we explained above, where the strides
    # parameter has four values: the first one is for the image number and
    # the last one is for the input image channel; the middle ones represent
    # how many pixels the filter should move along the x and y axes.
    conv_layer = tf.nn.conv2d(input=input,
                              filter=filters_weights,
                              strides=[1, 1, 1, 1],
                              padding='SAME')

    # Add the biases to the output of the conv_layer.
    conv_layer += filters_biases

    # Use pooling to down-sample the image resolution?
    if use_pooling:
        # Reduce the output feature map with a max-pooling layer.
        conv_layer = tf.nn.max_pool(value=conv_layer,
                                    ksize=[1, 2, 2, 1],
                                    strides=[1, 2, 2, 1],
                                    padding='SAME')

    # Feed the output to a ReLU activation function.
    relu_layer = tf.nn.relu(conv_layer)

    # Return the final result after applying ReLU, plus the filter weights.
    return relu_layer, filters_weights

As we mentioned previously, the pooling layer produces a 4D tensor. We need to flatten this 4D tensor to a 2D one to be fed to the fully connected layer:

def flatten_layer(layer):
    # Get the shape of the layer.
    shape = layer.get_shape()

    # We need to flatten the layer, which has the shape
    # [num_images, image_height, image_width, num_channels], so that it has
    # the shape [batch_size, num_features], where number_features is
    # image_height * image_width * num_channels.
    number_features = shape[1:4].num_elements()

    # Reshape it so that it can be fed to the fully connected layer.
    flatten_layer = tf.reshape(layer, [-1, number_features])

    # Return both the flattened layer and the number of features.
    return flatten_layer, number_features

The following function creates a fully connected layer; it assumes that its input is a 2D tensor:

def fc_layer(input,          # the flattened output
             num_inputs,     # number of inputs from the previous layer
             num_outputs,    # number of outputs
             use_relu=True): # use ReLU on the output to remove negative values

    # Create the weights and biases for the neurons of this fc_layer.
    fc_weights = new_weights(shape=[num_inputs, num_outputs])
    fc_biases = new_biases(length=num_outputs)

    # Calculate the layer values by doing a matrix multiplication of
    # the input values and fc_weights, and then adding the fc_biases.
    fc_layer = tf.matmul(input, fc_weights) + fc_biases

    # If the use_relu parameter is True, apply ReLU to the output.
    if use_relu:
        relu_layer = tf.nn.relu(fc_layer)
        return relu_layer

    return fc_layer

Before building the network, let's define a placeholder for the input images where the first dimension is None to represent an arbitrary number of images:

input_values = tf.placeholder(tf.float32, shape=[None, image_size_flat], name='input_values')

As we mentioned previously, the convolution step expects the input images to be in the shape of a 4D tensor. So, we need to reshape the input images to be in the following shape:

[num_images, image_height, image_width, num_channels]

So, let's reshape the input values to match this format:

input_image = tf.reshape(input_values, [-1, image_size, image_size, num_channels])

Next, we need to define another placeholder for the actual class values, which will be in one-hot encoded format:

y_actual = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_actual')

Also, we need a tensor that holds the integer value of the actual class, which we get by applying argmax to the one-hot encoded placeholder:

y_actual_cls_integer = tf.argmax(y_actual, axis=1)

So, let's start off by building the first convolution layer:

conv_layer_1, conv1_weights = conv_layer(input=input_image,
                                         input_channels=num_channels,
                                         filter_size=filter_size_1,
                                         filters=filters_1,
                                         use_pooling=True)

Let's check the shape of the output tensor that will be produced by the first convolution layer:

conv_layer_1
Output:
<tf.Tensor 'Relu:0' shape=(?, 14, 14, 16) dtype=float32>

The 2x2 max-pooling has halved the 28x28 input to 14x14, and the 16 filters have produced 16 output channels. Next, we will create the second convolution layer and feed the output of the first one into it:

conv_layer_2, conv2_weights = conv_layer(input=conv_layer_1,
                                         input_channels=filters_1,
                                         filter_size=filter_size_2,
                                         filters=filters_2,
                                         use_pooling=True)

Also, we need to double-check the shape of the output tensor of the second convolution layer. The shape should be (?, 7, 7, 36): the second 2x2 max-pooling halves 14x14 down to 7x7, the 36 filters produce 36 channels, and the question mark stands for an arbitrary number of images.
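As before, we can inspect the tensor directly (the exact tensor name may vary slightly with the order in which the graph was built, but the shape should match):

conv_layer_2
Output:
<tf.Tensor 'Relu_1:0' shape=(?, 7, 7, 36) dtype=float32>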

Next, we need to flatten the 4D tensor to match the expected format for the fully connected layer, which is a 2D tensor:

flatten_layer, number_features = flatten_layer(conv_layer_2)

We need to double-check the shape of the output tensor of the flattened layer; the number of features should be 7 * 7 * 36 = 1,764:

flatten_layer
Output:
<tf.Tensor 'Reshape_1:0' shape=(?, 1764) dtype=float32>

Next, we will create a fully connected layer and feed the output of the flattened layer to it. We will also feed the output of the fully connected layer to a ReLU activation function before feeding it to the second fully connected layer:

fc_layer_1 = fc_layer(input=flatten_layer,
                      num_inputs=number_features,
                      num_outputs=fc_num_neurons,
                      use_relu=True)

Let's double-check the shape of the output tensor of the first fully connected layer:

fc_layer_1
Output:
<tf.Tensor 'Relu_2:0' shape=(?, 128) dtype=float32>

Next, we need to add another fully connected layer, which takes the output of the first fully connected layer and produces an array of 10 scores for each image, one score per target class:

fc_layer_2 = fc_layer(input=fc_layer_1,
                      num_inputs=fc_num_neurons,
                      num_outputs=num_classes,
                      use_relu=False)
fc_layer_2
Output:
<tf.Tensor 'add_3:0' shape=(?, 10) dtype=float32>

Next, we'll feed the scores from the second fully connected layer to a softmax activation function, which normalizes them into probabilities between 0 and 1 that sum to 1:

y_predicted = tf.nn.softmax(fc_layer_2)
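As a quick illustration of what softmax does, here is the same computation on made-up scores using plain NumPy (the values are hypothetical and only for illustration):

import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # made-up class scores
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs)        # ~[0.659 0.242 0.099], all between 0 and 1
print(probs.sum())  # 1.0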

Then, we need to choose the target class that has the highest probability by using the argmax function of TensorFlow:

y_predicted_cls_integer = tf.argmax(y_predicted, axis=1)
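At this point, the forward pass of the graph is complete. As a quick smoke test (a hypothetical example, assuming x_batch is a NumPy array of flattened images with shape [batch_size, image_size_flat]), we can run the prediction op in a session; the untrained network will, of course, predict at random:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # x_batch is a hypothetical NumPy batch of flattened input images.
    predictions = sess.run(y_predicted_cls_integer,
                           feed_dict={input_values: x_batch})
    print(predictions.shape)  # (batch_size,)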