The DNN architecture is as follows (the components are built in sequential order):
- A convolutional layer with 16 filters of size 8 x 8, a stride of 4, and a rectifier (ReLU) nonlinearity.
- A convolutional layer with 32 filters of size 4 x 4, a stride of 2, and a rectifier (ReLU) nonlinearity.
- A convolutional layer with 32 filters of size 3 x 3, a stride of 1, and a rectifier (ReLU) nonlinearity.
- A dense layer of 128 units and ReLU activation.
- A dense layer with as many units as there are actions available in the environment, and a linear activation.
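The spatial size of each 'valid' (unpadded) convolution output is floor((input - kernel) / stride) + 1. As a quick sanity check, assuming an 84 x 84 input frame (the standard Atari preprocessing size; the actual shape depends on the environment), the feature maps shrink through the three layers as follows:

```python
def conv_out(size, kernel, stride):
    # Spatial output size of a 'valid' (no padding) convolution
    return (size - kernel) // stride + 1

size = 84                               # assumed input height/width
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_out(size, kernel, stride)
    print(size)                         # 20, then 9, then 7

print(size * size * 32)                 # 1568 features after flattening
```

With this assumed input, the flatten layer between the CNN and the dense layers produces a 1,568-dimensional feature vector.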
In cnn, we define the first three convolutional layers, while in fnn, we define the last two dense layers:
def cnn(x):
    # Three convolutional layers, all with 'valid' padding and ReLU activations
    x = tf.layers.conv2d(x, filters=16, kernel_size=8, strides=4, padding='valid', activation='relu')
    x = tf.layers.conv2d(x, filters=32, kernel_size=4, strides=2, padding='valid', activation='relu')
    return tf.layers.conv2d(x, filters=32, kernel_size=3, strides=1, padding='valid', activation='relu')
def fnn(x, hidden_layers, output_layer, activation=tf.nn.relu, last_activation=None):
    # Hidden dense layers with ReLU, followed by a linear output layer
    for l in hidden_layers:
        x = tf.layers.dense(x, units=l, activation=activation)
    return tf.layers.dense(x, units=output_layer, activation=last_activation)
In the preceding code, hidden_layers is a list of integer values. In our implementation, this is hidden_layers=[128]. output_layer, on the other hand, is the number of actions available to the agent.
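To see how the fnn loop expands, here is a minimal framework-agnostic sketch in NumPy (random weights stand in for the trainable variables; the 1,568-dimensional input and 4 actions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, activation=None):
    # Random weights stand in for trainable variables
    w = rng.standard_normal((x.shape[-1], units)) * 0.01
    b = np.zeros(units)
    out = x @ w + b
    return np.maximum(out, 0.0) if activation == 'relu' else out

def fnn_np(x, hidden_layers, output_layer):
    for l in hidden_layers:
        x = dense(x, l, activation='relu')   # ReLU hidden layers
    return dense(x, output_layer)            # linear output layer

batch = rng.standard_normal((2, 1568))       # flattened CNN features
q = fnn_np(batch, hidden_layers=[128], output_layer=4)
print(q.shape)                               # (2, 4): one Q-value per action
```

The structure mirrors fnn exactly: each entry in hidden_layers adds one ReLU-activated layer, and the final layer maps to one output per action with no activation, since Q-values are unbounded.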
In qnet, the CNN and FNN components are connected by a layer that flattens the 2D output of the CNN:
def qnet(x, hidden_layers, output_size, fnn_activation=tf.nn.relu, last_activation=None):
    x = cnn(x)
    x = tf.layers.flatten(x)
    return fnn(x, hidden_layers, output_size, fnn_activation, last_activation)
The deep neural network is now fully defined; all that remains is to connect it to the main computational graph.