To pool or not to pool

As we discussed in Chapter 2, Convolutional and Recurrent Networks, ML-Agents does not use any pooling, in order to avoid losing spatial relationships in the data. However, as we saw in our self-driving vehicle example, a pooling layer or two applied after the higher-level feature extraction (convolutional) layers can in fact help. Although that example was tested on a much more complex network, it will be helpful to see how the same idea applies to the ML-Agents CNN embedding. Let's try this out and apply a layer of pooling to the last example by completing the following exercise:

  1. Open the models.py file in your Python editor of choice. Visual Studio with the Python data extensions is an excellent platform, and also provides the ability to interactively debug code.
  2.  Locate the following block of code, which is as we last left it in the previous exercise:
conv1 = tf.layers.conv2d(image_input, 16, kernel_size=[8, 8], strides=[4, 4], activation=tf.nn.elu, reuse=reuse, name="conv_1")
conv2 = tf.layers.conv2d(conv1, 32, kernel_size=[4, 4], strides=[2, 2], activation=tf.nn.elu, reuse=reuse, name="conv_2")
conv3 = tf.layers.conv2d(conv2, 64, kernel_size=[2, 2], strides=[2, 2], activation=tf.nn.elu, reuse=reuse, name="conv_3")

hidden = c_layers.flatten(conv3)
  3. We will now inject a layer of pooling by modifying the block of code, like so:
conv1 = tf.layers.conv2d(image_input, 16, kernel_size=[8, 8], strides=[4, 4], activation=tf.nn.elu, reuse=reuse, name="conv_1")
#################### ADD POOLING
conv1 = tf.layers.max_pooling2d(conv1, pool_size=[2, 2], strides=[2, 2], name="pool_1")
conv2 = tf.layers.conv2d(conv1, 32, kernel_size=[4, 4], strides=[2, 2], activation=tf.nn.elu, reuse=reuse, name="conv_2")
conv3 = tf.layers.conv2d(conv2, 64, kernel_size=[2, 2], strides=[2, 2], activation=tf.nn.elu, reuse=reuse, name="conv_3")

hidden = c_layers.flatten(conv3)
  4. This sets up our previous sample to use a single layer of pooling. You can think of this as extracting all the upper features, such as the sky, wall, or floor, and pooling the results together. When you think about it, how much spatial information does the agent need about one sky patch versus another? All the agent really needs to know is that the sky is always up.
  5. Open your command shell or Anaconda window and train the sample by running the following code:
mlagents-learn config/trainer_config.yaml --run-id=vh_conv_wpool1 --train
  6. As always, watch the performance of the agent and notice how the agent moves as it trains. Watch the training until completion, or for as long as you watched the previous examples.
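To make the pooling intuition concrete, here is a minimal NumPy sketch of 2 x 2 max pooling on a tiny feature map (illustrative only; this is not ML-Agents code, and the `max_pool_2x2` helper is just for this demonstration). Notice how the 4 x 4 map collapses to 2 x 2, keeping only the strongest response from each patch, exactly the "is the sky here at all?" summary described above:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a 2D feature map (H and W must be even)."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A tiny 4x4 "feature map": a strong activation in the top-left patch,
# a weak one in the bottom-right patch.
fm = np.array([
    [9, 1, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
], dtype=np.float32)

pooled = max_pool_2x2(fm)
print(pooled.shape)  # (2, 2)
print(pooled)        # [[9. 0.]
                     #  [0. 1.]]
```

Each output cell answers "was this feature present anywhere in my 2 x 2 patch?", discarding exactly where inside the patch it appeared, which is why pooling trades spatial precision for a smaller, faster embedding.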

Now, depending on your machine or environment, you may have noticed a substantial improvement in training time, although actual performance suffered slightly. In other words, each training iteration executed much quicker (two to three times faster, or more), but the agent needed slightly more interactions to learn. In this case, the agent trains quicker in wall-clock time, but in other environments, pooling at higher levels may be more detrimental. Ultimately, it will depend on the visuals of your environment, how well you want your agent to perform, and your patience.
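A little size arithmetic shows why each iteration gets quicker. The sketch below assumes the ML-Agents default 84 x 84 visual observation and VALID padding (the `tf.layers.conv2d` default); both are assumptions about the configuration, not something the text above states. Under those assumptions, a single 2 x 2 pool after conv_1 shrinks the flattened embedding feeding the dense layers by a factor of four:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a VALID-padded conv or pool layer."""
    return (size - kernel) // stride + 1

size = conv_out(84, 8, 4)  # after conv_1 (8x8 kernel, stride 4): 20

# Without pooling: conv_2 then conv_3
no_pool = conv_out(conv_out(size, 4, 2), 2, 2)

# With a 2x2/stride-2 pool after conv_1, then conv_2 and conv_3
with_pool = conv_out(conv_out(conv_out(size, 2, 2), 4, 2), 2, 2)

print(no_pool ** 2 * 64)    # flattened size without pooling: 1024
print(with_pool ** 2 * 64)  # flattened size with pooling: 256
```

A 4x smaller flattened vector means far fewer weights in the first dense layer, which is where much of the per-iteration speedup comes from; the cost is the lost spatial detail discussed above.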

 

In the next section, we will look at another characteristic of state – memory, or sequencing. We will look at how recurrent networks are used to capture the importance of remembering sequences or event series.
