Tuning recurrent hyperparameters

As we learned in our discussion of recurrent networks, LSTM layers can receive variable-length input, but we still need to define the maximum sequence length that we want the network to remember. There are two critical hyperparameters we need to tune when using recurrent networks. A description of these parameters, as listed in the ML-Agents documentation at the time of writing, is as follows:

  • sequence_length: Corresponds to the length of the sequences of experience that are passed through the network during training. This should be long enough to capture whatever information your agent might need to remember over time. For example, if your agent needs to remember the velocity of objects, then this can be a small value. If your agent needs to remember a piece of information that's given only once at the beginning of an episode, then this should be a larger value:
    • Typical Range: 4 – 128
  • memory_size: Corresponds to the size of the array of floating-point numbers that are used to store the hidden state of the recurrent neural network. This value must be a multiple of four, and should scale with the amount of information you expect the agent will need to remember to successfully complete the task:
    • Typical Range: 64 – 512

The description of the recurrent sequence_length and memory_size hyperparameters was extracted directly from the Unity ML-Agents documentation.
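
Since memory_size must be a multiple of four, it is easy to pick an invalid value while experimenting. Here is a minimal Python sketch, a hypothetical helper that is not part of ML-Agents, which checks a candidate pair of values against the constraint and typical ranges quoted above:

# Hypothetical sanity check for recurrent hyperparameters (not part of ML-Agents)
def check_recurrent_params(sequence_length, memory_size):
    # The ML-Agents docs require memory_size to be a multiple of four
    if memory_size % 4 != 0:
        raise ValueError("memory_size must be a multiple of four")
    # Warn, rather than fail, when a value falls outside its typical range
    if not 4 <= sequence_length <= 128:
        print("warning: sequence_length outside typical range 4-128")
    if not 64 <= memory_size <= 512:
        print("warning: memory_size outside typical range 64-512")

check_recurrent_params(sequence_length=64, memory_size=256)  # passes silently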

If we look at our VisualHallway example configuration in the trainer_config.yaml file, we can see that the parameters are defined as follows:

VisualHallwayLearning:
    use_recurrent: true
    sequence_length: 64
    num_layers: 1
    hidden_units: 128
    memory_size: 256
    beta: 1.0e-2
    gamma: 0.99
    num_epoch: 3
    buffer_size: 1024
    batch_size: 64
    max_steps: 5.0e5
    summary_freq: 1000
    time_horizon: 64
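
If you want to confirm the values a run will actually pick up, you can load the file yourself. The following is a minimal Python sketch, assuming the PyYAML package is installed (pip install pyyaml) and that you run it from the ML-Agents/ml-agents folder so the relative path resolves:

import yaml  # PyYAML

# Load the trainer configuration and print the brain's settings
with open("config/trainer_config.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config["VisualHallwayLearning"].items():
    print(f"{key}: {value}")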

This effectively means that our agent will remember the last 64 frames, or states, of input using a memory size of 256. The documentation is unclear as to how much memory a single input takes, so we can only assume that the default visual convolutional encoding network, the original two-layer model, requires four values per frame. It follows that, by increasing the depth of our convolutional encoding in the previous examples, the agent may not have been able to remember every frame of state.
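
That figure of four values per frame is nothing more than the memory budget divided evenly across the sequence, as the following back-of-envelope Python sketch shows:

# Back-of-envelope memory budget for the default configuration
memory_size = 256      # floats available for the recurrent state
sequence_length = 64   # steps the agent is asked to remember

print(memory_size / sequence_length)  # 4.0 floats per remembered frame

A deeper convolutional encoder produces more features per frame, so four floats per step may no longer be enough. Therefore, let's modify the configuration in the VisualHallway example to account for that increase in memory, and see the effect it has in the following exercise: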

  1. Open up the VisualHallway example to where we last left it in the previous exercises, with or without pooling enabled. Just be sure to remember whether or not you are using pooling, as this will make a difference to the required memory.
  2. Open the trainer_config.yaml file located in the ML-Agents/ml-agents/config folder.
  3. Modify the VisualHallwayLearning config section, as follows:
VisualHallwayLearning:
    use_recurrent: true
    sequence_length: 128
    num_layers: 1
    hidden_units: 128
    memory_size: 2048    # use 1024 if pooling is enabled
    beta: 1.0e-2
    gamma: 0.99
    num_epoch: 3
    buffer_size: 1024
    batch_size: 64
    max_steps: 5.0e5
    summary_freq: 1000
    time_horizon: 64
  4. We are increasing the agent's sequence_length from 64 to 128 steps, thus doubling how far back it remembers. We are also increasing memory_size to 2,048 when not using pooling, or 1,024 when using pooling (see the back-of-envelope sketch after this list). Remember that pooling summarizes and downsamples the feature maps produced at every step of convolution, so each frame needs fewer values to encode.
  5. Save the file after you finish editing it.
  6. Open your command or Anaconda window and start training with the following command:
mlagents-learn config/trainer_config.yaml --run-id=vh_recurrent --train
  7. When prompted, start the training session in the editor by pressing Play, and watch the action unfold.
  8. Wait for the agent to train, as you did for the other examples we ran. You should notice another increase in training performance, and the agent's choice of actions should look better coordinated.
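
Repeating the earlier back-of-envelope calculation for the new values shows why the two memory_size settings differ. Again, these per-step budgets are our own rough estimate, not figures from the documentation:

# Rough per-step memory budget for the modified configuration
sequence_length = 128
for label, memory_size in [("without pooling", 2048), ("with pooling", 1024)]:
    floats_per_step = memory_size / sequence_length
    print(f"{label}: {floats_per_step:.0f} floats per remembered frame")
# without pooling: 16 floats per remembered frame
# with pooling: 8 floats per remembered frame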

As we can see, a slight tweak of the hyperparameters allowed us to improve the agent's performance. Understanding the many parameters used in training will be critical to your success in building remarkable agents. In the next section, we will look at further exercises you can use to improve your understanding and skill.
