Understanding Backplay

In late 2018, Cinjon Resnick and colleagues released an innovative paper titled Backplay: "Man muss immer umkehren" (https://arxiv.org/abs/1807.06919), which introduced a refined form of Curriculum Learning called Backplay. The basic premise is that the agent starts training more or less at the goal, and its starting position is progressively moved back toward the normal start as training proceeds.
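To make the idea concrete, the following is a minimal, hypothetical C# sketch of a Backplay-style spawn schedule. It is not code from the paper or from ML-Agents; the BackplaySchedule class and the progress input are assumptions made purely for illustration:

using UnityEngine;

public class BackplaySchedule
{
    // progress runs from 0 (start of training) to 1 (end of training);
    // how you measure it (steps, episodes, mean reward) is up to you.
    public static Vector3 SpawnPosition(Vector3 goal, Vector3 normalStart, float progress)
    {
        // Early on, the agent spawns at the goal, so the reward is easy to find;
        // over time, the spawn point is interpolated back to the normal start.
        return Vector3.Lerp(goal, normalStart, Mathf.Clamp01(progress));
    }
}

This method may not work for all situations, but we will combine it with Curriculum Training to see how we can improve the VisualHallway example in the following exercise: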

  1. Open the VisualHallway scene from the Assets | ML-Agents | Examples | Hallway | Scenes folder.
  2. Make sure the scene is reset to the default starting point. If you need to, pull down the source from ML-Agents again.
  3. Set the scene for learning using the VisualHallwayLearning brain, and make sure that the agent is just using the default visual observations of 84x84.
  4. Select the Academy object, and in the Inspector window, add a new Reset Parameter called distance to the Hallway Academy component, as shown in the following excerpt:

Setting a new Reset Parameter on the Academy
  5. You can use Reset Parameters for more than just Curriculum Learning, as they let you easily configure training parameters within the editor. The parameter we are defining here will set the distance the agent starts from the back goal region. This sample is intended to show the concept of Backplay; in order to properly implement it, we would need to move the agent right in front of the proper goal, but we will hold off on doing that for now.
  6. Select the VisualHallwayArea | Agent object and open the Hallway Agent script in your code editor of choice.
  7. Scroll down to the AgentReset method and adjust the first line of the method to match the following:
public override void AgentReset()
{
    // Read the agent's starting offset from the Academy's Reset Parameters
    // instead of using a hardcoded value
    float agentOffset = academy.resetParameters["distance"];
    float blockOffset = 0f;
    // ... rest removed for brevity
  8. This single line of code sets the agent's starting offset from the distance Reset Parameter we just defined on the Academy. Likewise, as the Academy updates that parameter during training, the agent will pick up the new value on each reset (see the sketch after this list).
  9. Save the file and return to the editor. The editor will recompile your code changes and let you know whether everything is okay. A red error in the console typically means you have a compiler error, likely caused by incorrect syntax.
  10. Open a prepared Python/Anaconda window and run the training session with the following command:
mlagents-learn config/trainer_config.yaml --run-id=vh_backplay --train
  11. This will run the session in regular mode, without Curriculum Learning, but with the agent's starting position adjusted to be closer to the goals. Let this sample run and see how well the agent performs now that it starts so close to the goals.
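In this exercise, the distance value simply stays at whatever you entered in the Inspector; Curriculum Learning, which we cover in the next section, is what updates it automatically. Purely as a hypothetical sketch of what such an update could look like if you were to hand-roll Backplay yourself, an Academy subclass could anneal the parameter on each reset. The BackplayAcademy name, the field values, and the episode-based pacing here are all assumptions for illustration, not ML-Agents code:

using MLAgents;
using UnityEngine;

public class BackplayAcademy : Academy // stands in for HallwayAcademy
{
    public float startDistance = -5f;  // near the goal (actual signs/values depend on the scene layout)
    public float endDistance = -15f;   // the agent's usual starting offset
    public int annealEpisodes = 5000;  // assumed pacing; tune to taste
    private int episodes;

    public override void AcademyReset()
    {
        // Push the spawn point back a little further on each episode, so the
        // agent's AgentReset picks up a slightly larger distance every time.
        float t = Mathf.Clamp01(episodes++ / (float)annealEpisodes);
        resetParameters["distance"] = Mathf.Lerp(startDistance, endDistance, t);
    }
}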

Let the training run for a while and observe how it differs from the original run. One thing you will notice is that the agent can't help but run into the reward now, which is what we are after. The next piece we need to implement is the Curriculum Learning part, where we move the agent back as it learns to find the reward; we will get to that in the next section.
