Implementing Backplay through Curriculum Learning

In the last section, we implemented the first part of Backplay, which is having the agent start next to, or very close to the goal. The next part we need to accomplish is progressively moving the agent back to its intended starting point using Curriculum Learning. Open up the Unity editor to the VisualHallway scene again and follow these steps:

  1. Open the ML-Agents/ml-agents/config folder with a file explorer or command shell.
  2. Create a new folder called hallway and navigate to the new folder.
  3. In the new folder, use a text editor to create a new JSON file called VisualHallwayLearning.json. JavaScript Object Notation (JSON) was intended to describe objects in JavaScript, but it has also become a standard format for configuration settings.
  4. Enter the following JSON text in the new file:
{
  "measure" : "rewards",
  "thresholds" : [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
  "min_lesson_length" : 100,
  "signal_smoothing" : true,
  "parameters" :
  {
    "distance" : [12, 8, 4, 2, -2, -4, -8, -12]
  }
}
  5. This configuration file defines the curriculum we will use to train the agent with Backplay. It specifies a measure of rewards and the thresholds that determine when the agent advances to the next lesson. When a reward threshold is reached after a minimum lesson length of 100 episodes, training advances to the next distance parameter. Notice how we define the distance parameter starting at 12, representing a distance close to the goal, and then decreasing. You could, of course, create a function that maps different range values, but we will leave that up to you.
  6. Save the file after you are done editing.
  7. Launch a training session from a Python/Anaconda window with the following command:
mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/hallway/ --run-id=hallway-curriculum --train

  8. After the training starts, notice how the curriculum is getting set in the Python/Anaconda window, as shown in the following screenshot:

Watching the curriculum parameters getting set in training
  9. Wait for the agent to train, and see how many levels of training it can accomplish before the end of the session.
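To make the configuration file's fields concrete, here is a minimal Python sketch of how a reward-based curriculum like the one above can advance lessons. This is an illustration of the idea, not the actual ML-Agents source; the class name, the smoothing constant, and the `step` interface are all assumptions:

```python
# Illustrative sketch of reward-threshold curriculum advancement.
# Not the ML-Agents implementation; names and constants are hypothetical.

class Curriculum:
    def __init__(self, thresholds, min_lesson_length, parameters,
                 signal_smoothing=True):
        self.thresholds = thresholds            # e.g. [0.1, 0.2, ..., 0.7]
        self.min_lesson_length = min_lesson_length
        self.parameters = parameters            # e.g. {"distance": [12, ..., -12]}
        self.signal_smoothing = signal_smoothing
        self.lesson = 0                         # index into each parameter list
        self.smoothed = 0.0
        self.episodes_in_lesson = 0

    def step(self, mean_reward):
        """Call once per summary period with the measured mean reward."""
        self.episodes_in_lesson += 1
        if self.signal_smoothing:
            # Exponential smoothing damps noisy reward spikes so a single
            # lucky episode does not trigger an advance.
            self.smoothed = 0.75 * self.smoothed + 0.25 * mean_reward
            measure = self.smoothed
        else:
            measure = mean_reward
        # Advance only after the minimum lesson length has elapsed and the
        # (smoothed) reward clears the current threshold.
        if (self.lesson < len(self.thresholds)
                and self.episodes_in_lesson >= self.min_lesson_length
                and measure > self.thresholds[self.lesson]):
            self.lesson += 1
            self.episodes_in_lesson = 0
        # Return the parameter values for the current lesson.
        return {k: v[self.lesson] for k, v in self.parameters.items()}
```

Notice how each threshold gates one step through the `distance` list, which is exactly the behavior our JSON file requests from the trainer.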

Now, one thing we need to come clean about is that this sample is more an approximation than a true example of Backplay. True Backplay is described as putting the agent at the goal and working backward; in this example, we are putting the agent almost at the goal and working backward. The difference is subtle, but, by now, hopefully you can appreciate that, in terms of training, it could be significant.
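To see the distinction, a true-Backplay schedule would begin at distance 0 (exactly at the goal) and step back to the original spawn point. The helper below is a hypothetical sketch; the spawn distance and step count are illustrative and not taken from the Hallway scene:

```python
# Hypothetical true-Backplay schedule: start at the goal (distance 0)
# and move the start position back toward the spawn in even steps.

def backplay_schedule(spawn_distance, steps):
    """Distances from the goal, beginning at 0 and ending at the spawn."""
    return [round(spawn_distance * i / steps, 2) for i in range(steps + 1)]

# backplay_schedule(12, 6) → [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

Compare this with our JSON file, whose first distance value is 12 rather than 0: we begin near the goal, not at it.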
