Curriculum Learning

Curriculum Learning allows an agent to progressively learn a difficult task by stepping up the difficulty of the environment. The reward function itself stays the same; the agent first finds or achieves the goal in a simpler setting, and so learns what the reward is for. Then, as training progresses and the agent improves, the difficulty of earning the reward increases, which, in turn, forces the agent to keep learning.
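
Before we look at Unity's implementation, the following minimal, framework-independent Python sketch shows the idea. The agent object, its train_episode call, and the specific threshold and difficulty values here are illustrative assumptions, not part of any real API:

    def train_with_curriculum(agent, total_episodes=10000):
        thresholds = [0.1, 0.3, 0.5]          # progress points where the lesson advances
        difficulties = [1.5, 2.0, 2.5, 4.0]   # one task difficulty value per lesson

        for episode in range(total_episodes):
            progress = episode / total_episodes
            # The current lesson is the number of thresholds already passed.
            lesson = sum(1 for t in thresholds if progress >= t)
            # The reward function never changes; only the task difficulty steps up.
            agent.train_episode(difficulty=difficulties[lesson])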

Unity, of course, ships a few samples of this, and we will look at how the WallJump example sets up Curriculum Learning in the following exercise:

  1. Open the WallJump scene from the Assets | ML-Agents | Examples | WallJump | Scenes folder.
  2. Select the Academy object in the Hierarchy window.
  3. Click both Control options on the Wall Jump Academy | Brains | Control parameters, as shown in the following screenshot:

Setting the multiple brains to learning
  4. This sample uses multiple brains in order to better separate the learning by task; in fact, all the brains will be trained in tandem.
  5. Curriculum Learning uses a second configuration file to describe the curriculum, or the steps of learning, that the agent will undergo.
  6. Open the ML-Agents/ml-agents/config/curricula/wall-jump folder.
  7. Open the SmallWallJumpLearning.json file in a text editor. The file is shown for reference as follows:
      {
        "measure" : "progress",
        "thresholds" : [0.1, 0.3, 0.5],
        "min_lesson_length" : 100,
        "signal_smoothing" : true,
        "parameters" :
        {
          "small_wall_height" : [1.5, 2.0, 2.5, 4.0]
        }
      }
  8. This JSON file defines the configuration the SmallWallJumpLearning brain will take as part of its curriculum, or steps to learning. The definitions of all these parameters are well documented in the Unity documentation, but we will take a look at them from the documentation as follows:
    • measure – What to measure learning progress, and advancement in lessons, by:
      • reward – Uses a measure of received reward.
      • progress – Uses the ratio of steps/max_steps.
    • thresholds (float array) – Points in value of measure where the lesson should be increased.
    • min_lesson_length (int) – The minimum number of episodes that should be completed before the lesson can change. If measure is set to reward, the average cumulative reward of the last min_lesson_length episodes will be used to determine if the lesson should change. Must be non-negative.
    • signal_smoothing (true/false) – Whether to weight the current progress measure by previous values.
    • parameters (dictionary of key:string, value:float array) – The reset parameters to control over the curriculum; each float array holds one value per lesson, one more entry than there are thresholds.
  9. What we can see by reading this file is that the curriculum advances by a measure of progress, defined as the ratio of steps taken to max_steps. The lesson boundaries sit at .1 or 10%, .3 or 30%, and .5 or 50% of the total training steps. Three thresholds give four lessons, each setting the small_wall_height parameter: 1.5 at the start, 2.0 after 10% progress, 2.5 after 30%, and 4.0 after 50%. We will sketch this lesson-selection logic in code at the end of the exercise.
  10. Open up a Python/Anaconda window and prepare it for training.
  11. Launch the training session with the following command:
mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/wall-jump/ --run-id=wall-jump-curriculum --train
  12. The extra bit here is the --curriculum option, which points the trainer at the folder holding the secondary curriculum configuration.
  13. You will need to wait for at least half of the full training steps to run in order to see all three lesson changes.
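
As promised, here is a simplified Python sketch of how a trainer might consume the curriculum JSON we examined earlier. This is not the ML-Agents source; the SimpleCurriculum class, its method names, and the smoothing weights are illustrative assumptions that only mirror the documented behavior of measure, thresholds, min_lesson_length, and signal_smoothing:

    import json

    class SimpleCurriculum:
        def __init__(self, path):
            with open(path) as f:
                cfg = json.load(f)
            self.measure = cfg["measure"]              # "progress" or "reward"
            self.thresholds = cfg["thresholds"]        # e.g. [0.1, 0.3, 0.5]
            self.min_lesson_length = cfg["min_lesson_length"]
            self.smoothing = cfg["signal_smoothing"]
            self.parameters = cfg["parameters"]        # one value per lesson
            self.lesson = 0
            self.smoothed_value = 0.0
            self.episodes_in_lesson = 0

        def try_advance(self, value):
            # value is steps/max_steps for "progress", or the episode's
            # cumulative reward for "reward".
            self.episodes_in_lesson += 1
            if self.smoothing:
                # Illustrative exponential smoothing; the real weights may differ.
                self.smoothed_value = 0.25 * self.smoothed_value + 0.75 * value
                value = self.smoothed_value
            if (self.lesson < len(self.thresholds)
                    and self.episodes_in_lesson >= self.min_lesson_length
                    and value >= self.thresholds[self.lesson]):
                self.lesson += 1
                self.episodes_in_lesson = 0

        def reset_parameters(self):
            # Lesson 0 uses the first entry of each array, and so on.
            return {name: values[self.lesson]
                    for name, values in self.parameters.items()}

A short usage example, assuming the file path from the earlier steps:

    curriculum = SimpleCurriculum(
        "config/curricula/wall-jump/SmallWallJumpLearning.json")
    curriculum.try_advance(0.05)          # early in training: lesson stays at 0
    print(curriculum.reset_parameters())  # {'small_wall_height': 1.5}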

This example introduced one technique we can use to solve the problem of sparse or difficult-to-achieve rewards. In the next section, we look at a specialized form of Curriculum Learning called Backplay.
