IL, or behavioral cloning

IL, or behavioral cloning, is the process of capturing observations and actions from a human, or perhaps another AI, and using them as input for training an agent. The agent is essentially guided by the human and learns from their actions and observations. A set of training observations can be captured from real-time play (online) or extracted from saved games (offline). This makes it possible to capture play from multiple agents and train them in tandem or individually. IL lets you train, or in effect program, agents for tasks that may be impossible to train with regular RL, and because of this it is likely to become a key RL technique for most tasks in the near future.
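To make the idea concrete, the following sketch reduces behavioral cloning to its core: a supervised-learning loop that fits a policy network to recorded (observation, action) pairs. This is a minimal illustration written in PyTorch with synthetic demonstration data and made-up observation/action sizes; it is not how ML-Agents implements IL internally, but it captures what the trainer is doing with the demonstrations:

# Minimal behavioral-cloning sketch (illustrative only, not the ML-Agents
# implementation). We assume demonstrations were already recorded as
# (observation, action) pairs and simply fit a network to imitate them.
import numpy as np
import torch
import torch.nn as nn

# Hypothetical recorded demonstrations: 1,000 steps of an 8-float observation
# and a 2-float continuous action (sizes chosen only for illustration).
observations = torch.tensor(np.random.randn(1000, 8), dtype=torch.float32)
actions = torch.tensor(np.random.randn(1000, 2), dtype=torch.float32)

# The "student" policy: a small network mapping observations to actions.
policy = nn.Sequential(
    nn.Linear(8, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()  # supervised regression onto the demonstrated actions

for epoch in range(50):
    predicted = policy(observations)    # what the agent would do
    loss = loss_fn(predicted, actions)  # how far it is from the teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, the policy can act on new observations without the teacher.
with torch.no_grad():
    new_action = policy(torch.randn(1, 8))

Notice that there is no reward signal anywhere in this loop; the demonstrations themselves define the target behavior, which is exactly why IL can work on tasks where a reward is hard to learn from.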

It is hard to gauge the value something gives you until you see what things are like without it. With that in mind, we will start by looking at an example that uses no IL but certainly could benefit from it. Open up the Unity editor and follow this exercise:

  1. Open up the Tennis scene from the Assets | ML-Agents | Examples | Tennis | Scenes folder.
  2. Select and disable the extra agent training areas, TennisArea(1) to TennisArea(17).
  3. Select AgentA and make sure Tennis Agent | Brain is set to TennisLearning. We want each agent to play against the other in this example.
  4. Select AgentB and make sure Tennis Agent | Brain is also set to TennisLearning.
    In this example, we are briefly training multiple agents in the same environment. We will cover more scenarios where agents play other agents as a way of learning in Chapter 11, Building Multi-Agent Environments.
  5. Select Academy and make sure that Tennis Academy | Brains is set to TennisLearning and the Control option is enabled, as shown in the following screenshot:
Setting Control to enabled on Academy
  6. Open a Python/Anaconda window and prepare it for training. We will launch training with the following command (the trainer_config.yaml file it references is sketched after this list):
mlagents-learn config/trainer_config.yaml --run-id=tennis --train
  7. Watch the training for several thousand iterations, which should be enough to convince yourself that the agents are not going to learn this task easily. When you are convinced, stop the training and move on.
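For reference, the trainer_config.yaml file passed to mlagents-learn holds the per-brain training hyperparameters. The snippet below is an illustrative sketch only; the exact keys and values depend on your ML-Agents version, so treat the numbers as assumptions and check the file that ships with your copy of the toolkit rather than copying this verbatim:

# Illustrative sketch of a trainer_config.yaml layout (values are assumptions,
# not the shipped defaults). Every brain inherits from the default section.
default:
    trainer: ppo          # trainer used when a brain has no override
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    num_layers: 2
    hidden_units: 128
    gamma: 0.99

# Brain-specific overrides keyed by the brain's name.
TennisLearning:
    normalize: true       # normalize vector observations for this brain
    max_steps: 2.0e5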

You can see from just this first example that ordinary training struggles, and that the other advanced methods we looked at, such as Curriculum Learning and Curiosity Learning, would be difficult to apply here and in this case could even be counterproductive. In the next section, we look at how to run this example with IL in online training mode.
