Online training

Online Imitation Learning is where you teach an agent by letting it learn from the observations of a player or another agent in real time. It is also one of the most fun and engaging ways to train agents or bots. Let's jump in and set up the tennis environment for online Imitation Learning in the next exercise:

  1. Select the TennisArea | AgentA object and set Tennis Agent | Brain to TennisPlayer. In this IL scenario, we have one brain acting as the teacher (the player) and a second brain acting as the student (the learner).
  2. Select the AgentB object and make sure Tennis Agent | Brain is set to TennisLearning. This will be the student brain.
  3. Open the online_bc_config.yaml file from the ML-Agents/ml-agents/config folder. IL does not use the same configuration as PPO, so while the parameters have similar names, they may not correspond to what you have become used to.
  4. Scroll down in the file to the TennisLearning brain configuration as shown in the following code snippet:
TennisLearning:
    trainer: online_bc
    max_steps: 10000
    summary_freq: 1000
    brain_to_imitate: TennisPlayer
    batch_size: 16
    batches_per_epoch: 5
    num_layers: 4
    hidden_units: 64
    use_recurrent: false
    sequence_length: 16
  5. Looking over the hyperparameters, we can see there are two new parameters of interest. A summary of those parameters is as follows:
    • trainer: online_bc or offline_bc—uses online or offline Behavioral Cloning. In this case, we are performing online training.
    • brain_to_imitate: TennisPlayer—this sets the brain that the learning brain should attempt to imitate.
      We won't make any changes to the file at this point.
  6. Open your prepared Python/Anaconda window and launch training with the following command:
mlagents-learn config/online_bc_config.yaml --run-id=tennis_il --train --slow
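If the flags are unfamiliar, the following annotated version of the same command breaks them down. This reflects the ML-Agents releases this exercise was written against; flag names and behavior may differ in newer versions:
# --run-id=tennis_il  names this run; summaries and saved models are written under this ID
# --train             run the brains in training mode rather than inference only
# --slow              run the game at normal speed, so a human can play the teacher brain
mlagents-learn config/online_bc_config.yaml --run-id=tennis_il --train --slow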
  7. After you press Play in the editor, you will be able to control the left paddle with the W, A, S, and D keys. Play the game, and you may be surprised at how quickly the agent learns and how good it can get. The following is an example of the game being played:

Playing and teaching the agent with IL
  8. Keep playing the example until completion if you like. It can also be interesting to switch players during a game, or even train the brain and use the trained model to play against later. You do remember how to run a trained model, right?
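In case that reminder is helpful, here is a rough sketch of the usual procedure. When training stops (for example, when you press Ctrl + C in the Python/Anaconda window), the trained model is exported under the models folder, along the lines of the following path; the folder and file name shown here are only an example and depend on your ML-Agents version:

models/tennis_il-0/TennisLearning.nn

Drag that file into your Unity project, assign it to the TennisLearning brain's Model field, and press Play to run the trained agent in inference, just as we did with earlier trained models.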

At some point while playing through the last exercise, you may have wondered why we don't train all RL agents this way. That is a good question, but as you can imagine, it depends. While IL is very powerful and quite a capable learner, it doesn't always do what we expect it to do. Also, an IL agent will only learn the search space (observations) it is shown and will remain within those limitations. In the case of AlphaStar, IL was the main input for training, but the team also mentioned that the AI had plenty of time to self-play, which likely accounted for many of its winning strategies. So, while IL is cool and powerful, it is not the golden goose that will solve all our RL problems. However, you are likely to have a new and greater appreciation for RL, and in particular IL, after this exercise. In the next section, we explore offline IL.
