Feeding the agent

When we performed online IL, we only fed one agent at a time in the tennis scene. This time, however, we are going to train multiple agents from the same demonstration recording in order to improve training performance.
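
Before we dive into the exercise, it is worth recalling what the offline_bc trainer we are about to configure actually does: it treats the recorded (observation, action) pairs in the demonstration file as a supervised dataset and trains the policy to reproduce the demonstrator's actions. The following Python snippet is only a conceptual sketch, with synthetic data and a single linear layer standing in for the real network; it is not the ML-Agents implementation, but it shows the kind of supervised loop the trainer runs for us:

import numpy as np

# Conceptual sketch only: behavioral cloning is supervised learning on
# (observation, action) pairs. The data, shapes, and model here are made up.
rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 6)).astype(np.float32)  # 1,000 recorded steps, 6 observations each
actions = rng.integers(0, 4, size=1000)              # 4 discrete actions chosen by the demonstrator

W = np.zeros((6, 4), dtype=np.float32)               # one linear layer stands in for hidden_units/num_layers
b = np.zeros(4, dtype=np.float32)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

learning_rate, batch_size = 3.0e-4, 64               # mirrors the values we will see in the config file
for step in range(5000):
    idx = rng.integers(0, len(obs), size=batch_size)
    x, y = obs[idx], actions[idx]
    probs = softmax(x @ W + b)
    probs[np.arange(batch_size), y] -= 1.0            # cross-entropy gradient toward the demonstrated action
    W -= learning_rate * (x.T @ probs) / batch_size
    b -= learning_rate * probs.mean(axis=0)

The real trainer adds hidden layers, recurrent memory, and epoch handling, which is exactly what the hyperparameters in offline_bc_config.yaml control.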

We have already set up for training, so let's start feeding the agent in the following exercise:

  1. Open a Python/Anaconda window and set it up for training from the new ML-Agents folder. You did reclone the source, right?
  2. Open the offline_bc_config.yaml file from the ML-Agents/ml-agents_b/config folder. The contents of the file are as follows for reference:
default:
    trainer: offline_bc
    batch_size: 64
    summary_freq: 1000
    max_steps: 5.0e4
    batches_per_epoch: 10
    use_recurrent: false
    hidden_units: 128
    learning_rate: 3.0e-4
    num_layers: 2
    sequence_length: 32
    memory_size: 256
    demo_path: ./UnitySDK/Assets/Demonstrations/<Your_Demo_File>.demo

HallwayLearning:
    trainer: offline_bc
    max_steps: 5.0e5
    num_epoch: 5
    batch_size: 64
    batches_per_epoch: 5
    num_layers: 2
    hidden_units: 128
    sequence_length: 16
    use_recurrent: true
    memory_size: 256
    sequence_length: 32
    demo_path: ./UnitySDK/Assets/Demonstrations/demo.demo
  3. Change the last line of the HallwayLearning or VisualHallwayLearning brain to the following:
HallwayLearning:
    trainer: offline_bc
    max_steps: 5.0e5
    num_epoch: 5
    batch_size: 64
    batches_per_epoch: 5
    num_layers: 2
    hidden_units: 128
    sequence_length: 16
    use_recurrent: true
    memory_size: 256
    sequence_length: 32
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
  4. Note that if you are using the VisualHallwayLearning brain, you will also need to change the HallwayLearning entry name to VisualHallwayLearning in the preceding config script.
  5. Save your changes when you are done editing. If you want to confirm that demo_path now points at a real .demo file, see the short check after these steps.
  6. Go back to your Python/Anaconda window and launch training with the following command:
mlagents-learn config/offline_bc_config.yaml --run-id=hallway_il --train
  7. When prompted, press Play in the editor and watch the training unfold. You will see the agent play using moves very similar to your own, and if you played well, the agent will quickly start learning; you should see some impressive training, all thanks to IL.
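
A mistake that is easy to make in this exercise is leaving demo_path pointing at a file that does not exist, in which case training fails as soon as it starts. The following is a minimal sanity check, assuming you run it from the same folder you launch mlagents-learn from and that you edited the HallwayLearning entry as shown above; it relies only on PyYAML and the standard library:

import os
import yaml

# Load the same config file that mlagents-learn will read.
with open("config/offline_bc_config.yaml") as f:
    config = yaml.safe_load(f)

# Use "VisualHallwayLearning" here instead if that is the brain you configured.
demo_path = config["HallwayLearning"]["demo_path"]
print("demo_path:", demo_path)

if not os.path.isfile(demo_path):
    raise SystemExit("No .demo file found at that path; fix it before training")
print("Demonstration file found:", os.path.getsize(demo_path), "bytes")

Once training is running, you can also follow the reward and loss curves in TensorBoard; assuming the default output location, mlagents-learn writes its summaries to the summaries folder, so running tensorboard --logdir=summaries from the same folder will show them.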

RL can be thought of as the brute-force approach to learning, while refinements such as Imitation Learning and training by observation look set to dominate the future of agent training. Is it really any wonder? After all, we humans learn that way ourselves.

In the next section, we look at another exciting area of deep learning, transfer learning, and how it applies to games and DRL.
