Feeding the agent

When we performed online IL, we only fed one agent at a time in the tennis scene. This time, however, we are going to train multiple agents from the same demonstration recording in order to improve training performance.
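
Before we dive into the exercise, it is worth recalling what the offline_bc trainer we are about to configure actually does: it treats the recorded (observation, action) pairs in the demonstration file as a supervised dataset and trains the policy to reproduce the demonstrator's actions. The following Python snippet is only a conceptual sketch, with synthetic data and a single linear layer standing in for the real network; it is not the ML-Agents implementation, but it shows the kind of supervised loop the trainer runs for us:

import numpy as np

# Conceptual sketch only: behavioral cloning is supervised learning on
# (observation, action) pairs. The data, shapes, and model here are made up.
rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 6)).astype(np.float32)  # 1,000 recorded steps, 6 observations each
actions = rng.integers(0, 4, size=1000)              # 4 discrete actions chosen by the demonstrator

W = np.zeros((6, 4), dtype=np.float32)               # one linear layer stands in for hidden_units/num_layers
b = np.zeros(4, dtype=np.float32)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

learning_rate, batch_size = 3.0e-4, 64               # mirrors the values we will see in the config file
for step in range(5000):
    idx = rng.integers(0, len(obs), size=batch_size)
    x, y = obs[idx], actions[idx]
    probs = softmax(x @ W + b)
    probs[np.arange(batch_size), y] -= 1.0            # cross-entropy gradient toward the demonstrated action
    W -= learning_rate * (x.T @ probs) / batch_size
    b -= learning_rate * probs.mean(axis=0)

The real trainer adds hidden layers, recurrent memory, and epoch handling, which is exactly what the hyperparameters in offline_bc_config.yaml control.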

We have already set up for training, so let's start feeding the agent in the following exercise:

  1. Open a Python/Anaconda window and set it up for training from the new ML-Agents folder. You did reclone the source, right?
  2. Open the offline_bc_config.yaml file from the ML-Agents/ml-agents_b/config folder. The contents of the file are as follows for reference:
default:
    trainer: offline_bc
    batch_size: 64
    summary_freq: 1000
    max_steps: 5.0e4
    batches_per_epoch: 10
    use_recurrent: false
    hidden_units: 128
    learning_rate: 3.0e-4
    num_layers: 2
    sequence_length: 32
    memory_size: 256
    demo_path: ./UnitySDK/Assets/Demonstrations/<Your_Demo_File>.demo

HallwayLearning:
    trainer: offline_bc
    max_steps: 5.0e5
    num_epoch: 5
    batch_size: 64
    batches_per_epoch: 5
    num_layers: 2
    hidden_units: 128
    sequence_length: 16
    use_recurrent: true
    memory_size: 256
    sequence_length: 32
    demo_path: ./UnitySDK/Assets/Demonstrations/demo.demo
  3. Change the last line of the HallwayLearning or VisualHallwayLearning brain to the following:
HallwayLearning:
    trainer: offline_bc
    max_steps: 5.0e5
    num_epoch: 5
    batch_size: 64
    batches_per_epoch: 5
    num_layers: 2
    hidden_units: 128
    sequence_length: 16
    use_recurrent: true
    memory_size: 256
    sequence_length: 32
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
  4. Note that if you are using the VisualHallwayLearning brain, you will also need to change the HallwayLearning entry name to VisualHallwayLearning in the preceding config script.
  5. Save your changes when you are done editing. If you want to confirm that demo_path now points at a real .demo file, see the short check after these steps.
  6. Go back to your Python/Anaconda window and launch training with the following command:
mlagents-learn config/offline_bc_config.yaml --run-id=hallway_il --train
  7. When prompted, press Play in the editor and watch the training unfold. You will see the agent play using moves very similar to your own, and if you played well, the agent will quickly start learning; you should see some impressive training, all thanks to IL.
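
A mistake that is easy to make in this exercise is leaving demo_path pointing at a file that does not exist, in which case training fails as soon as it starts. The following is a minimal sanity check, assuming you run it from the same folder you launch mlagents-learn from and that you edited the HallwayLearning entry as shown above; it relies only on PyYAML and the standard library:

import os
import yaml

# Load the same config file that mlagents-learn will read.
with open("config/offline_bc_config.yaml") as f:
    config = yaml.safe_load(f)

# Use "VisualHallwayLearning" here instead if that is the brain you configured.
demo_path = config["HallwayLearning"]["demo_path"]
print("demo_path:", demo_path)

if not os.path.isfile(demo_path):
    raise SystemExit("No .demo file found at that path; fix it before training")
print("Demonstration file found:", os.path.getsize(demo_path), "bytes")

Once training is running, you can also follow the reward and loss curves in TensorBoard; assuming the default output location, mlagents-learn writes its summaries to the summaries folder, so running tensorboard --logdir=summaries from the same folder will show them.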

RL can be thought of as the brute-force approach to learning, while refinements such as Imitation Learning and training by observation look set to dominate the future of agent training. Is it really any wonder? After all, we humans learn that way ourselves.

In the next section, we look at another exciting area of deep learning, transfer learning, and how it applies to games and DRL.
