Imitation Transfer Learning

One of the problems with Imitation Learning is that it often focuses the agent down a path that limits its possible future moves. This isn't unlike being shown the improper way to perform a task and then doing it that way, perhaps without thinking, only to find out later that there was a better way. Humanity, in fact, has been prone to this type of problem over and over again throughout history. Perhaps you learned as a child that swimming right after eating was dangerous, only to learn later in life, through your own experimentation or just common knowledge, that this was a myth, one that was taken as fact for a very long time. Training an agent through observation is no different: in many ways, you limit the agent's vision to a narrow focus bounded by what it was taught. However, there is a way to allow an agent to fall back on partial brute-force or trial-and-error exploration in order to expand its training.

With ML-Agents, we can combine IL with a form of transfer learning in order to allow an agent to learn first from observation, and then to further its training by learning from the former student. This form of IL chaining, if you will, allows you to use one trained agent to auto-train multiple agents. Let's open up Unity to the TennisIL scene and follow the next exercise:

  1. Select the TennisArea | Agent object and, in the Inspector, disable the BC Teacher Helper component, then add a new Demonstration Recorder, as shown in the following screenshot:

Checking that the BC Teacher is attached to the Agent
  2. BC Teacher Helper is a recorder that works just like the Demonstration Recorder, except that it lets you turn recording on and off while the agent runs, which is perfect for online training. At the time of writing, however, the component was not working, which is why we add the Demonstration Recorder instead.
  3. Make sure Academy is set to Control the TennisLearning brain.
  4. Save the scene and the project.
  5. Open a Python/Anaconda window and launch training with the following command (a sketch of the config file it references appears after these steps):
mlagents-learn config/online_bc_config.yaml --run-id=tennis_il --train --slow
  6. Press Play when prompted to run the game in the editor. Control the blue paddle with the W, A, S, D keys and play for a few seconds to warm up.
  7. After you are warmed up, press the R key to begin recording a demo observation. Play the game for several minutes so the agent can build up its skill. Once the agent is able to return the ball consistently, stop the training session.
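
The online_bc_config.yaml file referenced in step 5 holds the hyperparameters for this training run. The following is only a rough sketch of what the TennisLearning entry typically looks like in this version of ML-Agents; the teacher brain name TennisPlayer and the exact values here are assumptions, so check them against the copy in your own config folder:

TennisLearning:
    trainer: online_bc             # online behavioral cloning trainer
    brain_to_imitate: TennisPlayer # assumed name of the player-controlled (teacher) brain
    batch_size: 16                 # experiences sampled per update
    batches_per_epoch: 5           # updates performed per training epoch
    max_steps: 5.0e5               # simulation steps before training stops
    summary_freq: 3000             # how often training statistics are reported

Because the trainer is online_bc, the student brain (TennisLearning) imitates the teacher brain live as you play, rather than learning from a prerecorded file.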

This will not only train the agent, but it will also create a demo recording we can play back to train multiple agents to play against each other, in a similar way to how AlphaStar was trained. In the next section, we will set up our tennis scene to run in offline training mode with multiple agents.
