Marathon RL

So far, our focus has been on discrete actions and episodic environments, where the agent learns to solve a puzzle or accomplish some task. The best examples of such environments are GridWorld and, of course, the Hallway/VisualHallway samples, where the agent chooses discrete actions, such as up, left, down, or right, and uses those actions to navigate to some goal. While these are great environments for playing with and learning the basic concepts of RL, they can be tedious to learn from, since rewards are sparse and only appear after extensive exploration. In marathon RL environments, by contrast, the agent is always learning, receiving rewards in the form of control feedback at every step. In fact, this form of RL is analogous to control systems for robotics and simulations. Because these environments are rich with reward feedback, they give us much quicker feedback when we alter/tune hyperparameters, which makes these types of environments perfect for our own learning purposes.
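To make this difference concrete, the following is a minimal, hypothetical sketch (not code from any of the Unity samples) that contrasts the two reward styles using the Agent API from the version of ML-Agents used in these examples; the target field and the marathonStyle toggle are purely illustrative assumptions:

using UnityEngine;
using MLAgents;

// Hypothetical example only: contrasts a sparse, episodic reward with the
// dense, per-step control feedback typical of marathon environments.
public class RewardStyleExample : Agent
{
    public Transform target;     // assumed goal object
    public bool marathonStyle;   // switch between the two reward styles

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        float distance = Vector3.Distance(transform.position, target.position);

        if (marathonStyle)
        {
            // Marathon RL: control feedback on every step, for example a
            // small penalty proportional to the remaining distance.
            AddReward(-0.001f * distance);
        }
        else
        {
            // Episodic RL: a sparse reward only when the task is solved.
            if (distance < 1f)
            {
                AddReward(1f);
                Done();   // end the episode once the goal is reached
            }
        }
    }
}

The dense, per-step signal is what lets a trainer show measurable progress almost immediately, whereas the sparse version only learns anything once the agent stumbles onto the goal.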

Unity provides several examples of marathon RL environments, and at the time of writing featured the Crawler, Reacher, Walker, and Humanoid example environments, but these are likely to change in the future.

Marathon environments are constructed differently, and we should probably understand some of these differences before going any further. Open up the Unity editor and your Python command window of choice, set up to run mlagents-learn, and complete the following exercise:

  1. Open the CrawlerDynamicTarget example scene from the Assets/ML-Agents/Examples/Crawler/Scenes folder. This example features an agent with four movable limbs, each with two joints that can also move. The goal is for the agent to move toward a dynamic target whose position keeps changing.
  2. Select the DynamicPlatform | Crawler object in the Hierarchy window and take note of the Crawler Agent component and the CrawlerDynamicLearning brain, as shown in the following screenshot:

Inspecting the Crawler agent and brain
  3. Notice how the brain's vector observation space size is 129 and its action space consists of 20 continuous actions. A continuous action returns a value that determines the degree to which a joint may rotate, allowing the agent to learn how to coordinate these joint actions into movements that let it crawl to the goal (a short sketch after this exercise illustrates the idea).
  4. Click the target icon beside the Crawler Agent component and, from the context menu, select Edit Script.
  5. After the script opens, scroll down and look for the CollectObservations method:
public override void CollectObservations()
{
    jdController.GetCurrentJointForces();

    AddVectorObs(dirToTarget.normalized);
    AddVectorObs(body.transform.position.y);
    AddVectorObs(body.forward);
    AddVectorObs(body.up);
    foreach (var bodyPart in jdController.bodyPartsDict.Values)
    {
        CollectObservationBodyPart(bodyPart);
    }
}
  6. Again, the code is in C#, but it should be fairly self-explanatory as to what inputs the agent is perceiving. We can see that the agent takes the normalized direction to the target, the height of its body, its forward and up vectors, as well as observations from each body part as input.
  7. Select the Academy in the scene and make sure the Brain configuration is set to Control (learning).
  8. From your previously prepared command window or Anaconda window, run the mlagents-learn script as follows:
mlagents-learn config/trainer_config.yaml --run-id=crawler --train
  9. Quite quickly after training begins, you will see the agent making immediate, measurable progress.
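Before moving on, here is a minimal, hypothetical sketch of how a vector of continuous actions could be mapped onto joint target rotations, as mentioned in step 3. This is not the actual CrawlerAgent code, which drives its limbs through a JointDriveController and per-body-part helpers, but it illustrates what the 20 continuous actions control:

using UnityEngine;
using MLAgents;

// Hypothetical sketch only: shows continuous actions driving joint rotations.
// The real CrawlerAgent routes its actions through a JointDriveController.
public class ContinuousJointAgent : Agent
{
    public ConfigurableJoint[] joints;   // assumed joints to control

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        for (int i = 0; i < joints.Length; i++)
        {
            // Each continuous action is a single float from the policy; clamp
            // it and use it as a target rotation around one axis of the joint.
            float action = Mathf.Clamp(vectorAction[i], -1f, 1f);
            joints[i].targetRotation = Quaternion.Euler(action * 90f, 0f, 0f);
        }
    }
}

Because each action is a continuous value rather than a pick from a small discrete set, the policy can learn the fine-grained, coordinated joint movements that crawling requires.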

This agent trains impressively quickly and will be incredibly useful for testing our knowledge of how RL works in the coming sections. Feel free to look through and explore this sample, but avoid tuning any parameters for now, as we will begin doing that in the next section.
