Learning to tune PPO 

In this section, we are going to tune a new, modified control learning environment. This will teach us more about the inner workings of the Unity example, and it will also show you how to modify or build a sample of your own later. Let's begin by opening up the Unity editor and completing the following exercise:

  1. Open the Reacher scene, set it for learning, and run it in training. You should be able to do this part in your sleep now. Let the agent train for a substantial amount of time so you can establish a baseline, as always. (If you need a reminder of the training command, there is a short sketch of one after this first set of steps.)
  2. From the menu, select Assets/Import Package/Custom Package. Locate Chapter_8_Assets.unitypackage in the Chapter08 folder of the book's downloaded source code.
  3. Open up the Reacher_3_joint scene from the Assets/HoDLG/Scenes folder. This is the modified scene, but we will go through its construction as well.
  4. First, notice that there is only a single Reacher arm active, but now with three joints, as shown in the following screenshot:
Inspecting the Agent game object
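As an aside, step 1 (and the training run you will do shortly) assumes you remember how to launch the ML-Agents trainers from a console. A typical invocation looks roughly like the following; treat this only as a sketch, since the exact config path and flags vary between ML-Agents releases, and the run ID is just a placeholder:

    mlagents-learn config/trainer_config.yaml --run-id=reacher_baseline --train

With the trainer waiting for the editor, you press Play in Unity and watch the mean reward to establish your baseline.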
  5. Notice how the arm now has three sections, with the new section called Capsule(2) and identified as Pendulum C. The naming no longer matches the physical order: Pendulum C is actually the middle pendulum, not the bottom one.
  6. Select each of the Capsule objects and inspect their configuration and placement, as summarized in the following screenshot:
Inspecting the Capsule objects
  7. Be sure to note the Configurable Joint | Connected Body object for each of the capsules as well. This property sets the body that the object will hinge or join to. There are plenty of other properties on the Configurable Joint component that allow you to mimic this joint interaction in almost any form, perhaps even a biological one. For example, you may want to make the joints in this arm more human-like by only allowing certain angles of movement; likewise, if you were designing a robot with limited motion, you could simulate that with this joint component as well. (A short C# sketch of this idea follows these steps.)
  8. At this stage, we can set up and run the example. Open a Python console or Anaconda window and set it up for training.
  9. Run the sample in training and observe the progress of the agent. Let the agent run for enough iterations to compare its training performance with the baseline.
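Before moving on, here is a minimal C# sketch of the idea from step 7: constraining a joint so the arm bends more like a human limb. The limit values and the LimitedJointSetup class name are illustrative assumptions rather than part of the book's sample; the ConfigurableJoint properties used (connectedBody, angularXMotion, and the angular limit structs) are standard Unity physics API:

    using UnityEngine;

    public class LimitedJointSetup : MonoBehaviour
    {
        // The body this capsule hinges to, for example, the capsule above it (assumed field).
        public Rigidbody connectedBody;

        void Start()
        {
            ConfigurableJoint joint = GetComponent<ConfigurableJoint>();
            joint.connectedBody = connectedBody;

            // Only allow limited rotation around the X axis, roughly like an elbow.
            joint.angularXMotion = ConfigurableJointMotion.Limited;

            // SoftJointLimit is a struct, so modify a copy and assign it back.
            SoftJointLimit low = joint.lowAngularXLimit;
            low.limit = -10f;                    // illustrative value, in degrees
            joint.lowAngularXLimit = low;

            SoftJointLimit high = joint.highAngularXLimit;
            high.limit = 120f;                   // illustrative value, in degrees
            joint.highAngularXLimit = high;

            // Lock the other rotation axes so the joint bends in a single plane.
            joint.angularYMotion = ConfigurableJointMotion.Locked;
            joint.angularZMotion = ConfigurableJointMotion.Locked;
        }
    }

In the sample scene these settings are made through the Inspector; scripting them like this is simply a quick way to experiment with different ranges of motion.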

At this stage, we have our sample up and running and we are ready to start tuning new parameters to optimize training. Before we do that, however, we will step back and look at the C# code changes that were required to make this sample possible. The next section covers those changes and is optional for developers who are not interested in the code; if you plan to build your own control or marathon environments in Unity, though, you will need to read it.
