Coding changes required for control projects

As we mentioned earlier, this section is optional; it is for those curious about the details of building their own control sample with Unity C#. It is also likely that, in the future, no coding changes will be required to modify these types of samples, which is another reason this section is optional.

Complete the following exercise to go through the coding changes needed to add a joint in the Reacher control example:

  1. Select the Agent object in the Hierarchy window and then, in the Inspector window, note the Reacher Agent_3 component. This is the modified script that we will be inspecting.
  2. Click the target icon beside the Reacher Agent_3 component, and from the context menu, select Edit Script.
  3. This will open the ReacherAgent_3.cs script in your C# code editor of choice.
  4. The first thing to note under the declarations is the addition of new variables, highlighted in bold as follows:
public GameObject pendulumA;
public GameObject pendulumB;
public GameObject pendulumC;
public GameObject hand;
public GameObject goal;
private ReacherAcademy myAcademy;
float goalDegree;
private Rigidbody rbA;
private Rigidbody rbB;
private Rigidbody rbC;
private float goalSpeed;
private float goalSize;

  5. Two new variables, pendulumC and rbC, are added to hold the new joint's GameObject and Rigidbody. In Unity physics, a Rigidbody denotes an object that the physics engine can move or manipulate. A sketch of where these references get wired up follows the note below.
    Unity is in the process of upgrading its physics engine, which will alter some of the teachings here. The current version of ML-Agents uses the old physics system, so this example does as well.
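The snippet above does not show where these references are assigned. As a minimal sketch, assuming the modified script follows the same initialization pattern as the original ReacherAgent, the Rigidbody and Academy references would be cached in InitializeAgent roughly like this:
public override void InitializeAgent()
{
    // Cache the Rigidbody attached to each pendulum GameObject, including
    // the new third joint, pendulumC.
    rbA = pendulumA.GetComponent<Rigidbody>();
    rbB = pendulumB.GetComponent<Rigidbody>();
    rbC = pendulumC.GetComponent<Rigidbody>();

    // Keep a reference to the scene's ReacherAcademy so AgentReset can read
    // goalSize and goalSpeed from it.
    myAcademy = GameObject.Find("Academy").GetComponent<ReacherAcademy>();
}
The pendulum, hand, and goal GameObject fields themselves are public, so they are typically assigned by dragging the objects onto the component in the Inspector.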
  6. The next thing to note is the addition of new agent observations, as shown in the following CollectObservations method:
public override void CollectObservations()
{
    AddVectorObs(pendulumA.transform.localPosition);
    AddVectorObs(pendulumA.transform.rotation);
    AddVectorObs(rbA.angularVelocity);
    AddVectorObs(rbA.velocity);

    AddVectorObs(pendulumB.transform.localPosition);
    AddVectorObs(pendulumB.transform.rotation);
    AddVectorObs(rbB.angularVelocity);
    AddVectorObs(rbB.velocity);

    AddVectorObs(pendulumC.transform.localPosition);
    AddVectorObs(pendulumC.transform.rotation);
    AddVectorObs(rbC.angularVelocity);
    AddVectorObs(rbC.velocity);

    AddVectorObs(goal.transform.localPosition);
    AddVectorObs(hand.transform.localPosition);

    AddVectorObs(goalSpeed);
}

  7. The section in bold adds the new observations for pendulumC and rbC, which amounts to another 13 observation values. Recall that this means we also needed to switch our brain from 33 vector observations to 46, as shown in the following screenshot (a breakdown of that count follows the screenshot):
Inspecting the updated ReacherLearning_3 brain
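To double-check the new total of 46, here is a quick count of what CollectObservations adds each step (AddVectorObs contributes three values for a Vector3, four for a Quaternion, and one for a float):
Per joint: localPosition (3) + rotation as a quaternion (4) + angularVelocity (3) + velocity (3) = 13
Three joints (A, B, C): 3 x 13 = 39
Goal localPosition (3) + hand localPosition (3) + goalSpeed (1) = 7
Total vector observations: 39 + 7 = 46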
  8. Next, we will look at the AgentAction method; this is where the trainer's Python code sends actions to the agent, telling it what movements to make. The method is as follows:
public override void AgentAction(float[] vectorAction, string textAction)
{
    goalDegree += goalSpeed;
    UpdateGoalPosition();

    var torqueX = Mathf.Clamp(vectorAction[0], -1f, 1f) * 150f;
    var torqueZ = Mathf.Clamp(vectorAction[1], -1f, 1f) * 150f;
    rbA.AddTorque(new Vector3(torqueX, 0f, torqueZ));

    torqueX = Mathf.Clamp(vectorAction[2], -1f, 1f) * 150f;
    torqueZ = Mathf.Clamp(vectorAction[3], -1f, 1f) * 150f;
    rbB.AddTorque(new Vector3(torqueX, 0f, torqueZ));

    // The new joint consumes the next two actions in the vector.
    torqueX = Mathf.Clamp(vectorAction[4], -1f, 1f) * 150f;
    torqueZ = Mathf.Clamp(vectorAction[5], -1f, 1f) * 150f;
    rbC.AddTorque(new Vector3(torqueX, 0f, torqueZ));
}
  9. In this method, we extend the code to allow the agent to move the new joint through its Rigidbody, rbC. Did you notice that the new learning brain also needed a larger action space? A compact view of the action-to-joint mapping is sketched below.
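Each joint reads two consecutive actions (an X torque and a Z torque), so three joints imply a continuous action space of six values. For reference, that mapping can be written more compactly; this is a hypothetical refactor for illustration only, and ApplyJointTorques is a made-up helper name, not part of the sample:
private void ApplyJointTorques(float[] vectorAction)
{
    // Each joint consumes two consecutive actions: an X torque and a Z torque.
    var joints = new[] { rbA, rbB, rbC };
    for (var i = 0; i < joints.Length; i++)
    {
        var torqueX = Mathf.Clamp(vectorAction[i * 2], -1f, 1f) * 150f;
        var torqueZ = Mathf.Clamp(vectorAction[i * 2 + 1], -1f, 1f) * 150f;
        joints[i].AddTorque(new Vector3(torqueX, 0f, torqueZ));
    }
}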
  10. Lastly, we look at the AgentReset method to see how the agent resets itself with the new limb, as follows:
public override void AgentReset()
{
    pendulumA.transform.position = new Vector3(0f, -4f, 0f) + transform.position;
    pendulumA.transform.rotation = Quaternion.Euler(180f, 0f, 0f);
    rbA.velocity = Vector3.zero;
    rbA.angularVelocity = Vector3.zero;

    pendulumB.transform.position = new Vector3(0f, -10f, 0f) + transform.position;
    pendulumB.transform.rotation = Quaternion.Euler(180f, 0f, 0f);
    rbB.velocity = Vector3.zero;
    rbB.angularVelocity = Vector3.zero;

    pendulumC.transform.position = new Vector3(0f, -6f, 0f) + transform.position;
    pendulumC.transform.rotation = Quaternion.Euler(180f, 0f, 0f);
    rbC.velocity = Vector3.zero;
    rbC.angularVelocity = Vector3.zero;

    goalDegree = Random.Range(0, 360);
    UpdateGoalPosition();

    goalSize = myAcademy.goalSize;
    goalSpeed = Random.Range(-1f, 1f) * myAcademy.goalSpeed;

    goal.transform.localScale = new Vector3(goalSize, goalSize, goalSize);
}

  11. All this code does is return each arm section to its starting position and orientation, stop all movement, pick a new random goal angle, and refresh the goal's size and speed from the Academy settings. The goal's actual placement is handled by UpdateGoalPosition, sketched below.
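The UpdateGoalPosition helper called here comes from the original Reacher sample and is unchanged by this exercise, so it is not shown above. As a rough, illustrative sketch only (the radius, height, and axis mapping below are assumptions, not values taken from the sample), it converts goalDegree into a point on a circle around the agent:
void UpdateGoalPosition()
{
    // Convert the accumulated goal angle from degrees to radians, then place
    // the goal on a circle around the agent's base. The radius and height
    // used here are illustrative guesses.
    var radians = goalDegree * Mathf.PI / 180f;
    var goalX = 8f * Mathf.Cos(radians);
    var goalZ = 8f * Mathf.Sin(radians);
    goal.transform.position = new Vector3(goalX, -1f, goalZ) + transform.position;
}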

That covers the code changes required for this example. Fortunately, only one script needed to be modified, and it is likely that in the future you won't have to modify these scripts at all. In the next section, we will refine the sample's training by tuning additional parameters and introducing another training optimization for policy learning methods.
