Understanding state

The Hallway and VisualHallway examples are essentially the same game problem, but they provide a different perspective on what we refer to in reinforcement learning as the environment or game state. In the Hallway example, the agent learns from sensor input, which is something we will look at shortly, while in the VisualHallway example, the agent learns from a camera, or player, view. What will be helpful at this point is to understand how each example handles state, and how we can modify it.

In the following exercise, we will modify the Hallway input state and see the results:

  1. Jump back into the Hallway scene with learning enabled as we left it at the end of the last exercise. 
  2. We will need to modify a few lines of C# code, nothing very difficult, but it may be useful to install Visual Studio (Community or another version) as this will be our preferred editor. You can, of course, use any code editor you like as long as it works with Unity.
  3. Locate the Agent object in the Hierarchy window, and then, in the Inspector window, click the Gear icon over the Hallway Agent component, as shown in the following screenshot:

Opening the HallwayAgent.cs script 
  4. From the context menu, select the Edit Script option, as shown in the previous screenshot. This will open the script in your code editor of choice.
  5. Locate the following section of C# code in your editor:
public override void CollectObservations()
{
    if (useVectorObs)
    {
        float rayDistance = 12f;
        float[] rayAngles = { 20f, 60f, 90f, 120f, 160f };
        string[] detectableObjects = { "orangeGoal", "redGoal", "orangeBlock", "redBlock", "wall" };
        // Add how far the agent is through the episode, as a 0-1 ratio
        AddVectorObs(GetStepCount() / (float)agentParameters.maxStep);
        // Cast a ray at each angle and add an encoding of what it hit
        AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0f, 0f));
    }
}
  6. The CollectObservations method is where the agent collects its observations, or inputs its state. In the Hallway example, the agent has useVectorObs set to true, meaning that it detects state using the block of code inside the if statement. All this code does is cast a ray, or line, from the agent at angles of 20f, 60f, 90f, 120f, and 160f degrees, out to a distance defined by rayDistance, and detect the objects defined in detectableObjects. The ray perception is done with a helper component called rayPer of the RayPerception type, which executes rayPer.Perceive to collect the environment state it perceives. This, along with the ratio of steps completed, is added to the vector observations, or state, the agent will input. At this point, the state is a vector of 36 values; the sketch after this step shows where that number comes from. As of this version, this needs to be constructed in code, but this will likely change in the future.
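It can help to see how each ray becomes numbers. The following is a simplified, illustrative sketch of the per-ray encoding that RayPerception performs; the EncodeRay helper is ours, not part of the ML-Agents toolkit, and the real implementation differs in its details:

// Illustrative sketch only -- not the actual ML-Agents source.
// Each ray contributes (detectableObjects.Length + 2) floats: a one-hot
// over the detectable tags, a miss flag, and a normalized hit distance.
float[] EncodeRay(string hitTag, float hitFraction, string[] detectableObjects)
{
    var encoding = new float[detectableObjects.Length + 2];
    int index = System.Array.IndexOf(detectableObjects, hitTag);
    if (index >= 0)
    {
        encoding[index] = 1f;                                 // which tag the ray hit
        encoding[detectableObjects.Length + 1] = hitFraction; // how far away it was
    }
    else
    {
        encoding[detectableObjects.Length] = 1f;              // nothing detectable was hit
    }
    return encoding;
}

With five detectable tags, each ray contributes 5 + 2 = 7 floats; five rays plus the single step-ratio observation gives the 36 inputs mentioned previously.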
  7. Alter the rayAngles line of code so that it matches the following:
float[] rayAngles = { 20f, 60f };
  8. This has the effect of dramatically narrowing the agent's vision, or perception, from five rays fanned across its forward view down to just two. Another way to think of it is as reducing the input state. For reference, the full method after the edit is shown below.
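Your edited method should now look roughly like this; it is unchanged apart from the rayAngles line:

public override void CollectObservations()
{
    if (useVectorObs)
    {
        float rayDistance = 12f;
        float[] rayAngles = { 20f, 60f };   // narrowed from five angles to two
        string[] detectableObjects = { "orangeGoal", "redGoal", "orangeBlock", "redBlock", "wall" };
        AddVectorObs(GetStepCount() / (float)agentParameters.maxStep);
        AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0f, 0f));
    }
}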
  9. After you finish the edit, save the file and return to Unity. Unity will recompile the code when you return to the editor.
  10. Locate the HallwayLearning brain in the Assets | ML-Agents | Examples | Hallway | Brains folder and change the Vector Observation | Space Size to 15, as shown in the following screenshot:
Setting the Vector Observation Space Size
  11. The reason we reduce this to 15 is that the input now consists of two ray angles, plus the one step input. Each ray contributes seven values: one for each of the five detectable objects, plus two extras (a miss flag and the hit distance). Thus, two angles times seven perceptions, plus one for steps, equals 15. Previously, we had five angles times seven perceptions, plus one for steps, which equals 36. The short sketch below runs through the same arithmetic in code.
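Here is a minimal sketch of that arithmetic, assuming the per-ray layout described earlier; the ObservationSize helper is hypothetical, not part of the ML-Agents API:

// Hypothetical helper (not in ML-Agents) for the observation-size math.
int ObservationSize(int numRayAngles, int numDetectableObjects)
{
    int perRay = numDetectableObjects + 2; // one-hot tags + miss flag + hit distance
    return numRayAngles * perRay + 1;      // +1 for the step-count ratio
}

// ObservationSize(5, 5) == 36 -> the original Hallway space size
// ObservationSize(2, 5) == 15 -> the size after trimming rayAngles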
  12. Make sure that you save the project after modifying the Brain scriptable objects.
  13. Run the example again in training and watch how the agent trains. Take some time to pay attention to the actions the agent takes and how it learns. Be sure to let this example run for as long as you let the previous Hallway sample run, hopefully to completion.

Were you surprised by the results? Yes, our agent with a smaller view of the world actually trained more quickly. This may seem completely counter-intuitive, but think about it in terms of the mathematics: a smaller input space, or state, means the agent has fewer paths to explore, so it should train more quickly. This is indeed what we saw in this example when we reduced the input space by more than half. At this point, we definitely need to see what happens when we reduce the visual state space in the VisualHallway example.
