Exploring the training environment

One of the things that often pushes us to succeed, or at least to learn, is failure. As humans, when we fail, one of two things happens: we try harder, or we quit. Interestingly, this is not unlike a negative reward in reinforcement learning. In RL, an agent that receives a negative reward may stop exploring a path if it predicts no future value, or not enough benefit, down that path. However, if the agent determines that more exploration is needed, or that it hasn't exhausted the path fully, it will push on, and often this leads it to the right path. Again, this is certainly not unlike us humans. Therefore, in this section, we are going to train one of the more difficult example agents, in order to push ourselves to learn how to fail and how to fix training failures.

Unity is currently in the process of building a multi-level benchmarking tower environment that features multiple levels of difficulty. This will allow DRL enthusiasts, practitioners, and researchers to test their skills/models against baseline environments. The author has been told, on reasonably good authority, that this environment should be completed by early/mid 2019.

We will need to use many of the advanced features of the Unity ML-Agents toolkit to ultimately get this example working. This will require a good understanding of the first five chapters of this book. If you skipped those chapters to get here, please go back and review them as needed. In many places in this chapter, helpful links to relevant previous chapters have been provided.

The training sample environment we will focus on is VisualHallway, not to be confused with the standard Hallway example. VisualHallway differs in that it uses the camera as the complete input state to the model, while the other Unity examples we previously looked at used some form of sensor input, often allowing the agent to see 90 to 360 degrees at all times and to receive other useful information. That is fine for most games; in fact, many games still allow such cheats, or intuition, for NPC or computer opponents as part of their AI. Putting these cheats into a game's AI has been an accepted practice for many years, but perhaps that will soon change.
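To make the distinction concrete, here is a minimal sketch, assuming the ML-Agents C# API of this era; the class and variable names are illustrative, not the shipped Hallway source. A sensor-style agent hand-feeds state to the model in code, while a visual agent's entire observation is the image rendered by the camera assigned to its Brain in the editor, so its observation method can stay empty:

```csharp
using MLAgents; // ML-Agents C# namespace at the time of writing

// Illustrative sketch only -- not the toolkit's actual Hallway code.
public class ObservationSketchAgent : Agent
{
    public override void CollectObservations()
    {
        // A sensor-style agent (like the standard Hallway) adds state here,
        // for example ray-cast results giving it wide awareness of the room:
        // AddVectorObs(rayPerception.Perceive(rayDistance, rayAngles, ...));

        // A visual agent adds nothing here: its only input is the image
        // rendered by the camera attached to its Brain, exactly what a
        // player would see on screen.
    }
}
```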

After all, good games are fun to play and make sense to the player. Games of the not-so-distant past could get away with giving the AI cheats. However, players now expect more: they want the AI to play by the same rules they do. The old perception that a computer AI was hindered by technological limitations is gone; now a game AI must play by the same rules as the player, which makes our focus on getting the VisualHallway sample working/training all the more compelling.

There is, of course, another benefit to teaching an AI to play/learn like a player: the ability to transfer that capability to other environments, using a concept called transfer learning. We will explore transfer learning in Chapter 10, Imitation and Transfer Learning, where we will learn how to adapt pretrained models/parameters and apply them to other environments.

The VisualHallway/Hallway samples start by dropping the agent at a random spot in a long room, or hallway. In the center of this space is a colored block, and at one end of the hallway, in each corner, is a colored square covering the floor. The block is either red or gold (orange/yellow), and it tells the agent which target square to find: the one of the same color. The goal is for the agent to move to the correctly colored square. In the standard Hallway example, the agent is given 360-degree sensor awareness. In the VisualHallway example, the agent is only shown a camera view of the room, exactly as a player would see it. This puts our agent on an equal footing with a player.
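To ground the mechanics, the following is a minimal sketch of the kind of reward logic such an agent uses; the tag names, reward values, and class name here are assumptions for illustration, not the toolkit's actual HallwayAgent source:

```csharp
using UnityEngine;
using MLAgents;

// Illustrative sketch only -- tags and reward values are assumed.
public class HallwaySketchAgent : Agent
{
    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("goal"))           // matching-color square
        {
            SetReward(1.0f);   // success: positive reward
            Done();            // end the episode so the hallway resets
        }
        else if (collision.gameObject.CompareTag("wrongGoal")) // other square
        {
            SetReward(-0.1f);  // failure: small negative reward
            Done();
        }
    }
}
```

Ending the episode on both outcomes is what lets the agent associate the color cue with the reward, since every episode resets with a freshly randomized block color.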

Before we get to training, let's open up the example and play it as a player would, and see how we do. Follow this exercise to open the VisualHallway sample:

  1. Ensure you have a working installation of ML-Agents and can train a brain externally in Python before continuing. Consult the previous chapter if you need help.
  2. Open the VisualHallway scene from the Assets | ML-Agents | Examples | Hallway | Scenes folder in the Project window.
  3. Make sure that the Hallway Agent | Brain is set to VisualHallwayPlayer, as shown in the following screenshot:

Hallway Agent | Brain set to player
  4. Press Play in the editor to run the scene, and use the W, A, S, and D keys to control the agent. Remember, the goal is to move to the square that is the same color as the center block.
  5. Play the game and move to both colored squares to see what happens when a reward is given, either negative or positive. The game screen will flash green or red when a reward square is entered.

This game environment is typical of a first-person shooter, and perfect for training an agent to play in first person as well. Training an agent to play as a human would is the goal of many an AI practitioner, and one you may or may not strive to incorporate in your own game. As we will see, depending on the complexity of your game, this type of learning/training may not even be a viable option. At this point, we should look at how to set up and train the agent visually.
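As a refresher, external training is launched from a Python/Anaconda console. The exact flags vary between ML-Agents releases, but at the time of writing the command looked along these lines (the run ID is an arbitrary label of your choosing):

```
mlagents-learn config/trainer_config.yaml --run-id=vishall --train
```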
