Machine Learning (ML) has evolved drastically over the last decade. It is no longer solely the domain of scientists or companies making headlines beating world chess champions. Now, we can easily write AI that can outplay human players in a wide variety of games. We can even write AI that will learn how to play a game using only the information available to a human player (MarI/O comes to mind, an ML algorithm that taught itself to play Super Mario). I remember, many years ago, spending months writing my own evolutionary learning algorithms to do something very similar to what you will accomplish in this very chapter (and hopefully it won't take you months to complete!). The best news is, we don't have to write our own ML algorithms (although this is still possible if that's the route you want to take!). Instead, we will use ML-Agents to create an intelligent character (or Agent).
ML-Agents is short for Machine Learning Agents. It is an open source Unity package used to train intelligent Agents using techniques such as reinforcement learning. An Agent can be trained over time to become significantly better at a designated task.
In this chapter, we will create an Agent that will teach itself how to navigate an environment, colliding with objects we mark as safe and avoiding objects that we designate as hazards. Along the way, we'll look at how to do the following:
There is a lot to cover in this chapter, and some of the information can be quite complex. The goal here is not to provide an in-depth look at ML-Agents and ML in general, but to get you up and running as smoothly as possible so that you can start using ML-Agents in your own projects.
This chapter differs from the previous chapters on Finite State Machines (FSM) as we won't have to write the AI behavior we expect explicitly. We also won't be writing an attack or chase state. Instead, we will provide information about the Agent's environment and assign positive and negative rewards based on the Agent's behavior. The ML algorithm will use this information and adapt its behavior over time to increase the chance of obtaining a positive reward.
In an ideal world, we would be able to write one AI character that could adapt to any environment within a game and provide an engaging experience for the player. While we're a little way off from this ideal, the introduction of easy-to-use ML algorithms is a big step in the right direction, and by the end of this chapter, you'll see the possibilities offered by ML-Agents.
This chapter assumes that you have not only completed the projects from the previous chapters but also have a good, basic knowledge of C# scripting generally, though not necessarily in Unity. In this chapter, you will use many of the assets from the project started in Chapter 8, Continuing with Intelligent Enemies, so you should read that chapter if you haven't already.
The starting project and assets can be found in the book's companion files in the Chapter10/Start folder. You can start here and follow along with this chapter if you do not have your own project already. The completed project is included in the Chapter10/End folder.
This chapter uses an asset from the Unity Asset Store, which, along with its author details, can be found at https://assetstore.unity.com/packages/3d/characters/meshtint-free-boximon-fiery-mega-toon-series-153958.
ML-Agents relies on a number of Python packages that require Python version 3.6.1 or higher.
First released in September 2017, ML-Agents has evolved rapidly with input from ML scientists, game developers, and the wider Unity fanbase, thanks to its open source nature. This rapid progress can, at times, make it challenging to learn how to use it, with many tutorials quickly becoming outdated. However, with the release of version 1 of ML-Agents, these significant backward-incompatible updates should slow down as the project stabilizes. This makes it a great time to jump into the world of ML in Unity!
The ML-Agents toolkit consists of the following:
Important Note
For our purposes, we only require mlagents and mlagents_env. Detailed instructions on how to install them are included in the Installing ML-Agents section.
We'll use the ML-Agents package to create a learning Agent in the Unity Editor. This process will feel familiar, as it is similar to everything else we do in Unity: we'll create a GameObject and write custom components to extend its functionality. Once we have created the Agent and are ready for training, we will then step outside the Unity Editor and run the mlagents Python package, which will use mlagents_env to talk to our Unity project and train the monster.
The learning algorithm works on a cycle with the following steps:
Every loop through this cycle is called a simulation step. Multiple steps make up an episode. As we train our agent in the Training the Agent section, we'll progress through thousands of steps to achieve our goal. But that's getting ahead of ourselves. Before we start creating and training the Agent, let's take a sneak peek at the finished project so that we know where we're heading.
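The cycle can be sketched in code. The following is illustrative pseudocode only, not the ML-Agents API; every name in it is made up for the purposes of the sketch:

```csharp
// One training episode, made up of many simulation steps. Each pass
// through the loop is a single step: observe, decide, act, reward.
while (!episodeComplete)
{
    var observations = agent.Observe();        // sense the environment
    var actions = policy.Decide(observations); // neural network picks actions
    agent.Act(actions);                        // apply movement/rotation
    policy.Reinforce(agent.CollectReward());   // learn from the reward signal
}
```

In the real toolkit, the observation, decision, and reward stages are handled by components and callbacks that we will configure throughout this chapter.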
In this chapter, we will create an Agent that will teach itself to move around the environment, avoiding rocks and eating food. We won't explicitly tell it what to do, only reward good behavior (eating the food) and punish bad behavior (eating/bumping into rocks). The Agent will then use this information to work out what to do.
We'll modify the world we created in the previous two chapters, creating a minimal version without the extra foliage and trees. For food, we'll modify the chick prefab we created in the previous chapters—the hunter will become the hunted.
As for the learning Agent, we'll use a new asset—a monster that fits the artistic style:
In Figure 10.1, the monster is chasing a chick (that is, the "food"). There are several rocks randomly distributed in the environment, which the monster must learn to avoid.
We already have many of the assets we need, including the level, rock, and chick assets. We only need to import the monster asset for the learning Agent and the ML-Agents package. We'll start with the monster asset.
The monster asset is premade and available on the Unity Asset Store. We'll use the model provided in this package as our learning Agent.
The process of importing packages has been covered in detail in previous chapters and will only be outlined briefly here:
Tip
Many developers offer free assets on the Unity Asset Store, ranging from individual models, such as this monster, to full-blown games. It's worth exploring the store and adding any assets you think will be useful to your collection. The assets will then be available in the Package Manager from within Unity.
Unity will create a Meshtint Free Boximon Fiery Mega Toon Series folder in your Assets folder. This is almost everything we need to get started; there's just one thing left to install: the ML-Agents package itself. Let's do that now.
With the advent of ML-Agents v1.0, the process of installing ML-Agents has been improved significantly. However, there are still a few additional steps beyond installing the package through the Package Manager. We will cover them in detail in this section.
As outlined in the Introducing ML-Agents section, the ML-Agents toolkit consists of the ML-Agents Unity package and three Python packages, called mlagents, mlagents_env, and gym_unity. If you're unfamiliar with Python, don't worry; we'll go step-by-step through how to install the packages here.
Important Note
As gym_unity is a wrapper to interface with OpenAI Gym, which we're not using, we won't be using the gym_unity Python package. For more information on OpenAI Gym, please see https://gym.openai.com.
From the components in the toolkit, we can start to understand how we'll train the Agent. As described in the Introducing ML-Agents section, we'll build the learning Agent inside the Unity Editor using the ML-Agents package, and then run the mlagents Python package outside the Editor, which will use mlagents_env to communicate with our Unity project and train the monster.
But we're getting ahead of ourselves again; before we can do any of that, we need to install the toolkit:
Important Note
The Python installer will also install a tool called pip. We'll use pip to install the Python packages required by ML-Agents.
This command will install the mlagents Python package, which is the package we will use to train our Agent. As you can see from Figure 10.3, this will also install any dependencies that mlagents requires, including the mlagents_env package.
This command will print the parameters we can use with the mlagents-learn command. Don't worry if the number of options feels overwhelming. In reality, we'll only need the one command, and we'll go through that in detail later in the chapter.
Important Note
If ML-Agents is not in the package list, follow the instructions at https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md to add it to the list of available packages.
And that's it for the setup; we've seen what we want to achieve and imported all the assets we need to reach that goal. Next, we will create an environment where the learning will take place.
Before we dive into the world of ML, we need to create an environment to house the monster. In previous chapters, we created a level with trees, shrubs, and rocks. We'll use some of that here but will simplify it so that we can focus on training the Agent and not have to worry about level design:
This will also set all child objects to the same layer:
With the environment ready, the next step is to create the logic that will spawn the chick and rock objects before we start working on ML.
The monster will interact with the environment by colliding with the level's walls, the chick, and the rock objects. When the monster collides with a chick or rock object, that object is removed from the environment, and a new object of the same type is spawned at a random location. We also need functionality to reset the environment by removing all spawned objects and re-adding them. However, before we can implement any of this, we need to create the rock and chick objects, so let's do that now.
In this section, we'll create two prefabs: a chick and a rock. We'll base these prefabs on pre-existing assets and modify them to suit our needs. When we come to implement the ML algorithm, we will reward the monster for eating chicks (sorry, chicks!) and penalize it for colliding with rocks:
Creating a Prefab Variant is similar to inheriting from a base class. The Prefab Variant is stored as a separate prefab but shares (or inherits) the properties of the original prefab. These properties can be overridden, or the variant can be extended by adding new components, differentiating it from the original. For the chick prefab, the Animator and Mesh Renderer are the same as in the original prefab, but we've added additional functionality in this variant through the addition of physics.
Important Note
All the data associated with the prefab is still stored with the original prefab asset. It's only the changes that are stored with the Prefab Variant. If you want to modify a property that should be applied to all the variants of the same prefab, you should modify the original prefab asset.
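The base-class analogy can be sketched in plain C#. This is purely illustrative; Unity does not literally implement Prefab Variants as subclasses, and the names below are made up:

```csharp
// The original prefab is like a base class: it defines the shared data.
public class Chick
{
    public virtual float Scale => 1.0f;   // shared property
}

// The Prefab Variant is like a subclass: it inherits everything from the
// original, can override some properties, and can add new behavior.
public class PhysicsChick : Chick
{
    public override float Scale => 0.5f;  // an overridden property
    public bool UsesPhysics => true;      // new functionality in the variant
}
```

Just as modifying the base class changes all subclasses, modifying the original prefab asset updates every variant that hasn't overridden the changed property.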
That's all for the chick object; next, we'll create the rock prefab:
Now that we have our prefabs of both the chick and rock, we can write the code that will instantiate and position them in the environment.
With the prefabs created, we could place a set number around the environment and attempt to train the Agent. However, in doing so, we would quickly realize that this does not work as we had hoped. The monster would collide with all of the objects in the scene before it learns not to, and as the objects don't respawn, the Agent would never learn the behavior we desire; there simply would not be enough data to train it. As such, we need a mechanic in place that respawns any chicks or rocks removed from the game. As we now have the rock and chick prefabs, we can create that functionality:
using System.Collections.Generic;
using UnityEngine;

public class ObjectSpawner : MonoBehaviour
{
    public int NumOfEachPrefabToSpawn = 6;
    public GameObject FoodPrefab;
    public GameObject RockPrefab;

    private IList<Transform> SpawnLocations = new List<Transform>();
    private int CurrentIndex = 0;

    void Awake()
    {
        foreach (Transform Child in transform)
        {
            SpawnLocations.Add(Child);
        }
        SpawnLocations.Shuffle();
    }
}
The following points summarize the code sample:
NumOfEachPrefabToSpawn defines how many of each prefab exists in the game at any one time. Once training has begun, you can tweak this number to see how it affects the speed at which the Agent learns.
A list of possible spawn locations is stored in the SpawnLocations variable. This list is populated in the Awake function by iterating over every child transform of the current GameObject. The list is then shuffled randomly by calling a C# extension method that we will write shortly.
public class ObjectSpawner : MonoBehaviour
{
    public void Reset()
    {
        foreach (var SpawnedLoc in SpawnLocations)
        {
            if (SpawnedLoc.childCount > 0)
            {
                Destroy(SpawnedLoc.GetChild(0).gameObject);
            }
        }

        for (int i = 0; i < NumOfEachPrefabToSpawn; ++i)
        {
            SpawnFood();
            SpawnRock();
        }
    }
}
The Reset function will be called by our Agent script (which we will write shortly). It removes any currently spawned objects by looping through all possible spawn locations and destroying any children. This works because when we spawn a prefab using the SpawnPrefab function, we add the prefab as a child of the spawn location, as you'll soon see.
public class ObjectSpawner : MonoBehaviour
{
    public void SpawnFood()
    {
        SpawnPrefab(FoodPrefab);
    }

    public void SpawnRock()
    {
        SpawnPrefab(RockPrefab);
    }

    private void SpawnPrefab(GameObject Prefab)
    {
        Instantiate(Prefab, SpawnLocations[CurrentIndex], false);
        CurrentIndex = (CurrentIndex + 1) % SpawnLocations.Count;
    }
}
The SpawnRock and SpawnFood functions will be called by our Agent script whenever one of those objects is removed from the game. This process ensures that there is always a set number of each object in the environment.
Important Note
As the SpawnLocations list has been shuffled, we can step through it sequentially, and the objects will still be spawned at random locations. If we didn't shuffle the list, the objects would be spawned in the order in which we added locations to the SpawnLocations list.
Once we've spawned an object, we increment CurrentIndex by 1 and apply the modulus operator so that, when we reach the end of the spawn location list, the index wraps around to 0. This ensures that we can continue to spawn objects indefinitely without hitting an IndexOutOfRange exception.
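The wrap-around behavior is easy to see in isolation. The following is a toy example, not part of the project code:

```csharp
// With 4 spawn locations, successive spawns use indices 0, 1, 2, 3,
// then wrap back to 0, 1, ... and so on, never going out of range.
int spawnLocationCount = 4;
int currentIndex = 0;
for (int spawn = 0; spawn < 6; ++spawn)
{
    // ...the prefab would be spawned at spawnLocations[currentIndex] here...
    currentIndex = (currentIndex + 1) % spawnLocationCount;
}
```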
using System.Collections.Generic;

public static class Extensions
{
    public static void Shuffle<T>(this IList<T> ThisList)
    {
        var Count = ThisList.Count;
        var Last = Count - 1;
        for (var i = 0; i < Last; ++i)
        {
            var RandomIndex = UnityEngine.Random.Range(i, Count);
            var Temp = ThisList[i];
            ThisList[i] = ThisList[RandomIndex];
            ThisList[RandomIndex] = Temp;
        }
    }
}
C# extension methods provide us with the ability to add functions to existing types without having to create a custom type. This is demonstrated perfectly by the Shuffle function. I could have created my own list class that implements the IList interface and added the Shuffle function that way, but instead, I can write an extension method that adds the functionality I require.
Important Note
An extension method is static but is called on an instance of an object.
The Shuffle function re-orders the elements in the list in a random manner (it is a variant of the well-known Fisher-Yates shuffle). There are many resources written on the subject of random shuffling, but for our purposes, this basic algorithm is all we need.
Important Note
Extensions are part of the C# language and are not specific to Unity.
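As a quick illustration of the calling syntax (a hypothetical snippet, not part of the project code):

```csharp
using System.Collections.Generic;

var Numbers = new List<int> { 1, 2, 3, 4, 5 };

// Shuffle is declared as a static method on the Extensions class, yet it
// is invoked on the list instance; the compiler translates this call to
// Extensions.Shuffle(Numbers).
Numbers.Shuffle();
```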
With the script complete, we can add it to our scene:
The ObjectSpawner script stores a list of Transform components. This list is populated by iterating over every child object of the ObjectSpawner parent object and storing their Transform components. Specifying every possible spawn location using GameObjects instead of spawning them at a random position provides greater control over where the objects can spawn, at the cost of the additional time required to create the spawn locations.
Tip
I've enabled the Icon option for the spawn points to make them visible to aid in positioning them in the environment. Instructions on how to do this can be found in Chapter 6, Continuing the 2D Adventure.
Having functionality to spawn objects continuously is vital for our learning Agent. The Agent will move around the environment and occasionally collide with the rocks and food we've spawned. These collisions will be random at first, but over time, as the Agent learns, it will move purposefully toward the chicks and avoid the rocks. When a collision does occur, the chick or rock object will be destroyed, and using the spawning functionality we wrote in this section, a new object of the same type will be added to the simulation. This consistent environment will help us control the learning process. However, we still want it to be dynamic, which is why we spawn the objects at (semi) random locations; this ensures that the Agent learns to avoid the objects themselves and not, for example, specific parts of the map. Anyway, we're getting ahead of ourselves. We have yet to create the learning Agent, so let's do that now.
The monster will be a learning Agent. Over time, it will get better at moving around the environment, avoiding rocks, and seeking out food. We've laid the groundwork to make this possible. It is now time to create and configure the Agent. Creating a learning Agent involves three steps:
We'll start by configuring the behavior parameters as the other two steps require this.
Every learning Agent requires a Behavior Parameters component. This component defines, among other things, the size of the input and output of the learning algorithm.
We'll add one to the Agent object:
You may have noticed the warning at the bottom of the component, stating that we cannot run inference as there is no model, but we can still train. This warning is alluding to the fact that we have not provided a model (as you can see in Figure 10.16, the Model field is set to None), so we will be unable to run inference. Inference is where our Agent can "infer" things about the environment using data from a pre-trained model. As we don't yet have a pre-trained model, this can be safely ignored. Later, in the Embedding the model section, we'll step through how to run inference.
Important Note
Unity uses a cross-platform inference library, which they have codenamed Barracuda. More information on this library can be found at https://docs.unity3d.com/Packages/[email protected]/manual/index.html.
The behavior type can be changed using the Behavior Type field. There are three choices: Default, Heuristic Only, and Inference Only. Default (which is what we've selected) will train the Agent if we connect the Python trainer or will otherwise attempt to perform inference using a pre-trained model. The Heuristic Only option will cause the Agent to use a heuristic function, which is beyond the scope of this chapter. The Inference Only option means the Agent will only ever perform inference using a pre-trained model. This last option is the one to select when we want to release a game using a model that we've previously trained and no longer want that model to evolve.
As was alluded to earlier, the Use Child Sensors option is important. At the moment, by setting the observation space to 0, we have told the learning algorithm that we won't provide any input. However, without input, our Agent won't be able to make choices based on its environment. For example, it won't know that there's a chick within reach or that it's about to collide with a rock. ML-Agents provides several special sensor components that will automatically provide data on an Agent's environment. We'll configure one of those sensors next.
Defining the correct input into our Agent's neural network is crucial if we want the monster to learn from its environment. For example, we shouldn't expect the monster to evolve chick-seeking behavior if it is never told where the chicks are. We know that we want the Agent to avoid walls and rocks while eating chicks. Therefore, at a minimum, we need to tell the Agent's neural network where these objects are in relation to our Agent. To do this, we'll use a component provided with the ML-Agents package:
Important Note
More information on raycasts can be found in Chapter 8, Continuing with Intelligent Enemies.
These tags define the objects we want to observe. Any objects with tags not on this list will be ignored.
These settings add a vertical offset to the start and end position of the ray. By setting 0.5 here, we have raised the rays to around the monster's mouth level, as shown in Figure 10.19, so they collide with any objects that are on a slight hill:
These settings are shown in Figure 10.20:
The learning algorithm will be informed whether any of the rays hit an object and, if so, what type of object it was. This information is numerical only. For example, in our level, the Agent has 17 rays, and we're interested in 3 tags. Simplifying somewhat, the input into the neural network will consist of an array of 17 numbers, where each number can take one of 4 values: one value represents no object hit, and the other three each represent a hit on an object with one of our specific tags.
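The following sketch illustrates the simplified per-ray encoding described above. Note that this is not the actual ML-Agents implementation (the real Ray Perception Sensor encodes each ray as a one-hot tag vector plus hit information), and the method name is made up; the point is simply that hits become numbers:

```csharp
// The tags we configured on the Ray Perception Sensor component.
private static readonly string[] DetectableTags = { "chick", "Rock", "Wall" };

// Returns 0 when the ray hit nothing, or 1-3 identifying the tag it hit.
private static int EncodeRayHit(string HitTag)
{
    for (int i = 0; i < DetectableTags.Length; ++i)
    {
        if (DetectableTags[i] == HitTag)
        {
            return i + 1;
        }
    }
    return 0;
}
```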
Now that we've configured the behavior parameters and the input component, it's time to bring it all together and write the script that will respond to the output from the neural network and assign rewards based on the monster's behavior in that step.
Every learning Agent requires an Agent script. This script will receive the output from the neural network and assign rewards for good behavior (positive reward) and bad behavior (negative reward). We'll add custom behavior by creating a script that inherits from Agent:
using UnityEngine;
using Unity.MLAgents;
public class MonsterAgent : Agent
{
}
Inheriting from the Agent class provides many useful helper functions. It is required for every learning Agent (although you may be able to get away with using the Agent class directly for very simple examples).
public class MonsterAgent : Agent
{
    public float MoveSpeed = 1.0f;
    public float TurnSpeed = 90.0f;
    public float MaxVelocity = 10.0f;

    private Rigidbody ThisRigidbody;
    private ObjectSpawner SceneObjectSpawner;

    void Awake()
    {
        ThisRigidbody = GetComponent<Rigidbody>();
        SceneObjectSpawner = FindObjectOfType<ObjectSpawner>();
    }
}
The following points summarize the code sample:
MoveSpeed, TurnSpeed, and MaxVelocity will be used shortly to move the Agent.
We store a reference to the Rigidbody component so that we can move the object based on the output from the neural network.
A reference to the ObjectSpawner class we wrote earlier in the chapter is stored so that we can reset the spawned items at the beginning of an episode.
public class MonsterAgent : Agent
{
    …
    public override void OnActionReceived(float[] VectorAction)
    {
        var MovementAction = (int)VectorAction[0];
        var RotationAction = (int)VectorAction[1];

        var MovementDir = Vector3.zero;
        if (MovementAction == 1)
        {
            MovementDir = transform.forward;
        }
        else if (MovementAction == 2)
        {
            MovementDir = -transform.forward;
        }

        var RotationDir = Vector3.zero;
        if (RotationAction == 1)
        {
            RotationDir = -transform.up;
        }
        else if (RotationAction == 2)
        {
            RotationDir = transform.up;
        }

        ApplyMovement(MovementDir, RotationDir);
    }
}
The following points summarize the code sample:
OnActionReceived is called every time the Agent receives an action. The contents of VectorAction differ depending on whether the vector action space is Continuous or Discrete. In the Defining the network's input section, we defined our vector action space to be Discrete, have a size of 2, and each index to have a maximum value of 2.
We define the first index of the action space as MovementAction and the second index as RotationAction. This selection is entirely arbitrary. We could swap them around if we want, and the learning algorithm will adapt.
The values of MovementAction and RotationAction can be 0, 1, or 2 (as defined on the Behavior Parameters component). Consequently, we have three possible actions we can perform for both movement and rotation.
If MovementAction equals 1, we apply a forward force, moving the character forward. If MovementAction is 2, we add a backward force. If MovementAction is 0, no movement force is applied to the character.
It's a similar situation for RotationAction: when it is equal to 1, the character is rotated left, and when it is equal to 2, the character is rotated right. If RotationAction equals 0, no rotation is applied during this step.
public class MonsterAgent : Agent
{
    …
    private void ApplyMovement(Vector3 MovementDir, Vector3 RotationDir)
    {
        ThisRigidbody.AddForce(MovementDir * MoveSpeed, ForceMode.VelocityChange);
        transform.Rotate(RotationDir, Time.fixedDeltaTime * TurnSpeed);

        // Compare squared magnitudes to avoid an unnecessary square root.
        if (ThisRigidbody.velocity.sqrMagnitude > MaxVelocity * MaxVelocity)
        {
            ThisRigidbody.velocity *= 0.95f;
        }
    }
}
There's nothing new in this function. We use the provided MovementDir and RotationDir vectors to move and rotate the Agent. The Agent's velocity is then damped whenever it exceeds MaxVelocity.
Important Note
One thing worth noting is that the ML algorithm updates in time with FixedUpdate, rather than Update, so any movement we apply using the OnActionReceived function should use fixedDeltaTime.
public class MonsterAgent : Agent
{
    …
    public override void OnEpisodeBegin()
    {
        SceneObjectSpawner.Reset();
        ThisRigidbody.velocity = Vector3.zero;
        transform.position = new Vector3(0, 2, 0);
        transform.rotation = Quaternion.Euler(Vector3.zero);
    }
}
At the start of each episode, the OnEpisodeBegin function is invoked. We use this function to reset the environment back to its initial state so that the learning can begin anew. To reset the state, we call the Reset function on the ObjectSpawner class, which destroys and re-adds the pre-determined number of chick and rock objects. We also stop the Agent's movement and reset its position and rotation. This function will be called automatically as we train our Agent.
The MonsterAgent script is almost complete. We've written the functionality to interpret the output of the learning Agent; we move the Agent based on that interpretation and reset the environment at the beginning of each episode. The last piece of functionality we need is to assign a reward. You may remember, from the Introducing ML-Agents section earlier in the chapter, that assigning a reward informs the learning algorithm when the Agent has done something we approve of, or if it is doing something we would rather it didn't. In our simulation, we will assign a positive reward for colliding with a chick, and a negative reward for colliding with rocks and walls.
public class MonsterAgent : Agent
{
    …
    void OnCollisionEnter(Collision OtherCollision)
    {
        if (OtherCollision.gameObject.CompareTag("chick"))
        {
            Destroy(OtherCollision.gameObject);
            SceneObjectSpawner.SpawnFood();
            AddReward(2f);
        }
        else if (OtherCollision.gameObject.CompareTag("Rock"))
        {
            Destroy(OtherCollision.gameObject);
            SceneObjectSpawner.SpawnRock();
            AddReward(-1f);
        }
        else if (OtherCollision.gameObject.CompareTag("Wall"))
        {
            AddReward(-1f);
        }
    }
}
The OnCollisionEnter function handles collisions with three objects: the chick, rock, and wall. Based on which object the Agent collides with, it adds a positive or negative reward. The reward is a signal that the Agent has done something right (positive number) or something wrong (negative number). The size of the reward can be changed to create drastically different behaviors. For our Agent, we want to ensure that it seeks out the chick objects, so we assign a reward value of 2 for every collision. The reward system can be difficult to get right, especially with complex simulations, and often requires trial and error.
As well as assigning rewards, when the Agent collides with a chick or rock object, it removes them from the simulation by calling the Destroy function and also requests that a replacement object is spawned in the scene using the ObjectSpawner class. By always replacing what we remove from the simulation, we ensure that there is a consistent number of good and bad objects in the scene.
Important Note
You may have noticed in the Inspector that there is a MaxStep field associated with the Monster Agent component that we did not write. This field is provided by the Agent class and inherited by the MonsterAgent class.
As discussed in the Introducing ML-Agents section, the learning algorithm works on a cycle with the following steps:
The Decision Requester component performs the second stage for us. It requests a decision to be made at regular intervals. If we did not use this component, we would have to call the Agent.RequestDecision function manually.
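If you did want to manage this yourself, a minimal sketch might look like the following. This is an assumption-laden illustration, not the component's actual implementation: the class, field names, and decision period are made up, and the Decision Requester component makes this code unnecessary. RequestDecision itself is provided by the Agent base class:

```csharp
using Unity.MLAgents;

public class ManualDecisionAgent : Agent
{
    // Hypothetical cadence: request a decision every 5 physics steps.
    private const int DecisionPeriod = 5;
    private int StepsSinceDecision = 0;

    void FixedUpdate()
    {
        StepsSinceDecision++;
        if (StepsSinceDecision >= DecisionPeriod)
        {
            StepsSinceDecision = 0;
            RequestDecision(); // inherited from the Agent base class
        }
    }
}
```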
Wow, we've covered a lot in this section. Give yourself a pat on the back for making it this far! You've successfully created an Agent that is capable of learning. You've defined the input into the neural network using a Ray Perception Sensor component. Using this component, the Agent will be able to observe its environment using several raycasts. We've also defined the output of the neural network and written the functionality that responds to that output. By applying either a movement or rotational force, the Agent will move around the environment, colliding with objects. With the input and output defined, we then moved on to rewarding the monster for good behavior (and telling it off for bad behavior). By assigning a reward value whenever the monster collides with an object, our training algorithm can calculate how well the monster is doing. You've now created everything the monster needs to start learning.
There hasn't been much action up to this point. There has been a fair amount of setup, and that's important in a project like this. But the good news is, it's time to take our little monster for a spin. We'll use everything we've created (the environment, the spawning functionality, and the learning Agent) and bring it all together to train the Agent.
The chapter has been building up to this point. You will shortly see your work translate into something tangible—a pre-trained model that can be reused in any of your projects. We've created a suitable environment, populated it with a never-ending supply of opportunities for the monster to do right or wrong (by colliding with the chicks or rocks, respectively), and configured the learning algorithm's input and how we respond to its output. Now it is time to run the learning process, and this is where the Python packages we installed in the Installing ML-Agents section come in handy (well, essential really; without them, the monster won't be learning anything!). However, before we can run the learning command, we need to create a settings file. This file stores all the settings related to training the Agent and can have a significant effect on how our monster learns:
default:
  trainer: ppo
  batch_size: 512
  beta: 5.0e-3
  buffer_size: 10240
  epsilon: 0.2
  hidden_units: 128
  lambd: 0.95
  learning_rate: 3.0e-4
  learning_rate_schedule: linear
  max_steps: 2.0e6
  memory_size: 128
  normalize: false
  num_epoch: 3
  num_layers: 2
  time_horizon: 64
  sequence_length: 64
  summary_freq: 10000
  use_recurrent: false
  vis_encode_type: simple
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
These settings can be quite technical in nature and are beyond the scope of this book. If you are interested in knowing what each setting is, you can find more information at https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md.
Tip
If you would prefer not to type and add the settings manually, you can find the complete settings file in the Chapter10/End/config folder.
Now that we have the settings file, we can start training our Agent:
cd projects/Unity-2020-By-Example/Chapter10/End
mlagents-learn config/monster_config.yaml --run-id=Monster
mlagents-learn is the command being executed. We pass it the location of the settings file and a run-id. The run-id should be unique for each type of Agent; if we wanted to train an Agent with a different type of behavior, we would pass a different run-id here.
When you run the command, you should receive output similar to what is shown in Figure 10.23:
Tip
If you want more information on what type of parameters mlagents-learn accepts, you can run the mlagents-learn --help command.
Once the game starts, the Agent will run around the environment. And I really mean run, as the Agent will move much quicker than it usually would. The speed increase is normal and decreases the time required for the Agent to learn. When we embed the model and are no longer running training, the Agent will move at a more reasonable speed.
As the monster moves around the environment, the rays it casts will turn from white to red, as shown in the following screenshot. A red ray signifies that it has hit an object with a tag that we are interested in, and a white ray means no object was hit:
The training has now begun. Leave this running for a while and observe the output in the command prompt:
After every 10,000 steps, a summary is printed. As part of the summary, you get the Step count, the total time elapsed in seconds, the mean reward, and the standard deviation (Std) of the reward. As the training progresses, you want to make sure that the mean reward is trending upward. There may be an occasional dip between summaries, but a mean reward that is increasing over time shows that your Agent is learning. For example, in Figure 10.25, you can see that the monster starts with a mean reward of -17.667. A negative number tells us that the monster has performed more negative actions (colliding with a wall or rock) than positive actions (colliding with the chicks). You will notice that this number increases relatively quickly to reach 9.6; the monster now performs more positive than negative actions. Our little guy is learning!
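The Mean Reward and Std of Reward values reported in each summary are simple statistics over the cumulative rewards of the episodes completed since the previous summary. As a plain illustration of what those two numbers mean (this is not ML-Agents code, and the reward values are made up), the calculation looks like this:

```python
# Illustrative only -- not part of ML-Agents. Shows how a mean reward and its
# standard deviation could be computed from the per-episode rewards collected
# between two training summaries.
import statistics

def reward_summary(episode_rewards):
    """Return (mean, std) of the cumulative per-episode rewards."""
    mean = statistics.mean(episode_rewards)
    std = statistics.pstdev(episode_rewards)  # population standard deviation
    return mean, std

# Early in training, the monster mostly hits walls and rocks (penalties)...
early_mean, early_std = reward_summary([-20.0, -18.0, -15.0])
# ...later, it mostly collides with chicks (positive rewards).
late_mean, late_std = reward_summary([8.0, 10.0, 11.0])

print(round(early_mean, 3))  # -17.667: more penalties than rewards
print(round(late_mean, 3))   # 9.667: the Agent is learning
```

A mean trending from negative to positive between summaries is exactly the signal to look for in the console output; the standard deviation simply tells you how consistent the monster's episodes are.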
Tip
The summary frequency can be configured in the monster_config.yaml file by altering the summary_freq field.
Every 50,000 steps, you'll see the output Saved Model. The saved file is the model the monster will use to run inference in the following Embedding the Model section.
I find that around 1 hour of training is enough to produce good results (a mean reward of nearly 100). When you stop the training, you'll see the following output in your command console:
The path shown in the output is the path to the model file that represents the trained neural network for our monster. In the next section, we'll take advantage of the trained model. By embedding it in our Agent, we can take advantage of the trained network and have a functional AI that performs its role admirably without the need for hardcoding any behavior.
In the last section, the neural network controlling the monster evolved to get better at navigating the environment and eating food. When we stopped the training, a .nn file was created. This file represents the trained neural network for the monster. We do not need to re-train the Agent; we can instead use this model to achieve the behavior we expect. We could even provide this model to any number of Agents (as long as they have the same configuration as our monster), and these new Agents would happily move around the environment, avoiding walls and rocks while chasing chicks. But that's getting ahead of ourselves. Let's first use the model with the original monster:
Locate the Monster.nn model exported in the previous section. As you can see from Figure 10.26, the path to my model is located in the <project root>/models/Monster folder.
The model we trained has now been embedded. You'll notice that the warning shown in Figure 10.16 is now gone. As we've selected the default behavior type, our Agent can now operate in two modes: the training mode, where we run the mlagents-learn command, or the inference mode, where it uses an already-trained model.
The Agent's ability at this point is down to how well the neural network has been trained.
Important Note
The monster is not being trained while running inference. If it is not performing adequately, you can redo the steps in the Training the Agent section.
And that's it! You've successfully trained an Agent and embedded the trained model so that it can be used by NPCs within your game. This model can be reused by any Agent that has the same number of vector observations and actions. Feel free to play around with this. For example, what would happen if you duplicated the Monster object and assigned the same model to both of them? Or how about introducing the monster to the FSM project completed in the previous chapter? Have the monster chase the chick, which in turn chases you. The opportunities are (almost) limitless.
By completing this chapter, you will have gained a good understanding of how to use the ML-Agents toolkit to create intelligent Agents to use in your games. We have created a monster that learns over time how to accomplish a task we set. The Agent will observe its environment using its ray sensors, and if it sees a chick, it will actively seek it out while avoiding any walls or rocks in its path. It does all of this without us ever having to write code that explicitly tells it what to do.
By adapting the input and rewards applied to the Agent, you can create drastically different AI. Want an Agent to race around a track instead? Add a reward for progressing around the track, while keeping the negative reward for hitting walls. How about making the monster shoot chicks instead of eating them? Give the little guy a gun and reward him for each projectile that hits its target. I'm oversimplifying here, and there will undoubtedly be a lot of tweaking and trial and error involved, but hopefully you can see the benefits of using ML-Agents in your own projects; and with Unity making ML so accessible to a wider audience, we will start to see even more novel uses in the future.
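To make the racing example concrete, the reward shaping it describes can be sketched as a plain function. This is a hypothetical, language-agnostic illustration in Python, not ML-Agents API code; the function name and its inputs (progress_delta, hit_wall) are invented for the example:

```python
# Hypothetical sketch of per-step reward shaping for a racing Agent -- not
# ML-Agents API code. Each step, the Agent earns a small positive reward for
# progress made around the track and a penalty for hitting a wall.
def step_reward(progress_delta, hit_wall):
    reward = 0.1 * progress_delta  # reward for progressing around the track
    if hit_wall:
        reward -= 1.0              # keep the negative reward for hitting walls
    return reward

print(step_reward(2.0, False))  # moved forward, no collision: positive reward
print(step_reward(0.0, True))   # stood still and hit a wall: negative reward
```

In a real Unity project, logic like this would live in your Agent script and feed the values to ML-Agents via its reward API, just as we did for the monster's collisions.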
In this book, we have now seen two distinct methods of imbuing our games with AI characters: the FSM we started in Chapter 7, Creating Artificial Intelligence, and ML-Agents. Both approaches have their pros and cons, and it is up to you to decide which (if any) approach is suitable for your project. In the next chapter, we will move away from AI and enter the world of virtual reality (VR). We'll look at how to set up VR, configure input, and create a realistic environment complete with light probes.
Q1. The ML-Agents toolkit includes packages written in which language?
A. Rust
B. JavaScript
C. Python
D. Ruby
Q2. Each step of the learning algorithm goes through which cycle?
A. Observation-decision-action-reward
B. Decision-reward-action-observation
C. Observation-action-reward-decision
D. Action-observation-decision-reward
Q3. The command used to train an Agent is…
A. ml-agents –learn
B. mlagents-learn
C. ML-Agents learn
D. mlagents_env-learn
Q4. The name of the input vector into the ML algorithm is…
A. Vector Action
B. Observation List
C. Pre-Action Collection
D. Vector Observation
Q5. The name of the output from the ML algorithm is…
A. Vector Action
B. Post-Observation List
C. Vector Observation
D. Action List
Q6. The name of the function that is called at the beginning of each episode is…
A. ResetState
B. OnEpisodeBegin
C. SimulationStepComplete
D. EpisodeBegin
Q7. You can embed a generated model into a different Agent if…
A. The number of vector observations and actions are the same.
B. The name of the Agent is the same.
C. The agent has the same tag.
D. All of the above.
The following resources include more information on the topics discussed in this chapter: