Adding individuality with intrinsic rewards

As we learned in Chapter 9, Rewards and Reinforcement Learning, intrinsic reward systems and the concept of agent motivation are currently implemented in ML-Agents only as curiosity learning. This whole area of applying intrinsic rewards or motivation combined with RL has wide applications in gaming and in interpersonal applications such as servant agents.
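
If it helps to picture what the curiosity module adds, the following is a rough, illustrative sketch of how a curiosity-style intrinsic reward is folded into the reward the agent learns from. The function names and the simple prediction-error measure are assumptions for illustration only, not the actual ML-Agents internals:

import numpy as np

def curiosity_reward(predicted_next_obs, actual_next_obs, strength=0.01):
    # The intrinsic (curiosity) reward grows with the forward model's
    # prediction error: surprising transitions are rewarded.
    prediction_error = np.mean((predicted_next_obs - actual_next_obs) ** 2)
    return strength * prediction_error

def total_reward(extrinsic_reward, predicted_next_obs, actual_next_obs, strength=0.01):
    # The agent trains on the sum of the game's (extrinsic) reward and
    # the curiosity (intrinsic) reward.
    return extrinsic_reward + curiosity_reward(
        predicted_next_obs, actual_next_obs, strength)

# A surprising transition earns a little reward even when the game pays nothing.
predicted = np.array([0.0, 0.0, 0.0])
actual = np.array([0.5, -0.2, 0.1])
print(total_reward(0.0, predicted, actual, strength=0.01))

The strength value here plays the same role as the curiosity_strength setting we configure later in this exercise: it scales how much the agent's own curiosity contributes relative to the game's rewards.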

In the next exercise, we are going to add intrinsic rewards to a couple of our agents and see what effect this has on the game. Open up the scene from the previous exercise and follow these steps:

  1. Open up the ML-Agents/ml-agents/config/trainer_config.yaml file in a text editor. We never did add any specialized configuration to our agents, but we are going to rectify that now and add some extra configurations.
  2. Add the following four new brain configurations to the file:
BlueStrikerLearning:
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 128
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 128
    num_layers: 2
    normalize: false

BlueGoalieLearning:
    use_curiosity: true
    summary_freq: 1000
    curiosity_strength: 0.01
    curiosity_enc_size: 256
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 320
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    time_horizon: 128
    num_layers: 2
    normalize: false

RedStrikerLearning:
    use_curiosity: true
    summary_freq: 1000
    curiosity_strength: 0.01
    curiosity_enc_size: 256
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 128
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    time_horizon: 128
    num_layers: 2
    normalize: false

RedGoalieLearning:
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 320
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 128
    num_layers: 2
    normalize: false
  3. Note how we have also enabled use_curiosity: true on the BlueGoalieLearning and RedStrikerLearning brains. You can copy and paste most of this from the original GoalieLearning and StrikerLearning brain configurations already in the file; just pay attention to the details.
  4. Save the file when you are done editing.
  5. Open your Python/Anaconda console and start training with the following command:
mlagents-learn config/trainer_config.yaml --run-id=soccer_icl --train
  6. Let the agents train for a while. You will notice that, while they do appear to act more like individuals, their training performance is still subpar, and any improvement we do see in training is likely the result of giving a couple of agents curiosity. You can compare the brains' progress in TensorBoard, as shown after these steps.
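
To watch the effect of curiosity as training runs, open TensorBoard from a second Python/Anaconda console. Assuming you launched training from the ml-agents folder and it is writing to the default summaries folder, a command like the following will do:

tensorboard --logdir=summaries

Compare the cumulative reward curves of BlueGoalieLearning and RedStrikerLearning (the curious brains) against their non-curious counterparts under the soccer_icl run.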

This ability to add individuality to an agent with intrinsic rewards or motivation will certainly mature as DRL does for games and other potential applications, and it will hopefully provide other intrinsic reward modules that are not entirely focused on learning. However, intrinsic rewards alone can only do so much to encourage individuality, so in the next section, we introduce extrinsic rewards to our modified example.

Another excellent application of transfer learning would be the ability to add intrinsic reward modules after agents have been trained on general tasks.
