Adding individuality with intrinsic rewards

As we learned in Chapter 9, Rewards and Reinforcement Learning, intrinsic reward systems and the concept of agent motivation are currently implemented in ML-Agents only as curiosity learning. This whole area of applying intrinsic rewards or motivation combined with RL has wide applications in gaming and in interpersonal applications such as servant agents.
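
If it helps to picture what the curiosity module adds, the following is a rough, illustrative sketch of how a curiosity-style intrinsic reward is folded into the reward the agent learns from. The function names and the simple prediction-error measure are assumptions for illustration only, not the actual ML-Agents internals:

import numpy as np

def curiosity_reward(predicted_next_obs, actual_next_obs, strength=0.01):
    # The intrinsic (curiosity) reward grows with the forward model's
    # prediction error: surprising transitions are rewarded.
    prediction_error = np.mean((predicted_next_obs - actual_next_obs) ** 2)
    return strength * prediction_error

def total_reward(extrinsic_reward, predicted_next_obs, actual_next_obs, strength=0.01):
    # The agent trains on the sum of the game's (extrinsic) reward and
    # the curiosity (intrinsic) reward.
    return extrinsic_reward + curiosity_reward(
        predicted_next_obs, actual_next_obs, strength)

# A surprising transition earns a little reward even when the game pays nothing.
predicted = np.array([0.0, 0.0, 0.0])
actual = np.array([0.5, -0.2, 0.1])
print(total_reward(0.0, predicted, actual, strength=0.01))

The strength value here plays the same role as the curiosity_strength setting we configure later in this exercise: it scales how much the agent's own curiosity contributes relative to the game's rewards.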

In the next exercise, we are going to add intrinsic rewards to a couple of our agents and see what effect this has on the game. Open up the scene from the previous exercise and follow these steps:

  1. Open up the ML-Agents/ml-agents/config/trainer_config.yaml file in a text editor. We never did add any specialized configuration to our agents, but we are going to rectify that now and add some extra configurations.
  2. Add the following four new brain configurations to the file:
BlueStrikerLearning:
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 128
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 128
    num_layers: 2
    normalize: false

BlueGoalieLearning:
    use_curiosity: true
    summary_freq: 1000
    curiosity_strength: 0.01
    curiosity_enc_size: 256
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 320
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    time_horizon: 128
    num_layers: 2
    normalize: false

RedStrikerLearning:
    use_curiosity: true
    summary_freq: 1000
    curiosity_strength: 0.01
    curiosity_enc_size: 256
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 128
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    time_horizon: 128
    num_layers: 2
    normalize: false

RedGoalieLearning:
    max_steps: 5.0e5
    learning_rate: 1e-3
    batch_size: 320
    num_epoch: 3
    buffer_size: 2000
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 128
    num_layers: 2
    normalize: false
  3. Note how we have also enabled use_curiosity: true on the BlueGoalieLearning and RedStrikerLearning brains. You can copy and paste most of this from the original GoalieLearning and StrikerLearning brain configurations already in the file; just pay attention to the details.
  4. Save the file when you are done editing.
  5. Open your Python/Anaconda console and start training with the following command:
mlagents-learn config/trainer_config.yaml --run-id=soccer_icl --train
  6. Let the agents train for a while. You will notice that, while they do appear to act more like individuals, their training performance is still subpar, and any improvement we do see in training is likely the result of giving a couple of agents curiosity. You can compare the brains' progress in TensorBoard, as shown after these steps.
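
To watch the effect of curiosity as training runs, open TensorBoard from a second Python/Anaconda console. Assuming you launched training from the ml-agents folder and it is writing to the default summaries folder, a command like the following will do:

tensorboard --logdir=summaries

Compare the cumulative reward curves of BlueGoalieLearning and RedStrikerLearning (the curious brains) against their non-curious counterparts under the soccer_icl run.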

This ability to add individuality to an agent with intrinsic rewards or motivation will certainly mature as DRL does for games and other potential applications, and it will hopefully provide other intrinsic reward modules that are not entirely focused on learning. However, intrinsic rewards alone can only do so much to encourage individuality, so in the next section, we introduce extrinsic rewards to our modified example.

Another excellent application of transfer learning would be the ability to add intrinsic reward modules after agents have been trained on general tasks.
