Trying ICM on Hallway/VisualHallway

Not unlike the agents we train, we learn quite well from trial and error. This is the reason we practice, practice, and practice again those very difficult tasks such as dancing, singing, or playing an instrument. RL is no different, and it requires the practitioner to learn the ins and outs of training through the rigors of trial, error, and further exploration. Therefore, in this next exercise, we are going to combine Backplay (Curriculum Learning) and Curiosity Learning together into our old friend, the Hallway, and see what effect it has, as follows:

  1. Open the Hallway or VisualHallway scene (your preference) as we last left it, with Curriculum Learning enabled and set to simulate Backplay. 
  2. Open the trainer_config.yaml configuration file located in the ML-Agents/ml-agents/config folder.
  3. Scroll down to the HallwayLearning or VisualHallwayLearning brain configuration parameters and add the following additional configuration lines:
HallwayLearning:
    use_curiosity: true
    curiosity_strength: 0.01
    curiosity_enc_size: 256
    use_recurrent: true
    sequence_length: 64
    num_layers: 2
    hidden_units: 128
    memory_size: 256
    beta: 1.0e-2
    gamma: 0.99
    num_epoch: 10
    buffer_size: 1024
    batch_size: 1000
    max_steps: 5.0e5
    summary_freq: 1000
    time_horizon: 64
  4. This will enable the curiosity module for this example. We use the same curiosity settings as we used for the last Pyramids example; a conceptual sketch of how these settings shape the agent's reward follows this list.
  5. Make sure this sample is prepared for curriculum Backplay as we configured it in that section. If you need to, go back and review that section and add the capability to this example before continuing. 
This may require you to create a new curriculum file that uses the same parameters as we did previously. Remember that the curriculum file needs to have the same name as the brain it is being used against; a hypothetical example of such a file is shown after this list.
  6. Open a Python/Anaconda window prepared for training and start training with the following command:
mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/hallway/ --run-id=hallway_bp_cl --train
  7. Let the training run until completion, as the results can be interesting and show some of the powerful possibilities of layering learning enhancements for extrinsic and intrinsic rewards.
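
Before you start training, it may help to see what the curiosity settings above actually control. The following is a minimal, conceptual Python sketch of an ICM-style intrinsic reward, not the ML-Agents implementation: the function names and the random encodings are placeholders for illustration, while curiosity_strength and curiosity_enc_size mirror the hyperparameters in the configuration above.

# Conceptual sketch of an ICM-style curiosity bonus (not the ML-Agents code).
import numpy as np

curiosity_strength = 0.01   # matches curiosity_strength in the configuration above
curiosity_enc_size = 256    # length of the encoded state vector (curiosity_enc_size)

def intrinsic_reward(encoded_next_state, predicted_next_state):
    """Forward-model prediction error: surprising transitions earn a larger bonus."""
    prediction_error = np.mean((predicted_next_state - encoded_next_state) ** 2)
    return curiosity_strength * prediction_error

def total_reward(extrinsic_reward, encoded_next_state, predicted_next_state):
    # The agent trains on the sum of the environment (extrinsic) reward
    # and the curiosity (intrinsic) bonus.
    return extrinsic_reward + intrinsic_reward(encoded_next_state, predicted_next_state)

# Toy usage with random encodings of length curiosity_enc_size
s_next = np.random.rand(curiosity_enc_size)
s_pred = np.random.rand(curiosity_enc_size)
print(total_reward(0.0, s_next, s_pred))

In practice, ML-Agents learns the state encoder and the forward/inverse models for you; the point of the sketch is simply that curiosity_strength scales how much the forward model's prediction error (surprise) adds to the extrinsic reward.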
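
If you need to recreate the curriculum file mentioned in step 5, the following sketch writes a placeholder HallwayLearning.json into the config/curricula/hallway folder referenced by the command in step 6. The measure, thresholds, and in particular the start_distance parameter are assumptions for illustration only; use the same values and reset parameters you configured in the earlier Backplay section.

# Hypothetical helper that writes a placeholder curriculum file for the
# HallwayLearning brain. The parameter name and lesson values below are
# assumptions -- replace them with the ones you used in the Backplay section.
import json
import os

curriculum = {
    "measure": "reward",            # advance lessons based on mean episode reward
    "thresholds": [0.2, 0.5, 0.8],  # placeholder thresholds between lessons
    "min_lesson_length": 100,       # minimum episodes before a lesson can advance
    "signal_smoothing": True,
    "parameters": {
        # placeholder reset parameter; use the parameter your Hallway scene exposes
        "start_distance": [1.0, 2.0, 3.0, 4.0]
    }
}

os.makedirs("config/curricula/hallway", exist_ok=True)

# The file must share its name with the brain it targets (HallwayLearning)
with open("config/curricula/hallway/HallwayLearning.json", "w") as f:
    json.dump(curriculum, f, indent=4)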

This exercise showed how to run an agent with both Curriculum Learning simulating Backplay and Curiosity Learning adding an aspect of agent motivation to the learning. As you may well imagine, intrinsic reward learning and the whole field of Motivated Reinforcement Learning may lead to some interesting advances and enhancements to our DRL agents.

In the next section, we will review a number of helpful exercises that should help you learn more about these concepts.
