Creating your own environment

For educational purposes, in this book, we have predominantly used fast, small-scale tasks that best fit our needs. However, plenty of simulators exist for locomotion tasks (such as Gazebo, Roboschool, and MuJoCo), mechanical engineering, transportation, self-driving cars, security, and many more. These existing environments are diverse, but there isn't one for every possible application. Thus, in some situations, you may find yourself in charge of creating your own.
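To make the idea concrete, here is a minimal sketch of what a custom environment can look like. It follows the reset/step interface popularized by OpenAI Gym, but does not depend on the library itself; the 1-D corridor task, the `CorridorEnv` name, and all of its parameters are hypothetical, invented purely for illustration.

```python
# Minimal sketch of a custom environment following the Gym-style
# reset/step interface. The 1-D corridor task is hypothetical.
class CorridorEnv:
    """The agent starts at position 0 and must reach position `length`."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        # A -1 reward per step pushes the agent to finish quickly
        reward = 0.0 if done else -1.0
        return self.pos, reward, done, {}


env = CorridorEnv(length=3)
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(1)  # always move right
    total += reward
print(total)  # two -1 steps, then 0.0 on reaching the goal: -2.0
```

The same four-method skeleton (constructor, `reset`, `step`, and optionally `render`) scales from toy tasks like this one up to full physics simulations.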

The reward function is difficult to design on its own, yet it is a key part of RL. With the wrong reward function, the environment can become impossible to solve, and the agent may learn the wrong behaviors. In Chapter 1, The Landscape of Reinforcement Learning, we gave the example of the boat-racing game, in which the boat maximized its reward by driving in a circle to capture repopulating targets instead of racing to the end of the track as fast as possible. These are the kinds of behaviors to avoid when designing the reward function.

The general advice for designing a reward function (applicable in any environment) is to use positive rewards to incentivize exploration, and negative rewards or penalized terminal states if the goal is to reach a terminal state as quickly as possible. The shape of the reward function is also important to consider. Throughout this book, we have warned against sparse rewards: an optimal reward function should be smooth and dense.
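The difference between a sparse and a dense reward can be sketched in a few lines. In this hypothetical 2-D navigation task (the `GOAL` coordinates and function names are invented for illustration), the sparse variant returns a signal only at the goal itself, while the dense variant uses the negative distance to the goal, so every step the agent takes changes its reward:

```python
import math

GOAL = (5.0, 5.0)  # hypothetical target position

def sparse_reward(pos):
    # Signal only at the exact goal: the agent gets no guidance
    # anywhere else, so random exploration rarely finds it.
    return 1.0 if pos == GOAL else 0.0

def dense_reward(pos):
    # Smooth, dense signal: the negative Euclidean distance to the
    # goal guides the agent at every step, even far from the goal.
    return -math.dist(pos, GOAL)

print(sparse_reward((0.0, 0.0)))  # 0.0 -- no information at all
print(dense_reward((0.0, 0.0)))   # about -7.07
print(dense_reward((4.0, 5.0)))   # -1.0 -- closer means higher reward
```

With the dense version, moving toward the goal is immediately rewarded, which is exactly the smooth, dense property described above.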

If, for some reason, the reward function is very difficult to put into formulas, there are two additional ways in which a supervision signal can be provided:

  • Give a demonstration of the task using imitation learning or inverse reinforcement learning.
  • Use human preferences to provide feedback about the agent's behavior. 
The latter is still a relatively novel approach; if you are interested in it, you may find the paper Deep Reinforcement Learning from Policy-Dependent Human Feedback (https://arxiv.org/abs/1902.04257) an interesting read.