Why different environments?

In real applications, the choice of environment is dictated by the task to be learned. In research, the choice is usually dictated by intrinsic features of the environment. In this latter case, the end goal is not to train the agent on a specific task, but to demonstrate some task-related capabilities.

For instance, if the goal is to create a multi-agent RL algorithm, the environment should have at least two agents with a means to communicate with one another, regardless of the end task. By contrast, to create a lifelong learner (an agent that continuously creates and learns more difficult tasks using the knowledge acquired in previous, easier tasks), the primary qualities that the environment should have are the ability to adapt to new situations and a realistic domain.

Task aside, environments can differ by other characteristics, such as complexity, observation space, action space, and reward function: 

  • Complexity: Environments span a wide spectrum, from balancing a pole to manipulating physical objects with a robot hand. More complex environments can be chosen to show the capability of an algorithm to deal with a large state space that mimics the complexity of the world. On the other hand, simpler ones can be used to show only some specific qualities.
  • Observation space: As we have already seen, the observation space can range from the full state of the environment to only a partial observation perceived by the perception system, such as raw images.
  • Action space: Environments with a large continuous action space challenge the agent to deal with real-valued vectors, whereas discrete action spaces are easier to learn because only a limited number of actions is available.
  • Reward function: Environments with hard exploration and delayed rewards, such as Montezuma's Revenge, are very challenging to solve. Surprisingly, only a few algorithms are able to reach human-level performance. For this reason, these environments are used as a test bed for algorithms that aim to address the exploration problem.
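
To make the difference between discrete and continuous action spaces concrete, here is a minimal sketch in plain Python. The classes `DiscreteSpace` and `BoxSpace`, their methods, and the sizes used (2 actions for the pole, 20 dimensions for the robot hand) are hypothetical and chosen only for illustration; they are not taken from any specific library.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteSpace:
    """Hypothetical discrete action space: the integers 0 .. n-1."""
    n: int

    def sample(self, rng: random.Random) -> int:
        # Pick one of the n available actions uniformly at random.
        return rng.randrange(self.n)

    def contains(self, action) -> bool:
        return isinstance(action, int) and 0 <= action < self.n


@dataclass
class BoxSpace:
    """Hypothetical continuous action space: real-valued vectors in [low, high]^dim."""
    low: float
    high: float
    dim: int

    def sample(self, rng: random.Random) -> List[float]:
        # Draw each component independently from the allowed interval.
        return [rng.uniform(self.low, self.high) for _ in range(self.dim)]

    def contains(self, action) -> bool:
        return len(action) == self.dim and all(
            self.low <= a <= self.high for a in action
        )


rng = random.Random(0)

# Simple environment: e.g. push the cart left or right (assumed 2 actions).
pole_actions = DiscreteSpace(n=2)
# Complex environment: e.g. one real-valued command per joint (assumed 20 joints).
hand_actions = BoxSpace(low=-1.0, high=1.0, dim=20)

discrete_action = pole_actions.sample(rng)
continuous_action = hand_actions.sample(rng)
```

In the discrete case the agent chooses among a small, enumerable set of actions, whereas in the continuous case it has to output a whole real-valued vector, which is what makes such environments harder to learn.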