Roboschool

Up until this point, we have worked with discrete control tasks such as the Atari games in Chapter 5, Deep Q-Network, and LunarLander in Chapter 6, Learning Stochastic and PG Optimization. To play these games, only a small number of discrete actions (roughly two to five) have to be controlled. As we learned in Chapter 6, Learning Stochastic and PG Optimization, policy gradient algorithms can be easily adapted to continuous actions. To show these properties, we'll deploy the next few policy gradient algorithms in a new set of environments called Roboschool, in which the goal is to control a robot in different situations. Roboschool was developed by OpenAI and uses the familiar OpenAI Gym interface that we used in the previous chapters. Its environments are based on the Bullet Physics Engine (a physics engine that simulates soft and rigid body dynamics) and are similar to those of the well-known Mujoco physics engine. We opted for Roboschool because it is open source (Mujoco requires a license) and because it includes some more challenging environments.

Specifically, Roboschool comprises 12 environments, ranging from the simple Hopper (RoboschoolHopper), shown on the left in the following figure and controlled by three continuous actions, to the more complex humanoid (RoboschoolHumanoidFlagrun) with 17 continuous actions, shown on the right:

Figure 7.1. Render of RoboschoolHopper-v1 on the left and RoboschoolHumanoidFlagrun-v1 on the right

In some of these environments, the goal is to run, jump, or walk as fast as possible in a single direction so as to reach the 100 m finish line. In others, the goal is to move in a three-dimensional field while coping with external disturbances, such as objects thrown at the robot. The set of 12 environments also includes a multiplayer Pong environment and an interactive environment in which a 3D humanoid is free to move in all directions and has to walk continuously toward a flag. In addition, there is a similar environment in which the robot is bombarded with cubes to destabilize it, so it has to learn a more robust control policy to keep its balance.

The environments are fully observable, meaning that the agent has a complete view of its state, which is encoded in a Box space whose size varies from about 10 to 40, depending on the environment. As we mentioned previously, the action space is continuous, and it too is represented by a Box space whose size depends on the environment.
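As a quick sanity check of these spaces, the following minimal sketch (assuming you have Gym and the roboschool package installed, and using the older four-value step API that Roboschool was written against) creates RoboschoolHopper-v1, prints its observation and action Box spaces, and steps through it with random continuous actions. The exact dimensionalities printed depend on the environment you choose.

import gym
import roboschool  # importing roboschool registers its environments with Gym

# Create the Hopper environment and inspect its spaces
env = gym.make('RoboschoolHopper-v1')
print(env.observation_space)  # a Box space describing the fully observable state
print(env.action_space)       # a Box space with three continuous actions

# Run a short rollout with random continuous actions
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random vector inside the Box bounds
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()

The same pattern works for every Roboschool environment; only the environment ID and the sizes of the two Box spaces change.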
