Introducing the OpenAI Gym framework

To implement a Q-learning algorithm, we'll use the OpenAI Gym framework, a TensorFlow-compatible toolkit for developing and comparing Reinforcement Learning algorithms.

OpenAI Gym consists of two main parts:

  • The Gym open source library: A collection of problems and environments that can be used to test Reinforcement Learning algorithms. All these environments share a common interface, so you can write general RL algorithms that are not tied to a single environment.
  • The OpenAI Gym service: A site and API allowing people to meaningfully compare the performance of their trained agents.
See more references at https://gym.openai.com.

To get started, you'll need to have Python 2.7 or Python 3.5. To install Gym, use the pip installer:

sudo pip install gym

Once installed, you can list Gym's environments as follows:

>>> from gym import envs
>>> print(envs.registry.all())

The output list is very long; the following is just an excerpt:

[EnvSpec(PredictActionsCartpole-v0),
EnvSpec(Asteroids-ramDeterministic-v0),
EnvSpec(Asteroids-ramDeterministic-v3),
EnvSpec(Gopher-ramDeterministic-v3),
EnvSpec(Gopher-ramDeterministic-v0),
EnvSpec(DoubleDunk-ramDeterministic-v3),
EnvSpec(DoubleDunk-ramDeterministic-v0),
EnvSpec(Carnival-v0),
EnvSpec(FrozenLake-v0), ...,
EnvSpec(SpaceInvaders-ram-v3),
EnvSpec(CarRacing-v0), EnvSpec(SpaceInvaders-ram-v0), ...,
EnvSpec(Kangaroo-v0)]

Each EnvSpec defines a task to solve. For example, in FrozenLake-v0 the agent controls the movement of a character in a 4x4 grid world (see the following figure). Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and depends only partially on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile:

A representation of the FrozenLake-v0 grid world

The surface shown previously is described using a grid, such as the following:

SFFF   (S: starting point, safe)
FHFH   (F: frozen surface, safe)
FFFH   (H: hole, fall to your doom)
HFFG   (G: goal, where the frisbee is located)

The episode ends when we reach the goal or fall in a hole. We receive a reward of one for reaching the goal, and zero otherwise.
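
To make these rules concrete, the following is a minimal pure-Python sketch of the grid, the reward, and the end-of-episode condition. It is not Gym's own implementation: the names GRID, MOVES, step, and run_episode, the action encoding, and the slip_probability parameter are assumptions made for illustration. In particular, the slip model used here (replace the chosen action with a random one) is only a stand-in for the uncertain movement described earlier.

```python
import random

# Grid layout copied from the text: S = start, F = frozen, H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

# Hypothetical action encoding for this sketch: (row delta, column delta).
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}


def step(position, action, slip_probability=0.0, rng=random):
    """Apply one move and return (new_position, reward, done)."""
    # Assumed slip model: with probability slip_probability the chosen
    # action is replaced by a randomly drawn one.
    if rng.random() < slip_probability:
        action = rng.choice(sorted(MOVES))
    d_row, d_col = MOVES[action]
    # Clamp to the grid so that moving into the border leaves the agent in place.
    row = min(max(position[0] + d_row, 0), len(GRID) - 1)
    col = min(max(position[1] + d_col, 0), len(GRID[0]) - 1)
    tile = GRID[row][col]
    reward = 1.0 if tile == "G" else 0.0  # one for the goal, zero otherwise
    done = tile in "HG"                   # the episode ends on a hole or the goal
    return (row, col), reward, done


def run_episode(actions, slip_probability=0.0, rng=random):
    """Follow a list of actions from the start tile until the episode ends."""
    position, total_reward, done = (0, 0), 0.0, False
    for action in actions:
        position, reward, done = step(position, action, slip_probability, rng)
        total_reward += reward
        if done:
            break
    return position, total_reward, done
```

With slip_probability left at zero, the action sequence down, down, right, down, right, right follows a walkable path to the goal tile and collects the reward of one, whereas right, down ends the episode in a hole with no reward.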
