Preprocessing

The frames in Atari are 210 x 160 pixels with RGB color, giving an overall size of 210 x 160 x 3. If a history of 4 frames were used, the input would have a dimension of 210 x 160 x 12. Such dimensionality is computationally demanding, and it can be difficult to store a large number of frames in the experience buffer. Therefore, a preprocessing step that reduces the dimensionality is necessary. The original DQN implementation uses the following preprocessing pipeline:

  • RGB colors are converted into grayscale
  • The images are downsampled to 110 x 84 and then cropped to 84 x 84
  • The last three frames are concatenated with the current frame, forming a stack of four frames
  • The frames are normalized
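
To make the effect of these steps concrete, here is a minimal sketch of the pipeline using OpenCV and NumPy (the crop offset and the repeated dummy frame are illustrative only; the actual wrappers are shown later in this section):

import cv2
import numpy as np

frame = np.zeros((210, 160, 3), dtype=np.uint8)   # a raw Atari RGB frame
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)    # (210, 160)
small = cv2.resize(gray, (84, 110))               # downsample to 110 x 84
cropped = small[18:102, :]                        # crop the playing area to 84 x 84 (offset is illustrative)
stacked = np.stack([cropped] * 4, axis=-1)        # history of 4 frames
scaled = stacked.astype(np.float32) / 255.0       # normalize pixels to [0, 1]
print(scaled.shape)                               # (84, 84, 4) instead of (210, 160, 12)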

Furthermore, because the games run at a high frame rate, a technique called frame-skipping is used to skip a fixed number of consecutive frames. This technique allows the agent to store and train on fewer frames per game without significantly degrading the performance of the algorithm. In practice, with frame-skipping, the agent selects an action only every k frames and repeats that action on the skipped frames.
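
In code, a bare-bones version of frame-skipping might look like the following sketch (the MaxAndSkipEnv wrapper discussed below implements this idea, together with some extra bookkeeping):

def skip_step(env, action, skip=4):
    """Repeat `action` for `skip` frames, accumulating the reward."""
    total_reward, done, info = 0.0, False, {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info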

In addition, in some environments, the agent has to press the fire button at the start of each game in order to begin playing. Also, because of the determinism of the environment, a random number of no-op actions is taken on reset so that the agent starts from a random position.
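
A no-op reset can be sketched as follows, assuming (as in the Atari environments) that action 0 is the no-op:

import random

def noop_reset(env, noop_max=30):
    """Reset, then take a random number of no-op actions (action 0)."""
    obs = env.reset()
    for _ in range(random.randint(1, noop_max)):
        obs, _, done, _ = env.step(0)
        if done:
            obs = env.reset()
    return obs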

Luckily for us, OpenAI released an implementation of the preprocessing pipeline that is compatible with the Gym interface. You can find it in this book's GitHub repository in the atari_wrappers.py file. Here, we will give just a brief explanation of the implementation:

  • NoopResetEnv(n): Takes n no-ops on reset of the environment to provide a random starting position for the agent.
  • FireResetEnv(): Fires on reset of the environment (required only in some games).

  • MaxAndSkipEnv(skip): Returns only every skip-th frame, taking care of repeating the action on the skipped frames and summing their rewards.
  • WarpFrame(): Resizes the frame to 84 x 84 and converts it into grayscale.
  • FrameStack(k): Stacks the last k frames.

All of these functions are implemented as wrappers. A wrapper is a way to easily transform an environment by adding a new layer on top of it. For example, to scale the frames of Pong, we would use the following code:

import gym
from atari_wrappers import ScaledFloatFrame # from this book's atari_wrappers.py
env = gym.make('Pong-v0')
env = ScaledFloatFrame(env)

A wrapper has to inherit from the gym.Wrapper class and override at least one of the following methods: __init__(self, env), step, reset, render, close, or seed.
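
To illustrate the pattern, here is a minimal hypothetical wrapper (not part of atari_wrappers.py) that counts the steps taken in the environment by overriding reset and step:

class StepCounter(gym.Wrapper):
    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        self.num_steps = 0

    def reset(self, **kwargs):
        self.num_steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        self.num_steps += 1
        return self.env.step(action)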

We won't show the implementation of all the wrappers listed here, as that is outside the scope of this book, but we will use FireResetEnv and WarpFrame as examples to give you a general idea of their implementation. The complete code is available in this book's GitHub repository:

class FireResetEnv(gym.Wrapper):
    def __init__(self, env):
        """Take action on reset for environments that are fixed until firing."""
        gym.Wrapper.__init__(self, env)
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
        assert len(env.unwrapped.get_action_meanings()) >= 3

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(1)
        if done:
            self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(2)
        if done:
            self.env.reset(**kwargs)
        return obs

    def step(self, ac):
        return self.env.step(ac)

First, FireResetEnv inherits from Gym's Wrapper class. Then, during initialization, it checks that the fire action is available by unwrapping the environment through env.unwrapped. The class overrides the reset method: it calls the reset defined in the previous layer through self.env.reset, takes the fire action by calling self.env.step(1), and then takes an environment-dependent action, self.env.step(2). If either step ends the episode, the environment is reset again.

WarpFrame has a similar definition:

import cv2
import numpy as np
from gym import spaces

class WarpFrame(gym.ObservationWrapper):
    def __init__(self, env):
        """Warp frames to 84x84 as done in the Nature paper and later work."""
        gym.ObservationWrapper.__init__(self, env)
        self.width = 84
        self.height = 84
        self.observation_space = spaces.Box(low=0, high=255,
                shape=(self.height, self.width, 1), dtype=np.uint8)

    def observation(self, frame):
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        frame = cv2.resize(frame, (self.width, self.height),
                interpolation=cv2.INTER_AREA)
        return frame[:, :, None]

This time, WarpFrame inherits from gym.ObservationWrapper and creates a Box observation space with values between 0 and 255 and shape (84, 84, 1). Whenever observation() is called, it converts the RGB frame into grayscale and resizes the image to the chosen shape.
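
As a quick sanity check, wrapping Pong with WarpFrame should produce observations of the expected shape:

env = WarpFrame(gym.make('Pong-v0'))
obs = env.reset()
print(obs.shape) # (84, 84, 1)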

We can then create a function, make_env, to apply every wrapper to an environment:

def make_env(env_name, fire=True, frames_num=2, noop_num=30, skip_frames=True):
env = gym.make(env_name)
if skip_frames:
env = MaxAndSkipEnv(env) # Return only every `skip`-th frame
if fire:
env = FireResetEnv(env) # Fire at the beginning
env = NoopResetEnv(env, noop_max=noop_num)
env = WarpFrame(env) # Reshape image
env = FrameStack(env, frames_num) # Stack last 4 frames
return env
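
For example, we can build a fully preprocessed Pong environment with a single call (note that FrameStack returns a lazy array, as discussed below, so we convert it with NumPy to inspect its shape):

env = make_env('Pong-v0')
obs = env.reset()
print(np.asarray(obs).shape) # (84, 84, 2) with the default frames_num=2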

The only preprocessing step that is still missing is the scaling of the frames. We'll take care of scaling immediately before feeding the observations to the neural network. This is because FrameStack uses a particular memory-efficient array, called a lazy array, whose benefit is lost whenever scaling is applied as a wrapper.
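
A minimal sketch of that last scaling step, performed just before the observation enters the network, might look like this:

def scale_frames(obs):
    """Convert a (possibly lazy) stacked observation to floats in [0, 1]."""
    return np.asarray(obs, dtype=np.float32) / 255.0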
