The DQN algorithm

The introduction of a replay buffer and a separate target network into deep Q-learning made it possible to control Atari games (such as Space Invaders, Pong, and Breakout) from nothing but images, a reward, and a terminal signal. DQN learns completely end to end, using a combination of convolutional and fully connected neural networks.
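To make these two ingredients concrete, the following is a minimal PyTorch-style sketch of how a replay buffer and a periodically updated target network fit into the Q-learning update. The class names, hyperparameter values, and the small fully connected network are illustrative assumptions, not the original paper's code; with images as input, a CNN would replace the first layers.

import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-network; a CNN front end would replace
    the first layers when learning directly from images."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# Illustrative hyperparameters (not taken from the original paper)
GAMMA = 0.99
BATCH_SIZE = 32
TARGET_UPDATE_EVERY = 1000

obs_dim, n_actions = 4, 2
online_net = QNetwork(obs_dim, n_actions)
target_net = QNetwork(obs_dim, n_actions)
target_net.load_state_dict(online_net.state_dict())
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-3)

replay_buffer = deque(maxlen=100_000)  # stores (s, a, r, s', done) tuples

def train_step(step):
    if len(replay_buffer) < BATCH_SIZE:
        return
    # Sample a random minibatch of past transitions (breaks correlation
    # between consecutive samples)
    batch = random.sample(replay_buffer, BATCH_SIZE)
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    s, s2, r, done = s.float(), s2.float(), r.float(), done.float()

    # Q(s, a) predicted by the online network
    q = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)

    # Bootstrapped target computed with the frozen target network
    with torch.no_grad():
        q_next = target_net(s2).max(dim=1).values
        target = r + GAMMA * (1.0 - done) * q_next

    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodically copy the online weights into the target network
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(online_net.state_dict())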

DQN was trained separately on 49 Atari games with the same algorithm, network architecture, and hyperparameters. It outperformed all previous algorithms, reaching a level comparable to or better than that of professional human players on many games. The Atari games are not easy to solve, and many of them demand complex planning strategies; a few (such as the well-known Montezuma's Revenge) require a level of reasoning that even DQN has not been able to achieve.

A particularity of these games is that, because they provide only images to the agent, they are partially observable: a single frame doesn't show the full state of the environment and isn't enough to fully understand the current situation. For example, can you deduce the direction of the ball in the following image?

Figure 5.1. Rendering of Pong

You can't, and neither can the agent. To overcome this limitation, at each point in time the agent considers a sequence of the most recent observations. Usually the last two to five frames are stacked together, and in most cases they give a fairly accurate approximation of the actual overall state, as sketched below.
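As an illustration of what this stacking looks like in practice, the following wrapper is a hypothetical sketch (not taken from the chapter's code) that keeps the last k grayscale frames and concatenates them into a single observation carrying short-term motion information, such as the direction of the ball in Pong.

from collections import deque

import numpy as np

class FrameStack:
    """Keeps the last k frames and returns them stacked along a new axis."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with copies of the first frame of the episode
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self._observation()

    def step(self, new_frame):
        # Drop the oldest frame and append the newest one
        self.frames.append(new_frame)
        return self._observation()

    def _observation(self):
        # Shape: (k, height, width) for grayscale frames
        return np.stack(self.frames, axis=0)


# Usage with dummy 84x84 grayscale frames
stack = FrameStack(k=4)
obs = stack.reset(np.zeros((84, 84), dtype=np.uint8))
obs = stack.step(np.ones((84, 84), dtype=np.uint8))
print(obs.shape)  # (4, 84, 84)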
