Pseudocode

Now that all the components of DQN have been explained, we can put the pieces together and look at the pseudocode version of the algorithm to clarify any uncertainties (don't worry if it doesn't; in the next section, you'll implement it and everything will be clearer).

The DQN algorithm involves three main parts: 

  • Data collection and storage. The data is collected by following a behavior policy (for example, $\epsilon$-greedy).
  • Neural network optimization (performing SGD on mini-batches that have been sampled from the buffer).
  • Target update.
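
To make the behavior policy from the first point concrete, here is a minimal sketch of epsilon-greedy action selection. The function name `eps_greedy` and its parameters are hypothetical, and the Q-values are passed in as a plain list rather than computed by a neural network.

```python
import random


def eps_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, otherwise the greedy one.

    q_values: list of estimated Q-values, one per action (assumed input).
    eps: exploration probability in [0, 1].
    """
    if rng.random() < eps:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (the first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `eps=0.0` the policy is purely greedy, and with `eps=1.0` it is purely random. In practice, the value is usually decayed during training.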

The pseudocode of DQN is as follows:

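Initialize $Q$ function with random weights $\theta$
Initialize $\hat{Q}$ function with random weights $\theta' = \theta$
Initialize empty replay memory $D$

for $episode = 1, \dots, M$ do
    Initialize environment: $s \leftarrow env.reset()$
    for $t = 1, \dots, T$ do
        > Collect observation from the env:
        $a \leftarrow \epsilon\text{-greedy}(\phi(s))$
        $s', r, d \leftarrow env(a)$

        > Store the transition in the replay buffer:
        $D \leftarrow D \cup (\phi(s), a, r, \phi(s'), d)$

        > Update the model using (5.4):
        Sample a random minibatch $(\phi_j, a_j, r_j, \phi_{j+1}, d_j)$ from $D$
        $y_j = r_j$ if $d_j$ = True, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta')$
        Perform a step of GD on $(y_j - Q(\phi_j, a_j; \theta))^2$ on $\theta$

        > Update target network:
        Every $C$ steps: $\theta' \leftarrow \theta$

        $s \leftarrow s'$
    end for
end for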
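
The periodic target update at the end of the inner loop can be sketched as follows. This is only an illustration: the helper `sync_target` is hypothetical, and the weights are represented here as a plain dictionary instead of real network parameters.

```python
import copy


def sync_target(step, online_weights, target_weights, C):
    """Return the weights the target network should use after `step`.

    Every C steps, the target weights become a copy of the online weights;
    otherwise they are left untouched.
    """
    if step % C == 0:
        # Deep copy, so later updates of the online weights
        # don't leak into the target network.
        return copy.deepcopy(online_weights)
    return target_weights
```

The copy matters: if the target network merely referenced the online weights, it would change at every gradient step and would no longer provide a stable target.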

Here, d is a flag that's returned by the environment that signals whether the environment is in its final state. If d=True, that is, the episode has ended, the environment has to be reset. 
$\phi$ is a preprocessing step that changes the images to reduce their dimensionality (it converts the images into grayscale and resizes them into smaller images) and adds the last n frames to the current frame. Usually, n is a value between 2 and 4. The preprocessing part will be explained in more detail in the next section, where we'll implement DQN.
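
As a rough illustration of what the preprocessing step does, the following sketch converts a frame to grayscale, shrinks it, and keeps a stack of the most recent frames. It is written in pure Python so that it runs without dependencies. The names `to_gray_small` and `FrameStack` are hypothetical, the frame is assumed to be a list of rows of (r, g, b) tuples, and keeping every second pixel is a crude stand-in for a proper image resize.

```python
from collections import deque


def to_gray_small(frame, stride=2):
    """Convert an RGB frame to grayscale and keep every `stride`-th pixel."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row[::stride]]
        for row in frame[::stride]
    ]


class FrameStack:
    """Keeps the last n preprocessed frames as the observation."""

    def __init__(self, n=4):
        self.frames = deque(maxlen=n)

    def reset(self, frame):
        small = to_gray_small(frame)
        # At the start of an episode, fill the stack with the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(small)
        return list(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(to_gray_small(frame))
        return list(self.frames)
```

Stacking several frames gives the network access to motion information, such as the direction of a moving object, that a single image can't convey.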

In DQN, the experience replay, $D$, is a dynamic buffer that stores a limited number of frames. In the paper, the buffer contains the last 1 million transitions and, when it exceeds this size, it discards the oldest experiences.
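
A bounded buffer of this kind can be sketched with a `deque` that has a maximum length. The class name `ReplayBuffer` and its methods are hypothetical and only meant to illustrate the idea; a real implementation would typically store the data in preallocated arrays for efficiency.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that discards the oldest transitions first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        # When the buffer is full, the oldest transition is dropped.
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement.
        return rng.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

For example, with `capacity=3`, adding five transitions leaves only the last three in the buffer.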

All the other parts have already been described. If you are wondering why the target value, $y_j$, takes the value $r_j$ if $d_j$ = True, it is because there won't be any other interactions with the environment afterward, and so $r_j$ is its actual unbiased Q-value.
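
The handling of the terminal case can be illustrated with a small sketch that computes the target values for a minibatch. The function `dqn_targets` is hypothetical, and the Q-values of the target network for the next states are assumed to be given as plain lists, one list per transition.

```python
def dqn_targets(rewards, dones, next_q_values, gamma=0.99):
    """Compute the target for each transition in a minibatch.

    rewards: list of rewards r_j.
    dones: list of terminal flags d_j.
    next_q_values: per-transition lists of target-network Q-values
        for the next state, one value per action (assumed input).
    """
    targets = []
    for r, d, q_next in zip(rewards, dones, next_q_values):
        if d:
            # Terminal transition: no future reward to bootstrap from.
            targets.append(r)
        else:
            # Bootstrap from the best next action under the target network.
            targets.append(r + gamma * max(q_next))
    return targets
```

For a terminal transition, the next-state Q-values are ignored entirely, so any estimation error they contain cannot affect the target.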
