Results

To see whether DQN actually overestimates the Q-values with respect to DDQN, we plot the estimated Q-values in the following diagram. We also include the results of DQN (the orange line) so that we have a direct comparison between the two algorithms:

Figure 5.8. A plot of the estimated training Q-values. The DDQN values are plotted in blue and the DQN values are plotted in orange. The x axis represents the number of steps.


The performance of both DDQN (the blue line) and DQN (the orange line), measured as the average reward of the test games, is shown in the following plot:

For all the color references mentioned in the chapter, please refer to the color images bundle at http://www.packtpub.com/sites/default/files/downloads/9781789131116_ColorImages.pdf.

Figure 5.9. A plot of the mean test rewards. The DDQN values are plotted in blue and the DQN values are plotted in orange. The x axis represents the number of steps.

As we expected, the Q-values are consistently smaller in DDQN than in DQN, meaning that the latter was indeed overestimating the values. Nonetheless, the performance on the test games doesn't seem to be affected, so those overestimations were probably not hurting the performance of the algorithm. However, be aware that we only tested the algorithm on Pong; the effectiveness of an algorithm shouldn't be judged on a single environment. In fact, in the paper, the authors apply DDQN to all 57 ALE games and report that it not only yields more accurate value estimates, but also leads to much higher scores on several games.
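To see why the DDQN estimates sit below the DQN ones, it helps to compare the two target computations directly. The following is a minimal NumPy sketch (the function names and shapes are illustrative, not the book's implementation): DQN lets the target network both *select* and *evaluate* the next action, so its `max` picks up any positive estimation noise, while DDQN selects the action with the online network and evaluates it with the target network.

```python
import numpy as np

def dqn_target(rewards, next_q_target, gamma, dones):
    # DQN target: y = r + gamma * max_a Q_target(s', a).
    # The same (target) network selects and evaluates the action,
    # so the max operator amplifies upward estimation noise.
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def ddqn_target(rewards, next_q_online, next_q_target, gamma, dones):
    # DDQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    # Action selection (online net) is decoupled from evaluation (target net).
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

# Toy batch of 2 transitions with 2 actions each (made-up numbers)
rewards = np.array([1.0, 0.0])
dones = np.zeros(2)
next_q_online = np.array([[0.5, 1.0],
                          [2.0, 0.1]])
next_q_target = np.array([[1.5, 0.2],
                          [0.3, 0.4]])

y_dqn = dqn_target(rewards, next_q_target, 0.99, dones)
y_ddqn = ddqn_target(rewards, next_q_online, next_q_target, 0.99, dones)

# Evaluating any selected action with the target net can never exceed
# the target net's own max, so the DDQN target is never larger:
assert np.all(y_ddqn <= y_dqn)
```

Note that `Q_target(s', a*) <= max_a Q_target(s', a)` holds for any selected action `a*`, which is exactly why the blue (DDQN) curve in Figure 5.8 stays below the orange (DQN) one.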
