To check whether DQN actually overestimates the Q-values compared to DDQN, we plotted the Q-values in the following diagram. We also included the results of DQN (the orange line) so that we have a direct comparison between the two algorithms:
The performance of both DDQN (the blue line) and DQN (the orange line), measured as the average reward of the test games, is as follows:
As we expected, the Q-values are consistently smaller in DDQN than in DQN, meaning that DQN was indeed overestimating the values. Nonetheless, the performance on the test games doesn't seem to be affected, so those overestimations were probably not hurting the algorithm. However, be aware that we only tested the algorithm on Pong; the effectiveness of an algorithm shouldn't be judged on a single environment. Indeed, in the paper, the authors applied it to all 57 ALE games and reported that DDQN not only yields more accurate value estimates, but also leads to much higher scores on several games.
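The source of the gap between the two curves is the target computation itself: DQN uses the target network both to select and to evaluate the next action, so the max over noisy estimates biases the target upward, while DDQN decouples selection (online network) from evaluation (target network). A minimal NumPy sketch of the two targets, on a toy batch with illustrative values (function names and numbers are our own, not from the paper):

```python
import numpy as np

def dqn_target(q_target_next, rewards, dones, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action;
    # taking the max over noisy estimates biases the target upward.
    return rewards + gamma * (1 - dones) * q_target_next.max(axis=1)

def ddqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    # DDQN: the online network selects the best next action, and the
    # target network evaluates it, decoupling selection from evaluation.
    best = q_online_next.argmax(axis=1)
    rows = np.arange(len(best))
    return rewards + gamma * (1 - dones) * q_target_next[rows, best]

# Toy batch: 2 transitions, 3 actions (hypothetical numbers).
q_online_next = np.array([[1.0, 2.0, 0.5],
                          [0.3, 0.1, 0.9]])
q_target_next = np.array([[1.5, 0.8, 2.2],
                          [0.4, 0.2, 0.7]])
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # second transition is terminal

print(dqn_target(q_target_next, rewards, dones))                  # [3.178 0.   ]
print(ddqn_target(q_online_next, q_target_next, rewards, dones))  # [1.792 0.   ]
```

In the first transition, DQN takes the max of the target network's estimates (2.2), while DDQN evaluates the action the online network prefers (index 1, valued 0.8 by the target network), producing the smaller, less optimistic target we observe in the plot.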