Results

For off-policy algorithms such as DQN, n-step learning works well only with small values of n. It has been shown that DQN benefits from values of n between 2 and 4, leading to improvements across a wide range of Atari games.
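To make the idea concrete, the following is a minimal sketch of how an n-step bootstrapped target could be computed for a single transition chain. The function name, arguments, and the way the n rewards are assumed to be collected from a replay buffer are illustrative assumptions, not the exact implementation used here:

```python
def n_step_target(rewards, next_q_value, done, gamma=0.99):
    """Compute an n-step target: sum of discounted rewards plus a
    bootstrapped value from the state reached after len(rewards) steps.

    rewards      : the n rewards collected after the starting state
    next_q_value : max_a Q(s_{t+n}, a) from the target network
    done         : True if the episode terminated within the n steps
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r          # discounted reward sum
    if not done:
        # bootstrap only if the episode did not end within the n steps
        target += (gamma ** len(rewards)) * next_q_value
    return target

# Example with a three-step return (n = 3):
# target = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + gamma^3 * max_a Q(s_{t+3}, a)
print(n_step_target([1.0, 0.0, 1.0], next_q_value=2.5, done=False))
```

With larger n, the target relies more on the actual rewards and less on the bootstrapped estimate, which is why small values of n tend to work best for off-policy methods like DQN.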

The following graph shows the results of our implementation, where we tested DQN with a three-step return. From the results, we can see that the three-step variant takes longer before taking off, but afterward it learns more steeply, ending with an overall performance similar to that of DQN:

Figure 5.11. A plot of the mean test total reward. The three-step DQN values are plotted in violet and the DQN values are plotted in orange. The x axis represents the number of steps.