For off-policy algorithms such as DQN, n-step learning works well only with small values of n. Empirically, DQN with values of n between 2 and 4 has been shown to outperform the one-step version, leading to improvements in a wide range of Atari games.
The following graph shows the results of our implementation, in which we tested DQN with a three-step return. From the results, we can see that it takes longer before taking off, but afterward it has a steeper learning curve and reaches a final performance similar to that of DQN:
Figure 5.11. A plot of the mean total test reward. The three-step DQN values are plotted in violet and the DQN values in orange. The x-axis represents the number of steps.
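The three-step target replaces DQN's one-step bootstrapped target with a discounted sum of the next n rewards plus the bootstrapped value of the state reached n steps later. The following is a minimal sketch of that computation, not the implementation used for the plot above; the function name and signature are illustrative:

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Compute the n-step TD target:
        G = r_0 + g*r_1 + ... + g^(n-1)*r_(n-1) + g^n * max_a Q(s_n, a)
    where `rewards` holds the n rewards collected along the trajectory
    and `bootstrap_value` is max_a Q(s_n, a), the target network's
    value estimate of the state reached after n steps.
    (Illustrative helper; assumes the episode did not terminate
    within the n steps, otherwise the bootstrap term is dropped.)
    """
    g = 0.0
    # Accumulate the discounted reward sum from the last reward backward
    for r in reversed(rewards):
        g = r + gamma * g
    # Add the bootstrapped value, discounted by gamma^n
    return g + gamma ** len(rewards) * bootstrap_value
```

For example, with `gamma=0.5`, rewards `[1, 1, 1]`, and a bootstrap value of `2.0`, the target is `1 + 0.5 + 0.25 + 0.125 * 2.0 = 2.0`. With n=1 this reduces to the standard DQN target, `r + gamma * max_a Q(s', a)`.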