Applying TD3 to BipedalWalker

For a direct comparison of TD3 and DDPG, we tested TD3 in the same environment that we used for DDPG: BipedalWalker-v2. 

The best hyperparameters for TD3 for this environment are listed in this table:

Hyperparameter        Value
Actor l.r.            4e-4
Critic l.r.           4e-4
DNN Architecture      [64,relu,64,relu]
Buffer Size           200000
Batch Size            64
Tau                   0.005
Policy Update Freq    2
Sigma                 0.2
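
To make the roles of Tau, Policy Update Freq, and Sigma more concrete, here is a minimal NumPy sketch of where these values would enter a TD3 update. It is not the implementation used for these experiments. The dictionary keys and function names are illustrative, and we assume that Sigma is the standard deviation of the target policy smoothing noise. The noise clipping value of 0.5 is not listed in the table; it is the default from the TD3 paper and is used here only as an assumption.

```python
import numpy as np

# Values from the table above; the key names are illustrative only.
TD3_CONFIG = {
    "actor_lr": 4e-4,
    "critic_lr": 4e-4,
    "hidden_sizes": [64, 64],  # two hidden layers, each followed by ReLU
    "buffer_size": 200_000,
    "batch_size": 64,
    "tau": 0.005,
    "policy_update_freq": 2,
    "sigma": 0.2,  # assumed to be the target smoothing noise std
}

# Assumption: not in the table; default value from the TD3 paper.
NOISE_CLIP = 0.5


def smoothed_target_action(target_action, sigma, noise_clip, act_low, act_high, rng):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = rng.normal(0.0, sigma, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    # Keep the perturbed action inside the valid action range.
    return np.clip(target_action + noise, act_low, act_high)


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def should_update_policy(critic_step, policy_update_freq):
    """Delayed policy update: update the actor once every few critic updates."""
    return critic_step % policy_update_freq == 0
```

With a Policy Update Freq of 2, the actor and the target networks would be updated on every second critic update, and with a Tau of 0.005, each soft update moves the target weights only 0.5% of the way toward the online weights.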

The result is plotted in the following diagram. The curve follows a smooth trend and reaches good results after about 300K steps, peaking at around 450K steps of training. It comes very close to the goal of 300 points, but does not quite reach it:

Performance of the TD3 algorithm on BipedalWalker-v2
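
The goal of 300 points can be checked programmatically. The following is a minimal sketch, assuming the commonly cited criterion for BipedalWalker of an average return of at least 300 over 100 consecutive episodes. The function name and its default arguments are hypothetical and are not taken from the code used to produce the plot.

```python
def is_solved(episode_returns, target=300.0, window=100):
    """Return True if the mean of the last `window` episode returns reaches `target`."""
    if len(episode_returns) < window:
        # Not enough episodes yet to evaluate the criterion.
        return False
    recent = episode_returns[-window:]
    return sum(recent) / window >= target
```

Calling this helper after each test episode on the list of returns collected so far would show whether a run such as the one plotted here ever crosses the threshold, rather than only approaching it.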

We spent less time finding a good set of hyperparameters for TD3 than we did for DDPG. Although we are comparing the two algorithms on only one environment, we think that it gives a good first insight into their differences in terms of stability and performance. The performance of both DDPG and TD3 on BipedalWalker-v2 is shown here:

DDPG versus TD3 performance comparison

If you want to train the algorithms in a harder environment, you can try BipedalWalkerHardcore-v2. It is very similar to BipedalWalker-v2, except that it has ladders, stumps, and pitfalls. Very few algorithms are able to solve this environment. It's also fun to watch how the agent fails to get past the obstacles!

The superiority of TD3 over DDPG is immediately clear in terms of final performance, rate of improvement, and stability.

For all the color references mentioned in the chapter, please refer to the color images bundle at http://www.packtpub.com/sites/default/files/downloads/9781789131116_ColorImages.pdf.