Applying TD3 to BipedalWalker

For a direct comparison of TD3 and DDPG, we tested TD3 in the same environment that we used for DDPG: BipedalWalker-v2.

The best hyperparameters for TD3 for this environment are listed in this table:

Hyperparameter	Value
Actor learning rate	4e-4
Critic learning rate	4e-4
DNN architecture	[64, relu, 64, relu]
Buffer size	200000
Batch size	64
Tau	0.005
Policy update frequency	2
Sigma	0.2
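To make the role of the last three hyperparameters more concrete, the following is a minimal sketch in plain Python, not the implementation used to produce the results in this section. The names TD3Config, soft_update, should_update_policy, and smoothed_target_action are hypothetical. We assume here that Sigma is the standard deviation of the target policy smoothing noise, and the noise_clip value of 0.5 is an assumption that does not appear in the table. The action bounds of -1 and 1 match the BipedalWalker action space.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3Config:
    """Values copied from the hyperparameter table (the class name is hypothetical)."""
    actor_lr: float = 4e-4
    critic_lr: float = 4e-4
    hidden_sizes: tuple = (64, 64)   # two hidden layers, each followed by ReLU
    buffer_size: int = 200_000
    batch_size: int = 64
    tau: float = 0.005
    policy_update_freq: int = 2
    sigma: float = 0.2               # assumed: std of the target smoothing noise


def soft_update(target_params, online_params, tau):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def should_update_policy(critic_step, policy_update_freq):
    """Delayed policy update: the actor is updated once every policy_update_freq critic updates."""
    return critic_step % policy_update_freq == 0


def smoothed_target_action(action, sigma, noise_clip=0.5, low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds.

    noise_clip=0.5 is an assumed value; it is not listed in the table.
    """
    noisy = []
    for a in action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        noisy.append(max(low, min(high, a + eps)))
    return noisy
```

With these values, the target networks move only 0.5% of the way toward the online networks on each update, and the actor is updated on every second critic update.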

The result is plotted in the following diagram. The curve follows a smooth trend, reaching good results after about 300K steps and peaking at around 450K steps of training. It comes very close to the goal of 300 points but does not actually reach it:

Performance of the TD3 algorithm on BipedalWalker-v2

We spent less time finding a good set of hyperparameters for TD3 than we did for DDPG. Although we are comparing the two algorithms on only one game, we think it gives a good first insight into their differences in terms of stability and performance. The performance of both DDPG and TD3 on BipedalWalker-v2 is shown here:

DDPG versus TD3 performance comparison

If you want to train the algorithms in a harder environment, you can try BipedalWalkerHardcore-v2. It is very similar to BipedalWalker-v2, except that it adds ladders, stumps, and pitfalls. Very few algorithms manage to solve this environment. It's also funny to see how the agent fails to pass the obstacles!
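Switching to the harder environment only requires changing the environment ID. The following is a rough sketch with hypothetical helper names (select_env_id, make_walker, random_rollout). It assumes an older Gym release that still registers the -v2 IDs, with Box2D installed, and the classic API in which reset() returns an observation and step() returns four values:

```python
# Environment IDs named in the text.
EASY_ENV_ID = "BipedalWalker-v2"
HARDCORE_ENV_ID = "BipedalWalkerHardcore-v2"


def select_env_id(hardcore=False):
    """Return the ID of the standard or the hardcore walker."""
    return HARDCORE_ENV_ID if hardcore else EASY_ENV_ID


def make_walker(hardcore=False):
    """Create the environment (assumes Gym with Box2D and the -v2 IDs registered)."""
    import gym  # imported lazily so the other helpers work without Gym installed
    return gym.make(select_env_id(hardcore))


def random_rollout(env, max_steps=200):
    """Run one episode with random actions and return the total reward.

    Assumes the classic Gym API: step() returns (obs, reward, done, info).
    """
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(env.action_space.sample())
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling random_rollout(make_walker(hardcore=True)) gives a quick random-policy baseline before training an agent on the harder variant.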

The superiority of TD3 over DDPG is immediately clear in terms of end performance, rate of improvement, and stability of the algorithm.

For all the color references mentioned in the chapter, please refer to the color images bundle at http://www.packtpub.com/sites/default/files/downloads/9781789131116_ColorImages.pdf.
