Twin delayed deep deterministic policy gradient (TD3)

DDPG is regarded as one of the most sample-efficient actor-critic algorithms, but it has been shown to be brittle and sensitive to hyperparameters. Subsequent studies have tried to alleviate these problems by introducing novel ideas or by applying tricks from other algorithms on top of DDPG. Recently, one algorithm has taken over as a replacement for DDPG: twin delayed deep deterministic policy gradient, or TD3 for short (the paper is Addressing Function Approximation Error in Actor-Critic Methods: https://arxiv.org/pdf/1802.09477.pdf). We have used the word replacement here because TD3 is actually a continuation of the DDPG algorithm, with additional ingredients that make it more stable and more performant.

TD3 focuses on two problems that are also common to other off-policy algorithms: the overestimation of the value estimate, and high-variance estimates of the gradient. For the former, it employs a solution similar to the one used in double DQN, known as clipped double Q-learning; for the latter, it employs two novel solutions: delayed policy updates and target policy smoothing. Let's first consider the overestimation bias problem.
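Before examining each problem in detail, the following minimal sketch shows how the first two ingredients combine when computing the critic's target value. The names here (td3_target, target_actor, target_q1, target_q2) are hypothetical placeholders rather than code from the paper, and the noise parameters follow the defaults suggested in the TD3 paper; the third ingredient, delaying the policy updates, lives in the training loop rather than in the target computation.

import torch

def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    # Compute the TD3 target value for a batch of transitions.
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped
        # Gaussian noise so the critic is regularized over similar actions.
        action = target_actor(next_obs)
        noise = (torch.randn_like(action) * noise_std).clamp(-noise_clip, noise_clip)
        action = (action + noise).clamp(-act_limit, act_limit)
        # Clipped double Q-learning: take the minimum of the two target
        # critics to counteract the overestimation bias.
        q_min = torch.min(target_q1(next_obs, action),
                          target_q2(next_obs, action))
        return reward + gamma * (1.0 - done) * q_min

Both critics are then regressed toward this single shared target, which is what keeps the value estimates from drifting upward.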
