Deep deterministic policy gradient

If you implemented DPG with the deep neural networks presented in the previous section, the algorithm would be very unstable and incapable of learning anything. We encountered a similar problem when we extended Q-learning with deep neural networks: to combine DNNs and Q-learning in the DQN algorithm, we had to employ a few additional tricks to stabilize learning. The same holds true for DPG algorithms. These methods are off-policy, just like Q-learning, and as we'll soon see, some of the ingredients that make deterministic policies work with DNNs are similar to those used in DQN.

DDPG (Continuous Control with Deep Reinforcement Learning by Lillicrap and others: https://arxiv.org/pdf/1509.02971.pdf) is the first deterministic actor-critic algorithm that employs deep neural networks to learn both the actor and the critic. This model-free, off-policy, actor-critic algorithm extends both DQN and DPG: it borrows insights from DQN, such as the replay buffer and the target network, to make DPG work with deep neural networks.
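To make these ingredients concrete, the following is a minimal PyTorch-style sketch of a single DDPG update, not the paper's reference implementation. It assumes transitions have already been stored in the replay buffer as tensors of the form (state, action, reward, next state, done flag); the network sizes, learning rates, and helper names (mlp, ddpg_update) are illustrative choices, not values from the paper:

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical sizes and hyperparameters, chosen only for illustration.
obs_dim, act_dim = 3, 1
gamma, tau = 0.99, 0.005          # discount factor and soft-update rate

def mlp(in_dim, out_dim, out_act=nn.Identity):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim), out_act())

actor = mlp(obs_dim, act_dim, nn.Tanh)     # deterministic policy mu(s)
critic = mlp(obs_dim + act_dim, 1)         # action-value function Q(s, a)
actor_targ = copy.deepcopy(actor)          # target networks, as in DQN
critic_targ = copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

replay_buffer = deque(maxlen=100_000)      # replay buffer, as in DQN

def ddpg_update(batch_size=64):
    # Sample a minibatch of stored transitions, breaking the temporal
    # correlation between consecutive samples.
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s2, done = (torch.stack(x) for x in zip(*batch))

    # Critic: regress Q(s, a) toward the one-step target computed with
    # the *target* networks, which keeps the regression target stable.
    with torch.no_grad():
        q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=1)).squeeze(1)
        target = r + gamma * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=1)).squeeze(1)
    critic_loss = ((q - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient -- ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks (Polyak averaging).
    for net, net_targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_targ in zip(net.parameters(), net_targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)
```

Note the soft target update at the end: instead of copying the online weights every N steps as the original DQN does, DDPG slowly tracks them with a small rate tau, which further stabilizes the moving target.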
