Actor-Critic algorithms

Actor-Critic (AC) algorithms are on-policy policy gradient algorithms that also learn a value function (generally a Q-function), called a critic, to provide feedback to the policy, the actor. Imagine that you, the actor, want to go to the supermarket via a new route. Unfortunately, before you arrive at the destination, your boss calls and requires you to return to work. Because you didn't reach the supermarket, you don't know whether the new route is actually faster than the old one. But if you have reached a familiar location along the way, you can estimate the time needed to get from there to the supermarket and judge whether the new route is preferable. This estimate is what the critic provides. In this way, you can improve the actor even though you didn't reach the final goal.

Combining a critic with an actor has been shown to be very effective and is commonly used in policy gradient algorithms. This technique can also be combined with other ideas used in policy optimization, such as trust-region algorithms. 
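The feedback loop described above can be sketched in code. The following is a minimal one-step tabular actor-critic on a toy 5-state chain environment; the environment, hyperparameters, and variable names are illustrative assumptions, not from the text. The critic's TD error plays the role of the "familiar location" estimate: it tells the actor whether the step it just took looks better or worse than expected, without waiting for the episode's final outcome.

```python
import numpy as np

# Toy setup (an assumption for illustration): a 5-state chain where the
# agent starts in the middle, moves left or right, and earns +1 only for
# reaching the rightmost state. Both ends are terminal.
N_STATES = 5
ACTIONS = [-1, +1]  # move left / move right
rng = np.random.default_rng(0)

theta = np.zeros((N_STATES, 2))  # actor: softmax policy parameters
V = np.zeros(N_STATES)           # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.99

def policy(s):
    """Softmax over the actor's logits for state s."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

for episode in range(500):
    s = N_STATES // 2
    while True:
        p = policy(s)
        a = rng.choice(2, p=p)
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        done = s_next in (0, N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0

        # Critic: one-step TD error -- the feedback signal for the actor,
        # available immediately, before the episode finishes.
        td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
        V[s] += alpha_critic * td_error

        # Actor: policy-gradient step on log pi(a|s), weighted by the
        # critic's TD error instead of the full episode return.
        grad_log = -p
        grad_log[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log

        s = s_next
        if done:
            break
```

After training, the policy in the starting state strongly prefers moving right, even though many early episodes were cut short at the left terminal with no reward: the critic's value estimates let those truncated trajectories still carry a learning signal.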
