Hybrid algorithms

The advantages of value function methods and policy gradient algorithms can be combined into hybrid algorithms, which can be more sample efficient and more robust than either approach alone.

Hybrid approaches combine Q-functions and policy gradients so that each improves the other. These methods estimate the Q-function of deterministic actions and use that estimate to improve the policy directly.
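To make the idea concrete, the following is a minimal sketch rather than any particular published algorithm. It assumes a toy one-step problem invented for illustration: the reward is -(a - 2s)^2, the deterministic policy is mu(s) = theta * s, and the critic is a quadratic function fitted by least squares. The names q_features and train, the reward, and all hyperparameters are hypothetical. The critic estimates Q from noisy exploratory actions, and the actor then follows the gradient of that estimated Q with respect to the action.

```python
import numpy as np


def q_features(s, a):
    """Quadratic features for the toy critic Q_w(s, a) = w . [a^2, a*s, s^2]."""
    return np.stack([a * a, a * s, s * s], axis=-1)


def train(iterations=50, batch_size=256, actor_lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0        # actor parameter of the deterministic policy mu(s) = theta * s
    w = np.zeros(3)    # critic weights

    for _ in range(iterations):
        # Collect a batch with exploration noise added to the deterministic action.
        s = rng.uniform(-1.0, 1.0, size=batch_size)
        a = theta * s + rng.uniform(-1.0, 1.0, size=batch_size)

        # Assumed toy reward: the best action for state s is a = 2 * s.
        r = -(a - 2.0 * s) ** 2

        # Critic step: episodes last one step, so Q(s, a) equals the reward
        # and the critic can be fitted by ordinary least squares.
        w, *_ = np.linalg.lstsq(q_features(s, a), r, rcond=None)

        # Actor step: chain rule dQ/da * dmu/dtheta, evaluated at a = mu(s).
        a_pi = theta * s
        dq_da = 2.0 * w[0] * a_pi + w[1] * s
        dmu_dtheta = s
        theta += actor_lr * np.mean(dq_da * dmu_dtheta)

    return theta, w


if __name__ == "__main__":
    theta, w = train()
    print("learned policy parameter:", theta)
    print("learned critic weights:", w)
```

In this sketch the policy never sees the reward directly; it improves only through the critic's estimate, which is the mutual dependence described above. A practical method would replace the least-squares fit with a learned approximator and bootstrapped targets for multi-step problems.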

Be aware that although actor-critic (AC) algorithms learn and use a value function, they are categorized as policy gradient methods and not as hybrid algorithms. This is because their main underlying objective is that of policy gradient methods; the value function is only an addition that provides extra information.