Questions

How do policy gradient (PG) algorithms maximize the objective function?
What's the main idea behind policy gradient algorithms?
Why does the algorithm remain unbiased when introducing a baseline in REINFORCE?
What broader class of algorithms does REINFORCE belong to?
How does the critic in actor-critic (AC) methods differ from a value function used as a baseline in REINFORCE?
If you had to develop an algorithm for an agent that has to learn to move, would you prefer REINFORCE or AC?
Could you use an n-step AC algorithm as a REINFORCE algorithm?
