Proximal Policy Optimization

Work by Schulman et al. shows that this is possible. Their method builds on the same idea as TRPO, keeping each new policy close to the previous one, while considerably reducing the complexity of the method. This method is called Proximal Policy Optimization (PPO). Its strength lies in relying on first-order optimization only, without sacrificing the reliability of TRPO. PPO is also more general and more sample-efficient than TRPO, and it allows multiple update epochs over each batch of collected data using minibatches.
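
PPO is most often used with a clipped surrogate objective, L_CLIP(θ) = E[min(r(θ)·A, clip(r(θ), 1 − ε, 1 + ε)·A)], where r(θ) = π_θ(a|s) / π_θ_old(a|s) and A is the advantage estimate. The sketch below assumes that clipped variant and a deliberately tiny setting: a stateless softmax policy with one logit per action, advantages supplied directly instead of estimated, and plain gradient ascent in place of an optimizer such as Adam. The names softmax, clipped_surrogate and ppo_update are hypothetical helpers for illustration, not part of any library. It shows the two properties mentioned above: only gradients are used, and the same batch is reused for several epochs of minibatch updates.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_update(theta, actions, advantages, epochs=4, batch_size=4,
               lr=0.1, eps=0.2, seed=0):
    """Hypothetical toy PPO update for a stateless softmax policy.

    Runs several epochs of shuffled minibatch gradient ascent on the
    clipped surrogate, using only first-order (gradient) information.
    """
    rng = random.Random(seed)
    theta = list(theta)
    # Freeze the data-collecting policy: it supplies the ratio's denominator.
    old_probs = softmax(theta)
    old_logp = [math.log(old_probs[a]) for a in actions]
    indices = list(range(len(actions)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            probs = softmax(theta)
            grad = [0.0] * len(theta)
            for i in batch:
                a, adv = actions[i], advantages[i]
                ratio = math.exp(math.log(probs[a]) - old_logp[i])
                clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
                # The clipped branch is constant in theta, so a gradient only
                # flows when the unclipped term r * A is the active minimum.
                if ratio * adv <= clipped_ratio * adv:
                    for k in range(len(theta)):
                        # d log pi(a) / d theta_k for a softmax policy.
                        dlogp = (1.0 if k == a else 0.0) - probs[k]
                        # d r / d theta_k = r * d log pi(a) / d theta_k.
                        grad[k] += adv * ratio * dlogp
            # Gradient ascent step on the minibatch-averaged objective.
            for k in range(len(theta)):
                theta[k] += lr * grad[k] / len(batch)
    return theta


if __name__ == "__main__":
    # Action 0 looked good (A > 0), action 1 looked bad (A < 0).
    new_theta = ppo_update([0.0, 0.0, 0.0],
                           actions=[0, 1, 0, 1, 2, 0, 1, 2],
                           advantages=[1.0, -1.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0])
    print(softmax(new_theta))
```

The nested loop over epochs and minibatches is the "multiple updates" the text refers to. Clipping removes the incentive to move the probability ratio far outside [1 − ε, 1 + ε] in the direction the advantage favors, which is roughly what lets these repeated first-order steps stay close to the old policy without TRPO's constrained second-order step.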
