Understanding PPO

We have so far avoided going too deep into the more advanced inner workings of the proximal policy optimization (PPO) algorithm, even going so far as to avoid any policy-versus-model discussion. If you recall, PPO is the reinforcement learning (RL) method, first developed at OpenAI, that powers ML-Agents, and it is a policy-based algorithm. In this chapter, we will look at the differences between policy- and model-based RL algorithms, as well as the more advanced inner workings of the Unity implementation.
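Before we dive in, it may help to see the idea at the heart of PPO in code. The following is a minimal NumPy sketch of PPO's clipped surrogate objective, which we will unpack later in the chapter; the function name and arguments are illustrative and are not taken from the ML-Agents source:

    import numpy as np

    def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
        # Ratio of the new policy to the old policy for each action taken:
        # r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = np.exp(new_log_probs - old_log_probs)
        # Clip the ratio to [1 - epsilon, 1 + epsilon] so that a single
        # update cannot move the policy too far from the one that
        # collected the data.
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
        # PPO maximizes the minimum of the clipped and unclipped terms,
        # which removes the incentive for overly large policy updates.
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

Note that this is a policy-based update: the objective is defined entirely in terms of the policy's action probabilities and advantage estimates, with no model of the environment's dynamics anywhere in sight. That distinction is exactly what we will explore next.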

The following is a list of the main topics we will cover in this chapter:

  • Marathon reinforcement learning
  • The partially observable Markov decision process
  • Actor-Critic and continuous action spaces
  • Understanding TRPO and PPO
  • Tuning PPO with hyperparameters

The content in this chapter is at an advanced level, and assumes that you have worked through several of the previous chapters and exercises. For the purposes of this chapter, we will also assume that you are able to open and run a learning environment in Unity with ML-Agents without difficulty.
