Understanding PPO

We have so far avoided going too deep into the more advanced inner workings of the proximal policy optimization (PPO) algorithm, even going so far as to avoid any policy-versus-model discussion. If you recall, PPO is the reinforcement learning (RL) method, first developed at OpenAI, that powers ML-Agents, and it is a policy-based algorithm. In this chapter, we will look at the differences between policy- and model-based RL algorithms, as well as the more advanced inner workings of the Unity implementation.
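Before we dive in, it may help to see the idea at the heart of PPO in code. The following is a minimal NumPy sketch of PPO's clipped surrogate objective, which we will unpack later in the chapter; the function name and arguments are illustrative and are not taken from the ML-Agents source:

    import numpy as np

    def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
        # Ratio of the new policy to the old policy for each action taken:
        # r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = np.exp(new_log_probs - old_log_probs)
        # Clip the ratio to [1 - epsilon, 1 + epsilon] so that a single
        # update cannot move the policy too far from the one that
        # collected the data.
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
        # PPO maximizes the minimum of the clipped and unclipped terms,
        # which removes the incentive for overly large policy updates.
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

Note that this is a policy-based update: the objective is defined entirely in terms of the policy's action probabilities and advantage estimates, with no model of the environment's dynamics anywhere in sight. That distinction is exactly what we will explore next.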

The following is a list of the main topics we will cover in this chapter:

  • Marathon reinforcement learning
  • The partially observable Markov decision process
  • Actor-Critic and continuous action spaces
  • Understanding TRPO and PPO
  • Tuning PPO with hyperparameters

The content in this chapter is at an advanced level, and assumes that you have worked through several of the previous chapters and exercises. For the purposes of this chapter, we will also assume that you are able to open and run a learning environment in Unity with ML-Agents without difficulty.
