Model-Based RL

Reinforcement learning algorithms are divided into two classes: model-free methods and model-based methods. The two classes differ in what they assume about a model of the environment. Model-free algorithms learn a policy purely from interactions with the environment, without any knowledge of its dynamics, whereas model-based algorithms rely on a model of the environment's dynamics, either given in advance or learned from data, and use it to plan or select the next actions.
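
To make the distinction concrete, the following minimal sketch shows the model-based side of the split: when the dynamics and the reward are known, an agent can choose actions by simulating candidate action sequences inside the model instead of trying them in the real environment. The one-dimensional dynamics, the reward, and the random-shooting planner below are illustrative assumptions, not part of this chapter's algorithms.

```python
# A minimal sketch of planning with a known model on a toy 1-D problem.
# The dynamics, reward, and planner here are hypothetical examples.
import numpy as np

def dynamics(state, action):
    """Known model of the environment: predicts the next state."""
    return state + 0.1 * action

def reward(state, action):
    """Known reward: stay close to the origin using small actions."""
    return -(state ** 2) - 0.01 * (action ** 2)

def plan_action(state, horizon=10, n_candidates=100, rng=None):
    """Random-shooting planner: simulate candidate action sequences with
    the model and return the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state, 0.0
        for a in actions:
            total += reward(s, a)
            s = dynamics(s, a)  # imagine the next state using the model
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

# One planning step from an arbitrary starting state.
print(plan_action(state=2.0))
```

A model-free agent, by contrast, would have to estimate the value of its actions from real transitions only, because it has no dynamics function to simulate with.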

In this chapter, we'll give you a comprehensive overview of model-based approaches, highlighting their advantages and disadvantages vis-à-vis model-free approaches, and the differences that arise when the model is known or has to be learned. This latter division is important because it influences how problems are approached and the tools used to solve them. After this introduction, we'll talk about more advanced cases where model-based algorithms have to deal with high-dimensional observation spaces such as images.

Furthermore, we'll look at a class of algorithms that combine model-based and model-free methods to learn both a model and a policy in high-dimensional spaces. We'll examine their inner workings and explain the reasons for using such methods. Then, to deepen our understanding of model-based algorithms, and especially of algorithms that combine model-based and model-free approaches, we'll develop a state-of-the-art algorithm called model-ensemble trust region policy optimization (ME-TRPO) and apply it to an inverted pendulum with a continuous action space. A high-level sketch of this combined loop follows.
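
The following self-contained sketch previews how the two families can be combined: it alternates real interaction, supervised model learning, and policy improvement carried out entirely on imagined rollouts. The toy linear environment, the least-squares dynamics model, and the hill-climbing policy update are simplifying assumptions used only for illustration; ME-TRPO itself fits an ensemble of neural network models and improves the policy with TRPO, as we'll see later in the chapter.

```python
# Illustrative loop: collect real data, fit a model, improve the policy
# inside the model. The environment and update rule are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

def real_step(s, a):
    """Toy 'real' environment whose dynamics the agent does not know."""
    return 0.9 * s + 0.2 * a + rng.normal(scale=0.01)

def collect_data(theta, n=200):
    """Run the linear policy a = theta * s (plus exploration noise) for real."""
    data, s = [], 1.0
    for _ in range(n):
        a = theta * s + rng.normal(scale=0.1)   # exploration noise
        s_next = real_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return np.array(data)

def fit_model(data):
    """Supervised learning of a linear dynamics model s' = w[0]*s + w[1]*a."""
    X, y = data[:, :2], data[:, 2]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def imagined_return(theta, w, horizon=50):
    """Evaluate a policy entirely inside the learned model (no real steps)."""
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = theta * s
        total += -(s ** 2)            # reward: keep the state near zero
        s = w[0] * s + w[1] * a       # imagined transition
    return total

theta = 0.0
for _ in range(10):
    data = collect_data(theta)                          # 1. real interaction
    w = fit_model(data)                                 # 2. model learning
    candidates = np.append(theta + rng.normal(scale=0.5, size=20), theta)
    theta = max(candidates, key=lambda t: imagined_return(t, w))  # 3. policy step
print("learned policy parameter:", theta)
```

The structure mirrors the combined approach: the real environment is touched only to gather training data for the model, while the policy is improved cheaply on trajectories generated by the learned model.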

The following topics will be covered in this chapter:

  • Model-based methods
  • Combining model-based with model-free learning
  • ME-TRPO applied to an inverted pendulum