Summary

In this chapter, we took a break from model-free algorithms and began exploring algorithms that learn from a model of the environment. We looked at the key motivations behind this change of paradigm. We then distinguished the two main cases that arise when dealing with a model: one in which the model is already known, and one in which it has to be learned.

Moreover, we learned how the model can be used either to plan the next actions (as sketched below) or to learn a policy. There's no fixed rule for choosing one over the other, but the choice generally depends on the complexity of the action and observation spaces and on the inference speed required. We then weighed the advantages and disadvantages of model-based algorithms and deepened our understanding of how a policy can be learned by combining model-free algorithms with model-based learning. This revealed a new way to use models in very high-dimensional observation spaces, such as images.
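To make the planning case concrete, here is a minimal sketch of a simple random-shooting planner that picks the next action by rolling out candidate action sequences in a learned model. The names dynamics_model, reward_fn, and sample_action are hypothetical placeholders used only for illustration, not code from the chapter:

```python
import numpy as np

def plan_action(state, dynamics_model, reward_fn, sample_action,
                horizon=10, n_candidates=100):
    """Return the first action of the randomly sampled action sequence
    that achieves the highest predicted return in the learned model."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        # Sample a candidate sequence of actions.
        actions = [sample_action() for _ in range(horizon)]
        s, total_reward = state, 0.0
        for a in actions:
            s_next = dynamics_model(s, a)           # model predicts the next state
            total_reward += reward_fn(s, a, s_next)  # accumulate predicted reward
            s = s_next
        if total_reward > best_return:
            best_return, best_action = total_reward, actions[0]
    return best_action
```

In a planner of this kind, the agent typically executes only the first action and replans from the next observed state, so the model is queried at every decision step.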

Finally, to better grasp the material related to model-based algorithms, we developed ME-TRPO. This algorithm deals with the uncertainty of the learned model by using an ensemble of models, and it learns the policy with trust region policy optimization (TRPO). All the models are used to predict next states and thereby generate simulated trajectories on which the policy is trained. As a consequence, the policy is trained entirely on the learned model of the environment.
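As a rough illustration of this training scheme, the sketch below shows an ME-TRPO-style loop under simplifying assumptions. The callables collect_real_data, fit_model, simulate_rollouts, and trpo_update are hypothetical placeholders, and details such as the early-stopping check that validates the policy across the ensemble are omitted; this is not the chapter's implementation:

```python
def me_trpo_loop(env, policy, models, collect_real_data, fit_model,
                 simulate_rollouts, trpo_update,
                 outer_iters=50, inner_iters=20):
    """Alternate between fitting an ensemble of dynamics models on real data
    and improving the policy on trajectories simulated by those models."""
    dataset = []
    for _ in range(outer_iters):
        # 1. Interact with the real environment using the current policy.
        dataset += collect_real_data(env, policy)
        # 2. Fit every model of the ensemble on the real transitions.
        for model in models:
            fit_model(model, dataset)
        # 3. Improve the policy purely on simulated trajectories; each rollout
        #    is generated by a model drawn from the ensemble.
        for _ in range(inner_iters):
            sim_trajectories = simulate_rollouts(models, policy)
            policy = trpo_update(policy, sim_trajectories)
    return policy
```

The key design choice is the ensemble: because the rollouts come from several independently trained models, the policy cannot easily exploit the prediction errors of any single model, which is how ME-TRPO copes with model uncertainty.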

This chapter concludes our discussion of model-based learning. In the next one, we'll introduce a new genre of learning: algorithms that learn by imitation. Moreover, we'll develop and train an agent that, by following the behavior of an expert, will be able to play FlappyBird.
