Summary

In this chapter, we took a break from reinforcement learning algorithms and explored a new type of learning called imitation learning. The novelty of this paradigm lies in the way the learning takes place: the resulting policy imitates the behavior of an expert. It differs from reinforcement learning in the absence of a reward signal and in its ability to leverage the rich source of information provided by the expert.

We saw that the dataset from which the learner learns can be expanded with additional state-action pairs to increase the learner's confidence in new situations. This process is called data aggregation. Moreover, the new data can come from the newly learned policy, in which case we speak of on-policy data (as it comes from the very policy being learned). This integration of on-policy states with expert feedback is a valuable approach that improves the quality of the learner.

We then explored and developed one of the most successful imitation learning algorithms, called DAgger, and applied it to learn to play the Flappy Bird game.
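
To make the loop concrete, here is a minimal sketch of the DAgger-style data aggregation described above. The `env`, `expert_policy`, and `learner` interfaces are hypothetical placeholders, not the exact code used in the chapter; the point is only to show how the states visited by the learner's own policy are labeled by the expert and aggregated into the supervised training set.

```python
# Minimal DAgger sketch. The env.reset/env.step, expert_policy, and
# learner.fit/learner.predict interfaces are assumed placeholders.

def dagger(env, expert_policy, learner, iterations=10, steps_per_iter=1000):
    """Iteratively aggregate expert-labeled on-policy data and retrain."""
    dataset_states, dataset_actions = [], []

    for it in range(iterations):
        state = env.reset()
        for _ in range(steps_per_iter):
            # The expert labels every visited state with the action it would take
            dataset_states.append(state)
            dataset_actions.append(expert_policy(state))

            # On the first iteration the expert acts; afterwards the learner's
            # own policy acts, so the visited states are on-policy for the learner
            action = expert_policy(state) if it == 0 else learner.predict(state)

            state, done = env.step(action)
            if done:
                state = env.reset()

        # Supervised learning on the whole aggregated dataset
        learner.fit(dataset_states, dataset_actions)

    return learner
```

Because the aggregated dataset keeps growing with states the learner actually encounters, the expert's corrections cover exactly the situations where the learner would otherwise drift away from the demonstrated behavior.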

However, because imitation learning algorithms only copy the behavior of an expert, these systems cannot do better than the expert. Therefore, we introduced inverse reinforcement learning, which overcomes this problem by inferring the reward function from the expert. In this way, the policy can be learned independently of the teacher. 

In the next chapter, we'll take a look at another set of algorithms for solving sequential tasks; namely, evolutionary algorithms. You'll learn the mechanisms and advantages of these black-box optimization algorithms so that you'll be able to adopt them in challenging environments. Furthermore, we'll delve deeper into an evolutionary algorithm called evolution strategy and implement it.
