Comparing IL and RL

Let's go deeper into the IL approach by highlighting how it differs from RL. This contrast is important: in imitation learning, the learner is never aware of any reward, and this constraint has significant implications.

Going back to our example, the apprentice can only replicate the expert's moves as closely as possible, whether passively or actively. Lacking objective rewards from the environment, they are constrained to the subjective supervision of the expert. Thus, even if they wanted to, they cannot improve beyond the expert or understand the reasoning behind the teacher's choices.

So, IL should be seen as a way to copy the expert's moves without knowing their underlying goal. In our example, it's as if the young driver assimilates the teacher's trajectories very well but still doesn't know the motivations that led the teacher to choose them. Without being aware of the reward, an agent trained with imitation learning cannot maximize the total reward, as an RL agent does.
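To make this concrete, here is a minimal sketch of a supervised imitation update in the style of behavioral cloning. The network sizes, names (`policy`, `imitation_step`), and dimensions are illustrative assumptions, not code from this book; the point is that the loss only measures distance from the expert's actions, and no reward term appears anywhere in the update:

```python
import torch
import torch.nn as nn

# Assumed state and action dimensions for illustration only
state_dim, n_actions = 4, 2

# A small policy network mapping states to action scores
policy = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def imitation_step(expert_states, expert_actions):
    """One update: match the expert's actions, nothing more."""
    logits = policy(expert_states)          # learner's action scores
    loss = loss_fn(logits, expert_actions)  # distance from the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```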

This highlights the main difference between IL and RL. The former lacks an understanding of the main objective and thus cannot surpass the teacher. The latter lacks a direct supervision signal and, in most cases, has access only to a sparse reward. This situation is depicted in the following diagram:

The diagram on the left represents the usual RL cycle, while the one on the right represents the imitation learning cycle. Here, the learner doesn't receive any reward, only the state and the action given by the expert.
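The two cycles can also be sketched in code. The `env`, `expert`, `agent`, and `learner` objects and their methods (`reset`, `step`, `act`, `learn`, `advance`) are hypothetical interfaces chosen for illustration; what matters is which signals flow back to the learner in each loop:

```python
def rl_cycle(env, agent):
    """RL: the environment returns a reward, which drives learning."""
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)  # reward comes from the env
        agent.learn(state, action, reward)

def il_cycle(env, expert, learner):
    """IL: the expert supplies the action; no reward is ever observed."""
    state = env.reset()
    done = False
    while not done:
        expert_action = expert.act(state)        # supervision, not reward
        learner.learn(state, expert_action)
        state, done = env.advance(expert_action) # no reward returned
```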
