IRL

One of the biggest limitations of IL is that the learner cannot discover trajectories to the goal other than those demonstrated by the expert. By imitating an expert, the learner is constrained to the range of behaviors of its teacher and remains unaware of the end goal the expert is trying to reach. Thus, these methods are only useful when there is no intention to perform better than the teacher.

IRL, like IL, is an RL approach that learns from an expert. The difference is that IRL uses the expert's demonstrations to learn the reward function. Therefore, instead of copying the demonstrations, as is done in imitation learning, IRL infers the goal of the expert. Once the reward function has been learned, the agent uses it to learn the policy.

With the demonstrations used only to understand the expert's goal, the agent is not bound to the teacher's actions and can learn better strategies. For example, a self-driving car that learns by IRL would understand that the goal is to travel from point A to point B in the minimum amount of time while minimizing damage to things and people. The car would then learn, by itself (for example, with an RL algorithm), a policy that maximizes this reward function.
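To make the two steps concrete, the following is a minimal sketch of this loop on a toy chain MDP: a linear reward over state features is adjusted until a policy planned against it matches the expert's behavior (a simplified feature-matching scheme), and the resulting reward is then used to compute the final policy. The environment, feature choice, and update rule here are illustrative assumptions, not a specific algorithm from this chapter.

import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9      # tiny chain: actions move left/right
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0                  # action 0: step left
    P[s, 1, min(s + 1, n_states - 1)] = 1.0       # action 1: step right

phi = np.eye(n_states)                      # one-hot state features

def greedy_policy(r, iters=100):
    """Plan a greedy policy for a given state reward vector r (value iteration)."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * P @ V      # Q has shape (states, actions)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, start=0, horizon=30):
    """Discounted state-feature visitation of a deterministic policy."""
    mu, s = np.zeros(n_states), start
    for t in range(horizon):
        mu += (gamma ** t) * phi[s]
        s = P[s, policy[s]].argmax()        # deterministic transition
    return mu

# Expert demonstrations: the expert always moves right, toward the last state.
expert_policy = np.ones(n_states, dtype=int)
mu_expert = feature_expectations(expert_policy)

# IRL loop: adjust the reward weights so that the learner's feature
# expectations move toward the expert's.
w = np.zeros(n_states)
for _ in range(50):
    policy = greedy_policy(phi @ w)         # step 1: plan with the current reward
    mu_learner = feature_expectations(policy)
    w += 0.1 * (mu_expert - mu_learner)     # step 2: update the reward estimate

print("learned reward:", np.round(phi @ w, 2))
print("learned policy:", greedy_policy(phi @ w))

The learned reward ends up concentrated on the state the expert heads toward, and any planner run against it recovers the expert's goal-directed behavior without ever copying the demonstrated actions directly.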

However, IRL also has a number of challenges that limit its applicability. The expert's demonstrations may not be optimal, so the learner may never reach its full potential and may remain stuck with the wrong reward function. Another challenge lies in evaluating the learned reward function.
