The DAgger algorithm

Specifically, DAgger proceeds by iterating the following procedure. At the first iteration, a dataset D of trajectories is created from the expert policy, π*, and used to train a first policy, π1, that best fits those trajectories without overfitting them. Then, during iteration i, new trajectories are collected with the learned policy, πi, and added to the dataset, D. After that, the aggregated dataset, D, with the new and old trajectories, is used to train a new policy, πi+1.

As reported in the DAgger paper (https://arxiv.org/pdf/1011.0686.pdf), this active on-policy learning scheme outperforms many other imitation learning algorithms, and it is also able to learn very complex policies with the help of deep neural networks.

Additionally, at iteration i, the policy can be modified so that the expert takes control of a fraction of the actions (for example, the expert acts with probability βi, where βi decays over the iterations). This technique better leverages the expert and lets the learner gradually assume control over the environment.
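This expert-takeover trick can be sketched as a simple mixture policy. The following is a minimal illustration, assuming toy callable policies and a decaying mixing coefficient beta (the function and policy names here are hypothetical, not from the paper):

```python
import random

def mixed_action(state, expert_policy, learner_policy, beta):
    """With probability beta the expert acts; otherwise the learner acts."""
    if random.random() < beta:
        return expert_policy(state)
    return learner_policy(state)

# Toy policies, assumed for illustration only
expert = lambda s: 1    # the expert always picks action 1
learner = lambda s: 0   # the untrained learner always picks action 0

# beta = 1.0: the expert always controls; beta = 0.0: the learner always controls
assert mixed_action(None, expert, learner, beta=1.0) == 1
assert mixed_action(None, expert, learner, beta=0.0) == 0
```

In practice, beta typically starts at 1 (pure expert control) and decays toward 0 across iterations, so the states in the dataset gradually come from the learner's own distribution.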

The pseudocode of the algorithm can clarify this further:

Initialize D ← ∅
Initialize π1 to any policy (π* is the expert policy)

for i = 1 to N:
> Populate dataset Di with trajectories: the states are visited by πi (sometimes the expert could take control over it) and the actions are given by the expert, π*
> Aggregate the datasets: D ← D ∪ Di

> Train a classifier πi+1 on the aggregated dataset D
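The loop above can be sketched in a few lines of Python. This is a minimal toy illustration, assuming a small discrete state space, an expert given as a lookup table, and a majority-vote "classifier" standing in for a neural network; all names (run_episode, train, EXPERT) are illustrative, not from the paper:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

N_STATES = 5
EXPERT = {s: s % 2 for s in range(N_STATES)}   # the expert labels each state

def expert_action(s):
    return EXPERT[s]

def train(dataset):
    """Fit a policy by majority vote over the aggregated (state, action) pairs."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    return lambda s: votes[s].most_common(1)[0][0] if s in votes else 0

def run_episode(policy, horizon=10):
    """Visit states under the current policy; the expert labels every state."""
    traj = []
    s = random.randrange(N_STATES)
    for _ in range(horizon):
        traj.append((s, expert_action(s)))   # state from the policy's rollout, action from the expert
        a = policy(s)
        s = (s + a + 1) % N_STATES           # toy transition dynamics
    return traj

# Iteration 1: collect trajectories with the expert itself, then train pi_1
D = run_episode(expert_action)
policy = train(D)

# Iterations 2..N: collect with the learned policy, aggregate, retrain
for i in range(4):
    D += run_episode(policy)
    policy = train(D)

# After aggregation, the learner reproduces the expert on all visited states
assert all(policy(s) == expert_action(s) for s, _ in D)
```

The key point the sketch preserves is that the states come from rolling out the learned policy while the labels always come from the expert, and that each new batch is aggregated into D rather than replacing it.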