The DAgger algorithm

Specifically, DAgger proceeds by iterating the following procedure. At the first iteration, a dataset D of trajectories is created from the expert policy, π*, and used to train a first policy, π1, that best fits those trajectories without overfitting them. Then, during iteration i, new trajectories are collected with the learned policy, πi, and added to the dataset, D. After that, the aggregated dataset, D, with the new and old trajectories, is used to train a new policy, πi+1.

As reported in the DAgger paper (https://arxiv.org/pdf/1011.0686.pdf), this active on-policy learning scheme outperforms many other imitation learning algorithms, and it is also able to learn very complex policies with the help of deep neural networks.

Additionally, at iteration i, the policy can be modified so that the expert takes control of a fraction of the actions (for example, the expert acts with probability βi, where βi decays over the iterations). This technique better leverages the expert and lets the learner gradually assume control over the environment.
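This expert-takeover trick can be sketched as a simple mixture policy. The following is a minimal illustration, assuming toy callable policies and a decaying mixing coefficient beta (the function and policy names here are hypothetical, not from the paper):

```python
import random

def mixed_action(state, expert_policy, learner_policy, beta):
    """With probability beta the expert acts; otherwise the learner acts."""
    if random.random() < beta:
        return expert_policy(state)
    return learner_policy(state)

# Toy policies, assumed for illustration only
expert = lambda s: 1    # the expert always picks action 1
learner = lambda s: 0   # the untrained learner always picks action 0

# beta = 1.0: the expert always controls; beta = 0.0: the learner always controls
assert mixed_action(None, expert, learner, beta=1.0) == 1
assert mixed_action(None, expert, learner, beta=0.0) == 0
```

In practice, beta typically starts at 1 (pure expert control) and decays toward 0 across iterations, so the states in the dataset gradually come from the learner's own distribution.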

The pseudocode of the algorithm can clarify this further:

Initialize D ← ∅
Initialize π1 to any policy (π* is the expert policy)

for i = 1 to N:
> Populate dataset Di with trajectories: the states are visited by πi (sometimes the expert could take control over it) and the actions are given by the expert, π*
> Aggregate the datasets: D ← D ∪ Di

> Train a classifier πi+1 on the aggregated dataset D
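The loop above can be sketched in a few lines of Python. This is a minimal toy illustration, assuming a small discrete state space, an expert given as a lookup table, and a majority-vote "classifier" standing in for a neural network; all names (run_episode, train, EXPERT) are illustrative, not from the paper:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

N_STATES = 5
EXPERT = {s: s % 2 for s in range(N_STATES)}   # the expert labels each state

def expert_action(s):
    return EXPERT[s]

def train(dataset):
    """Fit a policy by majority vote over the aggregated (state, action) pairs."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    return lambda s: votes[s].most_common(1)[0][0] if s in votes else 0

def run_episode(policy, horizon=10):
    """Visit states under the current policy; the expert labels every state."""
    traj = []
    s = random.randrange(N_STATES)
    for _ in range(horizon):
        traj.append((s, expert_action(s)))   # state from the policy's rollout, action from the expert
        a = policy(s)
        s = (s + a + 1) % N_STATES           # toy transition dynamics
    return traj

# Iteration 1: collect trajectories with the expert itself, then train pi_1
D = run_episode(expert_action)
policy = train(D)

# Iterations 2..N: collect with the learned policy, aggregate, retrain
for i in range(4):
    D += run_episode(policy)
    policy = train(D)

# After aggregation, the learner reproduces the expert on all visited states
assert all(policy(s) == expert_action(s) for s, _ in D)
```

The key point the sketch preserves is that the states come from rolling out the learned policy while the labels always come from the expert, and that each new batch is aggregated into D rather than replacing it.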