Understanding the dataset aggregation algorithm

One of the most successful algorithms that learn from demonstrations is Dataset Aggregation (DAgger). It is an iterative meta-algorithm that produces a policy which performs well under the distribution of states induced by that same policy. The most notable feature of DAgger is that it addresses the distribution mismatch with an active method, in which the expert teaches the learner how to recover from the learner's own mistakes.

A classic IL algorithm learns a classifier that predicts the expert's behavior. The model is fit to a dataset of training examples collected from an expert's demonstrations: the observations are the inputs, and the expert's actions are the desired outputs. However, following the previous reasoning, the learner's predictions affect the states or observations it visits next, which violates the i.i.d. assumption.
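To make the supervised view concrete, here is a minimal sketch of this classic approach. The one-dimensional environment, the `expert_action` rule, and the tabular majority-vote "classifier" are all hypothetical stand-ins chosen for illustration; a real setup would typically use a learned model such as a neural network.

```python
from collections import Counter, defaultdict


def expert_action(state, goal=0):
    """Hypothetical expert: step toward the goal on a 1-D line."""
    if state > goal:
        return -1
    if state < goal:
        return 1
    return 0


def fit_classifier(dataset):
    """Fit a tabular classifier: the majority action for each observation."""
    votes = defaultdict(Counter)
    for observation, action in dataset:
        votes[observation][action] += 1
    return {obs: counts.most_common(1)[0][0] for obs, counts in votes.items()}


# Expert demonstrations: (observation, action) pairs gathered along the
# expert's own trajectory, which only ever visits states 3, 2, 1, and 0.
demonstrations = [(s, expert_action(s)) for s in (3, 2, 1, 0)]
bc_policy = fit_classifier(demonstrations)
```

The resulting `bc_policy` has an entry only for the states the expert visited. If a mistake takes the learner to a state such as 5, the dataset offers no guidance there, which is the distribution mismatch in miniature.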

DAgger deals with this change in distribution by repeating a simple pipeline multiple times: new data is sampled from the states visited by the learner, it is added to the data already collected, and the policy is retrained on the aggregated dataset. A simple diagram of the algorithm is shown here:

The expert always provides the action labels that populate the dataset used by the classifier, but, depending on the iteration, the action actually performed in the environment may come from the expert or from the learner.
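The loop described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a reference implementation: the 1-D environment, the `expert_policy` rule, the tabular `train` function, and the geometric `beta` schedule (the probability of executing the expert's action in a given iteration) are all hypothetical choices made for this example.

```python
import random
from collections import Counter, defaultdict

GOAL = 0
ACTIONS = (-1, 0, 1)
LOW, HIGH = -5, 5  # bounds of the toy 1-D environment (assumed)


def expert_policy(state):
    """Hypothetical expert: step toward the goal."""
    if state > GOAL:
        return -1
    if state < GOAL:
        return 1
    return 0


def train(dataset):
    """Tabular stand-in for the classifier: majority action per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def learner_action(table, state, rng):
    """Act with the learned table; fall back to a random action if unseen."""
    return table[state] if state in table else rng.choice(ACTIONS)


def dagger(n_iterations=5, episodes=10, horizon=10, decay=0.5, seed=0):
    rng = random.Random(seed)
    dataset, table = [], {}
    for i in range(n_iterations):
        beta = decay ** i  # assumed schedule: 1.0 in the first iteration
        for _ in range(episodes):
            state = rng.randint(LOW, HIGH)
            for _ in range(horizon):
                expert_a = expert_policy(state)
                # The expert labels every visited state, whoever is acting.
                dataset.append((state, expert_a))
                if rng.random() < beta:
                    action = expert_a
                else:
                    action = learner_action(table, state, rng)
                state = max(LOW, min(HIGH, state + action))
        # Retrain on everything collected so far, not just the latest batch.
        table = train(dataset)
    return table, dataset


policy_table, aggregated = dagger()
```

With this schedule the first iteration is driven entirely by the expert, and later iterations increasingly let the learner act, so the dataset comes to include states the learner reaches through its own mistakes, each labeled with the expert's corrective action.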
