Before showing the results of the imitation learning approach, we want to provide some numbers so that you can compare them with those of a reinforcement learning algorithm. We know that this is not a fair comparison (the two algorithms operate under very different conditions), but the numbers nevertheless underline why imitation learning can be rewarding when an expert is available.
The expert has been trained with proximal policy optimization for about 2 million steps and, after about 400,000 steps, reached a plateau score of about 138.
We tested DAgger on Flappy Bird with the following hyperparameters:
| Hyperparameter | Variable name | Value |
| --- | --- | --- |
| Learner hidden layers | hidden_sizes | 16,16 |
| DAgger iterations | dagger_iterations | 8 |
| Learning rate | p_lr | 1e-4 |
| Number of steps per DAgger iteration | step_iterations | 100 |
| Mini-batch size | batch_size | 50 |
| Training epochs | train_epochs | 2000 |
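To make the role of these hyperparameters concrete, here is a minimal, self-contained sketch of the DAgger loop they control. This is not the book's Flappy Bird code: the environment, the expert, and the learner are deliberately toy stand-ins (a 1-D state, a threshold expert, and a linear classifier instead of the 16,16 MLP), so only the loop structure and the variable names mirror the table.

```python
import numpy as np

# Toy DAgger sketch wired to the hyperparameters from the table above.
# The expert, environment, and learner are placeholders, not the book's code.
rng = np.random.default_rng(0)

dagger_iterations = 8    # DAgger iterations
step_iterations = 100    # steps collected per iteration
p_lr = 1e-4              # learning rate
train_epochs = 2000      # training epochs per iteration
# (full-batch updates here for brevity, instead of mini-batches of 50)

def expert_policy(s):
    # Stand-in expert: action 1 when the state is positive, else 0
    return (s > 0).astype(float)

# Linear learner (logistic regression) in place of the 16,16 MLP
w, b = 0.0, 0.0

def learner_policy(s):
    return (w * s + b > 0).astype(float)

states = np.empty(0)
labels = np.empty(0)

for it in range(dagger_iterations):
    # 1) Collect states; a random walk stands in for running the learner in the env
    s = rng.normal(size=step_iterations)
    # 2) Label every visited state with the expert's action (the key DAgger step)
    a = expert_policy(s)
    # 3) Aggregate the new data into the full dataset
    states = np.concatenate([states, s])
    labels = np.concatenate([labels, a])
    # 4) Retrain the learner on the aggregated dataset
    for _ in range(train_epochs):
        logits = w * states + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels                 # gradient of the cross-entropy loss
        w -= p_lr * np.mean(grad * states)
        b -= p_lr * np.mean(grad)

accuracy = np.mean(learner_policy(states) == labels)
print(f"imitation accuracy: {accuracy:.2f}")
```

The essential point the sketch preserves is step 2: unlike plain behavioral cloning, DAgger queries the expert on the states the *learner* actually visits, so the aggregated dataset covers the learner's own state distribution.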
The following plot shows how the performance of DAgger evolves with the number of steps taken:
The horizontal line represents the average performance reached by the expert. From the results, we can see that a few hundred steps are sufficient to match the expert's performance. Compared with the experience PPO required to train the expert, this represents roughly a 100-fold gain in sample efficiency.
Again, this is not a fair comparison, as the methods operate in different contexts, but it highlights that whenever an expert is available, an imitation learning approach is usually worth trying, if only to learn an initial policy.