Landing a spacecraft using AC 

We applied AC to LunarLander-v2, the same environment that we used to test REINFORCE. It is an episodic game, so it doesn't fully emphasize the main strengths of the AC algorithm. Nonetheless, it provides a good testbed, and you are free to test the algorithm in other environments.

We call the AC function with the following hyperparameters:

AC('LunarLander-v2', hidden_sizes=[64], ac_lr=4e-3, cr_lr=1.5e-2, gamma=0.99, steps_per_epoch=100, num_epochs=8000)
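To make the structure of each epoch concrete, the following is a minimal sketch of a one-step actor-critic loop. It assumes PyTorch and the classic (pre-0.26) Gym step API, and it mirrors the hidden size, learning rates, and epoch settings given above, but it is only an illustration, not the chapter's AC implementation:

import gym
import numpy as np
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make('LunarLander-v2')
obs_dim = env.observation_space.shape[0]
n_acts = env.action_space.n

# Separate actor (policy) and critic (state-value) networks
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_acts))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=4e-3)     # ac_lr
critic_opt = torch.optim.Adam(critic.parameters(), lr=1.5e-2)  # cr_lr
gamma, steps_per_epoch, num_epochs = 0.99, 100, 8000

obs = env.reset()
for epoch in range(num_epochs):
    obs_buf, act_buf, rew_buf, done_buf, next_buf = [], [], [], [], []
    # Collect steps_per_epoch transitions before each update
    for _ in range(steps_per_epoch):
        with torch.no_grad():
            logits = actor(torch.as_tensor(obs, dtype=torch.float32))
        act = Categorical(logits=logits).sample().item()
        next_obs, rew, done, _ = env.step(act)  # pre-0.26 Gym API
        obs_buf.append(obs); act_buf.append(act); rew_buf.append(rew)
        done_buf.append(done); next_buf.append(next_obs)
        obs = env.reset() if done else next_obs

    obs_t = torch.as_tensor(np.asarray(obs_buf), dtype=torch.float32)
    next_t = torch.as_tensor(np.asarray(next_buf), dtype=torch.float32)
    act_t = torch.as_tensor(act_buf)
    rew_t = torch.as_tensor(rew_buf, dtype=torch.float32)
    mask = 1.0 - torch.as_tensor(done_buf, dtype=torch.float32)

    # One-step bootstrapped target: r + gamma * V(s') on non-terminal steps
    with torch.no_grad():
        target = rew_t + gamma * mask * critic(next_t).squeeze(-1)
    value = critic(obs_t).squeeze(-1)
    adv = (target - value).detach()  # advantage estimate for the actor

    # Critic regresses toward the bootstrapped target
    critic_loss = (target - value).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor follows the policy gradient weighted by the advantage
    logp = Categorical(logits=actor(obs_t)).log_prob(act_t)
    actor_loss = -(logp * adv).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()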

The resulting plot of the total reward accumulated across the training epochs is as follows:

You can see that AC is faster than REINFORCE, as the following plot shows. However, it is less stable: after about 200,000 steps, the performance declines a little, fortunately recovering and continuing to improve afterward:

In this configuration, the AC algorithm updates the actor and the critic every 100 steps. In theory, you could use a smaller steps_per_epoch value, but that usually makes training more unstable. Using longer epochs stabilizes training, but the actor learns more slowly. It's all about finding a good trade-off and good learning rates.
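For instance, keeping the total number of environment steps fixed at 800,000, you could try a longer epoch with a hypothetical configuration such as the following (not one tested in this chapter):

AC('LunarLander-v2', hidden_sizes=[64], ac_lr=4e-3, cr_lr=1.5e-2, gamma=0.99, steps_per_epoch=500, num_epochs=1600)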
