Landing a spacecraft using REINFORCE

The algorithm is complete; however, the most interesting part has yet to be explained. In this section, we'll apply REINFORCE to LunarLander-v2, an episodic Gym environment in which the aim is to land a lunar lander.

The following is a screenshot of the game in its initial position, and a hypothetical successful final position: 

The problem has a discrete action space, and the lander has to land at coordinates (0,0); it is penalized if it lands far from that point. The lander receives a positive reward as it moves from the top of the screen toward the bottom, but it loses 0.3 points for every frame in which it fires the engine to slow down.

Moreover, depending on the conditions of the landing, it receives an additional -100 or +100 points. The game is considered solved with a total of 200 points. Each game is run for a maximum of 1,000 steps.

Because of this step limit, we'll gather at least 1,000 steps of experience in each epoch, to be sure that at least one full episode has been completed (this value is set by the steps_per_epoch hyperparameter).
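The collection rule can be sketched as follows. This is only a minimal illustration, not the chapter's implementation: StubEpisodicEnv is a hypothetical stand-in that mimics the older Gym interface (reset, and step returning a 4-tuple) so that the snippet runs without Gym installed, and collect_epoch is an assumed helper name. The constant action and the -0.3 reward are placeholders.

```python
import random


class StubEpisodicEnv:
    """Hypothetical stand-in for a Gym-style episodic environment (not LunarLander)."""

    def __init__(self, max_steps=1000, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        # The episode ends at random, or when the step cap is reached.
        done = self.t >= self.max_steps or self.rng.random() < 0.01
        return 0.0, -0.3, done, {}  # observation, reward, done, info


def collect_epoch(env, steps_per_epoch=1000):
    """Run whole episodes until at least steps_per_epoch steps are stored."""
    buffer = []    # (observation, action, reward) tuples from finished episodes
    episodes = 0
    while len(buffer) < steps_per_epoch:
        obs, done, episode = env.reset(), False, []
        while not done:
            action = 0  # placeholder; a real agent samples from its policy
            next_obs, reward, done, _ = env.step(action)
            episode.append((obs, action, reward))
            obs = next_obs
        buffer.extend(episode)  # only complete episodes reach the buffer
        episodes += 1
    return buffer, episodes


buffer, episodes = collect_epoch(StubEpisodicEnv(), steps_per_epoch=1000)
print(len(buffer), episodes)
```

Because an episode can never exceed 1,000 steps, a buffer built this way always holds at least one complete episode, and it overshoots the target by less than one episode.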

REINFORCE is run by calling the function with the following hyperparameters:

REINFORCE('LunarLander-v2', hidden_sizes=[64], lr=8e-3, gamma=0.99, num_epochs=1000, steps_per_epoch=1000)
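Of these hyperparameters, gamma is the discount factor applied to future rewards. As a rough sketch of what a value such as 0.99 does, the following helper computes the discounted reward-to-go for each step of one episode. The name discounted_rewards is an assumption made for this illustration and is not necessarily what the REINFORCE function uses internally.

```python
def discounted_rewards(rewards, gamma=0.99):
    """Return the discounted reward-to-go for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step adds its reward to the discounted future sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# gamma=0.5 is used here only to keep the arithmetic easy to follow.
print(discounted_rewards([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With gamma close to 1, as in the call above, rewards that arrive many steps later (such as the landing bonus) still contribute strongly to the return of early actions.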