Results on RoboschoolInvertedPendulum

The performance graph is shown in the following diagram:

The reward is plotted as a function of the number of interactions with the real environment. After 900 real steps, about 15 games, the agent reaches the top score of 1,000. Over the course of training, the policy was updated 15 times and learned from a total of 750,000 simulated steps, an average of about 50,000 simulated steps per update. From a computational point of view, the algorithm trained for about 2 hours on a mid-range computer.
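To make the step budget concrete, the following is a minimal sketch of how a model-based loop might split its real and simulated interactions, assuming the 900 real steps are spread evenly over the 15 policy updates. The helper functions are placeholders introduced here for illustration, not the chapter's actual implementation:

```python
import numpy as np

def collect_real_data(n_steps):
    # Placeholder: gather (state, action, next state) transitions
    # from the real environment.
    return np.zeros((n_steps, 3))

def train_dynamics_model(data):
    # Placeholder: fit a model of the environment's dynamics.
    return lambda state, action: state

def update_policy_on_model(model, n_sim_steps):
    # Placeholder: improve the policy on rollouts simulated by the
    # learned model, returning the number of simulated steps used.
    return n_sim_steps

real_steps, sim_steps = 0, 0
for update in range(15):                   # 15 policy updates in total
    data = collect_real_data(n_steps=60)   # 15 x 60 = 900 real steps
    real_steps += len(data)
    model = train_dynamics_model(data)
    sim_steps += update_policy_on_model(model, n_sim_steps=50_000)

print(f"real steps: {real_steps}, simulated steps: {sim_steps}")
# real steps: 900, simulated steps: 750000
```

The key property this sketch highlights is the ratio between the two budgets: each real-environment step supports hundreds of simulated steps, which is what makes the approach sample-efficient with respect to the real environment.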

We noted that the results are highly variable: training with different random seeds can produce very different performance curves. This is also true for model-free algorithms, but here the differences are more pronounced. One reason for this may be the different data collected in the real environment.
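A common way to quantify this variability is to repeat training with several random seeds and compare the resulting curves. Below is a minimal, self-contained sketch of that protocol; train_agent is a hypothetical stand-in that returns a synthetic reward curve and would be replaced by the actual training run:

```python
import numpy as np

def train_agent(seed, n_real_steps=900):
    # Hypothetical stand-in for a full training run: returns a
    # reward curve indexed by real-environment steps. Swap in the
    # actual algorithm to reproduce the experiment.
    rng = np.random.default_rng(seed)
    return np.clip(np.cumsum(rng.normal(2.0, 5.0, n_real_steps)), 0, 1000)

seeds = [0, 1, 2, 3, 4]
curves = np.stack([train_agent(s) for s in seeds])

# Summarize the spread across seeds at the end of training.
final = curves[:, -1]
print(f"final reward: {final.mean():.0f} +/- {final.std():.0f} "
      f"over {len(seeds)} seeds")
```

Reporting the mean and standard deviation over several seeds, rather than a single curve, gives a more honest picture of performance when the variance between runs is this high.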
