The core

In the paper, a version of ES is used that maximizes the average objective value over a population of parameters, as follows:

$$\mathbb{E}_{\theta \sim p_\psi}\left[F(\theta)\right]$$

It does this by searching over the parameters, $\psi$, of the population distribution, $p_\psi$, with stochastic gradient ascent. $F$ is the objective function (or fitness function), while $\theta$ denotes the parameters of the actor. In our problems, $F(\theta)$ is simply the stochastic return obtained by the agent with parameters $\theta$ in the environment.

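The population distribution, $p_\psi$, is a multivariate Gaussian with mean $\psi$ and fixed standard deviation $\sigma$, as follows:

$$p_\psi = \mathcal{N}(\psi, \sigma^2 I), \qquad \text{that is,} \qquad \theta = \psi + \sigma\epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$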
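As a minimal sketch (not the paper's implementation), sampling a population from this Gaussian and estimating the average objective by Monte Carlo could look like the following. It uses only the Python standard library; `sample_population`, `average_objective`, and `toy_fitness` are hypothetical names, and the quadratic `toy_fitness` merely stands in for the stochastic return of an episode.

```python
import random


def sample_population(psi, sigma, n, rng):
    """Draw n members theta_i = psi + sigma * eps_i with eps_i ~ N(0, I).

    The perturbations eps_i are returned too, because the ES gradient
    estimate reuses them.
    """
    epsilons = [[rng.gauss(0.0, 1.0) for _ in psi] for _ in range(n)]
    thetas = [[p + sigma * e for p, e in zip(psi, eps)] for eps in epsilons]
    return thetas, epsilons


def average_objective(F, psi, sigma, n, rng):
    """Monte Carlo estimate of E_{theta ~ N(psi, sigma^2 I)}[F(theta)]."""
    thetas, _ = sample_population(psi, sigma, n, rng)
    return sum(F(theta) for theta in thetas) / n


# Hypothetical toy fitness standing in for the stochastic return of an episode.
def toy_fitness(theta):
    return -sum(x * x for x in theta)


rng = random.Random(0)
print(average_objective(toy_fitness, [1.0, -2.0], 0.1, 10_000, rng))
```

For $\psi = (1, -2)$ and $\sigma = 0.1$, the exact average objective of this toy function is $-(1^2 + 0.1^2) - (2^2 + 0.1^2) = -5.02$, so the printed estimate should be close to that value.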

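From here, the gradient of the average objective can be written as an expectation over the Gaussian perturbations $\epsilon \sim \mathcal{N}(0, I)$ and estimated by sampling $n$ of them, $\epsilon_1, \dots, \epsilon_n$, one per member of the population. With learning rate $\alpha$, the step update based on this stochastic gradient estimate is as follows:

$$\nabla_\psi \mathbb{E}_{\theta \sim p_\psi}\left[F(\theta)\right] = \frac{1}{\sigma}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[F(\psi + \sigma\epsilon)\,\epsilon\right]$$

$$\psi_{t+1} = \psi_t + \alpha\,\frac{1}{n\sigma}\sum_{i=1}^{n} F(\psi_t + \sigma\epsilon_i)\,\epsilon_i$$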
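The step update can be sketched in plain Python as follows. This assumes the basic estimator $\psi_{t+1} = \psi_t + \frac{\alpha}{n\sigma}\sum_i F(\psi_t + \sigma\epsilon_i)\,\epsilon_i$ with no extras such as fitness shaping or mirrored sampling; the function names and the quadratic `toy_fitness` (a hypothetical stand-in for an episode return) are illustrative only.

```python
import random


def es_gradient(F, psi, sigma, n, rng):
    """Estimate grad_psi E[F(psi + sigma * eps)] from n sampled perturbations.

    Implements (1 / (n * sigma)) * sum_i F(psi + sigma * eps_i) * eps_i.
    F is only evaluated (e.g. by running an episode), never differentiated.
    """
    grad = [0.0] * len(psi)
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in psi]
        fitness = F([p + sigma * e for p, e in zip(psi, eps)])
        for j, e in enumerate(eps):
            grad[j] += fitness * e
    return [g / (n * sigma) for g in grad]


def es_step(F, psi, alpha, sigma, n, rng):
    """One step of plain stochastic gradient ascent on the average objective."""
    grad = es_gradient(F, psi, sigma, n, rng)
    return [p + alpha * g for p, g in zip(psi, grad)]


# Hypothetical toy fitness standing in for an episode return; maximum at theta = [3, 3].
def toy_fitness(theta):
    return -sum((x - 3.0) ** 2 for x in theta)


rng = random.Random(0)
psi = [0.0, 0.0]
for _ in range(200):
    psi = es_step(toy_fitness, psi, alpha=0.05, sigma=0.5, n=200, rng=rng)
print(psi)
```

Because `toy_fitness` is quadratic, the true gradient of its average objective is $-2(\psi - 3)$, so the loop drives `psi` toward `[3, 3]` even though `F` is only ever called and never differentiated.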

With this update, the stochastic gradient is estimated from the results of the episodes played by the population, without performing backpropagation. The parameters can also be updated with one of the well-known optimizers, such as Adam or RMSProp, by passing them this estimate in place of a backpropagated gradient.
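To illustrate plugging the estimate into a standard optimizer, here is a minimal sketch that feeds the ES gradient into a hand-written Adam update. The `AdamAscent` class is a hypothetical stand-in for a library optimizer (it follows the usual Adam update of Kingma and Ba, with the sign flipped for ascent), and `toy_fitness` again stands in for an episode return.

```python
import math
import random


def es_gradient(F, psi, sigma, n, rng):
    """ES estimate (1/(n*sigma)) * sum_i F(psi + sigma*eps_i) * eps_i, eps_i ~ N(0, I)."""
    epsilons = [[rng.gauss(0.0, 1.0) for _ in psi] for _ in range(n)]
    # One fitness value per population member, e.g. the return of one episode.
    fitnesses = [F([p + sigma * e for p, e in zip(psi, eps)]) for eps in epsilons]
    return [
        sum(f * eps[j] for f, eps in zip(fitnesses, epsilons)) / (n * sigma)
        for j in range(len(psi))
    ]


class AdamAscent:
    """Minimal Adam optimizer, written for gradient *ascent* on psi."""

    def __init__(self, dim, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim  # running first-moment (mean) estimate
        self.v = [0.0] * dim  # running second-moment (uncentered variance) estimate
        self.t = 0

    def step(self, psi, grad):
        self.t += 1
        new_psi = []
        for j, (p, g) in enumerate(zip(psi, grad)):
            self.m[j] = self.beta1 * self.m[j] + (1.0 - self.beta1) * g
            self.v[j] = self.beta2 * self.v[j] + (1.0 - self.beta2) * g * g
            m_hat = self.m[j] / (1.0 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[j] / (1.0 - self.beta2 ** self.t)
            # '+' instead of '-' because we maximize the average objective.
            new_psi.append(p + self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_psi


# Hypothetical toy fitness standing in for an episode return; maximum at theta = [3, 3].
def toy_fitness(theta):
    return -sum((x - 3.0) ** 2 for x in theta)


rng = random.Random(0)
psi = [0.0, 0.0]
optimizer = AdamAscent(dim=len(psi), lr=0.1)
for _ in range(300):
    psi = optimizer.step(psi, es_gradient(toy_fitness, psi, sigma=0.5, n=100, rng=rng))
print(psi)
```

The optimizer never sees how the gradient was obtained, which is why swapping backpropagation for the ES estimate is straightforward. RMSProp would follow the same pattern, keeping only the running second-moment estimate `v`.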
