Policy evaluation

We just saw that using real experience to estimate the value function is a simple process: run the policy in the environment until a terminal state is reached, compute the return of the episode, and average the sampled returns, as shown in equation (1):

V^{\pi}(s) \approx \frac{1}{N} \sum_{i=1}^{N} G_i(s)    (1)

where G_i(s) is the return of the i-th episode sampled starting from state s.

Thus, the expected return of a state can be approximated from experience by averaging the returns of the episodes sampled from that state. The methods that estimate the value function using (1) are called Monte Carlo methods. Provided that all of the state-action pairs continue to be visited and enough trajectories are sampled, Monte Carlo methods are guaranteed to converge to the optimal policy.
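To make this concrete, here is a minimal sketch of first-visit Monte Carlo policy evaluation in Python. It assumes a hypothetical environment with reset() and step(action) methods, where step returns a (next_state, reward, done) triple, and a policy given as a function from states to actions; these interfaces are illustrative, not from the original text.

```python
from collections import defaultdict

def mc_policy_evaluation(env, policy, num_episodes=1000, gamma=0.99):
    """First-visit Monte Carlo policy evaluation: estimate V(s) as the
    average of the returns sampled after the first visit to s, as in (1)."""
    returns_sum = defaultdict(float)   # sum of sampled returns per state
    returns_count = defaultdict(int)   # number of sampled returns per state

    for _ in range(num_episodes):
        # 1. Run the policy until a terminal state is reached.
        episode, state, done = [], env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)  # assumed interface
            episode.append((state, reward))
            state = next_state

        # 2. Compute the discounted return G_t backward through the episode.
        G, returns = 0.0, [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            G = episode[t][1] + gamma * G
            returns[t] = G

        # 3. Record the return observed at the first visit to each state.
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen.add(s)
                returns_sum[s] += returns[t]
                returns_count[s] += 1

    # Equation (1): V(s) is the average of the returns sampled from s.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Note that the value estimates are updated only after each episode finishes, since the return of an episode can only be computed once a terminal state is reached.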
