Policy iteration cycles between policy evaluation, which updates the value function V^π under the current policy π using formula (8), and policy improvement, which computes an improved policy π′ from that value function using formula (9). Eventually, after finitely many cycles, the algorithm converges to an optimal policy π*.
The pseudocode is as follows:
Initialize V(s) and π(s) arbitrarily for every state s
while π is not stable:
    > policy evaluation
    while V is not stable:
        for each state s:
            update V(s) under the current policy π using formula (8)
    > policy improvement
    for each state s:
        set π(s) greedily with respect to V using formula (9)
After an initialization phase, the outer loop alternates between policy evaluation and policy improvement until a stable policy is found. On each of these iterations, policy evaluation evaluates the policy produced by the preceding policy improvement step, which in turn uses the value function estimated during evaluation.
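The loop structure above can be sketched concretely. The following is a minimal NumPy sketch, not the text's reference implementation: it assumes a finite MDP given as a transition tensor P[s, a, s′] and an expected-reward table R[s, a] (names chosen here for illustration), with the in-place Bellman updates standing in for formulas (8) and (9).

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular policy iteration.

    P: transitions, shape (S, A, S); P[s, a, s2] = Pr(s2 | s, a)
    R: expected rewards, shape (S, A)
    Returns the value function V and a deterministic policy pi.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: sweep states until V stops changing (formula (8))
        while True:
            delta = 0.0
            for s in range(n_states):
                v = R[s, pi[s]] + gamma * P[s, pi[s]] @ V
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily w.r.t. the evaluated V (formula (9))
        stable = True
        for s in range(n_states):
            q = R[s] + gamma * P[s] @ V   # action values for state s
            best = int(np.argmax(q))
            if best != pi[s]:
                stable = False
            pi[s] = best
        if stable:                         # outer loop: policy is stable
            return V, pi

# Toy two-state MDP (illustrative): action 0 stays put, action 1 switches
# states; staying in state 1 yields reward 1, everything else yields 0.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # stay
P[0, 1, 1] = P[1, 1, 0] = 1.0   # switch
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, pi = policy_iteration(P, R, gamma=0.9)
```

For this toy MDP the greedy policy is to switch out of state 0 and stay in state 1, so pi = [1, 0], with V ≈ [9, 10] at γ = 0.9 (V(1) = 1/(1 − γ) = 10 and V(0) = γ·V(1) = 9).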