Delayed policy updates

Since the high variance is attributed to an inaccurate critic, TD3 proposes delaying the policy update until the critic's error is small enough. TD3 implements the delay empirically, updating the policy only once every fixed number of critic updates. This gives the critic time to learn and stabilize before the policy is optimized against it. In practice, the policy is held fixed for only a few iterations, typically between 1 and 6; a value of 1 recovers the same behavior as DDPG. The delayed policy updates can be implemented as follows:

    ...
    # Train both critics at every step
    q1_train_loss, q2_train_loss = sess.run([q1_opt, q2_opt], feed_dict={obs_ph: mb_obs, y_ph: y_r, act_ph: mb_act})

    # Update the policy and the target networks only every policy_update_freq steps
    if step_count % policy_update_freq == 0:
        sess.run(p_opt, feed_dict={obs_ph: mb_obs})
        sess.run(update_target_op)
    ...
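
To make the scheduling logic concrete outside of a TensorFlow session, the following is a minimal, framework-agnostic sketch of the same idea. The update_critics, update_policy, and update_targets callables are hypothetical placeholders standing in for the optimization ops shown above, not part of the original code.

    # A minimal sketch of the delayed-update schedule (placeholder callables,
    # not the book's TensorFlow ops).
    def train(num_steps, policy_update_freq=2,
              update_critics=lambda: None,
              update_policy=lambda: None,
              update_targets=lambda: None):
        """Update the critics at every step, but the policy (and the target
        networks) only every policy_update_freq steps. With
        policy_update_freq=1 this reduces to the DDPG schedule."""
        for step_count in range(1, num_steps + 1):
            update_critics()                        # critics trained every step
            if step_count % policy_update_freq == 0:
                update_policy()                     # delayed policy update
                update_targets()                    # targets follow the policy update

    if __name__ == '__main__':
        # Count how often each component is updated over 10 steps.
        counts = {'critic': 0, 'policy': 0}
        train(10, policy_update_freq=2,
              update_critics=lambda: counts.__setitem__('critic', counts['critic'] + 1),
              update_policy=lambda: counts.__setitem__('policy', counts['policy'] + 1))
        print(counts)  # {'critic': 10, 'policy': 5}

With policy_update_freq=2, the critics receive twice as many gradient steps as the policy, which is the ratio used in the TD3 paper.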