Value-based algorithms

Value-based algorithms, also known as value function algorithms, follow a paradigm very similar to the one we saw in the previous section: they use the Bellman equation to learn the Q-function, which in turn is used to derive a policy. In the most common setting, they use deep neural networks as function approximators, along with other tricks to deal with high variance and general instability. To a certain degree, value-based algorithms resemble supervised regression, with the Bellman target playing the role of the regression label.
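
To make the idea concrete, here is a minimal sketch of a Bellman backup, written with a plain NumPy Q-table rather than a neural network so the update is easy to read. The table Q, the transition variables, and the step sizes are illustrative assumptions, not code from this book:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One Bellman backup on a tabular Q-function (illustrative sketch).

    The target r + gamma * max_a' Q(s', a') acts like a regression
    label, which is why these methods feel close to supervised
    regression. In deep variants, Q is a neural network and this
    update becomes a gradient step on the squared TD error.
    """
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])  # move Q(s, a) toward the target
```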

Typically, these algorithms are off-policy, meaning they are not required to optimize the same policy that was used to generate the data. As a result, they can learn from previous experience by storing sampled transitions in a replay buffer. The ability to reuse past samples makes value-based methods more sample-efficient than other model-free algorithms.
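
A replay buffer itself is simple. The sketch below (the class name and capacity are hypothetical choices, not from this book) shows the two operations an off-policy learner needs: storing transitions and sampling random minibatches from them:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions for off-policy learning."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest samples drop off automatically

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions,
        # one of the tricks used to stabilize training.
        return random.sample(self.buffer, batch_size)
```

Because the learner only needs valid transitions, not fresh ones, every stored tuple can be reused for many updates, which is where the sample-efficiency advantage comes from.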
