Value-based algorithms

Value-based algorithms, also known as value function algorithms, follow a paradigm very similar to the one we saw in the previous section: they use the Bellman equation to learn the Q-function, which in turn is used to derive a policy. In the most common setting, they use deep neural networks as function approximators, along with other tricks to deal with high variance and general instability. To a certain degree, value-based algorithms resemble supervised regression, with the Bellman target playing the role of the regression label.
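
To make the idea concrete, here is a minimal sketch of a Bellman backup, written with a plain NumPy Q-table rather than a neural network so the update is easy to read. The table Q, the transition variables, and the step sizes are illustrative assumptions, not code from this book:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One Bellman backup on a tabular Q-function (illustrative sketch).

    The target r + gamma * max_a' Q(s', a') acts like a regression
    label, which is why these methods feel close to supervised
    regression. In deep variants, Q is a neural network and this
    update becomes a gradient step on the squared TD error.
    """
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])  # move Q(s, a) toward the target
```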

Typically, these algorithms are off-policy, meaning they are not required to optimize the same policy that was used to generate the data. As a result, they can learn from previous experience by storing sampled transitions in a replay buffer. The ability to reuse past samples makes value-based methods more sample-efficient than other model-free algorithms.
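
A replay buffer itself is simple. The sketch below (the class name and capacity are hypothetical choices, not from this book) shows the two operations an off-policy learner needs: storing transitions and sampling random minibatches from them:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions for off-policy learning."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest samples drop off automatically

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions,
        # one of the tricks used to stabilize training.
        return random.sample(self.buffer, batch_size)
```

Because the learner only needs valid transitions, not fresh ones, every stored tuple can be reused for many updates, which is where the sample-efficiency advantage comes from.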
