Algorithm diversity

Why are there so many types of RL algorithms? The reason is that no single algorithm is better than all the others in every context. Each one is designed for different needs and addresses different aspects of the problem. The most notable differences are stability, sample efficiency, and wall-clock time (training time). These differences will become clearer as we progress through the book, but as a rule of thumb, policy gradient algorithms are more stable and reliable than value function algorithms. On the other hand, value function methods are more sample efficient because they are off-policy and can reuse prior experience. In turn, model-based algorithms are more sample efficient than Q-learning algorithms, but their computational cost is much higher and they are slower.
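The sample-efficiency gap between off-policy and on-policy methods comes down to how long collected experience stays useful. The following is a minimal sketch (all names and function bodies are hypothetical placeholders, not code from this book): an off-policy learner stores transitions in a replay buffer and can revisit them many times, while an on-policy policy gradient learner must discard each trajectory after a single update.

```python
import random
from collections import deque

# Off-policy: transitions live in a long-lived replay buffer.
replay_buffer = deque(maxlen=100_000)

def off_policy_update(batch_size=32):
    """Sample old and new transitions alike -- data is reused many times."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    # ... compute TD targets and update the value function from `batch` ...

def on_policy_update(trajectory):
    """Use the freshly collected trajectory once, then discard it."""
    # ... compute a policy-gradient estimate from `trajectory` ...
    trajectory.clear()  # stale data from an older policy would bias the gradient
```

Because the off-policy learner can keep drawing on old transitions, it extracts more learning from each environment interaction, which is exactly the sample-efficiency advantage described above.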

Besides the trade-offs just presented, there are others that have to be taken into account when designing and deploying an algorithm, such as ease of use and robustness, which makes choosing an algorithm far from a trivial process.
