Algorithm diversity

Why are there so many types of RL algorithms? The reason is that no single algorithm is better than all the others in every context. Each one is designed for different needs and addresses different aspects of the problem. The most notable differences are stability, sample efficiency, and wall-clock time (training time). These differences will become clearer as we progress through the book, but as a rule of thumb, policy gradient algorithms are more stable and reliable than value function algorithms. On the other hand, value function methods are more sample efficient because they are off-policy and can reuse prior experience. In turn, model-based algorithms are more sample efficient than Q-learning algorithms, but their computational cost is much higher and they are slower.
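The sample-efficiency gap between off-policy and on-policy methods comes down to how long collected experience stays useful. The following is a minimal sketch (all names and function bodies are hypothetical placeholders, not code from this book): an off-policy learner stores transitions in a replay buffer and can revisit them many times, while an on-policy policy gradient learner must discard each trajectory after a single update.

```python
import random
from collections import deque

# Off-policy: transitions live in a long-lived replay buffer.
replay_buffer = deque(maxlen=100_000)

def off_policy_update(batch_size=32):
    """Sample old and new transitions alike -- data is reused many times."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    # ... compute TD targets and update the value function from `batch` ...

def on_policy_update(trajectory):
    """Use the freshly collected trajectory once, then discard it."""
    # ... compute a policy-gradient estimate from `trajectory` ...
    trajectory.clear()  # stale data from an older policy would bias the gradient
```

Because the off-policy learner can keep drawing on old transitions, it extracts more learning from each environment interaction, which is exactly the sample-efficiency advantage described above.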

Besides the trade-offs just presented, there are others that have to be taken into account when designing and deploying an algorithm, such as ease of use and robustness, which makes choosing an algorithm far from a trivial process.
