Best practices of deep RL

Throughout this book, we have covered plenty of reinforcement learning algorithms. Some are incremental improvements on earlier methods (for example, TD3 and A2C), while others are fundamentally different (such as TRPO and DPG) and propose alternative ways to reach the same objective. Moreover, we addressed non-RL optimization approaches, such as imitation learning and evolution strategies, for solving sequential decision-making tasks. All of these alternatives may have left you confused about which algorithm is best suited to a particular problem. If that is the case, don't worry: we'll now go through some rules that you can use to decide which algorithm is best for a given task.

Also, if you implemented some of the algorithms covered in this book, you might have found it hard to put all the pieces together so that the algorithm works properly. Deep RL algorithms are notoriously difficult to debug and train, and training times are long, which makes the whole development process slow and arduous. Luckily, there are a few strategies you can adopt that will prevent some terrible headaches while developing deep RL algorithms. But before looking at what these strategies are, let's deal with choosing the appropriate algorithm.
