ES versus RL

Evolution strategies (ES) are an interesting alternative to RL. Nonetheless, the pros and cons of each must be weighed before picking an approach for a given problem. Let's briefly look at the main advantages of ES:

  • Derivative-free methods: There's no need for backpropagation. Only the forward pass is performed to estimate the fitness function (or equivalently, the cumulative reward). This opens the door to non-differentiable functions, for example, hard attention mechanisms. Moreover, avoiding backpropagation removes the cost of the backward pass, which makes each evaluation cheaper and faster.
  • Very general: The generality of ES is mainly due to its being a black-box optimization method. Because we don't care about the agent, the actions that it performs, or the states visited, we can abstract these away and concentrate only on its evaluation. Furthermore, ES allows learning without explicit targets and with extremely sparse feedback. Additionally, ES is more general in the sense that it can optimize a much larger set of functions, since differentiability is not required.
  • Highly parallelizable and robust: As we'll soon see, ES is much easier to parallelize than RL, and the computations can be spread across thousands of workers. The robustness of ES is due to the few hyperparameters that are required to make the algorithms work. For example, in comparison to RL, there's no need to specify the length of the trajectories, the lambda value, the discount factor, the number of frames to skip, and so on. ES is also very attractive for tasks with a very long horizon.
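To make the derivative-free, black-box idea concrete, here is a minimal sketch of a simple elitist (1 + λ) evolution strategy in plain Python. It is not the algorithm developed in this chapter, only an illustration under stated assumptions: the `fitness` function, the `TARGET` values, and every hyperparameter (`generations`, `population`, `sigma`, `seed`) are hypothetical choices made for the example. The objective uses `round()`, so it is piecewise constant and gives backpropagation nothing to work with, yet the search still makes progress using forward evaluations alone.

```python
import random

# Hypothetical optimum, chosen only for this illustration.
TARGET = [3, -2, 5]


def fitness(params):
    """Black-box score computed with a forward evaluation only.

    round() makes the function piecewise constant, so its derivative is
    zero almost everywhere and undefined at the jumps.
    """
    return -sum(abs(round(p) - t) for p, t in zip(params, TARGET))


def evolve(initial, generations=200, population=20, sigma=0.5, seed=0):
    """Simple elitist (1 + lambda) evolution strategy (illustrative sketch)."""
    rng = random.Random(seed)
    parent = list(initial)
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Sample candidates by adding Gaussian noise to the parent.
        children = [
            [p + rng.gauss(0.0, sigma) for p in parent]
            for _ in range(population)
        ]
        # Each evaluation is independent of the others, so this step
        # could be spread across many workers.
        scores = [fitness(child) for child in children]
        best_idx = max(range(population), key=scores.__getitem__)
        # Keep the best child only if it is at least as good as the parent.
        if scores[best_idx] >= parent_fit:
            parent, parent_fit = children[best_idx], scores[best_idx]
    return parent, parent_fit


best, best_fit = evolve([0.0, 0.0, 0.0])
print("best parameters:", best)
print("best fitness:", best_fit)
```

Note that the loop only ever calls `fitness` and compares scalar scores. It never inspects states, actions, or gradients, which is what makes the method a black-box optimizer, and the only settings to tune here are the population size, the noise scale, and the number of generations.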

On the other hand, reinforcement learning is preferred for the following key aspects:

  • Sample efficiency: RL algorithms make better use of the information that's acquired from the environment and as a consequence, they require less data and fewer steps to learn the tasks.
  • Excellent performance: Overall, reinforcement learning algorithms outperform evolution strategies.