Summary

In this chapter, you learned about EAs, a class of black-box algorithms inspired by biological evolution that can be applied to RL tasks. EAs approach these problems from a different perspective than reinforcement learning. You saw that many of the constraints we have to deal with when designing RL algorithms do not apply to evolutionary methods. The differences lie both in the intrinsic optimization method and in the underlying assumptions. For example, because EAs are black-box algorithms, we can optimize whatever function we want, as we are no longer constrained to differentiable functions, as we were with RL. EAs have many other advantages, as we saw throughout this chapter, but they also have notable downsides.
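To make the point about differentiability concrete, here is a minimal sketch (not taken from the chapter's code) of optimizing a non-differentiable fitness function with a simple mutate-and-keep loop; the fitness function and the names used here are hypothetical.

```python
import numpy as np

# Hypothetical fitness: counts how many parameters exceed a threshold.
# It is piecewise constant, so gradient-based methods get no signal from it.
def fitness(theta):
    return np.sum(theta > 0.5)

theta = np.zeros(10)
for _ in range(500):
    candidate = theta + 0.1 * np.random.randn(10)  # random mutation
    if fitness(candidate) >= fitness(theta):       # keep it if not worse
        theta = candidate

print(fitness(theta))  # climbs toward 10 without ever computing a gradient
```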

Next, we looked at two evolutionary algorithms: genetic algorithms and evolution strategies. Genetic algorithms are more complex, as they create offspring from two parents through crossover and mutation. Evolution strategies select the best-performing individuals from a population that is created only by mutating the previous generation. The simplicity of ES is one of the key elements that enables the algorithm to scale to thousands of parallel workers. This scalability was demonstrated in the OpenAI paper, which showed that ES can perform at the level of RL algorithms in complex environments.
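As a refresher, the following is a minimal sketch of one ES generation under the selection-plus-mutation scheme described above. It assumes the population is a NumPy array of parameter vectors; the function and argument names (`evolution_strategy_step`, `n_parents`, and so on) are hypothetical and not the chapter's implementation.

```python
import numpy as np

def evolution_strategy_step(population, fitness_fn, n_parents=10, sigma=0.1):
    """One ES generation: evaluate, select the best individuals,
    and build the next population by mutation only (no crossover)."""
    scores = np.array([fitness_fn(ind) for ind in population])
    parents = population[np.argsort(scores)[-n_parents:]]        # keep the best
    # Each child is a mutated copy of a randomly chosen parent
    children = parents[np.random.randint(n_parents, size=len(population))]
    return children + sigma * np.random.randn(*children.shape)
```

A genetic algorithm would differ mainly in how children are produced: each child would combine the genes of two parents (crossover) before mutation, instead of being a mutated copy of a single parent.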

To get hands-on with evolutionary algorithms, we implemented the scalable evolution strategy from the paper cited throughout this chapter. We then tested it on LunarLander and saw that ES is able to solve the environment with high performance. Though the results are good, ES used two to three times more steps than AC and REINFORCE to learn the task. This is the main drawback of ES: it needs a lot of experience. Despite this, thanks to its capacity to scale linearly with the number of workers, with enough computational power, you might be able to solve this task in a fraction of the time taken by reinforcement learning algorithms.
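For reference, the core parameter update behind this kind of scalable ES can be summarized in a few lines. This is a minimal single-process sketch, assuming `reward_fn` runs one episode with the perturbed parameters and returns its total reward; in a distributed setup, each worker would evaluate its own perturbations, and only the resulting rewards (plus shared noise seeds) would need to be communicated.

```python
import numpy as np

def es_update(theta, reward_fn, pop_size=50, sigma=0.05, lr=0.01):
    """One ES update: perturb the parameters with Gaussian noise, evaluate
    each perturbation, and move theta along the reward-weighted noise."""
    noise = np.random.randn(pop_size, theta.size)
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    # Normalize rewards so their scale doesn't dominate the step size
    # (the paper uses a rank-based transformation for the same purpose)
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (pop_size * sigma) * noise.T @ advantages
```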

In the next chapter, we'll go back to reinforcement learning and talk about a problem known as the exploration-exploitation dilemma. We'll see what it is and why it's crucial in online settings. Then, we'll use a potential solution to the problem to develop a meta-algorithm called ESBAS, which chooses the most appropriate algorithm for each situation. 
