Asynchronous methods

We have seen a lot of interesting methods in this chapter, but they all share one constraint: they are very slow to train. This isn't much of a problem for basic control problems, such as the cart-pole task, but for learning Atari games, or the even more complex tasks we might want to tackle in the future, days to weeks of training time are far too long.

A big part of the time constraint, for both policy gradients and actor-critic, is that when learning online, we can only ever evaluate one policy at a time. We can get some speed improvements by using more powerful GPUs and faster processors, but the speed of evaluating the policy online will always act as a hard limit on how fast we can train.

This is the problem that asynchronous methods aim to solve. The idea is to train multiple copies of the same neural network across multiple threads. Each copy trains online against a separate instance of the environment running on its own thread. Instead of applying updates to each network at every training step, the updates are accumulated across multiple training steps. Every x training steps, the accumulated batches of updates from each thread are summed together and applied to all the networks. This means the network weights are updated with the average change in parameter values across all the network updates.
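As a minimal sketch of this scheme, the following Python snippet runs several worker threads that each accumulate updates locally and push them into a shared parameter vector every few steps. The names (shared_params, fake_gradient, UPDATE_EVERY, and so on) are illustrative, and fake_gradient is only a placeholder for the real policy-gradient or actor-critic update a worker would compute from its own environment instance:

```python
# Sketch of asynchronous training: workers accumulate updates locally
# and apply them to shared weights every UPDATE_EVERY steps.
import threading
import numpy as np

PARAM_SIZE = 8
UPDATE_EVERY = 5          # "x" training steps between shared updates
STEPS_PER_WORKER = 100
LEARNING_RATE = 0.01
NUM_WORKERS = 4

shared_params = np.zeros(PARAM_SIZE)   # weights shared by all workers
lock = threading.Lock()                # guards the shared weights

def fake_gradient(params, rng):
    """Placeholder for the gradient a worker would compute from its
    own environment instance; here it is just random noise."""
    return rng.standard_normal(params.shape)

def worker(worker_id):
    rng = np.random.default_rng(worker_id)
    accumulated = np.zeros(PARAM_SIZE)
    for step in range(1, STEPS_PER_WORKER + 1):
        # Read a snapshot of the shared weights and accumulate the
        # local update instead of applying it immediately.
        with lock:
            local_params = shared_params.copy()
        accumulated += fake_gradient(local_params, rng)
        if step % UPDATE_EVERY == 0:
            # Push the accumulated update into the shared weights,
            # then reset the local buffer.
            with lock:
                shared_params[:] -= LEARNING_RATE * accumulated
            accumulated[:] = 0.0

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final shared parameters:", shared_params)
```

The lock simply keeps this toy example tidy; practical implementations often apply updates to the shared weights without locking, accepting the occasional overwritten update in exchange for speed.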

This approach has been shown to work for policy gradients, actor-critic, and Q-learning. It gives a big improvement in training time and even improves final performance. The best version of asynchronous methods was found to be asynchronous advantage actor-critic (A3C), which, at the time of writing, is said to be the most successful generalized game learning algorithm.
