Quick recap on reinforcement learning

We first encountered reinforcement learning in Chapter 1, Machine Learning – An Introduction, when we looked at the three types of learning: supervised, unsupervised, and reinforcement. In reinforcement learning, an agent receives rewards within an environment. For example, the agent might be a mouse in a maze, and the reward might be some food somewhere in that maze. Reinforcement learning can sometimes feel a bit like a supervised recurrent network problem, in which a network is given a series of data and must learn a response.

The key distinction that makes a task a reinforcement learning problem is that the agent's responses change the data it receives at future time steps. If the mouse turns left instead of right at a T-junction in the maze, it changes what its next state will be. In contrast, supervised recurrent networks simply predict a series: their predictions do not influence the future values in that series.
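To make the distinction concrete, here is a minimal Python sketch. `TMaze`, `rollout`, `series`, and `next_value` are hypothetical names chosen for illustration: a one-step T-maze in which the agent's action decides the next state it observes, contrasted with a fixed series whose next value ignores whatever the model predicts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TMaze:
    """Hypothetical toy maze: the agent starts at the junction of a T.
    Food (reward 1.0) sits at the end of the left arm; the right arm is empty."""
    state: str = "junction"

    def step(self, action: str) -> tuple[str, float]:
        # The agent's action determines which state it observes next.
        if self.state != "junction":
            raise ValueError("episode already finished")
        if action == "left":
            self.state = "left_arm"
            return self.state, 1.0
        if action == "right":
            self.state = "right_arm"
            return self.state, 0.0
        raise ValueError(f"unknown action: {action}")


def rollout(action: str) -> tuple[str, float]:
    # Fresh environment for each episode, then a single step.
    return TMaze().step(action)


# Supervised sequence prediction: the series is fixed in advance.
series = [1.0, 2.0, 3.0, 4.0]


def next_value(t: int, prediction: float) -> float:
    # The prediction is deliberately ignored: it cannot change the data.
    return series[t + 1]


print(rollout("left"))   # ('left_arm', 1.0)
print(rollout("right"))  # ('right_arm', 0.0)
print(next_value(0, prediction=99.0), next_value(0, prediction=-5.0))  # 2.0 2.0
```

Choosing "left" or "right" produces different next states and rewards, whereas the supervised series returns the same next value no matter what was predicted.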

The AlphaGo network has already been through supervised training, but the problem can now be reframed as a reinforcement learning task to improve the agent further. For AlphaGo, a new network was created with the same structure as the supervised network and initialized with its weights. Its training then continued using reinforcement learning, specifically an approach called policy gradients.
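The idea of copying the supervised weights and then continuing with policy gradients can be sketched on a toy problem. This is not AlphaGo's training code: it is a minimal REINFORCE sketch (a basic policy-gradient method) on a hypothetical one-step T-maze, where a two-entry list of logits stands in for the policy network. `supervised_logits`, `REWARDS`, and `reinforce` are hypothetical names, and the learning rate and episode count are arbitrary assumptions.

```python
import math
import random
from typing import List

ACTIONS = ["left", "right"]
REWARDS = {"left": 1.0, "right": 0.0}  # hypothetical: the food is in the left arm


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(logits: List[float], episodes: int = 500,
              lr: float = 0.1, seed: int = 0) -> List[float]:
    """REINFORCE on a one-step task with a softmax policy.

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    rng = random.Random(seed)
    theta = list(logits)  # copy: the "supervised" weights are left untouched
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # sample an action
        reward = REWARDS[ACTIONS[a]]
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log_pi  # gradient ascent on expected reward
    return theta


# Hypothetical starting point: the supervised copy slightly prefers "right".
supervised_logits = [0.0, 0.5]
rl_logits = reinforce(supervised_logits)
print("supervised policy:", [round(p, 3) for p in softmax(supervised_logits)])
print("after REINFORCE:  ", [round(p, 3) for p in softmax(rl_logits)])
```

Because rewarded actions have their log-probability pushed up, the probability of turning left rises over training, even though the starting weights slightly favored the wrong arm.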
