Reinforcement learning

RL currently leads the pack in advances compared to other machine learning methodologies. Note the use of the word methodology rather than technology. RL is a methodology, or algorithm, that applies a principle we can use with neural networks, whereas neural networks are a machine learning technology that can be applied across several methodologies. Previously, we looked at other methodologies that blended with DL, but we focused more on the actual implementation. RL, however, introduces a new methodology that requires us to understand more of its inner and outer workings before we learn how to apply it.

RL was popularized by Richard Sutton, a Canadian computer scientist and current professor at the University of Alberta. Sutton has also assisted in the development of RL at Google's DeepMind and is often regarded as the father of RL.

At the heart of any machine learning system is the need for training. Often, the AI agent/brain starts out knowing nothing, and we feed it data through some automated process so that it can learn. As we have seen, the most common way of doing this is called supervised training, where we first label our training data. We have also looked at unsupervised training, where our Generative Adversarial Networks (GANs) were trained by competing against each other. However, neither system replicates the type of learning we see in biology, often referred to as reward-based learning or RL: the type of learning that lets you teach your dog to bark for a treat, fetch the paper, and use the outdoors for nature's calling; a type of learning that lets an agent explore its environment and learn for itself. This is not unlike the type of learning a general AI would be expected to use; after all, RL is likely similar to the system we ourselves use, or so we believe.

David Silver, a former student of Professor Sutton's who now leads reinforcement learning research at DeepMind, has an excellent video series on the theoretical background of RL. The first five videos are quite interesting and recommended viewing, but the later content gets quite deep and may not be for everyone. Here's the link to the videos: https://www.youtube.com/watch?v=2pWv7GOvuf0

RL defines its own form of training, referred to by the same name. This form of reward-based training is shown in the following diagram:



Reinforcement learning 

The diagram shows an agent in an environment. The agent reads the state of the environment and then decides on and performs an action. That action may or may not yield a reward, and the reward may be good or bad. After each action and possible reward, the agent collects the state of the environment again. The process repeats until the agent reaches a terminal or end state; that is, until it reaches the goal, perhaps dies, or just gets tired. It is important to note a couple of subtle things about the preceding diagram. First, the agent doesn't always receive a reward, meaning rewards can be delayed until some future goal is reached. This is quite different from the other forms of learning we explored earlier, which provided immediate feedback to our training networks. Second, rewards can be good or bad, and negatively training agents this way is often just as effective, although it is less so for humans.
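To make this loop concrete, here is a minimal sketch in Python of the interaction just described. The environment is a hypothetical one-dimensional corridor invented purely for illustration (it is not from this chapter): the agent starts at position 0, steps left or right at random, and only receives a reward when it reaches the goal, which shows how a reward can be delayed until some future goal is reached:

    import random

    # Hypothetical corridor environment for illustration only.
    # The agent starts at position 0 and the goal sits at position 5.
    GOAL = 5

    def step(state, action):
        """Apply an action (+1 or -1) and return (next_state, reward, done)."""
        next_state = max(0, state + action)   # the agent cannot walk past position 0
        if next_state == GOAL:
            return next_state, 1.0, True      # the only reward arrives at the goal
        return next_state, 0.0, False         # intermediate steps give no reward

    state, done, total_reward = 0, False, 0.0
    while not done:                           # repeat until a terminal state is reached
        action = random.choice([-1, 1])       # the agent decides on an action
        state, reward, done = step(state, action)  # the environment returns state and reward
        total_reward += reward                # the agent collects any (possibly delayed) reward

    print("Episode finished with total reward:", total_reward)

Run repeatedly, this random agent eventually stumbles onto the goal and collects its single delayed reward; the job of an RL algorithm is to use such episodes to learn a policy that does better than random guessing.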

Now, as you might expect with any powerful learning model, the mathematics can be quite complex and certainly daunting to the newcomer. We won't go too far into the theoretical details other than to describe some of the foundations of RL in the next section.
