A broad perspective on model-based learning

Let's first remember what a model is. A model consists of the transition dynamics and rewards of an environment. Transition dynamics are a mapping from a state, s, and an action, a, to the next state, s'.
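To make this concrete, here is a minimal sketch of a tabular model for a tiny, hypothetical environment. The states (`"s0"`, `"s1"`, `"s2"`), actions (`"left"`, `"right"`), and the particular dynamics and rewards below are invented for illustration; the point is only that a model maps a state-action pair to a next state and a reward:

```python
# Hypothetical deterministic transition dynamics: (state, action) -> next state
transitions = {
    ("s0", "left"): "s0",
    ("s0", "right"): "s1",
    ("s1", "left"): "s0",
    ("s1", "right"): "s2",
}

# Hypothetical rewards: (state, action) -> immediate reward
rewards = {
    ("s0", "left"): 0.0,
    ("s0", "right"): 0.0,
    ("s1", "left"): 0.0,
    ("s1", "right"): 1.0,
}

def model(state, action):
    """Return the (next_state, reward) pair predicted by the model."""
    return transitions[(state, action)], rewards[(state, action)]
```

With this in hand, `model("s1", "right")` predicts the outcome of taking `"right"` in `"s1"` without ever querying the real environment.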

With this information, the model fully represents the environment and can be used in its place. And if an agent has access to the model, then the agent has the ability to predict its own future.

In the following sections, we'll see that a model can be either known or unknown. In the former case, the model is used as it is to exploit the dynamics of the environment; that is, the model provides a representation that is used in place of the environment. In the latter case, where the model of the environment is unknown, it can be learned by direct interaction with the environment. But since, in most cases, only an approximation of the environment is learned, additional factors have to be taken into account when using it.
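For the unknown-model case, one simple way to learn an approximate model from direct interaction is to count observed transitions and average observed rewards. The sketch below assumes a small hypothetical log of `(state, action, reward, next_state)` tuples; all names here are illustrative, not from the text:

```python
from collections import defaultdict

# Hypothetical logged interactions: (state, action, reward, next_state)
experience = [
    ("s0", "a", 0.0, "s1"),
    ("s0", "a", 0.0, "s1"),
    ("s0", "a", 0.0, "s2"),
    ("s1", "a", 1.0, "s0"),
]

counts = defaultdict(lambda: defaultdict(int))  # transition counts
reward_sums = defaultdict(float)                # accumulated rewards
visits = defaultdict(int)                       # (state, action) visit counts

for s, a, r, s_next in experience:
    counts[(s, a)][s_next] += 1
    reward_sums[(s, a)] += r
    visits[(s, a)] += 1

def learned_model(state, action):
    """Estimated next-state probabilities and expected reward."""
    n = visits[(state, action)]
    probs = {s_next: c / n for s_next, c in counts[(state, action)].items()}
    return probs, reward_sums[(state, action)] / n
```

Because this is only an approximation built from limited data, the estimated probabilities can be wrong for rarely visited state-action pairs, which is exactly the extra factor that has to be taken into account when planning with a learned model.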

Now that we have explained what a model is, we can see how we can use one and how it can help us reduce the number of interactions with the environment. The way in which a model is used depends on two very important factors: the model itself and the way in which actions are chosen.

Indeed, as we just noted, the model can be known or unknown, and actions can be planned or chosen by a learned policy. The algorithms vary considerably in each case, so let's first elaborate on the approaches used when the model is known (meaning that we already have the transition dynamics and rewards of the environment).
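When the model is known, actions can be planned directly against it. As one illustrative approach (a sketch, not the only planning method), the value-iteration loop below plans over a tiny hypothetical MDP whose dynamics and rewards are fully known, then extracts the greedy policy from the computed values:

```python
# Value iteration over a small hypothetical MDP with known dynamics.
gamma = 0.9  # discount factor (assumed for this example)
states = ["s0", "s1", "terminal"]
actions = ["stay", "go"]

# Known deterministic model: (state, action) -> (next_state, reward)
dynamics = {
    ("s0", "stay"): ("s0", 0.0),
    ("s0", "go"): ("s1", 0.0),
    ("s1", "stay"): ("s1", 0.0),
    ("s1", "go"): ("terminal", 1.0),
    ("terminal", "stay"): ("terminal", 0.0),
    ("terminal", "go"): ("terminal", 0.0),
}

# Repeatedly back up values through the known model.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {
        s: max(dynamics[(s, a)][1] + gamma * V[dynamics[(s, a)][0]]
               for a in actions)
        for s in states
    }

# The greedy policy with respect to V is the plan derived from the model.
policy = {
    s: max(actions,
           key=lambda a: dynamics[(s, a)][1] + gamma * V[dynamics[(s, a)][0]])
    for s in states
}
```

Note that no interaction with a real environment happens anywhere in this loop: every backup queries only the model, which is precisely what makes the known-model setting so sample efficient.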
