Model-based RL

Having a model of the environment means that the state transitions and the rewards can be predicted for each state-action pair, without any interaction with the real environment. As we already mentioned, the model is known only in a limited number of cases, but when it is available, it can be used in many different ways. The most obvious application of a model is planning: organizing a sequence of future moves when the consequences of each action are already known. For example, if you know exactly what moves your opponent will make, you can think ahead and plan all your actions before executing the first one. The downside is that planning is not a trivial process and can be very expensive.
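To make the idea concrete, the following is a minimal sketch of planning with a known model: a depth-limited lookahead that queries a tabular model and returns the first action of the best sequence found. The dictionary format mapping (state, action) to a predicted (next_state, reward), and the names model and plan, are illustrative assumptions made for this example, not something taken from the text.

```python
# Minimal sketch: depth-limited lookahead over a known, deterministic
# tabular model. `model` maps (state, action) -> (next_state, reward).

def plan(model, state, actions, depth, gamma=0.99):
    """Return (best discounted return, best first action) up to `depth` steps."""
    if depth == 0:
        return 0.0, None
    best_return, best_action = float("-inf"), None
    for action in actions:
        # Predicted outcome, no interaction with the real environment.
        next_state, reward = model[(state, action)]
        future_return, _ = plan(model, next_state, actions, depth - 1, gamma)
        total = reward + gamma * future_return
        if total > best_return:
            best_return, best_action = total, action
    return best_return, best_action

# Example: a tiny two-state model.
model = {
    ("s0", "left"):  ("s0", 0.0),
    ("s0", "right"): ("s1", 1.0),
    ("s1", "left"):  ("s0", 0.0),
    ("s1", "right"): ("s1", 1.0),
}
_, action = plan(model, "s0", ["left", "right"], depth=3)
print(action)  # "right"
```

Even this toy version shows why planning can be expensive: the exhaustive lookahead grows exponentially with the planning depth and the number of actions.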

A model can also be learned through interactions with the environment, by recording the consequences (both in terms of states and rewards) of each action. This solution is not always the best one, because learning a model can be terribly expensive in the real world. Moreover, if the model captures only a rough approximation of the environment, it can lead to disastrous results.
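As an illustration of how such a model might be learned, here is a hedged sketch that estimates a tabular model from logged transitions. The (state, action, reward, next_state) tuple format and the estimate_model name are assumptions made for this example.

```python
# Sketch: estimate a tabular model from transitions collected while
# interacting with the real environment.
from collections import defaultdict

def estimate_model(transitions):
    """transitions: list of (state, action, reward, next_state) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for state, action, reward, next_state in transitions:
        key = (state, action)
        counts[key][next_state] += 1
        reward_sum[key] += reward
        visits[key] += 1
    model = {}
    for key, next_counts in counts.items():
        # Empirical transition probabilities and mean reward per (state, action).
        probs = {s: c / visits[key] for s, c in next_counts.items()}
        model[key] = (probs, reward_sum[key] / visits[key])
    return model

# Example usage with a few logged transitions.
logged = [("s0", "right", 1.0, "s1"), ("s0", "right", 1.0, "s1"),
          ("s0", "left", 0.0, "s0")]
print(estimate_model(logged)[("s0", "right")])  # ({'s1': 1.0}, 1.0)
```

A model built from few or unrepresentative interactions is exactly the rough approximation mentioned above: any planning or policy improvement based on it inherits its errors.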

A model, whether known or learned, can be used both to plan and to improve the policy, and can be integrated into different phases of an RL algorithm. Well-known cases of model-based RL involve pure planning, embedded planning to improve the policy, and generated samples from an approximate model.
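The last case, generating samples from an approximate model, can be sketched in a Dyna-Q style: the learned model produces simulated transitions that are used to update a Q-table without further real interaction. The function name planning_updates and the tabular setup are illustrative assumptions, not the book's own implementation.

```python
# Sketch: Q-learning updates on transitions simulated by an approximate
# deterministic model, in the spirit of Dyna-Q.
import random
from collections import defaultdict

def planning_updates(q, model, actions, n_updates=50, alpha=0.1, gamma=0.99):
    """Apply Q-learning updates using experience generated by the model."""
    visited = list(model.keys())  # (state, action) pairs the model knows about
    for _ in range(n_updates):
        state, action = random.choice(visited)
        next_state, reward = model[(state, action)]   # simulated experience
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q

# Example with the deterministic two-state model used in the planning sketch.
model = {("s0", "right"): ("s1", 1.0), ("s0", "left"): ("s0", 0.0),
         ("s1", "right"): ("s1", 1.0), ("s1", "left"): ("s0", 0.0)}
q = planning_updates(defaultdict(float), model, actions=("left", "right"))
```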

A set of algorithms that use a model to estimate a value function is called dynamic programming (DP) and will be studied later in this chapter. 
