Model-based methods

Model-free algorithms are a powerful class of algorithms that can learn very complex policies and accomplish objectives in large, complex environments. As demonstrated in recent work by OpenAI (https://openai.com/five/) and DeepMind (https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii), these algorithms can exhibit long-term planning, teamwork, and adaptation to unexpected situations in challenging games such as StarCraft and Dota 2.

Trained agents have been able to beat top professional players. However, the biggest downside is the huge number of games that must be played in order to train the agents to master these games. In fact, to achieve these results, the algorithms were scaled up massively so that the agents could play hundreds of years' worth of games against themselves. So, what's the problem with this approach?

Well, as long as you are training an agent in a simulator, you can gather as much experience as you want. The problem arises when you run the agent in an environment as slow and complex as the real world. In that case, you cannot wait hundreds of years before seeing some interesting capabilities. So, can we develop algorithms that use fewer interactions with the real environment? Yes. And, as you probably remember, we already tackled this question when dealing with model-free algorithms.

There, the solution was to use off-policy algorithms. However, the gains in sample efficiency were relatively marginal and not substantial enough for many real-world problems.

As you might expect, the answer (or at least one possible answer) lies in model-based reinforcement learning algorithms. You have already developed a model-based algorithm. Do you remember which one? In Chapter 3, Solving Problems with Dynamic Programming, we used a model of the environment in conjunction with dynamic programming to train an agent to navigate a map with pitfalls. Because DP uses a model of the environment, it is considered a model-based algorithm.
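As a quick refresher, the following is a minimal value iteration sketch in the spirit of that chapter. The grid layout, reward values, and helper names are illustrative assumptions, not the exact code from Chapter 3; the key point is that every update queries a known transition model directly instead of relying on sampled experience:

```python
import numpy as np

# Minimal value iteration sketch (illustrative, not the exact Chapter 3 code).
# Deterministic 4x4 grid: state 15 is the goal (+1 reward on arrival),
# states 5 and 11 are pitfalls (terminal, value 0).
n_states, n_actions, gamma = 16, 4, 0.99
goal, pits = 15, {5, 11}

def next_state(s, a):
    """Known transition model: 0=up, 1=right, 2=down, 3=left."""
    row, col = divmod(s, 4)
    if a == 0: row = max(row - 1, 0)
    if a == 1: col = min(col + 1, 3)
    if a == 2: row = min(row + 1, 3)
    if a == 3: col = max(col - 1, 0)
    return row * 4 + col

V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.zeros(n_states)
    for s in range(n_states):
        if s == goal or s in pits:
            continue  # terminal states keep a value of 0
        # Bellman optimality backup using the model, not sampled transitions
        V_new[s] = max(
            (1.0 if next_state(s, a) == goal else 0.0) + gamma * V[next_state(s, a)]
            for a in range(n_actions)
        )
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

# Greedy policy extracted from the value function
policy = [
    max(range(n_actions),
        key=lambda a: (1.0 if next_state(s, a) == goal else 0.0) + gamma * V[next_state(s, a)])
    for s in range(n_states)
]
print(V.reshape(4, 4))
```

Notice that the agent never interacts with the environment here: all the information it needs comes from the transition function itself, which is exactly what makes DP a model-based method.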

Unfortunately, DP isn't usable in problems of even moderate complexity, since it scales poorly with the number of states. So, we need to explore other types of model-based algorithms that can scale up and be useful in more challenging environments.
