Q-Learning and SARSA Applications

Dynamic programming (DP) algorithms are effective for solving reinforcement learning (RL) problems, but they rest on two strong assumptions: the model of the environment has to be known, and the state space has to be small enough not to suffer from the curse of dimensionality.

In this chapter, we'll develop a class of algorithms that gets rid of the first assumption and, in addition, is not affected by the curse of dimensionality that plagues DP algorithms. These algorithms learn directly from the environment and from experience, estimating the value function from many sampled returns rather than computing expectations of state values using a model, as DP algorithms do. In this new setting, we'll talk about experience as the means of learning value functions. We'll take a look at the problems that arise when learning a policy purely through interaction with the environment, and at the techniques that can be used to solve them.

After a brief introduction to this new approach, you'll learn about temporal difference (TD) learning, a powerful way to learn optimal policies from experience. TD learning borrows ideas from DP algorithms while using only information gained from interaction with the environment. Two TD learning algorithms are SARSA and Q-learning. Although they are very similar and both guarantee convergence in the tabular case, they have interesting differences that are worth understanding. Q-learning is a key algorithm, and many state-of-the-art RL algorithms use it in combination with other techniques, as we'll see in later chapters.
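As a preview of the one-line difference between the two algorithms, here is a minimal sketch of the one-step SARSA and Q-learning updates on a tabular action-value function. The table shape, learning rate alpha, and discount factor gamma shown here are illustrative assumptions, not the exact code we'll develop later in the chapter:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """One-step SARSA (on-policy): bootstrap from the action actually taken next."""
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One-step Q-learning (off-policy): bootstrap from the greedy next action."""
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])

# Q is assumed to be a NumPy table of shape (n_states, n_actions).
Q = np.zeros((500, 6))
```

Notice that the only difference lies in the bootstrap term: SARSA uses the action that the behavior policy actually selects next, while Q-learning uses the maximum over the next actions. This single line is what makes SARSA on-policy and Q-learning off-policy, a distinction we'll return to throughout the chapter.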

To gain a better grasp of TD learning and to understand how to move from theory to practice, you'll implement Q-learning and SARSA on a new game, Taxi-v2. Then, we'll elaborate on the differences between the two algorithms, both in terms of their performance and their use.
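As a quick peek at that environment, the following sketch assumes an older Gym release in which Taxi-v2 is still registered (more recent releases ship Taxi-v3 instead) and uses the classic 4-tuple step API:

```python
import gym

# Taxi-v2 is registered in older Gym releases; newer ones ship Taxi-v3.
env = gym.make('Taxi-v2')
print(env.observation_space.n)  # 500 discrete states
print(env.action_space.n)       # 6 discrete actions (move S/N/E/W, pickup, drop-off)

# A random rollout using the classic Gym API (step returns a 4-tuple).
obs = env.reset()
done, total_reward = False, 0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
print('Random-policy return:', total_reward)
```

The small, fully discrete state and action spaces are exactly what make Taxi a good testbed for tabular SARSA and Q-learning.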

The following topics will be covered in this chapter:

  • Learning without a model
  • TD learning
  • SARSA
  • Applying SARSA to Taxi-v2
  • Q-learning
  • Applying Q-learning to Taxi-v2