The value function

The value function represents the long-term quality of a state: the cumulative reward that the agent can expect to collect in the future, starting from that state. Whereas the reward measures immediate performance, the value function measures performance in the long run. This means that a high reward doesn't imply a high value, and a low reward doesn't imply a low value.

Moreover, the value function can be a function of the state alone or of the state-action pair. The former is called a state-value function, while the latter is called an action-value function:
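The original formulas are not reproduced here; a standard formulation, assuming a discount factor $\gamma \in [0, 1]$ and expectations taken over trajectories generated by a policy $\pi$, is:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s\right]$$

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s,\, a_{0} = a\right]$$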

(Diagram: the final state values on the left-hand side and the corresponding optimal policy on the right-hand side.)

Using the same gridworld example used to illustrate the concept of a policy, we can show the state-value function. First of all, we can assume a reward of 0 in every state except the one with the star, which yields a reward of +1 when the agent reaches it. Moreover, let's assume that a strong wind moves the agent in another direction with a probability of 0.33. In this case, the state values will be similar to those shown on the left-hand side of the preceding diagram. An optimal policy then chooses, in each state, the action that brings the agent to the next state with the highest state value, as shown on the right-hand side of the preceding diagram.
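To make this concrete, here is a minimal value-iteration sketch for a gridworld of this kind. The grid size, goal position, and discount factor are illustrative assumptions rather than values from the original example, and the wind is simplified to pick a direction uniformly at random (possibly the intended one):

```python
import numpy as np

ROWS, COLS = 3, 4
GOAL = (0, 3)           # the state with the star (assumed position)
GAMMA = 0.9             # discount factor (assumed)
WIND_PROB = 0.33        # probability that the wind overrides the chosen action
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    return (r, c)

def action_value(V, state, action):
    """Expected return of an action: the chosen move succeeds with
    probability 1 - WIND_PROB; otherwise the wind picks a direction
    uniformly at random (a simplification: it may match the intended one)."""
    outcomes = [(action, 1 - WIND_PROB)]
    outcomes += [(a, WIND_PROB / len(ACTIONS)) for a in ACTIONS]
    q = 0.0
    for a, p in outcomes:
        nxt = step(state, a)
        reward = 1.0 if nxt == GOAL else 0.0   # +1 only on reaching the star
        q += p * (reward + GAMMA * V[nxt])
    return q

V = np.zeros((ROWS, COLS))
for _ in range(100):                 # value-iteration sweeps
    for r in range(ROWS):
        for c in range(COLS):
            if (r, c) == GOAL:
                continue             # terminal state: value stays 0
            V[r, c] = max(action_value(V, (r, c), a) for a in ACTIONS)

print(np.round(V, 2))  # the optimal policy acts greedily with respect to V
```

The printed values grow as the agent gets closer to the star, which is exactly the structure an optimal policy exploits by always moving toward the neighboring state with the highest value.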

Action-value methods (or value-function methods) are the other big family of RL algorithms. These methods learn an action-value function and use it to choose which actions to take. Starting from Chapter 3, Solving Problems with Dynamic Programming, you'll learn more about these algorithms. It's worth noting that some policy-gradient methods can also use a value function to learn the appropriate policy, in order to combine the advantages of both approaches. These are called actor-critic methods. Policy-gradient, value-based, and actor-critic methods thus make up the three main families of RL algorithms.
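As an illustration of the value-based idea, the following sketch keeps Q(s, a) in a table, selects actions epsilon-greedily from it, and refines the estimates with a one-step Q-learning update (one common choice of update rule). The state and action counts, epsilon, learning rate, and discount factor are all illustrative assumptions:

```python
import numpy as np

# A minimal sketch of a value-based method: Q(s, a) is stored in a table,
# actions are chosen (epsilon-)greedily from it, and the estimates are
# refined with a one-step Q-learning update. Sizes and hyperparameters
# below are illustrative assumptions.

n_states, n_actions = 12, 4
Q = np.zeros((n_states, n_actions))

def select_action(state, epsilon=0.1):
    """Explore with probability epsilon; otherwise act greedily on Q."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```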
