Value functions

The return summarizes the value of a whole trajectory, but it says nothing about the quality of the individual states visited. Such a quality measure is important because the policy can use it to choose the next action: it simply picks the action that leads to the next state with the highest quality. The value function provides exactly this measure. It estimates the quality of a state as the expected return obtained by starting from that state and following a policy $\pi$. Formally, the value function is defined as follows:

$$V_\pi(s) = \mathbb{E}_\pi[G \mid s_0 = s]$$

Here, $G$ denotes the return and $s_0$ the initial state.
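Because the value function is an expectation over returns, one way to make it concrete is to average the returns of several episodes that start in the same state. The following is a minimal sketch under stated assumptions: the return is taken to be the discounted sum of rewards with a discount factor `gamma`, and the reward sequences, `discounted_return`, and `estimate_value` are hypothetical names and data chosen only for illustration.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of a reward sequence (assumed definition of the return)."""
    g = 0.0
    # Accumulate backwards so that each step computes g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_value(reward_sequences, gamma=0.99):
    """Average the returns of episodes that all start in the same state."""
    returns = [discounted_return(rs, gamma) for rs in reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical rewards collected in three episodes starting from one state
# while following the same policy.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0], [1.0, 1.0]]
v_estimate = estimate_value(episodes, gamma=0.5)
print(v_estimate)
```

With more episodes, this average approaches the expected return from that state under the policy that generated them.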

The action-value function, like the value function, is the expected return from a state, but it is also conditioned on the first action taken. It is defined as follows:

$$Q_\pi(s, a) = \mathbb{E}_\pi[G \mid s_0 = s, a_0 = a]$$

The value function and action-value function are also called the V-function and Q-function respectively. They are closely related to each other, since the value function can also be defined in terms of the action-value function:

$$V_\pi(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}[Q_\pi(s, a)]$$
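Concretely, for a discrete action space the value of a state is the average of its action values, weighted by the probability that the policy assigns to each action. The snippet below is a small sketch of that computation for a single state; the action names, the numbers, and the helper `state_value` are assumptions made for illustration.

```python
def state_value(q_values, action_probs):
    """Policy-weighted average of the action values of one state."""
    return sum(action_probs[a] * q_values[a] for a in q_values)


# Hypothetical action values Q(s, a) and policy probabilities pi(a | s)
# for one state with two actions.
q_s = {"left": 1.0, "right": 3.0}
pi_s = {"left": 0.25, "right": 0.75}

v_s = state_value(q_s, pi_s)  # 0.25 * 1.0 + 0.75 * 3.0
print(v_s)
```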

Knowing the optimal action-value function $Q^*$, the optimal value function is as follows:

$$V^*(s) = \max_a Q^*(s, a)$$

That's because the optimal action is $a^*(s) = \arg\max_a Q^*(s, a)$.
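For a single state with a handful of discrete actions, the optimal value and the optimal action can be read directly off the optimal action values. This is only an illustrative sketch: the table `q_star_s` and the helper `optimal_value_and_action` are hypothetical, and in practice the optimal Q-function is usually not known in advance.

```python
def optimal_value_and_action(q_values):
    """Return the largest action value and the action that attains it."""
    best_action = max(q_values, key=q_values.get)
    return q_values[best_action], best_action


# Hypothetical optimal action values for one state.
q_star_s = {"left": 1.0, "right": 3.0, "stay": 2.0}

v_star, a_star = optimal_value_and_action(q_star_s)
print(v_star, a_star)
```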
