Value functions

The return $G(\tau)$ summarizes the value of a whole trajectory, but it says nothing about the quality of the individual states visited. That per-state quality matters because the policy can use it to pick its next action: it simply chooses the action that leads to the next state with the highest quality. The value function provides exactly this measure: it estimates the quality of a state as the expected return obtained by starting from that state and following a policy $\pi$. Formally, the value function is defined as follows:

$$V_\pi(s) = \mathbb{E}_\pi\left[ G(\tau) \mid s_0 = s \right]$$
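As a rough illustration of "expected return from a state", the sketch below averages the returns of a few trajectories that all start in the same state. It assumes a discounted return with a discount factor `gamma`, which this section does not specify; the helper names and the reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return of one trajectory: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_value_estimate(reward_sequences, gamma):
    """Monte Carlo estimate of V(s): the mean return over sampled
    trajectories that all start in state s and follow the same policy."""
    returns = [discounted_return(rs, gamma) for rs in reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical reward sequences collected from the same start state.
trajectories = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
v_estimate = mc_value_estimate(trajectories, gamma=0.5)
```

With more sampled trajectories, the average approaches the expectation that defines the value function.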

The action-value function, like the value function, is the expected return from a state, but it is also conditioned on the first action taken. It is defined as follows:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[ G(\tau) \mid s_0 = s,\ a_0 = a \right]$$

The value function and the action-value function are also called the V-function and the Q-function, respectively. The two are closely related, since the value function can also be defined in terms of the action-value function:

$$V_\pi(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ Q_\pi(s, a) \right]$$
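For a discrete set of actions, this relationship reduces to a probability-weighted average of the action values in a state. The following is a minimal sketch under that assumption; the function name, the action labels, and the numbers are hypothetical.

```python
def value_from_q(policy_probs, q_values):
    """V(s) as the expectation of Q(s, a) over actions drawn from the policy.

    policy_probs: dict mapping each action to its probability in state s
    q_values:     dict mapping each action to Q(s, a) in the same state
    """
    return sum(policy_probs[a] * q_values[a] for a in policy_probs)


# Hypothetical numbers for a single state with two actions.
policy_probs = {"left": 0.25, "right": 0.75}
q_values = {"left": 2.0, "right": 4.0}
v = value_from_q(policy_probs, q_values)  # 0.25 * 2.0 + 0.75 * 4.0
```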

Knowing the optimal action-value function $Q^*$, the optimal value function is as follows:

$$V^*(s) = \max_a Q^*(s, a)$$

That's because the optimal action is $a^*(s) = \arg\max_a Q^*(s, a)$.
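In a tabular setting, both quantities can be read directly from the optimal action values of a state: the optimal action is the one with the largest value, and the optimal state value is that largest value. The sketch below assumes a small discrete action set; the names and numbers are hypothetical.

```python
def greedy_action(q_values):
    """Action with the highest action value in a given state."""
    return max(q_values, key=q_values.get)


def optimal_value(q_values):
    """Optimal state value: the largest action value in that state."""
    return max(q_values.values())


# Hypothetical optimal action values for one state.
q_star = {"left": 2.0, "right": 4.0, "stay": 3.0}
best_action = greedy_action(q_star)
best_value = optimal_value(q_star)
```

Here `best_value` equals `q_star[best_action]`, which is the link between the optimal value function and the optimal action.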
