Value iteration is the other dynamic programming algorithm for finding optimal values in an MDP. Unlike policy iteration, which alternates between policy evaluation and policy improvement in a loop, value iteration combines the two steps into a single update. In particular, it updates the value of a state by immediately selecting the best action:

V(s) ← max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V(s')]
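As a sketch, this single-state backup can be written as a small Python function. The transition model `P`, reward table `R`, and the toy two-state MDP below are illustrative assumptions, not code from the text:

```python
# Bellman optimality backup for one state: take the max over actions of the
# expected one-step return. P and R are hypothetical stand-ins for the MDP's
# transition and reward models.

def backup(state, V, P, R, gamma):
    # P[state][action] is a list of (next_state, probability) pairs;
    # R[state][action][next_state] is the immediate reward.
    return max(
        sum(prob * (R[state][a][s2] + gamma * V[s2])
            for s2, prob in P[state][a])
        for a in P[state]
    )

# A deterministic toy MDP as a usage example (all values assumed):
P = {"s0": {"stay": [("s0", 1.0)], "go": [("s1", 1.0)]}}
R = {"s0": {"stay": {"s0": 0.0}, "go": {"s1": 1.0}}}
V = {"s0": 0.0, "s1": 0.0}
print(backup("s0", V, P, R, gamma=0.9))  # → 1.0 (taking "go" is best)
```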
The code for value iteration is even simpler than the policy iteration code, as summarized in the following pseudocode:
Initialize V(s) = 0 for every state s
while V is not stable:
    > value iteration sweep
    for each state s:
        V(s) = max over a of Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]
> compute the optimal policy:
π(s) = argmax over a of Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')]
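A minimal runnable version of this loop is sketched below on a hypothetical two-state MDP. The transition table, rewards, discount factor, and convergence threshold are all assumptions made for illustration:

```python
# Value iteration on a small assumed MDP: sweep all states, applying the
# max-over-actions Bellman backup until values stop changing, then read
# off the greedy policy.

GAMMA = 0.9   # discount factor (assumed)
THETA = 1e-8  # convergence threshold (assumed)

# P[s][a] -> list of (next_state, probability, reward); a toy chain MDP
# where only the "right" action in s1 pays a reward.
P = {
    "s0": {"left": [("s0", 1.0, 0.0)], "right": [("s1", 1.0, 0.0)]},
    "s1": {"left": [("s0", 1.0, 0.0)], "right": [("s1", 1.0, 1.0)]},
}

def value_iteration(P, gamma=GAMMA, theta=THETA):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # values are stable
            break
    # Greedy policy: in each state, pick the action with the best backup.
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                       for s2, p, r in P[s][a]))
        for s in P
    }
    return V, policy

V, policy = value_iteration(P)
print(policy)  # both states should head toward s1's recurring reward
```

Note that this sketch updates V in place during the sweep (Gauss–Seidel style), which still converges and often does so faster than keeping a separate copy of the old values.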
The only differences are the new value-update rule and the absence of a separate policy improvement step. The resulting optimal policy is as follows: