Value iteration

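Value iteration is the other dynamic programming algorithm for finding optimal values in an MDP. Unlike policy iteration, which alternates policy evaluation and policy improvement in a loop, value iteration combines the two steps in a single update. In particular, it updates the value of a state by immediately selecting the best action:

$$V_{k+1}(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V_k(s')]$$

Here $p(s', r \mid s, a)$ is the probability of reaching state $s'$ with reward $r$ after taking action $a$ in state $s$, and $\gamma$ is the discount factor.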

The code for value iteration is even simpler than the policy iteration code and is summarized in the following pseudocode:

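Initialize V(s) for every state s

while V is not stable:
    > value iteration
    for each state s:
        V(s) = max_a Σ_{s', r} p(s', r | s, a) [r + γ V(s')]

> compute the optimal policy:
π(s) = argmax_a Σ_{s', r} p(s', r | s, a) [r + γ V(s')]
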
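The only differences from policy iteration are the new value update and the absence of a separate policy improvement step. The resulting optimal policy is the one that acts greedily with respect to the converged values:

$$\pi^*(s) = \arg\max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V^*(s')]$$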
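To make the procedure concrete, here is a minimal Python sketch of value iteration. It rests on several assumptions that are not from the text: the three-state chain MDP `P` is hypothetical, the `(probability, next_state, reward, done)` tuple layout is just one convenient way to store the transition model, and "stable" is interpreted as the largest value change in a sweep falling below a threshold `eps`. The sketch also updates `V` in place during a sweep rather than keeping a separate copy of the previous values; both variants converge to the same optimal values.

```python
# Hypothetical 3-state chain MDP (state 2 is terminal).
# P[s][a] is a list of (probability, next_state, reward, done) tuples.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def action_values(P, V, s, gamma):
    """Expected return of every action in state s under the estimate V."""
    return {
        a: sum(
            p * (r + (0.0 if done else gamma * V[s2]))
            for p, s2, r, done in outcomes
        )
        for a, outcomes in P[s].items()
    }


def value_iteration(P, gamma=0.9, eps=1e-8):
    V = {s: 0.0 for s in P}  # initialize V(s) for every state
    while True:
        delta = 0.0
        for s in P:
            # Value update: take the best action value immediately.
            best = max(action_values(P, V, s, gamma).values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:  # V is "stable" (assumed stopping rule)
            break
    # Extract the greedy policy from the converged values.
    policy = {}
    for s in P:
        q = action_values(P, V, s, gamma)
        policy[s] = max(q, key=q.get)
    return V, policy


V, policy = value_iteration(P)
print(V)
print(policy)
```

On this toy chain the values should settle at about 0.9, 1.0, and 0.0 for states 0, 1, and 2, with the greedy policy choosing action 1 (move right) in the two non-terminal states.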
