How to explore

A simple and effective method for dealing with such situations is called ε-greedy exploration. The agent acts randomly with probability ε and acts greedily (that is, it chooses the action it currently estimates to be best) with probability 1 − ε. For example, if ε = 0.8, then on average the agent will act randomly in 8 out of every 10 actions.
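The selection rule can be sketched in a few lines of Python. This is only a minimal illustration, not the chapter's own implementation: the function name `eps_greedy` and the list `q_values` (one estimated value per action) are assumptions introduced here.

```python
import random


def eps_greedy(q_values, eps, rng=random):
    """Pick an action index from a list of estimated action values.

    q_values and eps are hypothetical names used for illustration:
    q_values[a] is the current value estimate of action a, and eps is
    the probability of taking a uniformly random action.
    """
    if rng.random() < eps:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate
    # (ties are broken in favour of the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Note that the random branch can also pick the greedy action by chance, so the agent ends up taking the greedy action slightly more often than 1 − ε of the time.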

To avoid exploring too much in later stages, when the agent is confident about its knowledge, ε can be decreased over time. This strategy is called epsilon-decay. With this variation, an initially stochastic policy will gradually converge to a deterministic and, hopefully, optimal policy.
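One common way to decrease the exploration rate is an exponential schedule with a lower bound. The sketch below assumes that form; the text does not prescribe a particular schedule, and the function name `decayed_eps` and the default values are illustrative assumptions only.

```python
def decayed_eps(step, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Return the exploration rate to use at a given step.

    All parameter names and defaults are hypothetical. The rate starts
    at eps_start, shrinks by the factor `decay` at every step, and is
    never allowed to fall below eps_min.
    """
    return max(eps_min, eps_start * decay ** step)
```

Keeping a small floor such as `eps_min` means the agent never stops exploring completely; setting it to 0 lets the policy become fully deterministic once the rate has decayed.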

There are many other exploration techniques (such as Boltzmann exploration) that are more accurate, but they are also more complicated, and for the purpose of this chapter, ε-greedy is a perfect choice.
