How to explore

A very effective method for dealing with such situations is called ε-greedy exploration. The agent acts randomly with probability ε and acts greedily (that is, chooses the best known action) with probability 1 − ε. For example, if ε = 0.8, then on average, for every 10 actions, the agent will act randomly 8 times.
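The rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name `epsilon_greedy_action`, the `q_values` list of estimated action values, and the optional `rng` parameter are all assumptions introduced here.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index chosen by the epsilon-greedy rule.

    q_values: hypothetical list of estimated values, one per action.
    epsilon:  probability of taking a random (exploratory) action.
    rng:      any object with random() and randrange(), e.g. random.Random(0).
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly at random among all actions.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0.8, roughly 8 out of 10 choices are random draws.
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.8)
```

Note that a random draw can still land on the greedy action, so the share of non-greedy actions is somewhat lower than the exploration probability itself.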

To avoid exploring too much in later stages, when the agent is confident about its knowledge, ε can decrease over time. This strategy is called epsilon-decay. With this variation, an initial stochastic policy will gradually converge to a deterministic and, hopefully, optimal policy.
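One possible way to shrink the exploration probability over time is an exponential schedule with a lower bound. The text does not prescribe a particular schedule, so the exponential form, the function name `decayed_epsilon`, and the default values below are assumptions chosen only for illustration.

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay_rate=0.99):
    """Return the exploration probability to use at a given step.

    Epsilon starts at eps_start, is multiplied by decay_rate once per step,
    and never drops below eps_min (all defaults are illustrative).
    """
    return max(eps_min, eps_start * decay_rate ** step)


# Usage: epsilon shrinks as training progresses, then stays at the floor.
schedule = [decayed_epsilon(step) for step in (0, 100, 1000)]
```

Keeping a small floor such as `eps_min` means the agent never stops exploring entirely; setting it to 0 would let the policy become fully deterministic.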

There are many other exploration techniques (such as Boltzmann exploration) that are more accurate, but they are also quite complicated. For the purposes of this chapter, ε-greedy is a perfect choice.
