UCB1

UCB1 belongs to the UCB family, and its contribution is in the selection of .

In UCB1, the  UCB is computed by keeping track of the number of times an action, (), has been selected, along with , and the total number of actions that are selected with , as represented in the following formula:

(12.2)

The uncertainty of an action, is thus related to the number of times it has been selected. If you think about it, this makes sense as, according to the law of large numbers, with an infinite number of trials, you'd be sure about the expected value. On the contrary, if you tried an action only a few times, you'd be uncertain about the expected reward, and only with more experience, would you be able to say whether it is a good or a bad action. Therefore, we'll incentivize the exploration of actions that have been chosen only few times, and that therefore have a high uncertainty. The main takeaway is that if  is small, meaning that the action has been experienced only occasionally, then  will be large, with an overall high uncertain estimate. However, if is large, then will be small, and the estimate will be accurate. We'll then follow  only if it has a high mean reward.

The main advantage of UCB compared to -greedy, is actually due to the counting of the actions. Indeed, the multi-armed bandit problem can be easily solved with this method, by keeping a counter for each action that is taken, and its average reward. These two pieces of information can be integrated into formula (12.1) and formula (12.2), in order to get the best action to take at time (); that is:

 (12.3)

UCB is a very powerful method for exploration, and it achieves a logarithmic expected total regret on the multi-armed bandit problem, therefore reaching an optimal trend. It is worth noting that -greedy exploration could also obtain a logarithmic regret, but it would require careful design, together with a finely-tuned exponential decay, and thus it would be harder to balance.

There are additional variations of UCB, such as UCB2, UCB-Tuned, and KL-UCB.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset