Dueling DQN

In the paper Dueling Network Architectures for Deep Reinforcement Learning (https://arxiv.org/abs/1511.06581), a novel neural network architecture with two separate estimators was proposed: one for the state value function and the other for the state-action advantage value function.

The advantage function is used everywhere in RL and is defined as follows:

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$$

The advantage function tells us the improvement of an action, $a$, compared to the average action taken in a given state, $s$. Thus, if $A^{\pi}(s, a)$ is a positive value, this means that the action, $a$, is better than the average action in the state, $s$. On the contrary, if $A^{\pi}(s, a)$ is a negative value, this means that $a$ is worse than the average action in the state, $s$.
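For example, with purely illustrative numbers, if $Q^{\pi}(s, a) = 3$ and $V^{\pi}(s) = 2$, then $A^{\pi}(s, a) = 1$, so taking $a$ in $s$ is expected to be better than following the policy's average behavior in that state.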

Thus, estimating the value function and the advantage function separately, as done in the paper, allows us to rebuild the Q-function, like so:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right) \qquad (5.8)$$

Here, $\theta$ denotes the parameters of the shared convolutional module, while $\alpha$ and $\beta$ are the parameters of the advantage and value heads. The mean of the advantage is subtracted from the advantage stream so that the value and advantage estimates are identifiable from $Q$, which increases the stability of the DQN.
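To illustrate how the two estimates are recombined, here is a minimal NumPy sketch of equation (5.8) applied to a scalar state value and a vector of per-action advantages. It is not the book's code; the function name and the numbers are made up for this example:

```python
import numpy as np

def dueling_aggregation(value, advantages):
    """Combine V(s) and A(s, .) into Q(s, .) by subtracting the mean advantage,
    as in equation (5.8)."""
    advantages = np.asarray(advantages, dtype=np.float64)
    return value + (advantages - advantages.mean())

# Hypothetical values: V(s) = 2.0 and three per-action advantages.
print(dueling_aggregation(2.0, [2.0, 0.5, -1.0]))  # [3.5 2.  0.5]
```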

The architecture of Dueling DQN consists of two heads (or streams): one for the value function and one for the advantage function, both sharing a common convolutional module. The authors reported that this architecture can learn which states are, or are not, valuable without having to learn the value of each action in every state. They tested the new architecture on the Atari games and obtained considerable improvements in overall performance. A sketch of such a two-stream network follows.
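The following is a minimal PyTorch sketch of the two-stream architecture; the framework choice, layer sizes, and number of actions are assumptions for illustration, not the paper's or the book's exact implementation:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Minimal dueling architecture: a shared convolutional trunk followed by
    separate value and advantage heads, combined as in equation (5.8)."""

    def __init__(self, in_channels=4, n_actions=6):
        super().__init__()
        # Shared convolutional module (layer sizes are illustrative).
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 7 * 7  # for 84x84 input frames
        # Value head: a single scalar V(s).
        self.value = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))
        # Advantage head: one value A(s, a) per action.
        self.advantage = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        feats = self.features(x)
        v = self.value(feats)       # shape: (batch, 1)
        a = self.advantage(feats)   # shape: (batch, n_actions)
        # Q(s, a) = V(s) + (A(s, a) - mean over actions of A(s, a'))
        return v + a - a.mean(dim=1, keepdim=True)

# Usage: a batch containing one stack of four 84x84 frames.
q = DuelingDQN()(torch.zeros(1, 4, 84, 84))
print(q.shape)  # torch.Size([1, 6])
```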
