In the paper Dueling Network Architectures for Deep Reinforcement Learning (https://arxiv.org/abs/1511.06581), a novel neural network architecture with two separate estimators was proposed: one for the state value function and the other for the state-dependent action advantage function.
The advantage function is used everywhere in RL and is defined as follows:

$$A(s, a) = Q(s, a) - V(s)$$
The advantage function tells us the improvement of an action, $a$, compared to the average action in a given state, $s$. Thus, if $A(s, a)$ is a positive value, this means that the action, $a$, is better than the average action in the state, $s$. On the contrary, if $A(s, a)$ is a negative value, this means that $a$ is worse than the average action in the state, $s$.
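As a quick numerical illustration, the advantages for a single state can be computed directly from its Q-values. This is a minimal sketch with made-up numbers; it takes $V(s)$ to be the average of the Q-values, which holds for a uniformly random policy:

```python
import numpy as np

# Hypothetical Q-values for the four actions available in one state
q_values = np.array([1.0, 2.5, 0.5, 2.0])

# Under a uniformly random policy, V(s) is the mean Q-value, so the
# advantages are centered around zero: positive means better than average.
state_value = q_values.mean()          # V(s) = 1.5
advantages = q_values - state_value    # A(s, a) = Q(s, a) - V(s)

print(advantages)  # [-0.5  1.  -1.   0.5] -> action 1 beats the average
```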
Thus, estimating the value function and the advantage function separately, as done in the paper, allows us to rebuild the Q-function, like so:
$$Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right) \quad (5.8)$$
Here, the mean of the advantage is subtracted from the advantage stream. This makes $V$ and $A$ identifiable from $Q$ (without it, many value-advantage pairs would produce the same Q-values) and increases the stability of the DQN.
The architecture of Dueling DQN consists of two heads (or streams): one for the value function and one for the advantage function, both sharing a common convolutional module. The authors reported that this architecture can learn which states are or are not valuable, without having to learn the absolute value of each action in a state. They tested the new architecture on the Atari games and obtained considerable improvements in overall performance.
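To make the two-stream layout concrete, here is a minimal PyTorch sketch of a dueling network for vector observations. The class name DuelingDQN and the layer sizes are illustrative assumptions, and a fully connected trunk stands in for the convolutional module the paper used on Atari frames:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    # Illustrative sketch: a shared trunk followed by separate
    # value and advantage heads, combined as in equation (5.8).
    def __init__(self, obs_size: int, n_actions: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_size, 128), nn.ReLU())
        self.value_head = nn.Linear(128, 1)               # V(s)
        self.advantage_head = nn.Linear(128, n_actions)   # A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.shared(obs)
        value = self.value_head(features)            # shape: (batch, 1)
        advantage = self.advantage_head(features)    # shape: (batch, n_actions)
        # Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))
        return value + advantage - advantage.mean(dim=1, keepdim=True)

# Usage with hypothetical sizes: 4-dimensional observations, 2 actions
net = DuelingDQN(obs_size=4, n_actions=2)
q = net(torch.randn(32, 4))  # Q-values for a batch of 32 observations
```

Note that the mean subtraction happens inside forward, so the network always outputs ordinary Q-values and can be dropped into a standard DQN training loop unchanged.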