The loss function

The deep Q-network is trained by minimizing the loss function (5.2) that we have already presented, but with the further employment of a separate Q-target network, $Q(\cdot; \theta^-)$, with weights $\theta^-$. Putting everything together, the loss function becomes:

$$ L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \big)^2 \Big] \qquad (5.4) $$

Here, $\theta$ are the parameters of the online network.
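To make the role of the target network concrete, the following is a minimal sketch of how the loss (5.4) might be computed in code. It assumes PyTorch; the network objects, the batch layout, the terminal-state mask, and the discount value are illustrative assumptions, not the book's implementation.

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """Mean squared Bellman error with a frozen target network, in the spirit of (5.4)."""
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; theta): estimate of the chosen action, from the online network
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # y = r + gamma * max_a' Q(s', a'; theta^-): target from the frozen target network.
    # The (1 - dones) factor zeroes the bootstrap term at terminal states.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    # Squared difference between target and online estimate, averaged over the mini-batch
    return F.mse_loss(q_values, targets)
```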

The optimization of the differentiable loss function (5.4) is performed with our favorite iterative method, namely mini-batch gradient descent. That is, the learning update is applied to mini-batches drawn uniformly from the experience buffer. The derivative of the loss function is as follows:

$$ \nabla_{\theta} L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \big) \, \nabla_{\theta} Q(s, a; \theta) \Big] $$
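A single mini-batch update on this loss could look roughly like the sketch below, which reuses the dqn_loss function from the previous snippet; the replay_buffer.sample helper and the optimizer object are assumptions made for illustration.

```python
def train_step(online_net, target_net, optimizer, replay_buffer, batch_size=32):
    # Draw a mini-batch uniformly from the experience buffer
    # (sample() is a hypothetical helper returning the tensors used by dqn_loss)
    batch = replay_buffer.sample(batch_size)

    # One step of mini-batch gradient descent on the DQN loss;
    # autograd supplies the gradient written above, evaluated on the sampled transitions
    loss = dqn_loss(online_net, target_net, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Setting batch_size to 1 would reduce this to a per-sample update, which connects to the remark at the end of this section.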
Unlike the problem framed in the case of plain deep Q-learning, in DQN, the learning process is more stable. Furthermore, because the data is closer to i.i.d. and the targets are (somewhat) fixed, it closely resembles a regression problem. On the other hand, the targets still depend on the network weights.

If you optimized the loss function (5.4) at every step and on a single sample only, you would recover the Q-learning algorithm with function approximation.