FIM and KL divergence

The FIM is defined as the covariance of the score function, that is, the gradient of the log-probability of the policy. Let's look at how it can help us. To limit the distance between the distributions of our model, we first need to define a measure of the distance between the new and the old distributions. The most popular choice is the KL divergence. It measures how far apart two distributions are and is used in many places in RL and machine learning. The KL divergence is not a proper metric, as it is not symmetric, but it works well as an approximation of one. The more different two distributions are, the higher the KL divergence value. Consider the plot in the following diagram. In this example, the KL divergences are computed with respect to the green function. Because the orange function is similar to the green function, their KL divergence is 1.11, which is close to 0. Instead, it's easy to see that the blue and the green lines are quite different. This observation is confirmed by the high KL divergence between the two: 45.8. Note that the KL divergence between a distribution and itself is always 0.

For those of you who are interested, the KL divergence between two discrete probability distributions $P$ and $Q$ is computed as $D_{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$.
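To make the formula concrete, here is a minimal NumPy sketch that computes the discrete KL divergence. The distributions `p`, `q_close`, and `q_far` are made-up examples, not the ones plotted in Figure 7.5:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence: D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    # Restrict the sum to the support of P to avoid 0 * log(0) terms.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two similar distributions give a small KL; two dissimilar ones give a larger KL.
p = np.array([0.1, 0.4, 0.5])
q_close = np.array([0.12, 0.36, 0.52])
q_far = np.array([0.8, 0.1, 0.1])

print(kl_divergence(p, q_close))  # small value, close to 0
print(kl_divergence(p, q_far))    # much larger value
print(kl_divergence(p, p))        # exactly 0
```

Note that swapping the arguments generally gives a different number, which is why the KL divergence is not symmetric and therefore not a true metric.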

Let's take a look at the following diagram:

Figure 7.5. The KL divergence shown in each box is measured between that function and the function colored in green. The bigger the value, the farther apart the two functions are.

Thus, using the KL divergence, we are able to compare two distributions and get an indication of how close they are to each other. So, how can we use this measure in our problem to limit the divergence between two subsequent policy distributions?

It so happens that the FIM defines the local curvature of the distribution space when the KL divergence is used as the metric. Therefore, we can obtain the direction and the length of a step that keeps the KL divergence constant by combining the curvature (second-order derivative) of the KL divergence with the gradient (first-order derivative) of the objective function (as in formula (7.1)). Thus, the update that follows from formula (7.1) will be more cautious, taking small steps along the steepest direction when the FIM is high (meaning that small changes in the parameters produce big changes in the action distribution) and bigger steps when the FIM is low (meaning that we are on a plateau and the distribution doesn't vary much).
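To illustrate how the pieces fit together, the following sketch applies one natural gradient step along the lines of formula (7.1). Here, `natural_gradient_step` is a hypothetical helper, not code from this book: the FIM is estimated as the empirical covariance of sampled score vectors (gradients of the log-probability), and a small damping term is added as a common practical assumption to keep the matrix invertible.

```python
import numpy as np

def natural_gradient_step(theta, grad_obj, score_samples, step_size=0.01, damping=1e-3):
    """One natural gradient update: theta <- theta + alpha * F^{-1} * grad_obj.

    theta:         current policy parameters, shape (d,)
    grad_obj:      gradient of the objective with respect to theta, shape (d,)
    score_samples: per-sample gradients of log pi(a|s), shape (n, d),
                   used to estimate the FIM as their covariance
    """
    # Empirical FIM: covariance of the score function (gradient of the log-probability).
    fim = score_samples.T @ score_samples / score_samples.shape[0]
    # Damping keeps the matrix invertible when the estimate is noisy.
    fim += damping * np.eye(fim.shape[0])
    # Rescale the plain gradient by the inverse curvature: the step shrinks
    # where the KL divergence changes quickly and grows where it is flat.
    nat_grad = np.linalg.solve(fim, grad_obj)
    return theta + step_size * nat_grad
```

For large neural network policies, forming and inverting the full FIM like this is too expensive, which is why practical algorithms approximate this step, but the small example conveys the geometric idea.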
