Results

The following diagram shows the results. The plot presents two learning curves: ESBAS with the complete portfolio (comprising the three neural networks listed previously) in the darker shade, and ESBAS with only the best-performing neural network (a deep neural network with two hidden layers of size 64) in the lighter shade. ESBAS with only one algorithm in the portfolio cannot really leverage the potential of the meta-algorithm, but we included it as a baseline against which to compare the results. The plot speaks for itself: the darker curve always lies above the lighter one, showing that ESBAS does indeed choose the best available option. The unusual shape is due to the fact that the DQN algorithms are trained offline:

The performance of ESBAS with a portfolio of three algorithms in a dark shade, and with only one algorithm in a lighter shade
For all the color references mentioned in the chapter, please refer to the color images bundle at http://www.packtpub.com/sites/default/files/downloads/9781789131116_ColorImages.pdf.

Also, the spikes that you see at the start of training, and then at around 20K, 65K, and 131K steps, are the points at which the policies are trained and the meta-algorithm is reset.
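
The exact reset points depend on the length of the collected trajectories, but the widening gaps between them follow directly from the epochs growing exponentially in length. The following is a minimal, purely illustrative sketch of that schedule (the trajectory counts and lengths are assumptions chosen for the example, not the values used in the experiment):

# Purely illustrative numbers: ~100 trajectories of ~200 steps in the first
# epoch, with every epoch twice as long as the previous one.
trajectories_per_epoch = [100 * 2 ** k for k in range(4)]
steps_per_trajectory = 200

step = 0
for epoch, n_traj in enumerate(trajectories_per_epoch):
    # Epoch boundary: the off-policy algorithms in the portfolio are retrained
    # on all the data gathered so far, and the meta-algorithm's bandit
    # statistics are reset -- this is what produces the spikes in the curve.
    print(f"epoch {epoch}: retrain portfolio and reset meta-algorithm "
          f"at ~{step // 1000}K steps")
    step += n_traj * steps_per_trajectory

Because each epoch doubles in length, the resets become increasingly sparse, which matches the pattern visible in the learning curve.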

We can now ask at which points in time ESBAS prefers one algorithm over the others. The answer is shown in the following diagram. In this plot, the small neural network is identified by the value 0, the medium one by the value 1, and the large one by the value 2. The dots show which algorithm is chosen on each trajectory. We can see that, right at the beginning, the larger neural network is preferred, but that this immediately changes in favor of the medium one, and then the smaller one. After about 64K steps, the meta-algorithm switches back to the larger neural network:

The plot shows the preferences of the meta-algorithm
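
To make the selection mechanism concrete, the following is a minimal, runnable sketch of a UCB1-style meta-algorithm choosing among three options within a single epoch. The three networks are replaced here by toy return distributions, and all of the names and numbers are illustrative assumptions rather than the implementation used for the experiment:

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the small (0), medium (1), and large (2) networks: each
# "algorithm" simply returns a noisy trajectory return with a fixed mean.
portfolio_means = [1.0, 1.4, 1.8]

def rollout(k):
    return rng.normal(portfolio_means[k], 0.5)

def ucb1_pick(counts, means, t, c=2.0):
    # Pick the algorithm with the highest optimistic (mean + exploration) score
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:                   # try every algorithm at least once
        return int(untried[0])
    bonus = np.sqrt(c * np.log(t) / counts)
    return int(np.argmax(means + bonus))

counts = np.zeros(3)                       # selections per algorithm (reset each epoch)
means = np.zeros(3)                        # running mean return per algorithm
choices = []
for t in range(1, 501):                    # 500 trajectories within one epoch
    k = ucb1_pick(counts, means, t)
    ret = rollout(k)
    counts[k] += 1
    means[k] += (ret - means[k]) / counts[k]
    choices.append(k)

print("selections per algorithm:", counts.astype(int))
print("last 20 choices:", choices[-20:])

Early on, the exploration bonus dominates and all three options are tried; as the estimates sharpen, the selections concentrate on the option with the highest estimated return, which is the kind of preference pattern shown by the dots in the preceding plot.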

Looking again at the learning curves, we can also see that both ESBAS versions converge to the same values, but at very different speeds. Indeed, the version of ESBAS that leverages the true potential of AS (that is, the one with three algorithms in the portfolio) converges much faster. Both converge to the same values because, in the long run, the best neural network is the same one used in the single-option version of ESBAS (the deep neural network with two hidden layers of size 64).
