Unboxing algorithm selection

To better understand what ESBAS does, let's first focus on what algorithm selection (AS) is. In the usual setting, a single, fixed algorithm is developed and trained for a given task. The problem is that if the dataset changes over time, the algorithm overfits it, or another algorithm works better in some restricted contexts, there is no way to swap it out. The chosen algorithm will remain the same forever. The task of algorithm selection overcomes this problem.

AS is an open problem in machine learning. It consists of designing an algorithm, called a meta-algorithm, that always chooses the best algorithm from a pool of different options, called a portfolio, based on the needs of the moment. A representation of this is shown in the following diagram. AS rests on the assumption that different algorithms in the portfolio outperform the others in different parts of the problem space. It is therefore important for the portfolio to contain algorithms with complementary capabilities.

For example, in the following diagram, the meta-algorithm chooses which algorithm (or agent) among those available in the portfolio (such as PPO and TD3) will act on the environment at a given moment. These algorithms were not designed to complement each other, but each one offers different strengths that the meta-algorithm can pick from in order to perform better in a specific situation:

Representation of an algorithm selection method for RL
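To make the idea concrete, here is a minimal sketch of a meta-algorithm. This is not ESBAS itself, which we'll get to shortly, but a simple epsilon-greedy bandit over a portfolio. The class name, MetaAlgorithm, and its interface are made up for illustration: the portfolio is just a list of candidate agents, select() returns the index of the agent that should act next, and update() feeds back the return that agent obtained.

```python
import random

class MetaAlgorithm:
    """Toy epsilon-greedy meta-algorithm: it picks the agent in the
    portfolio with the best average return observed so far."""
    def __init__(self, portfolio, eps=0.1):
        self.portfolio = portfolio               # list of candidate agents
        self.eps = eps                           # exploration probability
        self.returns = [0.0] * len(portfolio)    # running average return per agent
        self.counts = [0] * len(portfolio)       # number of times each agent was chosen

    def select(self):
        # Explore with probability eps (or if nothing has been tried yet),
        # otherwise exploit the agent with the best average return so far
        if random.random() < self.eps or all(c == 0 for c in self.counts):
            return random.randrange(len(self.portfolio))
        return max(range(len(self.portfolio)), key=lambda i: self.returns[i])

    def update(self, idx, episode_return):
        # Incremental mean of the returns obtained by agent idx
        self.counts[idx] += 1
        self.returns[idx] += (episode_return - self.returns[idx]) / self.counts[idx]
```

A training loop would call select() before each episode, let the chosen agent act on the environment, and pass the resulting return back through update().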

For example, if the task involves designing a self-driving car that drives on all kinds of terrain, it may be useful to train one policy that performs superbly on the road, another in the desert, and a third on ice. AS could then intelligently choose which of these three versions to employ in each situation. For instance, AS may find that on rainy days, the policy that was trained on ice works better than the others.

In RL, the policy changes very frequently and the dataset grows continuously over time. This means that the optimal neural network size and learning rate can differ greatly between the start of training, when the agent is in an embryonic state, and later, when the agent is in an advanced state. For example, an agent may start learning with a high learning rate and decrease it as more experience is accumulated. This makes RL a very interesting playground for algorithm selection, and that's exactly where we'll test our AS.
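One simple way to cope with this drift, sketched below under the same illustrative assumptions as before, is to reset the selection statistics at regular intervals (epochs), so that only the recent performance of each option influences the choice. The two return functions are hypothetical stand-ins for a configuration that is strong early in training and one that keeps improving:

```python
import random

# Hypothetical, purely illustrative return curves: one option performs
# well from the start but plateaus, the other keeps improving over time.
def returns_early_bloomer(step):
    return 50.0 + random.gauss(0, 5)

def returns_late_bloomer(step):
    return min(100.0, 0.2 * step) + random.gauss(0, 5)

portfolio = [returns_early_bloomer, returns_late_bloomer]
epoch_len = 100

for epoch in range(10):
    # Fresh statistics each epoch: past performance no longer biases the
    # choice once the underlying policies have changed.
    meta = MetaAlgorithm(portfolio, eps=0.2)  # class from the previous sketch
    for t in range(epoch_len):
        step = epoch * epoch_len + t
        idx = meta.select()
        meta.update(idx, portfolio[idx](step))
    best = max(range(len(portfolio)), key=lambda i: meta.returns[i])
    print(f"epoch {epoch}: preferred option {best}")
```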
