Frontiers of RL

You have now seen both the theory behind and the application of the most useful RL techniques. Yet, RL is a fast-moving field. This book cannot cover all of the current trends, but it can highlight some that are particularly relevant to practitioners in the financial industry.

Multi-agent RL

Markets, by definition, include many agents. Lowe and others, 2017, Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (see https://arxiv.org/abs/1706.02275), show that reinforcement learning can be used to train agents that cooperate, compete, and communicate depending on the situation.

Multiple agents (in red) working together to chase the green dots. From the OpenAI blog.

In an experiment, Lowe and others let agents communicate by including a communication vector in the action space. The communication vector that one agent output was then made available to the other agents, and they showed that the agents learned to communicate to solve a task.
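
The mechanics are straightforward to sketch. The following is a minimal illustration of the idea, not the architecture from the paper: each agent's raw action is split into a physical action and a message, and each agent's message is appended to the other agents' observations on the next step. All dimensions and names here are illustrative assumptions.

```python
import numpy as np

N_AGENTS = 3
OBS_DIM = 8        # size of each agent's "physical" observation
COMM_DIM = 4       # size of the communication vector each agent emits
ACT_DIM = 2        # size of the physical action (e.g., 2D movement)

def split_action(action):
    """An agent's raw action contains a movement part and a message part."""
    return action[:ACT_DIM], action[ACT_DIM:ACT_DIM + COMM_DIM]

def build_observations(physical_obs, last_messages):
    """Append every *other* agent's last message to each agent's observation."""
    observations = []
    for i in range(N_AGENTS):
        others = [last_messages[j] for j in range(N_AGENTS) if j != i]
        observations.append(np.concatenate([physical_obs[i]] + others))
    return observations

# One illustrative step, with random policies standing in for trained ones
rng = np.random.default_rng(0)
physical_obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
last_messages = [np.zeros(COMM_DIM) for _ in range(N_AGENTS)]

obs = build_observations(physical_obs, last_messages)
for i, o in enumerate(obs):
    action = rng.normal(size=ACT_DIM + COMM_DIM)  # policy(o) in a real setup
    move, message = split_action(action)
    last_messages[i] = message                    # visible to others next step
```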

Similar research showed that agents adopt collaborative or competitive strategies depending on the environment. In a task where agents had to collect reward tokens, they collaborated as long as plenty of tokens were available and turned competitive as tokens became sparse. Zheng and others, 2017, MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence (see https://arxiv.org/abs/1712.00600), scaled the environment to include hundreds of agents. They showed that agents developed more complex strategies, such as an encirclement attack on other agents, through a combination of RL algorithms and clever reward shaping.
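
Reward shaping means adding auxiliary reward terms that guide learning toward desired behavior. The following toy function is an illustrative assumption, not the shaping used in the MAgent paper: it pays a sparse reward for collecting a token, plus a small dense bonus for moving closer to the nearest remaining one.

```python
import numpy as np

def shaped_reward(agent_pos, token_positions, collected, prev_dist):
    """Toy shaped reward: sparse payoff for collecting a token, plus a small
    dense bonus for reducing the distance to the nearest remaining token."""
    reward = 10.0 if collected else 0.0
    if token_positions:
        dist = min(np.linalg.norm(np.array(agent_pos) - np.array(t))
                   for t in token_positions)
        reward += 0.1 * (prev_dist - dist)  # positive when closing in
    else:
        dist = 0.0
    return reward, dist

# The agent moved from distance 6 to distance 5 of the token at (3, 4)
reward, dist = shaped_reward((0, 0), [(3, 4)], collected=False, prev_dist=6.0)
print(reward)  # ~0.1
```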

Foerster and others, 2017, Learning with Opponent-Learning Awareness (see https://arxiv.org/abs/1709.04326), developed a new kind of RL algorithm that allows an agent to learn how another agent will behave and to choose actions that influence the other agent.

Learning how to learn

A shortcoming of deep learning is that skilled humans have to design the neural networks. Because of that, one longstanding dream of researchers, and of the companies that currently have to pay those Ph.D. students, is to automate the process of designing neural networks.

One example of this so-called AutoML is NeuroEvolution of Augmenting Topologies, known as the NEAT algorithm. NEAT uses an evolutionary strategy to design a neural network that is then trained by standard backpropagation:

A network developed by the NEAT algorithm

As you can see in the preceding diagram, the networks developed by NEAT are often smaller than traditional, layer-based neural networks, and their irregular topologies would be hard for a human to come up with. This is the strength of AutoML: it can find effective strategies that humans would not have discovered.
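
To make the evolve-then-backpropagate loop concrete, here is a heavily simplified sketch. Real NEAT evolves node and connection graphs with speciation; this toy version, which is an illustration rather than the actual algorithm, only mutates a list of Keras layer sizes and lets backpropagation fit the weights of each candidate.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

rng = np.random.default_rng(42)
X = rng.normal(size=(256, 10))
y = (X.sum(axis=1) > 0).astype("float32")  # toy binary target

def build(layer_sizes):
    """Build a network from a list of hidden layer sizes."""
    model = Sequential()
    for units in layer_sizes:
        model.add(Dense(units, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def fitness(layer_sizes):
    """Train the candidate by backpropagation; lower loss = higher fitness."""
    model = build(layer_sizes)
    model.fit(X, y, epochs=5, verbose=0)
    return -model.evaluate(X, y, verbose=0)

def mutate(layer_sizes):
    """Randomly add a layer or resize an existing one."""
    sizes = list(layer_sizes)
    if rng.random() < 0.5 and len(sizes) < 4:
        sizes.append(int(rng.integers(2, 17)))
    elif sizes:
        i = rng.integers(len(sizes))
        sizes[i] = max(2, sizes[i] + int(rng.integers(-4, 5)))
    return sizes

population = [[8], [8, 8], [16]]
for generation in range(3):
    scored = sorted(population, key=fitness, reverse=True)
    best = scored[0]
    population = [best] + [mutate(best) for _ in range(2)]
print("best topology found:", best)
```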

An alternative to using evolutionary algorithms for network design is to use reinforcement learning, which yields similar results. There are a couple of "off-the-shelf" AutoML solutions available.

For the subfield of hyperparameter search, there are a few packages available as well.

AutoML is still an active field of research, but it holds great promise. Many firms struggle to use machine learning due to a lack of skilled employees. If machine learning could optimize itself, more firms could start using it.

Understanding the brain through RL

The other emerging field in finance and economics is behavioral economics. More recently, reinforcement learning has been used to understand how the human brain works. Wang and others, in 2018, published a paper titled Prefrontal cortex as a meta-reinforcement learning system (see http://dx.doi.org/10.1038/s41593-018-0147-8), which provided new insights into the prefrontal cortex and the function of dopamine.

Similarly, Banino and others, in 2018, published a paper titled Vector-based navigation using grid-like representations in artificial agents (see https://doi.org/10.1038/s41586-018-0102-6), in which they used reinforcement learning to replicate the so-called "grid cells" that allow mammals to navigate.

The method is similar in both papers: RL algorithms are trained on tasks related to the area of research, for example, navigation. The learned weights of the model are then examined for emergent properties. Such insights can be used to create more capable RL agents, but also to further the field of neuroscience.
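
In code, the probing step might look like the following sketch. The agent_hidden_activations function is a hypothetical stand-in for whatever exposes a trained agent's intermediate layer; the point is to record activations across locations and inspect them for spatial structure, which in the grid-cell work showed up as periodic, hexagonal firing patterns.

```python
import numpy as np

def agent_hidden_activations(position):
    """Placeholder: returns the hidden activations of a trained agent located
    at `position`. Replace with access to your own trained model's layer."""
    x, y = position
    seed = hash((round(x, 3), round(y, 3))) % (2**32)
    return np.random.default_rng(seed).normal(size=32)

# Sweep the agent over a 20x20 grid of locations and record activations
positions = [(x, y) for x in np.linspace(0, 1, 20) for y in np.linspace(0, 1, 20)]
activations = np.stack([agent_hidden_activations(p) for p in positions])

# One unit's activation as a function of location; a grid cell would show a
# periodic, hexagonal pattern in this map
unit_map = activations[:, 0].reshape(20, 20)
print(unit_map.shape)
```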

As the world of economics comes to grips with the idea that humans are not rational, but irrational in predictable ways, understanding the brain becomes ever more important to understanding economics. The results of neuroeconomics are particularly relevant to finance, as they concern how humans act under uncertainty and handle risk, for example, why humans are loss averse. Using RL is a promising avenue for gaining further insight into human behavior.
