Frontiers of RL

You have now seen both the theory behind and the application of the most useful RL techniques. Yet, RL is a fast-moving field. This book cannot cover all of the current trends, but it can highlight some that are particularly relevant to practitioners in the financial industry.

Multi-agent RL

Markets, by definition, include many agents. Lowe and others, 2017, Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (see https://arxiv.org/abs/1706.02275), show that reinforcement learning can be used to train agents that cooperate, compete, and communicate depending on the situation.

Multiple agents (in red) working together to chase the green dots. From the OpenAI blog.

In an experiment, Lowe and others let agents communicate by including a communication vector in the action space. The communication vector that one agent output was then made available to the other agents, and they showed that the agents learned to communicate to solve a task.
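
The mechanics are straightforward to sketch. The following is a minimal illustration of the idea, not the architecture from the paper: each agent's raw action is split into a physical action and a message, and each agent's message is appended to the other agents' observations on the next step. All dimensions and names here are illustrative assumptions.

```python
import numpy as np

N_AGENTS = 3
OBS_DIM = 8        # size of each agent's "physical" observation
COMM_DIM = 4       # size of the communication vector each agent emits
ACT_DIM = 2        # size of the physical action (e.g., 2D movement)

def split_action(action):
    """An agent's raw action contains a movement part and a message part."""
    return action[:ACT_DIM], action[ACT_DIM:ACT_DIM + COMM_DIM]

def build_observations(physical_obs, last_messages):
    """Append every *other* agent's last message to each agent's observation."""
    observations = []
    for i in range(N_AGENTS):
        others = [last_messages[j] for j in range(N_AGENTS) if j != i]
        observations.append(np.concatenate([physical_obs[i]] + others))
    return observations

# One illustrative step, with random policies standing in for trained ones
rng = np.random.default_rng(0)
physical_obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
last_messages = [np.zeros(COMM_DIM) for _ in range(N_AGENTS)]

obs = build_observations(physical_obs, last_messages)
for i, o in enumerate(obs):
    action = rng.normal(size=ACT_DIM + COMM_DIM)  # policy(o) in a real setup
    move, message = split_action(action)
    last_messages[i] = message                    # visible to others next step
```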

Similar research showed that agents adopt collaborative or competitive strategies depending on the environment. In a task where agents had to collect reward tokens, they collaborated as long as plenty of tokens were available and turned competitive as tokens became sparse. Zheng and others, 2017, MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence (see https://arxiv.org/abs/1712.00600), scaled the environment to include hundreds of agents. They showed that agents developed more complex strategies, such as an encirclement attack on other agents, through a combination of RL algorithms and clever reward shaping.
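
Reward shaping means adding auxiliary reward terms that guide learning toward desired behavior. The following toy function is an illustrative assumption, not the shaping used in the MAgent paper: it pays a sparse reward for collecting a token, plus a small dense bonus for moving closer to the nearest remaining one.

```python
import numpy as np

def shaped_reward(agent_pos, token_positions, collected, prev_dist):
    """Toy shaped reward: sparse payoff for collecting a token, plus a small
    dense bonus for reducing the distance to the nearest remaining token."""
    reward = 10.0 if collected else 0.0
    if token_positions:
        dist = min(np.linalg.norm(np.array(agent_pos) - np.array(t))
                   for t in token_positions)
        reward += 0.1 * (prev_dist - dist)  # positive when closing in
    else:
        dist = 0.0
    return reward, dist

# The agent moved from distance 6 to distance 5 of the token at (3, 4)
reward, dist = shaped_reward((0, 0), [(3, 4)], collected=False, prev_dist=6.0)
print(reward)  # ~0.1
```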

Foerster and others, 2017, Learning with Opponent-Learning Awareness (see https://arxiv.org/abs/1709.04326), developed a new kind of RL algorithm that allows an agent to learn how another agent will behave and to choose actions that influence the other agent.

Learning how to learn

A shortcoming of deep learning is that skilled humans have to design the neural networks. Because of that, one longstanding dream of researchers, and of the companies that currently have to pay those Ph.D. students, is to automate the process of designing neural networks.

One example of this so-called AutoML is NeuroEvolution of Augmenting Topologies, known as the NEAT algorithm. NEAT uses an evolutionary strategy to design a neural network that is then trained by standard backpropagation:

A network developed by the NEAT algorithm

As you can see in the preceding diagram, the networks developed by NEAT are often smaller than traditional, layer-based neural networks, and their irregular topologies would be hard for a human to come up with. This is the strength of AutoML: it can find effective strategies that humans would not have discovered.
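
To make the evolve-then-backpropagate loop concrete, here is a heavily simplified sketch. Real NEAT evolves node and connection graphs with speciation; this toy version, which is an illustration rather than the actual algorithm, only mutates a list of Keras layer sizes and lets backpropagation fit the weights of each candidate.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

rng = np.random.default_rng(42)
X = rng.normal(size=(256, 10))
y = (X.sum(axis=1) > 0).astype("float32")  # toy binary target

def build(layer_sizes):
    """Build a network from a list of hidden layer sizes."""
    model = Sequential()
    for units in layer_sizes:
        model.add(Dense(units, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def fitness(layer_sizes):
    """Train the candidate by backpropagation; lower loss = higher fitness."""
    model = build(layer_sizes)
    model.fit(X, y, epochs=5, verbose=0)
    return -model.evaluate(X, y, verbose=0)

def mutate(layer_sizes):
    """Randomly add a layer or resize an existing one."""
    sizes = list(layer_sizes)
    if rng.random() < 0.5 and len(sizes) < 4:
        sizes.append(int(rng.integers(2, 17)))
    elif sizes:
        i = rng.integers(len(sizes))
        sizes[i] = max(2, sizes[i] + int(rng.integers(-4, 5)))
    return sizes

population = [[8], [8, 8], [16]]
for generation in range(3):
    scored = sorted(population, key=fitness, reverse=True)
    best = scored[0]
    population = [best] + [mutate(best) for _ in range(2)]
print("best topology found:", best)
```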

An alternative to using evolutionary algorithms for network design is to use reinforcement learning, which yields similar results. There are a couple of "off-the-shelf" AutoML solutions available.

For the subfield of hyperparameter search, there are a few packages available as well.

AutoML is still an active field of research, but it holds great promise. Many firms struggle to use machine learning due to a lack of skilled employees. If machine learning could optimize itself, more firms could start using it.

Understanding the brain through RL

The other emerging field in finance and economics is behavioral economics. More recently, reinforcement learning has been used to understand how the human brain works. Wang and others, in 2018, published a paper titled Prefrontal cortex as a meta-reinforcement learning system (see http://dx.doi.org/10.1038/s41593-018-0147-8), which provided new insights into the prefrontal cortex and the function of dopamine.

Similarly, Banino and others, in 2018, published a paper titled Vector-based navigation using grid-like representations in artificial agents (see https://doi.org/10.1038/s41586-018-0102-6), in which they used reinforcement learning to replicate the so-called "grid cells" that allow mammals to navigate.

The method is similar in both papers: RL algorithms are trained on tasks related to the area of research, for example, navigation. The learned weights of the model are then examined for emergent properties. Such insights can be used to create more capable RL agents, but also to further the field of neuroscience.
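
In code, the probing step might look like the following sketch. The agent_hidden_activations function is a hypothetical stand-in for whatever exposes a trained agent's intermediate layer; the point is to record activations across locations and inspect them for spatial structure, which in the grid-cell work showed up as periodic, hexagonal firing patterns.

```python
import numpy as np

def agent_hidden_activations(position):
    """Placeholder: returns the hidden activations of a trained agent located
    at `position`. Replace with access to your own trained model's layer."""
    x, y = position
    seed = hash((round(x, 3), round(y, 3))) % (2**32)
    return np.random.default_rng(seed).normal(size=32)

# Sweep the agent over a 20x20 grid of locations and record activations
positions = [(x, y) for x in np.linspace(0, 1, 20) for y in np.linspace(0, 1, 20)]
activations = np.stack([agent_hidden_activations(p) for p in positions])

# One unit's activation as a function of location; a grid cell would show a
# periodic, hexagonal pattern in this map
unit_map = activations[:, 0].reshape(20, 20)
print(unit_map.shape)
```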

As the world of economics comes to grips with the idea that humans are not rational, but irrational in predictable ways, understanding the brain becomes ever more important to understanding economics. The results of neuroeconomics are particularly relevant to finance, as they concern how humans act under uncertainty and handle risk, for example, why humans are loss averse. Using RL is a promising avenue for gaining further insight into human behavior.
