Evolutionary strategies and genetic algorithms

Recently, a decades-old optimization approach has come back into fashion for reinforcement learning. Evolutionary strategies (ES) are much simpler than Q-learning or A2C.

Instead of training a single model through backpropagation, in ES we create a population of models by adding random noise to the weights of the original model. We then let each model run in the environment and evaluate its performance. The new weights are the performance-weighted average of the population's weights.

In the following diagram, you can see a visualization of how evolution strategies work:

Evolutionary strategy

To get a better grip on how this works, consider the following example. We want to find a vector that minimizes the mean squared error to a solution vector. The learner is not given the solution, but only the total error as a reward signal:

import numpy as np

solution = np.array([0.5, 0.1, -0.3])

def f(w):
  # Reward is the negative squared distance between w and the hidden solution
  reward = -np.sum(np.square(solution - w))
  return reward
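
As a quick check (the calls below are illustrative additions, not part of the original listing), we can evaluate the reward for a rough guess and for the solution itself. The reward is always negative and reaches its maximum of zero only at the solution:

print(f(np.zeros(3)))  # -0.35, the reward for guessing the zero vector
print(f(solution))     # 0.0, the best possible reward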

A key advantage of evolutionary strategies is that they have fewer hyperparameters than methods such as Q-learning or A2C. In this case, we need just three:

npop =   50 #1
sigma = 0.1 #2
alpha = 0.1 #3
  1. Population size: We will create 50 versions of the model at each iteration
  2. Noise standard deviation: The noise we add will have a mean of zero and a standard deviation of 0.1
  3. Learning rate: The weights are not simply set to the new weighted average; they are moved slowly in that direction to avoid overshooting

The optimization algorithm will look like the following code:

w = np.random.randn(3)                          #1
for i in range(300):                            #2
  N = np.random.randn(npop, 3) * sigma          #3 
  R = np.zeros(npop)
  for j in range(npop):                         #4
    w_try = w + N[j]
    R[j] = f(w_try)

  A = (R - np.mean(R)) / np.std(R)              #5
  w = w + alpha * np.dot(N.T, A)/npop           #6

The evolution strategy is relatively short in code, so let's go through it:

  1. We start off with a random solution.
  2. Just as with the other RL algorithms, we train for a number of epochs, here 300.
  3. We create a noise matrix of 50 noise vectors with a mean of zero and a standard deviation of sigma.
  4. We now create and immediately evaluate our population by adding noise to the original weights and running the resulting vector through the evaluation function.
  5. We standardize the rewards by subtracting the mean and dividing by the standard deviation. The result can be interpreted as an advantage: in this case, how much better a particular member of the population performed compared to the rest.
  6. Finally, we add the weighted average noise vector to the weight solution. We use a learning rate to slow down the process and avoid overshooting.
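
Once the loop has run, a quick sanity check (the print statements below are an illustrative addition, not part of the original listing) should show that the optimizer has recovered the hidden solution:

print(w)     # should be close to [0.5, 0.1, -0.3]
print(f(w))  # reward should be close to 0, its maximum value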

Similar to neural networks themselves, evolutionary strategies are loosely inspired by nature. In nature, species optimize themselves for survival using natural selection. Researchers have come up with many algorithms to imitate this process. The preceding neural evolution strategy algorithm works not only for single vectors but for large neural networks as well. Evolutionary strategies are still a field of active research, and at the time of writing, no best practice has been settled on.
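
To make the neural network claim concrete, here is a minimal sketch that applies exactly the same update to the flattened weights of a tiny two-layer NumPy network. The toy data, network sizes, and helper functions (unpack, evaluate) are made up for illustration and are not part of the original example:

import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3)                 # toy inputs
y = np.sin(X.sum(axis=1))                   # toy regression targets

n_hidden = 8
n_params = 3 * n_hidden + n_hidden          # a 3 -> 8 -> 1 network without biases

def unpack(w):
  # Split the flat parameter vector into the two weight matrices
  W1 = w[:3 * n_hidden].reshape(3, n_hidden)
  W2 = w[3 * n_hidden:].reshape(n_hidden, 1)
  return W1, W2

def evaluate(w):
  W1, W2 = unpack(w)
  pred = np.tanh(X @ W1) @ W2               # forward pass only, no backpropagation
  return -np.mean((pred.ravel() - y) ** 2)  # reward = negative mean squared error

npop, sigma, alpha = 50, 0.1, 0.1
w = np.random.randn(n_params)
for i in range(300):
  N = np.random.randn(npop, n_params) * sigma
  R = np.array([evaluate(w + N[j]) for j in range(npop)])
  A = (R - R.mean()) / R.std()
  w = w + alpha * np.dot(N.T, A) / npop

The only change from the vector example is that the reward function now runs a forward pass through a network; the ES update itself is untouched.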

Reinforcement learning and evolutionary strategies are the go-to techniques when no supervised learning is possible but a reward signal is available. There are many applications in the financial industry where this is the case, from simple "multi-armed bandit" problems, such as the DHL order routing system, to complex trading systems.
