GA for trading strategies

Let's apply our expertise in genetic algorithms to evaluate different strategies to trade securities using trading signals. Knowledge in trading strategies is not required to understand the implementation of a GA. However, you may want to get familiar with the foundation and terminology of technical analysis of securities and financial markets, as described briefly in the Technical analysis section in the Appendix A, Basic Concepts.

The problem is to find the best trading strategy to predict the increase or decrease of the price of a security given a set of trading signals. A trading strategy is defined as a set of trading signals tsj that are triggered or fired when a variable x = {xj}, derived from financial metrics such as the price of the security or the daily or weekly trading volume, either exceeds or equals or is below a predefined target value αj (refer to the Trading signals and strategy section in the Appendix A, Basic Concepts).

The number of variables that can be derived from price and volume can be very large. Even the most seasoned financial professionals face two challenges, which are as follows:

  • Selecting a minimal set of trading signals that are relevant to a given dataset (minimize a cost or unfitness function)
  • Turning those trading signals with heuristics derived from personal experience and expertise

Note

Alternative to GA

The problem described earlier can certainly be solved using one of the machine learning algorithms introduced in the previous chapters. It is just a matter of defining a training set and formulating the problem as minimizing the loss function between the predictor and the training score.

The following table lists the trading classes with their counterpart in the genetic world:

Generic classes

Corresponding securities trading classes

Operator

SOperator

Gene

Signal

Chromosome

Strategy

Population

StrategiesFactory

Definition of trading strategies

A chromosome is the genetic encoding of a trading strategy. A factory class, StrategyFactory, assembles the components of a trading strategy: operators, unfitness function, and signals

Trading operators

Let's extend the Operator trait with the SOperator class to define the operations that we need to trigger the signals. The SOperator instance has a single parameter: its identifier, _id. The class overrides the id() method to retrieve the ID (similarly, the class overrides the apply method to convert an ID into an SOperator instance):

class SOperator(_id: Int) extends Operator {
  override def id: Int = _id
  override def apply(idx: Int): SOperator = new SOperator(idx) 
}

The operators used by trading signals are the logical operators: < (LESS_THAN), > (GREATER_THAN), and = (EQUAL), as follows:

object LESS_THAN extends SOperator(1) 
object GREATER_THAN extends SOperator(2)
object EQUAL extends SOperator(3)

Each operator of the SOperator type is associated with a scoring function by the operatorFuncMap map. The scoring function computes the cost (or unfitness) of the signal against a real value or a time series:

val operatorFuncMap = Map[Operator, (Double,Double) =>Double](
  LESS_THAN -> ((x: Double, target: Double) => target - x),
  … )

The select method of Population computes the cost value of a signal by quantifying the truthfulness of the predicate. For instance, the unfitness value for a trading signal, x > 10, is penalized as 5 – 10 = -5 for x = 5 and credited as 14 – 10 = 4 if x = 14. In this case, the unfitness value is similar to the cost or loss in a discriminative machine learning algorithm.

The cost function

Let's consider the following trading strategy defined as a set of two signals to predict the sudden relative decrease Δp of the price of a security:

  • Relative volume vm with a condition vm < α
  • Relative volatility vl with the condition vl > β

Let's take a look at the following graphs:

The cost function

A chart of the price, relative volume, and relative volatility of a security

As the goal is to model a sudden crash in the stock price, we should reward the trading strategies that predict the steep decrease in the stock price and penalize the strategies that work well only with a small decrease or increase in the stock price. In the case of the trading strategy with two signals, relative volume vm and relative volatility vl, n trading sessions, the cost or unfitness function C, and given a relative variation of the stock price and a penalization w = -Δp (M2):

The cost function

Trading signals

Let's subclass the Gene class to define the trading signal of the Signal type as follows:

class Signal(id: String, target: Double, op: Operator,
   xt: DblVector, weights: Option[DblVector] = None)
   (implicit quantize: Quantization, encoding: Encoding) 
  extends Gene(id, target, op) 

The Signal class requires the following arguments:

  • An identifier id for the feature
  • A target value
  • An op operator
  • An xt time series of the DblVector type
  • The optional weights associated with each data point of the time series, xt
  • An implicit quantization instance, quantize
  • An implicit encoding scheme

The main purpose of the Signal class is to compute its score as a chromosome. The chromosome updates its cost by summing the score or weighted score of the signals it contains. The score of the trading signal is simply the summation of the penalty or truthfulness of the signal for each entry of the time series, ts:

override def score: Double = 
  if(!operatorFuncMap.contains(op)) Double.MaxValue
  else {
    val f = operatorFuncMap.get(op).get
    if( weights != None ) xt.zip(weights.get)
        .map{case(x, w) => w*f(x,target)}.sum
    else xt.map( f(_, target)).sum   
  }

Trading strategies

A trading strategy is an unordered list of trading signals. It makes sense to create a factory class to generate the trading strategies. The StrategyFactory class creates strategies of the List[Signal] type from an existing pool of signals of the subtype, Gene:

Trading strategies

A factory pattern for trading signals

The StrategyFactory class has two arguments: the number of signals, nSignals, in a trading strategy and the implicit Quantization and Encoding instances (line 63):

class StrategyFactory(nSignals: Int)  //63
    (implicit quantize: Quantization, encoding: Encoding){
  val signals = new ListBuffer[Signal]   
  lazy val strategies: Pool[Signal] //64
  def += (id: String, target: Double, op: SOperator, 
       xt: DblVector, weights: DblVector)
  …
 }

The += method takes five arguments: the identifier id, the target value, the op operation to qualify the class as Gene, the xt times series for scoring the signals, and the weights associated with the overall cost function. The StrategyFactory class generates all possible sequences of signals as trading strategies as lazy values to avoid unnecessary regeneration of the pool on demand (line 64), as follows:

lazy val strategies: Pool[Signal] = {
  implicit val ordered = Signal.orderedSignals //70
  
  val xss = new Pool[Signal] //65
  val treeSet = new TreeSet[Signal] ++= signals.toList //66
  val subsetsIterator = treeSet.subsets(nSignals) //67

  while( subsetsIterator.hasNext) {
    val signalList = subsetsIterator.next.toList  //68
    xss.append(Chromosome[Signal](signalList)) //69
  } 
  xss
}

The implementation of the strategies value creates a pool of signals Pool (line 65) by converting the list of signals to treeset (line 66). It breaks down the tree set into unique subtrees of nSignals nodes each. It instantiates a subsetsIterator iterator to traverse the sequence of subtrees (line 67) and converts them into a list (line 68) as arguments of the new chromosome (trading strategy) (line 69). The procedure to order the signals, orderedSignals, in the tree set has to be implicitly defined (line 70) as val orderedSignals = Ordering.by((signal: Signal) => signal.id).

Trading signal encoding

The encoding of trading predicates is the most critical element of the genetic algorithm. In our example, we encode a predicate as a tuple (target value, operator). Let's consider the simple predicate volatility > 0.62. The discretization converts the value 0.62 into 32 bits for the instance and a 2-bit representation for the operator:

Trading signal encoding

Encoding of the trading signal: volatility > 0.62

Note

IEEE-732 encoding

The threshold value for predicates is converted into an integer (the Int type or Long). The IEEE-732 binary representation of floating point values makes the bit addressing required to apply genetic operators quite challenging. A simple conversion consists of the following:

  • encoding e: (x: Double) => (x*100000).toInt
  • decoding d: (x: Int) => x*1e-5

All values are normalized, so there is no risk of overflowing the 32-bit representation.

A test case

The goal is to evaluate which trading strategy was the most relevant (fittest) during the crash of the stock market in fall 2008. Let's consider the stock price of one of the financial institutions, Goldman Sachs, as a proxy of the sudden market decline:

A test case

A sudden decrease in Goldman-Sachs stock price in Sept – Nov 2008

Besides the variation of the price of the stock between two consecutive trading sessions (dPrice), the model uses the following parameters (or trading signals):

  • dVolume: This is the relative variation of the volume between two consecutive trading sessions
  • dVolatility: This is the relative variation of volatility between two consecutive trading sessions
  • volatility: This is the relative volatility within a trading session
  • vPrice: This is the relative difference of the stock opening and closing price

The naming convention for the trading data and metrics is described in the Trading data section under Technical analysis in the Appendix A, Basic Concepts.

The execution of the genetic algorithm requires the following steps:

  1. Extraction of model parameters or variables.
  2. Generation of the initial population of trading strategies.
  3. Setting up the GA configuration parameters with the maximum number of reproduction cycles allowed, the crossover and mutation ratio, and the soft limit function for the population growth.
  4. Instantiating the GA algorithm with the scoring/unfitness function.
  5. Extracting the fittest trading strategy that can best explain the sharp decline in the price of Goldman Sachs stocks.

Creating trading strategies

The input to the genetic algorithm is the population of trading strategies. Each strategy consists of the combination of three trading signals and each trading signal is a tuple (signal ID, operator, and target value).

The first step is to extract the model parameters as illustrated for the variation of the stock price volume, volatility, and relative volatility between two consecutive trading sessions (line 71):

Import YahooFinancials._
val NUM_SIGNALS = 3

def createStrategies: Try[Pool[Signal]] = {
  val src = DataSource(path, false, true, 1) //71
  for {  //72
    price <- src.get(adjClose)
    dPrice <- delta(price, -1.0) 
    volume <- src.get(volume)
    dVolume <- delta(volume, 1.0)
    volatility <- src.get(volatility)
    dVolatility <- delta(volatility, 1.0)
    vPrice = src.get(vPrice)
  } yield { //72
    val factory = new StrategyFactory(NUM_SIGNALS) //73

    val weights = dPrice  //74
    factory += ("dvolume", 1.1, GREATER_THAN, dVolume, weights)
    factory += ("volatility", 1.3, GREATER_THAN, 
       volatility.drop(1), weights)
    factory += ("vPrice", 0.8, LESS_THAN, 
       vPrice.drop(1), weights)
    factory += ("dVolatility", 0.9, GREATER_THAN, 
       dVolatility, weights)
    factory.strategies
   }
}

The purpose is to generate the initial population of strategies that compete to become relevant to the decline of the price of stocks of Goldman Sachs. The initial population of trading strategies is generated by creating a combination from four trading signals weighted by the variation in the stock price: ∆(volume) > 1.1, ∆(volatility) > 1.3, ∆(close-open) < 0.8, and volatility > 0.9.

The delta method computes the variation of a trading variable between consecutive trading sessions. It invokes the XTSeries.zipWithShift method, which was introduced in the Time series in Scala section in Chapter 3, Data Preprocessing:

def delta(xt: DblVector, a: Double): Try[DblVector] = Try {
  zipWithShift(xt, 1).map{case(x, y) => a*(y/x - 1.0)}
}

The trading strategies are generated by the StrategyFactory class introduced in the previous section (line 73). The weights for the trading strategies are computed as the dPrice difference of the price of the stock between two consecutive trading sessions (line 74). The option of unweighted trading strategies is selected by replacing the weights by the average price variation as follows:

val avWeights = dPrice.sum/dPrice.size
val weights = Vector.fill(dPrice.size)(avWeights)

The generation of the initial population of trading strategies is illustrated in the following diagram:

Creating trading strategies

A design for the generation of the initial population of trading strategies

Configuring the optimizer

The configuration parameters for the execution of the genetic algorithm is categorized as follows:

  • Tuning parameters such as crossover, mutation ratio, or soft limit on the population growth
  • Data representation parameters such as quantization and encoding
  • A scoring scheme

The four configuration parameters for the GA are the maximum number of reproduction cycles (MAX_CYCLES) allowed in the execution, the crossover (XOVER), the mutation ratio (MU), and the soft limit function (softLimit) to control the population growth. The soft limit is implemented as a linearly decreasing function of the number of cycles (n) to retrain the growth of the population as the execution of the genetic algorithm progresses:

val XOVER = 0.8  //Probability(ratio) for cross-over
val MU = 0.4  //Probability(ratio) for mutation
val MAX_CYCLES = 400  //Max. number of optimization cycles 

val CUTOFF_SLOPE = -0.003  //Slope linear soft limit
val CUTOFF_INTERCEPT = 1.003  //Intercept linear soft limit
val softLimit = (n: Int) => CUTOFF_SLOPE*n +CUTOFF_INTERCEPT

The trading strategies are converted into chromosomes through encoding (line 75). A digitize quantization scheme has to be implicitly defined in order to encode the target value in each trading signal (line 76):

implicit val encoding = defaultEncoding //75
val R = 1024  //Quantization ratio
implicit val digitize = new Quantization(R) //76

The scoring function computes the cost or unfitness of a trading strategy (chromosome) by applying the score function to each of the three trading signals (genes) it contains (line 77):

val scoring = (chr: Chromosome[Signal]) => {
  val signals: List[Gene] = chr.code
  chr.cost = signals.map(_.score).sum //77
}

Finding the best trading strategy

The trading strategies generated by the factory in the createStrategies method are fed to the genetic algorithm as the initial population (line 79). The upper limit to the population growth is set at eight times the size of the initial population (line 78):

createStrategies.map(strategies => {
  val limit = strategies.size <<3 //78
  val initial = Population[Signal](limit, strategies) //79

  val config = GAConfig(XOVER, MU, MAX_CYCLES,softLimit) //80
  val solver = GASolver[Signal](config,scoring,Some(tracker)) 
                                                //81
  (solver |> initial)
    .map(_.fittest.map(_.symbolic).getOrElse("NA")) match {
      case Success(results) => show(results)
      case Failure(e) => error("GAEval: ", e)
    } //82
})

The configuration, config (line 80), the scoring function, and optionally a tracker function are all that you need to create and execute the solver genetic algorithm (line 81). The partial function generated by the |> operator transforms the initial population of trading strategies into the two fittest strategies (line 82).

The documented source code for the monitoring function, tracker, and miscellaneous methods is available online.

Tests

The cost function C (or unfitness) score of each trading strategy are weighted for the rate of decline of the price of the Goldman Sachs stock. Let's run the following two tests:

  • Evaluation of the configuration of the genetic algorithm with the score weighted by the price variation
  • Evaluation of the genetic algorithm with an unweighted scoring function

The weighted score

The score is weighted by the variation of the price of the stock GS. The test uses three different sets of crossover and mutation ratios: (0.6, 0.2), (0.3, 0.1), and (0.2, 0.6). The best trading strategy for each scenario is as follows:

  • 0.6-0.2: change < 0.82 dVolume > 1.17 volatility > 1.35 cost= 0.0 fitness: 1.0E10
  • 0.3-0.1: change < 0.42 dVolume > 1.61 volatility > 1.08 cost= 59.18 fitness: 0.016
  • 0.2-0.6: change < 0.87 dVolume < 8.17 volatility > 3.91 cost= 301.3 fitness: 0.003

The fittest trading strategy for each case does not differ much from the initial population for one or several of the following reasons:

  • The initial guess for the trading signals was good
  • The size of the initial population is too small to generate genetic diversity
  • The test does not take into account the rate of decline of the stock price

The execution of the genetic algorithm with cross-over = 0.2 and mutation = 0.6 produces a trading strategy that is inconsistent with the first two cases. One possible explanation is the fact that the crossover is applied always to the first of the three genes, forcing the optimizer to converge toward a local minimum.

Let's examine the behavior of the genetic algorithm during execution. We are particularly interested in the convergence of the average chromosome unfitness score. The average chromosome unfitness is the ratio of the total unfitness score for the population over the size of the population. Let's take a look at the following graph:

The weighted score

The convergence of a genetic algorithm for the crossover ratio 0.2 and mutation 0.6 with a weighted score

The GA converges quite quickly and then stabilizes. The size of the population increases through crossover and mutation operations until it reaches the maximum of 256 trading strategies. The soft limit or constraint on the population size kicks in after 23 trading cycles. The test is run again with different values of crossover and mutation ratios, as shown in the following graph:

The weighted score

The impact of the crossover and mutation ratio on the convergence of a genetic algorithm with a weighted score

The profile of the execution of the genetic algorithm is not overly affected by the different values of crossover and mutation ratios. The chromosome unfitness score for the high crossover ratio (0.6) oscillates as the execution progresses. In some cases, the unfitness score between chromosomes is so small that the GA recycles the same few trading strategies.

The quick decline in the unfitness of the chromosomes is consistent with the fact that some of the fittest strategies were part of the initial population. It should, however, raise some concerns that the GA locked on a local minimum early on.

The unweighted score

The execution of a test that is similar to the previous one with the unweighted trading strategies (trading strategies that use the average price variation) scoring formula produces some interesting results, as shown in the following graph:

The unweighted score

The convergence of a genetic algorithm for the crossover ratio 0.4 and mutation 0.4 with an unweighted score

The profile for the size of the population is similar to the test using weighted scoring. However, the chromosome average cost pattern is somewhat linear. The unweighted (or averaging) adds the rate of decline of the stock price to the score (cost).

Note

The complexity of a scoring function

The complexity of the scoring (or computation of the cost) formula increases the odds of the genetic algorithm not converging properly. The possible solutions to the convergence problem are as follows:

  • Make the weighting function additive (less complex)
  • Increase the size and diversity of the initial population
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset