Let's apply our expertise in genetic algorithms to evaluate different strategies to trade securities using trading signals. Knowledge in trading strategies is not required to understand the implementation of a GA. However, you may want to get familiar with the foundation and terminology of technical analysis of securities and financial markets, as described briefly in the Technical analysis section in the Appendix A, Basic Concepts.
The problem is to find the best trading strategy to predict the increase or decrease of the price of a security given a set of trading signals. A trading strategy is defined as a set of trading signals tsj that are triggered or fired when a variable x = {xj}, derived from financial metrics such as the price of the security or the daily or weekly trading volume, either exceeds or equals or is below a predefined target value αj (refer to the Trading signals and strategy section in the Appendix A, Basic Concepts).
The number of variables that can be derived from price and volume can be very large. Even the most seasoned financial professionals face two challenges, which are as follows:
Alternative to GA
The problem described earlier can certainly be solved using one of the machine learning algorithms introduced in the previous chapters. It is just a matter of defining a training set and formulating the problem as minimizing the loss function between the predictor and the training score.
The following table lists the trading classes with their counterpart in the genetic world:
Generic classes |
Corresponding securities trading classes |
---|---|
Operator |
|
Gene |
|
Chromosome |
|
Population |
|
A chromosome is the genetic encoding of a trading strategy. A factory class, StrategyFactory
, assembles the components of a trading strategy: operators, unfitness function, and signals
Let's extend the Operator
trait with the SOperator
class to define the operations that we need to trigger the signals. The SOperator
instance has a single parameter: its identifier, _id
. The class overrides the id()
method to retrieve the ID (similarly, the class overrides the apply
method to convert an ID into an SOperator
instance):
class SOperator(_id: Int) extends Operator { override def id: Int = _id override def apply(idx: Int): SOperator = new SOperator(idx) }
The operators used by trading signals are the logical operators: < (LESS_THAN
), > (GREATER_THAN
), and = (EQUAL
), as follows:
object LESS_THAN extends SOperator(1) object GREATER_THAN extends SOperator(2) object EQUAL extends SOperator(3)
Each operator of the SOperator
type is associated with a scoring function by the operatorFuncMap
map. The scoring function computes the cost (or unfitness) of the signal against a real value or a time series:
val operatorFuncMap = Map[Operator, (Double,Double) =>Double](
LESS_THAN -> ((x: Double, target: Double) => target - x),
… )
The select
method of Population
computes the cost
value of a signal by quantifying the truthfulness of the predicate. For instance, the unfitness value for a trading signal, x > 10, is penalized as 5 – 10 = -5 for x = 5 and credited as 14 – 10 = 4 if x = 14. In this case, the unfitness value is similar to the cost or loss in a discriminative machine learning algorithm.
Let's consider the following trading strategy defined as a set of two signals to predict the sudden relative decrease Δp of the price of a security:
Let's take a look at the following graphs:
As the goal is to model a sudden crash in the stock price, we should reward the trading strategies that predict the steep decrease in the stock price and penalize the strategies that work well only with a small decrease or increase in the stock price. In the case of the trading strategy with two signals, relative volume vm and relative volatility vl, n trading sessions, the cost or unfitness function C, and given a relative variation of the stock price and a penalization w = -Δp (M2):
Let's subclass the Gene
class to define the trading signal of the Signal
type as follows:
class Signal(id: String, target: Double, op: Operator, xt: DblVector, weights: Option[DblVector] = None) (implicit quantize: Quantization, encoding: Encoding) extends Gene(id, target, op)
The Signal
class requires the following arguments:
id
for the featuretarget
valueop
operatorxt
time series of the DblVector
typeweights
associated with each data point of the time series, xt
quantize
encoding
schemeThe main purpose of the Signal
class is to compute its score
as a chromosome. The chromosome updates its cost
by summing the score or weighted score of the signals it contains. The score of the trading signal is simply the summation of the penalty or truthfulness of the signal for each entry of the time series, ts
:
override def score: Double =
if(!operatorFuncMap.contains(op)) Double.MaxValue
else {
val f = operatorFuncMap.get(op).get
if( weights != None ) xt.zip(weights.get)
.map{case(x, w) => w*f(x,target)}.sum
else xt.map( f(_, target)).sum
}
A trading strategy is an unordered list of trading signals. It makes sense to create a factory class to generate the trading strategies. The StrategyFactory
class creates strategies of the List[Signal]
type from an existing pool of signals of the subtype, Gene
:
The StrategyFactory
class has two arguments: the number of signals, nSignals
, in a trading strategy and the implicit Quantization
and Encoding
instances (line 63
):
class StrategyFactory(nSignals: Int) //63 (implicit quantize: Quantization, encoding: Encoding){ val signals = new ListBuffer[Signal] lazy val strategies: Pool[Signal] //64 def += (id: String, target: Double, op: SOperator, xt: DblVector, weights: DblVector) … }
The +=
method takes five arguments: the identifier id
, the target
value, the op
operation to qualify the class as Gene
, the xt
times series for scoring the signals, and the weights
associated with the overall cost function. The StrategyFactory
class generates all possible sequences of signals as trading strategies as lazy values to avoid unnecessary regeneration of the pool on demand (line 64
), as follows:
lazy val strategies: Pool[Signal] = { implicit val ordered = Signal.orderedSignals //70 val xss = new Pool[Signal] //65 val treeSet = new TreeSet[Signal] ++= signals.toList //66 val subsetsIterator = treeSet.subsets(nSignals) //67 while( subsetsIterator.hasNext) { val signalList = subsetsIterator.next.toList //68 xss.append(Chromosome[Signal](signalList)) //69 } xss }
The implementation of the strategies
value creates a pool of signals Pool
(line 65
) by converting the list of signals to treeset
(line 66
). It breaks down the tree set into unique subtrees of nSignals
nodes each. It instantiates a subsetsIterator
iterator to traverse the sequence of subtrees (line 67
) and converts them into a list (line 68
) as arguments of the new chromosome (trading strategy) (line 69
). The procedure to order the signals, orderedSignals,
in the tree set has to be implicitly defined (line 70
) as val orderedSignals = Ordering.by((signal: Signal) => signal.id)
.
The encoding of trading predicates is the most critical element of the genetic algorithm. In our example, we encode a predicate as a tuple (target value, operator). Let's consider the simple predicate volatility > 0.62. The discretization converts the value 0.62 into 32 bits for the instance and a 2-bit representation for the operator:
IEEE-732 encoding
The threshold value for predicates is converted into an integer (the Int
type or Long
). The IEEE-732 binary representation of floating point values makes the bit addressing required to apply genetic operators quite challenging. A simple conversion consists of the following:
encoding e: (x: Double) => (x*100000).toInt
decoding d: (x: Int) => x*1e-5
All values are normalized, so there is no risk of overflowing the 32-bit representation.
The goal is to evaluate which trading strategy was the most relevant (fittest) during the crash of the stock market in fall 2008. Let's consider the stock price of one of the financial institutions, Goldman Sachs, as a proxy of the sudden market decline:
Besides the variation of the price of the stock between two consecutive trading sessions (dPrice
), the model uses the following parameters (or trading signals):
dVolume
: This is the relative variation of the volume between two consecutive trading sessionsdVolatility
: This is the relative variation of volatility between two consecutive trading sessionsvolatility
: This is the relative volatility within a trading sessionvPrice
: This is the relative difference of the stock opening and closing priceThe naming convention for the trading data and metrics is described in the Trading data section under Technical analysis in the Appendix A, Basic Concepts.
The execution of the genetic algorithm requires the following steps:
The input to the genetic algorithm is the population of trading strategies. Each strategy consists of the combination of three trading signals and each trading signal is a tuple (signal ID, operator, and target value).
The first step is to extract the model parameters as illustrated for the variation of the stock price volume, volatility, and relative volatility between two consecutive trading sessions (line 71
):
Import YahooFinancials._ val NUM_SIGNALS = 3 def createStrategies: Try[Pool[Signal]] = { val src = DataSource(path, false, true, 1) //71 for { //72 price <- src.get(adjClose) dPrice <- delta(price, -1.0) volume <- src.get(volume) dVolume <- delta(volume, 1.0) volatility <- src.get(volatility) dVolatility <- delta(volatility, 1.0) vPrice = src.get(vPrice) } yield { //72 val factory = new StrategyFactory(NUM_SIGNALS) //73 val weights = dPrice //74 factory += ("dvolume", 1.1, GREATER_THAN, dVolume, weights) factory += ("volatility", 1.3, GREATER_THAN, volatility.drop(1), weights) factory += ("vPrice", 0.8, LESS_THAN, vPrice.drop(1), weights) factory += ("dVolatility", 0.9, GREATER_THAN, dVolatility, weights) factory.strategies } }
The purpose is to generate the initial population of strategies that compete to become relevant to the decline of the price of stocks of Goldman Sachs. The initial population of trading strategies is generated by creating a combination from four trading signals weighted by the variation in the stock price: ∆(volume) > 1.1, ∆(volatility) > 1.3, ∆(close-open) < 0.8, and volatility > 0.9.
The delta
method computes the variation of a trading variable between consecutive trading sessions. It invokes the XTSeries.zipWithShift
method, which was introduced in the Time series in Scala section in Chapter 3, Data Preprocessing:
def delta(xt: DblVector, a: Double): Try[DblVector] = Try { zipWithShift(xt, 1).map{case(x, y) => a*(y/x - 1.0)} }
The trading strategies are generated by the StrategyFactory
class introduced in the previous section (line 73
). The weights
for the trading strategies are computed as the dPrice
difference of the price of the stock between two consecutive trading sessions (line 74
). The option of unweighted trading strategies is selected by replacing the weights
by the average price variation as follows:
val avWeights = dPrice.sum/dPrice.size
val weights = Vector.fill(dPrice.size)(avWeights)
The generation of the initial population of trading strategies is illustrated in the following diagram:
The configuration parameters for the execution of the genetic algorithm is categorized as follows:
The four configuration parameters for the GA are the maximum number of reproduction cycles (MAX_CYCLES
) allowed in the execution, the crossover (XOVER
), the mutation ratio (MU
), and the soft limit function (softLimit
) to control the population growth. The soft limit is implemented as a linearly decreasing function of the number of cycles (n
) to retrain the growth of the population as the execution of the genetic algorithm progresses:
val XOVER = 0.8 //Probability(ratio) for cross-over val MU = 0.4 //Probability(ratio) for mutation val MAX_CYCLES = 400 //Max. number of optimization cycles val CUTOFF_SLOPE = -0.003 //Slope linear soft limit val CUTOFF_INTERCEPT = 1.003 //Intercept linear soft limit val softLimit = (n: Int) => CUTOFF_SLOPE*n +CUTOFF_INTERCEPT
The trading strategies are converted into chromosomes through encoding
(line 75
). A digitize
quantization scheme has to be implicitly defined in order to encode the target value in each trading signal (line 76
):
implicit val encoding = defaultEncoding //75 val R = 1024 //Quantization ratio implicit val digitize = new Quantization(R) //76
The scoring
function computes the cost
or unfitness of a trading strategy (chromosome) by applying the score
function to each of the three trading signals (genes) it contains (line 77
):
val scoring = (chr: Chromosome[Signal]) => { val signals: List[Gene] = chr.code chr.cost = signals.map(_.score).sum //77 }
The trading strategies
generated by the factory in the createStrategies
method are fed to the genetic algorithm as the initial
population (line 79
). The upper limit
to the population growth is set at eight times the size of the initial population (line 78
):
createStrategies.map(strategies => { val limit = strategies.size <<3 //78 val initial = Population[Signal](limit, strategies) //79 val config = GAConfig(XOVER, MU, MAX_CYCLES,softLimit) //80 val solver = GASolver[Signal](config,scoring,Some(tracker)) //81 (solver |> initial) .map(_.fittest.map(_.symbolic).getOrElse("NA")) match { case Success(results) => show(results) case Failure(e) => error("GAEval: ", e) } //82 })
The configuration, config
(line 80
), the scoring function, and optionally a tracker function are all that you need to create and execute the solver
genetic algorithm (line 81
). The partial function generated by the |>
operator transforms the initial
population of trading strategies into the two fittest
strategies (line 82
).
The documented source code for the monitoring function, tracker, and miscellaneous methods is available online.
The cost function C (or unfitness) score of each trading strategy are weighted for the rate of decline of the price of the Goldman Sachs stock. Let's run the following two tests:
The score is weighted by the variation of the price of the stock GS. The test uses three different sets of crossover and mutation ratios: (0.6, 0.2), (0.3, 0.1), and (0.2, 0.6). The best trading strategy for each scenario is as follows:
The fittest trading strategy for each case does not differ much from the initial population for one or several of the following reasons:
The execution of the genetic algorithm with cross-over = 0.2 and mutation = 0.6 produces a trading strategy that is inconsistent with the first two cases. One possible explanation is the fact that the crossover is applied always to the first of the three genes, forcing the optimizer to converge toward a local minimum.
Let's examine the behavior of the genetic algorithm during execution. We are particularly interested in the convergence of the average chromosome unfitness score. The average chromosome unfitness is the ratio of the total unfitness score for the population over the size of the population. Let's take a look at the following graph:
The GA converges quite quickly and then stabilizes. The size of the population increases through crossover and mutation operations until it reaches the maximum of 256 trading strategies. The soft limit or constraint on the population size kicks in after 23 trading cycles. The test is run again with different values of crossover and mutation ratios, as shown in the following graph:
The profile of the execution of the genetic algorithm is not overly affected by the different values of crossover and mutation ratios. The chromosome unfitness score for the high crossover ratio (0.6) oscillates as the execution progresses. In some cases, the unfitness score between chromosomes is so small that the GA recycles the same few trading strategies.
The quick decline in the unfitness of the chromosomes is consistent with the fact that some of the fittest strategies were part of the initial population. It should, however, raise some concerns that the GA locked on a local minimum early on.
The execution of a test that is similar to the previous one with the unweighted trading strategies (trading strategies that use the average price variation) scoring formula produces some interesting results, as shown in the following graph:
The profile for the size of the population is similar to the test using weighted scoring. However, the chromosome average cost pattern is somewhat linear. The unweighted (or averaging) adds the rate of decline of the stock price to the score (cost).
The complexity of a scoring function
The complexity of the scoring (or computation of the cost) formula increases the odds of the genetic algorithm not converging properly. The possible solutions to the convergence problem are as follows: