Implementation

As mentioned earlier, the genetic operators are independent of the problem to be solved. Let's implement all the components of the reproduction cycle. The fitness function and the encoding scheme are highly domain specific.

In accordance with the principles of object-oriented programming, the software architecture defines the genetic operators using a top-down approach: starting with the population, then each chromosome, and down to each gene.

Software design

The implementation of the genetic algorithm uses a design that is similar to the template for classifiers (refer to the Design template for classifier section in the Appendix A, Basic Concepts).

The key components of the implementation of the genetic algorithm are as follows:

  • The Population class defines the current set of solution candidates or chromosomes.
  • The GASolver class implements the GA solver and has two components: a configuration object of the GAConfig type and the initial population. This class implements an explicit monadic data transformation of the ETransform type.
  • The GAConfig configuration class consists of the GA execution and reproduction configuration parameters.
  • The reproduction (of the Reproduction type) controls the reproduction cycle between consecutive generations of chromosomes through the mate method.
  • The GAMonitor monitoring trait tracks the progress of the optimization and evaluates the exit condition for each reproduction cycle.

The following UML class diagram describes the relation between the different components of the genetic algorithm:

Software design

The UML class diagram of genetic algorithm components

Let's start with defining the key classes that control the genetic algorithm.

Key components

The Population parameterized class (with the Gene subtype) contains the set or pool of chromosomes. A population contains chromosomes that are a sequence or list of elements of the type inherited from Gene. A Pool is a mutable array used in order to avoid excessive duplication of the Chromosome instances associated with immutable collections.

Note

The case for mutability

It is a good Scala programming practice to stay away from mutable collections. However, in this case, the number of chromosomes can be very large. Most implementations of genetic algorithms update the population potentially three times per reproduction cycle, generating a large number of objects and taxing the Java garbage collector.

Population

The Population class takes two arguments:

  • limit: This is the maximum size of the population
  • chromosomes: This is the pool of chromosomes that define the current population

A reproduction cycle executes the following sequence of three genetic operators on a population: select for the selection across all the chromosomes of the population (line 1), +- for crossover of all the chromosomes (line 2), and ^ for the mutation of each chromosome (line 3). Consider the following code:

type Pool[T <: Gene] = mutable.ArrayBuffer[Chromosome[T]]

class Population[T <: Gene](
    limit: Int, val chromosomes: Pool[T]) {    
  def select(score: Chromosome[T]=>Unit, cutOff: Double) //1
  def +- (xOver: Double) //2
  def ^ (mu: Double) //3
  …
}

The limit value specifies the maximum size of the population during optimization. It defines the hard limit or constraints on the population growth.

Chromosomes

The chromosome is the second level of containment in the genotype hierarchy. The Chromosome class takes a list of genes as parameter (code). The signature of the crossover and mutation methods, +- and ^, are similar to their implementation in the Population class except for the fact that the crossover and mutable parameters are passed as indices relative to the list of genes and each gene. The section dedicated to the genetic crossover describes the GeneticIndices class:

class Chromosome[T <: Gene](val code: List[T]) {   
  var cost: Double = Random.nextDouble //4
  def +- (that: Chromosome[T], idx: GeneticIndices): 
        (Chromosome[T], Chromosome[T]) 
  def ^ (idx: GeneticIndices): Chromosome[T]
   …
}

The algorithm assigns the (un)fitting score or a cost value to each chromosome to enable the ranking of chromosomes in the population, and ultimately, the selection of the fittest chromosomes (line 4).

Note

Fitness versus cost

The machine learning algorithms use the loss function or its variant as an objective function to be minimized. This implementation of the GA uses cost scores in order to be consistent with the concept of the minimization of the cost, loss, or penalty function.

Genes

Finally, the reproduction process executes the genetic operators on each gene:

class Gene(val id: String, 
    val target: Double, 
    op: Operator)
    (implicit quantize: Quantization, encoding: Encoding){//5
  lazy val bits: BitSet = apply(target, op)

  def apply(value: Double, op: Operator): BitSet //6
  def unapply(bitSet: BitSet): (Double, Operator) //7
   …
  def +- (index: Int, that: Gene): Gene //8
  def ^ (index: Int): Unit //9
  …
}

The Gene class takes three arguments and two implicit parameters, which are as follows:

  • id: This is the identifier of the gene. It is usually the name of the variable represented by the gene.
  • target: This is the target value or threshold to be converted or discretized into a bit string.
  • op: This is the operator that is applied to the target value.
  • quantize: This is the quantization or discretization class that converts a double value to an integer to be converted into bits and vice versa (line 5).
  • encoding: This is the encoding or bits layout of the gene as a pair of values and operators.

The apply method encodes a pair of value and operator into a bit set (line 6). An unapply method is the reverse operation of apply. In this case, it decodes a bit set into a pair of value and operator (line 7).

Note

unapply()

The unapply method reverses the state transition performed by the apply method. For example, if the apply method populates a collection, the unapply method clears the collection from its elements.

The implementation of the crossover (line 8) and mutation (line 9) operators on a gene is similar to the operations on the container chromosome.

The quantization is implemented as a case class:

case class Quantization(toInt: Double => Int,
     toDouble: Int => Double) {
   def this(R: Int) =  this((x: Double) => 
         (x*R).floor.toInt, (n: Int) => n/R) 
}

The first toInt function converts a real value to an integer and toDouble converts the integer back to a real value. The discretization and inverse functions are encapsulated into a class to reduce the risk of inconsistency between the two opposite conversion functions.

The instantiation of a gene converts the predicate representation into a bit string (bits of the java.util.BitSet type) using the quantization function, Quantization.toInt.

The layout of a gene is defined by the Encoding class as follows:

class Encoding(nValueBits: Int, nOpBits: Int) {
  val rValue = Range(0, nValueBits)  
  val length = nValueBits + nOpBits
  val rOp = Range(nValueBits, length)
}

The Encoding class specifies the bits layout of the gene as a number of bits, nValueBits, to encode the value and the number of bits, nOpBits, to encode the operator. The class defines the rValue range for the value and the rOp range for the operator. The client code has to be supplied to the implicit instance of the Encoding class.

The bit set, bitset, of the gene (encoding) is implemented by using the apply method:

def apply(value: Double, op: Operator): BitSet = {
  val bitset = new BitSet(encoding.length)  
  encoding.rOp foreach(i =>  //10
    if(((op.id>>i) & 0x01)==0x01) bitset.set(i))
  encoding.rValue foreach(i => //11
    if( ((quant.toInt(value)>>i) & 0x01)==0x01) bitset.set(i))
  bitset
}

The bits layout of the gene is created using java.util.BitSet. The op operator is encoded first through its identifier, id (line 10). The value is quantized by invoking the toInt method and then encoded (line 11).

The unapply method decodes the gene from a bit set or bit string to a pair of values and operators. The method uses the quantization instance to cover bits into values and a convert auxiliary function that is described along with its implementation in the source code, accompanying the book (line 12):

def unapply(bits: BitSet): (Double, Operator) = 
  (quant.toDouble(convert(encoding.rValue, bits)), 
   op(convert(encoding.rOp, bits))) //12

The Operator trait defines the signature of any operator. Each domain-specific problem requires a unique set of operations: Boolean, numeric, or string manipulation:

trait Operator {
   def id: Int
   def apply(id: Int): Operator
}

The preceding operator has two methods: an identifier id and an apply method that converts an index to an operator.

Selection

The first genetic operator of the reproduction cycle is the selection process. The select method of the Population class implements the steps of the selection phase to the population of chromosomes in the most efficient manner, as follows:

def select(score: Chromosome[T]=> Unit, cutOff: Double): Unit = {
  val cumul = chromosomes.map( _.cost).sum/SCALING_FACTOR //13
  chromosomes foreach( _ /= cumul) //14

  val _chromosomes = chromosomes.sortWith(_.cost < _.cost)//15
  val cutOffSize = (cutOff*_chromosomes.size).floor.toInt //16
  val popSize = if(limit < cutOffSize) limit else cutOffSize

  chromosomes.clear //17
  chromosomes ++= _chromosomes.take(popSize) //18
}

The select method computes the cumul cumulative sum of the cost (line 13) for the entire population. It normalizes the cost of each chromosome (line 14), orders the population by decreasing the value (line 15), and applies a cutOff soft limit function on the population growth (line 16). The next step reduces the size of the population to the lowest of the two limits: the hard limit, limit, or the soft limit, cutOffSize. Finally, the existing chromosomes are cleared (line 17) and updated with the next generation (line 18).

Note

Even population size

The next phase in the reproduction cycle is the crossover, which requires the pairing of parent chromosomes. It makes sense to pad the population so that its size is an even integer.

The score scoring function takes a chromosome as a parameter and returns the cost value for this chromosome.

Controlling the population growth

The natural selection process controls or manages the growth of the population of species. The genetic algorithm uses the following two mechanisms:

  • The absolute maximum size of the population (the hard limit).
  • The incentive to reduce the population as the optimization progresses (the soft limit). This incentive (or penalty) on the population growth is defined by the cutOff value used during selection (the select method).

The cutoff value is computed using a softLimit user-defined function of the Int => Double type, which is provided as a configuration parameter (softLimit(cycle: Int) => a.cycle +b).

The GA configuration

The four configurations and tuning parameters required by the genetic algorithm are as follows:

  • xOver: This is the crossover ratio (or probability) and has a value in the interval [0, 1]
  • mu: This is the mutation ratio
  • maxCycles: This is the maximum number of reproduction cycles
  • softLimit: This is the soft constraint on the population growth

Consider the following code:

class GAConfig(val xover: Double, 
     val mu: Double, 
     val maxCycles: Int, 
     val softLimit: Int => Double) extends Config {
   val mutation = (cycle : Int) => softLimit(cycle)
}

Crossover

As mentioned earlier, the genetic crossover operator couples two chromosomes to generate two offspring chromosomes that compete with all the other chromosomes in the population, including their own parents, in the selection phase of the next reproduction cycle.

Population

We use the +- notation as the implementation of the crossover operator in Scala. There are several options to select pairs of chromosomes for crossover. This implementation ranks the chromosomes by their fitness (or inverse cost) value and then divides the population into two halves. Finally, it pairs the chromosomes of identical rank from each half, as illustrated in the following diagram:

Population

Pairing of chromosomes within a population prior to crossover

The crossover implementation, +-, selects the parent chromosome candidates for crossover using the pairing scheme described earlier. Consider the following code:

def +- (xOver: Double): Unit = 
  if( size > 1) {
    val mid = size>>1
    val bottom = chromosomes.slice(mid, size) //19
    val gIdx = geneticIndices(xOver)  //20

    val offSprings = chromosomes.take(mid).zip(bottom)
          .map{ case (t, b) => t +- (b, gIdx) }.unzip //21
    chromosomes ++= offSprings._1 ++ offSprings._2 //22
  }

This method splits the population into two subpopulations of equal size (line 19) and applies the Scala zip and unzip methods to generate the set of pairs of offspring chromosomes (line 20). The +- crossover operator is applied to each chromosome pair to produce an array of pairs of offSprings (line 21). Finally, the crossover method adds offspring chromosomes to the existing population (line 22). The xOver crossover value is a probability randomly generated over the interval [config.xOver, 1].

The GeneticIndices case class defines two indices of the bit whenever a crossover or a mutation occurs. The first chOpIdx index is the absolute index of the bit affected by the genetic operation in the chromosome (line 23). The second geneOpIdx index is the index of the bit within the gene subjected to crossover or mutation (line 24):

case class GeneticIndices(val chOpIdx: Int, //23
     val geneOpIdx: Int)  //24

The geneticIndices method computes the relative indices of the crossover bit in the chromosomes and genes:

def geneticIndices(prob: Double): GeneticIndices = {
  var idx = (prob*chromosomeSize).floor.toInt //25
  val chIdx = if(idx == chromosomeSize) chromosomeSize-1 
       else idx //25

  idx = (prob*geneSize).floor.toInt  
  val gIdx = if(idx == geneSize) geneSize-1 else idx //26
  GeneticIndices(chIdx, gIdx)
}

The first chIdx indexer is the index or rank of the gene within the chromosome to be affected by the genetic operator (line 25). The second gIdx indexer is the relative index of the bit within the gene (line 26).

Let's consider a chromosome composed of 2 genes with 63 bits/elements each, as illustrated in the following diagram:

Population

The geneticIndices method computes the following:

  • The chIdx index of the gene within the chromosome and the gIdx index of the bit within the gene
  • The genetic operator selects the gene of the chIdx index (that is the second gene) to be altered
  • The genetic operator alters the chromosome at the bit of the gIdx index (that is chIdx*64 + gIdx)

Chromosomes

First, we need to define the Chromosome class, which takes a list of genes, code, (for genetic code) as the parameter:

val QUANT = 500
class Chromosome[T <: Gene](val code: List[T]) {  
  var cost: Double = QUANT*(1.0 + Random.nextDouble) //27

  def +- (that: Chromosome[T], indices: GeneticIndices): //28
     (Chromosome[T], Chromosome[T]) 
  def ^ (indices: GeneticIndices): Chromosome[T] //29
  def /= (normalizeFactor: Double): Unit =   //30
      cost /= normalizeFactor
  def decode(implicit d: Gene=>T): List[T] =  //31
      code.map( d(_)) 
  …
}

The cost (or unfitness) of a chromosome is initialized as a random value between QUANT and 2*QUANT (line 27). The genetic +- crossover operator generates a pair of two offspring chromosomes (line 28). The genetic ^ mutation operator creates a slightly modified (1 or 2 bits) clone of this chromosome (line 29). The /= method normalizes the cost of the chromosome (line 30). The decode method converts the gene to a logic predicate or rule using an implicit conversion, d, between a gene and its subclass (line 31).

Note

Cost initialization

There is no absolute rule to initialize the cost of the chromosomes from an initial population. However, it is recommended that you differentiate a chromosome using nonzero random values with a large range as their cost.

The implementation of the crossover for a pair of chromosomes using hierarchical encoding follows two steps:

  1. Find the gene on each chromosome that corresponds to the indices.chOpIdx crossover index and then swap the remaining genes.
  2. Split and splice the gene crossover at xoverIdx.

Consider the following code:

def +- (that: Chromosome[T], indices: GeneticIndices): 
    (Chromosome[T], Chromosome[T]) = {
  val xoverIdx = indices.chOpIdx //32
  val xGenes =  spliceGene(indices, that.code(xoverIdx)) //33

  val offSprng1 = code.slice(0, xoverIdx) ::: 
       xGenes._1 :: that.code.drop(xoverIdx+1) //34
  val offSprng2 = that.code.slice(0, xoverIdx) ::: 
      xGenes._2 :: code.drop(xoverIdx+1)
  (Chromosome[T](offSprng1), Chromosome[T](offSprng2)) //35
}

The crossover method computes the index xoverIdx of the bit that defines the crossover in each parent chromosome (line 32). The this.code(xoverIdx) and that.code(xoverIdx) genes are swapped and spliced by the spliceGene method to generate a spliced gene (line 33):

def spliceGene(indices: GeneticIndices, thatCode: T): (T,T) ={
  ((this.code(indices.chOpIdx) +- (thatCode,indices)), 
   (thatCode +- (code(indices.chOpIdx),indices)) )
}

The offspring chromosomes are gathered by collating the first xOverIdx genes of the parent chromosome, the crossover gene, and the remaining genes of the other parent (line 34). The method returns the pair of offspring chromosomes (line 35).

Genes

The crossover is applied to a gene using the +- method of the Gene class. The exchange of bits between the this and that genes uses the BitSet Java class to rearrange the bits after the permutation:

def +- (that: Gene, indices: GeneticIndices): Gene = {
  val clonedBits = cloneBits(bits) //36

  Range(indices.geneOpIdx, bits.size).foreach(n => 
    if( that.bits.get(n) ) clonedBits.set(n) 
    else clonedBits.clear(n) //37
   )
   val valOp = decode(clonedBits) //38
   new Gene(id, valOp._1, valOp._2)
}

The bits of the gene are cloned (line 36) and then spliced by exchanging their bits along with the indices.geneOpIdx crossover point (line 37). The cloneBits function duplicates a bit string, which is then converted into a (target value, operator) tuple using the decode method (line 38). We omit these two methods because they are not critical to the understanding of the algorithm.

Mutation

The mutation of the population uses the same algorithmic approach as the crossover operation.

Population

The ^ mutation operator invokes the same operator for all the chromosomes in the population and then adds the mutated chromosomes to the existing population, so that they can compete with the original chromosomes. We use the ^ notation to define the mutation operator to remind you that the mutation is implemented by flipping one bit:

def ^ (prob: Double): Unit = 
  chromosomes ++= chromosomes.map(_ ^ geneticIndices(prob))

The prob mutation parameter is used to compute the absolute index of the mutating gene, geneticIndices(prob).

Chromosomes

The implementation of the ^ mutation operator on a chromosome consists of mutating the gene of the indices.chOpIdx index (line 39) and then updating the list of genes in the chromosome (line 40). The method returns a new chromosome (line 41) that will compete with the original chromosome:

def ^ (indices: GeneticIndices): Chromosome[T] = { //39 
  val mutated = code(indices.chOpIdx) ^ indices 
  val xs = Range(0, code.size).map(i =>
    if(i== indices.chOpIdx) mutated 
    else code(i)).toList //40
  Chromosome[T](xs) //41
}

Genes

Finally, the mutation operator flips (XOR) the bit at the indices.geneOpIdx index:

def ^ (indices: GeneticIndices): Gene = { 
  val idx = indices.geneOpIdx
  val clonedBits = cloneBits(bits) //42
  
  clonedBits.flip(idx)  //43
  val valOp = decode(clonedBits)  //44
  new Gene(id, valOp._1, valOp._2) //45
}

The ^ method mutates the cloned bit string, clonedBits, (line 42) by flipping the bit at the indices.geneOpIdx index (line 43). It decodes and converts the mutated bit string by converting it into a (target value, operator) tuple (line 44). The last step creates a new gene from the target-operator tuple (line 45).

Reproduction

Let's wrap the reproduction cycle into a Reproduction class that uses the scoring function, score:

class Reproduction[T <: Gene](score: Chromosome[T] => Unit)

The mate reproduction function implements the sequence or workflow of the three genetic operators: select for the selection, +- (xover) for the crossover, and ^ (mu) for the mutation:

def mate(population: Population[T], config: GAConfig, 
    cycle: Int): Boolean = (population.size: @switch) match {
  case 0 | 1 | 2 => false   //46
  case _ => {
    rand.setSeed(rand.nextInt + System.currentTimeMillis)
    population.select(score, config.softLimit(cycle)) //47
    population +- rand.nextDouble*config.xover //48
    population ^ rand.nextDouble*config.mu  //49
    true
  }
}

The mate method returns false (that is, the reproduction cycle aborts) if the population size is less than 3 (line 46). The chromosomes in the current population are ranked by the increasing cost. The chromosomes with the high cost or low fitness are discarded to comply with the soft limit, softLimit, on the population growth (line 47). The randomly generated probability is used as an input to the crossover operation on the entire remaining population (line 48) and as an input to the mutation of the remaining population (line 49):

Reproduction

An illustration of the linear and quadratic soft limit for the population growth

Solver

The GASolver class manages the reproduction cycles and the population of chromosomes. The solver is defined as a data transformation of the ETransform type using an explicit configuration of the GAConfig type, as described in the Monadic data transformation section in Chapter 2, Hello World! (line 50).

The GASolver class implements the GAMonitor trait to monitor the population diversity, manage the reproduction cycle, and control the convergence of the optimizer (line 51).

The genetic algorithm-based solver has the following three arguments:

  • config: This is the configuration of the execution of the genetic algorithm
  • score: This is the scoring function of a chromosome
  • tracker: This is the optional tracking function to initialize the monitoring function of GAMonitor

The code will be as follows:

class GASolver[T <: Gene](config: GAConfig, 
    score: Chromosome[T] => Unit,
    tracker: Option[Population[T] => Unit] = None) 
     extends ETransform[GAConfig](config) //50
        with GAMonitor[T] { //51

  type U = Population[T]  //52
  type V = Population[T]  //53

  val monitor: Option[Population[T] => Unit] = tracker
  def |>(initialize: => Population[T]): Try[Population[T]] = 
      this.|> (initialize()) //54
  override def |> : PartialFunction[U, Try[V]] //55
}

This explicit data transformation has to initialize the U type of an input element (line 52) and the V type of an output element (line 53) for the prediction or optimization method, |>. The optimizer takes an initial population as the input and generates a very small population of the fittest chromosomes from which the best solution is extracted (line 55).

The population is generated by the |> method ( => Population[T]) that takes the constructor of the Population class as an argument (line 54).

Let's briefly take a look at the GAMonitor monitoring trait assigned to the genetic algorithm. The trait has the following two attributes:

  • monitor: This is an abstract value to be initialized by classes that implement this trait (line 55).
  • state: This is the current state of the execution of the genetic algorithm. The initial state of the genetic algorithm is GA_NOT_RUNNING (line 56).

The code will be as follows:

trait GAMonitor[T <: Gene] extends Monitor {
  self: { 
    def |> :PartialFunction[Population[T],Try[Population[T]]] 
  } => //55
    val monitor: Option[Population[T] => Unit] //56
    var state: GAState = GA_NOT_RUNNING //57

    def isReady: Boolean = state == GA_NOT_RUNNING
    def start: Unit = state = GA_RUNNING
    def isComplete(population: Population[T], 
        remainingCycles: Int): Boolean  = { … } //58
}

The state of the genetic algorithm can only be updated in the |> method through an instance of the GAMonitor class. (line 55).

Here is a subset of the possible state of the execution of the genetic algorithm:

sealed abstract class GAState(description: String)
case class GA_FAILED(description: String) 
   extends GAState(description)
object GA_RUNNING extends GAState("Running")

The solver invokes the isComplete method to test the convergence of the optimizer at each reproduction cycle (line 58).

There are two options for estimating that the reproducing cycle is converging:

  • Greedy: In this approach, the objective is to check whether the n fittest chromosomes have not changed in the last m reproduction cycles
  • Loss function: This approach is similar to the convergence criteria for the training of supervised learning

Let's consider the following implementation of the genetic algorithm solver:

override def |> : PartialFunction[U, Try[V]] = {
  case population: U if(population.size > 1 && isReady) => {
    start //59
    val reproduction = Reproduction[T](score)  //60

    @tailrec
    def reproduce(population: Population[T], 
          n:Int): Population[T] = { //61
      if( !reproduction.mate(population, config, n) || 
         isComplete(population, config.maxCycles -n) )
       population
      else
       reproduce(population, n+1)
    }
    reproduce(population, 0)
    population.select(score, 1.0) //62
    Try(population)
  }
}

The optimizing method initializes the state of execution (line 59) and the components of the reproduction cycle (line 60). The reproduction cycle (or an epoch) is implemented as a tail recursion that tests whether the last reproduction cycle has failed or whether the optimization has converged toward a solution (line 61). Finally, the remaining fittest chromosomes are reordered by invoking the select method of the Population class (line 62).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset