Examples of learning algorithms

Let's now bring the theoretical content presented so far together in simple examples of learning algorithms. In this chapter, we are going to explore a couple of learning algorithms for single-layer neural networks; multiple layers will be covered in the next chapter.

In the Java code, we will create a new superclass, LearningAlgorithm, in a new package, edu.packt.neural.learn. Another useful package, edu.packt.neural.data, will be created to handle the datasets processed by the neural network, namely the classes NeuralInputData and NeuralOutputData, both referenced by the NeuralDataSet class. We recommend the reader take a glance at the code documentation to understand how these classes are organized; we omit the details here to save space.
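
Purely as an illustration of how such a dataset might be assembled, consider the sketch below; the constructor signature shown here is hypothetical, and the actual classes in edu.packt.neural.data may differ, so refer to the code documentation:

// Hypothetical illustration; the actual NeuralDataSet API may differ.
// Two input columns and one output column, four records:
double[][] inputs  = { {0.0, 0.0}, {0.0, 1.0}, {1.0, 0.0}, {1.0, 1.0} };
double[][] outputs = { {0.0}, {0.0}, {0.0}, {1.0} };
// Assumed constructor taking the raw input and output matrices:
NeuralDataSet trainingDataSet = new NeuralDataSet(inputs, outputs);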

The LearningAlgorithm class has the following attributes and methods:

public abstract class LearningAlgorithm {
    protected NeuralNet neuralNet; // the neural network to be trained
    public enum LearningMode {ONLINE, BATCH}
    protected enum LearningParadigm {SUPERVISED, UNSUPERVISED}
//…
    // execution parameters of the learning process
    protected int MaxEpochs=100;
    protected int epoch=0;
    protected double MinOverallError=0.001;
    protected double LearningRate=0.1;
    // datasets taken into account during learning
    protected NeuralDataSet trainingDataSet;
    protected NeuralDataSet testingDataSet;
    protected NeuralDataSet validatingDataSet;
    public boolean printTraining=false;
    public abstract void train() throws NeuralException;
    public abstract void forward() throws NeuralException;
    public abstract void forward(int i) throws NeuralException;
    public abstract Double calcNewWeight(int layer,int input,int neuron) throws NeuralException;
    public abstract Double calcNewWeight(int layer,int input,int neuron,double error) throws NeuralException;
//…
}

The neuralNet object is a reference to the neural network that will be trained by this learning algorithm. The enums define the learning mode and the learning paradigm. The execution parameters of the learning process are defined (MaxEpochs, MinOverallError, LearningRate), along with the datasets that will be taken into account during learning.

The method train( ) should be overridden by each learning algorithm implementation; the entire training process takes place in this method. The methods forward( ) and forward(int k) process the neural network with all input data and with the kth input data record, respectively. Finally, the method calcNewWeight( ) performs the update for the weight connecting an input to a neuron in a specific layer. A variation of calcNewWeight( ) allows a specific error value to be taken into account in the update operation.

The delta rule

This algorithm updates the weights according to the cost function. Following the gradient approach, one wants to know which weight changes can drive the cost function to a lower value; we can find that direction by computing the partial derivative of the cost function with respect to each of the weights. To aid understanding, let's consider a simple case with only one neuron, one weight, and one bias, and therefore one input. The output will be as follows:

Y = g(wX + b)

Here, g is the activation function, X is the vector containing x values, and Y is the output vector generated by the neural network. The general error for the kth sample is quite simple:

e^{(k)} = t^{(k)} - y^{(k)}

Here, t^{(k)} is the target output and y^{(k)} is the network output for the kth sample. It is possible to define this error as the square error, the N-degree error, or the MSE; for simplicity, let's take the simple difference as the general error. The overall error, which will be the cost function, is then computed as follows:

E = \frac{1}{N} \sum_{k=1}^{N} \left( e^{(k)} \right)^2
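
As a framework-independent illustration, the overall error for N records, using the simple difference as the general error and the MSE as the cost function, could be computed as follows:

// Standalone sketch: simple error per record, MSE over all records.
public static double overallError(double[] targets, double[] outputs) {
    double sum = 0.0;
    for (int k = 0; k < targets.length; k++) {
        double e = targets[k] - outputs[k]; // general error of record k
        sum += e * e;                       // squared term of the MSE
    }
    return sum / targets.length;            // mean squared error
}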

The weight and bias are updated according to the delta rule, which considers the partial derivatives ∂E/∂w and ∂E/∂b with respect to the weight and the bias, respectively. For the batch training mode, X and E are vectors, so the update involves a dot product over all records:

\Delta w = \alpha \sum_{k=1}^{N} e^{(k)} \, g'(w x^{(k)} + b) \, x^{(k)}
\Delta b = \alpha \sum_{k=1}^{N} e^{(k)} \, g'(w x^{(k)} + b)

If the training mode is online, we don't need to perform the dot product:

\Delta w = \alpha \, e^{(k)} \, g'(w x^{(k)} + b) \, x^{(k)}
\Delta b = \alpha \, e^{(k)} \, g'(w x^{(k)} + b)
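
To make the online update concrete, here is a minimal, framework-independent sketch of a single step for one neuron with one weight and one bias; g and gPrime stand for the activation function and its derivative:

// One online delta rule step for a single-input neuron (sketch).
// w, b: current weight and bias; x, t: input and target of record k;
// alpha: learning rate; g, gPrime: activation function and its derivative.
double net = w * x + b;            // weighted sum
double e   = t - g(net);           // general error for this record
w += alpha * e * gPrime(net) * x;  // weight update
b += alpha * e * gPrime(net);      // bias update (its input is fixed at 1.0)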

The learning rate

Note in the preceding equations the presence of the term α, the learning rate. It plays an important role in the weight update, because it determines how quickly or slowly the weights move toward the minimum of the cost function. Let's look at a cost error surface with respect to two weights:

[Figure: cost error surface as a function of two weights]
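
Although the surface plot cannot be reproduced here, the role of α can be sketched on the one-dimensional cost J(w) = w², whose gradient is 2w: a small α converges slowly, a moderate one quickly, and a too-large one diverges. This toy snippet is independent of the framework:

// Toy illustration of the learning rate on J(w) = w^2 (gradient 2w).
for (double alpha : new double[]{0.1, 0.5, 1.1}) {
    double w = 1.0;
    for (int i = 0; i < 10; i++) {
        w -= alpha * 2.0 * w; // gradient descent step
    }
    // alpha=0.1 creeps toward zero, alpha=0.5 reaches it in one step,
    // and alpha=1.1 oscillates with growing amplitude (divergence)
    System.out.println("alpha=" + alpha + " -> w=" + w);
}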

Implementing the delta rule

We will implement the delta rule in a class called DeltaRule, which extends the LearningAlgorithm class:

public class DeltaRule extends LearningAlgorithm {
    public ArrayList<ArrayList<Double>> error; // error per record/output pair
    public ArrayList<Double> generalError;     // one general error per record
    public ArrayList<Double> overallError;     // one overall error per output
    public double overallGeneralError;         // the cost function value
    public double degreeGeneralError=2.0;
    public double degreeOverallError=0.0;
    public enum ErrorMeasurement {SimpleError, SquareError, NDegreeError, MSE}

    public ErrorMeasurement generalErrorMeasurement=ErrorMeasurement.SquareError;
    public ErrorMeasurement overallErrorMeasurement=ErrorMeasurement.MSE;
    private int currentRecord=0; // index of the record being presented
    private ArrayList<ArrayList<ArrayList<Double>>> newWeights; // staged new weights
//…
}

The errors discussed in the error measurement section (general and overall errors) are implemented in the DeltaRule class, because the delta rule learning algorithm considers these errors during training. They are arrays because there is a general error for each dataset record and an overall error for each output. The attribute overallGeneralError takes on the cost function result, namely the overall error over all outputs and records. A matrix called error stores the error for each combination of record and output.

This class also allows multiple ways of calculating the overall and general errors. The attributes generalErrorMeasurement and overallErrorMeasurement can take one of the enum values for the simple error, the square error, the Nth-degree error (cubic, quadruple, and so on), or the MSE. As the code above shows, the defaults are the square error for the general error and the MSE for the overall error.

Two important attributes are worth noting in this code: currentRecord refers to the index of the record currently being fed into the neural network during training, and newWeights is a three-dimensional array that collects all the new weight values before they are applied to the neural network. The currentRecord attribute is useful in online training, and the newWeights array lets the neural network keep all of its original weights until the calculation of every new weight has finished, preventing weights from being updated during the forward processing stage, which could significantly compromise the training quality.
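
The implementation of applyNewWeights( ), which transfers the staged values into the network, is not reproduced here, but its role can be sketched as follows; the setWeight( ) method on the neuron is an assumption, so the actual code may differ:

// Sketch of the weight staging step; setWeight( ) is an assumed setter.
private void applyNewWeights() {
    for (int j = 0; j < neuralNet.getNumberOfOutputs(); j++) {
        // one extra position beyond the inputs holds the bias weight
        for (int i = 0; i <= neuralNet.getNumberOfInputs(); i++) {
            neuralNet.getOutputLayer().getNeuron(j)
                     .setWeight(i, newWeights.get(0).get(j).get(i));
        }
    }
}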

The core of the delta rule learning - train and calcNewWeight methods

To save space, we will not detail the implementation of the forward methods here. As described in the previous section, they feed the neural dataset records into the neural network and then calculate the error values:

@Override
public void train() throws NeuralException{
//…
  switch(learningMode){
    case BATCH: //this is the batch training mode
      epoch=0;
      forward(); //all data are presented to the neural network
      while(epoch<MaxEpochs && overallGeneralError>MinOverallError){ //continue condition
        epoch++; //new epoch                       
        for(int j=0;j<neuralNet.getNumberOfOutputs();j++){
          for(int i=0;i<=neuralNet.getNumberOfInputs();i++){
            //here the new weights are calculated
            newWeights.get(0).get(j).set(i,calcNewWeight(0,i,j));
          }
        }
//only after all weights are calculated, they are applied
        applyNewWeights();
// the errors are updated with the new weights
        forward();
      }
      break;
    case ONLINE://this is the online training
      epoch=0;
      int k=0;
      currentRecord=0; //this attribute is used in weight update
      forward(k); //only the k-th record is presented
      while(epoch<MaxEpochs && overallGeneralError>MinOverallError){
        for(int j=0;j<neuralNet.getNumberOfOutputs();j++){
          for(int i=0;i<=neuralNet.getNumberOfInputs();i++){
            newWeights.get(0).get(j).set(i,calcNewWeight(0,i,j));
          }
        }
//the new weights will be considered for the next record
        applyNewWeights();
        currentRecord=++k;
        if(k>=trainingDataSet.numberOfRecords){
          k=0; //if it was the last record, again the first
          currentRecord=0;
          epoch++; //epoch completes after presenting all records
        }
        forward(k); //presenting the next record
      }
      break;
  }
}

We note that in the train( ) method, there is a loop with a condition to continue training: the training stops when this condition no longer holds true. The condition checks the epoch number and the overall error; when the epoch number reaches the maximum or the overall error falls below the minimum, training finishes. However, there are cases in which the overall error never reaches the minimum requirement, and the limit on epochs guarantees that training eventually stops in those cases as well.
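
Putting the pieces together, a training run could be set up as in the sketch below; the constructor and the setters shown are assumptions, so check the accompanying code for the exact signatures:

// Hypothetical usage sketch; constructor and setter names are assumed.
DeltaRule deltaRule = new DeltaRule(neuralNet, trainingDataSet,
        LearningAlgorithm.LearningMode.ONLINE);
deltaRule.printTraining = true;       // public flag from LearningAlgorithm
deltaRule.setLearningRate(0.3);       // assumed setter for LearningRate
deltaRule.setMaxEpochs(1000);         // assumed setter for MaxEpochs
deltaRule.setMinOverallError(0.0001); // assumed setter for MinOverallError
deltaRule.train();                    // runs until a stop condition is met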

The new weight is calculated using the calcNewWeight( ) method:

@Override
public Double calcNewWeight(int layer,int input,int neuron)
            throws NeuralException{
//…
  Double deltaWeight=LearningRate;
  Neuron currNeuron=neuralNet.getOutputLayer().getNeuron(neuron);
  switch(learningMode){
    case BATCH: //Batch mode
      ArrayList<Double> derivativeResult=currNeuron.derivativeBatch(trainingDataSet.getArrayInputData());
      ArrayList<Double> _ithInput;
      if(input<currNeuron.getNumberOfInputs()){ // weights
        _ithInput=trainingDataSet.getIthInputArrayList(input);
      }
      else{ // bias
        _ithInput=new ArrayList<>();
        for(int i=0;i<trainingDataSet.numberOfRecords;i++){
          _ithInput.add(1.0);
        }
      }
      Double multDerivResultIthInput=0.0; // dot product
      for(int i=0;i<trainingDataSet.numberOfRecords;i++){
        multDerivResultIthInput+=error.get(i).get(neuron)*derivativeResult.get(i)*_ithInput.get(i);
      }
      deltaWeight*=multDerivResultIthInput;
      break;
    case ONLINE:
      deltaWeight*=error.get(currentRecord).get(neuron);
      deltaWeight*=currNeuron.derivative(neuralNet.getInputs());
      if(input<currNeuron.getNumberOfInputs()){
        deltaWeight*=neuralNet.getInput(input);
      }
      break;
  }
  return currNeuron.getWeight(input)+deltaWeight;
//…
}

Note that the weight update includes a call to the derivative of the activation function of the given neuron; this is required by the delta rule. In the activation function interface, we've added the method derivative( ), to be overridden in each of the implementing classes.

Note

For the batch mode, the call is to derivativeBatch( ), which receives and returns an array of values instead of a single scalar.
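
As an example of what these overrides might look like, here is a sketch for a sigmoid activation; the interface name IActivationFunction and the exact signatures are assumptions based on the text, not the book's exact code:

import java.util.ArrayList;

// Sketch of a sigmoid activation with the derivative methods; the
// interface and signatures are assumptions based on the text.
public class Sigmoid implements IActivationFunction {
    @Override
    public double calc(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }
    @Override
    public double derivative(double x) {
        double y = calc(x);
        return y * (1.0 - y); // g'(x) = g(x)(1 - g(x)) for the sigmoid
    }
    @Override
    public ArrayList<Double> derivativeBatch(ArrayList<Double> x) {
        ArrayList<Double> result = new ArrayList<>();
        for (Double value : x) {
            result.add(derivative(value)); // element-wise derivative
        }
        return result;
    }
}

At the neuron level, derivative( ) and derivativeBatch( ) would then apply these functions to the weighted sums of the given input records.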

As we've seen in the train( ) method, the new weights are stored in the newWeights attribute so as not to influence the current learning step, and they are applied only after the training iteration has finished.

Another learning algorithm - Hebbian learning

In the 1940s, the neuropsychologist Donald Hebb postulated that the connections between neurons that activate or fire simultaneously (in his words, repeatedly or persistently) should be strengthened. This is an unsupervised learning approach, since no target output is specified for Hebbian learning:

[Figure: Hebbian learning - simultaneous activation strengthens the connection between two neurons]

In summary, the weight update rule for Hebbian learning takes into account only the inputs and outputs of the neurons. Given a neuron j whose connection from neuron i (weight w_ij) is to be updated, the update is given by the following equation:

\Delta w_{ij} = \alpha \, o_i \, o_j

Here, α is the learning rate, o_j is the output of neuron j, and o_i is the output of neuron i, which is also input i of neuron j. For the batch training case, o_i and o_j are vectors, and we need to perform a dot product.

Since there is no error measurement in Hebbian learning, a stop condition can be based on the maximum number of epochs or on the increase of the overall average of the neural outputs. Given N records, we compute the expectation, or average, of all outputs produced by the neural network; when this average increases beyond a certain level, it is time to stop training, to prevent the neural outputs from blowing up. A sketch of such a check follows the Hebbian code below.

We'll develop a new class for Hebbian learning, also inheriting from LearningAlgorithm:

public class Hebbian extends LearningAlgorithm {
//…
    private ArrayList<ArrayList<ArrayList<Double>>> newWeights; // staged new weights
    private ArrayList<Double> currentOutputMean; // average output per neuron, current epoch
    private ArrayList<Double> lastOutputMean;    // average output per neuron, previous epoch
}

Apart from the absent error measures and the new output-mean measures, all parameters are identical to those of the DeltaRule class. The methods are also quite similar, except for calcNewWeight( ):

@Override
public Double calcNewWeight(int layer,int input,int neuron)
         throws NeuralException{
//…
  Double deltaWeight=LearningRate;
  Neuron currNeuron=neuralNet.getOutputLayer().getNeuron(neuron);
  switch(learningMode){
    case BATCH:
//…
//the batch case is analogous to the implementation in Delta Rule
//but with the neuron's output instead of the error
//we're suppressing here to save space
      break;
    case ONLINE:
      deltaWeight*=currNeuron.getOutput();
      if(input<currNeuron.getNumberOfInputs()){
        deltaWeight*=neuralNet.getInput(input);
      }
      break;
  }
  return currNeuron.getWeight(input)+deltaWeight;
}
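
As promised above, here is a sketch of how the output-mean stop condition might be checked at the end of each epoch; the method name and threshold parameter are assumptions:

// Sketch of a mean-based stop check for Hebbian learning (assumed helper).
// currentOutputMean holds each neuron's average output for this epoch;
// lastOutputMean holds the averages from the previous epoch.
private boolean outputsBlowingUp(double maxAllowedIncrease) {
    for (int j = 0; j < currentOutputMean.size(); j++) {
        double increase = currentOutputMean.get(j) - lastOutputMean.get(j);
        if (increase > maxAllowedIncrease) {
            return true; // average output grew too much: stop training
        }
    }
    return false;
}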

Adaline

Adaline is an architecture whose name stands for Adaptive Linear Neuron; it was developed by Bernard Widrow and Ted Hoff, based on the McCulloch-Pitts neuron. It has only one layer of neurons and can be trained similarly to the delta rule. The main difference is that the update rule uses the error between the target output and the weighted sum of inputs plus bias (the output before the activation function), instead of the error based on the neuron output after the activation function. This may be desirable when one wants to perform continuous learning in classification problems, whose post-activation outputs tend to be discrete rather than continuous values.

The following figure illustrates how Adaline learns:

[Figure: how Adaline learns - the error is taken from the output before the activation function]

So the weights are updated by the following equation:

w_i \leftarrow w_i + \alpha \, e^{(k)} \, z^{(k)} \, x_i^{(k)}, \qquad e^{(k)} = t^{(k)} - z^{(k)}, \quad z^{(k)} = \sum_i w_i x_i^{(k)} + b
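
In plain Java terms, and following the error definition above together with the multiplication performed in the calcNewWeight( ) code that follows, one online Adaline step could be sketched like this:

// One online Adaline update step for a single-input neuron (sketch).
// w, b: weight and bias; x, t: input and target; alpha: learning rate.
double z = w * x + b;    // weighted sum: the output before activation
double e = t - z;        // Adaline error: target vs. pre-activation output
w += alpha * e * z * x;  // weight update using the pre-activation output
b += alpha * e * z;      // bias update (its input is fixed at 1.0)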

In order to implement Adaline, we create a class called Adaline with the following overridden calcNewWeight( ) method. To save space, we're presenting only the online case:

@Override
public Double calcNewWeight(int layer,int input,int neuron)
            throws NeuralException{
//…
  Double deltaWeight=LearningRate;
  Neuron currNeuron=neuralNet.getOutputLayer().getNeuron(neuron);
  switch(learningMode){
    case BATCH:
//…
    break;
    case ONLINE:
      deltaWeight*=error.get(currentRecord).get(neuron)
        *currNeuron.getOutputBeforeActivation();
      if(input<currNeuron.getNumberOfInputs()){
        deltaWeight*=neuralNet.getInput(input);
      }
    break;
  }
  return currNeuron.getWeight(input)+deltaWeight;
}

Note the method getOutputBeforeActivation( ); we mentioned in the last chapter that this property would be useful in the future.
