Levenberg–Marquardt implementation

The Levenberg–Marquardt algorithm reuses many features of the backpropagation algorithm, which is why this class inherits from the backpropagation class. The train function is essentially the same, except for the following piece of code:

  for (int rows_i = 0; rows_i < rows; rows_i++) {
    // forward pass for the current data row
    n = forward(n, rows_i);
    // fill in the Jacobian row that corresponds to this data row
    buildJacobianMatrix(n, rows_i);
    // accumulate the mean error over the dataset
    sumErrors = sumErrors + n.getErrorMean();
  }
  // solve the Levenberg–Marquardt equation and update all
  // weights once per epoch
  n = updateWeights(n);

The loop iterates over the training dataset, calling the buildJacobianMatrix method for each data row. This method relies on the inherited backpropagation code to compute the gradients.
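For orientation, the overall shape of buildJacobianMatrix might look as follows. This is a minimal sketch, not the chapter's exact code: the backpropagation call and the loop bounds are assumptions based on the snippets in this section.

  private void buildJacobianMatrix( NeuralNet n, int row ) {
    // run the inherited backpropagation pass so that every neuron's
    // sensibility (local gradient) is available for this data row
    // (assumed call; the actual inherited method name may differ)
    backpropagation( n, row );
    // hidden-layer columns: i runs over hidden neurons, j over that
    // neuron's inputs (j == numberOfInputs addresses the bias)
    for ( int i = 0; i < numberOfHiddenNeurons; i++ ) {
      for ( int j = 0; j <= numberOfInputs; j++ ) {
        // hidden-layer entry: see the setValue call shown below
      }
    }
    // output-layer entries follow, starting at column
    // ( numberOfInputs + 1 ) * numberOfHiddenNeurons
  }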

As seen in the LMA theory explained earlier, each row of the Jacobian matrix holds the partial derivatives of that row's error with respect to every weight and bias, laid out in a single serial sequence. The column assigned to each weight or bias is therefore as detailed in the following table:

Layer  | Weight or bias               | Position
-------+------------------------------+-------------------------------------------
Hidden | jth weight of the ith neuron | (i * (numberOfInputs + 1)) + j
Hidden | Bias of the ith neuron       | (i * (numberOfInputs + 1)) + numberOfInputs
Output | jth weight of the ith neuron | ((numberOfInputs + 1) * numberOfHiddenNeurons) + (i * (numberOfHiddenNeurons + 1)) + j
Output | Bias of the ith neuron       | ((numberOfInputs + 1) * numberOfHiddenNeurons) + (i * (numberOfHiddenNeurons + 1)) + numberOfHiddenNeurons
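To make the mapping concrete, the positions above can be written as two small index helpers. This is an illustrative sketch only: these methods do not appear in the chapter's code, and the fields numberOfInputs and numberOfHiddenNeurons are taken from the surrounding snippets.

  // Column of the jth weight of the ith hidden neuron;
  // j == numberOfInputs addresses that neuron's bias.
  int hiddenColumn( int i, int j ) {
    return ( i * ( numberOfInputs + 1 ) ) + j;
  }

  // Column of the jth weight of the ith output neuron;
  // j == numberOfHiddenNeurons addresses that neuron's bias.
  // The output section starts after all hidden-layer columns.
  int outputColumn( int i, int j ) {
    return ( ( numberOfInputs + 1 ) * numberOfHiddenNeurons ) +
        ( i * ( numberOfHiddenNeurons + 1 ) ) + j;
  }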

Since the buildJacobianMatrix method is otherwise quite similar to backpropagation, we will highlight only the construction of a Jacobian row. For the weights in the hidden layer, the following line of code is called:

jacobian.setValue( row, ( i * ( numberOfInputs + 1 ) ) + j,
    ( neuron.getSensibility() *
    n.getTrainSet()[row][j] ) / n.getErrorMean() );

We can see that each Jacobian entry is the partial derivative of the row's error with respect to one weight: the hidden neuron's sensibility (its back-propagated local gradient) multiplied by the input feeding that weight, here normalized by the error mean. Now, for the output layer, we use the following:

jacobian.setValue( row,
    ( numberOfInputs + 1 ) * ( numberOfHiddenNeurons ) +
    ( i * ( numberOfHiddenNeurons + 1 ) ) + j,
    ( output.getSensibility() * neuron.getOutputValue() ) / 
        n.getErrorMean() );

In this piece of code, the neuron object refers to the jth hidden neuron, whose output value feeds the weight being differentiated.

One more difference between backpropagation and the Levenberg–Marquardt algorithm is that here the weights are updated once per epoch, not after every data point. This is necessary because the Jacobian matrix is built from the entire dataset.

We can see in the train method that, after building the Jacobian matrix, the algorithm calls the updateWeights method. There, the Levenberg–Marquardt matrix equation is solved, and each weight is then incremented by its corresponding entry of the resulting delta matrix.
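For reference, the system that updateWeights solves is the standard Levenberg–Marquardt equation, where J is the Jacobian, e is the vector of output errors, λ is the damping factor, and Δ is the weight correction:

  (Jᵀ J + λ I) Δ = Jᵀ e

so that each weight is then updated as w ← w + Δ.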

Solution of the Levenberg–Marquardt matrix equation:

// term1 = (J^T * J) + (lambda * I), where lambda is the damping factor
Matrix term1 = jacobian.transpose().multiply(jacobian)
    .add(new IdentityMatrix(jacobian.getNumberOfColumns())
    .multiply(damping));
// term2 = J^T * e, where e is the error matrix
Matrix term2 = jacobian.transpose().multiply(error);
// delta = term1^-1 * term2
Matrix delta = term1.inverse().multiply(term2);
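A note on this design: computing term1.inverse() explicitly is acceptable for the small networks in this chapter, since the matrix has only as many rows and columns as there are weights and biases. For larger networks, solving the linear system term1 * delta = term2 directly (for instance, with an LU or Cholesky decomposition) is generally faster and more numerically stable than forming the inverse.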

Update of the jth weight of the ith neuron in the hidden layer, where neuron is the ith hidden neuron:

newWeight = neuron.getListOfWeightIn().get( j ) +
    delta.getValue( ( i * ( numberOfInputs + 1 ) ) + j, 0 );
neuron.getListOfWeightIn().set( j, newWeight );

For the output layer, where neuron is now the jth hidden neuron and getListOfWeightOut holds its connections to the output neurons:

newWeight = neuron.getListOfWeightOut().get( i ) +
    delta.getValue( ( numberOfInputs + 1 ) *
        ( numberOfHiddenNeurons ) +
        ( i * ( numberOfHiddenNeurons + 1 ) ) + j, 0 );
neuron.getListOfWeightOut().set( i, newWeight );
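Finally, a minimal usage sketch, assuming the conventions of this section. The class name LevenbergMarquardt, the setDamping setter, and the train signature are illustrative assumptions; check the chapter's codebase for the exact API:

  NeuralNet net = new NeuralNet();                   // network to train
  LevenbergMarquardt lma = new LevenbergMarquardt(); // assumed class name
  lma.setDamping( 0.1 );   // hypothetical setter for the damping factor
  net = lma.train( net );  // runs the epoch loop shown above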