Online retraining

During the learning process, it is important to design how the training should be performed. Two basic approaches are batch and incremental learning.

In batch learning, all the records are fed to the network, so it can evaluate the error and then update the weights.

[Figure: batch learning, where the weights are updated after all records have been presented]

In incremental learning, the update is performed after each record has been sent to the network.

[Figure: incremental learning, where the weights are updated after each record]

Both approaches work well and have advantages and disadvantages. While batch learning allows for less frequent, though more directed, weight updates, incremental learning provides a way for finely tuned weight adjustment. In this context, it is possible to design a mode of learning that enables the network to learn continually.
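To make the contrast concrete, the following is a minimal, self-contained Java sketch of the two update styles on a one-weight linear model. The class, data, and learning rate are illustrative assumptions, not code from this chapter's project:

import java.util.List;

/** Sketch: batch vs. incremental weight updates for a one-weight
 *  linear model y = w * x, trained on (x, target) pairs. */
public class UpdateStyles {

    // Gradient of the squared error 0.5*(w*x - t)^2 with respect to w.
    static double gradient(double w, double x, double t) {
        return (w * x - t) * x;
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(new double[]{1, 2}, new double[]{2, 4});
        double learningRate = 0.1;

        // Batch learning: one update from the gradient averaged over all records.
        double wBatch = 0.0;
        double sum = 0.0;
        for (double[] r : data) sum += gradient(wBatch, r[0], r[1]);
        wBatch -= learningRate * (sum / data.size());

        // Incremental learning: an update after every single record.
        double wIncremental = 0.0;
        for (double[] r : data) {
            wIncremental -= learningRate * gradient(wIncremental, r[0], r[1]);
        }

        System.out.printf("batch: %.4f, incremental: %.4f%n", wBatch, wIncremental);
    }
}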

Stochastic online learning

Offline learning means that the neural network learns while not in "operation." Every neural network application is supposed to work in an environment, and in order to go into production, it should be properly trained. Offline training is suitable for putting a network into operation, because during training its outputs may vary over a large range of values, which would certainly compromise the system if it were already operating. When it comes to online learning, however, there are restrictions. While in offline learning it is possible to use cross-validation and bootstrapping to estimate errors, in online learning this can't be done, since there is no longer a "training dataset." Nevertheless, online training is needed when some improvement in the neural network's performance is desired while it operates.

A stochastic method is used when online learning is performed. This algorithm for improving neural network training has two main features: random choice of samples for training, and variation of the learning rate at runtime (online). This training method is used when noise is present in the objective function: it helps the search escape local minima (good, but not the best, solutions) and reach the global minimum (the best solution).

The pseudo algorithm is as follows:

Initialize the weights.
Initialize the learning rate.
Repeat the following steps:
    Randomly select one (or possibly more) case(s)
        from the population.
    Update the weights by subtracting the gradient
        times the learning rate.
    Reduce the learning rate according to an
        appropriate schedule.

Note

The source of the pseudo algorithm can be found at ftp://ftp.sas.com/pub/neural/FAQ2.html#A_styles.
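To make the pseudo algorithm concrete before looking at the project code, here is a minimal, self-contained Java sketch of it on a toy problem (estimating the mean of a noisy population). The multiplicative decay schedule and all names here are illustrative assumptions:

import java.util.Random;

/** Sketch of the pseudo algorithm on a toy objective: finding the mean
 *  of a noisy population by stochastic gradient descent. */
public class StochasticOnlineSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);

        // Noisy "population": samples scattered around the true value 3.0.
        double[] population = new double[1000];
        for (int i = 0; i < population.length; i++) {
            population[i] = 3.0 + rnd.nextGaussian();
        }

        double weight = 0.0;          // initialize the weights
        double learningRate = 0.5;    // initialize the learning rate

        for (int step = 0; step < 10_000; step++) {
            // Randomly select one case from the population.
            double sample = population[rnd.nextInt(population.length)];

            // Gradient of 0.5*(weight - sample)^2 is (weight - sample);
            // update the weights by subtracting the gradient times the rate.
            weight -= learningRate * (weight - sample);

            // Reduce the learning rate according to a schedule (assumed here).
            learningRate *= 0.999;
        }
        System.out.printf("estimate: %.3f (true mean is 3.0)%n", weight);
    }
}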

Implementation

In the Java project, we have created the BackpropagationOnline class inside the learn package. The differences between this algorithm and classic backpropagation were programmed by changing the train() method and adding two new methods: generateIndexRandomList() and reduceLearningRate(). The first one generates a random list of indexes to be used in the training step, and the second one performs the online variation of the learning rate, according to the following heuristic:

private double reduceLearningRate(NeuralNet n, double percentage) {
    // Decay the current rate by the given percentage.
    double newLearningRate = n.getLearningRate()
            * ((100.0 - percentage) / 100.0);

    // Heuristic: once the rate decays below 0.1, reset it to 1.0,
    // so training keeps a chance of escaping local minima. This reset
    // is what produces the saw-shaped MSE curve shown later.
    if (newLearningRate < 0.1) {
        newLearningRate = 1.0;
    }

    return newLearningRate;
}

The train() method was also modified to comply with the pseudo algorithm presented earlier. The following code is the main part of this method:

// Shuffle the record indexes once, so records are presented in random order.
ArrayList<Integer> indexRandomList = generateIndexRandomList(rows);

while (getMse() > n.getTargetError()) {

    if (epoch >= n.getMaxEpochs()) break;

    double sumErrors = 0.0;

    for (int rows_i = 0; rows_i < rows; rows_i++) {
        // Present one randomly ordered record: forward pass, then
        // backpropagate and update the weights immediately (online).
        n = forward(n, indexRandomList.get(rows_i));
        n = backpropagation(n, indexRandomList.get(rows_i));

        sumErrors = sumErrors + n.getErrorMean();

        // Reduce the learning rate after every record (runtime variation).
        n.setLearningRate(reduceLearningRate(n,
                n.getLearningRatePercentageReduce()));
    }

    setMse(sumErrors / rows);
    n.getListOfMSE().add(getMse());

    epoch++;
}
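For orientation, here is a hypothetical driver for this class. The setter names mirror the getters used in train() above, and the train() signature and all values are assumptions, not code from the project:

// Hypothetical usage sketch; names and values are illustrative assumptions.
NeuralNet n = new NeuralNet();
n.setLearningRate(0.5);                   // initial learning rate (assumed)
n.setLearningRatePercentageReduce(0.01);  // per-record decay of 0.01% (assumed)
n.setTargetError(1.0e-4);                 // stop when MSE falls below this
n.setMaxEpochs(6000);                     // hard cap on training epochs

BackpropagationOnline online = new BackpropagationOnline();
n = online.train(n);                      // assumed signature
System.out.println("final MSE: " + online.getMse());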

Application

We have used data from previous chapters to test this new way of training neural nets. The experiments use the same neural network topologies defined in Chapter 5, Forecasting Weather, and Chapter 8, Pattern Recognition (OCR Case): the first is the weather forecasting problem, and the second is the OCR case. The following table compares the results.

Values                                   Forecast weather   OCR
Classic backpropagation learning rate    0.5                0.5
Classic backpropagation MSE value        0.2877786584       0.0011981712
Online backpropagation learning rate     0.15               0.40
Online backpropagation MSE value         0.4618623052       9.977909980E-6

The following graph shows the MSE evolution observed with the new training method on the forecast weather data. The curve has a saw-tooth shape because of the variation of the learning rate; otherwise, it is very similar to the curve shown in Chapter 5, Forecasting Weather.

[Figure: MSE evolution for the forecast weather data, trained with online backpropagation]

On the other hand, the following graph was produced using the OCR data. It shows that the training process was faster and stopped near the 900th epoch, because the MSE had already become very small. It's important to remember that in Chapter 8, Pattern Recognition (OCR Case), the training process was slower and continued until the 6000th epoch.

[Figure: MSE evolution for the OCR data, trained with online backpropagation]

Other experiments were also conducted: training neural nets with the classic backpropagation algorithm, but using the learning rate found by the online approach. The MSE values decreased in both problems.

The forecast weather MSE was about 0.206, against 0.287 (found in Chapter 5, Forecasting Weather), as shown in the following figure:

[Figure: MSE evolution for the forecast weather data, trained with classic backpropagation using the learning rate found online]

The OCR MSE was about 8.663E-6, against 0.001 (found in Chapter 8, Pattern Recognition (OCR Case)), as the following figure shows:

[Figure: MSE evolution for the OCR data, trained with classic backpropagation using the learning rate found online]

Another important observation: the training process shown in the preceding figure almost terminates by the 3000th epoch. Therefore, it is faster and better than the training process discussed in Chapter 8, Pattern Recognition (OCR Case), which used the same algorithm.
