Input selection

One of the key tasks in designing a neural network application is selecting appropriate inputs. In the unsupervised case, one wishes to use only the relevant variables in which the neural network will find patterns. In the supervised case, the outputs need to be mapped to the inputs, so one should choose only the input variables that have some influence on the output.

Data correlation

One strategy that helps in selecting good inputs in the supervised case is the correlation between data series, which is applied in Chapter 5, Forecasting Weather. The correlation between data series is a measure of how one data sequence reacts to or influences the other. Suppose we have a dataset containing a number of data series, from which we choose one to be the output. Now we need to select the inputs from the remaining variables.

The correlation takes values from -1 to 1: values near +1 indicate a positive correlation, values near -1 indicate a negative correlation, and values near 0 indicate no correlation at all.

As an example, let's see three charts of two variables X and Y:

[Figure: three scatter plots of variables X and Y, showing negative, positive, and near-zero correlation]

In the first chart, on the left, one can see visually that as one variable decreases, the other increases (corr. -0.8). The middle chart shows the case in which the two variables vary in the same direction, therefore a positive correlation (corr. +0.7). The third chart, on the right, shows a case where there is no correlation between the variables (corr. -0.1).

There is no threshold rule as to which correlation value should be taken as a limit; it depends on the application. While absolute correlation values greater than 0.5 may be suitable for one application, in others, values near 0.2 may still add a significant contribution.
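As a quick illustration, the linear (Pearson) correlation between two series can be computed directly in plain Java. The following is a minimal standalone sketch, independent of the framework classes used in this book:

public class Correlation {

  // Pearson correlation coefficient between two series of equal length
  public static double pearson(double[] x, double[] y){
    int n = x.length;
    double meanX = 0, meanY = 0;
    for(int i=0;i<n;i++){ meanX += x[i]; meanY += y[i]; }
    meanX /= n; meanY /= n;
    double cov = 0, varX = 0, varY = 0;
    for(int i=0;i<n;i++){
      cov  += (x[i]-meanX)*(y[i]-meanY);
      varX += (x[i]-meanX)*(x[i]-meanX);
      varY += (y[i]-meanY)*(y[i]-meanY);
    }
    return cov / Math.sqrt(varX*varY);
  }

  public static void main(String[] args){
    double[] x = {1, 2, 3, 4, 5};
    double[] y = {2.1, 3.9, 6.2, 8.1, 9.8};
    System.out.println(pearson(x, y)); // close to +1: strong positive correlation
  }
}

Input variables whose absolute correlation with the chosen output falls below the adopted limit can then be left out.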

Transforming data

Linear correlation is very good at detecting relationships between data series when those relationships are presumably linear. However, if two data series form a parabola when plotted together, linear correlation won't be able to identify any relation. That's why we sometimes need to transform data into a view that exhibits a linear correlation.

Data transformation depends on the problem being faced. It consists of inserting an additional data series with processed data derived from one or more of the original series, for example through an equation (possibly nonlinear) that includes one or more parameters. Some behaviors are more detectable under a transformed view of the data.

Tip

Data transformation also requires a bit of knowledge about the problem. However, by looking at the scatter plot of two data series, it becomes easier to choose which transformation to apply.
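For instance, if Y grows with the square of X, the raw series may show almost no linear correlation, while the transformed series X² and Y show a strong one. A minimal sketch of such a transformation, reusing the pearson method above, could look like this:

public class Transformation {

  // derive a new data series by applying a (possibly nonlinear) function
  public static double[] square(double[] x){
    double[] x2 = new double[x.length];
    for(int i=0;i<x.length;i++) x2[i] = x[i]*x[i];
    return x2;
  }

  public static void main(String[] args){
    double[] x = {-3, -2, -1, 0, 1, 2, 3};
    double[] y = { 9.2, 4.1, 0.9, 0.1, 1.1, 3.9, 9.1 }; // roughly y = x^2
    System.out.println(Correlation.pearson(x, y));         // near 0
    System.out.println(Correlation.pearson(square(x), y)); // near +1
  }
}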

Dimensionality reduction

Another interesting point concerns removing redundant data. This is sometimes desired when there is a lot of available data, in both unsupervised and supervised learning. As an example, let's see a chart of two variables:

[Figure: two variables X and Y plotted over the same index, showing nearly identical shapes]

It can be seen that the X and Y variables share the same shape, which can be interpreted as redundancy: both variables carry almost the same information due to their high positive correlation. In such cases, one can consider a technique called Principal Component Analysis (PCA), which provides a good approach for dealing with them.

The result of PCA will be a new variable summarizing the previous two (or more). Basically, the original data series are centered by subtracting their means and then multiplied by the transposed eigenvectors of the covariance matrix. For two variables X and Y, the covariance matrix is:

S = \begin{bmatrix} S_{XX} & S_{XY} \\ S_{XY} & S_{YY} \end{bmatrix}

Here, S_{XY} is the covariance between the variables X and Y, and S_{XX} and S_{YY} are their variances.

The derived new data will then be:

Z = E^{T} (D - \bar{D})

where D denotes the original data series, \bar{D} their means, and E the matrix of eigenvectors of S.

Let's now see what the new variable looks like in a chart, compared to the original ones:

[Figure: the principal component plotted alongside the original X and Y variables]

In our framework, we are going to add a PCA class that performs this transformation as a preprocessing step before the data is fed into a neural network:

public class PCA {
    
  DataSet originalDS;
  int numberOfDimensions;
  DataSet reducedDS;
    
  DataNormalization normalization = new DataNormalization(DataNormalization.NormalizationTypes.ZSCORE);
    
  public PCA(DataSet ds,int dimensions){
    this.originalDS=ds;
    this.numberOfDimensions=dimensions;
  }
    
  public DataSet reduceDS(){
    //matrix algebra to calculate transformed data in lower dimension
    …
  }

  public DataSet reduceDS(int numberOfDimensions){
    //overload that sets the target dimensionality before reducing
    this.numberOfDimensions = numberOfDimensions;
    return reduceDS();
  }

}
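The body of reduceDS is omitted above. To make the underlying algebra concrete, the following standalone sketch (not part of the framework) performs the computation by hand for the two-variable case, where the eigenvectors of the 2 x 2 covariance matrix can be obtained analytically:

public class SimplePCA2D {

  // project two series onto the principal eigenvector of their
  // 2 x 2 covariance matrix, yielding one summarizing variable
  public static double[] firstComponent(double[] x, double[] y){
    int n = x.length;
    double meanX = 0, meanY = 0;
    for(int i=0;i<n;i++){ meanX += x[i]; meanY += y[i]; }
    meanX /= n; meanY /= n;

    // covariance matrix entries Sxx, Sxy, Syy
    double sxx = 0, sxy = 0, syy = 0;
    for(int i=0;i<n;i++){
      double dx = x[i]-meanX, dy = y[i]-meanY;
      sxx += dx*dx; sxy += dx*dy; syy += dy*dy;
    }
    sxx /= (n-1); sxy /= (n-1); syy /= (n-1);

    // largest eigenvalue of [[Sxx, Sxy], [Sxy, Syy]]
    double lambda = (sxx+syy)/2 + Math.sqrt(Math.pow((sxx-syy)/2, 2) + sxy*sxy);

    // corresponding normalized eigenvector
    double ex = sxy, ey = lambda - sxx;
    if(sxy == 0){ ex = (sxx >= syy ? 1 : 0); ey = (sxx >= syy ? 0 : 1); }
    double norm = Math.sqrt(ex*ex + ey*ey);
    ex /= norm; ey /= norm;

    // multiply the mean-centered data by the transposed eigenvector
    double[] z = new double[n];
    for(int i=0;i<n;i++)
      z[i] = ex*(x[i]-meanX) + ey*(y[i]-meanY);
    return z;
  }
}

Inside the framework, reduceDS would follow the same steps (presumably after the z-score normalization suggested by the normalization field), using a general eigen-decomposition so that any number of dimensions can be kept.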

Data filtering

Noisy data and bad data are also sources of problems in neural network applications; that's why we need to filter data. A common data filtering technique is to exclude the records that fall outside the usual range. For example, temperature values lie between -40 and 40, so a value such as 50 would be considered an outlier and could be taken out.

The 3-sigma rule is a good and effective measure for filtering. It consists of filtering out the values that lie more than three times the standard deviation away from the mean:

|x_i - \bar{x}| > 3\sigma

where \bar{x} is the mean and \sigma the standard deviation of the series.

Let's add a class to deal with data filtering:

public abstract class DataFiltering {
    
  DataSet originalDS;
  DataSet filteredDS;

}

public class ThreeSigmaRule extends DataFiltering {
    
  double thresholdDistance = 3.0;

  public ThreeSigmaRule(DataSet ds,double threshold){
    this.originalDS=ds;
    this.thresholdDistance=threshold;
  }
    
  public DataSet filterDS(){
    //matrix algebra to calculate the distance of each point in each column
    …
  }

}
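The body of filterDS is also omitted; a minimal standalone sketch of the 3-sigma rule applied to a single column of values could look like this:

import java.util.ArrayList;
import java.util.List;

public class SigmaFilter {

  // keep only the values lying within `threshold` standard deviations of the mean
  public static double[] filter(double[] values, double threshold){
    int n = values.length;
    double mean = 0;
    for(double v : values) mean += v;
    mean /= n;
    double var = 0;
    for(double v : values) var += (v-mean)*(v-mean);
    double std = Math.sqrt(var/(n-1));

    List<Double> kept = new ArrayList<>();
    for(double v : values)
      if(Math.abs(v-mean) <= threshold*std) kept.add(v);

    double[] result = new double[kept.size()];
    for(int i=0;i<result.length;i++) result[i] = kept.get(i);
    return result;
  }
}

In ThreeSigmaRule, the same distance test is presumably applied column by column, dropping the records that exceed the threshold in any column.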

These classes can be called in DataSet by the following methods, which are then called elsewhere for filtering and reducing dimensionality:

public DataSet applyPCA(int dimensions){
  PCA pca = new PCA(this,dimensions);
  return pca.reduceDS();
}
    
public DataSet filter3Sigma(double threshold){
  ThreeSigmaRule df = new ThreeSigmaRule(this,threshold);
  return df.filterDS();
}
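For example, assuming a DataSet instance named dataSet has already been loaded elsewhere, the two methods can be chained as follows:

// hypothetical usage: remove outliers first, then keep two principal components
DataSet filtered = dataSet.filter3Sigma(3.0);
DataSet reduced  = filtered.applyPCA(2);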

Cross-validation

Among the many strategies for validating a neural network, one very important one is cross-validation. This strategy ensures that all the data is presented to the neural network both as training data and as test data. The dataset is partitioned into K groups (folds), of which one is separated for testing while the others are used for training; the process is repeated so that each fold plays the test role once:

[Figure: K-fold cross-validation, with the dataset partitioned into K folds and each fold used once as the test set]

In our code, let's create a class called CrossValidation to manage cross-validation:

public class CrossValidation {
  NeuralDataSet dataSet;
  int numberOfFolds;
    
  public LearningAlgorithm la;
    
  double[] errorsMSE;
    
  public CrossValidation(LearningAlgorithm _la,NeuralDataSet _nds,int _folds){
    this.dataSet=_nds;
    this.la=_la;
    this.numberOfFolds=_folds;
    this.errorsMSE=new double[_folds];
  }
    
  public void performValidation() throws NeuralException{
    //shuffle the dataset
    NeuralDataSet shuffledDataSet = dataSet.shuffle();
    int subSize = shuffledDataSet.numberOfRecords/numberOfFolds;
    NeuralDataSet[] foldedDS = new NeuralDataSet[numberOfFolds];
    for(int i=0;i<numberOfFolds;i++){
      foldedDS[i]=shuffledDataSet.subDataSet(i*subSize,(i+1)*subSize-1);
    }
    //run the training, using each fold once as the test set
    for(int i=0;i<numberOfFolds;i++){
      NeuralDataSet test = foldedDS[i];
      //rebuild the training set from a fresh sub dataset of the first non-test
      //fold, so the folds in foldedDS are not modified between iterations
      int first = (i==0?1:0);
      NeuralDataSet training = shuffledDataSet.subDataSet(first*subSize,(first+1)*subSize-1);
      //append every remaining fold, skipping the test fold
      for(int k=first+1;k<numberOfFolds;k++){
        if(k!=i) training.append(foldedDS[k]);
      }
      la.setTrainingDataSet(training);
      la.setTestingDataSet(test);
      la.train();
      errorsMSE[i]=la.getMinOverallError();
    }
  }
}
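Assuming a learning algorithm (for example, a backpropagation instance called backprop) and a NeuralDataSet called neuralDataSet have been set up as in the previous chapters, a 10-fold validation would then be run as follows:

// hypothetical usage of the class above with 10 folds
CrossValidation cv = new CrossValidation(backprop, neuralDataSet, 10);
cv.performValidation();
// cv.errorsMSE now holds the minimum overall error obtained in each fold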

Structure selection

Choosing an adequate structure for a neural network is also a very important step. However, this is often done empirically, since there is no rule for how many hidden units a neural network should have. The only measure of whether a given number of units is adequate is the neural network's performance: one assumes that if the overall error is low enough, then the structure is suitable. Nevertheless, there might be a smaller structure that could yield the same result.

In this context, there are basically two methodologies: constructive and pruning. The constructive approach consists of starting with only the input and output layers and then adding new neurons to a hidden layer until a good result is obtained. The destructive approach, also known as pruning, starts from a bigger structure from which the neurons contributing little to the output are taken out.

The constructive approach is depicted in the following figure:

[Figure: the constructive approach, adding hidden neurons one at a time until the error becomes acceptable]
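A minimal sketch of the constructive search is shown below. The trainAndEvaluate callback is a hypothetical placeholder standing in for whatever builds, trains, and measures a network with a given number of hidden units; it is not part of the framework:

import java.util.function.IntToDoubleFunction;

public class ConstructiveSearch {

  // grow the hidden layer one neuron at a time until the error is acceptable;
  // trainAndEvaluate is a hypothetical callback returning the validation MSE
  public static int findHiddenUnits(IntToDoubleFunction trainAndEvaluate,
                                    int maxUnits, double targetError){
    for(int h=1;h<=maxUnits;h++){
      double mse = trainAndEvaluate.applyAsDouble(h);
      if(mse <= targetError) return h; // smallest structure that is good enough
    }
    return maxUnits; // no structure reached the target; return the largest tried
  }
}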

Pruning is the opposite path: given a high number of neurons, one wishes to prune those whose sensitivity is very low, that is, whose contribution to the error is minimal:

[Figure: pruning, removing the hidden neurons with very low sensitivity from a larger network]

To implement pruning, we've added the following properties to the class NeuralNet:

public class NeuralNet{
//…
  public Boolean pruning;
  public double sensitivityThreshold;
}

A method called removeNeuron in the class NeuralLayer sets all the connections of the neuron to zero, disables weight updating, and makes the neuron fire only zero at its output. This method is called if the pruning property of the NeuralNet object is set to true. The sensitivity calculation follows the chain rule, as shown in Chapter 3, Perceptrons and Supervised Learning, and is implemented in the calcNewWeight method:

@Override
public Double calcNewWeight(int layer,int input,int neuron){
  Double deltaWeight=calcDeltaWeight(layer,input,neuron);
  if(this.neuralNet.pruning){
    //compare the magnitude of the weight update against the sensitivity threshold
    if(Math.abs(deltaWeight)<this.neuralNet.sensitivityThreshold)
      neuralNet.getHiddenLayer(layer).removeNeuron(neuron);
  }
  return newWeights.get(layer).get(neuron).get(input)+deltaWeight;
}
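To enable pruning on an existing network, it is then enough to set the two properties added above; the threshold value below is only an assumption and should be tuned per application:

// hypothetical usage on an existing NeuralNet instance
neuralNet.pruning = true;
neuralNet.sensitivityThreshold = 1e-4; // assumed value; tune for each application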