Neural networks in pattern recognition

For pattern recognition, the neural network architectures that can be applied are the MLP (supervised) and the Kohonen network (unsupervised). In the first case, the problem should be set up as a classification problem; that is, the data should be transformed into an X-Y dataset, where every data record in X has a corresponding class in Y. As stated in Chapter 3, Perceptrons and Supervised Learning, and Chapter 6, Classifying Disease Diagnosis, the output of a neural network for classification problems should cover all of the possible classes, and this may require preprocessing of the output records.

For the other case, unsupervised learning, there is no need to apply labels to the output, but the input data should be properly structured. As a reminder, the schemas of both neural networks are shown in the next figure:

[Figure: Schemas of the MLP (supervised) and the Kohonen network (unsupervised)]

Data pre-processing

As previously seen in Chapter 6, Classifying Disease Diagnosis, and Chapter 7, Clustering Customer Profiles, we have to deal with all possible types of data, i.e., numerical (continuous and discrete) and categorical (ordinal or unscaled).

However, here we have the possibility of performing pattern recognition on multimedia content, such as images and videos. So, how can multimedia be handled? The answer lies in the way this content is stored in files. Images, for example, are stored as grids of small colored points called pixels. Each color can be coded in RGB notation, where the intensities of red, green, and blue define every color the human eye is able to see. Therefore, an image of dimension 100x100 has 10,000 pixels, each one having three values for red, green, and blue, yielding a total of 30,000 values. That is the challenge for image processing in neural networks.
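To make this concrete, the following sketch flattens an image into a single numerical vector with three values per pixel, using only the standard javax.imageio and java.awt.image APIs. The class and method names here are ours, for illustration only, and are not part of the book's framework:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ImageToVector {
    // Flattens an image into one vector with three values (red, green, blue)
    // per pixel, scaled to [0, 1]. A 100x100 image yields 30,000 values.
    public static double[] toRgbVector(File imageFile) throws IOException {
        BufferedImage img = ImageIO.read(imageFile);
        double[] vector = new double[img.getWidth() * img.getHeight() * 3];
        int i = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y); // packed as 0xAARRGGBB
                vector[i++] = ((rgb >> 16) & 0xFF) / 255.0; // red
                vector[i++] = ((rgb >> 8) & 0xFF) / 255.0;  // green
                vector[i++] = (rgb & 0xFF) / 255.0;         // blue
            }
        }
        return vector;
    }
}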

Some methods, which we'll review in the next chapter, can reduce this huge number of dimensions. Afterwards, an image can be treated as a big matrix of continuous numerical values.

For simplicity, we will work only with small, grayscale images in this chapter.

Text recognition (optical character recognition)

Many documents are now being scanned and stored as images, making it necessary to convert these documents back into text so that a computer can apply editing and text processing. However, this task involves a number of challenges:

  • Variety of text fonts
  • Text sizes
  • Image noise
  • Handwriting (manuscripts)

In spite of that, humans can easily interpret and read text, even in a poor-quality image. This can be explained by the fact that humans are already familiar with the characters and the words of their language. Somehow, the algorithm must become acquainted with these elements (characters, digits, signs, and so on) in order to successfully recognize text in images.

Digit recognition

Although a variety of OCR tools are available on the market, it still remains a big challenge for an algorithm to properly recognize text in images. So, we will restrict our application to a smaller domain, where the problems we face are simpler. In this chapter, we are going to implement a neural network to recognize the digits from 0 to 9 represented in images. For the sake of simplicity, the images will have standardized, small dimensions.

Digit representation

We adopted a standard dimension of 10x10 (100 pixels) for the grayscale images, resulting in 100 gray values per image:

[Figure: The digit 3 and its corresponding matrix of gray values]

In the preceding image, we have a sketch representing the digit 3 on the left and, on the right, the corresponding matrix of gray values for the same digit.

We apply this preprocessing in order to represent all ten digits in this application.
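The conversion from color to gray can be done with the usual luminance weights. The helper below is a minimal sketch with assumed names, not the book's actual preprocessing code:

import java.awt.image.BufferedImage;

public class GrayScaleConverter {
    // Converts an image (e.g., 10x10) into a matrix of gray values in
    // [0, 255], using the standard luminance approximation for RGB.
    public static double[][] toGrayMatrix(BufferedImage img) {
        double[][] gray = new double[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                gray[y][x] = 0.299 * r + 0.587 * g + 0.114 * b;
            }
        }
        return gray;
    }
}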

Implementation in Java

To recognize optical characters, we produced the data to train and test the neural network ourselves. In this example, gray values from 0 (pure black) to 255 (pure white) were considered. Based on the pixel arrangement, two versions of each digit's data were created: one for training and another for testing. The classification techniques presented in Chapter 3, Perceptrons and Supervised Learning, and Chapter 6, Classifying Disease Diagnosis, will be used here.

Generating data

The digits from zero to nine were drawn in Microsoft Paint®. The images were then converted into matrices, some examples of which are shown in the following image. All pixel values are gray levels between 0 and 255:

[Figure: Examples of digit images converted into matrices of gray values]

For each digit, we generated five variations: one is the perfect digit, and the others contain noise, introduced either in the drawing or by the image quality.

The rows of each matrix were merged into a single vector; these vectors form the patterns (Dtrain and Dtest) that will be used to train and test the neural network. Therefore, the input layer of the neural network will be composed of 100 neurons, one per pixel.
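Merging the rows can be done with a simple helper such as the following sketch (the name flatten is ours; the book's actual loading code is omitted):

// Flattens a 10x10 matrix of gray values, row by row, into a
// 100-element input vector for the neural network.
static double[] flatten(double[][] matrix) {
    double[] vector = new double[matrix.length * matrix[0].length];
    int i = 0;
    for (double[] row : matrix)
        for (double value : row)
            vector[i++] = value;
    return vector;
}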

The output dataset is represented by ten patterns. Each pattern has a single expressive value (one) at the position assigned to the corresponding digit, and the rest of the values are zero. Therefore, the output layer of the neural network will have ten neurons.
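This is the usual one-hot encoding. A minimal helper (ours, for illustration; the position assigned to each digit must match the ordering used in the dataset, shown in the output comparison table later in this chapter) would be:

// Builds the ten-position target pattern for a digit: a single one at
// the position assigned to that digit, zeros everywhere else.
static double[] oneHotTarget(int digitPosition) {
    double[] target = new double[10];
    target[digitPosition] = 1.0;
    return target;
}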

Neural architecture

So, in this application, our neural network will have 100 inputs (for images with a 10x10 pixel size) and ten outputs; the number of hidden neurons is not restricted. We created a class called DigitExample in the package examples.chapter08 to handle this application. The neural network architecture was chosen with these parameters:

  • Neural network type: MLP
  • Training algorithm: Backpropagation
  • Number of hidden layers: 1
  • Number of neurons in the hidden layer: 18
  • Number of epochs: 1000
  • Minimum overall error: 0.001
[Figure: Neural network architecture for digit recognition]

Experiments

Now, as in the cases presented previously, let's find the best neural network topology by training several nets. The strategy is summarized in the following table:

Experiment | Learning rate | Activation functions
-----------|---------------|----------------------------------------------
#1         | 0.3           | Hidden layer: SIGLOG; Output layer: LINEAR
#2         | 0.5           | Hidden layer: SIGLOG; Output layer: LINEAR
#3         | 0.8           | Hidden layer: SIGLOG; Output layer: LINEAR
#4         | 0.3           | Hidden layer: HYPERTAN; Output layer: LINEAR
#5         | 0.5           | Hidden layer: HYPERTAN; Output layer: LINEAR
#6         | 0.8           | Hidden layer: HYPERTAN; Output layer: LINEAR
#7         | 0.3           | Hidden layer: HYPERTAN; Output layer: SIGLOG
#8         | 0.5           | Hidden layer: HYPERTAN; Output layer: SIGLOG
#9         | 0.8           | Hidden layer: HYPERTAN; Output layer: SIGLOG

The following code from the DigitExample class shows how the neural network is created and trained on the digit data:

// enter neural net parameter via keyboard (omitted)

// load dataset from external file (omitted)

// data normalization (omitted)
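// A hypothetical sketch of the omitted step (an assumption, not the book's
// exact code): min-max scaling of raw gray values from [0, 255] into [0, 1]:
//   for (int i = 0; i < rawPixels.length; i++)
//       input[i] = rawPixels[i] / 255.0;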

// create ANN and define parameters to TRAIN: 
Backpropagation backprop = new Backpropagation(nn, neuralDataSetToTrain, LearningAlgorithm.LearningMode.BATCH);
backprop.setLearningRate( typedLearningRate );
backprop.setMaxEpochs( typedEpochs );
backprop.setGeneralErrorMeasurement(Backpropagation.ErrorMeasurement.SimpleError);
backprop.setOverallErrorMeasurement(Backpropagation.ErrorMeasurement.MSE);
backprop.setMinOverallError(0.001);
backprop.setMomentumRate(0.7);
backprop.setTestingDataSet(neuralDataSetToTest);
backprop.printTraining = true;
backprop.showPlotError = true;

// train ANN:
try {
    backprop.forward();
    //neuralDataSetToTrain.printNeuralOutput();

    backprop.train();
    System.out.println("End of training");
    if (backprop.getMinOverallError() >= backprop.getOverallGeneralError()) {
        System.out.println("Training successful!");
    } else {
        System.out.println("Training was unsuccessful");
    }
    System.out.println("Overall Error:" + String.valueOf(backprop.getOverallGeneralError()));
    System.out.println("Min Overall Error:" + String.valueOf(backprop.getMinOverallError()));
    System.out.println("Epochs of training:" + String.valueOf(backprop.getEpoch()));

} catch (NeuralException ne) {
    ne.printStackTrace();
} 

// test ANN (omitted)

Results

After running each experiment using the DigitExample class, and collecting the training and testing overall errors as well as the number of correct classifications on the test data (following table), it is possible to observe that experiments #2 and #4 reached the lowest MSE values (#2 on the test data and #4 on the training data). The differences between these two experiments are the learning rate and the activation function used in the hidden layer.

Experiment | Training overall error | Testing overall error | Correct classifications
-----------|------------------------|------------------------|------------------------
#1         | 9.99918E-4             | 0.01221                | 2 of 10
#2         | 9.99384E-4             | 0.00140                | 5 of 10
#3         | 9.85974E-4             | 0.00621                | 4 of 10
#4         | 9.83387E-4             | 0.02491                | 3 of 10
#5         | 9.99349E-4             | 0.00382                | 3 of 10
#6         | 273.70                 | 319.74                 | 2 of 10
#7         | 1.32070                | 6.35136                | 5 of 10
#8         | 1.24012                | 4.87290                | 7 of 10
#9         | 1.51045                | 4.35602                | 3 of 10

The following figure shows the MSE evolution (training and testing) per epoch for experiment #2. It is interesting to notice that the curve stabilizes near the 30th epoch:

[Figure: MSE evolution per epoch for experiment #2]

The same graphical analysis was performed for experiment #8, whose MSE curve stabilizes near the 200th epoch:

[Figure: MSE evolution per epoch for experiment #8]

As already explained, MSE values alone are not enough to attest to a neural network's quality. Accordingly, the test dataset was used to verify the neural network's generalization capacity. The next table compares the real outputs (test dataset, with noise) with the outputs estimated by the networks of experiments #2 and #8. It is possible to conclude that the weights found in experiment #8 recognize seven digit patterns, two more than the five recognized in experiment #2.
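One common criterion for counting a classification as correct, and the one assumed in the sketch below (ours, independent of the book's framework), is that the largest value of the estimated output vector falls at the same position as the 1.0 in the real output:

// Returns the index of the largest value, that is, the winning output neuron.
static int argMax(double[] output) {
    int winner = 0;
    for (int i = 1; i < output.length; i++)
        if (output[i] > output[winner]) winner = i;
    return winner;
}

For example, for the digit 0 in experiment #8 below, the largest estimated value (0.39) falls at the same position as the 1.0 in the real output, so the digit counts as correctly recognized.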

Output comparison

Real output (test dataset)               | Digit
-----------------------------------------|------
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0  | 0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0  | 1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0  | 2
0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0  | 3
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0  | 4
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0  | 5
0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0  | 6
0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  | 7
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  | 8
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  | 9

Estimated output (test dataset), experiment #2          | Digit
--------------------------------------------------------|---------
0.20 0.26 0.09 -0.09 0.39 0.24 0.35 0.30 0.24 1.02      | 0 (OK)
0.42 -0.23 0.39 0.06 0.11 0.16 0.43 0.25 0.17 -0.26     | 1 (ERR)
0.51 0.84 -0.17 0.02 0.16 0.27 -0.15 0.14 -0.34 -0.12   | 2 (ERR)
-0.20 -0.05 -0.58 0.20 -0.16 0.27 0.83 -0.56 0.42 0.35  | 3 (OK)
0.24 0.05 0.72 -0.05 -0.25 -0.38 -0.33 0.66 0.05 -0.63  | 4 (ERR)
0.08 0.41 -0.21 0.41 0.59 -0.12 -0.54 0.27 0.38 0.00    | 5 (OK)
-0.76 -0.35 -0.09 1.25 -0.78 0.55 -0.22 0.61 0.51 0.27  | 6 (OK)
-0.15 0.11 0.54 -0.53 0.55 0.17 0.09 -0.72 0.03 0.12    | 7 (ERR)
0.03 0.41 0.49 -0.44 -0.01 0.05 -0.05 -0.03 -0.32 -0.30 | 8 (ERR)
0.63 -0.47 -0.15 0.17 0.38 -0.24 0.58 0.07 -0.16 0.54   | 9 (OK)

Estimated output (test dataset), experiment #8          | Digit
--------------------------------------------------------|---------
0.10 0.10 0.12 0.10 0.12 0.13 0.13 0.26 0.17 0.39       | 0 (OK)
0.13 0.10 0.11 0.10 0.11 0.10 0.29 0.23 0.32 0.10       | 1 (OK)
0.26 0.38 0.10 0.10 0.12 0.10 0.10 0.17 0.10 0.10       | 2 (OK)
0.10 0.10 0.10 0.10 0.10 0.17 0.39 0.10 0.38 0.10       | 3 (ERR)
0.15 0.10 0.24 0.10 0.10 0.10 0.10 0.39 0.37 0.10       | 4 (OK)
0.20 0.12 0.10 0.10 0.37 0.10 0.10 0.10 0.17 0.12       | 5 (ERR)
0.10 0.10 0.10 0.39 0.10 0.16 0.11 0.30 0.14 0.10       | 6 (OK)
0.10 0.11 0.39 0.10 0.10 0.15 0.10 0.10 0.17 0.10       | 7 (OK)
0.10 0.25 0.34 0.10 0.10 0.10 0.10 0.10 0.10 0.10       | 8 (ERR)
0.39 0.10 0.10 0.10 0.28 0.10 0.27 0.11 0.10 0.21       | 9 (OK)

Tip

The experiments shown in this chapter considered 10x10 pixel images. We recommend that you try using 20x20 pixel datasets to build a neural net able to classify digit images of that size.

You should also vary the training parameters of the neural net to achieve better classification.
