A special type of activation function – Logistic regression

We've seen that neural networks can work as data classifiers by establishing decision boundaries over data in the hyperspace. Such a boundary can be linear, as in the case of single-layer perceptrons and Adaline, or nonlinear, as in the case of other neural architectures such as MLPs and Kohonen networks. The linear case is based on linear regression, in which the classification boundary is literally a line, as shown in the preceding figure. If the scatter chart of the data looks like the one shown in the following figure, then a nonlinear classification boundary is needed.

[Figure: scatter chart of data that requires a nonlinear classification boundary]

Neural networks are in fact capable nonlinear classifiers, and they achieve this through nonlinear activation functions. One nonlinear function that works well for nonlinear classification is the sigmoid function, and the procedure for classification using this function is called logistic regression.

f(x) = 1 / (1 + e^(-αx))

This function returns values bounded between 0 and 1. The α parameter controls how steep the transition from 0 to 1 is. The following chart shows the difference:

[Figure: the logistic function plotted for different values of α]

Note that the larger the value of the α parameter, the more the logistic function takes the shape of a hard-limiting threshold function, also known as a step function.
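The effect of α can be checked numerically. The following is a minimal Python sketch, assuming the logistic function has the common form 1 / (1 + e^(-αx)); the names logistic and step and the sample inputs are illustrative choices, not taken from the text.

```python
import math

def logistic(x, alpha=1.0):
    """Logistic (sigmoid) function; alpha controls the steepness (assumed form)."""
    return 1.0 / (1.0 + math.exp(-alpha * x))

def step(x):
    """Hard-limiting threshold function, shown for comparison."""
    return 1.0 if x >= 0 else 0.0

# Illustrative sample points on both sides of the transition at x = 0
samples = (-1.0, -0.1, 0.1, 1.0)

for alpha in (0.5, 1.0, 5.0, 50.0):
    values = [round(logistic(x, alpha), 3) for x in samples]
    print(f"alpha={alpha}: {values}")

print("step:", [step(x) for x in samples])
```

As α grows, the outputs at the sample points move toward 0 on the negative side and toward 1 on the positive side, which is the step-like behavior described above.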

Multiple classes versus binary classes

Classification problems usually involve multiple classes, each assigned a label. Neural networks, however, apply a binary classification scheme: a network with a logistic function at the output layer produces only values between 0 and 1, so each output indicates whether a record belongs (1) or does not belong (0) to a given class.

Nevertheless, binary outputs can still handle multiple classes. Let every class be represented by one output neuron; whenever that neuron fires, its class is assigned to the input data record. For example, suppose a network that classifies diseases, where each output neuron represents a disease that may be assigned to a given set of symptoms:

[Figure: neural network with one output neuron per disease]

Tip

Note that in this configuration, the same symptoms may be assigned to multiple diseases. If only one class should be chosen, a scheme such as competitive learning is more suitable.
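The two options can be sketched in a few lines of Python. The disease names, the net input values, and the 0.5 firing threshold below are all hypothetical, and the winner-takes-all rule is only a simple stand-in for a competitive scheme, not the algorithm itself.

```python
import math

def logistic(x):
    """Logistic activation applied at each output neuron."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical class labels, one per output neuron
DISEASES = ["flu", "cold", "allergy"]

def fired_classes(net_inputs, threshold=0.5):
    """Return every class whose output neuron fires (output >= threshold)."""
    outputs = [logistic(v) for v in net_inputs]
    return [d for d, o in zip(DISEASES, outputs) if o >= threshold]

def winner_class(net_inputs):
    """Return only the class with the strongest output (winner takes all)."""
    outputs = [logistic(v) for v in net_inputs]
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return DISEASES[best]

# Made-up net inputs of the three output neurons for one record
record = [2.0, 0.3, -1.5]
print(fired_classes(record))  # several classes may fire at once
print(winner_class(record))   # exactly one class is chosen
```

With independent thresholds, one record can be assigned several classes at once; picking the strongest output forces a single answer.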

Comparing the expected versus produced results – the confusion matrix

There is no perfect classification algorithm; all of them are subject to errors and biases. Still, a useful classifier is generally expected to classify roughly 70% to 90% of the records correctly.

Tip

Very high correct classification rates are not always desirable: biases present in the input data may affect the classification task, and there is a risk of overtraining (overfitting), in which only the training data are correctly classified.

A confusion matrix shows, for each class, how many of its records were correctly classified and how many were wrongly assigned to each of the other classes. The following table depicts what a confusion matrix may look like:

Actual class \ Inferred class   A     B     C     D     E     F     G     Total
A                               92%   1%    0%    4%    0%    1%    2%    100%
B                               0%    83%   5%    6%    2%    3%    1%    100%
C                               1%    3%    85%   0%    2%    5%    4%    100%
D                               0%    3%    0%    92%   2%    1%    1%    100%
E                               0%    10%   2%    1%    78%   1%    8%    100%
F                               22%   2%    2%    3%    3%    65%   3%    100%
G                               9%    6%    0%    16%   0%    3%    66%   100%

Note that the main diagonal is expected to hold the highest values, because the classification algorithm always tries to extract meaningful information from the input dataset. Each row must sum to 100% because every record of a given class is assigned to one of the available classes. The columns, however, need not sum to 100%: some classes may receive more classifications than expected.

The closer a confusion matrix is to an identity matrix, the better the classification algorithm.
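A confusion matrix of this kind can be built from paired lists of actual and inferred labels. The sketch below is one possible way to do it in Python; the function names and the ten sample records are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, inferred, labels):
    """Count records per (actual, inferred) pair; rows are actual classes."""
    counts = Counter(zip(actual, inferred))
    return [[counts[(a, p)] for p in labels] for a in labels]

def row_percentages(matrix):
    """Convert each row to percentages so that every non-empty row sums to 100."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result

# Made-up labels for ten records
labels = ["A", "B", "C"]
actual = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
inferred = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "A"]

matrix = confusion_matrix(actual, inferred, labels)
percent = row_percentages(matrix)

for label, row in zip(labels, percent):
    print(label, [f"{v:.0f}%" for v in row])
```

The diagonal entries of percent give the share of each class that was classified correctly.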

Classification measures – sensitivity and specificity

When the classification is binary, the confusion matrix reduces to a simple 2 x 2 matrix whose cells have specific names:

Actual class \ Inferred class   Positive (1)     Negative (0)
Positive (1)                    True Positive    False Negative
Negative (0)                    False Positive   True Negative

In disease diagnosis, which is the subject of this chapter, the binary confusion matrix applies in the sense that a wrong diagnosis is either a false positive or a false negative. The rate of false results can be measured using the sensitivity and specificity indexes.

Sensitivity denotes the true positive rate; it measures the proportion of actual positive records that are correctly classified as positive, where TP and FN are the numbers of true positives and false negatives:

Sensitivity = TP / (TP + FN)

Specificity in turn represents the true negative rate; it indicates the proportion of actual negative records that are correctly classified as negative, where TN and FP are the numbers of true negatives and false positives:

Specificity = TN / (TN + FP)

High values of both sensitivity and specificity are desired; however, depending on the application field, sensitivity may carry more meaning.
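Both measures follow directly from the four cells of the binary confusion matrix. Here is a minimal Python sketch, assuming the usual definitions of true positive rate and true negative rate; the function names and the sample labels are illustrative only.

```python
def binary_counts(actual, inferred):
    """Return (TP, FN, FP, TN) for labels encoded as 1 (positive) and 0 (negative)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, inferred):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def sensitivity(tp, fn):
    """True positive rate: share of actual positives classified as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)

# Made-up diagnosis results: 4 actual positives, 6 actual negatives
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
inferred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fn, fp, tn = binary_counts(actual, inferred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", round(specificity(tn, fp), 3))
```

In this made-up example, one missed positive lowers sensitivity, while two false alarms lower specificity.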
