Disease diagnosis with neural networks

For disease diagnosis, we are going to use the free proben1 dataset, which is available on the Web (http://www.filewatcher.com/m/proben1.tar.gz.1782734-0.html). Proben1 is a benchmark collection of several datasets from different domains. We are going to use the cancer and the diabetes datasets. We add a class, DiagnosisExample, to run the experiments of each case.

Breast cancer

The breast cancer dataset is composed of 10 variables: nine inputs and one binary output. The dataset has 699 records; we excluded 16 of them that were found to be incomplete, so we used 683 records to train and test the neural network.

Tip

In real-world problems, it is common to have missing or invalid data. Ideally, the classification algorithm should handle such records, but sometimes it is better to exclude them, since they may not carry enough information to produce an accurate result.
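For instance, a pre-processing step that drops incomplete records might look like the following sketch. The double[]-per-record representation and the use of NaN to mark missing values are assumptions for illustration, not the book's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class RecordFilter {
    // Returns only the rows that contain no missing (NaN) values;
    // incomplete records are simply excluded from the result.
    public static List<double[]> removeIncomplete(List<double[]> records) {
        List<double[]> complete = new ArrayList<>();
        for (double[] row : records) {
            boolean ok = true;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                complete.add(row);
            }
        }
        return complete;
    }
}
```

Applied to the breast cancer data, a filter like this would reduce the 699 raw records to the 683 complete ones used for training and testing.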

The following table shows the configuration of this dataset:

| Variable Name | Type | Minimum and Maximum Values |
| --- | --- | --- |
| Diagnosis result | OUTPUT | [0; 1] |
| Clump Thickness | INPUT #1 | [1; 10] |
| Uniformity of Cell Size | INPUT #2 | [1; 10] |
| Uniformity of Cell Shape | INPUT #3 | [1; 10] |
| Marginal Adhesion | INPUT #4 | [1; 10] |
| Single Epithelial Cell Size | INPUT #5 | [1; 10] |
| Bare Nuclei | INPUT #6 | [1; 10] |
| Bland Chromatin | INPUT #7 | [1; 10] |
| Normal Nucleoli | INPUT #8 | [1; 10] |
| Mitoses | INPUT #9 | [1; 10] |
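Since every input lies in the [1; 10] range, a simple min-max normalization can rescale the values into [0; 1] before they are fed to the network. A minimal sketch; the class and method names are illustrative, not part of the proben1 code:

```java
public class MinMaxNormalizer {
    // Scales a single value from [min, max] into [0, 1].
    public static double normalize(double value, double min, double max) {
        return (value - min) / (max - min);
    }

    // Applies the same scaling to every input of one record.
    public static double[] normalizeRow(double[] row, double min, double max) {
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = normalize(row[i], min, max);
        }
        return out;
    }
}
```

For the breast cancer inputs, normalize(value, 1.0, 10.0) maps 1 to 0.0 and 10 to 1.0, which keeps the sigmoid activations away from their saturated regions.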

So, the proposed neural topology will be that of the following figure:

[Figure: Breast cancer network topology]

The dataset division was made as follows:

  • Training: 549 records (80%)
  • Testing: 134 records (20%)
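An 80/20 division like this can be sketched as follows; shuffling with a fixed seed keeps the split reproducible. The DatasetSplitter class is hypothetical, not the book's code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DatasetSplitter {
    // Shuffles the records with a fixed seed (for reproducibility) and
    // splits them into a training subset and a testing subset.
    public static List<List<double[]>> split(List<double[]> records,
                                             double trainFraction, long seed) {
        List<double[]> shuffled = new ArrayList<>(records);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<double[]>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));               // training
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // testing
        return parts;
    }
}
```

Calling split(records, 0.8, someSeed) on the 683 breast cancer records yields the training/testing partition used in the experiments.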

As in the previous cases, we performed many experiments to find the best neural network for classifying whether a tumor is benign or malignant. We conducted 12 different experiments (1,000 epochs each), in which the MSE and accuracy values were analyzed. After that, the confusion matrix, sensitivity, and specificity were computed on the test dataset and analyzed. Finally, we analyzed the generalization ability of the networks. The neural networks involved in the experiments are shown in the following table:

| Experiment | Neurons in hidden layer | Learning rate | Activation function |
| --- | --- | --- | --- |
| #1 | 3 | 0.1 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #2 | 3 | 0.1 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #3 | 3 | 0.5 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #4 | 3 | 0.5 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #5 | 3 | 0.9 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #6 | 3 | 0.9 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #7 | 5 | 0.1 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #8 | 5 | 0.1 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #9 | 5 | 0.5 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #10 | 5 | 0.5 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #11 | 5 | 0.9 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #12 | 5 | 0.9 | Hidden layer: HYPERTAN; Output layer: LINEAR |

After each experiment, we collected the MSE values (following table). Experiments #4, #8, #9, #10, and #11 were equivalent, because they reached low MSE values and the same total accuracy (99.25%). Among these five, we selected experiments #4 and #11, because they have the lowest MSE values:

| Experiment | MSE (training) | Total accuracy |
| --- | --- | --- |
| #1 | 0.01067 | 96.29% |
| #2 | 0.00443 | 98.50% |
| #3 | 9.99611E-4 | 97.77% |
| #4 | 9.99913E-4 | 99.25% |
| #5 | 9.99670E-4 | 96.26% |
| #6 | 9.92578E-4 | 97.03% |
| #7 | 0.01392 | 98.49% |
| #8 | 0.00367 | 99.25% |
| #9 | 9.99928E-4 | 99.25% |
| #10 | 9.99951E-4 | 99.25% |
| #11 | 9.99926E-4 | 99.25% |
| #12 | NaN | 3.44% |

Graphically, the MSE drops very quickly, as can be seen in the following chart from the fourth experiment. Although we configured 1,000 training epochs, the experiment stopped earlier, because the minimum overall error (0.001) was reached:

[Figure: Breast cancer, MSE evolution in experiment #4]
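A training loop with this kind of early stopping can be sketched as follows. The EpochTrainer hook is a hypothetical stand-in for whatever method runs one epoch and returns the resulting MSE:

```java
public class EarlyStopTrainer {
    // Hypothetical hook: one call runs one training epoch and returns the MSE.
    public interface EpochTrainer {
        double trainEpoch();
    }

    // Runs up to maxEpochs epochs, stopping as soon as the MSE reaches the
    // minimum overall error; returns the number of epochs actually executed.
    public static int train(EpochTrainer trainer, int maxEpochs, double minError) {
        for (int epoch = 1; epoch <= maxEpochs; epoch++) {
            double mse = trainer.trainEpoch();
            if (mse <= minError) {
                return epoch; // converged before maxEpochs
            }
        }
        return maxEpochs;
    }
}
```

With maxEpochs = 1000 and minError = 0.001, the loop stops at the first epoch whose MSE falls at or below 0.001, exactly the behavior observed in experiment #4.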

The following table shows the confusion matrix, sensitivity, and specificity for both experiments. It is possible to check that the measures are identical in the two cases:

| Experiment | Confusion Matrix | Sensitivity | Specificity |
| --- | --- | --- | --- |
| #4 | [[34.0, 1.0], [0.0, 99.0]] | 97.22% | 100.0% |
| #11 | [[34.0, 1.0], [0.0, 99.0]] | 97.22% | 100.0% |
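The sensitivity and specificity figures follow directly from the confusion matrix. A small sketch, assuming the [[TP, FN], [FP, TN]] layout implied by the tables in this section:

```java
public class BinaryMetrics {
    // Assumed confusion-matrix layout: [[TP, FN], [FP, TN]].

    // Sensitivity (true positive rate): TP / (TP + FN).
    public static double sensitivity(double[][] m) {
        return m[0][0] / (m[0][0] + m[0][1]);
    }

    // Specificity (true negative rate): TN / (TN + FP).
    public static double specificity(double[][] m) {
        return m[1][1] / (m[1][1] + m[1][0]);
    }
}
```

Sensitivity measures how many malignant cases are correctly flagged, while specificity measures how many benign cases are correctly cleared; in a diagnosis setting, a low sensitivity (missed disease) is usually the costlier error.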

If we had to choose between the models generated by experiments #4 and #11, we would recommend #4, because it is simpler than #11 (it has fewer neurons in the hidden layer).

Diabetes

An additional example to be explored is the diagnosis of diabetes. This dataset has eight inputs and one output, shown in the table below. There are 768 records, all complete. However, proben1 states that there are several senseless zero values, probably indicating missing data. We handle these values as if they were valid, thereby introducing some errors (or noise) into the dataset:
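One way to quantify this noise is to count the rows containing zeros in columns where zero is physiologically implausible (for example glucose concentration, blood pressure, or BMI). A sketch; the class name and column choice are illustrative:

```java
public class ZeroValueAudit {
    // Counts rows where any of the given column indices holds a zero.
    // For the diabetes data, zeros in columns such as glucose, blood
    // pressure, or BMI are implausible and likely mark missing values.
    public static int countSuspectRows(double[][] data, int[] suspectCols) {
        int count = 0;
        for (double[] row : data) {
            for (int c : suspectCols) {
                if (row[c] == 0.0) {
                    count++;
                    break; // count each row at most once
                }
            }
        }
        return count;
    }
}
```

An audit like this makes the amount of noise explicit before deciding whether to keep, impute, or discard the affected records.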

| Variable Name | Type | Minimum and Maximum Values |
| --- | --- | --- |
| Diagnosis result | OUTPUT | [0; 1] |
| Number of times pregnant | INPUT #1 | [0.0; 17] |
| Plasma glucose concentration at 2 hours in an oral glucose tolerance test | INPUT #2 | [0.0; 199] |
| Diastolic blood pressure (mm Hg) | INPUT #3 | [0.0; 122] |
| Triceps skin fold thickness (mm) | INPUT #4 | [0.0; 99] |
| 2-Hour serum insulin (mu U/ml) | INPUT #5 | [0.0; 744] |
| Body mass index (weight in kg/(height in m)^2) | INPUT #6 | [0.0; 67.1] |
| Diabetes pedigree function | INPUT #7 | [0.078; 2.42] |
| Age (years) | INPUT #8 | [21; 81] |

The dataset division was made as follows:

  • Training: 617 records (80%)
  • Testing: 151 records (20%)

To find the best neural network topology for classifying diabetes, we used the same set of network configurations and the same analysis described in the previous section. However, this time we used multi-class classification in the output layer: two neurons, one signaling the presence of diabetes and the other its absence.
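This two-neuron output scheme amounts to one-hot encoding the diagnosis and decoding the network's answer by taking the most active neuron. A minimal sketch; the neuron ordering is an assumed convention, not something fixed by proben1:

```java
public class DiagnosisEncoder {
    // Encodes a binary diagnosis as two output neurons:
    // index 0 = diabetes present, index 1 = diabetes absent (assumed order).
    public static double[] encode(boolean hasDiabetes) {
        return hasDiabetes ? new double[]{1.0, 0.0} : new double[]{0.0, 1.0};
    }

    // Decodes the network output by picking the neuron with the
    // highest activation (argmax over the two outputs).
    public static boolean decode(double[] output) {
        return output[0] >= output[1];
    }
}
```

Compared with a single output neuron thresholded at 0.5, the two-neuron layout generalizes naturally to problems with more than two classes.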

So, the proposed neural architecture looks like that of the following figure:

[Figure: Diabetes network architecture]

The table below shows the MSE training value and total accuracy of the 12 experiments:

| Experiment | MSE (training) | Total accuracy |
| --- | --- | --- |
| #1 | 0.00807 | 60.54% |
| #2 | 0.00590 | 71.03% |
| #3 | 9.99990E-4 | 75.49% |
| #4 | 9.98840E-4 | 74.17% |
| #5 | 0.00184 | 61.58% |
| #6 | 9.82774E-4 | 59.86% |
| #7 | 0.00706 | 63.57% |
| #8 | 0.00584 | 72.41% |
| #9 | 9.99994E-4 | 74.66% |
| #10 | 0.01047 | 72.14% |
| #11 | 0.00316 | 59.86% |
| #12 | 0.43464 | 40.13% |

The MSE falls quickly in both cases; however, in experiment #9 the error rises over the first epochs before falling, as shown in the following figure:

[Figure: Diabetes, MSE evolution]

Analyzing the confusion matrices, it can be seen that the measures are very similar:

| Experiment | Confusion Matrix | Sensitivity | Specificity |
| --- | --- | --- | --- |
| #3 | [[35.0, 12.0], [25.0, 79.0]] | 74.46% | 75.96% |
| #9 | [[34.0, 12.0], [26.0, 78.0]] | 73.91% | 75.00% |

Once again, we suggest choosing the simplest model; in the diabetes example, that is the neural network generated by experiment #3.

Tip

It is recommended that you explore the DiagnosisExample class and create a GUI to make it easier to select neural network parameters, as was done in the previous chapter. You should try to reuse already written code through the inheritance concept.
