Chapter 6. Classifying Disease Diagnosis

So far, we have used supervised learning to predict numerical values; in the real world, however, numbers are only part of the data. Real variables also include categorical values, which are not purely numerical but describe important features that influence the problems neural networks are applied to. This chapter presents a didactic but interesting application involving categorical values and classification: disease diagnosis. It digs deeper into classification problems and the representation of categorical data, and shows how to design a classification algorithm using neural networks. The topics covered in this chapter are as follows:

Foundations of classification problems
Categorical data
Logistic regression
Confusion matrix
Sensitivity and specificity
Neural networks for classification
Disease diagnosis using neural networks
Diagnosis for cancer
Diagnosis for diabetes

Foundations of classification problems

One thing neural networks are really good at is classifying records. Even the very simple perceptron network draws a decision boundary that determines whether a data point belongs to one region or another, where each region denotes a class. Let's take a look at this visually on an x-y scatter chart:

The dashed lines explicitly separate the points into classes. These points represent data records that originally had the corresponding class labels. That means their classes were already known; therefore, this classification task falls into the supervised learning category.

A classification algorithm seeks to find the boundaries between the classes in the data hyperspace. Once the classification boundaries are defined, a new data point, with an unknown class, receives a class label according to the boundaries defined by the classification algorithm. The figure below shows how a new record is classified:

Based on the current class configuration, the new record's class is the third class.
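As a minimal sketch of how a boundary assigns a class to a new record, the following Java snippet evaluates a single linear boundary in the x-y plane, as a perceptron with a step activation would. For simplicity it separates only two classes rather than the three in the figure. The class name LinearBoundaryDemo, the weights, and the bias are illustrative assumptions, not values from this book's code:

```java
// Minimal sketch: a linear decision boundary w1*x + w2*y + b = 0.
// All names and numeric values here are hypothetical, chosen for illustration.
class LinearBoundaryDemo {

    /** Returns 1 if the point lies on or above the boundary, otherwise 0. */
    static int classify(double[] weights, double bias, double[] point) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * point[i];
        }
        return sum >= 0.0 ? 1 : 0; // hard-limiting (step) activation
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0}; // assumed to come from a previous training step
        double bias = -1.0;            // the boundary is the line x + y = 1

        double[] newRecord = {0.8, 0.7}; // a record whose class is unknown
        // 0.8 + 0.7 - 1.0 = 0.5 >= 0, so the record falls in class 1
        System.out.println("Class of new record: " + classify(weights, bias, newRecord));
    }
}
```

With more than two classes, one such boundary per class (or a network with several output neurons) would be needed; that extension is left out of this sketch.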

Categorical data

Applications usually deal with the types of data shown in the following figure:

Data can be numerical or categorical or, simply speaking, numbers or words. Numerical data is represented by a numeric value, which can be continuous or discrete. This is the data type used so far in this book's applications. Categorical data is a wider class of data that includes words, letters, or even numbers, but with a quite different meaning. While numerical data supports arithmetic operations, categorical data is only descriptive and cannot be processed like numbers, even if the value is a number. An example is the severity degree of a disease on a scale (from zero to five, for example). Another property of categorical data is that a variable can take only a finite, defined set of values. A subclass of categorical data is ordinal data, whose defined values can be sorted in a predefined order. An example is adjectives indicating the state or quality of something (bad, fair, good, excellent):

Numerical		Categorical
Only numbers		Numbers, words, letters, signs
Can support arithmetic operations		Do not support arithmetic operations
Infinite or undefined range of values		Finite or defined set of values
Continuous	Discrete	Ordinal	Non-ordinal
Real values	Integers, decimal	Can be ordered	Cannot be ordered
Any possible value	Predefined intervals	Can be assigned numbers	Each possible value is a flag

Tip

Note that here we are addressing structured data only. In the real world, most data is unstructured, including text and multimedia content. Although these types of data are also processed in learning-from-data applications, neural networks require them to be transformed into structured data types first.

Working with categorical data

Structured data files, such as CSV or Excel files, usually contain columns of numerical and categorical data. In Chapter 5, Forecasting Weather, we created the classes LoadCsv (for loading CSV files) and DataSet (for storing data from CSV files), but these classes are prepared to work only with numerical data. The simplest way of representing a categorical value is to convert each possible value into a binary column: if the given value is present in the original column, the corresponding binary column holds a one; otherwise, it holds a zero:
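The binary-column conversion can be sketched in Java as follows. This is only one possible approach, and it does not use the LoadCsv or DataSet classes from Chapter 5; the class OneHotSketch, its methods, and the sample blood-type column are hypothetical names introduced for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of converting one categorical column into binary columns.
// Class, method, and sample values are hypothetical, not part of the book's code.
class OneHotSketch {

    /** Collects the distinct values in first-seen order; each becomes one binary column. */
    static List<String> categories(String[] column) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String value : column) {
            distinct.add(value);
        }
        return new ArrayList<>(distinct);
    }

    /** Returns one row per record, with a 1.0 in the column matching the record's value. */
    static double[][] encode(String[] column, List<String> categories) {
        double[][] encoded = new double[column.length][categories.size()];
        for (int row = 0; row < column.length; row++) {
            int index = categories.indexOf(column[row]);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown category: " + column[row]);
            }
            encoded[row][index] = 1.0; // all other cells keep the default 0.0
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] bloodType = {"A", "B", "O", "A"}; // illustrative categorical column
        List<String> cats = categories(bloodType);
        double[][] encoded = encode(bloodType, cats);
        System.out.println("Columns: " + cats);
        for (double[] row : encoded) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Each distinct value adds one input to the neural network, so a column with many possible values widens the input layer accordingly.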

Ordinal columns can keep their defined values as numbers in the same column; however, if the original values are letters or words, they need to be converted into numbers via a Java dictionary structure, such as a Map.

You may implement the strategy described above as an exercise. Otherwise, you would have to convert the data manually, which can be time-consuming depending on the number of data rows.
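For ordinal columns made of words, a minimal sketch of the dictionary lookup might look like this. It assumes java.util.Map as the dictionary structure and assigns each label its position in the predefined order; the class OrdinalSketch and its methods are hypothetical, and the labels are the quality adjectives used as an example earlier in this chapter:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of mapping ordinal labels to numbers in a single column.
// Class and method names are hypothetical, not part of the book's code.
class OrdinalSketch {

    /** Builds the dictionary: each label maps to its position in the given order. */
    static Map<String, Integer> buildScale(String... orderedLabels) {
        Map<String, Integer> scale = new LinkedHashMap<>();
        for (int i = 0; i < orderedLabels.length; i++) {
            scale.put(orderedLabels[i], i);
        }
        return scale;
    }

    /** Replaces every label in the column with its numeric rank. */
    static double[] encode(String[] column, Map<String, Integer> scale) {
        double[] encoded = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer rank = scale.get(column[i]);
            if (rank == null) {
                throw new IllegalArgumentException("Unknown label: " + column[i]);
            }
            encoded[i] = rank;
        }
        return encoded;
    }

    public static void main(String[] args) {
        Map<String, Integer> scale = buildScale("bad", "fair", "good", "excellent");
        double[] encoded = encode(new String[] {"good", "bad", "excellent"}, scale);
        System.out.println(java.util.Arrays.toString(encoded)); // prints [2.0, 0.0, 3.0]
    }
}
```

Equal spacing between ranks is itself an assumption: it treats the step from bad to fair as the same size as the step from good to excellent, which may not hold for every ordinal variable.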
