Using Decision Trees to Make a Medical Diagnosis

Now that we know how to handle data in all shapes and forms, be it numerical, categorical, text, or image data, it is time to put our newly gained knowledge to good use.

In this chapter, we will learn how to build a machine learning system that can make a medical diagnosis. We aren't all doctors, but we've probably all been to one at some point in our lives. Typically, a doctor gathers as much information as possible about a patient's history and symptoms to make an informed diagnosis. We will mimic a doctor's decision-making process using what are known as decision trees. We will also cover the Gini coefficient, information gain, and variance reduction, which help decide how a tree should split the data, along with overfitting and pruning.

A decision tree is a simple yet powerful supervised learning algorithm that resembles a flow chart; we will talk more about this in just a minute. Beyond medicine, decision trees are commonly used in fields such as astronomy (for example, to filter noise from Hubble Space Telescope images or to classify star-galaxy clusters), manufacturing and production (for example, by Boeing to discover flaws in the manufacturing process), and object recognition (for example, to recognize 3D objects).

Specifically, we want to learn about the following in this chapter:

  • Building simple decision trees from data and using them for either classification or regression
  • Deciding which decision to make next using the Gini coefficient, information gain, and variance reduction
  • Pruning a decision tree and its benefits
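To make the split criteria in the list above a little more concrete before we get to them, here is a minimal pure-Python sketch. It is not the chapter's implementation: it assumes the usual textbook definitions, in which the Gini measure used for trees is often called Gini impurity (1 minus the sum of squared class proportions), information gain is the drop in Shannon entropy, and variance reduction is the drop in variance for regression targets. The function names, the `drugA`/`drugB` labels, and the numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def gini_impurity(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance, used as the impurity of a regression node."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_gain(parent, left, right, measure):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted


# Hypothetical classification split: 8 patients, two prescribed drugs.
parent = ["drugA"] * 4 + ["drugB"] * 4
left = ["drugA"] * 3 + ["drugB"] * 1
right = ["drugA"] * 1 + ["drugB"] * 3
print(f"Gini decrease:    {split_gain(parent, left, right, gini_impurity):.3f}")  # 0.125
print(f"Information gain: {split_gain(parent, left, right, entropy):.3f}")        # 0.189

# Hypothetical regression split: variance reduction on a numeric target.
y = [1.0, 2.0, 9.0, 10.0]
print(f"Variance reduction: {split_gain(y, y[:2], y[2:], variance):.3f}")        # 16.000
```

A tree-growing algorithm would evaluate a score like this for every candidate split and keep the one with the largest gain; the same `split_gain` helper serves all three criteria because each one only differs in how it measures the impurity of a node.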

But first, let's talk about what decision trees actually are.
