Imputation

In Chapter 1, Introduction to Healthcare Analytics, we mentioned the importance of handling missing data. Imputation is one strategy for dealing with missing values: each missing value is filled in with an estimate derived from the data that is present. In healthcare, two common types of imputation are zero imputation and mean imputation. In zero imputation, missing data is taken to be zero (for example, if a particular diagnosis has a value of NULL, the most likely reason is that the diagnosis is not present in the patient chart). In mean imputation, missing data is taken to be the mean of the values that are present (for example, if a patient's age is missing and the mean age of the remaining patients is 40, we impute the missing age as 40). We demonstrated various imputation methods in Chapter 4, Computing Foundations – Databases, and we will write our own custom functions for performing imputation in Chapter 7, Making Predictive Models in Healthcare.
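As a minimal sketch of the two strategies described above, the following uses pandas on a small, made-up patient table. The column names (patient_id, age, has_diabetes) and all values are hypothetical and chosen only for illustration; they do not come from any dataset used in this book.

```python
import pandas as pd

# Hypothetical patient table; names and values are invented for illustration.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [30.0, None, 50.0, 40.0],         # one missing age
    "has_diabetes": [1.0, None, 0.0, None],  # missing flag = not charted
})

# Zero imputation: treat a missing diagnosis flag as "not present".
patients["has_diabetes"] = patients["has_diabetes"].fillna(0)

# Mean imputation: fill a missing age with the mean of the observed ages.
# Series.mean() skips missing values by default.
mean_age = patients["age"].mean()
patients["age"] = patients["age"].fillna(mean_age)

print(patients)
```

Zero imputation is only reasonable when a missing value plausibly means "absent", as with a diagnosis flag; applying it to a variable such as age would introduce implausible values.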

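Scikit-learn comes with an imputer class for performing different types of imputation. In versions before 0.20 this is the Imputer class in sklearn.preprocessing; later releases replace it with the SimpleImputer class in sklearn.impute. You can see details on how it is used at http://scikit-learn.org/stable/modules/preprocessing.html#imputation-of-missing-values.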
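The following is a rough sketch of imputation with scikit-learn, assuming a release (0.20 or later) in which the imputer is available as SimpleImputer in sklearn.impute; older releases expose a similar Imputer class in sklearn.preprocessing instead. The array X and its values are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix: column 0 is age, column 1 is a diagnosis flag.
X = np.array([
    [30.0, 1.0],
    [np.nan, 0.0],
    [50.0, np.nan],
    [40.0, 1.0],
])

# Mean imputation: each NaN is replaced by the mean of its column.
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# Zero imputation: each NaN is replaced by the constant 0.
zero_imputer = SimpleImputer(strategy="constant", fill_value=0)
X_zero = zero_imputer.fit_transform(X)

print(X_mean)
print(X_zero)
```

In a modeling workflow, the imputer would typically be fitted on the training data only and then applied to the test data with transform, so that the test set does not influence the imputed values.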
