Imputation

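Imputation replaces each missing value with a substitute value that makes sense for the data, such as a constant, a statistic of the column, or a neighboring observation.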

There are various ways in which imputation can be performed. Some of them are as follows:

  • Imputing all the missing values in a dataset with 0:
data.fillna(0)
  • Imputing all the missing values with specified text:
data.fillna('text')
  • Imputing only the missing values in the body column with 0:
data['body'].fillna(0)
  • Imputing with the mean of the non-missing values in a column:
data['age'].fillna(data['age'].mean())
  • Imputing with a forward fill – this works especially well for time series data. Here, a missing value is replaced with the value in the previous row (period):
data['age'].ffill()

The following is the output:

Output DataFrame with missing values imputed with the forward fill method
  • Imputing with a backward fill – this also works well for time series data. Here, a missing value is replaced with the value in the next row (period). You can control how many consecutive NaN values get filled by passing the limit parameter; limit=1 means that at most one consecutive row will be filled forward or backward:
data['age'].bfill()

The following is the output:

Output DataFrame with missing values imputed with the backward fill method
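As a rough, self-contained sketch (assuming pandas 2.x and a small made-up DataFrame whose values are hypothetical), the imputation methods above can be compared side by side. It uses `Series.ffill()` and `Series.bfill()`, the current spelling of `fillna(method=...)`, and shows the `limit` parameter capping how many consecutive NaN values are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the 'age' and 'body' column names mirror the
# examples above, but the values are made up for illustration.
data = pd.DataFrame({
    "age": [25.0, np.nan, np.nan, 40.0, np.nan],
    "body": ["a", np.nan, "c", np.nan, "e"],
})

# Constant imputation: every NaN in the whole frame becomes 0.
zeros = data.fillna(0)

# Column-only imputation with specified text.
body_filled = data["body"].fillna("text")

# Mean imputation: mean() skips NaN by default, so this is the
# mean of the non-missing ages, (25 + 40) / 2 = 32.5.
age_mean = data["age"].fillna(data["age"].mean())

# Forward fill: carry the last valid observation down.
age_ffill = data["age"].ffill()

# Backward fill: pull the next valid observation up.
age_bfill = data["age"].bfill()

# limit=1: fill at most one consecutive NaN in each gap.
age_bfill_1 = data["age"].bfill(limit=1)

print(pd.DataFrame({
    "original": data["age"],
    "mean": age_mean,
    "ffill": age_ffill,
    "bfill": age_bfill,
    "bfill_limit1": age_bfill_1,
}))
```

Note that `fillna`, `ffill`, and `bfill` return new objects and leave `data` unchanged; assign the result back (for example `data['age'] = data['age'].ffill()`) to keep it. The last `age` value stays NaN after the backward fill because there is no later row to copy from.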