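Imputation replaces each missing value with a substitute that makes sense in context, such as a constant, a summary statistic, or a neighboring value.
There are various ways in which imputation can be performed. Some of them are as follows. Note that each call returns a new object and leaves data unchanged, so assign the result if you want to keep it: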
- Imputing all the missing values in a dataset with 0:
data.fillna(0)
- Imputing all the missing values with specified text:
data.fillna('text')
- Imputing only the missing values in the body column with 0:
data['body'].fillna(0)
- Imputing with the mean of the non-missing values:
data['age'].fillna(data['age'].mean())
- Imputing with a forward fill, which works especially well for time series data. Here, a missing value is replaced with the value in the previous row (period). Recent pandas versions deprecate fillna(method='ffill') in favor of the dedicated ffill method:
data['age'].ffill()
The following is the output:
Output DataFrame with missing values imputed with the forward fill method
- Imputing with a backward fill, which also works well for time series data. Here, a missing value is replaced with the value in the next row (period). You can control how many rows get filled in each gap with the limit parameter. limit=1 means at most one consecutive NaN will be filled, whether filling forward or backward:
data['age'].bfill()
The following is the output:
Output DataFrame with missing values imputed with the backward fill method
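The strategies above can be compared side by side on a small example. The following is a minimal sketch, assuming a made-up DataFrame: the values and the height column are hypothetical and are not taken from the dataset used in this section. It uses ffill and bfill together with the limit parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the dataset used in the text).
data = pd.DataFrame({
    "age": [25.0, np.nan, np.nan, 40.0, np.nan, 31.0],
    "height": [170.0, 165.0, np.nan, 180.0, 175.0, np.nan],
})

# Constant fill: every NaN in the DataFrame becomes 0.
zero_filled = data.fillna(0)

# Mean fill: mean() skips NaN, so it averages the non-missing values only.
mean_filled = data["age"].fillna(data["age"].mean())

# Forward fill: each NaN takes the last valid value above it.
forward = data["age"].ffill()

# Backward fill: each NaN takes the next valid value below it.
backward = data["age"].bfill()

# limit=1 fills at most one consecutive NaN in each gap.
forward_limited = data["age"].ffill(limit=1)

comparison = pd.DataFrame({
    "original": data["age"],
    "mean": mean_filled,
    "ffill": forward,
    "bfill": backward,
    "ffill_limit_1": forward_limited,
})
print(comparison)
```

In this sketch, the second of the two consecutive missing ages stays NaN under ffill(limit=1), while plain ffill fills both. None of the calls modify data itself, because each returns a new object.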