Oxygen saturation measures the oxygen level in the blood. It is reported as a percentage, with higher values generally being healthier. We convert it to a numeric type and perform mean imputation as follows:
X_train.loc[:,'POPCT'] = X_train.loc[:,'POPCT'].apply(pd.to_numeric)
X_test.loc[:,'POPCT'] = X_test.loc[:,'POPCT'].apply(pd.to_numeric)
X_train = mean_impute_values(X_train,'POPCT')
X_test = mean_impute_values(X_test,'POPCT')
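
The helper `mean_impute_values` is defined outside this section, so its exact behavior is not shown here. As a rough illustration only, the following sketch shows one way a convert-then-impute step could work on a toy column. The function name `mean_impute_sketch`, the use of `errors="coerce"`, and the assumption that missing readings end up as NaN are ours and may differ from the actual helper.

```python
import pandas as pd

def mean_impute_sketch(data, col):
    """Hypothetical helper: fill NaN entries of `col` with the column mean."""
    out = data.copy()
    # Coerce anything non-numeric to NaN (an assumption about how
    # missing readings are encoded; the real data may use other markers)
    out[col] = pd.to_numeric(out[col], errors="coerce")
    # Series.mean() skips NaN by default, so this is the mean of observed values
    out[col] = out[col].fillna(out[col].mean())
    return out

# Toy data (made up for illustration), with one missing reading
toy = pd.DataFrame({"POPCT": ["98", "96", None, "100"]})
imputed = mean_impute_sketch(toy, "POPCT")
print(imputed["POPCT"].tolist())
```

Because the sketch works on a copy, the toy frame itself is left unchanged, which makes it easy to compare the column before and after imputation.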
Let's examine the vital sign transformations we've done so far by selecting those columns and using the head() function:
X_train[['TEMPF','PULSE','RESPR','BPSYS','BPDIAS','POPCT']].head(n=20)
The output is as follows:
| | TEMPF | PULSE | RESPR | BPSYS | BPDIAS | POPCT |
---|---|---|---|---|---|---|
15938 | 98.200000 | 101.000000 | 22.0 | 159.000000 | 72.000000 | 98.000000 |
5905 | 98.100000 | 70.000000 | 18.0 | 167.000000 | 79.000000 | 96.000000 |
4636 | 98.200000 | 85.000000 | 20.0 | 113.000000 | 70.000000 | 98.000000 |
9452 | 98.200000 | 84.000000 | 20.0 | 146.000000 | 72.000000 | 98.000000 |
7558 | 99.300000 | 116.000000 | 18.0 | 131.000000 | 82.000000 | 96.000000 |
17878 | 99.000000 | 73.000000 | 16.0 | 144.000000 | 91.000000 | 99.000000 |
21071 | 97.800000 | 88.000000 | 18.0 | 121.000000 | 61.000000 | 98.000000 |
20990 | 98.600000 | 67.000000 | 16.0 | 112.000000 | 65.000000 | 95.000000 |
4537 | 98.200000 | 85.000000 | 20.0 | 113.000000 | 72.000000 | 99.000000 |
7025 | 99.300000 | 172.000000 | 40.0 | 124.000000 | 80.000000 | 100.000000 |
2134 | 97.500000 | 91.056517 | 18.0 | 146.000000 | 75.000000 | 94.000000 |
5212 | 97.400000 | 135.000000 | 18.0 | 125.000000 | 71.000000 | 99.000000 |
9213 | 97.900000 | 85.000000 | 18.0 | 153.000000 | 96.000000 | 99.000000 |
2306 | 97.000000 | 67.000000 | 20.0 | 136.000000 | 75.000000 | 99.000000 |
6106 | 98.600000 | 90.000000 | 18.0 | 109.000000 | 70.000000 | 98.000000 |
2727 | 98.282103 | 83.000000 | 17.0 | 123.000000 | 48.000000 | 92.000000 |
4098 | 99.100000 | 147.000000 | 20.0 | 133.483987 | 78.127013 | 100.000000 |
5233 | 98.800000 | 81.000000 | 16.0 | 114.000000 | 78.000000 | 97.311242 |
5107 | 100.000000 | 95.000000 | 24.0 | 133.000000 | 75.000000 | 94.000000 |
18327 | 98.900000 | 84.000000 | 16.0 | 130.000000 | 85.000000 | 98.000000 |
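
A result like the one in this table can also be checked programmatically rather than by eye. The sketch below uses a small made-up frame (not the actual data) and relies on one property of mean imputation: filling gaps with the column mean leaves that mean unchanged, so cells equal to the column mean are candidates for imputed values. The variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two vital-sign columns, each with one missing reading
raw = pd.DataFrame({
    "PULSE": [101.0, 70.0, np.nan, 85.0],
    "POPCT": [98.0, 96.0, 99.0, np.nan],
})
# Plain mean imputation: fill each column's gaps with that column's mean
filled = raw.fillna(raw.mean())

# Check 1: no missing values should remain in any column
remaining = filled.isna().sum()

# Check 2: flag cells that match their column mean (likely imputed)
likely_imputed = pd.DataFrame(
    np.isclose(filled.to_numpy(), filled.mean().to_numpy()),
    index=filled.index,
    columns=filled.columns,
)

print(remaining)
print(likely_imputed)
```

Note that a genuinely observed reading that happens to equal the column mean would be flagged as well, so this is a heuristic rather than a definitive test.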
Examining the preceding table, it looks like we are in good shape. The imputed mean values stand out because they carry extra decimal precision, for example 91.056517 for PULSE, 98.282103 for TEMPF, and 97.311242 for POPCT. Let's move on to the last vital sign we have in our data, the pain level.