Temperature is usually measured with a thermometer early in the patient encounter and can be recorded in degrees Celsius or Fahrenheit. A temperature of 98.6° F (37.0° C) is usually considered normal body temperature. Temperatures markedly above this value are termed fever or hyperthermia and usually reflect infection, inflammation, or environmental overexposure to heat, such as the sun. Temperatures markedly below normal (commonly defined as below 95° F, or 35° C) are termed hypothermia and usually reflect environmental exposure to cold. In general, the further the temperature deviates from normal, the more serious the illness is likely to be.
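The Fahrenheit/Celsius relationship and the rough categories described above can be sketched in a few lines of Python. The cutoffs used here (fever at or above 100.4° F / 38.0° C, hypothermia below 95° F / 35° C) are commonly cited clinical thresholds that we assume for illustration; they are not part of the dataset, and the function and constant names are hypothetical.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


# Assumed cutoffs: commonly cited clinical thresholds, not taken from the dataset
FEVER_THRESHOLD_F = 100.4       # 38.0 °C
HYPOTHERMIA_THRESHOLD_F = 95.0  # 35.0 °C


def classify_temperature(temp_f: float) -> str:
    """Roughly label a Fahrenheit reading as 'hypothermia', 'fever', or
    'normal' (meaning neither febrile nor hypothermic)."""
    # Compare in Fahrenheit directly to avoid rounding error from conversion
    if temp_f < HYPOTHERMIA_THRESHOLD_F:
        return "hypothermia"
    if temp_f >= FEVER_THRESHOLD_F:
        return "fever"
    return "normal"


print(round(fahrenheit_to_celsius(98.6), 1))  # 37.0
print(classify_temperature(98.6))   # normal
print(classify_temperature(101.3))  # fever
print(classify_temperature(93.0))   # hypothermia
```

Classifying on the Fahrenheit value directly, rather than converting first, keeps a borderline reading such as 100.4° F from slipping below the cutoff because of floating-point rounding.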
In our dataset, the TEMPF temperature has been multiplied by 10 and stored as an integer, so a reading of 98.2° F is stored as 982. Also, some values are blank (coded as -9), and we must impute those, since temperature is a continuous variable. To do this, we first convert the temperature to a numeric type, then use our previously written mean_impute_values() function to impute the missing values in TEMPF, and finally divide all temperatures by 10:
```python
import pandas as pd
# Assigning with df['col'] replaces the column, so its dtype really becomes numeric
X_train['TEMPF'] = pd.to_numeric(X_train['TEMPF'])
X_test['TEMPF'] = pd.to_numeric(X_test['TEMPF'])
# Impute the blank (-9) entries with the column mean
X_train = mean_impute_values(X_train, 'TEMPF')
X_test = mean_impute_values(X_test, 'TEMPF')
# Undo the x10 scaling so values are in degrees Fahrenheit
X_train['TEMPF'] = X_train['TEMPF'] / 10
X_test['TEMPF'] = X_test['TEMPF'] / 10
```
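The mean_impute_values() function was written earlier and is not reproduced here. As a rough sketch only, a function with the behavior described might look like the following, assuming that blanks are coded as -9 and that the mean is computed over the non-missing values only (otherwise the -9 codes would pull the mean down). The name mean_impute_values_sketch and its missing_code and fill_value parameters are hypothetical. The code above computes a separate mean for X_test; a common alternative, shown here via fill_value, is to reuse the training mean so that no statistics are taken from the test data.

```python
import numpy as np
import pandas as pd


def mean_impute_values_sketch(df, col, missing_code=-9, fill_value=None):
    """Hypothetical stand-in for mean_impute_values().

    Replaces `missing_code` entries in `col` with `fill_value`, or, if
    `fill_value` is None, with the mean of the non-missing values in `col`.
    Returns a new DataFrame and leaves `df` unchanged.
    """
    df = df.copy()
    values = pd.to_numeric(df[col])
    # Turn the missing-value code into NaN so it is excluded from the mean
    values = values.mask(values == missing_code)
    if fill_value is None:
        fill_value = values.mean()
    df[col] = values.fillna(fill_value)
    return df


# Toy data in the raw encoding: degrees F x 10 stored as text, -9 = blank
train = pd.DataFrame({"TEMPF": ["982", "-9", "986", "990"]})
test = pd.DataFrame({"TEMPF": ["-9", "975"]})

train = mean_impute_values_sketch(train, "TEMPF")
# Mean imputation leaves the mean unchanged, so this equals the observed mean
train_mean = train["TEMPF"].mean()
# Fill the test blanks with the training mean rather than the test set's own
test = mean_impute_values_sketch(test, "TEMPF", fill_value=train_mean)

train["TEMPF"] = train["TEMPF"] / 10
test["TEMPF"] = test["TEMPF"] / 10
print(train["TEMPF"].tolist())  # [98.2, 98.6, 98.6, 99.0]
print(test["TEMPF"].tolist())   # [98.6, 97.5]
```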
Let's print the first 30 values of this column to confirm that the processing was performed correctly:
```python
X_train['TEMPF'].head(n=30)
```
The output is as follows:
```
15938     98.200000
5905      98.100000
4636      98.200000
9452      98.200000
7558      99.300000
17878     99.000000
21071     97.800000
20990     98.600000
4537      98.200000
7025      99.300000
2134      97.500000
5212      97.400000
9213      97.900000
2306      97.000000
6106      98.600000
2727      98.282103
4098      99.100000
5233      98.800000
5107     100.000000
18327     98.900000
19242     98.282103
3868      97.900000
12903     98.600000
12763     98.700000
8858      99.400000
8955      97.900000
16360     98.282103
6857      97.100000
6842      97.700000
22073     97.900000
Name: TEMPF, dtype: float64
```
We can see that the temperatures are now floats and are no longer multiplied by 10. We also see that the mean value, 98.282103, has been substituted where values were previously blank (for example, at indices 2727, 19242, and 16360). Let's move on to the next variable.
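Scanning 30 rows is a quick visual check. The same properties can also be checked over the whole column; the following is a minimal sketch in which the helper name check_tempf and the upper bound of 150 (used to catch values still stored as tenths of a degree) are our own assumptions rather than part of the original workflow. Because mean imputation leaves the column mean unchanged, the rows sitting exactly at the mean give a rough count of the imputed entries.

```python
import numpy as np
import pandas as pd


def check_tempf(df: pd.DataFrame, col: str = "TEMPF") -> int:
    """Hypothetical sanity checks for the processed temperature column.

    Raises ValueError if a check fails; otherwise returns the number of rows
    sitting exactly at the column mean, a rough count of mean-imputed rows.
    """
    s = df[col]
    problems = []
    if not pd.api.types.is_float_dtype(s):
        problems.append("column is not float")
    if s.isna().any():
        problems.append("column still contains NaN")
    # Leftover -9 codes (or -0.9 after dividing by 10) would be non-positive
    if (s <= 0).any():
        problems.append("column still contains missing-value codes")
    # Loose, assumed bound: values still stored as tenths would be near 980
    if s.max() >= 150:
        problems.append("values look like they are still multiplied by 10")
    if problems:
        raise ValueError("; ".join(problems))
    # Imputed rows sit at the mean; a genuine reading could coincide with it,
    # but with a mean like 98.282103 that is unlikely
    return int(np.isclose(s, s.mean(), rtol=0.0, atol=1e-6).sum())


observed = pd.Series([98.2, 98.4, 99.1])
toy = pd.DataFrame({"TEMPF": [98.2, observed.mean(), 98.4, 99.1]})
print(check_tempf(toy))  # 1
```

Calling check_tempf(X_train) after the processing steps would raise a ValueError if any blanks or unscaled values remained.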