Temperature

Temperature is usually measured with a thermometer early in the patient encounter and can be recorded in degrees Celsius or Fahrenheit. A temperature of 98.6° F (37.0° C) is usually considered a normal body temperature. Temperatures markedly above this value are termed fever or hyperthermia and usually reflect infection, inflammation, or environmental heat exposure (for example, overexposure to the sun). Temperatures markedly below normal (a core temperature under 95° F, or 35° C) are termed hypothermia and usually reflect environmental exposure to cold. In general, the further the temperature deviates from normal, the more serious the illness.

In our dataset, the TEMPF variable stores the temperature multiplied by 10 as an integer (for example, 98.6° F is stored as 986). Some values are missing (coded as -9), and because temperature is a continuous variable, we must impute them. We therefore first convert the column to a numeric type, then use our previously written mean_impute_values() function to impute the missing values in TEMPF, and finally use a lambda function to divide all temperatures by 10:

# Convert TEMPF to a numeric type
X_train.loc[:,'TEMPF'] = X_train.loc[:,'TEMPF'].apply(pd.to_numeric)
X_test.loc[:,'TEMPF'] = X_test.loc[:,'TEMPF'].apply(pd.to_numeric)

# Impute the missing values (coded as -9) with the column mean
X_train = mean_impute_values(X_train,'TEMPF')
X_test = mean_impute_values(X_test,'TEMPF')

# Undo the multiplication by 10 to recover degrees Fahrenheit
X_train.loc[:,'TEMPF'] = X_train.loc[:,'TEMPF'].apply(lambda x: float(x)/10)
X_test.loc[:,'TEMPF'] = X_test.loc[:,'TEMPF'].apply(lambda x: float(x)/10)
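
To make the three steps concrete, here is a minimal, self-contained sketch on a toy column. The helper name impute_and_rescale() and the toy values are hypothetical, and the sketch assumes that mean imputation excludes the -9 codes when computing the mean; it is not the mean_impute_values() function written earlier, whose exact behavior may differ.

```python
import pandas as pd


def impute_and_rescale(df, column, missing_code=-9, scale=10.0):
    """Hypothetical helper: mean-impute a sentinel-coded column, then rescale it.

    Assumes missing values are coded as `missing_code` and that the
    remaining values were stored multiplied by `scale`.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    values = pd.to_numeric(out[column], errors="coerce").astype(float)
    values = values.mask(values == missing_code)  # sentinel -> NaN
    values = values.fillna(values.mean())  # mean() skips NaN by default
    out[column] = values / scale  # undo the multiplication by 10
    return out


# Toy data: three recorded temperatures and one missing value (-9)
toy = pd.DataFrame({"TEMPF": ["982", "-9", "1000", "986"]})
result = impute_and_rescale(toy, "TEMPF")
print(result["TEMPF"])
```

In this sketch the missing entry receives the mean of the three recorded values, already divided by 10, while the recorded values are simply rescaled to degrees Fahrenheit.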

Let's print the first 30 values of this column to confirm that our processing was performed correctly:

X_train['TEMPF'].head(n=30)

The output is as follows:

15938     98.200000
5905      98.100000
4636      98.200000
9452      98.200000
7558      99.300000
17878     99.000000
21071     97.800000
20990     98.600000
4537      98.200000
7025      99.300000
2134      97.500000
5212      97.400000
9213      97.900000
2306      97.000000
6106      98.600000
2727      98.282103
4098      99.100000
5233      98.800000
5107     100.000000
18327     98.900000
19242     98.282103
3868      97.900000
12903     98.600000
12763     98.700000
8858      99.400000
8955      97.900000
16360     98.282103
6857      97.100000
6842      97.700000
22073     97.900000
Name: TEMPF, dtype: float64

We can see that the temperatures are now of the float type and are no longer multiplied by 10. We can also see that the mean value, 98.282103, has been substituted where values were previously missing. Let's move on to the next variable.
