Correlating variables with the Spearman rank correlation

The Spearman rank correlation applies the Pearson correlation to the ranks of two variables instead of to their raw values. Ranks are the positions of values in sorted order. Tied items (items with equal values) all receive the same rank, namely the average of their positions. For instance, if two items of equal value occupy positions 2 and 3, both get rank 2.5. Have a look at the following equations:

In these equations, n is the sample size. (3.17) shows how the correlation is calculated, (3.19) gives the standard error, and (3.20) defines the z-score, which we assume to be approximately normally distributed. F(r) is the same function as in (3.14), the Fisher transformation arctanh(r), because the Spearman correlation is simply the Pearson correlation applied to ranks.
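The ranking rule and the link to Pearson's r can be checked directly with SciPy. The following is a minimal sketch on made-up numbers (not the weather data). It also evaluates the standard error 0.6325/sqrt(n - 1) and the z-score sqrt((n - 3)/1.06) F(r), with F(r) = arctanh(r), as they appear in the code of this recipe.

```python
import numpy as np
from scipy import stats

# Tied values share the average of their positions: the two 20s occupy
# positions 2 and 3, so both get rank 2.5 (ranks: 1, 2.5, 2.5, 4)
tie_ranks = stats.rankdata([10, 20, 20, 30])

# Made-up sample with a monotonic-ish, non-linear relationship (y has a tie)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 0.8, 4.5, 4.5, 20.0, 35.0, 33.0, 70.0])

# Spearman's rho is Pearson's r computed on the (average) ranks
rho_ranks = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
rho_scipy = stats.spearmanr(x, y)[0]

# Standard error and z-score as used in this recipe, F(r) = arctanh(r)
n = len(x)
se = 0.6325 / np.sqrt(n - 1)
z = np.sqrt((n - 3) / 1.06) * np.arctanh(rho_scipy)

print(tie_ranks)
print(rho_ranks, rho_scipy, se, z)
```

Computing the Pearson correlation of the ranks by hand and comparing it with `spearmanr()` is a quick way to confirm that the two definitions coincide, ties included.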

How to do it...

In this recipe, we calculate the Spearman correlation between wind speed and temperature, both averaged by day of the year, together with the corresponding confidence interval. Then, we display the Spearman correlation matrix for all the weather variables. The steps are as follows:

The imports are as follows:

import dautil as dl
from scipy import stats
import numpy as np
import math
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.html import widgets
from IPython.display import display
from IPython.display import HTML

Define the following function to compute the confidence interval:

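def get_ci(n, corr):
    # Fisher-transform the correlation; for Spearman's rho the variance on
    # this scale is approximately 1.06/(n - 3), matching the z-score in (3.20)
    z = np.arctanh(corr)
    se = math.sqrt(1.06/(n - 3))
    # Two-sided 95% interval on the transformed scale
    ci = z + np.array([-1, 1]) * se * stats.norm.ppf((1 + 0.95)/2)

    # Map the bounds back to the correlation scale
    return np.tanh(ci)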
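To see what the back-transformation with tanh does, here is a standalone sketch of the standard Fisher-z interval for Spearman's rho, assuming a variance of 1.06/(n - 3) on the transformed scale (the factor that appears in the z-score above). The function name `fisher_ci` and the value r = 0.5 are illustrative assumptions, not part of the recipe.

```python
import numpy as np
from scipy import stats

def fisher_ci(n, r, level=0.95):
    """Approximate CI for Spearman's rho via the Fisher z-transformation."""
    z = np.arctanh(r)                     # F(r)
    se = np.sqrt(1.06 / (n - 3))          # standard error on the z scale
    crit = stats.norm.ppf((1 + level) / 2)
    return np.tanh(z + np.array([-1.0, 1.0]) * crit * se)

r = 0.5  # made-up correlation for illustration
lo_small, hi_small = fisher_ci(30, r)
lo_large, hi_large = fisher_ci(366, r)

# tanh compresses the side nearer to 1, so for positive r the interval is
# asymmetric around r; the interval also narrows as n grows
print(lo_small, hi_small)
print(lo_large, hi_large)
```

Working on the arctanh scale keeps both bounds inside (-1, 1), which a symmetric interval built directly around r cannot guarantee.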

Load the data, average it by day of the year, and display two drop-down widgets so that you can pick a different pair of variables to correlate if you want:

df = dl.data.Weather.load().dropna()
df = dl.ts.groupby_yday(df).mean()

drop1 = widgets.Dropdown(options=dl.data.Weather.get_headers(), 
                         selected_label='TEMP', description='Variable 1')
drop2 = widgets.Dropdown(options=dl.data.Weather.get_headers(), 
                         selected_label='WIND_SPEED', description='Variable 2')
display(drop1)
display(drop2)
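`dl.ts.groupby_yday()` comes from the book's dautil package. A plain pandas equivalent might look like the following sketch, which assumes that the helper groups rows by the day of the year of a DatetimeIndex; the two-year synthetic frame and its column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data: two years of daily values
idx = pd.date_range('2000-01-01', periods=731, freq='D')
df = pd.DataFrame({'TEMP': np.arange(731, dtype=float),
                   'WIND_SPEED': np.ones(731)}, index=idx)

# Average every column over all rows that share the same day of the year
by_yday = df.groupby(df.index.dayofyear).mean()
print(by_yday.head())
```

One design caveat: in a leap year every date after February 28 gets a day-of-year number one higher than in other years, so the same calendar date can land in different groups.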

Compute the Spearman rank correlation with SciPy:

var1 = df[drop1.value].values
var2 = df[drop2.value].values
stats_corr = stats.spearmanr(var1, var2)
dl.options.set_pd_options()
html_builder = dl.report.HTMLBuilder()
html_builder.h1('Spearman Correlation between {0} and {1}'.format(
    dl.data.Weather.get_header(drop1.value), dl.data.Weather.get_header(drop2.value)))
html_builder.h2('scipy.stats.spearmanr()')
dfb = dl.report.DFBuilder(['Correlation', 'p-value'])
dfb.row([stats_corr[0], stats_corr[1]])
html_builder.add_df(dfb.build())
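`stats.spearmanr()` returns the coefficient first and the p-value second, which is why the table uses `stats_corr[0]` and `stats_corr[1]`. The small sketch below, on made-up data, also shows why a rank correlation can differ from Pearson's r: for a monotonic but non-linear relationship, Spearman's rho reaches 1 while Pearson's r does not.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = x ** 3  # monotonic but not linear

result = stats.spearmanr(x, y)
rho, p_value = result[0], result[1]   # coefficient first, p-value second
r_pearson = stats.pearsonr(x, y)[0]

print(rho, p_value, r_pearson)
```

Only the coefficient, not the whole result object, should be passed on to a confidence-interval function such as `get_ci()`.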

Compute the confidence interval as follows:

n = len(df.index)
# Pass only the coefficient, not the whole (correlation, p-value) result
ci = get_ci(n, stats_corr[0])
html_builder.h2('Confidence interval')
dfb = dl.report.DFBuilder(['2.5 percentile', '97.5 percentile'])
dfb.row(ci)
html_builder.add_df(dfb.build())
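As a rough cross-check on an analytic interval, one can also bootstrap it. Note that this is a different technique (a percentile bootstrap over resampled pairs) from the one `get_ci()` uses; the function name `bootstrap_spearman_ci`, its defaults, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bootstrap_spearman_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for Spearman's rho (a cross-check,
    not the method used by get_ci())."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    rhos = np.empty(n_boot)
    for i in range(n_boot):
        # Resample (x, y) pairs with replacement, keeping each pair together
        idx = rng.integers(0, n, size=n)
        rhos[i] = stats.spearmanr(x[idx], y[idx])[0]
    alpha = (1 - level) / 2
    return np.quantile(rhos, [alpha, 1 - alpha])

# Synthetic, moderately correlated data (not the weather data)
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
rho = stats.spearmanr(x, y)[0]
lo, hi = bootstrap_spearman_ci(x, y)
print(rho, lo, hi)
```

On the weather data you could call it with `var1` and `var2` and compare the result with `ci`; for reasonably large samples the two intervals would usually be expected to be broadly similar.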

Display the correlation matrix as a Seaborn heatmap:

corr = df.corr(method='spearman')

%matplotlib inline
plt.title('Spearman Correlation Matrix')
sns.heatmap(corr)
HTML(html_builder.html)
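The matrix from `df.corr(method='spearman')` can be reproduced in two other ways, which again shows that Spearman's rho is Pearson's r on ranks. This sketch uses a small synthetic DataFrame with hypothetical columns A, B, and C rather than the weather data.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(size=100)
df = pd.DataFrame({'A': a,
                   'B': a ** 3 + rng.normal(scale=0.1, size=100),
                   'C': rng.normal(size=100)})

# Three routes to the same Spearman correlation matrix
m_pandas = df.corr(method='spearman')
m_ranks = df.rank().corr()               # Pearson correlation of average ranks
m_scipy = stats.spearmanr(df.values)[0]  # 2-D input: columns are variables

print(m_pandas.round(3))
```

For the heatmap itself, fixing the color scale to the range of a correlation (for example `vmin=-1, vmax=1` in `sns.heatmap()`) makes plots of different datasets easier to compare.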

Refer to the following screenshot for the end result (see the correlating_spearman.ipynb file in this book's code bundle):