Evaluating relations between variables with ANOVA

Analysis of variance (ANOVA) is a statistical data analysis method invented by statistician Ronald Fisher. It partitions the variance of a continuous variable according to the values of one or more categorical variables. ANOVA is a form of linear modeling. With one categorical variable, we speak of one-way ANOVA. In this recipe, we use two categorical variables, so we have two-way ANOVA. In two-way ANOVA, we create a contingency table, which is a table containing counts for all combinations of the two categorical variables (we will see a contingency table example soon). The linear model is then given by the equation:

μ_ij = μ + α_i + β_j + γ_ij

This is an additive model where μ_ij is the mean of the continuous variable corresponding to one cell of the contingency table, μ is the mean for the whole data set, α_i is the contribution of the first categorical variable, β_j is the contribution of the second categorical variable, and γ_ij is a cross-term (the interaction between the two categorical variables). We will apply this model to weather data.
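To make the terms of the model concrete, here is a minimal sketch that decomposes a small table of cell means into μ, α_i, β_j, and γ_ij. The numbers and labels are invented for illustration, and the sketch assumes a balanced design in which every cell carries equal weight; it is not taken from the weather data.

```python
# Hypothetical mean wind speeds for each (rain, wind direction) cell.
cell_means = {
    ("dry", "N"): 3.0, ("dry", "S"): 5.0,
    ("wet", "N"): 4.0, ("wet", "S"): 8.0,
}
rows = sorted({r for r, _ in cell_means})
cols = sorted({c for _, c in cell_means})

# Grand mean (balanced design assumed: each cell has equal weight).
mu = sum(cell_means.values()) / len(cell_means)

# Main effects: row or column mean minus the grand mean.
alpha = {r: sum(cell_means[r, c] for c in cols) / len(cols) - mu for r in rows}
beta = {c: sum(cell_means[r, c] for r in rows) / len(rows) - mu for c in cols}

# Cross-term: what remains after removing the grand mean and main effects.
gamma = {
    (r, c): cell_means[r, c] - mu - alpha[r] - beta[c]
    for r in rows
    for c in cols
}


def reconstruct(r, c):
    """Rebuild a cell mean from the four terms of the model."""
    return mu + alpha[r] + beta[c] + gamma[r, c]


print(mu, alpha, beta, gamma)
```

If every γ_ij were zero, the two categorical variables would contribute independently, and the cell means could be rebuilt from μ, α_i, and β_j alone.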

How to do it...

The following steps apply two-way ANOVA with wind speed as the continuous variable, rain as a binary variable, and wind direction as a categorical variable:

The imports are as follows:

from statsmodels.formula.api import ols
import dautil as dl
from statsmodels.stats.anova import anova_lm
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import HTML

Load the data and fit the model with statsmodels:

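# Load the weather data and drop rows with missing values
df = dl.data.Weather.load().dropna()
# Turn RAIN into a boolean: True where the RAIN value is positive
df['RAIN'] = df['RAIN'] > 0
# C(...) tells the formula parser to treat a column as categorical
formula = 'WIND_SPEED ~ C(RAIN) + C(WIND_DIR)'
lm = ols(formula, df).fit()
# Collect headings and tables into an HTML report
hb = dl.HTMLBuilder()
hb.h1('ANOVA Applied to Weather Data')
hb.h2('ANOVA results')
hb.add_df(anova_lm(lm), index=True)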
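Note that the formula above contains only the two main effects joined with `+`, so it does not estimate a cross-term. The following sketch, which uses synthetic data with hypothetical column names rather than the dautil weather set, shows one way the interaction could be added with the `*` operator of the statsmodels formula interface and how the resulting ANOVA table differs.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the weather data; all column names are hypothetical.
toy = pd.DataFrame({
    "rain": rng.integers(0, 2, n).astype(bool),
    "wind_dir": rng.choice(["N", "E", "S", "W"], n),
})
toy["wind_speed"] = (
    4.0
    + 1.0 * toy["rain"]
    + 0.5 * (toy["wind_dir"] == "W")
    + rng.normal(0, 1, n)
)

# Main effects only, as in the recipe's formula.
additive = ols("wind_speed ~ C(rain) + C(wind_dir)", toy).fit()
# 'a * b' expands to 'a + b + a:b', which adds the interaction term.
full = ols("wind_speed ~ C(rain) * C(wind_dir)", toy).fit()

additive_table = anova_lm(additive)
full_table = anova_lm(full)
print(full_table)
```

The second table gains a `C(rain):C(wind_dir)` row, which corresponds to the cross-term of the model.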

Display a truncated contingency table and visualize the data with Seaborn:

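# Replace the wind direction values with categories
df['WIND_DIR'] = dl.data.Weather.categorize_wind_dir(df)
hb.h2('Truncated Contingency table')
# Count rows per (RAIN, WIND_DIR) combination; keep the first three
hb.add_df(df.groupby([df['RAIN'], df['WIND_DIR']]).count().head(3),
          index=True)

# Plot wind speed per wind direction, split by rain
sns.pointplot(y='WIND_SPEED', x='WIND_DIR',
              hue='RAIN', data=df[['WIND_SPEED', 'RAIN', 'WIND_DIR']])
HTML(hb.html)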
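As a smaller illustration of what a contingency table contains, the sketch below builds one with `pandas.crosstab` from a handful of invented observations. The data and column names are hypothetical, and this is an alternative to the `groupby(...).count()` call used in the recipe, not a description of what dautil does.

```python
import pandas as pd

# Six made-up observations; labels are strings here for readability.
toy = pd.DataFrame({
    "rain": ["wet", "dry", "wet", "dry", "wet", "dry"],
    "wind_dir": ["N", "N", "S", "S", "N", "W"],
})

# Rows: rain categories; columns: wind directions; cells: observation counts.
table = pd.crosstab(toy["rain"], toy["wind_dir"])
print(table)
```

Each cell holds the number of observations for one combination of the two categorical variables, and combinations that never occur show up as zero.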

Refer to the following screenshot for the end result (see the anova.ipynb file in this book's code bundle):
