Anscombe's quartet is a classic example that illustrates why visualizing data is important. The quartet consists of four datasets with nearly identical summary statistics. Each dataset has a series of x values and dependent y values. We will tabulate these statistics in an IPython notebook. However, if you plot the datasets, they look strikingly different from each other.
For this recipe, you need to perform the following steps:
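The imports are as follows:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
# report and plotting are helper modules from the dautil package used in this book
from dautil import report
from dautil import plotting
import numpy as np
from tabulate import tabulate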
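Define the following function to compute the mean and variance of x and y, the correlation between x and y within each dataset, and the slope and intercept of a linear fit for each of the datasets:
def aggregate():
    df = sns.load_dataset("anscombe")
    # Mean and sample variance of x and y, one column per dataset
    agg = (df.groupby('dataset')
           .agg(['mean', 'var'])
           .transpose())
    groups = df.groupby('dataset')
    # Select the numeric columns explicitly; each group also holds the 'dataset' strings
    corr = [g['x'].corr(g['y']) for _, g in groups]
    builder = report.DFBuilder(agg.columns)
    builder.row(corr)
    # np.polyfit with degree 1 returns [slope, intercept]
    fits = [np.polyfit(g['x'], g['y'], 1) for _, g in groups]
    builder.row([f[0] for f in fits])
    builder.row([f[1] for f in fits])
    bottom = builder.build(['corr', 'slope', 'intercept'])
    return df, pd.concat((agg, bottom))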
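If dautil is not installed, or sns.load_dataset() cannot reach the online repository it downloads datasets from, the same statistics can be computed with pandas and NumPy alone. The sketch below is an assumption-laden stand-in, not the book's code: load_quartet(), aggregate_pandas(), and row labels such as x_mean are hypothetical names, and the data are Anscombe's published values typed in by hand.

```python
import numpy as np
import pandas as pd

# Anscombe's published values, typed in by hand (hypothetical offline
# stand-in for sns.load_dataset("anscombe")). Datasets I-III share x.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def load_quartet() -> pd.DataFrame:
    """Long-format frame with 'dataset', 'x' and 'y' columns."""
    frames = [pd.DataFrame({"dataset": name, "x": xs, "y": ys})
              for name, (xs, ys) in QUARTET.items()]
    return pd.concat(frames, ignore_index=True)


def aggregate_pandas(df: pd.DataFrame) -> pd.DataFrame:
    """One column per dataset; one row per summary statistic."""
    groups = df.groupby("dataset")
    # Mean and sample variance (ddof=1) of x and y
    agg = groups[["x", "y"]].agg(["mean", "var"]).transpose()
    # Flatten ('x', 'mean') -> 'x_mean' so the rows concatenate cleanly below
    agg.index = [f"{col}_{stat}" for col, stat in agg.index]
    rows = {}
    for name, g in groups:
        # A degree-1 fit returns coefficients highest power first
        slope, intercept = np.polyfit(g["x"], g["y"], 1)
        rows[name] = {"corr": g["x"].corr(g["y"]),
                      "slope": slope, "intercept": intercept}
    bottom = pd.DataFrame(rows)
    return pd.concat([agg, bottom])
```

Calling aggregate_pandas(load_quartet()) yields a table with the datasets I to IV as columns; flattening the row labels avoids mixing tuple and string labels in one index.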
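The following function renders the statistics table as a string for display in the notebook, with the table itself formatted as HTML by tabulate:
def generate(table):
    writer = report.RSTWriter()
    writer.h1('Anscombe Statistics')
    # Render the table as HTML with three decimal places
    writer.add(tabulate(table, tablefmt='html', floatfmt='.3f'))
    return writer.rst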
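Without dautil, a comparable string can be built with pandas alone. This is a hypothetical sketch rather than the book's helper: generate_html() is an invented name, and it assumes that a Markdown level-1 heading followed by an HTML table from DataFrame.to_html() is acceptable, since core Markdown has no table syntax and notebook Markdown cells generally render embedded HTML.

```python
import pandas as pd


def generate_html(table: pd.DataFrame, title: str = "Anscombe Statistics") -> str:
    """Hypothetical stand-in for generate(): Markdown heading plus HTML table."""
    # Format every float cell with three decimals, like floatfmt='.3f'
    html = table.to_html(float_format="{:.3f}".format)
    return f"# {title}\n\n{html}"
```

It would be displayed the same way, with display_markdown(generate_html(table), raw=True). to_html() writes the dataset names as a header row and the statistic names as row labels.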
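Plot the data and the corresponding linear fits with the Seaborn lmplot() function:
def plot(df):
    sns.set(style="ticks")
    # One panel per dataset in a 2 x 2 grid; ci=None hides the confidence band.
    # height sets each panel's height in inches (it was called size before seaborn 0.9).
    g = sns.lmplot(x="x", y="y", col="dataset",
                   hue="dataset", data=df,
                   col_wrap=2, ci=None, palette="muted", height=4,
                   scatter_kws={"s": 50, "alpha": 1})
    plotting.embellish(g.fig.axes)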
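For comparison, roughly the same figure can be drawn with plain Matplotlib, which makes explicit what lmplot() does here: one scatter panel per dataset plus a least-squares line from np.polyfit(). This is a hypothetical sketch (plot_matplotlib() is an invented name, and the styling from sns.set() and dautil's embellish() is not reproduced); it expects the long-format frame with dataset, x, and y columns that aggregate() returns as df.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_matplotlib(df: pd.DataFrame):
    """Hypothetical Matplotlib-only stand-in for plot(); returns the figure."""
    names = sorted(df["dataset"].unique())
    ncols = 2
    nrows = -(-len(names) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(8, 4 * nrows),
                             sharex=True, sharey=True, squeeze=False)
    # Draw every fitted line over the same x interval for easy comparison
    grid = np.linspace(df["x"].min(), df["x"].max(), 2)
    for ax, name in zip(axes.ravel(), names):
        sub = df[df["dataset"] == name]
        slope, intercept = np.polyfit(sub["x"], sub["y"], 1)
        ax.scatter(sub["x"], sub["y"], s=50)
        ax.plot(grid, slope * grid + intercept)
        ax.set_title(f"dataset = {name}")
    # Hide leftover panels when the number of datasets is odd
    for ax in axes.ravel()[len(names):]:
        ax.set_visible(False)
    return fig
```

In the notebook, fig = plot_matplotlib(df) would follow the aggregate() call. Sharing both axes across panels is what makes the four datasets directly comparable.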
Compute the statistics and display them as a table:
df, table = aggregate()
from IPython.display import display_markdown
display_markdown(generate(table), raw=True)
The following table shows practically identical statistics for each dataset (I modified the custom.css file in my IPython profile to get the colors):
Now plot the data:
%matplotlib inline
plot(df)
Refer to the following plot for the end result:
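A picture says more than a thousand words: despite the matching statistics, the plots show four very different patterns. Dataset I looks like a noisy linear relationship, II follows a smooth curve, III lies on a straight line except for one outlier, and in IV every x value except one is the same. The source code is in the anscombe.ipynb file in this book's code bundle.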
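The matching slopes and intercepts follow from the other rows of the table. For an ordinary least-squares line y = a + bx (the fit np.polyfit() computes with degree 1), the coefficients depend only on the means, the standard deviations s_x and s_y, and the correlation r:

```latex
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = r \, \frac{s_y}{s_x},
\qquad
a = \bar{y} - b \, \bar{x}
```

With the values shared by the quartet (mean of x = 9, variance of x = 11, mean of y ≈ 7.50, variance of y ≈ 4.13, r ≈ 0.816), this gives b ≈ 0.816 × √(4.13 / 11) ≈ 0.50 and a ≈ 7.50 − 0.50 × 9 = 3.00, so all four datasets end up with nearly the same fitted line even though their scatter plots differ.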
See the documentation of the Seaborn lmplot() function at https://web.stanford.edu/~mwaskom/software/seaborn/generated/seaborn.lmplot.html (retrieved July 2015).