Normalizing with the Box-Cox transformation

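Data that doesn't follow a known distribution, such as the normal distribution, is often difficult to manage. A popular strategy to bring such data closer to normality is to apply the Box-Cox transformation. For positive values y and a power parameter λ, it is given by the following equation:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y, & \lambda = 0
\end{cases}
$$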

The scipy.stats.boxcox() function can apply the transformation for positive data. We will use the same data as in the Clipping and filtering outliers recipe. With Q-Q plots, we will show that the Box-Cox transformation does indeed make the data appear more normal.
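
As a minimal sketch of what scipy.stats.boxcox() returns, the following example applies it to a synthetic, strictly positive sample rather than to the recipe's dataset. The sample, the seed, and all variable names are assumptions made for illustration. When no lambda is supplied, the function returns both the transformed values and the fitted lambda, and scipy.special.inv_boxcox() can undo the transformation with that same lambda.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox, skew

rng = np.random.default_rng(42)
# Synthetic right-skewed sample (illustrative only, not the starsCYG data)
sample = rng.lognormal(mean=0.0, sigma=0.8, size=500)

# With lmbda left unset, boxcox() returns the transformed data and
# the lambda that maximizes the log-likelihood
transformed_sample, fitted_lambda = boxcox(sample)

# Skewness near zero suggests a more symmetric, more normal-looking sample
skew_before = skew(sample)
skew_after = skew(transformed_sample)
print(f"fitted lambda: {fitted_lambda:.3f}")
print(f"skewness before: {skew_before:.3f}, after: {skew_after:.3f}")

# The transformation is invertible when the same lambda is reused
restored = inv_boxcox(transformed_sample, fitted_lambda)
```

Keeping the fitted lambda is useful whenever results have to be reported on the original scale.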

How to do it...

The following steps show how to normalize data with the Box-Cox transformation:

The imports are as follows:

import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import boxcox
import seaborn as sns
import dautil as dl
from IPython.display import HTML

Load the data and transform it as follows:

context = dl.nb.Context('normalizing_boxcox')

starsCYG = sm.datasets.get_rdataset("starsCYG", "robustbase", cache=True).data

var = 'log.Te'

# Data must be positive
transformed, _ = boxcox(starsCYG[var])

Display the Q-Q plots and the distribution plots as follows:

sp = dl.plotting.Subplotter(2, 2, context)
sp.label()
sm.qqplot(starsCYG[var], fit=True, line='s', ax=sp.ax)

sp.label(advance=True)
sm.qqplot(transformed, fit=True, line='s', ax=sp.ax)

sp.label(advance=True)
sns.distplot(starsCYG[var], ax=sp.ax)

sp.label(advance=True)
sns.distplot(transformed, ax=sp.ax)
plt.tight_layout()
HTML(dl.report.HTMLBuilder().watermark())

Refer to the following screenshot for the end result (refer to the normalizing_boxcox.ipynb file in this book's code bundle):

How it works

The Q-Q plots in the previous screenshot graph the theoretical quantiles of the normal distribution against the quantiles of the actual data. To help evaluate conformance to the normal distribution, I displayed a line that perfectly normal data would follow. The closer the points lie to the line, the more normal the data is. As you can see, the transformed data fits the line better and is, therefore, closer to normal. The distribution plots should help you confirm this.
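
The visual comparison can also be backed by a number. The sketch below uses scipy.stats.probplot(), which is a different function from the sm.qqplot() call in the recipe, to compute the correlation coefficient between the ordered data and the theoretical normal quantiles. A value closer to 1 means the points lie closer to a straight line. The data is synthetic and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import boxcox, probplot

rng = np.random.default_rng(7)
# Synthetic right-skewed sample (illustrative only, not the starsCYG data)
skewed = rng.lognormal(mean=1.0, sigma=0.6, size=400)
skewed_bc, _ = boxcox(skewed)

# probplot returns ((theoretical_quantiles, ordered_values),
# (slope, intercept, r)) when fit=True, which is the default
(_, _), (_, _, r_before) = probplot(skewed, dist="norm")
(_, _), (_, _, r_after) = probplot(skewed_bc, dist="norm")

print(f"Q-Q correlation before Box-Cox: {r_before:.4f}")
print(f"Q-Q correlation after Box-Cox:  {r_after:.4f}")
```

A higher coefficient after the transformation matches what the Q-Q plots show graphically. It is a rough indicator, not a formal normality test.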
