Transforming data with the power ladder

Linear relations are commonplace in science and data analysis. Linear models are generally easier to understand than non-linear models, and historically the tools for linear models were developed first. In some cases, it pays to linearize the data (transform it so that the relation becomes linear) to simplify the analysis. A simple strategy that sometimes works is to square or cube one or more variables, which moves the data up an imaginary power ladder. Similarly, we can move the data down the ladder by taking the square or cube root.
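As a minimal sketch of the idea, the following snippet uses synthetic data (an assumption for illustration, not the dataset used in this recipe) in which y grows as the cube of x. It compares the Pearson correlation before and after moving one of the variables along the ladder; a correlation closer to 1 indicates a relation closer to a straight line.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y grows as the cube of x.
x = np.linspace(1, 10, 50)
y = x ** 3

# Pearson correlation measures how close the relation is to a straight line.
r_raw = np.corrcoef(x, y)[0, 1]

# Step down the ladder: the cube root of y undoes the cubic growth.
r_down = np.corrcoef(x, np.cbrt(y))[0, 1]

# Equivalent alternative: step up the ladder by cubing x instead.
r_up = np.corrcoef(x ** 3, y)[0, 1]

print(f"raw: {r_raw:.4f}, cube root of y: {r_down:.4f}, cube of x: {r_up:.4f}")
```

Either direction linearizes this toy relation; which variable to transform is usually a matter of which one is easier to interpret afterwards.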

In this recipe, we will use data from the Duncan dataset as described in https://vincentarelbundock.github.io/Rdatasets/doc/car/Duncan.html (retrieved August 2015). The data was published in 1961 and describes 45 U.S. occupations in 1950 with four columns: type, income, education, and prestige. We will take a look at income and prestige. These variables seem to be linked by a cubic polynomial, so we can take the cube root of income or the cube of prestige. To check the result, we will visualize the residuals of the regression. If the transformation works, the residuals should be randomly distributed, which means that they should not follow a recognizable pattern.
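The residual check can also be sketched numerically. The example below is only an illustration under stated assumptions: it uses synthetic data rather than the Duncan values, and the helper names line_residuals and curvature are hypothetical. It fits a straight line before and after a cube root transformation and measures how strongly the residuals still follow a curved pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the recipe's data (not the Duncan values):
# y depends linearly on the cube root of x, plus Gaussian noise.
x = rng.uniform(1, 100, size=200)
y = 20 * np.cbrt(x) + rng.normal(scale=1.0, size=x.size)


def line_residuals(u, v):
    """Fit v = a*u + b by least squares and return the residuals."""
    a, b = np.polyfit(u, v, 1)
    return v - (a * u + b)


def curvature(u, res):
    """Correlate residuals with the squared, centered predictor.

    A value far from 0 signals a leftover curved pattern.
    """
    return np.corrcoef((u - u.mean()) ** 2, res)[0, 1]


res_raw = line_residuals(x, y)
res_transformed = line_residuals(np.cbrt(x), y)

print("raw:", curvature(x, res_raw))
print("transformed:", curvature(np.cbrt(x), res_transformed))
```

A residual plot conveys the same information visually: a clear arc in the raw residuals and a shapeless cloud after the transformation.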

How to do it...

In the following steps, I will demonstrate the basic data transformation:

The imports are as follows:

import matplotlib.pyplot as plt
import numpy as np
import dautil as dl
import seaborn as sns
import statsmodels.api as sm
from IPython.display import HTML

Load and transform the data as follows:

df = sm.datasets.get_rdataset("Duncan", "car", cache=True).data
transformed = df.copy()
transformed['income'] = np.power(transformed['income'], 1.0/3)

Plot the original data with a Seaborn regression plot (cubic polynomial) as follows:

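# NOTE: context is not defined in this excerpt; it is assumed to be
# set up earlier in the notebook (transforming_up.ipynb).
sp = dl.plotting.Subplotter(2, 2, context)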
sp.label()
sns.regplot(x='income', y='prestige', data=df, order=3, ax=sp.ax)

Plot the transformed data with the following lines:

sp.label(advance=True)
sns.regplot(x='income', y='prestige', data=transformed, ax=sp.ax)

Draw the residuals plot for the cubic polynomial as follows:

sp.label(advance=True)
sns.residplot(x='income', y='prestige', data=df, order=3, ax=sp.ax)

Draw the residuals plot for the transformed data as follows:

sp.label(advance=True)
sp.ax.set_xlim([1, 5])
sns.residplot(x='income', y='prestige', data=transformed, ax=sp.ax)
plt.tight_layout()
HTML(dl.report.HTMLBuilder().watermark())

Refer to the following screenshot for the end result (the code is in the transforming_up.ipynb file in this book's code bundle):
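The transform step above raises income to the power 1.0/3 with np.power. That is fine for non-negative values such as the income column (assumed here to be a percentage, per the dataset description), but it is worth knowing how the two NumPy options differ on negative inputs. The following sketch uses made-up values purely for illustration:

```python
import numpy as np

values = np.array([-8.0, 0.0, 27.0])

# A fractional power of a negative float is undefined in real arithmetic,
# so NumPy returns nan (and would emit a RuntimeWarning, silenced here).
with np.errstate(invalid="ignore"):
    via_power = np.power(values, 1.0 / 3)

# np.cbrt computes the real cube root and handles negative inputs.
via_cbrt = np.cbrt(values)

print(via_power)
print(via_cbrt)
```

If a variable can take negative values, np.cbrt is the safer choice for a cube root step down the ladder.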
