Fitting a robust linear model

Robust regression is designed to handle outliers better than ordinary least squares regression. It relies on robust estimators, several of which statsmodels supports. No single estimator is best in every case, so the choice depends on the data and the model.
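To make the idea of a robust estimator concrete, the following minimal sketch shows how two common weight functions treat scaled residuals. The function names are hypothetical, and the tuning constants 1.345 and 2.0 are assumptions that, to our knowledge, match the defaults of the HuberT and TrimmedMean norms in statsmodels; check your installed version. This illustrates the textbook definitions, not statsmodels' internal code.

```python
def huber_weight(z, t=1.345):
    """Huber weight: 1 inside [-t, t], shrinking as t/|z| outside."""
    return 1.0 if abs(z) <= t else t / abs(z)


def trimmed_mean_weight(z, c=2.0):
    """Trimmed-mean weight: 1 inside [-c, c], 0 outside."""
    return 1.0 if abs(z) <= c else 0.0


# Made-up scaled residuals: small ones keep full weight, while large
# ones are downweighted (Huber) or discarded (trimmed mean).
scaled_residuals = [0.1, -1.0, 2.5, -10.0]
for z in scaled_residuals:
    print(z, round(huber_weight(z), 4), trimmed_mean_weight(z))
```

Ordinary least squares corresponds to a weight of 1 for every residual, which is why a single extreme observation can dominate its fit.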

In this recipe, we will fit the annual sunspot counts that ship with statsmodels. We will define a simple model in which the current count depends linearly on the previous year's count. To demonstrate the effect of outliers, we overwrite the first value with 100 and then compare the robust regression model with an ordinary least squares model.
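The lag-1 model described here pairs each value with its predecessor. The sketch below uses only the standard library and a made-up toy series (not the sunspot data) to show how those pairs are built and how a single overwritten first value changes an ordinary least squares fit. The helper names lag_pairs and fit_line are hypothetical.

```python
def lag_pairs(series):
    """Return (previous, current) value lists for a lag-1 model."""
    return series[:-1], series[1:]


def fit_line(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy series generated exactly by y[t] = 2 + 0.5 * y[t-1].
series = [3.0]
for _ in range(9):
    series.append(2 + 0.5 * series[-1])

clean_intercept, clean_slope = fit_line(*lag_pairs(series))

# Overwrite the first value with 100, as the recipe does, and refit.
contaminated = [100.0] + series[1:]
bad_intercept, bad_slope = fit_line(*lag_pairs(contaminated))

print('clean fit:       ', clean_intercept, clean_slope)
print('contaminated fit:', bad_intercept, bad_slope)
```

On the clean series the fit recovers the generating coefficients; after the overwrite, the one distorted pair is enough to pull the slope far from 0.5.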

How to do it...

The following steps describe how to apply the robust linear model:

The imports are as follows:

import statsmodels.api as sm
import matplotlib.pyplot as plt
import dautil as dl
from IPython.display import HTML

Define the following function to set the labels of the plots:

def set_labels(ax):
    ax.set_xlabel('Year')
    ax.set_ylabel('Sunactivity')

Define the following function to plot the model fits:

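def plot_fit(df, ax, results):
    x = df['YEAR']
    cp = dl.plotting.CyclePlotter(ax)
    cp.plot(x[1:], df['SUNACTIVITY'][1:], label='Data')
    # predict()[i] is the fitted value for year x[i + 1]; dropping the
    # first prediction skips the one computed from the outlier.
    cp.plot(x[2:], results.predict()[1:], label='Fit')
    ax.legend(loc='best')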

Load the data and add an outlier for demonstration purposes:

df = sm.datasets.sunspots.load_pandas().data
# Copy so the assignment below also works when pandas returns a
# read-only array.
vals = df['SUNACTIVITY'].values.copy()

# Outlier added by malicious person, because no one
# laughs at his jokes.
vals[0] = 100

Fit the robust model as follows:

rlm_model = sm.RLM(vals[1:], sm.add_constant(vals[:-1]),
                   M=sm.robust.norms.TrimmedMean())

rlm_results = rlm_model.fit()
hb = dl.report.HTMLBuilder()
hb.h1('Fitting a robust linear model')
hb.h2('Robust Linear Model')
hb.add(rlm_results.summary().tables[1].as_html())

Fit an ordinary least squares model:

hb.h2('Ordinary Linear Model')
ols_model = sm.OLS(vals[1:], sm.add_constant(vals[:-1]))
ols_results = ols_model.fit()
hb.add(ols_results.summary().tables[1].as_html())

Plot the data and the model results with the following code:

fig, [ax, ax2] = plt.subplots(2, 1)

plot_fit(df, ax, rlm_results)
ax.set_title('Robust Linear Model')
set_labels(ax)

ax2.set_title('Ordinary Least Squares')
plot_fit(df, ax2, ols_results)
set_labels(ax2)
plt.tight_layout()
HTML(hb.html)

Refer to the screenshot for the end result, or to the rlm_demo.ipynb file in this book's code bundle.
