Fitting a robust linear model

Robust regression is designed to cope with outliers in data better than ordinary regression. It relies on robust estimators, several of which statsmodels supports. No single estimator is best in every case, so the choice of estimator depends on the data and the model.
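To make the idea of a robust estimator more concrete, the following minimal sketch compares how three weighting schemes treat a scaled residual z. The function names are hypothetical, and the thresholds 1.345 and 2.0 are assumptions chosen to mirror what we believe are the defaults of the HuberT and TrimmedMean norms in statsmodels:

```python
def least_squares_weight(z):
    """Ordinary least squares trusts every observation equally."""
    return 1.0


def huber_weight(z, t=1.345):
    """Huber-style weight: full weight up to t, then shrinking as t / |z|.

    The threshold t=1.345 is an assumption, not taken from the recipe.
    """
    return 1.0 if abs(z) <= t else t / abs(z)


def trimmed_mean_weight(z, c=2.0):
    """Trimmed-mean-style weight: observations beyond c are ignored.

    The threshold c=2.0 is an assumption, not taken from the recipe.
    """
    return 1.0 if abs(z) <= c else 0.0


# Scaled residuals: a typical point, a borderline point, and an outlier.
for z in [0.5, 1.5, 4.0]:
    print(z, least_squares_weight(z), round(huber_weight(z), 3),
          trimmed_mean_weight(z))
```

The larger the scaled residual, the less a robust scheme lets the observation influence the fit, whereas least squares keeps giving it full weight.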

In this recipe, we will fit the annual sunspot counts that ship with statsmodels. We will define a simple model in which the current count depends linearly on the previous year's count. To demonstrate the effect of outliers, we replace the first value with an artificially large one and then compare the robust regression model with an ordinary least squares model.
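The lag-1 model described above can be sketched without any libraries. The helper names and the synthetic series below are assumptions made for illustration only (they are not the sunspot data); the series follows current = 2 + 0.5 * previous exactly, so we know which coefficients a good fit should recover:

```python
def make_series(start, n, intercept=2.0, slope=0.5):
    """Build a synthetic series in which each value depends on the previous one."""
    values = [start]
    for _ in range(n - 1):
        values.append(intercept + slope * values[-1])
    return values


def lag_pairs(values):
    """Pair each value with its predecessor: (previous, current)."""
    return list(zip(values[:-1], values[1:]))


def ols_line(pairs):
    """Closed-form least squares fit of current = intercept + slope * previous."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


clean = make_series(10.0, 8)
# Replace the first value with an artificially large one.
contaminated = [100.0] + clean[1:]

clean_fit = ols_line(lag_pairs(clean))
contaminated_fit = ols_line(lag_pairs(contaminated))
print('clean fit (intercept, slope):', clean_fit)
print('contaminated fit (intercept, slope):', contaminated_fit)
```

In this toy series, a single bad value is enough to pull the least squares slope far away from 0.5, which is the kind of sensitivity that motivates a robust alternative.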

How to do it...

The following steps describe how to apply the robust linear model:

  1. The imports are as follows:
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    import dautil as dl
    from IPython.display import HTML
  2. Define the following function to set the labels of the plots:
    def set_labels(ax):
        ax.set_xlabel('Year')
        ax.set_ylabel('Sunactivity')
  3. Define the following function to plot the model fits:
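    def plot_fit(df, ax, results):
        x = df['YEAR']
        cp = dl.plotting.CyclePlotter(ax)
        cp.plot(x[1:], df['SUNACTIVITY'][1:], label='Data')
        # predict()[i] is the fit for year x[i + 1]; skip the first
        # prediction, which is computed from the artificial outlier.
        cp.plot(x[2:], results.predict()[1:], label='Fit')
        ax.legend(loc='best')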
  4. Load the data and add an outlier for demonstration purposes:
    df = sm.datasets.sunspots.load_pandas().data
    # Work on a copy; recent pandas versions may return a read-only array.
    vals = df['SUNACTIVITY'].to_numpy(copy=True)
    
    # Outlier added by a malicious person, because no one
    # laughs at his jokes.
    vals[0] = 100
  5. Fit the robust model as follows:
    # Regress each count on the previous year's count.
    rlm_model = sm.RLM(vals[1:], sm.add_constant(vals[:-1]),
                       M=sm.robust.norms.TrimmedMean())
    
    rlm_results = rlm_model.fit()
    hb = dl.report.HTMLBuilder()
    hb.h1('Fitting a robust linear model')
    hb.h2('Robust Linear Model')
    hb.add(rlm_results.summary().tables[1].as_html())
  6. Fit an ordinary least squares model:
    hb.h2('Ordinary Linear Model')
    ols_model = sm.OLS(vals[1:], sm.add_constant(vals[:-1]))
    ols_results = ols_model.fit()
    hb.add(ols_results.summary().tables[1].as_html())
  7. Plot the data and the model results with the following code:
    fig, [ax, ax2] = plt.subplots(2, 1)
    
    plot_fit(df, ax, rlm_results)
    ax.set_title('Robust Linear Model')
    set_labels(ax)
    
    ax2.set_title('Ordinary Least Squares')
    plot_fit(df, ax2, ols_results)
    set_labels(ax2)
    plt.tight_layout()
    HTML(hb.html)
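To see why the two fits can differ, it helps to look at iteratively reweighted least squares (IRLS), the kind of procedure that statsmodels uses to fit robust linear models. The following is a simplified, library-free sketch and not the statsmodels implementation: the function names, the synthetic data, the trimming threshold c=2.0, and the zero-scale guard are all assumptions made for illustration:

```python
from statistics import median


def weighted_line(xs, ys, ws):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_weight(z, c=2.0):
    """Ignore observations whose scaled residual exceeds c (assumed value)."""
    return 1.0 if abs(z) <= c else 0.0


def irls_line(xs, ys, weight=trimmed_weight, max_iter=50, tol=1e-8):
    """Refit repeatedly, reweighting the points by their scaled residuals."""
    # Start from the ordinary least squares solution (all weights equal).
    intercept, slope = weighted_line(xs, ys, [1.0] * len(xs))
    for _ in range(max_iter):
        resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Median absolute deviation around zero, rescaled for normal data.
        scale = median(abs(r) for r in resid) / 0.6745
        if scale < 1e-12:  # essentially perfect fit, nothing left to reweight
            break
        ws = [weight(r / scale) for r in resid]
        new_intercept, new_slope = weighted_line(xs, ys, ws)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Synthetic data: y = 1 + 2 * x, with one value replaced by an outlier.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] = 50.0

ols_fit = weighted_line(xs, ys, [1.0] * len(xs))
robust_fit = irls_line(xs, ys)
print('OLS (intercept, slope):', ols_fit)
print('IRLS (intercept, slope):', robust_fit)
```

On this noise-free toy data, the reweighting step drops the outlier and the robust fit returns to the underlying line, while the ordinary fit stays biased. Real data is noisier, so the difference between the two summaries in the recipe will typically be less dramatic.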

Refer to the following screenshot for the end result (the code is in the rlm_demo.ipynb file in this book's code bundle):

See also
