Visualizing the goodness of fit

We expect, or at least hope, that the residuals of a regression are just random noise. If they are not, the regressor may be ignoring information in the data. Ideally, the residuals are independent and normally distributed; normality is relatively easy to check with a histogram or a QQ plot. In general, we want the mean of the residuals to be as close to zero as possible and their variance to be as small as possible. A perfect fit has residuals that are all zero.
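These expectations can also be checked numerically. The following is a minimal sketch and not part of the recipe's code: it uses synthetic stand-ins for the actual and predicted values, and the helper name `summarize_residuals` is hypothetical. It reports the mean and variance of the residuals, their lag-1 autocorrelation as a rough independence check, and the p-value of a Shapiro-Wilk normality test.

```python
import numpy as np
from scipy.stats import shapiro


def summarize_residuals(y_true, y_pred):
    """Return simple diagnostics for the residuals of a regression."""
    residuals = np.asarray(y_pred) - np.asarray(y_true)
    # Lag-1 autocorrelation: values near zero are consistent with independence.
    lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    # Shapiro-Wilk test: a small p-value is evidence against normality.
    _, p_value = shapiro(residuals)
    return {
        'mean': residuals.mean(),
        'variance': residuals.var(ddof=1),
        'lag1_autocorr': lag1,
        'shapiro_p': p_value,
    }


# Synthetic stand-ins (assumption): a noisy sine wave as the "actual" values
# and the noise-free sine wave as the "predictions".
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 10, 500))
y_true = signal + rng.normal(scale=0.5, size=500)
y_pred = signal

summary = summarize_residuals(y_true, y_pred)
for name, value in summary.items():
    print('{}: {:.4f}'.format(name, value))
```

A mean and lag-1 autocorrelation near zero, together with a Shapiro-Wilk p-value that is not small, are consistent with residuals that behave like random noise. They do not prove it.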

How to do it...

  1. The imports are as follows:
    import numpy as np
    import matplotlib.pyplot as plt
    import dautil as dl
    import seaborn as sns
    from scipy.stats import probplot
    from IPython.display import HTML
  2. Load the target and predictions for the boosting regressor:
    y_test = np.load('temp_y_test.npy')
    preds = np.load('boosting.npy')
  3. Plot the actual and predicted values as follows:
    # `context` is assumed to be the dautil notebook context created earlier in the notebook
    sp = dl.plotting.Subplotter(2, 2, context)
    cp = dl.plotting.CyclePlotter(sp.ax)
    cp.plot(y_test)
    cp.plot(preds)
    sp.ax.set_ylabel(dl.data.Weather.get_header('TEMP'))
    sp.label()
  4. Plot the residuals on their own as follows:
    # Predicted minus actual; the more common convention (actual minus predicted) only flips the sign
    residuals = preds - y_test
    sp.next_ax().plot(residuals)
    sp.label()
  5. Plot the distribution of the residuals:
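    # sns.distplot is deprecated in recent seaborn releases; histplot with a KDE is the closest replacement
    sns.histplot(residuals, kde=True, stat='density', ax=sp.next_ax())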
    sp.label()
  6. Plot a QQ plot of the residuals:
    probplot(residuals, plot=sp.next_ax())
    HTML(sp.exit())

Refer to the following screenshot for the end result:

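[Screenshot: 2 x 2 grid of subplots showing the actual and predicted values, the residuals, the distribution of the residuals, and the QQ plot]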

The code is in the visualizing_goodness.ipynb file in this book's code bundle.
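
If dautil or the saved `.npy` files are not at hand, the four panels can be approximated with plain matplotlib. This is a sketch under stated assumptions rather than the recipe's own code: the `y_test` and `preds` arrays are synthetic, the panel titles are made up for illustration, and a plain histogram stands in for the seaborn distribution plot.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import probplot

# Synthetic stand-ins (assumption) for the arrays loaded in step 2.
rng = np.random.default_rng(42)
y_test = 10 + 5 * np.sin(np.linspace(0, 6 * np.pi, 300))
preds = y_test + rng.normal(scale=1.0, size=y_test.size)
residuals = preds - y_test

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Panel 1: actual versus predicted values.
axes[0, 0].plot(y_test, label='Actual')
axes[0, 0].plot(preds, label='Predicted')
axes[0, 0].set_title('Actual and predicted')
axes[0, 0].legend()

# Panel 2: the residuals on their own.
axes[0, 1].plot(residuals)
axes[0, 1].set_title('Residuals')

# Panel 3: distribution of the residuals as a normalized histogram.
axes[1, 0].hist(residuals, bins=30, density=True)
axes[1, 0].set_title('Residual distribution')

# Panel 4: QQ plot against the normal distribution.
# probplot also returns the plotted quantiles and a least-squares line fit.
(osm, osr), (slope, intercept, r) = probplot(residuals, plot=axes[1, 1])

fig.tight_layout()
fig.savefig('goodness_of_fit_sketch.png')
```

A correlation coefficient `r` close to 1 means the residual quantiles fall close to a straight line in the QQ plot, which is what normally distributed residuals produce.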
