We expect, or at least hope, that the residuals of a regression are just random noise. If they are not, our regressor may be ignoring information in the data. Under the usual assumptions, the residuals should be independent and normally distributed, which is relatively easy to check with a histogram or a QQ plot. In general, we want the mean of the residuals to be as close to zero as possible and their variance to be as small as possible. An ideal fit would have residuals that are all zero.
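To make these criteria concrete, here is a minimal sketch that fits a straight line to synthetic data (not the book's weather data) and computes the residual mean, the residual variance, and a Shapiro-Wilk normality test with scipy.stats.shapiro. The data and the linear model are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: a linear signal plus Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=500)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=x.size)

# Ordinary least-squares fit of a straight line (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)
preds = slope * x + intercept

# Residuals: observed minus predicted.
residuals = y - preds

mean_resid = residuals.mean()
var_resid = residuals.var(ddof=2)  # two fitted parameters: slope and intercept

# Shapiro-Wilk test: a small p-value is evidence against normality.
w_stat, p_value = stats.shapiro(residuals)

print(f"mean={mean_resid:.2e} variance={var_resid:.3f} shapiro p={p_value:.3f}")
```

An ordinary least-squares fit with an intercept makes the residual mean zero by construction, so a near-zero mean says little on its own; the variance and the normality check carry more information.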
import numpy as np
import matplotlib.pyplot as plt
import dautil as dl
import seaborn as sns
from scipy.stats import probplot
from IPython.display import HTML
# Test targets and the boosting regressor's predictions, saved earlier
y_test = np.load('temp_y_test.npy')
preds = np.load('boosting.npy')
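# context is the dautil notebook context created earlier in the notebook (not shown here)
sp = dl.plotting.Subplotter(2, 2, context)

# Panel 1: actual versus predicted temperatures
cp = dl.plotting.CyclePlotter(sp.ax)
cp.plot(y_test)
cp.plot(preds)
sp.ax.set_ylabel(dl.data.Weather.get_header('TEMP'))
sp.label()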
# Panel 2: residuals (observed minus predicted)
residuals = y_test - preds
sp.next_ax().plot(residuals)
sp.label()
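# Panel 3: distribution of the residuals
# (sns.distplot is deprecated; histplot with kde=True gives a histogram plus density curve)
sns.histplot(residuals, kde=True, ax=sp.next_ax())
sp.label()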
# Panel 4: QQ plot of the residuals against a normal distribution
probplot(residuals, plot=sp.next_ax())
HTML(sp.exit())
Refer to the following screenshot for the end result:
The code is in the visualizing_goodness.ipynb file in this book's code bundle.
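If dautil is not available, the same four panels can be approximated with plain matplotlib and SciPy. This is a sketch, not the book's code: the synthetic y_test and preds arrays are hypothetical stand-ins for temp_y_test.npy and boosting.npy, and a histogram overlaid with a scipy.stats.gaussian_kde curve replaces the seaborn distribution plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, probplot

# Hypothetical stand-ins for temp_y_test.npy and boosting.npy.
rng = np.random.default_rng(0)
y_test = 10 + 5 * np.sin(np.linspace(0, 20, 365)) + rng.normal(0, 1, 365)
preds = y_test + rng.normal(0, 1.2, 365)

# Residuals: observed minus predicted.
residuals = y_test - preds

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
ax_fit, ax_res, ax_hist, ax_qq = axes.ravel()

# Panel 1: actual versus predicted values.
ax_fit.plot(y_test, label="actual")
ax_fit.plot(preds, label="predicted")
ax_fit.set_ylabel("Temperature")
ax_fit.set_title("Actual vs predicted")
ax_fit.legend()

# Panel 2: residuals in order, with a zero reference line.
ax_res.plot(residuals)
ax_res.axhline(0, color="black", linewidth=0.8)
ax_res.set_title("Residuals")

# Panel 3: normalized histogram plus a kernel density estimate.
ax_hist.hist(residuals, bins=30, density=True, alpha=0.6)
grid = np.linspace(residuals.min(), residuals.max(), 200)
ax_hist.plot(grid, gaussian_kde(residuals)(grid))
ax_hist.set_title("Residual distribution")

# Panel 4: normal QQ plot; probplot draws the points and fitted line
# and sets its own title and axis labels.
probplot(residuals, plot=ax_qq)

fig.tight_layout()
fig.savefig("residual_diagnostics.png")
```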
The probplot() function is documented at https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.probplot.html (retrieved November 2015).
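Called without the plot argument, probplot() returns the quantile pairs and a least-squares line through them, as ((osm, osr), (slope, intercept, r)). The correlation coefficient r summarizes how straight the QQ plot is in a single number. The following sketch compares normally distributed and right-skewed synthetic residuals; both samples are assumptions made up for illustration.

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(1)
normal_resid = rng.normal(0, 1, size=1000)
# Mean-zero but right-skewed residuals, for contrast.
skewed_resid = rng.exponential(1.0, size=1000) - 1.0

# osm: theoretical normal quantiles; osr: sorted residuals.
(osm, osr), (slope, intercept, r_normal) = probplot(normal_resid, dist="norm")
_, (_, _, r_skewed) = probplot(skewed_resid, dist="norm")

print(f"r for normal residuals: {r_normal:.4f}")
print(f"r for skewed residuals: {r_skewed:.4f}")
```

For normal residuals the slope approximates their standard deviation and r is very close to 1; skewness bends the QQ plot and lowers r.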