
GMT assumption 5 requires the residual covariance to take the shape $\Sigma = \sigma^2 I$, that is, a diagonal matrix whose entries all equal the constant variance $\sigma^2$ of the error term. Heteroskedasticity occurs when the residual variance is not constant but differs across observations. If the residual variance is positively correlated with an input variable's distance from its mean, that is, when errors are larger for input values far from their mean, then OLS standard error estimates will be too low. Consequently, the t-statistics will be inflated, leading to false discoveries of relationships where none actually exist.
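The following Monte Carlo simulation is a minimal sketch of this effect under illustrative assumptions that are not from the text: 200 observations, a true slope of zero, and an error standard deviation equal to |x|, so errors are larger for inputs far from the mean of 0. It compares the conventional OLS standard error, which presumes $\Sigma = \sigma^2 I$, with the actual spread of the slope estimates and with a White (HC0) sandwich estimate computed by hand in NumPy.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_sims = 200, 2000
x = rng.standard_normal(n)              # fixed design, mean approximately 0
X = np.column_stack([np.ones(n), x])    # intercept + one regressor
XtX_inv = np.linalg.inv(X.T @ X)

slopes, naive_se, robust_se = [], [], []
for _ in range(n_sims):
    # Assumed heteroskedasticity: error sd grows with distance of x from 0
    eps = rng.standard_normal(n) * np.abs(x)
    y = 1.0 + 0.0 * x + eps            # true slope is zero
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Conventional OLS standard error: assumes a constant error variance
    s2 = resid @ resid / (n - 2)
    naive_se.append(np.sqrt(s2 * XtX_inv[1, 1]))
    # White (HC0) sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    robust_se.append(np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1]))
    slopes.append(beta[1])

slopes, naive_se, robust_se = map(np.array, (slopes, naive_se, robust_se))
empirical_sd = slopes.std()             # true sampling variability of the slope
# Share of simulations that wrongly reject "slope = 0" at the 5% level
naive_reject = np.mean(np.abs(slopes / naive_se) > 1.96)
robust_reject = np.mean(np.abs(slopes / robust_se) > 1.96)

print(f"empirical sd {empirical_sd:.4f} | mean OLS se {naive_se.mean():.4f} "
      f"| mean HC0 se {robust_se.mean():.4f}")
print(f"false-positive rate: OLS {naive_reject:.2%}, HC0 {robust_reject:.2%}")
```

With these settings the conventional standard error should understate the true variability noticeably, so the nominal 5% test rejects a true null far more often than 5%, while the sandwich estimate stays much closer to the nominal rate.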

Diagnostics start with a visual inspection of the residuals. Systematic patterns in the (supposedly random) residuals warrant formal statistical tests of the null hypothesis that errors are homoskedastic against various alternatives. These tests include the Breusch-Pagan and White tests.

There are several ways to correct OLS estimates for heteroskedasticity:

  • Robust standard errors (sometimes called White standard errors) take heteroskedasticity into account when computing the covariance matrix of the coefficient estimates, using a so-called sandwich estimator.
  • Clustered standard errors assume that the data contain distinct groups whose errors are independent of each other, while errors within a group may be correlated and their variance may differ between groups. These groups could be different asset classes or equities from different industries.

Several alternatives to OLS estimate the error covariance matrix under different assumptions when $\Sigma \neq \sigma^2 I$. The following are available in statsmodels:

  • Weighted least squares (WLS): For heteroskedastic errors where the covariance matrix has only diagonal entries, as for OLS, but the entries are now allowed to vary.
  • Feasible generalized least squares (GLSAR): For autocorrelated errors that follow an autoregressive AR(p) process (see the chapter on linear time series models).
  • Generalized least squares (GLS): For an arbitrary covariance matrix structure; yields efficient and unbiased estimates in the presence of heteroskedasticity or serial correlation, provided the covariance matrix is known.
