4.4 Assumptions of the Regression Model

If we can make certain assumptions about the errors in a regression model, we can perform statistical tests to determine if the model is useful. The following assumptions are made about the errors:

  1. The errors are independent.

  2. The errors are normally distributed.

  3. The errors have a mean of zero.

  4. The errors have a constant variance (regardless of the value of X).

It is possible to check the data to see if these assumptions are met. Often a plot of the residuals will highlight any glaring violations of the assumptions. When the errors (residuals) are plotted against the independent variable, the pattern should appear random.

Figure 4.4 presents some typical error patterns, with Figure 4.4A displaying a pattern that is expected when the assumptions are met and the model is appropriate. The errors are random and no discernible pattern is present. Figure 4.4B demonstrates an error pattern in which the errors increase as X increases, violating the constant variance assumption. Figure 4.4C shows errors consistently increasing at first and then consistently decreasing. A pattern such as this would indicate that the model is not linear and some other form (perhaps quadratic) should be used. In general, patterns in the plot of the errors indicate problems with the assumptions or the model specification.

Figure 4.4A Pattern of Errors Indicating Randomness
[Scatter diagram: X on the horizontal axis, error on the vertical axis]

Figure 4.4B Nonconstant Error Variance
[Scatter diagram: X on the horizontal axis, error on the vertical axis]

Figure 4.4C Pattern of Errors Indicating Relationship Is Not Linear
[Scatter diagram: X on the horizontal axis, error on the vertical axis]
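The error patterns described above can also be checked numerically. The following is a minimal sketch in Python, not a standard procedure: the two data sets are hypothetical and were constructed only to reproduce the patterns in Figures 4.4B and 4.4C, and the helper names (fit_line, residuals, mean_abs) are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y):
    """Errors (actual Y minus predicted Y) from the fitted line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def mean_abs(values):
    """Average absolute size of a list of errors."""
    return sum(abs(v) for v in values) / len(values)


# Hypothetical data 1: Y is curved in X, but a straight line is fitted.
x_curved = list(range(1, 10))
y_curved = [20 + 3 * xi - (xi - 5) ** 2 for xi in x_curved]
res_curved = residuals(x_curved, y_curved)

# Hypothetical data 2: the scatter around the line widens as X grows.
x_fan = list(range(1, 11))
y_fan = [2 + 3 * xi + (0.5 * xi if xi % 2 else -0.5 * xi) for xi in x_fan]
res_fan = residuals(x_fan, y_fan)
half = len(res_fan) // 2
spread_low = mean_abs(res_fan[:half])    # errors at the smaller X values
spread_high = mean_abs(res_fan[half:])   # errors at the larger X values

print("Curved data residuals:", [round(r, 2) for r in res_curved])
print("Fan data spread (low X, high X):", round(spread_low, 2), round(spread_high, 2))
```

For the curved data, the errors rise steadily to a peak and then fall, as in Figure 4.4C. For the second data set, the errors are noticeably larger at the higher values of X, as in Figure 4.4B. Comparing the two halves of the data is only a rough check and does not replace a plot of the errors.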

Estimating the Variance

While the errors are assumed to have constant variance ($\sigma^2$), the value of this variance is usually not known. It can be estimated from the sample results. The estimate of $\sigma^2$ is the mean squared error (MSE) and is denoted by $s^2$. The MSE is the sum of squares due to error (SSE) divided by the degrees of freedom:¹

$$s^2 = \text{MSE} = \frac{\text{SSE}}{n - k - 1} \tag{4-12}$$

where

  $n$ = number of observations in the sample
  $k$ = number of independent variables

In the Triple A Construction example, n=6 and k=1, so

$$s^2 = \text{MSE} = \frac{\text{SSE}}{n - k - 1} = \frac{6.8750}{6 - 1 - 1} = \frac{6.8750}{4} = 1.7188$$

From this, we can estimate the standard deviation as

$$s = \sqrt{\text{MSE}} \tag{4-13}$$

This is called the standard error of the estimate or the standard deviation of the regression. In this example,

$$s = \sqrt{\text{MSE}} = \sqrt{1.7188} = 1.31$$

The standard error of the estimate is used in many of the statistical tests about the model. It is also used to find interval estimates for both Y and the regression coefficients.²
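The arithmetic in Equations 4-12 and 4-13 can be reproduced with a short Python sketch. The function names below (mean_squared_error, standard_error) are illustrative assumptions, not part of any particular library; the input values are those given for the Triple A Construction example.

```python
import math


def mean_squared_error(sse, n, k):
    """Estimate the error variance: SSE divided by n - k - 1 degrees of freedom."""
    degrees_of_freedom = n - k - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return sse / degrees_of_freedom


def standard_error(sse, n, k):
    """Standard error of the estimate: the square root of the MSE."""
    return math.sqrt(mean_squared_error(sse, n, k))


# Values from the Triple A Construction example in the text.
sse, n, k = 6.8750, 6, 1
mse = mean_squared_error(sse, n, k)
s = standard_error(sse, n, k)

print(f"MSE = {mse:.4f}, s = {s:.2f}")
```

The check on the degrees of freedom is an added safeguard: with n = 2 and k = 1, for example, the denominator would be zero and no variance estimate could be computed.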
