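In any regression model, there is an implicit assumption (which can be tested) that a relationship exists between the variables. There is also some random error that cannot be predicted. The underlying simple linear regression model is

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where

- $Y$ = dependent variable (response variable)
- $X$ = independent variable (predictor variable or explanatory variable)
- $\beta_0$ = intercept (value of $Y$ when $X = 0$)
- $\beta_1$ = slope of the regression line
- $\epsilon$ = random error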
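The true values of the intercept and slope are not known, so they are estimated using sample data. The regression equation based on sample data is

$$\hat{Y} = b_0 + b_1 X$$

where

- $\hat{Y}$ = predicted value of $Y$
- $b_0$ = estimate of $\beta_0$, based on sample results
- $b_1$ = estimate of $\beta_1$, based on sample results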
In the Triple A Construction example, we are trying to predict sales, so the dependent variable (Y) is sales. The variable we use to help predict sales is the Albany area payroll, so this is the independent variable (X). Although any number of lines could be drawn through the points in Figure 4.1 to show a relationship between X and Y, the line chosen is the one that, in some sense, minimizes the errors. Error is defined as

$$e = \text{Error} = (\text{Actual value}) - (\text{Predicted value}) = Y - \hat{Y}$$
Since errors may be positive or negative, the average error could be zero even when there are extremely large errors, both positive and negative. To prevent negative errors from canceling positive errors, the errors can be squared. The best regression line is defined as the one with the minimum sum of the squared errors. For this reason, regression analysis is sometimes called least squares regression.
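The cancellation problem can be checked directly on the Triple A data from Table 4.2. The sketch below compares two candidate lines: a flat line at the mean sales value, $\hat{Y} = 7$, and the line $\hat{Y} = 2 + 1.25X$. The helper names `errors` and `sse` are illustrative, not from any library.

```python
# Triple A Construction data from Table 4.2: Y = sales, X = payroll.
X = [3, 4, 6, 4, 2, 5]
Y = [6, 8, 9, 5, 4.5, 9.5]


def errors(b0: float, b1: float) -> list[float]:
    """Error e = Y - Y_hat for each observation, where Y_hat = b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(X, Y)]


def sse(b0: float, b1: float) -> float:
    """Sum of the squared errors for the line Y_hat = b0 + b1 * X."""
    return sum(e * e for e in errors(b0, b1))


candidates = {
    "Y_hat = 7 (flat line at the mean of Y)": (7.0, 0.0),
    "Y_hat = 2 + 1.25X": (2.0, 1.25),
}
for label, (b0, b1) in candidates.items():
    print(f"{label}: sum of errors = {sum(errors(b0, b1)):.3f}, SSE = {sse(b0, b1):.3f}")
```

Both lines have errors that sum to exactly zero, so the plain sum of errors cannot tell them apart. The sums of squared errors differ clearly (22.5 for the flat line versus 6.875 for the sloped line), which is why squared errors are used to choose the best line.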
Statisticians have developed formulas that we can use to find the equation of a straight line that would minimize the sum of the squared errors. The simple linear regression equation is

$$\hat{Y} = b_0 + b_1 X$$
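The following formulas can be used to compute the slope and the intercept:

$$\bar{X} = \frac{\sum X}{n} = \text{average (mean) of } X \text{ values}$$

$$\bar{Y} = \frac{\sum Y}{n} = \text{average (mean) of } Y \text{ values}$$

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$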
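These formulas translate almost line for line into code. The function below is a minimal sketch (the name `least_squares` is illustrative); it also rejects the degenerate case where every X value is the same, because the denominator of $b_1$ is then zero and the slope is undefined.

```python
from collections.abc import Sequence


def least_squares(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (b0, b1) using b1 = sum((X-Xbar)(Y-Ybar)) / sum((X-Xbar)^2), b0 = Ybar - b1*Xbar."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_bar = sum(x) / n  # mean of X
    y_bar = sum(y) / n  # mean of Y
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on Y = 1 + 2X are recovered exactly.
print(least_squares([0, 1, 2], [1, 3, 5]))  # (1.0, 2.0)
```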
The preliminary calculations are shown in Table 4.2. There are other “shortcut” formulas that are helpful when doing the computations on a calculator, and these are presented in Appendix 4.1. They will not be shown here, as computer software will be used for most of the other examples in this chapter.
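| $Y$ | $X$ | $(X - \bar{X})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
|---|---|---|---|
| 6 | 3 | $(3 - 4)^2 = 1$ | $(3 - 4)(6 - 7) = 1$ |
| 8 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(8 - 7) = 0$ |
| 9 | 6 | $(6 - 4)^2 = 4$ | $(6 - 4)(9 - 7) = 4$ |
| 5 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(5 - 7) = 0$ |
| 4.5 | 2 | $(2 - 4)^2 = 4$ | $(2 - 4)(4.5 - 7) = 5$ |
| 9.5 | 5 | $(5 - 4)^2 = 1$ | $(5 - 4)(9.5 - 7) = 2.5$ |
| $\sum Y = 42$ | $\sum X = 24$ | $\sum (X - \bar{X})^2 = 10$ | $\sum (X - \bar{X})(Y - \bar{Y}) = 12.5$ |
| $\bar{Y} = 42/6 = 7$ | $\bar{X} = 24/6 = 4$ | | |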
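The preliminary calculations in Table 4.2 can be reproduced with a few lines of code. This sketch builds one row per observation, totals the two deviation columns, and then applies the slope and intercept formulas to those totals.

```python
# Triple A Construction data from Table 4.2 (Y = sales, X = payroll).
Y = [6, 8, 9, 5, 4.5, 9.5]
X = [3, 4, 6, 4, 2, 5]

n = len(X)
x_bar = sum(X) / n  # mean of X
y_bar = sum(Y) / n  # mean of Y

# One row per observation: Y, X, (X - Xbar)^2, (X - Xbar)(Y - Ybar).
# Adding 0.0 turns a computed -0.0 into 0.0 so the printed table looks clean.
rows = [(y, x, (x - x_bar) ** 2, (x - x_bar) * (y - y_bar) + 0.0) for y, x in zip(Y, X)]
for y, x, sq_dev, cross_dev in rows:
    print(f"{y:>5} {x:>3} {sq_dev:>6} {cross_dev:>6}")

sum_sq = sum(r[2] for r in rows)     # sum of (X - Xbar)^2
sum_cross = sum(r[3] for r in rows)  # sum of (X - Xbar)(Y - Ybar)
b1 = sum_cross / sum_sq
b0 = y_bar - b1 * x_bar
print(f"x_bar = {x_bar}, y_bar = {y_bar}, sum_sq = {sum_sq}, sum_cross = {sum_cross}")
print(f"b1 = {b1}, b0 = {b0}")
```

The totals match the last rows of the table (10 and 12.5), giving a slope of 1.25 and an intercept of 2.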
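Computing the slope and the intercept of the regression equation for the Triple A Construction Company example, we have

$$\bar{X} = \frac{\sum X}{6} = \frac{24}{6} = 4$$

$$\bar{Y} = \frac{\sum Y}{6} = \frac{42}{6} = 7$$

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25$$

$$b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2$$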
The estimated regression equation therefore is

$$\hat{Y} = 2 + 1.25X$$

or

$$\text{Sales} = 2 + 1.25(\text{payroll})$$
If the payroll next year is $600 million (X = 6), then the predicted value would be

$$\hat{Y} = 2 + 1.25(6) = 9.5$$

or $950,000.
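The only subtle step in this prediction is the units: X is measured in hundreds of millions of dollars of payroll and Y in hundreds of thousands of dollars of sales. The sketch below makes both conversions explicit; the function name `predict_sales_dollars` is illustrative.

```python
B0, B1 = 2.0, 1.25  # estimated intercept and slope for Triple A


def predict_sales_dollars(payroll_dollars: float) -> float:
    """Predicted sales in dollars from payroll in dollars, using Y_hat = 2 + 1.25X."""
    x = payroll_dollars / 100_000_000  # X is in units of $100 million
    y_hat = B0 + B1 * x                # Y_hat is in units of $100,000
    return y_hat * 100_000


print(predict_sales_dollars(600_000_000))  # 950000.0
```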
One of the purposes of regression is to understand the relationship among variables. This model tells us that each time the payroll increases by $100 million (an increase of 1 in X), we would expect sales to increase by $125,000, since b1 = 1.25 and sales are measured in units of $100,000 (1.25 × $100,000 = $125,000).
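Because the estimated equation is a straight line, this $125,000 increase does not depend on the starting payroll. The short check below raises X by one unit from a few arbitrary starting values (2, 4, and 6 were chosen only for illustration) and confirms that the predicted change is always b1.

```python
B0, B1 = 2.0, 1.25  # Triple A estimates: Y_hat in $100,000s, X in $100 millions


def y_hat(x: float) -> float:
    """Predicted sales (in $100,000s) for payroll x (in $100 millions)."""
    return B0 + B1 * x


# Raise payroll by one unit of X ($100 million) from several starting points.
changes = {x: y_hat(x + 1) - y_hat(x) for x in (2, 4, 6)}
for x, delta in changes.items():
    print(f"X = {x} -> {x + 1}: sales change = {delta} units = ${delta * 100_000:,.0f}")
```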