The multiple regression model is a practical extension of the model we just observed. It allows us to build a model with several independent variables. The underlying model is
where
To estimate the values of these coefficients, a sample is taken and the following equation is developed:
where
Consider the case of Jenny Wilson Realty, a real estate company in Montgomery, Alabama. Jenny Wilson, owner and broker for this company, wants to develop a model to determine a suggested listing price for houses based on the size of the house and the age of the house. She selects a sample of houses that have sold recently in a particular area, and she records the selling price, the square footage of the house, the age of the house, and also the condition (good, excellent, or mint) of each house, as shown in Table 4.5. Initially Jenny plans to use only the square footage and age to develop a model, although she wants to save the information on condition of the house to use later. She wants to find the coefficients for the following multiple regression model:
SELLING PRICE ($) | SQUARE FOOTAGE | AGE | CONDITION |
---|---|---|---|
95,000 | 1,926 | 30 | Good |
119,000 | 2,069 | 40 | Excellent |
124,800 | 1,720 | 30 | Excellent |
135,000 | 1,396 | 15 | Good |
142,800 | 1,706 | 32 | Mint |
145,000 | 1,847 | 38 | Mint |
159,000 | 1,950 | 27 | Mint |
165,000 | 2,323 | 30 | Excellent |
182,000 | 2,285 | 26 | Mint |
183,000 | 3,752 | 35 | Good |
200,000 | 2,300 | 18 | Good |
211,000 | 2,525 | 17 | Good |
215,000 | 3,800 | 40 | Excellent |
219,000 | 1,740 | 12 | Mint |
where
The mathematics of multiple regression becomes quite complex, so we leave formulas for and to regression textbooks.3 Excel can be used to develop a multiple regression model just as it was used for a simple linear regression model. When entering the data in Excel, it is important that all of the independent variables are in adjoining columns to facilitate the input. From the Data tab in Excel, select Data Analysis and then Regression, as shown earlier, in Program 4.1A. This opens the Regression window to allow the input, as shown in Program 4.4A. Note that the X Range includes the data in two columns (B and C) because there are two independent variables. The Excel output that Jenny Wilson obtains is shown in Program 4.4B, and it provides the following equation:
A multiple regression model can be evaluated in a manner similar to the way a simple linear regression model is evaluated. Both the p-value for the F test and can be interpreted the same with multiple regression models as they are with simple linear regression models. However, as there is more than one independent variable, the hypothesis that is being tested with the F test is that all the coefficients are equal to 0. If all these are 0, then none of the independent variables in the model is helpful in predicting the dependent variable.
To determine which of the independent variables in a multiple regression model is significant, a significance test on the coefficient for each variable is performed. While statistics textbooks can provide the details of these tests, the results of these tests are automatically displayed in the Excel output. The null hypothesis is that the coefficient is , and the alternate hypothesis is that it is not 0 The test statistic is calculated in Excel, and the p-value is given. If the p-value is lower than the level of significance , then the null hypothesis is rejected, and it can be concluded that the variable is significant.
In the Jenny Wilson Realty example in Program 4.4B, the overall model is statistically significant and useful in predicting the selling price of the house because the p-value for the F test is 0.002. The value is 0.6719, so 67% of the variability in selling price for these houses can be explained by the regression model. However, there were two independent variables in the model–-square footage and age. It is possible that one of these is significant and the other is not. The F test simply indicates that the model as a whole is significant.
Two significance tests can be performed to determine if square footage or age (or both) is significant. In Program 4.4B, the results of two hypothesis tests are provided. The first test for variable (square footage) is
Using a 5% level of significance the null hypothesis is rejected because the p-value for this is 0.0013. Thus, square footage is helpful in predicting the price of a house.
Similarly, the variable (age) is tested using the Excel output, and the p-value is 0.0039. The null hypothesis is rejected because this is less than 0.05. Thus, age is also helpful in predicting the price of a house.