9.7 A Complete Example

In the previous sections, we presented the basic elements necessary to fit and use a straight-line regression model. In this section, we will assemble these elements by applying them in an example with the aid of computer software.

Suppose a fire insurance company wants to relate the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. The study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this suburb is selected. The amount of damage, y, and the distance between the fire and the nearest fire station, x, are recorded for each fire. The results are given in Table 9.5 and saved in the FIREDAM file.

Table 9.5 Fire Damage Data

x = distance from fire station (miles); y = fire damage (thousands of dollars)

x     y
3.4   26.2
1.8   17.8
4.6   31.3
2.3   23.1
3.1   27.5
5.5   36.0
.7    14.1
3.0   22.3
2.6   19.6
4.3   31.3
2.1   24.0
1.1   17.3
6.1   43.2
4.8   36.4
3.8   26.1

Data Set: FIREDAM

  1. Step 1 First, we hypothesize a model to relate fire damage, y, to the distance from the nearest fire station, x. We hypothesize a straight-line probabilistic model:

    \[ y = \beta_0 + \beta_1 x + \varepsilon \]
  2. Step 2 Next, we open the FIREDAM file and use statistical software to estimate the unknown parameters in the deterministic component of the hypothesized model. The SAS printout for the simple linear regression analysis is shown in Figure 9.25. The least squares estimates of the slope \( \beta_1 \) and intercept \( \beta_0 \), highlighted on the printout, are

    \[ \hat\beta_1 = 4.91933 \]
    \[ \hat\beta_0 = 10.27793 \]

    and the least squares equation is (rounded)

    \[ \hat{y} = 10.28 + 4.92x \]

    This prediction equation is graphed by MINITAB in Figure 9.26, along with a plot of the data points.

    Figure 9.25 SAS printout for fire damage regression

    The least squares estimate of the slope, \( \hat\beta_1 = 4.92 \), implies that the estimated mean damage increases by $4,920 for each additional mile from the fire station. This interpretation is valid over the range of x, or from .7 to 6.1 miles from the station. The estimated y-intercept, \( \hat\beta_0 = 10.28 \), has the interpretation that a fire 0 miles from the fire station has an estimated mean damage of $10,280. Although this would seem to apply to the fire station itself, remember that the y-intercept is meaningfully interpretable only if x = 0 is within the sampled range of the independent variable. Since x = 0 is outside the range, \( \hat\beta_0 \) has no practical interpretation.

    Figure 9.26 MINITAB scatterplot with least squares line for fire damage regression analysis

  3. Step 3 Now we specify the probability distribution of the random-error component ε. The assumptions about the distribution are identical to those listed in Section 9.3. Although we know that these assumptions are not completely satisfied (they rarely are for practical problems), we are willing to assume that they are approximately satisfied for this example. The estimate of the standard deviation σ of ε, highlighted on the SAS printout, is

    \[ s = 2.31635 \]

    This implies that most of the observed fire damage (y) values will fall within approximately \( 2s \approx 4.63 \) thousand dollars of their respective predicted values when the least squares line is used.

  4. Step 4 We can now check the usefulness of the hypothesized model—in other words, whether x really contributes information for the prediction of y by the straight-line model. First, test the null hypothesis that the slope β1 is 0—that is, that there is no linear relationship between fire damage and the distance from the nearest fire station—against the alternative hypothesis that fire damage increases as the distance increases. We test

    \[ H_0\colon \beta_1 = 0 \]
    \[ H_a\colon \beta_1 > 0 \]

    The two-tailed observed significance level for testing \( H_a\colon \beta_1 \neq 0 \), highlighted on the SAS printout, is less than .0001. Because \( \hat\beta_1 \) is positive (the direction specified by \( H_a \)), the p-value for our one-tailed test is half the two-tailed value, so p < .00005. This small p-value leaves little doubt that mean fire damage and distance between the fire and the fire station are at least linearly related, with mean fire damage increasing as the distance increases.

    We gain additional information about the relationship by forming a confidence interval for the slope β1. A 95% confidence interval, highlighted on the SAS printout, is (4.071, 5.768). Thus, with 95% confidence, we estimate that the interval from $4,071 to $5,768 encloses the mean increase (β1) in fire damage per additional mile in distance from the fire station.

    Another measure of the utility of the model is the coefficient of determination, \( r^2 \). The value (also highlighted on the printout) is \( r^2 = .9235 \), which implies that about 92% of the sample variation in fire damage (y) is explained by the distance (x) between the fire and the fire station.

    The coefficient of correlation, r, that measures the strength of the linear relationship between y and x is not shown on the SAS printout and must be calculated. Using the facts that \( r = \pm\sqrt{r^2} \) in simple linear regression and that r and \( \hat\beta_1 \) have the same sign, we calculate

    \[ r = +\sqrt{r^2} = \sqrt{.9235} = .96 \]

    The high correlation confirms our conclusion that β1 is greater than 0; it appears that fire damage and distance from the fire station are positively correlated. All signs point to a strong linear relationship between y and x.

  5. Step 5 We are now prepared to use the least squares model. Suppose the insurance company wants to predict the fire damage if a major residential fire were to occur 3.5 miles from the nearest fire station. The predicted value (highlighted at the bottom of the SAS printout) is \( \hat{y} = 27.496 \), while the 95% prediction interval (also highlighted) is (22.324, 32.667). Therefore, with 95% confidence, we predict fire damage in a major residential fire 3.5 miles from the nearest station to be between $22,324 and $32,667.
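
The quantities read off the SAS printout in Steps 2 through 5 can be reproduced by hand from the Table 9.5 data. The following Python sketch (standard library only) recomputes the least squares estimates, s, r^2 and r, the t statistic and 95% confidence interval for the slope, and the 95% prediction interval at x = 3.5 miles. The function name fit_line and the dictionary it returns are illustrative choices, not part of any package. The critical value t.025 = 2.160 for n - 2 = 13 degrees of freedom is an assumption hard-coded from a t table, because the standard library has no t distribution; for the same reason the sketch reports the t statistic rather than a p-value.

```python
import math

# Table 9.5 (FIREDAM): x = distance to nearest fire station (miles),
# y = fire damage (thousands of dollars)
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
y = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3,
     24.0, 17.3, 43.2, 36.4, 26.1]

# Assumed value from a t table: t_.025 with n - 2 = 13 degrees of freedom
T_025_13 = 2.160369


def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x; returns summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx                    # slope estimate
    b0 = y_bar - b1 * x_bar               # intercept estimate
    sse = ss_yy - b1 * ss_xy              # sum of squared errors
    s = math.sqrt(sse / (n - 2))          # estimate of sigma
    r2 = 1 - sse / ss_yy                  # coefficient of determination
    r = math.copysign(math.sqrt(r2), b1)  # r takes the sign of b1
    return dict(n=n, x_bar=x_bar, ss_xx=ss_xx, b0=b0, b1=b1, s=s, r2=r2, r=r)


fit = fit_line(x, y)

# Step 4: t statistic for H0: beta1 = 0 and a 95% confidence interval for beta1
se_b1 = fit["s"] / math.sqrt(fit["ss_xx"])
t_stat = fit["b1"] / se_b1
ci_b1 = (fit["b1"] - T_025_13 * se_b1, fit["b1"] + T_025_13 * se_b1)

# Step 5: prediction and 95% prediction interval at x_p = 3.5 miles
x_p = 3.5
y_hat = fit["b0"] + fit["b1"] * x_p
half_width = T_025_13 * fit["s"] * math.sqrt(
    1 + 1 / fit["n"] + (x_p - fit["x_bar"]) ** 2 / fit["ss_xx"]
)
pi = (y_hat - half_width, y_hat + half_width)

print(f"b0 = {fit['b0']:.5f}, b1 = {fit['b1']:.5f}, s = {fit['s']:.5f}")
print(f"r^2 = {fit['r2']:.4f}, r = {fit['r']:.2f}, t = {t_stat:.2f}")
print(f"95% CI for beta1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"y_hat(3.5) = {y_hat:.3f}, 95% PI: ({pi[0]:.3f}, {pi[1]:.3f})")
```

After rounding, the printed values should agree with the highlighted printout entries, for example b1 = 4.91933 and the prediction interval (22.324, 32.667). The hard-coded critical value applies only to n = 15 and would have to be looked up again for another sample size.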

Caution

We would not use this model to make predictions for homes less than .7 mile or more than 6.1 miles from the nearest fire station. A look at the data in Table 9.5 reveals that all the x-values fall between .7 and 6.1. It is dangerous to use the model to make predictions outside the region in which the sample data fall. A straight line might not provide a good model for the relationship between the mean value of y and the value of x when stretched over a wider range of x-values.
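
One way to enforce this caution in software is to have the prediction routine refuse distances outside the sampled range. The Python sketch below is a hypothetical helper: the name predict_damage and the choice to raise ValueError are illustrative assumptions. It hard-codes the least squares estimates from the SAS printout and the sampled range of .7 to 6.1 miles from Table 9.5.

```python
# Least squares estimates taken from the SAS printout (Figure 9.25)
B0 = 10.27793
B1 = 4.91933
# Smallest and largest distances in Table 9.5 (miles)
X_MIN, X_MAX = 0.7, 6.1


def predict_damage(distance_miles):
    """Predicted fire damage (thousands of dollars) from the fitted line.

    Raises ValueError for distances outside the sampled range, where the
    straight-line model has not been checked (extrapolation).
    """
    if not X_MIN <= distance_miles <= X_MAX:
        raise ValueError(
            f"x = {distance_miles} is outside the sampled range "
            f"[{X_MIN}, {X_MAX}]; refusing to extrapolate"
        )
    return B0 + B1 * distance_miles


print(round(predict_damage(3.5), 3))
try:
    predict_damage(10.0)
except ValueError as err:
    print(err)
```

Raising an error is a deliberately strict design; a gentler variant could return the prediction together with a warning flag, but either way the caller is forced to notice that it is extrapolating.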

Exercises 9.120–9.123

Applying the Concepts—Intermediate

  1. PONG 9.120 Impact of dropping ping-pong balls. The impact of dropping hollow balls was investigated in the American Journal of Physics (Mar. 2014). Standard ping-pong balls were dropped vertically onto a force plate. Upon impact, two variables were measured: y = coefficient of restitution, COR (measured as the ratio of the rebound speed to the speed at impact), and x = speed at impact (meters/second). Of the 19 balls dropped, 10 buckled at impact. The data (simulated from information provided in the article) are listed in the table.

    1. Conduct a complete simple linear regression analysis of the relationship between coefficient of restitution (y) and impact speed (x). Write all your conclusions in the words of the problem.

    2. The researcher believes that the rate of increase in the coefficient of restitution with impact speed differs depending on whether the ping-pong ball buckles. Do the data support this hypothesis? Explain.

      Ball COR y Speed x Buckle
      1 .945 0.8 No
      2 .950 1.0 No
      3 .930 1.5 No
      4 .920 1.8 No
      5 .920 3.0 No
      6 .930 3.4 No
      7 .905 4.4 No
      8 .915 5.0 No
      9 .910 6.4 No
      10 .900 4.4 Yes
      11 .885 5.3 Yes
      12 .870 5.4 Yes
      13 .850 7.4 Yes
      14 .795 7.2 Yes
      15 .790 7.2 Yes
      16 .800 8.0 Yes
      17 .820 8.5 Yes
      18 .810 9.4 Yes
      19 .780 9.0 Yes
  2. DUST2 9.121 Thickness of dust on solar cells. The performance of a solar cell can deteriorate when atmospheric dust accumulates on the solar panel surface. In the International Journal of Energy and Environmental Engineering (Dec. 2012), researchers at the Renewable Energy Research Laboratory, University of Lucknow (India), estimated the relationship between the dust thickness and the efficiency of a solar cell. The thickness of dust (in millimeters) collected on a solar cell was measured three times per month over a year-long period. Each time the dust thickness was measured, the researchers also determined the percentage difference (before minus after dust collection) in efficiency of the solar panel. Data (monthly averages) for the 10 months when there was no rain are listed in the table.

    1. Fit the linear model, \( y = \beta_0 + \beta_1 x + \varepsilon \), to the data where y = efficiency and x = average dust thickness.

    2. Evaluate the model statistically. Do you recommend using the model in practice?

    Month Efficiency (% change) Average Dust Thickness (mm)
    January 1.5666 0.00024
    February 1.9574 0.00105
    March 1.3707 0.00075
    April 1.9563 0.00070
    May 1.6332 0.00142
    June 1.8172 0.00055
    July 0.9202 0.00039
    October 1.8790 0.00095
    November 1.5544 0.00064
    December 2.0198 0.00065

    Source: Siddiqui, R., and Bajpai, U. “Correlation between thicknesses of dust collected on photovoltaic module and difference in efficiencies in composite climate,” International Journal of Energy and Environmental Engineering, Vol. 4, No. 1, Dec. 2012 (Table 1).

  3. GMAC 9.122 An MBA’s work-life balance. Many business schools offer courses that assist MBA students with developing good work-life balance habits, and most large companies have developed work-life balance programs for their employees. The Graduate Management Admission Council (GMAC) conducted a survey of over 2,000 MBA alumni to explore the work-life balance issue. (For example, one question asked alumni to state their level of agreement with the statement, “My personal and work demands are overwhelming.”) Based on these responses, the GMAC determined a work-life balance scale score for each MBA alumnus. Scores ranged from 0 to 100, with lower scores indicating a higher imbalance between work and life. Many other variables, including average number of hours worked per week, were also measured. The data for the work-life balance study are saved in the GMAC file. (The first 15 observations are listed in the accompanying table.) Let x = average number of hours worked per week and y = work-life balance scale score for each MBA alumnus. Investigate the link between these two variables by conducting a complete simple linear regression analysis of the data. Summarize your findings in a professional report.

    WLB Score Hours
    75.22 50
    64.98 45
    49.62 50
    44.51 55
    70.10 50
    54.74 60
    55.98 55
    21.24 60
    59.86 50
    70.10 50
    29.00 70
    64.98 45
    36.75 40
    35.45 40
    45.75 50

    Based on “Work-life balance: An MBA alumni report.” Graduate Management Admission Council (GMAC) Research Report (Oct. 13, 2005).

  4. LEGAL 9.123 Legal advertising—does it pay? To gain a competitive edge, lawyers aggressively advertise their services. Does legal advertising really pay? To partially answer this question, consider the case of an actual law firm that specializes in personal injury (PI) cases. The firm spends thousands of dollars each month on advertising. The accompanying table shows the firm’s new personal injury cases each month over a 42-month period. Also shown is the total expenditure on advertising each month, and over the previous 6 months. Do these data provide support for the hypothesis that increased advertising expenditures are associated with more personal injury cases? Conduct a complete simple linear regression analysis of the data, letting y=new PI cases and x=6-month cumulative advertising expenditure. Summarize your findings in a professional report.

    Month New PI Cases 6 Months Cumulative Adv. Exp.
    7 11 $ 41,632.74
    8 7 $ 38,227.39
    9 13 $ 39,779.77
    10 7 $ 37,490.22
    11 9 $ 52,225.71
    12 8 $ 56,249.15
    13 18 $ 59,938.03
    14 9 $ 65,250.59
    15 25 $ 66,071.85
    16 26 $ 81,765.94
    17 27 $ 66,895.46
    18 12 $ 71,426.16
    19 14 $ 75,346.40
    20 5 $ 81,589.97
    21 22 $ 78,828.68
    22 15 $ 78,415.73
    23 12 $ 90,802.77
    24 18 $ 95,689.44
    25 20 $ 83,099.55
    26 38 $ 82,703.75
    27 13 $ 90,484.38
    28 18 $102,084.54
    29 21 $ 84,976.99
    30 7 $ 95,314.29
    31 16 $115,858.28
    32 12 $108,557.00
    33 15 $127,693.57
    34 18 $122,761.67
    35 30 $123,545.67
    36 12 $119,388.26
    37 30 $134,675.68
    38 20 $133,812.93
    39 19 $142,417.13
    40 29 $149,956.61
    41 58 $165,204.46
    42 42 $156,725.72
    43 24 $146,397.56
    44 47 $197,792.64
    45 24 $198,460.28
    46 14 $206,662.87
    47 31 $253,011.27
    48 26 $249,496.28

    Source: Info Tech, Inc., Gainesville, Florida.
