Chapter Notes

Key Terms

Note:

Starred (*) terms are from the optional sections in this chapter.

Key Symbols/Notation

y Dependent variable (variable to be predicted)
x Independent variable (variable used to predict)
E(y) Expected value (mean) of y
β0 y-intercept of true line
β1 Slope of true line
β̂0 Least squares estimate of y-intercept
β̂1 Least squares estimate of slope
ε Random error
ŷ Predicted value of y for a given x-value
(y − ŷ) Estimated error of prediction
SSE Sum of squared errors of prediction
r Coefficient of correlation
r² Coefficient of determination
xp Value of x used to predict y
rs *Spearman’s rank correlation coefficient
di *Difference between ranks of ith observations for x and y

Key Ideas

Simple Linear Regression Variables

  • y=Dependent variable (quantitative)

  • x=Independent variable (quantitative)

Method of least squares properties

  1. average error of prediction=0

  2. sum of squared errors is minimum
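Both properties can be checked numerically. The following is a minimal sketch in Python; the five data points and the helper names `fit_line` and `errors` are assumptions made for illustration and do not come from the chapter's exercises.

```python
def fit_line(x, y):
    """Return (b0, b1): least squares estimates of the intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def errors(x, y, b0, b1):
    """Errors of prediction (y - y_hat) for the line y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Illustrative data (an assumption for this sketch).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

b0, b1 = fit_line(x, y)
res = errors(x, y, b0, b1)

sum_errors = sum(res)            # property 1: zero, up to rounding error
sse = sum(e ** 2 for e in res)   # property 2: the smallest attainable SSE

# A competing line with a slightly different slope has a larger SSE.
sse_other = sum(e ** 2 for e in errors(x, y, b0, b1 + 0.1))

print(sum_errors, sse, sse_other)
```

Trying other competing lines (changing the intercept, the slope, or both) should always give an SSE at least as large as that of the least squares line.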

First-order (straight-line) model

E(y)=β0+β1x

where E(y)= mean of y

  • β0 = y-intercept of line (point where line intercepts y-axis)

  • β1=slope of line (change in y for every one-unit change in x)

Practical interpretation of y-intercept

Predicted y-value when x=0

(no practical interpretation if x=0 is either nonsensical or outside range of sample data)

Practical interpretation of slope

Increase (or decrease) in y for every one-unit increase in x

Coefficient of correlation, r

  1. ranges between −1 and +1

  2. measures strength of linear relationship between y and x

Coefficient of determination, r²

  1. ranges between 0 and 1

  2. measures proportion of sample variation in y “explained” by the model.
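The two coefficients can be computed side by side. The sketch below is only an illustration: the data set and the function name `correlation_summary` are assumptions. It also checks that squaring r gives the same value as the "explained variation" form (SSyy − SSE)/SSyy.

```python
import math


def correlation_summary(x, y):
    """Return (r, r_squared, explained_fraction) for paired data x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    r = ss_xy / math.sqrt(ss_xx * ss_yy)

    # SSE of the least squares line, then the proportion of variation explained.
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy
    explained = (ss_yy - sse) / ss_yy
    return r, r ** 2, explained


# Illustrative data with a decreasing trend (an assumption for this sketch).
r, r_sq, explained = correlation_summary([1, 2, 3, 4, 5], [5, 3, 4, 2, 1])
print(r, r_sq, explained)
```

Note that r keeps the sign of the slope, while r² does not, so r² alone cannot tell you whether the relationship is increasing or decreasing.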

Practical interpretation of model standard deviation s

Approximately 95% of y-values fall within 2s of their respective predicted values

Comparing Intervals in Step 5

Width of confidence interval for E(y) will always be narrower than width of prediction interval for y.
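The reason is the extra "1" under the square root in the prediction interval, which accounts for the random error of a single new observation. A minimal sketch follows; the function name and all input values are hypothetical, and the critical value `t_crit` is assumed to have been read from a t-table for n − 2 degrees of freedom.

```python
import math


def interval_half_widths(t_crit, s, n, x_p, x_bar, ss_xx):
    """Half-widths of the CI for E(y) and the PI for y at x = x_p."""
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(core)       # interval for the mean of y
    pi_half = t_crit * s * math.sqrt(1 + core)   # interval for a single new y
    return ci_half, pi_half


# Hypothetical inputs: t_.025 = 3.182 for 3 degrees of freedom (n = 5).
ci_half, pi_half = interval_half_widths(
    t_crit=3.182, s=0.61, n=5, x_p=4, x_bar=3, ss_xx=10
)
print(ci_half, pi_half)
```

Because 1 + core is always larger than core, the prediction interval is wider for every choice of x_p, and both intervals widen as x_p moves away from the sample mean of x.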

Nonparametric Test for Rank Correlation

Spearman’s test
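As a rough illustration, Spearman's rank correlation coefficient can be computed with the shortcut formula rs = 1 − 6Σdi²/[n(n² − 1)], where di is the difference between the x-rank and y-rank of the ith observation. The sketch below assumes there are no tied values (the shortcut formula is exact only in that case); the helper names and the data are assumptions.

```python
def ranks(values):
    """Rank observations from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation via the shortcut formula (no ties)."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Illustrative data (an assumption for this sketch).
rs = spearman_rs([10, 20, 30, 40, 50], [2.0, 1.0, 4.0, 3.0, 5.0])
print(rs)
```

The test itself then compares rs with a critical value from a table of Spearman's rank correlation coefficient for the chosen α and n.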

Key Formulas

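  • $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$, where $SS_{xx} = \sum (x - \bar{x})^2$ and $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$

  • $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

  • $SSE = \sum (y - \hat{y})^2 = SS_{yy} - \hat{\beta}_1 SS_{xy}$, where $SS_{yy} = \sum (y - \bar{y})^2$

  • $s = \sqrt{\dfrac{SSE}{n - 2}}$

  • $t$ (for testing $H_0\colon \beta_1 = 0$) $= \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$, where $s_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}}$

  • $(1 - \alpha)100\%$ CI for $\beta_1$: $\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1}$

  • $r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$

  • $t$ (for testing $H_0\colon \rho = 0$) $= \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$

  • $r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}}$

  • $(1 - \alpha)100\%$ CI for $E(y)$: $\hat{y} \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

  • $(1 - \alpha)100\%$ PI for $y$: $\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

  • $r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$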
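The regression formulas in this list chain together: the sums of squares give the estimates, the estimates give SSE, and SSE gives s and the test statistic. The sketch below strings them together in Python. The function name, the dictionary keys, and the sample data are assumptions made for illustration; the critical value for the test or interval would still come from a t-table with n − 2 degrees of freedom.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1*x with the usual summary quantities.

    Requires n > 2 observations and at least two distinct x-values.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx              # estimated slope
    b0 = y_bar - b1 * x_bar         # estimated y-intercept
    sse = ss_yy - b1 * ss_xy        # sum of squared errors
    s = math.sqrt(sse / (n - 2))    # estimated standard deviation of the error
    s_b1 = s / math.sqrt(ss_xx)     # standard error of the slope estimate
    t = b1 / s_b1                   # test statistic for H0: beta1 = 0

    return {"b0": b0, "b1": b1, "sse": sse, "s": s, "s_b1": s_b1, "t": t}


# Illustrative data (an assumption for this sketch).
fit = simple_linear_regression([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
for name, value in fit.items():
    print(name, round(value, 4))
```

A confidence interval for the slope then follows as `fit["b1"]` plus or minus the tabled t-value times `fit["s_b1"]`.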

Guide to Simple Linear Regression

Supplementary Exercises 9.140–9.163

Understanding the Principles

  1. 9.140 Explain the difference between a probabilistic model and a deterministic model.

  2. 9.141 Give the general form of a straight-line model for E(y).

  3. 9.142 Outline the five steps in a simple linear regression analysis.

  4. 9.143 True or False. In simple linear regression, about 95% of the y-values in the sample will fall within 2s of their respective predicted values.

Learning the Mechanics

  1. 9.144 In fitting a least squares line to n=15 data points, the following quantities were computed: SSxx=55, SSyy=198, SSxy=88, x̄=1.3, and ȳ=35.

    1. Find the least squares line.

    2. Graph the least squares line.

    3. Calculate SSE.

    4. Calculate s2.

    5. Find a 90% confidence interval for β1. Interpret this estimate.

    6. Find a 90% confidence interval for the mean value of y when x=15.

    7. Find a 90% prediction interval for y when x=15.

  2. L09145 9.145 Consider the following sample data:

    y 5 1 3
    x 5 1 3
    1. Construct a scatterplot for the data.

    2. It is possible to find many lines for which Σ(y − ŷ)=0. For this reason, the criterion Σ(y − ŷ)=0 is not used to identify the “best-fitting” straight line. Find two lines that have Σ(y − ŷ)=0.

    3. Find the least squares line.

    4. Compare the value of SSE for the least squares line with that of the two lines you found in part b. What principle of least squares is demonstrated by this comparison?

  3. L09146 9.146 Consider the following 10 data points.

    x 3 5 6 4 3 7 6 5 4 7
    y 4 3 2 1 2 3 3 5 4 2
    1. Plot the data on a scatterplot.

    2. Calculate the values of r and r².

    3. Is there sufficient evidence to indicate that x and y are linearly correlated? Test at the α=.10 level of significance.

Applying the Concepts—Basic

  1. 9.147 Wind turbine blade stress. Mechanical engineers at the University of Newcastle (Australia) investigated the use of timber in high-efficiency small wind turbine blades (Wind Engineering, Jan. 2004). The strengths of two types of timber—radiata pine and hoop pine—were compared. Twenty specimens (called “coupons”) of each timber blade were fatigue tested by measuring the stress (in MPa) on the blade after various numbers of blade cycles. A simple linear regression analysis of the data, one conducted for each type of timber, yielded the following results (where y=stress and x=natural logarithm of number of cycles):

    Radiata Pine: ŷ = 97.37 − 2.50x, r² = .84
    Hoop Pine: ŷ = 122.03 − 2.36x, r² = .90
    1. Interpret the estimated slope of each line.

    2. Interpret the estimated y-intercept of each line.

    3. Interpret the value of r² for each line.

    4. On the basis of these results, which type of timber blade appears to be stronger and more fatigue resistant? Explain.

  2. HOMES 9.148 Predicting sale prices of homes. Real-estate investors, home buyers, and homeowners often use the appraised value of property as a basis for predicting the sale price of that property. Data on sale prices and total appraised value of 78 residential properties sold recently in an upscale Tampa, Florida, neighborhood named Hunter’s Green are saved in the HOMES file. Selected observations are listed in the table below.

    Property Sale Price Appraised Value
    1 $489,900 $418,601
    2 1,825,000 1,577,919
    3 890,000 687,836
    4 250,000 191,620
    5 1,275,000 1,063,901
    74 325,000 292,702
    75 516,000 407,449
    76 309,300 272,275
    77 370,000 347,320
    78 580,000 511,359

    Based on data from Hillsborough County (Florida) Property Appraiser’s Office.

    1. Propose a straight-line model to relate the appraised property value (x) to the sale price (y) for residential properties in this neighborhood.

    2. A MINITAB scatterplot of the data with the least squares line is shown at the top of the printout on p. 643. Does it appear that a straight-line model will be an appropriate fit to the data?

    3. A MINITAB simple linear regression printout is also shown at the bottom of the printout on p. 643. Find the equation of the least squares line. Interpret the estimated slope and y-intercept in the words of the problem.

    4. Locate the test statistic and p-value for testing H0: β1=0 against Ha: β1>0. Is there sufficient evidence (at α=.01) of a positive linear relationship between appraised property value (x) and sale price (y)?

    5. Locate and interpret practically the values of r and r² on the printout.

    6. Locate and interpret practically the 95% prediction interval for sale price (y) on the printout.

  3. 9.149 Sports news on local TV broadcasts. The Sports Journal (Winter 2004) published the results of a study conducted to assess the factors that affect the time allotted to sports news on local television news broadcasts. Information on total time (in minutes) allotted to sports and on audience ratings of the TV news broadcast (measured on a 100-point scale) was obtained from a national sample of 163 news directors. A correlation analysis of the data yielded r=.43.

    1. Interpret the value of the correlation coefficient r.

    2. Find and interpret the value of the coefficient of determination r².

  4. SITIN 9.150 College protests of labor exploitation. Refer to the Journal of World-Systems Research (Winter 2004) study of student “sit-ins” for a “sweat-free campus” (p. 109). Data on the duration (in days) of each sit-in as well as the number of student arrests were measured. The data for 5 sit-ins in which there was at least one arrest are shown in the next table. Let y=number of arrests and x=duration.

    MINITAB output for Exercise 9.148

    Data for Exercise 9.150

    Sit-In University Duration (days) Number of Arrests
    12 Wisconsin 4 54
    14 SUNY Albany 1 11
    15 Oregon 3 14
    17 Iowa 4 16
    18 Kentucky 1 12

    Based on Ross, R. J. S. “From antisweatshop to global justice to antiwar: How the new new left is the same and different from the old new left.” Journal of World-Systems Research, Vol. X, No. 1, Winter 2004 (Tables 1 and 3).

    a. Give the equation of a straight-line model relating y to x.

    b. SPSS was used to fit the model to the data for the 5 sit-ins. The printout is shown on p. 568. Give the least squares prediction equation.

    c. Interpret the estimates of β0 and β1 in the context of the problem.

    d. Find and interpret the value of s on the printout.

    e. Find and interpret the value of r² on the printout.

      SPSS Output for Exercise 9.150

    f. Conduct a test to determine whether number of arrests is positively linearly related to duration. (Use α=.10.)

    *g. Use a nonparametric test to determine if number of arrests is rank correlated with duration. Test using α=.10.

  5. ALWINS 9.151 Baseball batting averages versus wins. Is the number of games won by a major league baseball team in a season related to the team’s batting average? Consider data from the Baseball Almanac on the number of games won and the batting averages for the 14 teams in the American League for the 2013 Major League Baseball season. The data are listed in the next table.

    Team Games Won Batting Avg. (average number of hits per 1,000 at bats)
    New York 85 .242
    Toronto 74 .252
    Baltimore 85 .260
    Boston 97 .277
    Tampa Bay 92 .257
    Cleveland 92 .255
    Detroit 93 .283
    Chicago 63 .249
    Kansas City 86 .260
    Minnesota 66 .242
    Los Angeles 78 .264
    Texas 91 .262
    Seattle 71 .237
    Oakland 96 .254

    Based on data from Baseball Almanac, 2013; www.mlb.com.

    a. If you were to model the relationship between the mean (or expected) number of games won by a major league team and the team’s batting average x, using a straight line, would you expect the slope of the line to be positive or negative? Explain.

    b. Construct a scatterplot of the data. Does the pattern revealed by the scatterplot agree with your answer to part a?

    c. A MINITAB printout of the simple linear regression is shown below. Find the estimates of the β's on the printout and write the equation of the least squares line.

    d. Graph the least squares line on your scatterplot. Does your least squares line seem to fit the points on your scatterplot?

    e. Interpret the estimates of β0 and β1 in the words of the problem.

    f. Conduct a test (at α=.05) to determine whether the mean (or expected) number of games won by a major league baseball team is positively linearly related to the team’s batting average.

    g. Find the coefficient of determination, r², and interpret its value.

    h. Do you recommend using the model to predict the number of games won by a team during the 2013 season?

    *i. Conduct Spearman’s test for rank correlation. Use α=.05.

  6. RAIN 9.152 English as a second language reading ability. What are the factors that allow a native Spanish-speaking person to understand and read English? A study published in the Bilingual Research Journal (Summer 2006) investigated the relationship of Spanish (first-language) grammatical knowledge to English (second-language) reading. The study involved a sample of n=55 native Spanish-speaking adults who were students in an English as a second language college class. Each student took four standardized exams: Spanish grammar (SG), Spanish reading (SR), English grammar (EG), and English reading (ESLR). Simple linear regression was used to model the ESLR score (y) as a function of each of the other exam scores (x). The results are summarized in the next table.

    Independent variable (x) r² p-value for testing H0: β1=0
    SG score .002 .739
    SR score .099 .012
    EG score .078 .022
    1. At α=.05, is there sufficient evidence to indicate that ESLR score is linearly related to SG score?

    2. At α=.05, is there sufficient evidence to indicate that ESLR score is linearly related to SR score?

    3. At α=.05, is there sufficient evidence to indicate that ESLR score is linearly related to EG score?

    4. Practically interpret the r² values.

  7. BREAM 9.153 Feeding habits of fish. Refer to the Brain and Behavior Evolution (Apr. 2000) study of the feeding behavior of black-bream fish, presented in Exercise 2.162 (p. 96). Recall that the zoologists recorded the number of aggressive strikes of two black-bream fish feeding at the bottom of an aquarium in the 10-minute period following the addition of food. The table listing the weekly number of strikes and the age of the fish (in days) is reproduced below.

    Week Number of Strikes Age of Fish (days)
    1 85 120
    2 63 136
    3 34 150
    4 39 155
    5 58 162
    6 35 169
    7 57 178
    8 12 184
    9 15 190

    Based on Shand, J., et al. “Variability in the location of the retinal ganglion cell area centralis is correlated with ontogenetic changes in feeding behavior in the Blackbream, Acanthopagrus butcheri.” Brain and Behavior Evolution, Vol. 55, No. 4, Apr. 2000 (Figure H).

    a. Write the equation of a straight-line model relating number of strikes (y) to age of fish (x).

    b. Fit the model to the data by the method of least squares and give the least squares prediction equation.

    c. Give a practical interpretation of the value of β̂0 if possible.

    d. Give a practical interpretation of the value of β̂1 if possible.

    e. Test H0: β1=0 versus Ha: β1<0, using α=.10. Interpret the result.

    *f. Find Spearman’s rank correlation relating number of strikes (y) to age (x).

    *g. Test whether number of strikes (y) and age (x) are negatively rank correlated. Use α=.10.

Applying the Concepts—Intermediate

  1. 9.154 New method of estimating rainfall. Accurate measurements of rainfall are critical for many hydrological and meteorological projects. Two standard methods of monitoring rainfall use rain gauges and weather radar. Both, however, can be contaminated by human and environmental interference. In the Journal of Data Science (Apr. 2004), researchers employed artificial neural networks (i.e., computer-based mathematical models) to estimate rainfall at a meteorological station in Montreal. Rainfall estimates were made every 5 minutes over a 70-minute period by each of the three methods. The data (in millimeters) are listed in the table.

    Time Radar Rain Gauge Neural Network
    8:00 a.m. 3.6 0 1.8
    8:05 2.0 1.2 1.8
    8:10 1.1 1.2 1.4
    8:15 1.3 1.3 1.9
    8:20 1.8 1.4 1.7
    8:25 2.1 1.4 1.5
    8:30 3.2 2.0 2.1
    8:35 2.7 2.1 1.0
    8:40 2.5 2.5 2.6
    8:45 3.5 2.9 2.6
    8:50 3.9 4.0 4.0
    8:55 3.5 4.9 3.4
    9:00 a.m. 6.5 6.2 6.2
    9:05 7.3 6.6 7.5
    9:10 6.4 7.8 7.2

    Based on Hessami, M., et al. “Selection of an artificial neural network model for the post-calibration of weather radar rainfall estimation.” Journal of Data Science, Vol. 2, No. 2, Apr. 2004. (Adapted from Figures 2 and 4.)

    1. Propose a straight-line model relating rain gauge amount (y) to weather radar rain estimate (x).

    2. Use the method of least squares to fit the model.

    3. Graph the least squares line on a scatterplot of the data. Is there visual evidence of a relationship between the two variables? Is the relationship positive or negative?

    4. Interpret the estimates of the y-intercept and slope in the words of the problem.

    5. Find and interpret the value of s for this regression.

    6. Test whether y is linearly related to x. Use α=.01.

    7. Construct a 99% confidence interval for β1. Interpret the result practically.

    8. Now consider a model relating rain gauge amount (y) to the artificial neural network rain estimate (x). Repeat parts a–g for this model.

  2. SMELT 9.155 Extending the life of an aluminum smelter pot. An investigation of the properties of bricks used to line aluminum smelter pots was published in The American Ceramic Society Bulletin (Feb. 2005). Six different commercial bricks were evaluated. The life span of a smelter pot depends on the porosity of the brick lining (the less porosity, the longer is the life); consequently, the researchers measured the apparent porosity of each brick specimen, as well as the mean pore diameter of each brick. The data are given in the table.

    Brick Apparent Porosity (%) Mean Pore Diameter (micrometers)
    A 18.8 12.0
    B 18.3 9.7
    C 16.3 7.3
    D 6.9 5.3
    E 17.1 10.9
    F 20.4 16.8

    Based on Bonadia, P., et al. “Aluminosilicate refractories for aluminum cell linings.” The American Ceramic Society Bulletin, Vol. 84, No. 2, Feb. 2005 (Table II).

    a. Find the least squares line relating porosity (y) to mean pore diameter (x).

    b. Interpret the y-intercept of the line.

    c. Interpret the slope of the line.

    d. Conduct a test of model adequacy. Use α=.10.

    e. Find r and r² and interpret these values.

    f. Predict the apparent percentage of porosity for a brick with a mean pore diameter of 10 micrometers. Use a 90% prediction interval.

    *g. Apply Spearman’s test for rank correlation to the data. Use α=.10.

  3. 9.156 Relation of eye and head movements. How do eye and head movements relate to body movements when a person reacts to a visual stimulus? Scientists at the California Institute of Technology designed an experiment to answer this question and reported their results in Nature (Aug. 1998). Adult male rhesus monkeys were exposed to a visual stimulus (i.e., a panel of light-emitting diodes), and their eye, head, and body movements were electronically recorded. In one variation of the experiment, two variables were measured: active head movement (x, percent per degree) and body-plus-head rotation (y, percent per degree). The data for n=39 trials were subjected to a simple linear regression analysis, with the following results: $\hat{\beta}_1 = .88$, $s_{\hat{\beta}_1} = .14$

    1. Conduct a test to determine whether the two variables, active head movement x and body-plus-head rotation y, are positively linearly related. Use α=.05.

    2. Construct and interpret a 90% confidence interval for β1.

    3. The scientists want to know whether the true slope of the line differs significantly from 1. On the basis of your answer to part b, make the appropriate inference.

  4. CONDOR 9.157 Mortality of predatory birds. Two species of predatory birds, collared flycatchers and tits, compete for nest holes during breeding season on the island of Gotland, Sweden. Frequently, dead flycatchers are found in nest boxes occupied by tits. A field study examined whether the risk of mortality to flycatchers is related to the degree of competition between the two bird species for nest sites (The Condor, May 1995). The next table gives data on the number y of flycatchers killed at each of 14 discrete locations (plots) on the island, as well as on the nest box tit occupancy x (i.e., the percentage of nest boxes occupied by tits) at each plot. Consider the simple linear regression model E(y)=β0+β1x.

    Plot Number of Flycatchers Killed y Nest Box Tit Occupancy x (%)
    1 0 24
    2 0 33
    3 0 34
    4 0 43
    5 0 50
    6 1 35
    7 1 35
    8 1 38
    9 1 40
    10 2 31
    11 2 43
    12 3 55
    13 4 57
    14 5 64

    Based on Merila, J., and Wiggins, D. A. “Interspecific competition for nest holes causes adult mortality in the collared flycatcher.” The Condor, Vol. 97, No. 2, May 1995, p. 449 (Figure 2), Cooper Ornithological Society.

    1. Plot the data in a scatterplot. Does the frequency of flycatcher casualties per plot appear to increase linearly with increasing proportion of nest boxes occupied by tits?

    2. Use the method of least squares to find the estimates of β0 and β1. Interpret their values.

    3. Test the utility of the model, using α=.05.

    4. Find r and r² and interpret their values.

    5. Find s and interpret the result.

    6. Do you recommend using the model to predict the number of flycatchers killed? Explain.

  5. 9.158 Winning marathon times. In Chance (Winter 2000), statistician Howard Wainer and two students compared men’s and women’s winning times in the Boston Marathon. One of the graphs used to illustrate gender differences is reproduced below. The scatterplot graphs the winning times (in minutes) against the year in which the race was run. Men’s times are represented by purple dots and women’s times by red dots.

    1. Consider only the winning times for men. Is there evidence of a linear trend? If so, propose a straight-line model for predicting winning time (y) based on year (x). Would you expect the slope of this line to be positive or negative?

    2. Repeat part a for women’s times.

    3. Which slope, the men’s or the women’s, will be greater in absolute value?

    4. Would you recommend using the straight-line models to predict the winning time in the 2020 Boston Marathon? Why or why not?

    5. Which model, the men’s or the women’s, is likely to have the smallest estimate of σ?

  6. HELIUM 9.159 Quantum tunneling. At temperatures approaching absolute zero (−273°C), helium exhibits traits that seem to defy many laws of Newtonian physics. An experiment has been conducted with helium in solid form at various temperatures near absolute zero. The solid helium is placed in a dilution refrigerator along with a solid impure substance, and the fraction (in weight) of the impurity passing through the solid helium is recorded. (This phenomenon of solids passing directly through solids is known as quantum tunneling.) The data are given in the next table.

    Temperature x (°C) Proportion of Impurity
    −262.0 .315
    −265.0 .202
    −256.0 .204
    −267.0 .620
    −270.0 .715
    −272.0 .935
    −272.4 .957
    −272.7 .906
    −272.8 .985
    −272.9 .987
    1. Find the least squares estimates of the intercept and slope. Interpret them.

    2. Use a 95% confidence interval to estimate the slope β1. Interpret the interval in terms of this application. Does the interval support the hypothesis that temperature contributes information about the proportion of impurity passing through helium?

    3. Interpret the coefficient of determination for this model.

    4. Find a 95% prediction interval for the percentage of impurity passing through solid helium at −273°C. Interpret the result.

    5. Note that the value of x in part d is outside the experimental region. Why might this lead to an unreliable prediction?

  7. 9.160 Dance/movement therapy. In cotherapy, two or more therapists lead a group. An article in the American Journal of Dance Therapy (Spring/Summer 1995) examined the use of cotherapy in dance/movement therapy. Two of several variables measured on each of a sample of 136 professional dance/movement therapists were years x of formal training and reported success rate y (measured as a percentage) of coleading dance/movement therapy groups.

    1. Propose a linear model relating y to x.

    2. The researcher hypothesized that dance/movement therapists with more years in formal dance training will report higher perceived success rates in cotherapy relationships. State the hypothesis in terms of the parameter of the model you proposed in part a.

    3. The correlation coefficient for the sample data was reported as r=.26. Interpret this result.

    4. Does the value of r in part c support the hypothesis in part b? Test, using α=.05.

Applying the Concepts—Advanced

  1. FLOUR 9.161 Regression through the origin. Sometimes it is known from theoretical considerations that the straight-line relationship between two variables x and y passes through the origin of the xy-plane. Consider the relationship between the total weight y of a shipment of 50-pound bags of flour and the number x of bags in the shipment. Since a shipment containing x=0 bags (i.e., no shipment at all) has a total weight of y=0, a straight-line model of the relationship between x and y should pass through the point x=0,y=0. In such a case, you could assume that β0=0 and characterize the relationship between x and y with the following model:

    y=β1x+ε

    The least squares estimate of β1 for this model is

    $\hat{\beta}_1 = \dfrac{\sum x_i y_i}{\sum x_i^2}$

    From the records of past flour shipments, 15 shipments were randomly chosen and the data shown in the following table were recorded.

    Weight of Shipment Number of 50-Pound Bags in Shipment
    5,050 100
    10,249 205
    20,000 450
    7,420 150
    24,685 500
    10,206 200
    7,325 150
    4,958 100
    7,162 150
    24,000 500
    4,900 100
    14,501 300
    28,000 600
    17,002 400
    16,100 400
    1. Find the least squares line for the given data under the assumption that β0=0. Plot the least squares line on a scatterplot of the data.

    2. Find the least squares line for the given data, using the model

      y=β0+β1x+ε

      (i.e., do not restrict β0 to equal 0). Plot this line on the same scatterplot you constructed in part a.

    3. Refer to part b. Why might β̂0 be different from 0 even though the true value of β0 is known to be 0?

    4. The estimated standard error of β̂0 is equal to

      $s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}$

      Use the t-statistic

      $t = \dfrac{\hat{\beta}_0 - 0}{s\sqrt{(1/n) + (\bar{x}^2/SS_{xx})}}$

      to test the null hypothesis H0: β0=0 against the alternative Ha: β0≠0. Take α=.10. Should you include β0 in your model?

  2. JUMP 9.162 Long-jump “takeoff error.” The long jump is a track-and-field event in which a competitor attempts to jump a maximum distance into a sandpit after a running start. At the edge of the pit is a takeoff board. Jumpers usually try to plant their toes at the front edge of this board to maximize their jumping distance. The absolute distance between the front edge of the takeoff board and the spot where the toe actually lands on the board prior to jumping is called “takeoff error.” Is takeoff error in the long jump linearly related to best jumping distance? To answer this question, kinesiology researchers videotaped the performances of 18 novice long jumpers at a high school track meet (Journal of Applied Biomechanics, May 1995). The average takeoff error x and the best jumping distance y (out of three jumps) for each jumper are recorded in the accompanying table. If a jumper can reduce his/her average takeoff error by .1 meter, how much would you estimate the jumper’s best jumping distance to change? On the basis of your answer, comment on the usefulness of the model for predicting best jumping distance.

    Jumper Best Jumping Distance y (meters) Average Takeoff Error x (meters)
    1 5.30 .09
    2 5.55 .17
    3 5.47 .19
    4 5.45 .24
    5 5.07 .16
    6 5.32 .22
    7 6.15 .09
    8 4.70 .12
    9 5.22 .09
    10 5.77 .09
    11 5.12 .13
    12 5.77 .16
    13 6.22 .03
    14 5.82 .50
    15 5.15 .13
    16 4.92 .04
    17 5.20 .07
    18 5.42 .04

    Based on Berg, W. P., and Greer, N. L. “A kinematic profile of the approach run of novice long jumpers.” Journal of Applied Biomechanics, Vol. 11, No. 2, May 1995, p. 147 (Table 1).

Critical Thinking Challenge

  1. BRICKS 9.163 Spall damage in bricks. A recent civil suit revolved around a five-building brick apartment complex located in the Bronx, New York, which began to suffer spalling damage (i.e., a separation of some portion of the face of a brick from its body). The owner of the complex alleged that the bricks were manufactured defectively. The brick manufacturer countered that poor design and shoddy management led to the damage. To settle the suit, an estimate of the rate of damage per 1,000 bricks, called the spall rate, was required (Chance, Summer 1994). The owner estimated the spall rate by using several scaffold-drop surveys. (With this method, an engineer lowers a scaffold down at selected places on building walls and counts the number of visible spalls for every 1,000 bricks in the observation area.) The brick manufacturer conducted its own survey by dividing the walls of the complex into 83 wall segments and taking a photograph of each one. (The number of spalled bricks that could be made out from each photo was recorded, and the sum over all 83 wall segments was used as an estimate of total spall damage.) In this court case, the jury was faced with the following dilemma: On the one hand, the scaffold-drop survey provided the most accurate estimate of spall rates in a given wall segment. Unfortunately, however, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations, leading to an overestimate of the total damage. On the other hand, the photo survey was complete in that all 83 wall segments in the complex were checked for spall damage. But the spall rate estimated by the photos, at least in areas of high spall concentration, was biased low (spalling damage cannot always be seen from a photo), leading to an underestimate of the total damage.

    The data in the table are the spall rates obtained from the two methods at 11 drop locations. Use the data, as did expert statisticians who testified in the case, to help the jury estimate the true spall rate at a given wall segment. Then explain how this information, coupled with the data (not given here) on all 83 wall segments, can provide a reasonable estimate of the total spall damage (i.e., total number of damaged bricks).

    Drop Location Drop Spall Rate (per 1,000 bricks) Photo Spall Rate (per 1,000 bricks)
    1 0 0
    2 5.1 0
    3 6.6 0
    4 1.1 .8
    5 1.8 1.0
    6 3.9 1.0
    7 11.5 1.9
    8 22.1 7.7
    9 39.3 14.9
    10 39.9 13.9
    11 43.0 11.8

    Based on Fairley, W. B., et al. “Bricks, buildings, and the Bronx: Estimating masonry deterioration.” Chance, Vol. 7, No. 3, Summer 1994, p. 36 (Figure 3).

    [Note: The data points are estimated from the points shown on a scatterplot.]

