Chapter Notes

Key Terms

Note:

Starred (*) terms are from the optional sections in this chapter.

Key Symbols/Notation

`y`	Dependent variable (variable to be predicted)
`x`	Independent variable (variable used to predict)
`E`(`y`)	Expected value (mean) of `y`
$β_{0}$	`y`-intercept of true line
$β_{1}$	Slope of true line
$\hat{β}_{0}$	Least squares estimate of `y`-intercept
$\hat{β}_{1}$	Least squares estimate of slope
$ε$	Random error
$\hat{y}$	Predicted value of `y` for a given `x`-value
$(y - \hat{y})$	Estimated error of prediction
SSE	Sum of squared errors of prediction
`r`	Coefficient of correlation
$r^{2}$	Coefficient of determination
$x_{p}$	Value of `x` used to predict `y`
$r_{s}$	*Spearman’s rank correlation coefficient
$d_{i}$	*Difference between ranks of the $i$th observations for `x` and `y`

Key Ideas

Simple Linear Regression Variables

`y` = Dependent variable (quantitative)
`x` = Independent variable (quantitative)

Method of least squares properties

average error of prediction = 0
sum of squared errors is minimum
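Both properties can be checked numerically. A minimal sketch, assuming the 10 data points of Exercise 9.146: it fits the line with the $SS_{xy}/SS_{xx}$ formulas from Key Formulas, confirms the residuals sum to zero, and compares the SSE with two hypothetical lines obtained by nudging the intercept or the slope (the nudges are arbitrary illustrative choices, not from the text).

```python
def fit_least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of prediction for the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Data from Exercise 9.146
x = [3, 5, 6, 4, 3, 7, 6, 5, 4, 7]
y = [4, 3, 2, 1, 2, 3, 3, 5, 4, 2]

b0, b1 = fit_least_squares(x, y)
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
best = sse(x, y, b0, b1)

# Hypothetical competing lines: shift the intercept or the slope slightly
alt_intercept = sse(x, y, b0 + 0.5, b1)
alt_slope = sse(x, y, b0, b1 + 0.2)

print(f"y-hat = {b0:.2f} + ({b1:.2f})x")
print(f"sum of residuals = {residual_sum:.1e}, SSE = {best:.2f}")
print(f"SSE with shifted intercept = {alt_intercept:.2f}, with shifted slope = {alt_slope:.2f}")
```

Any line other than the least squares line, including lines whose residuals also sum to zero, produces a larger SSE.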

First-order (straight-line) model

$E(y) = β_{0} + β_{1} x$

where $E(y)$ = mean of `y`

$β_{0}$ = `y`-intercept of line (point where line intercepts `y`-axis)
$β_{1}$ = slope of line (change in `y` for every one-unit change in `x`)

Practical interpretation of `y`-intercept

Predicted `y`-value when $x = 0$

(no practical interpretation if $x = 0$ is either nonsensical or outside the range of the sample data)

Practical interpretation of slope

Increase (or decrease) in y for every one-unit increase in x

Coefficient of correlation, `r`

ranges between $-1$ and $+1$
measures strength of linear relationship between y and x

Coefficient of determination, $r^{2}$

ranges between 0 and 1
measures proportion of sample variation in y “explained” by the model.

Practical interpretation of model standard deviation `s`

Approximately 95% of `y`-values fall within $2s$ of their respective predicted values

Comparing Intervals in Step 5

The confidence interval for $E(y)$ will always be narrower than the prediction interval for `y`.
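The extra 1 under the square root in the prediction-interval formula is what makes it wider. A minimal sketch, assuming the summary statistics of Exercise 9.144 and the t-table value $t_{.05} = 1.771$ for 13 df (90% intervals); the function name `interval_half_widths` is illustrative, and the loop simply compares both half-widths at a few arbitrary values of $x_p$.

```python
import math


def interval_half_widths(n, x_bar, ss_xx, s, t_crit, x_p):
    """Half-widths of the CI for E(y) and the PI for y at x = x_p."""
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(core)
    # The extra 1 accounts for the random error of a single new y-value
    pi_half = t_crit * s * math.sqrt(1 + core)
    return ci_half, pi_half


# Summary statistics from Exercise 9.144
n, x_bar, ss_xx, ss_yy, ss_xy = 15, 1.3, 55.0, 198.0, -88.0
b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy
s = math.sqrt(sse / (n - 2))
t_crit = 1.771  # assumed: t_.05 with n - 2 = 13 df, read from a t table

for x_p in (0.0, 1.3, 5.0, 15.0):
    ci_half, pi_half = interval_half_widths(n, x_bar, ss_xx, s, t_crit, x_p)
    print(f"x_p = {x_p:5.1f}: CI half-width = {ci_half:.3f}, PI half-width = {pi_half:.3f}")
```

Both half-widths also grow as $x_p$ moves away from $\bar{x}$, so intervals far from the center of the sample data are wider.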

Nonparametric Test for Rank Correlation

Spearman’s test
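Spearman’s $r_s$ ranks each variable separately and then measures how well the two sets of ranks agree. A minimal sketch, assuming the black-bream data of Exercise 9.153; the function names are illustrative. The shortcut formula $r_s = 1 - 6\sum d_i^2 / [n(n^2 - 1)]$ is exact only when there are no tied ranks, so the sketch also computes $r$ on the ranks (tied values get their average rank), which is the general definition.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 0-based positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(u, v):
    """Pearson correlation r = SS_uv / sqrt(SS_uu * SS_vv)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    ss_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    ss_uu = sum((a - u_bar) ** 2 for a in u)
    ss_vv = sum((b - v_bar) ** 2 for b in v)
    return ss_uv / math.sqrt(ss_uu * ss_vv)


def spearman_shortcut(x, y):
    """r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))


# Exercise 9.153: age of fish (x, days) and number of strikes (y)
age = [120, 136, 150, 155, 162, 169, 178, 184, 190]
strikes = [85, 63, 34, 39, 58, 35, 57, 12, 15]

r_s = spearman_shortcut(age, strikes)
r_ranks = pearson_r(average_ranks(age), average_ranks(strikes))
print(f"r_s (shortcut) = {r_s:.4f}, r on ranks = {r_ranks:.4f}")
```

With no ties in either variable, the two computations agree.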

Key Formulas

$\hat{β}_{1} = \frac{SS_{xy}}{SS_{xx}}$, where $SS_{xx} = \sum (x - \bar{x})^{2}$ and $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$
$\hat{β}_{0} = \bar{y} - \hat{β}_{1} \bar{x}$
$SSE = \sum (y - \hat{y})^{2} = SS_{yy} - \hat{β}_{1} SS_{xy}$, where $SS_{yy} = \sum (y - \bar{y})^{2}$
$s = \sqrt{\frac{SSE}{n - 2}}$
$t = \frac{\hat{β}_{1}}{s_{\hat{β}_{1}}}$ (for testing $H_{0}: β_{1} = 0$), where $s_{\hat{β}_{1}} = \frac{s}{\sqrt{SS_{xx}}}$
$(1 - α)100\%$ CI for $β_{1}$: $\hat{β}_{1} \pm t_{α/2} s_{\hat{β}_{1}}$
$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}$
$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}}$ (for testing $H_{0}: ρ = 0$)
$r^{2} = \frac{SS_{yy} - SSE}{SS_{yy}}$
$(1 - α)100\%$ CI for $E(y)$: $\hat{y} \pm t_{α/2}\, s \sqrt{\frac{1}{n} + \frac{(x_{p} - \bar{x})^{2}}{SS_{xx}}}$
$(1 - α)100\%$ PI for $y$: $\hat{y} \pm t_{α/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_{p} - \bar{x})^{2}}{SS_{xx}}}$
$r_{s} = 1 - \frac{6 \sum d_{i}^{2}}{n(n^{2} - 1)}$
(All t-values above are based on $n - 2$ degrees of freedom.)
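These formulas chain together naturally. A minimal sketch in plain Python (no statistics library assumed) applies them to the brick data of Exercise 9.155, with apparent porosity as `y` and mean pore diameter as `x`; the function name `simple_regression` is illustrative. It also makes visible that the two t statistics, for $H_0: β_1 = 0$ and for $H_0: ρ = 0$, are algebraically identical.

```python
import math


def simple_regression(x, y):
    """Apply the Key Formulas for simple linear regression to paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = ss_yy - b1 * ss_xy
    s = math.sqrt(sse / (n - 2))
    s_b1 = s / math.sqrt(ss_xx)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return {
        "b0": b0,
        "b1": b1,
        "SSE": sse,
        "s": s,
        "t_beta1": b1 / s_b1,                                   # test of H0: beta_1 = 0
        "r": r,
        "r2": (ss_yy - sse) / ss_yy,
        "t_rho": r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2),  # test of H0: rho = 0
    }


# Exercise 9.155: mean pore diameter (x, micrometers) and apparent porosity (y, %)
diameter = [12.0, 9.7, 7.3, 5.3, 10.9, 16.8]
porosity = [18.8, 18.3, 16.3, 6.9, 17.1, 20.4]

results = simple_regression(diameter, porosity)
for name, value in results.items():
    print(f"{name:>8} = {value:.4f}")
```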

Guide to Simple Linear Regression

Supplementary Exercises 9.140–9.163

Understanding the Principles

9.140 Explain the difference between a probabilistic model and a deterministic model.
9.141 Give the general form of a straight-line model for E(y).
9.142 Outline the five steps in a simple linear regression analysis.
9.143 True or False. In simple linear regression, about 95% of the y-values in the sample will fall within 2s of their respective predicted values.

Learning the Mechanics

9.144 In fitting a least squares line to $n = 15$ data points, the following quantities were computed: $SS_{xx} = 55$, $SS_{yy} = 198$, $SS_{xy} = -88$, $\bar{x} = 1.3$, and $\bar{y} = 35$.
1. Find the least squares line.
2. Graph the least squares line.
3. Calculate SSE.
4. Calculate $s^{2}$.
5. Find a 90% confidence interval for $β_{1}$. Interpret this estimate.
6. Find a 90% confidence interval for the mean value of y when $x = 15$.
7. Find a 90% prediction interval for y when $x = 15$.
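The arithmetic for the least squares line, SSE, $s^2$, and the 90% confidence interval for $β_1$ follows directly from the given summary statistics. A minimal sketch, assuming the t-table value $t_{.05} = 1.771$ for $n - 2 = 13$ df; interpreting the numbers is left to the exercise.

```python
import math

# Summary statistics given in Exercise 9.144
n = 15
ss_xx, ss_yy, ss_xy = 55.0, 198.0, -88.0
x_bar, y_bar = 1.3, 35.0

b1 = ss_xy / ss_xx          # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate
sse = ss_yy - b1 * ss_xy    # SSE = SS_yy - b1 * SS_xy
s2 = sse / (n - 2)          # estimate of sigma^2
s_b1 = math.sqrt(s2) / math.sqrt(ss_xx)

t_crit = 1.771              # assumed: t_.05 with 13 df, from a t table
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

print(f"y-hat = {b0:.2f} + ({b1:.2f})x")
print(f"SSE = {sse:.2f}, s^2 = {s2:.2f}")
print(f"90% CI for beta_1: ({ci[0]:.3f}, {ci[1]:.3f})")
```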
L09145 9.145 Consider the following sample data:


y	5	1	3
x	5	1	3
a. Construct a scatterplot for the data.
b. It is possible to find many lines for which $Σ(y - \hat{y}) = 0$. For this reason, the criterion $Σ(y - \hat{y}) = 0$ is not used to identify the “best-fitting” straight line. Find two lines that have $Σ(y - \hat{y}) = 0$.
c. Find the least squares line.
d. Compare the value of SSE for the least squares line with that of the two lines you found in part b. What principle of least squares is demonstrated by this comparison?
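Any line through the point $(\bar{x}, \bar{y})$ has residuals that sum to zero, which is why that criterion alone cannot single out a best line. A minimal sketch using the three data points of Exercise 9.145 ($\bar{x} = \bar{y} = 3$); the two alternative lines, $\hat{y} = 3$ and $\hat{y} = 6 - x$, are hypothetical choices through $(3, 3)$, not lines given in the text.

```python
x = [5, 1, 3]
y = [5, 1, 3]


def residual_sum_and_sse(b0, b1):
    """Return (sum of residuals, SSE) for the line y-hat = b0 + b1*x."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(residuals), sum(e ** 2 for e in residuals)


# Least squares fit from the SS formulas
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

lines = {
    "least squares": (b0, b1),
    "y-hat = 3": (3.0, 0.0),        # hypothetical alternative through (3, 3)
    "y-hat = 6 - x": (6.0, -1.0),   # hypothetical alternative through (3, 3)
}
for label, (c0, c1) in lines.items():
    total, sse = residual_sum_and_sse(c0, c1)
    print(f"{label:>14}: sum of residuals = {total:.1f}, SSE = {sse:.1f}")
```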
L09146 9.146 Consider the following 10 data points.


x	3	5	6	4	3	7	6	5	4	7
y	4	3	2	1	2	3	3	5	4	2
1. Plot the data on a scatterplot.
2. Calculate the values of r and $r^{2}$.
3. Is there sufficient evidence to indicate that x and y are linearly correlated? Test at the $α = .10$ level of significance.
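The correlation test uses the t statistic for $H_0: ρ = 0$ from Key Formulas, with $n - 2 = 8$ df. A minimal sketch for the Exercise 9.146 data, assuming the two-tailed critical value $t_{.05} = 1.860$ (8 df) from a t table for $α = .10$.

```python
import math

x = [3, 5, 6, 4, 3, 7, 6, 5, 4, 7]
y = [4, 3, 2, 1, 2, 3, 3, 5, 4, 2]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = ss_xy / math.sqrt(ss_xx * ss_yy)
r2 = r ** 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r2)

t_crit = 1.860  # assumed: t_.05 with 8 df (two-tailed test at alpha = .10), from a t table
reject = abs(t) > t_crit
print(f"r = {r:.4f}, r^2 = {r2:.4f}, t = {t:.3f}, reject H0: {reject}")
```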

Applying the Concepts—Basic

9.147 Wind turbine blade stress. Mechanical engineers at the University of Newcastle (Australia) investigated the use of timber in high-efficiency small wind turbine blades (Wind Engineering, Jan. 2004). The strengths of two types of timber—radiata pine and hoop pine—were compared. Twenty specimens (called “coupons”) of each timber blade were fatigue tested by measuring the stress (in MPa) on the blade after various numbers of blade cycles. A simple linear regression analysis of the data, one conducted for each type of timber, yielded the following results (where $y$ = stress and $x$ = natural logarithm of number of cycles):

Radiata Pine: $\hat{y} = 97.37 - 2.50x$, $r^{2} = .84$

Hoop Pine: $\hat{y} = 122.03 - 2.36x$, $r^{2} = .90$
1. Interpret the estimated slope of each line.
2. Interpret the estimated y-intercept of each line.
3. Interpret the value of r² for each line.
4. On the basis of these results, which type of timber blade appears to be stronger and more fatigue resistant? Explain.

HOMES 9.148 Predicting sale prices of homes. Real-estate investors, home buyers, and homeowners often use the appraised value of property as a basis for predicting the sale price of that property. Data on sale prices and total appraised value of 78 residential properties sold recently in an upscale Tampa, Florida, neighborhood named Hunter’s Green are saved in the HOMES file. Selected observations are listed in the table below.

Property	Sale Price	Appraised Value
1	$489,900	$418,601
2	1,825,000	1,577,919
3	890,000	687,836
4	250,000	191,620
5	1,275,000	1,063,901
⋮	⋮	⋮
74	325,000	292,702
75	516,000	407,449
76	309,300	272,275
77	370,000	347,320
78	580,000	511,359

Based on data from Hillsborough County (Florida) Property Appraiser’s Office.

a. Propose a straight-line model to relate the appraised property value (x) to the sale price (y) for residential properties in this neighborhood.
b. A MINITAB scatterplot of the data with the least squares line is shown at the top of the printout on p. 643. Does it appear that a straight-line model will be an appropriate fit to the data?
c. A MINITAB simple linear regression printout is also shown at the bottom of the printout on p. 643. Find the equation of the least squares line. Interpret the estimated slope and y-intercept in the words of the problem.
d. Locate the test statistic and p-value for testing $H_{0}: β_{1} = 0$ against $H_{a}: β_{1} > 0$. Is there sufficient evidence (at $α = .01$) of a positive linear relationship between appraised property value (x) and sale price (y)?
e. Locate and interpret practically the values of r and $r^{2}$ on the printout.
f. Locate and interpret practically the 95% prediction interval for sale price (y) on the printout.

9.149 Sports news on local TV broadcasts. The Sports Journal (Winter 2004) published the results of a study conducted to assess the factors that affect the time allotted to sports news on local television news broadcasts. Information on total time (in minutes) allotted to sports and on audience ratings of the TV news broadcast (measured on a 100-point scale) was obtained from a national sample of 163 news directors. A correlation analysis of the data yielded $r = .43$.
1. Interpret the value of the correlation coefficient r.
2. Find and interpret the value of the coefficient of determination $r^{2}$.
SITIN 9.150 College protests of labor exploitation. Refer to the Journal of World-Systems Research (Winter 2004) study of student “sit-ins” for a “sweat-free campus” (p. 109). Data on the duration (in days) of each sit-in as well as the number of student arrests were measured. The data for 5 sit-ins in which there was at least one arrest are shown in the next table. Let $y$ = number of arrests and $x$ = duration.

MINITAB output for Exercise 9.148

Data for Exercise 9.150


Sit-In	University	Duration (days)	Number of Arrests
12	Wisconsin	4	54
14	SUNY Albany	1	11
15	Oregon	3	14
17	Iowa	4	16
18	Kentucky	1	12

Based on Ross, R. J. S. “From antisweatshop to global justice to antiwar: How the new new left is the same and different from the old new left.” Journal of World-Systems Research, Vol. X, No. 1, Winter 2004 (Tables 1 and 3).
a. Give the equation of a straight-line model relating y to x.
b. SPSS was used to fit the model to the data for the 5 sit-ins. The printout is shown on p. 568. Give the least squares prediction equation.
c. Interpret the estimates of $β_{0}$ and $β_{1}$ in the context of the problem.
d. Find and interpret the value of s on the printout.
e. Find and interpret the value of r² on the printout.
  
  SPSS Output for Exercise 9.150
f. Conduct a test to determine whether number of arrests is positively linearly related to duration. (Use $α = .10$.)
*g. Use a nonparametric test to determine if number of arrests is rank correlated with duration. Test using $α = .10$.


ALWINS 9.151 Baseball batting averages versus wins. Is the number of games won by a major league baseball team in a season related to the team’s batting average? Consider data from the Baseball Almanac on the number of games won and the batting averages for the 14 teams in the American League for the 2013 Major League Baseball season. The data are listed in the next table.

Team	Games Won	Batting Avg. (average number of hits per 1,000 at bats)
New York	85	.242
Toronto	74	.252
Baltimore	85	.260
Boston	97	.277
Tampa Bay	92	.257
Cleveland	92	.255
Detroit	93	.283
Chicago	63	.249
Kansas City	86	.260
Minnesota	66	.242
Los Angeles	78	.264
Texas	91	.262
Seattle	71	.237
Oakland	96	.254

Based on data from Baseball Almanac, 2013; www.mlb.com.

a. If you were to model the relationship between the mean (or expected) number of games won by a major league team and the team’s batting average x, using a straight line, would you expect the slope of the line to be positive or negative? Explain.
b. Construct a scatterplot of the data. Does the pattern revealed by the scatterplot agree with your answer to part a?
c. A MINITAB printout of the simple linear regression is shown below. Find the estimates of the $β$’s on the printout and write the equation of the least squares line.
d. Graph the least squares line on your scatterplot. Does your least squares line seem to fit the points on your scatterplot?
e. Interpret the estimates of $β_{0}$ and $β_{1}$ in the words of the problem.
f. Conduct a test (at $α = .05$) to determine whether the mean (or expected) number of games won by a major league baseball team is positively linearly related to the team’s batting average.
g. Find the coefficient of determination, $r^{2}$, and interpret its value.
h. Do you recommend using the model to predict the number of games won by a team during the 2013 season?
*i. Conduct Spearman’s test for rank correlation. Use $α = .05$.

RAIN 9.152 English as a second language reading ability. What are the factors that allow a native Spanish-speaking person to understand and read English? A study published in the Bilingual Research Journal (Summer 2006) investigated the relationship of Spanish (first-language) grammatical knowledge to English (second-language) reading. The study involved a sample of $n = 55$ native Spanish-speaking adults who were students in an English as a second language college class. Each student took four standardized exams: Spanish grammar (SG), Spanish reading (SR), English grammar (EG), and English reading (ESLR). Simple linear regression was used to model the ESLR score (y) as a function of each of the other exam scores (x). The results are summarized in the next table.

Independent variable (`x`)	`r`²	`p`-value for testing $H_{0}: β_{1} = 0$
SG score	.002	.739
SR score	.099	.012
EG score	.078	.022

a. At $α = .05$, is there sufficient evidence to indicate that ESLR score is linearly related to SG score?
b. At $α = .05$, is there sufficient evidence to indicate that ESLR score is linearly related to SR score?
c. At $α = .05$, is there sufficient evidence to indicate that ESLR score is linearly related to EG score?
d. Practically interpret the r² values.

BREAM 9.153 Feeding habits of fish. Refer to the Brain, Behavior and Evolution (Apr. 2000) study of the feeding behavior of black-bream fish, presented in Exercise 2.162 (p. 96). Recall that the zoologists recorded the number of aggressive strikes of two black-bream fish feeding at the bottom of an aquarium in the 10-minute period following the addition of food. The table listing the weekly number of strikes and the age of the fish (in days) is reproduced below.

Week	Number of Strikes	Age of Fish (days)
1	85	120
2	63	136
3	34	150
4	39	155
5	58	162
6	35	169
7	57	178
8	12	184
9	15	190

Based on Shand, J., et al. “Variability in the location of the retinal ganglion cell area centralis is correlated with ontogenetic changes in feeding behavior in the Blackbream, Acanthopagrus ‘butcher’.” Brain, Behavior and Evolution, Vol. 55, No. 4, Apr. 2000 (Figure H).
a. Write the equation of a straight-line model relating number of strikes (y) to age of fish (x).
b. Fit the model to the data by the method of least squares and give the least squares prediction equation.
c. Give a practical interpretation of the value of $\hat{β}_{0}$ if possible.
d. Give a practical interpretation of the value of $\hat{β}_{1}$ if possible.
e. Test $H_{0}: β_{1} = 0$ versus $H_{a}: β_{1} < 0$, using $α = .10$. Interpret the result.
*f. Find Spearman’s rank correlation relating number of strikes (y) to age (x).
*g. Test whether number of strikes (y) and age (x) are negatively rank correlated. Use $α = .10$.
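A minimal sketch of the least squares fit and the lower-tailed t test for the Exercise 9.153 data, assuming the critical value $t_{.10} = 1.415$ for $n - 2 = 7$ df from a t table ($H_0$ is rejected when $t < -1.415$).

```python
import math

age = [120, 136, 150, 155, 162, 169, 178, 184, 190]   # x, days
strikes = [85, 63, 34, 39, 58, 35, 57, 12, 15]        # y
n = len(age)

x_bar, y_bar = sum(age) / n, sum(strikes) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in age)
ss_yy = sum((yi - y_bar) ** 2 for yi in strikes)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(age, strikes))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))
t = b1 / (s / math.sqrt(ss_xx))   # test statistic for H0: beta_1 = 0

t_crit = 1.415  # assumed: t_.10 with 7 df, from a t table
print(f"y-hat = {b0:.2f} + ({b1:.4f})x, t = {t:.3f}, reject H0: {t < -t_crit}")
```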


Applying the Concepts—Intermediate

9.154 New method of estimating rainfall. Accurate measurements of rainfall are critical for many hydrological and meteorological projects. Two standard methods of monitoring rainfall use rain gauges and weather radar. Both, however, can be contaminated by human and environmental interference. In the Journal of Data Science (Apr. 2004), researchers employed artificial neural networks (i.e., computer-based mathematical models) to estimate rainfall at a meteorological station in Montreal. Rainfall estimates were made every 5 minutes over a 70-minute period by each of the three methods. The data (in millimeters) are listed in the table.


Time	Radar	Rain Gauge	Neural Network
8:00 a.m.	3.6	0	1.8
8:05	2.0	1.2	1.8
8:10	1.1	1.2	1.4
8:15	1.3	1.3	1.9
8:20	1.8	1.4	1.7
8:25	2.1	1.4	1.5
8:30	3.2	2.0	2.1
8:35	2.7	2.1	1.0
8:40	2.5	2.5	2.6
8:45	3.5	2.9	2.6
8:50	3.9	4.0	4.0
8:55	3.5	4.9	3.4
9:00 a.m.	6.5	6.2	6.2
9:05	7.3	6.6	7.5
9:10	6.4	7.8	7.2

Based on Hessami, M., et al. “Selection of an artificial neural network model for the post-calibration of weather radar rainfall estimation.” Journal of Data Science, Vol. 2, No. 2, Apr. 2004. (Adapted from Figures 2 and 4.)


a. Propose a straight-line model relating rain gauge amount (y) to weather radar rain estimate (x).
b. Use the method of least squares to fit the model.
c. Graph the least squares line on a scatterplot of the data. Is there visual evidence of a relationship between the two variables? Is the relationship positive or negative?
d. Interpret the estimates of the y-intercept and slope in the words of the problem.
e. Find and interpret the value of s for this regression.
f. Test whether y is linearly related to x. Use $α = .01$.
g. Construct a 99% confidence interval for $β_{1}$. Interpret the result practically.
h. Now consider a model relating rain gauge amount (y) to the artificial neural network rain estimate (x). Repeat parts a–g for this model.
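Because both models in Exercise 9.154 use the same response, one helper can fit them side by side so their slopes, $s$ values, and t statistics can be compared. A minimal sketch; the helper name `fit` is illustrative, and no critical values are hard-coded (for the test and the 99% interval they would come from a t table with 13 df).

```python
import math


def fit(x, y):
    """Return (b0, b1, s, t for H0: beta_1 = 0) for the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))
    return b0, b1, s, b1 / (s / math.sqrt(ss_xx))


# Exercise 9.154 data (millimeters), 8:00 a.m. to 9:10 a.m. in 5-minute steps
radar = [3.6, 2.0, 1.1, 1.3, 1.8, 2.1, 3.2, 2.7, 2.5, 3.5, 3.9, 3.5, 6.5, 7.3, 6.4]
gauge = [0, 1.2, 1.2, 1.3, 1.4, 1.4, 2.0, 2.1, 2.5, 2.9, 4.0, 4.9, 6.2, 6.6, 7.8]
neural = [1.8, 1.8, 1.4, 1.9, 1.7, 1.5, 2.1, 1.0, 2.6, 2.6, 4.0, 3.4, 6.2, 7.5, 7.2]

for label, x in (("radar", radar), ("neural network", neural)):
    b0, b1, s, t = fit(x, gauge)
    print(f"{label:>14}: y-hat = {b0:.3f} + {b1:.3f}x, s = {s:.3f}, t = {t:.2f}")
```

A smaller $s$ means the rain gauge amounts scatter less tightly around that model's line.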

SMELT 9.155 Extending the life of an aluminum smelter pot. An investigation of the properties of bricks used to line aluminum smelter pots was published in The American Ceramic Society Bulletin (Feb. 2005). Six different commercial bricks were evaluated. The life span of a smelter pot depends on the porosity of the brick lining (the lower the porosity, the longer the life); consequently, the researchers measured the apparent porosity of each brick specimen, as well as the mean pore diameter of each brick. The data are given in the table.

Brick	Apparent Porosity (%)	Mean Pore Diameter (micrometers)
A	18.8	12.0
B	18.3	9.7
C	16.3	7.3
D	6.9	5.3
E	17.1	10.9
F	20.4	16.8

Based on Bonadia, P., et al. “Aluminosilicate refractories for aluminum cell linings.” The American Ceramic Society Bulletin, Vol. 84, No. 2, Feb. 2005 (Table II).
a. Find the least squares line relating porosity (y) to mean pore diameter (x).
b. Interpret the y-intercept of the line.
c. Interpret the slope of the line.
d. Conduct a test of model adequacy. Use $α = .10$.
e. Find r and r² and interpret these values.
f. Predict the apparent percentage of porosity for a brick with a mean pore diameter of 10 micrometers. Use a 90% prediction interval.
*g. Apply Spearman’s test for rank correlation to the data. Use $α = .10$.

9.156 Relation of eye and head movements. How do eye and head movements relate to body movements when a person reacts to a visual stimulus? Scientists at the California Institute of Technology designed an experiment to answer this question and reported their results in Nature (Aug. 1998). Adult male rhesus monkeys were exposed to a visual stimulus (i.e., a panel of light-emitting diodes), and their eye, head, and body movements were electronically recorded. In one variation of the experiment, two variables were measured: active head movement (x, percent per degree) and body-plus-head rotation (y, percent per degree). The data for $n = 39$ trials were subjected to a simple linear regression analysis, with the following results: $\hat{β}_{1} = .88$, $s_{\hat{β}_{1}} = .14$.
a. Conduct a test to determine whether the two variables, active head movement x and body-plus-head rotation y, are positively linearly related. Use $α = .05$.
b. Construct and interpret a 90% confidence interval for $β_{1}$.
c. The scientists want to know whether the true slope of the line differs significantly from 1. On the basis of your answer to part b, make the appropriate inference.
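With only $\hat{β}_1$ and $s_{\hat{β}_1}$ reported, the test and the interval need nothing beyond Key Formulas. A minimal sketch, assuming the approximate t-table value $t_{.05} ≈ 1.687$ for $n - 2 = 37$ df, which serves both the one-tailed test at $α = .05$ and the 90% interval.

```python
b1_hat, s_b1, n = 0.88, 0.14, 39   # values reported in Exercise 9.156
t_crit = 1.687                      # assumed: t_.05 with n - 2 = 37 df, approximate table value

# One-tailed test of H0: beta_1 = 0 against Ha: beta_1 > 0
t = b1_hat / s_b1
print(f"t = {t:.2f}, reject H0 at alpha = .05: {t > t_crit}")

# 90% confidence interval for beta_1, then check whether it contains 1
lower, upper = b1_hat - t_crit * s_b1, b1_hat + t_crit * s_b1
print(f"90% CI for beta_1: ({lower:.3f}, {upper:.3f}); contains 1: {lower <= 1 <= upper}")
```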


CONDOR 9.157 Mortality of predatory birds. Two species of predatory birds—collared flycatchers and tits—compete for nest holes during breeding season on the island of Gotland, Sweden. Frequently, dead flycatchers are found in nest boxes occupied by tits. A field study examined whether the risk of mortality to flycatchers is related to the degree of competition between the two bird species for nest sites (The Condor, May 1995). The next table gives data on the number y of flycatchers killed at each of 14 discrete locations (plots) on the island, as well as on the nest box tit occupancy x (i.e., the percentage of nest boxes occupied by tits) at each plot. Consider the simple linear regression model $E(y) = β_{0} + β_{1} x$.

Plot	Number of Flycatchers Killed `y`	Nest Box Tit Occupancy `x` (%)
1	0	24
2	0	33
3	0	34
4	0	43
5	0	50
6	1	35
7	1	35
8	1	38
9	1	40
10	2	31
11	2	43
12	3	55
13	4	57
14	5	64

Based on Merila, J., and Wiggins, D. A. “Interspecific competition for nest holes causes adult mortality in the collared flycatcher.” The Condor, Vol. 97, No. 2, May 1995, p. 449 (Figure 2), Cooper Ornithological Society.

a. Plot the data in a scatterplot. Does the frequency of flycatcher casualties per plot appear to increase linearly with increasing proportion of nest boxes occupied by tits?
b. Use the method of least squares to find the estimates of $β_{0}$ and $β_{1}$. Interpret their values.
c. Test the utility of the model, using $α = .05$.
d. Find r and $r^{2}$ and interpret their values.
e. Find s and interpret the result.
f. Do you recommend using the model to predict the number of flycatchers killed? Explain.

9.158 Winning marathon times. In Chance (Winter 2000), statistician Howard Wainer and two students compared men’s and women’s winning times in the Boston Marathon. One of the graphs used to illustrate gender differences is reproduced below. The scatterplot graphs the winning times (in minutes) against the year in which the race was run. Men’s times are represented by purple dots and women’s times by red dots.
a. Consider only the winning times for men. Is there evidence of a linear trend? If so, propose a straight-line model for predicting winning time (y) based on year (x). Would you expect the slope of this line to be positive or negative?
b. Repeat part a for women’s times.
c. Which slope, the men’s or the women’s, will be greater in absolute value?
d. Would you recommend using the straight-line models to predict the winning time in the 2020 Boston Marathon? Why or why not?
e. Which model, the men’s or the women’s, is likely to have the smaller estimate of $σ$?

HELIUM 9.159 Quantum tunneling. At temperatures approaching absolute zero (-273 °C), helium exhibits traits that seem to defy many laws of Newtonian physics. An experiment has been conducted with helium in solid form at various temperatures near absolute zero. The solid helium is placed in a dilution refrigerator along with a solid impure substance, and the fraction (in weight) of the impurity passing through the solid helium is recorded. (This phenomenon of solids passing directly through solids is known as quantum tunneling.) The data are given in the next table.

Temperature `x` (°C)	Proportion of Impurity
-262.0	.315
-265.0	.202
-256.0	.204
-267.0	.620
-270.0	.715
-272.0	.935
-272.4	.957
-272.7	.906
-272.8	.985
-272.9	.987

a. Find the least squares estimates of the intercept and slope. Interpret them.
b. Use a 95% confidence interval to estimate the slope $β_{1}$. Interpret the interval in terms of this application. Does the interval support the hypothesis that temperature contributes information about the proportion of impurity passing through helium?
c. Interpret the coefficient of determination for this model.
d. Find a 95% prediction interval for the percentage of impurity passing through solid helium at -273 °C. Interpret the result.
e. Note that the value of x in part d is outside the experimental region. Why might this lead to an unreliable prediction?
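A minimal sketch of the least squares fit for the Exercise 9.159 data that evaluates the line at $x = -273$ and checks whether that value lies inside the sampled temperature range; the range check is an illustrative aid, not a formal criterion from the text. Note that the fitted intercept is the prediction at 0 °C, far outside the data, so by the Key Ideas above it has no practical interpretation.

```python
temp = [-262.0, -265.0, -256.0, -267.0, -270.0, -272.0, -272.4, -272.7, -272.8, -272.9]
impurity = [.315, .202, .204, .620, .715, .935, .957, .906, .985, .987]
n = len(temp)

x_bar, y_bar = sum(temp) / n, sum(impurity) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in temp)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(temp, impurity))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

x_p = -273.0
y_hat = b0 + b1 * x_p
inside = min(temp) <= x_p <= max(temp)   # is x_p within the sampled range?
print(f"y-hat = {b0:.3f} + ({b1:.4f})x")
print(f"prediction at x = {x_p}: {y_hat:.3f} (x_p within sampled range: {inside})")
```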

9.160 Dance/movement therapy. In cotherapy, two or more therapists lead a group. An article in the American Journal of Dance Therapy (Spring/Summer 1995) examined the use of cotherapy in dance/movement therapy. Two of several variables measured on each of a sample of 136 professional dance/movement therapists were years x of formal training and reported success rate y (measured as a percentage) of coleading dance/movement therapy groups.
a. Propose a linear model relating y to x.
b. The researcher hypothesized that dance/movement therapists with more years in formal dance training will report higher perceived success rates in cotherapy relationships. State the hypothesis in terms of the parameter of the model you proposed in part a.
c. The correlation coefficient for the sample data was reported as $r = -.26$. Interpret this result.
d. Does the value of r in part c support the hypothesis in part b? Test, using $α = .05$.

Applying the Concepts—Advanced

FLOUR 9.161 Regression through the origin. Sometimes it is known from theoretical considerations that the straight-line relationship between two variables x and y passes through the origin of the xy-plane. Consider the relationship between the total weight y of a shipment of 50-pound bags of flour and the number x of bags in the shipment. Since a shipment containing $x = 0$ bags (i.e., no shipment at all) has a total weight of $y = 0$, a straight-line model of the relationship between x and y should pass through the point $x = 0, y = 0$. In such a case, you could assume that $β_{0} = 0$ and characterize the relationship between x and y with the following model:

$y = β_{1} x + ε$

The least squares estimate of $β_{1}$ for this model is

$\hat{β}_{1} = \frac{Σ x_{i} y_{i}}{Σ x_{i}^{2}}$

From the records of past flour shipments, 15 shipments were randomly chosen and the data shown in the following table were recorded.

Weight of Shipment	Number of 50-Pound Bags in Shipment
5,050	100
10,249	205
20,000	450
7,420	150
24,685	500
10,206	200
7,325	150
4,958	100
7,162	150
24,000	500
4,900	100
14,501	300
28,000	600
17,002	400
16,100	400

a. Find the least squares line for the given data under the assumption that $β_{0} = 0$. Plot the least squares line on a scatterplot of the data.
b. Find the least squares line for the given data, using the model

$y = β_{0} + β_{1} x + ε$

(i.e., do not restrict $β_{0}$ to equal 0). Plot this line on the same scatterplot you constructed in part a.
c. Refer to part b. Why might $\hat{β}_{0}$ be different from 0 even though the true value of $β_{0}$ is known to be 0?
d. The estimated standard error of $\hat{β}_{0}$ is equal to

$s \sqrt{\frac{1}{n} + \frac{\bar{x}^{2}}{SS_{xx}}}$

Use the t-statistic

$t = \frac{\hat{β}_{0} - 0}{s \sqrt{(1/n) + (\bar{x}^{2}/SS_{xx})}}$

to test the null hypothesis $H_{0}: β_{0} = 0$ against the alternative $H_{a}: β_{0} \neq 0$. Take $α = .10$. Should you include $β_{0}$ in your model?
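The through-the-origin estimate and the ordinary fit can be computed side by side. A minimal sketch for the Exercise 9.161 data (weight is `y` and number of bags is `x`; the table lists weight first), which also evaluates the t statistic for $H_0: β_0 = 0$ from the formula above. No critical value is hard-coded; compare $|t|$ with $t_{.05}$ for 13 df from a t table.

```python
import math

bags = [100, 205, 450, 150, 500, 200, 150, 100, 150, 500, 100, 300, 600, 400, 400]   # x
weight = [5050, 10249, 20000, 7420, 24685, 10206, 7325, 4958, 7162, 24000, 4900,
          14501, 28000, 17002, 16100]                                                # y
n = len(bags)

# Model with beta_0 = 0: b1 = sum(x*y) / sum(x^2)
b1_origin = sum(x * y for x, y in zip(bags, weight)) / sum(x * x for x in bags)

# Unrestricted model y = beta_0 + beta_1 x + error
x_bar, y_bar = sum(bags) / n, sum(weight) / n
ss_xx = sum((x - x_bar) ** 2 for x in bags)
ss_yy = sum((y - y_bar) ** 2 for y in weight)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bags, weight))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))

# t statistic for H0: beta_0 = 0
t_b0 = (b0 - 0) / (s * math.sqrt(1 / n + x_bar ** 2 / ss_xx))

print(f"through origin: y-hat = {b1_origin:.3f}x")
print(f"unrestricted:   y-hat = {b0:.1f} + {b1:.3f}x, t for beta_0 = {t_b0:.2f}")
```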

JUMP 9.162 Long-jump “takeoff error.” The long jump is a track-and-field event in which a competitor attempts to jump a maximum distance into a sandpit after a running start. At the edge of the pit is a takeoff board. Jumpers usually try to plant their toes at the front edge of this board to maximize their jumping distance. The absolute distance between the front edge of the takeoff board and the spot where the toe actually lands on the board prior to jumping is called “takeoff error.” Is takeoff error in the long jump linearly related to best jumping distance? To answer this question, kinesiology researchers videotaped the performances of 18 novice long jumpers at a high school track meet (Journal of Applied Biomechanics, May 1995). The average takeoff error x and the best jumping distance y (out of three jumps) for each jumper are recorded in the accompanying table. If a jumper can reduce his/her average takeoff error by .1 meter, how much would you estimate the jumper’s best jumping distance to change? On the basis of your answer, comment on the usefulness of the model for predicting best jumping distance.

Jumper	Best Jumping Distance `y` (meters)	Average Takeoff Error `x` (meters)
1	5.30	.09
2	5.55	.17
3	5.47	.19
4	5.45	.24
5	5.07	.16
6	5.32	.22
7	6.15	.09
8	4.70	.12
9	5.22	.09
10	5.77	.09
11	5.12	.13
12	5.77	.16
13	6.22	.03
14	5.82	.50
15	5.15	.13
16	4.92	.04
17	5.20	.07
18	5.42	.04

Based on Berg, W. P., and Greer, N. L. “A kinematic profile of the approach run of novice long jumpers.” Journal of Applied Biomechanics, Vol. 11, No. 2, May 1995, p. 147 (Table 1).

Critical Thinking Challenge

BRICKS 9.163 Spall damage in bricks. A recent civil suit revolved around a five-building brick apartment complex located in the Bronx, New York, which began to suffer spalling damage (i.e., a separation of some portion of the face of a brick from its body). The owner of the complex alleged that the bricks were manufactured defectively. The brick manufacturer countered that poor design and shoddy management led to the damage. To settle the suit, an estimate of the rate of damage per 1,000 bricks, called the spall rate, was required (Chance, Summer 1994). The owner estimated the spall rate by using several scaffold-drop surveys. (With this method, an engineer lowers a scaffold down at selected places on building walls and counts the number of visible spalls for every 1,000 bricks in the observation area.) The brick manufacturer conducted its own survey by dividing the walls of the complex into 83 wall segments and taking a photograph of each one. (The number of spalled bricks that could be made out from each photo was recorded, and the sum over all 83 wall segments was used as an estimate of total spall damage.) In this court case, the jury was faced with the following dilemma: On the one hand, the scaffold-drop survey provided the most accurate estimate of spall rates in a given wall segment. Unfortunately, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations, leading to an overestimate of the total damage. On the other hand, the photo survey was complete in that all 83 wall segments in the complex were checked for spall damage. But the spall rate estimated by the photos, at least in areas of high spall concentration, was biased low (spalling damage cannot always be seen from a photo), leading to an underestimate of the total damage.

The data in the table are the spall rates obtained from the two methods at 11 drop locations. Use the data, as did expert statisticians who testified in the case, to help the jury estimate the true spall rate at a given wall segment. Then explain how this information, coupled with the data (not given here) on all 83 wall segments, can provide a reasonable estimate of the total spall damage (i.e., total number of damaged bricks).

Drop Location	Drop Spall Rate (per 1,000 bricks)	Photo Spall Rate (per 1,000 bricks)
1	0	0
2	5.1	0
3	6.6	0
4	1.1	.8
5	1.8	1.0
6	3.9	1.0
7	11.5	1.9
8	22.1	7.7
9	39.3	14.9
10	39.9	13.9
11	43.0	11.8

Based on Fairley, W. B., et al. “Bricks, buildings, and the Bronx: Estimating masonry deterioration.” Chance, Vol. 7, No. 3, Summer 1994, p. 36 (Figure 3).

[Note: The data points are estimated from the points shown on a scatterplot.]
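One way to approach this problem, assuming a straight-line calibration (the text does not say exactly which model the expert statisticians fit), is to regress the accurate drop spall rate y on the photo spall rate x at the 11 drop locations, then apply the fitted line to the photo rate of each of the 83 wall segments. The plain-Python sketch below does this. Because the 83-segment data are not given, `estimate_total_spalls` and its input format of (photo rate, bricks in thousands) pairs are hypothetical.

```python
# Spall rates (per 1,000 bricks) at the 11 drop locations in the table above
drop_rate = [0, 5.1, 6.6, 1.1, 1.8, 3.9, 11.5, 22.1, 39.3, 39.9, 43.0]
photo_rate = [0, 0, 0, .8, 1.0, 1.0, 1.9, 7.7, 14.9, 13.9, 11.8]


# Hypothetical helper name, for illustration only
def least_squares(x, y):
    """Return (beta0_hat, beta1_hat) for the straight-line model E(y) = beta0 + beta1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = ss_xy / ss_xx
    return y_bar - beta1_hat * x_bar, beta1_hat


# Calibration line (an assumed model): predict the drop rate from the photo rate
b0, b1 = least_squares(photo_rate, drop_rate)


# Hypothetical helper: segments is a list of (photo_rate, bricks_in_thousands) pairs
def estimate_total_spalls(segments):
    """Convert each segment's photo rate to a predicted drop rate (spalls per 1,000
    bricks), multiply by the segment's brick count in thousands, and sum."""
    return sum((b0 + b1 * photo) * thousands for photo, thousands in segments)


print(f"predicted drop rate = {b0:.2f} + {b1:.3f} * (photo rate)")
```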

Activity Applying Simple Linear Regression to Your Favorite Data

Many dependent variables in all areas of research serve as the subjects of regression-modeling efforts. We list five such variables here:

Crime rate in various communities
Daily maximum temperature in your town
Grade point average of students who have completed one academic year at your college
Gross domestic product of the United States
Points scored by your favorite football team in a single game

Choose one of these dependent variables, or choose some other dependent variable, for which you want to construct a prediction model. There may be a large number of independent variables that should be included in a prediction equation for the dependent variable you choose. List three potentially important independent variables, $x_{1}$, $x_{2}$, and $x_{3}$, that you think might be (individually) strongly related to your dependent variable. Next, obtain 10 data values, each of which consists of a measure of your dependent variable y and the corresponding values of $x_{1}$, $x_{2}$, and $x_{3}$.

Use the least squares formulas given in this chapter to fit three straight-line models—one for each independent variable—for predicting y.
Interpret the sign of the estimated slope coefficient $\hat{β}_{1}$ in each case, and test the utility of each model by testing $H_{0}: β_{1} = 0$ against $H_{a}: β_{1} \neq 0$. What assumptions must be satisfied to ensure the validity of these tests?
Calculate the coefficient of determination, $r^{2}$, for each model. Which of the independent variables predicts y best for the 10 sampled sets of data? Is this variable necessarily best in general (i.e., for the entire population)? Explain.

Be sure to keep the data and the results of your calculations, since you will need them for the Activity section in Chapter 12.
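To check your hand calculations for each of the three models, the following plain-Python sketch computes $\hat{β}_{1}$, the t-statistic for $H_{0}: β_{1} = 0$, and $r^{2}$. The function name is hypothetical and the data shown are made up. With 10 observations, compare $|t|$ with $t_{α/2}$ on $n - 2 = 8$ degrees of freedom.

```python
import math


# Hypothetical helper name, for illustration only
def slope_test_and_r2(x, y):
    """Return (beta1_hat, t, r_squared) for the straight-line model fitted to x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = ss_xy / ss_xx
    # SSE = SS_yy - beta1_hat * SS_xy, and s^2 = SSE / (n - 2)
    sse = ss_yy - beta1_hat * ss_xy
    s = math.sqrt(sse / (n - 2))
    # t-statistic for H0: beta1 = 0, with standard error s / sqrt(SS_xx)
    t = beta1_hat / (s / math.sqrt(ss_xx))
    # Coefficient of determination: proportion of SS_yy explained by x
    r_squared = 1 - sse / ss_yy
    return beta1_hat, t, r_squared


# Illustrative (made-up) data; replace with your 10 observations of y and one x variable
b1, t, r2 = slope_test_and_r2([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"beta1_hat = {b1:.3f}, t = {t:.2f}, r^2 = {r2:.4f}")
```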


Using Technology MINITAB: Simple Linear Regression

Simple Linear Regression Analysis

Step 1 Access the MINITAB worksheet file that contains the two quantitative variables (dependent and independent variables).
Step 2 Click on the “Stat” button on the MINITAB menu bar, and then click on “Regression” and “Regression” again, as shown in Figure 9.M.1.

Figure 9.M.1 MINITAB menu options for regression

Step 3 On the resulting dialog box (see Figure 9.M.2), specify the dependent variable in the “Response” box and the independent variable in the “Predictors” box.

Figure 9.M.2 MINITAB regression dialog box

Step 4 To produce prediction intervals for y and confidence intervals for E(y), click the “Options” button. The resulting dialog box is shown in Figure 9.M.3.

Figure 9.M.3 MINITAB regression options

Step 5 Check “Confidence limits” and/or “Prediction limits,” specify the “Confidence level,” and enter the value of x in the “Prediction intervals for new observations” box.
Step 6 Click “OK” to return to the main Regression dialog box and then click “OK” again to produce the MINITAB simple linear regression printout.
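MINITAB does these calculations internally. To verify a printout by hand, the sketch below applies the standard formulas $\hat{y} \pm t_{α/2}\, s \sqrt{1/n + (x_{p} - \bar{x})^{2}/SS_{xx}}$ for $E(y)$ and the same expression with an extra $1$ under the square root for a new value of $y$. The function name `intervals_at` and the choice to pass the t critical value in from a table (avoiding any library dependency) are assumptions of this illustration.

```python
import math


# Hypothetical helper name, for illustration only
def intervals_at(x, y, x_p, t_crit):
    """Return (y_hat, ci, pi) at x = x_p.

    ci is the confidence interval for E(y); pi is the prediction interval for a new y.
    t_crit is t_{alpha/2} with n - 2 degrees of freedom, supplied from a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = ss_xy / ss_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    sse = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = beta0_hat + beta1_hat * x_p
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(core)      # for the mean of y at x_p
    pi_half = t_crit * s * math.sqrt(1 + core)  # for a single new y at x_p
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Illustrative data; 3.182 is t_.025 with 3 df (95% intervals when n = 5)
y_hat, ci, pi = intervals_at([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 3, 3.182)
print(f"y_hat = {y_hat:.3f}, CI = {ci}, PI = {pi}")
```

The prediction interval is always wider than the confidence interval at the same $x_{p}$, because it must also cover the random error of a single new observation.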

Correlation Analysis

Step 1 Click on the “Stat” button on the MINITAB main menu bar, then click on “Basic Statistics,” and then click on “Correlation,” as shown in Figure 9.M.4.

Figure 9.M.4 MINITAB menu options for correlation

Step 2 On the resulting dialog box (see Figure 9.M.5), enter the two variables of interest in the “Variables” box.

Figure 9.M.5 MINITAB correlation dialog box

Step 3 Click “OK” to obtain a printout of the correlation.
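The Correlation command reports the coefficient of correlation $r = SS_{xy}/\sqrt{SS_{xx}SS_{yy}}$. A minimal plain-Python equivalent for checking the printout follows; the function name is hypothetical and the data are illustrative.

```python
import math


# Hypothetical helper name, for illustration only
def correlation(x, y):
    """Coefficient of correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = correlation([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
# For the straight-line model, squaring r gives the coefficient of determination r^2
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
```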

Rank Correlation

Step 1 To obtain Spearman’s rank correlation coefficient in MINITAB, you must first rank the values of the two quantitative variables of interest. Click the “Calc” button on the MINITAB menu bar and create two additional columns, one for the ranks of the x-variable and one for the ranks of the y-variable. (Use the “Rank” function on the MINITAB calculator as shown in Figure 9.M.6.)

Figure 9.M.6 MINITAB calculator menu screen

Step 2 Click on the “Stat” button on the main menu bar, then click on “Basic Statistics” and “Correlation.”
Step 3 On the resulting dialog box (see Figure 9.M.7), enter the ranked variables in the “Variables” box and uncheck the “Display p-values” option.

Figure 9.M.7 MINITAB Correlation dialog box

Step 4 Click “OK” to obtain the MINITAB printout. (You will need to look up the critical value of Spearman’s rank correlation to conduct the test.)
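The procedure above ranks each variable and then computes the ordinary correlation of the ranks, which is Spearman's $r_{s}$. A plain-Python sketch of the same idea follows, giving tied values the average of the ranks they occupy; the function names are hypothetical. It also includes the shortcut formula based on the rank differences $d_{i}$, which agrees with the rank correlation only when there are no ties. As in MINITAB, you still compare $r_{s}$ with a tabled critical value.

```python
import math


# Hypothetical helper names throughout, for illustration only
def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; each tied value gets their mean
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def correlation(u, v):
    """Ordinary (Pearson) correlation SS_uv / sqrt(SS_uu * SS_vv)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    ss_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    ss_uu = sum((a - u_bar) ** 2 for a in u)
    ss_vv = sum((b - v_bar) ** 2 for b in v)
    return ss_uv / math.sqrt(ss_uu * ss_vv)


def spearman_rs(x, y):
    """Spearman's r_s as the correlation of the ranks (the quantity the steps above compute)."""
    return correlation(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); matches spearman_rs only when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```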

TI-83/TI-84 Plus Graphing Calculator: Simple Linear Regression

Finding the Least Squares Regression Equation

Step 1 Enter the data
- Press STAT and select 1:Edit
- If a list already contains data, clear the old data first:
  - Use the up arrow to highlight the list name, “L1” or “L2”
  - Press CLEAR ENTER
- Enter your x-data in L1 and your y-data in L2
Step 2 Find the equation
- Press STAT and highlight CALC
- Press 4 for LinReg(ax+b)
- Press ENTER
- The screen will show the values for a and b in the equation $y = ax + b$
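The calculator writes the fitted line as $y = ax + b$, so its output maps onto this chapter's notation as follows (these are the usual least squares formulas):

```latex
a = \hat{\beta}_{1} = \frac{SS_{xy}}{SS_{xx}}, \qquad
b = \hat{\beta}_{0} = \bar{y} - \hat{\beta}_{1}\,\bar{x}
```

Note that the calculator's $a$ is the slope, not the intercept.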

Finding `r` and `r`²

Use this procedure if r and r² do not already appear on the LinReg screen produced in “Finding the Least Squares Regression Equation”:

Step 1 Turn the diagnostics feature on
- Press 2nd 0 for CATALOG
- Press the ALPHA key and $x^{-1}$ for D
- Press the down ARROW until DiagnosticOn is highlighted
- Press ENTER twice
Step 2 Find the regression equation as shown in “Finding the Least Squares Regression Equation”

The values for r and r² will appear on the screen as well.

Graphing the Least Squares Line with the Scatterplot

Step 1 Enter the data as shown in “Finding the Least Squares Regression Equation”
Step 2 Set up the data plot
- Press Y= and CLEAR all functions from the Y register
- Press 2nd Y= for STAT PLOT
- Press 1 for Plot1
- Set the cursor so that ON is flashing and press ENTER
- For Type, use the ARROW and ENTER keys to highlight and select the scatterplot (first icon in the first row)
- For Xlist, choose the column containing the x-data
- For Ylist, choose the column containing the y-data
Step 3 Find the regression equation and store the equation in Y1
- Press STAT and highlight CALC
- Press 4 for LinReg(ax+b) (Note: Don’t press ENTER here because you want to store the regression equation in Y1.)
- Press VARS
- Use the right arrow to highlight Y-VARS
- Press ENTER to select 1:Function
- Press ENTER to select 1:Y1
- Press ENTER
Step 4 View the scatterplot and regression line
- Press ZOOM and then press 9 to select 9:ZoomStat
You should see the data graphed along with the regression line.
