9.5 The Coefficients of Correlation and Determination

In this section, we present two statistics that describe the adequacy of a model: the coefficient of correlation and the coefficient of determination.

Coefficient of Correlation

Recall (from optional Section 2.8) that a bivariate relationship describes a relationship—or correlation—between two variables x and y. Scatterplots are used to describe a bivariate relationship graphically. In this section, we will discuss the concept of correlation and how it can be used to measure the linear relationship between two variables x and y. A numerical descriptive measure of correlation is provided by the coefficient of correlation, r.

The coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables x and y. It is computed (for a sample of n measurements on x and y) as follows:

$r = \frac{{SS}_{x y}}{\sqrt{{SS}_{x x} {SS}_{y y}}}$

where

$\begin{array}{lcl} {SS}_{x y} & = & \sum (x - \bar{x}) (y - \bar{y}) = \sum x y - \frac{(\sum x) (\sum y)}{n} \\ {SS}_{x x} & = & \sum {(x - \bar{x})}^{2} = \sum x^{2} - \frac{{(\sum x)}^{2}}{n} \\ {SS}_{y y} & = & \sum {(y - \bar{y})}^{2} = \sum y^{2} - \frac{{(\sum y)}^{2}}{n} \end{array}$

*The two tests are equivalent in simple linear regression only.

Note that the computational formula for the correlation coefficient r given above involves the same quantities that were used in computing the least squares prediction equation. In fact, since the numerators of the expressions for ${\hat{β}}_{1}$ and r are identical, it is clear that $r = 0$ when ${\hat{β}}_{1} = 0$ (the case where x contributes no information for the prediction of y) and that r is positive when the slope is positive and negative when the slope is negative. Unlike ${\hat{β}}_{1}$, the correlation coefficient r is scaleless and assumes a value between $-1$ and $+1$, regardless of the units of x and y.
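The scaleless property can be checked numerically. The sketch below is an illustration only: it uses the five drug reaction observations listed in Table 9.4 later in this section, and the rescaling of x by a factor of 10 is a hypothetical change of units that does not come from the text. The helper names `sums_of_squares` and `slope_and_r` are likewise invented for this example.

```python
import math

def sums_of_squares(x, y):
    """Return (SS_xy, SS_xx, SS_yy) using the computational formulas."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return ss_xy, ss_xx, ss_yy

def slope_and_r(x, y):
    """Return the least squares slope SS_xy/SS_xx and the correlation r."""
    ss_xy, ss_xx, ss_yy = sums_of_squares(x, y)
    return ss_xy / ss_xx, ss_xy / math.sqrt(ss_xx * ss_yy)

x_percent = [1, 2, 3, 4, 5]                 # percent of drug (Table 9.4)
y_seconds = [1, 1, 2, 2, 4]                 # reaction time (Table 9.4)
x_rescaled = [10 * v for v in x_percent]    # hypothetical change of units

slope_a, r_a = slope_and_r(x_percent, y_seconds)
slope_b, r_b = slope_and_r(x_rescaled, y_seconds)

print("slopes:", slope_a, slope_b)   # the slope shrinks by the factor of 10
print("r values:", r_a, r_b)         # r is the same up to rounding error
```

The slope depends on the units of x, while r does not, which is why r is convenient for comparing the strength of linear relationships across differently scaled variables.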

A value of r near or equal to 0 implies little or no linear relationship between y and x. In contrast, the closer r comes to 1 or $-1$, the stronger is the linear relationship between y and x. And if $r = 1$ or $r = -1$, all the sample points fall exactly on the least squares line. Positive values of r imply a positive linear relationship between y and x; that is, y increases as x increases. Negative values of r imply a negative linear relationship between y and x; that is, y decreases as x increases. Each of these situations is portrayed in Figure 9.16.

Now Work Exercise 9.79

We use the data in Table 9.1 for the drug reaction example to demonstrate how to calculate the coefficient of correlation, r. The quantities needed to calculate r are ${SS}_{x y}$, ${SS}_{x x}$, and ${SS}_{y y}$. The first two quantities have been calculated previously and are ${SS}_{x y} = 7$ and ${SS}_{x x} = 10$. The calculation for ${SS}_{y y} = \sum {(y - \bar{y})}^{2}$ is shown in the last column of the Excel spreadsheet, Figure 9.5 (p. 508). The result is ${SS}_{y y} = 6$.

We now find the coefficient of correlation:

$r = \frac{{SS}_{x y}}{\sqrt{{SS}_{x x} {SS}_{y y}}} = \frac{7}{\sqrt{(10) (6)}} = \frac{7}{\sqrt{60}} = .904$

The fact that r is positive and near 1 indicates that the reaction time tends to increase as the amount of drug in the bloodstream increases—for the given sample of five subjects. This is the same conclusion we reached when we found the calculated value of the least squares slope to be positive.
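The hand calculation above can be reproduced in a few lines. This is a minimal sketch, assuming the five (x, y) pairs of the drug reaction example as listed in Table 9.4; it computes each sum of squares from its deviation form and confirms that the shortcut form for ${SS}_{x y}$ gives the same value.

```python
import math

x = [1, 2, 3, 4, 5]   # percent of drug in the bloodstream
y = [1, 1, 2, 2, 4]   # reaction time in seconds
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation (definitional) forms of the sums of squares
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Shortcut (computational) form for SS_xy, for comparison
ss_xy_shortcut = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(ss_xy, ss_xx, ss_yy)   # 7.0 10.0 6.0
print(round(r, 3))           # 0.904
```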

Example 9.5 Using the Correlation Coefficient—Relating Crime Rate and Casino Employment

Problem

Legalized gambling is available on several riverboat casinos operated by a city in Mississippi. The mayor of the city wants to know the correlation between the number of casino employees and the yearly crime rate. The records for the past 10 years are examined, and the results listed in Table 9.3 are obtained. Calculate the coefficient of correlation, r, for the data. Interpret the result.

Table 9.3 Data on Casino Employees and Crime Rate, Example 9.5

Year	Number `x` of Casino Employees (thousands)	Crime Rate `y` (number of crimes per 1,000 population)
2006	15	1.35
2007	18	1.63
2008	24	2.33
2009	22	2.41
2010	25	2.63
2011	29	2.93
2012	30	3.41
2013	32	3.26
2014	35	3.63
2015	38	4.15

Data Set: CASINO

Solution

Rather than use the computing formula given earlier, we resort to a statistical software package. The data of Table 9.3 were entered into a computer and MINITAB was used to compute r. The MINITAB printout is shown in Figure 9.17.

Figure 9.17

MINITAB correlation printout and scatterplot for Example 9.5

Ethics in Statistics

Intentionally using the correlation coefficient only to make an inference about the relationship between two variables in situations where a nonlinear relationship may exist is considered unethical statistical practice.

The coefficient of correlation, highlighted at the top of the printout, is $r = .987$. Thus, the size of the casino workforce and crime rate in this city are very highly correlated, at least over the past 10 years. The implication is that a strong positive linear relationship exists between these variables. (See Figure 9.17.) We must be careful, however, not to jump to any unwarranted conclusions. For instance, the mayor may be tempted to conclude that hiring more casino workers next year will increase the crime rate; that is, that there is a causal relationship between the two variables. However, high correlation does not imply causality. The fact is, many things have probably contributed both to the increase in the casino workforce and to the increase in crime rate. The city’s tourist trade has undoubtedly grown since riverboat casinos were legalized, and it is likely that the casinos have expanded both in services offered and in number. We cannot infer a causal relationship on the basis of high sample correlation. When a high correlation is observed in the sample data, the only safe conclusion is that a linear trend may exist between x and y.

Look Back

Another variable, such as the increase in tourism, may be the underlying cause of the high correlation between x and y.
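The text obtains r from MINITAB. As a cross-check, the same value can be recovered directly from the Table 9.3 data with the computing formula. The sketch below uses plain Python in place of a statistical package, and the function name `pearson_r` is simply a label chosen for this illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Table 9.3: casino employees (thousands) and crimes per 1,000 population
employees = [15, 18, 24, 22, 25, 29, 30, 32, 35, 38]
crime_rate = [1.35, 1.63, 2.33, 2.41, 2.63, 2.93, 3.41, 3.26, 3.63, 4.15]

r = pearson_r(employees, crime_rate)
print(round(r, 3))   # 0.987
```

The calculation says nothing about why the two series move together; it only quantifies the linear trend in these 10 yearly observations.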

Now Work Exercise 9.85a

Caution

Two caveats apply in using the sample correlation coefficient r to infer the nature of the relationship between x and y: (1) A high correlation does not necessarily imply that a causal relationship exists between x and y—only that a linear trend may exist; (2) a low correlation does not necessarily imply that x and y are unrelated—only that x and y are not strongly linearly related.
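Caveat (2) can be illustrated with a small made-up data set (hypothetical values, not taken from the text) in which y is completely determined by x through $y = x^{2}$, yet the sample correlation is 0 because the relationship is not linear.

```python
import math

# Hypothetical data: y is an exact (but nonlinear) function of x
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # 4, 1, 0, 1, 4
n = len(x)

ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(r)   # zero: no linear trend, even though y depends entirely on x
```

A scatterplot of such data would reveal the curved pattern immediately, which is one reason to plot the data before relying on r.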

Keep in mind that the correlation coefficient r measures the linear correlation between x values and y values in the sample, and a similar linear coefficient of correlation exists for the population from which the data points were selected. The population correlation coefficient is denoted by the symbol $ρ$ (rho). As you might expect, $ρ$ is estimated by the corresponding sample statistic r. Or, instead of estimating $ρ$, we might want to test the null hypothesis $H_{0} : ρ = 0$ against $H_{a} : ρ \neq 0$; that is, we can test the hypothesis that x contributes no information for the prediction of y by using the straight-line model against the alternative that the two variables are at least linearly related.

However, we already performed this identical test in Section 9.4 when we tested $H_{0} : β_{1} = 0$ against $H_{a} : β_{1} \neq 0$. That is, the null hypothesis $H_{0} : ρ = 0$ is equivalent to the hypothesis $H_{0} : β_{1} = 0$.* When we tested the null hypothesis $H_{0} : β_{1} = 0$ in connection with the drug reaction example, the data led to a rejection of the null hypothesis at the $α = .05$ level. This rejection implies that the null hypothesis of a 0 linear correlation between the two variables (drug and reaction time) can also be rejected at the $α = .05$ level. The only real difference between the least squares slope ${\hat{β}}_{1}$ and the coefficient of correlation, r, is the measurement scale. Therefore, the information they provide about the usefulness of the least squares model is to some extent redundant. For this reason, we will use the slope to make inferences about the existence of a positive or negative linear relationship between two variables.

For the sake of completeness, a summary of the test for linear correlation is provided in the following boxes.

A Test for Linear Correlation

Test statistic: $t_{c} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}} = \frac{{\hat{β}}_{1}}{s_{{\hat{β}}_{1}}}$

	One-Tailed Tests		Two-Tailed Test
	$H_{0} : ρ = 0$	$H_{0} : ρ = 0$	$H_{0} : ρ = 0$
	$H_{a} : ρ > 0$	$H_{a} : ρ < 0$	$H_{a} : ρ \neq 0$
Rejection region:	$t_{c} > t_{α}$	$t_{c} < - t_{α}$	$| t_{c} | > t_{α / 2}$
`p`-value:	$P (t > t_{c})$	$P (t < t_{c})$	$2 \cdot P (t > t_{c})$ if $t_{c}$ is positive, $2 \cdot P (t < t_{c})$ if $t_{c}$ is negative

where the distribution of t depends on $(n - 2)$ df.

Condition Required for a Valid Test of Correlation

The sample of (x, y) values is randomly selected from a normal population.
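The two forms of the test statistic in the box can be compared numerically for the drug reaction data. The sketch below rests on several assumptions that are not restated in this section: the standard least squares results $SSE = {SS}_{y y} - {\hat{β}}_{1} {SS}_{x y}$, $s^{2} = SSE / (n - 2)$, and $s_{{\hat{β}}_{1}} = s / \sqrt{{SS}_{x x}}$, plus the tabled critical value $t_{.025} = 3.182$ for 3 df taken from a standard t table.

```python
import math

x = [1, 2, 3, 4, 5]   # percent of drug (Table 9.4)
y = [1, 1, 2, 2, 4]   # reaction time in seconds (Table 9.4)
n = len(x)

ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n

# Form 1: test statistic written in terms of r
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Form 2: slope divided by its estimated standard error
b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy            # assumed shortcut formula for SSE
s = math.sqrt(sse / (n - 2))        # estimated standard deviation of the errors
s_b1 = s / math.sqrt(ss_xx)         # estimated standard error of the slope
t_from_slope = b1 / s_b1

T_CRIT = 3.182                      # t_{.025}, 3 df, from a standard t table
reject_h0 = abs(t_from_r) > T_CRIT  # two-tailed test at alpha = .05

print(round(t_from_r, 3), round(t_from_slope, 3))
print("Reject H0 at alpha = .05:", reject_h0)
```

Both forms give the same t value, and it exceeds the critical value, in line with the rejection of $H_{0}$ at $α = .05$ reported above for the drug reaction example.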

Coefficient of Determination

Another way to measure the usefulness of a linear model is to measure the contribution of x in predicting y. To accomplish this, we calculate how much the errors of prediction of y were reduced by using the information provided by x. To illustrate, consider the sample shown in the scatterplot of Figure 9.18a. If we assume that x contributes no information for the prediction of y, the best prediction for a value of y is the sample mean $\overline{y}$, which is shown as the horizontal line in Figure 9.18b. The vertical line segments in Figure 9.18b are the deviations of the points about the mean $\overline{y}$. Note that the sum of the squares of the deviations for the prediction equation $\hat{y} = \overline{y}$ is

${SS}_{y y} = \sum {(y_{i} - \bar{y})}^{2}$

Now suppose you fit a least squares line to the same set of data and locate the deviations of the points about the line, as shown in Figure 9.18c. Compare the deviations about the prediction lines in Figures 9.18b and 9.18c. You can see that

1. If x contributes little or no information for the prediction of y, the sums of the squares of the deviations for the two lines, ${SS}_{y y} = \sum {(y_{i} - \bar{y})}^{2}$ and $SSE = \sum {(y_{i} - {\hat{y}}_{i})}^{2}$, will be nearly equal.
2. If x does contribute information for the prediction of y, the SSE will be smaller than ${SS}_{y y}$. In fact, if all the points fall on the least squares line, then $SSE = 0$.

Consequently, the reduction in the sum of the squares of the deviations that can be attributed to x, expressed as a proportion of ${SS}_{y y}$, is

$\frac{{SS}_{y y} - SSE}{{SS}_{y y}}$

A comparison of the sum of squares of deviations for two models

Note that ${SS}_{y y}$ is the “total sample variability” of the observations around the mean $\overline{y}$ and that SSE is the remaining “unexplained sample variability” after fitting the line $\hat{y}$. Thus, the difference $({SS}_{y y} - SSE)$ is the “explained sample variability” attributable to the linear relationship with x. Therefore, a verbal description of the proportion is

$\begin{array}{lcl} \frac{{SS}_{y y} - SSE}{{SS}_{y y}} & = & \frac{\text{Explained sample variability}}{\text{Total sample variability}} \\ & = & \text{Proportion of total sample variability explained by the linear relationship} \end{array}$

In simple linear regression, it can be shown that this proportion—called the coefficient of determination—is equal to the square of the simple linear coefficient of correlation, r.

The coefficient of determination is

$r^{2} = \frac{{SS}_{y y} - SSE}{{SS}_{y y}} = 1 - \frac{SSE}{{SS}_{y y}}$

and represents the proportion of the total sample variability around $\overline{y}$ that is explained by the linear relationship between y and x. (In simple linear regression, it may also be computed as the square of the coefficient of correlation, r.)
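The claim that this proportion equals the square of r can be sketched in two lines. The sketch assumes two standard least squares results that are not restated in this section: the slope estimate $\hat{β}_{1} = {SS}_{x y} / {SS}_{x x}$ and the shortcut formula $SSE = {SS}_{y y} - \hat{β}_{1} {SS}_{x y}$.

```latex
\begin{array}{lcl}
\frac{SS_{yy} - SSE}{SS_{yy}}
  & = & \frac{SS_{yy} - \left( SS_{yy} - \hat{\beta}_{1} SS_{xy} \right)}{SS_{yy}}
        = \frac{\hat{\beta}_{1} SS_{xy}}{SS_{yy}} \\
  & = & \frac{SS_{xy}}{SS_{xx}} \cdot \frac{SS_{xy}}{SS_{yy}}
        = \frac{SS_{xy}^{2}}{SS_{xx} SS_{yy}}
        = r^{2}
\end{array}
```

For the drug reaction data, with ${SS}_{x y} = 7$, ${SS}_{x x} = 10$, and ${SS}_{y y} = 6$, the last expression is $49 / 60 \approx .817$.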

Note that $r^{2}$ is always between 0 and 1 because r is between $-1$ and $+1$. Thus, an $r^{2}$ of .60 means that the sum of the squares of the deviations of the y values about their predicted values has been reduced 60% by the use of the least squares equation $\hat{y}$, instead of $\overline{y}$, to predict y.

Example 9.6 Obtaining the Value of `r`²—Drug Reaction Regression

Problem

Calculate the coefficient of determination for the drug reaction example. The data are repeated in Table 9.4 for convenience. Interpret the result.

Solution

From previous calculations,

${SS}_{y y} = 6$ and $SSE = \sum {(y - \hat{y})}^{2} = 1.10$

Then, from our earlier definition, the coefficient of determination is

$r^{2} = \frac{{SS}_{y y} - SSE}{{SS}_{y y}} = \frac{6.0 - 1.1}{6.0} = \frac{4.9}{6.0} = .817$

Table 9.4

Percent `x` of Drug	Reaction Time `y` (seconds)
1	1
2	1
3	2
4	2
5	4

Data Set: STIMULUS

Another way to compute $r^{2}$ is to recall from earlier in this section that $r = .904$. Then we have $r^{2} = (.904)^{2} = .817$. A third way to obtain $r^{2}$ is from a computer printout. Its value is highlighted on the SPSS printout in Figure 9.19. Our interpretation is as follows: We know that using the percent x of drug in the blood to predict y with the least squares line

$\hat{y} = - .1 + .7 x$

accounts for nearly 82% of the total sum of the squares of the deviations of the five sample y values about their mean. Or, stated another way, 82% of the sample variation in reaction time (y) can be “explained” by using the percent x of drug in a straight-line model.
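The first two routes to $r^{2}$ in this example can be checked side by side. The sketch below fits the least squares line to the Table 9.4 data, computes SSE directly from the residuals, and compares $1 - SSE / {SS}_{y y}$ with the square of r. It assumes the usual intercept formula ${\hat{β}}_{0} = \bar{y} - {\hat{β}}_{1} \bar{x}$ from earlier in the chapter.

```python
import math

x = [1, 2, 3, 4, 5]   # percent of drug (Table 9.4)
y = [1, 1, 2, 2, 4]   # reaction time in seconds (Table 9.4)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least squares line y_hat = b0 + b1 * x
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Route 1: proportion of SS_yy removed by the fitted line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r2_from_sse = 1 - sse / ss_yy

# Route 2: square of the correlation coefficient
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r2_from_r = r ** 2

print(round(b0, 1), round(b1, 1))                  # -0.1 0.7
print(round(sse, 2))                               # 1.1
print(round(r2_from_sse, 3), round(r2_from_r, 3))  # 0.817 0.817
```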

Portion of SPSS printout for time-drug regression

Now Work Exercise 9.87a

Practical Interpretation of the Coefficient of Determination, $r^{2}$

$100 (r^{2}) \%$ of the sample variation in y (measured by the total sum of the squares of the deviations of the sample y values about their mean $\overline{y}$) can be explained by (or attributed to) using x to predict y in the straight-line model.

Statistics in Action Revisited

Using the Coefficients of Correlation and Determination to Assess the Dowsing Data

In the previous Statistics in Action Revisited, we discovered that using a dowser’s guess (x) in a straight-line model was not statistically useful in predicting actual pipe location (y). Both the coefficient of correlation and the coefficient of determination (highlighted on the MINITAB printouts in Figure SIA9.4) also support this conclusion. The value of the correlation coefficient, $r = .314$, indicates a fairly weak positive linear relationship between the variables. This value, however, is not statistically significant (p-value $= .118$). In other words, there is no evidence to indicate that the population correlation coefficient is different from 0. The coefficient of determination, $r^{2} = .099$, implies that only about 10% of the sample variation in pipe location values can be explained by the simple linear model.

MINITAB printouts with coefficients of correlation and determination for the dowsing data

Exercises 9.77–9.100

Understanding the Principles

9.77 True or False. The correlation coefficient is a measure of the strength of the linear relationship between x and y.
9.78 Describe the slope of the least squares line if
a. $r = .7$
b. $r = - .7$
c. $r = 0$
d. $r^{2} = .64$
9.79 Explain what each of the following sample correlation coefficients tells you about the relationship between the x and y values in the sample:
a. $r = 1$
b. $r = - 1$
c. $r = 0$
d. $r = .90$
e. $r = .10$
f. $r = - .88$
9.80 True or False. A value of the correlation coefficient near 1 or near $-1$ implies a causal relationship between x and y.

Learning the Mechanics

9.81 Construct a scatterplot for each data set. Then calculate r and $r^{2}$ for each data set.

a.
`x`	$-2$	$-1$	0	1	2
`y`	$-2$	1	2	5	6

b.
`x`	$-2$	$-1$	0	1	2
`y`	6	5	3	2	0

c.
`x`	1	2	2	3	3	3	4
`y`	2	1	3	1	2	3	2

d.
`x`	0	1	3	5	6
`y`	0	1	2	1	0

9.82 Calculate $r^{2}$ for the least squares line in Exercise 9.18 (p. 512).
9.83 Calculate $r^{2}$ for the least squares line in Exercise 9.21 (p. 512).

Applet Exercise 9.2

Use the applet entitled Correlation by the Eye to explore the relationship between the pattern of data in a scatterplot and the corresponding correlation coefficient.

a. Run the applet several times. Each time, guess the value of the correlation coefficient. Then click Show r to see the actual correlation coefficient. How close is your value to the actual value of r? Click New data to reset the applet.
b. Click the trash can to clear the graph. Use the mouse to place five points on the scatterplot that are approximately in a straight line. Then guess the value of the correlation coefficient. Click Show r to see the actual correlation coefficient. How close were you this time?
c. Continue to clear the graph and plot sets of five points with different patterns among the points. Guess the value of r. How close do you come to the actual value of r each time?
d. On the basis of your experiences with the applet, explain why we need to use more reliable methods of finding the correlation coefficient than just “eyeing” it.

Applying the Concepts—Basic

9.84 RateMyProfessors.com. A popular Web site among college students is RateMyProfessors.com (RMP). Established over 10 years ago, RMP allows students to post quantitative ratings of their instructors. In Practical Assessment, Research & Evaluation (May 2007), University of Maine researchers investigated whether instructor ratings posted on RMP are correlated with the formal in-class student evaluations of teaching (SET) that all universities are required to administer at the end of the semester. Data collected for $n = 426$ University of Maine instructors yielded a correlation between RMP and SET ratings of .68.
a. Give the equation of a linear model relating SET rating (y) to RMP rating (x).
b. Give a practical interpretation of the value $r = .68$.
c. Is the estimated slope of the line, part a, positive or negative? Explain.
d. A test of the null hypothesis $H_{0} : ρ = 0$ yielded a p-value of .001. Interpret this result.
e. Compute the coefficient of determination, r², for the regression analysis. Interpret the result.
9.85 Last name and acquisition timing. Refer to the Journal of Consumer Research (Aug. 2011) study of the speed with which consumers decide to purchase a product, Exercise 7.12 (p. 382). Recall that the researchers theorized that consumers with last names that begin with letters later in the alphabet will tend to acquire items faster than those whose last names are earlier in the alphabet (i.e., the last name effect). Each in a sample of 50 MBA students was offered free tickets to attend a college basketball game for which there was a limited supply of tickets. The first letter of the last name of those who responded to an e-mail offer in time to receive the tickets was noted and given a numerical value (e.g., “A” $= 1$, “B” $= 2$, etc.). Each student’s response time (measured in minutes) was also recorded.
a. The researchers computed the correlation between the two variables as $r = - .271$. Interpret this result.
b. The observed significance level for testing for a negative correlation in the population was reported as p-value $= .018$. Interpret this result for $α = .05$.
c. Does this analysis support the researchers’ last name effect theory? Explain.
TASTE 9.86 Taste-testing scales. The Journal of Food Science (Feb. 2014) published the results of a taste-testing study. The researchers evaluated the general Labeled Magnitude Scale (gLMS), used to rate the palatability of food items on a scale ranging from $-100$ (for strongest imaginable dislike) to $+100$ (for strongest imaginable like). The researchers called this rating the perceived hedonic intensity. A sample of 200 students and staff at the University of Florida used the scale to rate their most favorite and least favorite foods. In addition, each taster rated the sensory intensity of four different solutions: salt, sucrose, citric acid, and hydrochloride. The averages of these four ratings were used by the researchers to quantify individual variation in taste intensity, called perceived sensory intensity. These data are saved in the TASTE file. The accompanying MINITAB printout shows the correlation between perceived sensory intensity (PSI) and perceived hedonic intensity for both favorite (PHI-F) and least favorite (PHI-L) foods. According to the researchers, “the palatability of the favorite and least favorite foods varies depending on the perceived intensity of taste: Those who experience the greatest taste intensity (that is, supertasters) tend to experience more extreme food likes and dislikes.” Do you agree? Explain.
9.87 Going for it on fourth down in the NFL. Each week coaches in the National Football League (NFL) face a decision during the game. On fourth down, should the team punt the ball or go for a first down? To aid in the decision-making process, statisticians at California State University, Northridge, developed a regression model for predicting the number of points scored (y) by a team that has a first down with a given number of yards (x) from the opposing goal line (Chance, Winter 2009). One of the models fit to data collected on five NFL teams from a recent season was the simple linear regression model, $E (y) = β_{0} + β_{1} x$. The regression yielded the following results: $\hat{y} = 4.42 - .048 x$, $r^{2} = .18$.
a. Give a practical interpretation of the coefficient of determination, r².
b. Compute the value of the coefficient of correlation, r, from the value of r². Is the value of r positive or negative? Why?
TRAPS 9.88 Lobster fishing study. Refer to the Bulletin of Marine Science (Apr. 2010) study of teams of fishermen fishing for the red spiny lobster in Baja California Sur, Mexico, Exercise 9.63 (p. 529). Recall that simple linear regression was used to model $y =$ total catch of lobsters (in kilograms) during the season as a function of $x =$ average percentage of traps allocated per day to exploring areas of unknown catch (called search frequency).
a. Locate and interpret the coefficient of determination, r², on the SAS printout shown on p. 529.
b. Note that the coefficient of correlation, r, is not shown on the SAS printout. Is there information on the printout to determine whether total catch (y) is negatively linearly related to search frequency (x)? Explain.
9.89 Physical activity of obese young adults. The International Journal of Obesity (Jan. 2007) published a study of the physical activity of obese young adults. For two groups of young adults—13 obese and 15 of normal weight—researchers recorded the total number of registered movements (counts) of each young adult over a period of time. Baseline physical activity was then computed as the number of counts per minute (cpm). Four years later, physical activity measurements were taken again—called physical activity at follow-up.
a. For the 13 obese young adults, the researchers reported a correlation of $r = .50$ between baseline and follow-up physical activity, with an associated p-value of .07. Give a practical interpretation of this correlation coefficient and p-value.
b. Refer to part a. Construct a scatterplot of the 13 data points that would yield a value of $r = .50$.
c. For the 15 young adults of normal weight, the researchers reported a correlation of $r = - .12$ between baseline and follow-up physical activity, with an associated p-value of .66. Give a practical interpretation of this correlation coefficient and p-value.
d. Refer to part c. Construct a scatterplot of the 15 data points that would yield a value of $r = - .12$.

Applying the Concepts—Intermediate

9.90 Salary linked to height. Are short people shortchanged when it comes to salary? According to business professors T. A. Judge (University of Florida) and D. M. Cable (University of North Carolina), tall people tend to earn more money over their career than short people earn (Journal of Applied Psychology, June 2004). Using data collected from participants in the National Longitudinal Surveys, the researchers computed the correlation between average earnings (in dollars) and height (in inches) for several occupations. The results are given in the following table.

Occupation	Correlation, `r`	Sample Size, `n`
Sales	.41	117
Managers	.35	455
Blue Collar	.32	349
Service Workers	.31	265
Professional/Technical	.30	453
Clerical	.25	358
Crafts/Forepersons	.24	250

Source: Judge, T. A., and Cable, D. M. “The effect of physical height on workplace success and income: Preliminary test of a theoretical model.” Journal of Applied Psychology, Vol. 89, No. 3, June 2004 (Table 5). Copyright © 2004 by the American Psychological Association. Reprinted with permission.

a. Interpret the value of r for people in sales occupations.
b. Compute $r^{2}$ for people in sales occupations. Interpret the result.
c. Give $H_{0}$ and $H_{a}$ for testing whether average earnings and height are positively correlated.
d. Compute the test statistic for testing $H_{0}$ and $H_{a}$ in part c for people in sales occupations.
e. Use the result you obtained in part d to conduct the test at $α = .01$. State the appropriate conclusion.
f. Select another occupation and repeat parts a–e.

9.91 View of rotated objects. Perception & Psychophysics (July 1998) reported on a study of how people view three-dimensional objects projected onto a rotating two-dimensional image. Each in a sample of 25 university students viewed various depth-rotated objects (e.g., a hairbrush, a duck, and a shoe) until they recognized the object. The recognition exposure time—that is, the minimum time (in milliseconds) required for the subject to recognize the object—was recorded for each object. In addition, each subject rated the “goodness of view” of the object on a numerical scale, with lower scale values corresponding to better views. The following table gives the correlation coefficient r between recognition exposure time and goodness of view for several different rotated objects:

Object	`r`	`t`
Piano	.447	2.40
Bench	$- .057$	.27
Motorbike	.619	3.78
Armchair	.294	1.47
Teapot	.949	14.50

a. Interpret the value of r for each object.
b. Calculate and interpret the value of $r^{2}$ for each object.
c. The table also includes the t-value for testing the null hypothesis of no correlation (i.e., for testing $H_{0} : β_{1} = 0$). Interpret these results using $α = .05$.

9.92 Eye anatomy of giraffes. Refer to the African Zoology (Oct. 2013) study of giraffe eye characteristics, Exercise 9.71 (p. 530). Recall that the researchers fit a simple linear regression equation of the form $\ln (y) = β_{0} + β_{1} \ln (x) + ε$, where y represents an eye characteristic and x represents body mass (measured in kilograms).
a. For the eye characteristic $y =$ eye mass (grams), the regression equation yielded $r^{2} = .948$. Give a practical interpretation of this result.
b. Refer to part a above and Exercise 9.71 part a. Find the value of the correlation coefficient, r, and interpret its value.
c. For the eye characteristic $y =$ orbit axis angle (degrees), the regression equation yielded $r^{2} = .375$. Give a practical interpretation of this result.
d. Refer to part c above and Exercise 9.71 part b. Find the value of the correlation coefficient, r, and interpret its value.
9.93 Do nice guys finish first or last? Refer to the Nature (Mar. 20, 2008) study of the use of punishment in cooperation games, Exercise 9.22 (p. 512). Recall that college students repeatedly played a version of the game “prisoner’s dilemma” and the researchers recorded the average payoff (y) and the number of times punishment was used (x) for each player. A negative correlation was discovered between x and y.
a. Give the null and alternative hypotheses for testing whether average payoff and punishment use are negatively correlated.
b. The test, part a, yielded a p-value of .001. Interpret this result using $α = .05$.
c. Does the result, part b, imply that increasing punishment causes your payoff to decrease? Explain.
NAME2 9.94 The “name game.” Refer to the Journal of Experimental Psychology—Applied (June 2000) name-retrieval study, first presented in Exercise 9.34 (p. 517). Find and interpret the values of r and $r^{2}$ for the simple linear regression relating the proportion of names recalled (y) and the position (order) of the student (x) during the “name game.”
BOXING2 9.95 Effect of massage on boxing. Refer to the British Journal of Sports Medicine (Apr. 2000) study of the effect of massage on boxing performance, presented in Exercise 9.70 (p. 530). Find and interpret the values of r and $r^{2}$ for the simple linear regression relating the blood lactate concentration and the boxer’s perceived recovery.

MINES 9.96 Child labor in diamond mines. The role of child laborers in Africa’s colonial-era diamond mines was the subject of research published in the Journal of Family History (Vol. 35, 2010). One particular mining company lured children to the mines by offering incentives for adult male laborers to relocate their families close to the diamond mine. The success of the incentive program was examined by determining the annual accompaniment rate, i.e., the percentage of wives (or sons or daughters) who accompanied their husbands (or fathers) in relocating to the mine. The accompaniment rates over the years 1939–1947 are shown in the table below.

a. Find the correlation coefficient relating the accompaniment rates for wives and sons. Interpret this value.
b. Find the correlation coefficient relating the accompaniment rates for wives and daughters. Interpret this value.
c. Find the correlation coefficient relating the accompaniment rates for sons and daughters. Interpret this value.

Year	Wives (%)	Sons (%)	Daughters (%)
1939	27.2	2.2	16.9
1940	40.1	1.5	15.7
1941	35.7	0.3	12.6
1942	37.8	3.5	22.2
1943	38.0	5.4	22.0
1944	38.4	11.0	24.3
1945	38.7	11.9	17.9
1946	29.8	8.6	17.7
1947	23.8	7.4	22.2

Source: Cleveland, T. “Minors in name only: Child laborers on the diamond mines of the Companhia de Diamantes de Angola (Diamang), 1917–1975.” Journal of Family History, Vol. 35, No. 1, 2010 (Table 1).
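The three correlations requested in Exercise 9.96 can be computed directly from the computational formulas for ${SS}_{x y}$, ${SS}_{x x}$, and ${SS}_{y y}$ given earlier in this section. The Python sketch below is one way to do so; the helper names `ss` and `correlation` are illustrative, not from the text, and the data are copied from the table.

```python
import math


def ss(u, v):
    """Computational form: sum(u*v) - (sum u)(sum v)/n."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


def correlation(x, y):
    """r = SS_xy / sqrt(SS_xx * SS_yy)."""
    return ss(x, y) / math.sqrt(ss(x, x) * ss(y, y))


# Accompaniment rates (%) for 1939-1947, from the table
wives = [27.2, 40.1, 35.7, 37.8, 38.0, 38.4, 38.7, 29.8, 23.8]
sons = [2.2, 1.5, 0.3, 3.5, 5.4, 11.0, 11.9, 8.6, 7.4]
daughters = [16.9, 15.7, 12.6, 22.2, 22.0, 24.3, 17.9, 17.7, 22.2]

r_wives_sons = correlation(wives, sons)
r_wives_daughters = correlation(wives, daughters)
r_sons_daughters = correlation(sons, daughters)

for label, r in [("wives, sons", r_wives_sons),
                 ("wives, daughters", r_wives_daughters),
                 ("sons, daughters", r_sons_daughters)]:
    print(f"r({label}) = {r:.3f}")
```

With only n = 9 years of data, any value of r computed this way should be interpreted cautiously.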

CLIFFS 9.97 Plants that grow on Swiss cliffs. Refer to the Alpine Botany (Nov. 2012) study of rare plants that grow on the limestone cliffs of the Northern Swiss Jura mountains, Exercise 2.165 (p. 97). Data on altitude above sea level (meters), plant population size (number of plants growing), and molecular variance (i.e., the variance in molecular weight of the plants) for a sample of 12 limestone cliffs are reproduced in the table. Recall that the researchers are interested in whether either altitude or population size is related to molecular variance.

Cliff Number	Altitude (m)	Population Size (plants)	Molecular Variance
1	468	147	59.8
2	589	209	24.4
3	700	28	42.2
4	664	177	59.5
5	876	248	65.8
6	909	53	17.7
7	1032	33	12.5
8	952	114	27.6
9	832	217	35.9
10	1099	10	13.3
11	982	8	3.6
12	1053	15	3.2

Source: Rusterholz, H., Aydin, D., and Baur, B. “Population structure and genetic diversity of relict populations of Alyssum montanum on limestone cliffs in the Northern Swiss Jura mountains.” Alpine Botany, Vol. 122, No. 2, Nov. 2012 (Tables 1 and 2).

a. Use simple linear regression to investigate the relationship between molecular variance (y) and altitude (x). Find and interpret the value of $r^{2}$.
b. Use simple linear regression to investigate the relationship between molecular variance (y) and population size (x). Find and interpret the value of $r^{2}$.
c. What are your recommendations to the researchers?
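For Exercise 9.97, $r^{2}$ in a simple linear regression equals the square of the correlation coefficient, so it can be obtained as ${SS}_{x y}^{2} / ({SS}_{x x} {SS}_{y y})$ without fitting the line explicitly. The Python sketch below applies this to the cliff data; the function name `r_squared` is an illustrative choice, and the deviation form of the sums of squares is used here rather than the computational shortcut.

```python
def r_squared(x, y):
    """Coefficient of determination for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy ** 2 / (ss_xx * ss_yy)


# Data for the 12 limestone cliffs, from the table
altitude = [468, 589, 700, 664, 876, 909, 1032, 952, 832, 1099, 982, 1053]
population = [147, 209, 28, 177, 248, 53, 33, 114, 217, 10, 8, 15]
mol_variance = [59.8, 24.4, 42.2, 59.5, 65.8, 17.7, 12.5, 27.6, 35.9, 13.3, 3.6, 3.2]

r2_altitude = r_squared(altitude, mol_variance)
r2_population = r_squared(population, mol_variance)

print(f"r^2 (altitude)        = {r2_altitude:.3f}")
print(f"r^2 (population size) = {r2_population:.3f}")
```

Each value is the proportion of the sample variation in molecular variance that is explained by the straight-line model with the corresponding x. Comparing the two gives a starting point for part c, although $r^{2}$ alone does not establish that either model is adequate.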

Applying the Concepts—Advanced

9.98 Pain tolerance study. A study published in Psychosomatic Medicine (Mar./Apr. 2001) explored the relationship between reported severity of pain and actual pain tolerance in 337 patients who suffer from chronic pain. Each patient reported his/her severity of chronic pain on a seven-point scale (1 = no pain, 7 = extreme pain). To obtain a pain tolerance level, a tourniquet was applied to the arm of each patient and twisted. The maximum pain level tolerated was measured on a quantitative scale.
a. According to the researchers, “Correlational analysis revealed a small but significant inverse relationship between [actual] pain tolerance and the reported severity of chronic pain.” On the basis of this statement, is the value of r for the 337 patients positive or negative?
b. Suppose that the result reported in part a is significant at $α = .05$. Find the approximate value of r for the sample of 337 patients.
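One way to approach part b is to invert the test statistic for correlation, $t = r \sqrt{n - 2} / \sqrt{1 - r^{2}}$, which gives $|r| = t / \sqrt{t^{2} + n - 2}$ at the rejection boundary. The Python sketch below does this under two stated assumptions: the result is taken to be just significant, and the t critical value with n − 2 = 335 degrees of freedom is approximated by the standard normal value. The function name `boundary_r` and the two-tailed default are illustrative choices, not from the text.

```python
import math
from statistics import NormalDist


def boundary_r(n: int, alpha: float = 0.05, two_tailed: bool = True) -> float:
    """Smallest |r| that is significant at level alpha for a sample of size n."""
    tail = alpha / 2 if two_tailed else alpha
    # Assumption: normal approximation to the t distribution with n - 2 df
    t_crit = NormalDist().inv_cdf(1 - tail)
    # Solve t = r * sqrt(n - 2) / sqrt(1 - r^2) for |r|
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# The relationship is inverse, so r takes a negative sign
r_boundary = -boundary_r(337)
print(f"approximate r at the boundary of significance: {r_boundary:.3f}")
```

Because the sample is large, even a correlation this close to 0 is statistically significant, which is consistent with the researchers' description of the relationship as "small but significant."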
