9.5 The Coefficients of Correlation and Determination

In this section, we present two statistics that describe the adequacy of a model: the coefficient of correlation and the coefficient of determination.

Coefficient of Correlation

Recall (from optional Section 2.8) that a bivariate relationship describes a relationship—or correlation—between two variables x and y. Scatterplots are used to describe a bivariate relationship graphically. In this section, we will discuss the concept of correlation and how it can be used to measure the linear relationship between two variables x and y. A numerical descriptive measure of correlation is provided by the coefficient of correlation, r.

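The coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables x and y. It is computed (for a sample of n measurements on x and y) as follows:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$

*The two tests (of $H_0: \rho = 0$ and of $H_0: \beta_1 = 0$, discussed later in this section) are equivalent in simple linear regression only.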

Note that the computational formula for the correlation coefficient r given above involves the same quantities that were used in computing the least squares prediction equation. In fact, since the numerators of the expressions for $\hat{\beta}_1$ and r are identical, it is clear that $r = 0$ when $\hat{\beta}_1 = 0$ (the case where x contributes no information for the prediction of y) and that r is positive when the slope is positive and negative when the slope is negative. Unlike $\hat{\beta}_1$, the correlation coefficient r is scaleless and assumes a value between $-1$ and $+1$, regardless of the units of x and y.
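The contrast between the slope and r can be illustrated with a minimal Python sketch. It uses the drug reaction data (the values listed in Table 9.4) and a hypothetical change of units in which every x value is multiplied by 10; the rescaling factor and the helper names are assumptions made only for illustration.

```python
from math import sqrt

def sums_of_squares(x, y):
    """Return (SSxy, SSxx, SSyy) computed from deviations about the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy, ss_xx, ss_yy

def slope_and_r(x, y):
    """Return the least squares slope and the coefficient of correlation."""
    ss_xy, ss_xx, ss_yy = sums_of_squares(x, y)
    return ss_xy / ss_xx, ss_xy / sqrt(ss_xx * ss_yy)

x = [1, 2, 3, 4, 5]                  # percent of drug in bloodstream
y = [1, 1, 2, 2, 4]                  # reaction time (seconds)
x_rescaled = [10 * xi for xi in x]   # hypothetical change of units for x

slope, r = slope_and_r(x, y)
slope_rescaled, r_rescaled = slope_and_r(x_rescaled, y)

print(f"original units: slope = {slope:.3f}, r = {r:.3f}")
print(f"rescaled units: slope = {slope_rescaled:.3f}, r = {r_rescaled:.3f}")
```

The slope changes by the rescaling factor, while r is unchanged, which is what "scaleless" means here.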

A value of r near or equal to 0 implies little or no linear relationship between y and x. In contrast, the closer r comes to 1 or $-1$, the stronger is the linear relationship between y and x. And if $r = 1$ or $r = -1$, all the sample points fall exactly on the least squares line. Positive values of r imply a positive linear relationship between y and x; that is, y increases as x increases. Negative values of r imply a negative linear relationship between y and x; that is, y decreases as x increases. Each of these situations is portrayed in Figure 9.16.

Figure 9.16

Values of r and their implications

Now Work Exercise 9.79

We use the data in Table 9.1 for the drug reaction example to demonstrate how to calculate the coefficient of correlation, r. The quantities needed to calculate r are $SS_{xy}$, $SS_{xx}$, and $SS_{yy}$. The first two quantities have been calculated previously and are $SS_{xy} = 7$ and $SS_{xx} = 10$. The calculation for $SS_{yy} = \sum (y - \bar{y})^2$ is shown in the last column of the Excel spreadsheet, Figure 9.5 (p. 508). The result is $SS_{yy} = 6$.

We now find the coefficient of correlation:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{7}{\sqrt{(10)(6)}} = \frac{7}{\sqrt{60}} = .904$$

The fact that r is positive and near 1 indicates that the reaction time tends to increase as the amount of drug in the bloodstream increases—for the given sample of five subjects. This is the same conclusion we reached when we found the calculated value of the least squares slope to be positive.
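The hand calculation above can be reproduced with a short Python sketch. It uses the computational (shortcut) forms of the sums of squares and the five drug reaction observations; the variable names are illustrative assumptions.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # percent of drug in bloodstream
y = [1, 1, 2, 2, 4]   # reaction time (seconds)
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Shortcut formulas: sum of products (or squares) minus a correction term
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum_y ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)

print(f"SSxy = {ss_xy:.1f}, SSxx = {ss_xx:.1f}, SSyy = {ss_yy:.1f}")
print(f"r = {r:.3f}")
```

The printed sums of squares should agree with the values 7, 10, and 6 used in the text, and r should round to .904.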

Example 9.5 Using the Correlation Coefficient—Relating Crime Rate and Casino Employment 

Problem

  1. Legalized gambling is available on several riverboat casinos operated by a city in Mississippi. The mayor of the city wants to know the correlation between the number of casino employees and the yearly crime rate. The records for the past 10 years are examined, and the results listed in Table 9.3 are obtained. Calculate the coefficient of correlation, r, for the data. Interpret the result.

    Table 9.3 Data on Casino Employees and Crime Rate, Example 9.5

    Year    Number x of Casino Employees (thousands)    Crime Rate y (number of crimes per 1,000 population)
    2006    15                                          1.35
    2007    18                                          1.63
    2008    24                                          2.33
    2009    22                                          2.41
    2010    25                                          2.63
    2011    29                                          2.93
    2012    30                                          3.41
    2013    32                                          3.26
    2014    35                                          3.63
    2015    38                                          4.15

    Data Set: CASINO

Solution

  1. Rather than use the computing formula given earlier, we resort to a statistical software package. The data of Table 9.3 were entered into a computer and MINITAB was used to compute r. The MINITAB printout is shown in Figure 9.17.

    Figure 9.17

    MINITAB correlation printout and scatterplot for Example 9.5

    Ethics in Statistics

    Intentionally using the correlation coefficient only to make an inference about the relationship between two variables in situations where a nonlinear relationship may exist is considered unethical statistical practice.

    The coefficient of correlation, highlighted at the top of the printout, is r=.987. Thus, the size of the casino workforce and crime rate in this city are very highly correlated—at least over the past 10 years. The implication is that a strong positive linear relationship exists between these variables. (See Figure 9.17.) We must be careful, however, not to jump to any unwarranted conclusions. For instance, the mayor may be tempted to conclude that hiring more casino workers next year will increase the crime rate—that is, that there is a causal relationship between the two variables. However, high correlation does not imply causality. The fact is, many things have probably contributed both to the increase in the casino workforce and to the increase in crime rate. The city’s tourist trade has undoubtedly grown since riverboat casinos were legalized, and it is likely that the casinos have expanded both in services offered and in number. We cannot infer a causal relationship on the basis of high sample correlation. When a high correlation is observed in the sample data, the only safe conclusion is that a linear trend may exist between x and y.

Look Back

Another variable, such as the increase in tourism, may be the underlying cause of the high correlation between x and y.
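As a cross-check on the software result, r for the Table 9.3 data can also be computed directly from the definition. The following is a minimal sketch in plain Python, not a reproduction of the MINITAB procedure; the variable names are assumptions.

```python
from math import sqrt

# Table 9.3: casino employees (thousands) and crime rate (per 1,000 population)
employees = [15, 18, 24, 22, 25, 29, 30, 32, 35, 38]
crime_rate = [1.35, 1.63, 2.33, 2.41, 2.63, 2.93, 3.41, 3.26, 3.63, 4.15]

n = len(employees)
x_bar = sum(employees) / n
y_bar = sum(crime_rate) / n

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(employees, crime_rate))
ss_xx = sum((x - x_bar) ** 2 for x in employees)
ss_yy = sum((y - y_bar) ** 2 for y in crime_rate)

r = ss_xy / sqrt(ss_xx * ss_yy)
print(f"r = {r:.3f}")
```

The value should agree with the r = .987 reported on the printout. Note that the calculation says nothing about why the two series move together.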

Now Work Exercise 9.85a

Caution

Two caveats apply in using the sample correlation coefficient r to infer the nature of the relationship between x and y: (1) A high correlation does not necessarily imply that a causal relationship exists between x and y—only that a linear trend may exist; (2) a low correlation does not necessarily imply that x and y are unrelated—only that x and y are not strongly linearly related.
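The second caveat can be made concrete with a small, purely hypothetical data set in which y is completely determined by x through a curve, yet r is 0. The data below are invented for illustration and do not come from the text.

```python
from math import sqrt

# Hypothetical data: y is an exact function of x (y = x squared)
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

r = ss_xy / sqrt(ss_xx * ss_yy)
print(f"r = {r:.3f}")   # no linear trend, despite a perfect curved relationship
```

Here the positive and negative cross products cancel, so r measures no linear trend even though x predicts y perfectly.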

Keep in mind that the correlation coefficient r measures the linear correlation between x values and y values in the sample, and a similar linear coefficient of correlation exists for the population from which the data points were selected. The population correlation coefficient is denoted by the symbol ρ (rho). As you might expect, ρ is estimated by the corresponding sample statistic r. Or, instead of estimating ρ, we might want to test the null hypothesis $H_0: \rho = 0$ against $H_a: \rho \neq 0$; that is, we can test the hypothesis that x contributes no information for the prediction of y by using the straight-line model against the alternative that the two variables are at least linearly related.

However, we already performed this identical test in Section 9.4 when we tested $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$. That is, the null hypothesis $H_0: \rho = 0$ is equivalent to the hypothesis $H_0: \beta_1 = 0$.* When we tested the null hypothesis $H_0: \beta_1 = 0$ in connection with the drug reaction example, the data led to a rejection of the null hypothesis at the $\alpha = .05$ level. This rejection implies that the null hypothesis of a zero linear correlation between the two variables (drug and reaction time) can also be rejected at the $\alpha = .05$ level. The only real difference between the least squares slope $\hat{\beta}_1$ and the coefficient of correlation, r, is the measurement scale. Therefore, the information they provide about the usefulness of the least squares model is to some extent redundant. For this reason, we will use the slope to make inferences about the existence of a positive or negative linear relationship between two variables.

For the sake of completeness, a summary of the test for linear correlation is provided in the following boxes.

A Test for Linear Correlation

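Test statistic: $t_c = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$

One-Tailed Tests

  $H_0: \rho = 0$ against $H_a: \rho > 0$
    Rejection region: $t_c > t_\alpha$
    p-value: $P(t > t_c)$

  $H_0: \rho = 0$ against $H_a: \rho < 0$
    Rejection region: $t_c < -t_\alpha$
    p-value: $P(t < t_c)$

Two-Tailed Test

  $H_0: \rho = 0$ against $H_a: \rho \neq 0$
    Rejection region: $|t_c| > t_{\alpha/2}$
    p-value: $2P(t > t_c)$ if $t_c$ is positive; $2P(t < t_c)$ if $t_c$ is negative

where the distribution of t depends on $(n - 2)$ df.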

Condition Required for a Valid Test of Correlation

The sample of (x, y) values is randomly selected from a normal population.
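The equality of the two forms of the test statistic can be checked numerically. The sketch below uses the drug reaction data and assumes the standard simple linear regression results $s = \sqrt{SSE/(n-2)}$ and $s_{\hat{\beta}_1} = s/\sqrt{SS_{xx}}$, which are not restated in this section. The critical value 3.182 is the tabled $t_{.025}$ for 3 df; everything else is computed.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # percent of drug in bloodstream
y = [1, 1, 2, 2, 4]   # reaction time (seconds)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Form 1: test statistic written in terms of r
r = ss_xy / sqrt(ss_xx * ss_yy)
t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Form 2: test statistic written in terms of the slope and its standard error
slope = ss_xy / ss_xx
sse = ss_yy - slope * ss_xy          # assumed shortcut formula for SSE
s = sqrt(sse / (n - 2))              # estimated standard deviation of the error
t_from_slope = slope / (s / sqrt(ss_xx))

T_CRITICAL = 3.182                   # t_.025 with n - 2 = 3 df, from a t table
reject_h0 = abs(t_from_r) > T_CRITICAL

print(f"t from r = {t_from_r:.3f}, t from slope = {t_from_slope:.3f}")
print(f"reject H0 at alpha = .05 (two-tailed): {reject_h0}")
```

Both forms give the same statistic, so the two-tailed decision at α = .05 is the same whether the hypothesis is stated about ρ or about β1.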

Coefficient of Determination

Another way to measure the usefulness of a linear model is to measure the contribution of x in predicting y. To accomplish this, we calculate how much the errors of prediction of y were reduced by using the information provided by x. To illustrate, consider the sample shown in the scatterplot of Figure 9.18a. If we assume that x contributes no information for the prediction of y, the best prediction for a value of y is the sample mean $\bar{y}$, which is shown as the horizontal line in Figure 9.18b. The vertical line segments in Figure 9.18b are the deviations of the points about the mean $\bar{y}$. Note that the sum of the squares of the deviations for the prediction equation $\hat{y} = \bar{y}$ is

$$SS_{yy} = \sum (y_i - \bar{y})^2$$

Now suppose you fit a least squares line to the same set of data and locate the deviations of the points about the line, as shown in Figure 9.18c. Compare the deviations about the prediction lines in Figures 9.18b and 9.18c. You can see that

  1. If x contributes little or no information for the prediction of y, the sums of the squares of the deviations for the two lines

    $$SS_{yy} = \sum (y_i - \bar{y})^2 \quad \text{and} \quad SSE = \sum (y_i - \hat{y}_i)^2$$

    will be nearly equal.

  2. If x does contribute information for the prediction of y, the SSE will be smaller than SSyy. In fact, if all the points fall on the least squares line, then SSE=0.

Consequently, the reduction in the sum of the squares of the deviations that can be attributed to x, expressed as a proportion of SSyy, is

$$\frac{SS_{yy} - SSE}{SS_{yy}}$$

Figure 9.18

A comparison of the sum of squares of deviations for two models

Note that $SS_{yy}$ is the “total sample variability” of the observations around the mean $\bar{y}$ and that SSE is the remaining “unexplained sample variability” after fitting the line $\hat{y}$. Thus, the difference $(SS_{yy} - SSE)$ is the “explained sample variability” attributable to the linear relationship with x. A verbal description of the proportion is therefore

$$\frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{Explained sample variability}}{\text{Total sample variability}} = \text{Proportion of total sample variability explained by the linear relationship}$$

In simple linear regression, it can be shown that this proportion—called the coefficient of determination—is equal to the square of the simple linear coefficient of correlation, r.

The coefficient of determination is

$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$$

and represents the proportion of the total sample variability around $\bar{y}$ that is explained by the linear relationship between y and x. (In simple linear regression, it may also be computed as the square of the coefficient of correlation, r.)
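The parenthetical claim, that this proportion equals the square of r in simple linear regression, can be verified in a few lines. The sketch below assumes two standard least squares results that are not restated in this section: $\hat{\beta}_1 = SS_{xy}/SS_{xx}$ and $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.

```latex
\begin{align*}
\frac{SS_{yy} - SSE}{SS_{yy}}
  &= \frac{\hat{\beta}_1 \, SS_{xy}}{SS_{yy}}
     && \text{since } SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} \\
  &= \frac{SS_{xy}^{2}}{SS_{xx}\, SS_{yy}}
     && \text{since } \hat{\beta}_1 = SS_{xy}/SS_{xx} \\
  &= \left( \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} \right)^{2} = r^{2}.
\end{align*}
```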

Note that $r^2$ is always between 0 and 1 because r is between $-1$ and $+1$. Thus, an $r^2$ of .60 means that the sum of the squares of the deviations of the y values about their predicted values has been reduced 60% by the use of the least squares equation $\hat{y}$, instead of $\bar{y}$, to predict y.

Example 9.6 Obtaining the Value of $r^2$: Drug Reaction Regression

Problem

  1. Calculate the coefficient of determination for the drug reaction example. The data are repeated in Table 9.4 for convenience. Interpret the result.

Solution

  1. From previous calculations,

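    $$SS_{yy} = 6 \quad \text{and} \quad SSE = \sum (y - \hat{y})^2 = 1.10$$

    Then, from our earlier definition, the coefficient of determination is

    $$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{6.0 - 1.1}{6.0} = \frac{4.9}{6.0} = .817$$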

    Table 9.4

    Percent x of Drug    Reaction Time y (seconds)
    1                    1
    2                    1
    3                    2
    4                    2
    5                    4

    Data Set: STIMULUS

    Another way to compute $r^2$ is to recall from earlier in this section that $r = .904$. Then we have $r^2 = (.904)^2 = .817$. A third way to obtain $r^2$ is from a computer printout. Its value is highlighted on the SPSS printout in Figure 9.19. Our interpretation is as follows: We know that using the percent x of drug in the blood to predict y with the least squares line

    $$\hat{y} = -.1 + .7x$$

    accounts for nearly 82% of the total sum of the squares of the deviations of the five sample y values about their mean. Or, stated another way, 82% of the sample variation in reaction time (y) can be “explained” by using the percent x of drug in a straight-line model.
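The two routes to $r^2$ described in this example can be sketched in Python. The code fits the least squares line to the Table 9.4 data, computes SSE from the residuals, and compares $1 - SSE/SS_{yy}$ with the squared correlation; the variable names are illustrative assumptions.

```python
x = [1, 2, 3, 4, 5]   # percent of drug in bloodstream
y = [1, 1, 2, 2, 4]   # reaction time (seconds)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least squares line y_hat = b0 + b1 * x
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Unexplained variability: squared deviations about the fitted line
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r2_from_sse = 1 - sse / ss_yy                 # proportion of variability explained
r2_from_r = ss_xy ** 2 / (ss_xx * ss_yy)      # square of the correlation

print(f"y_hat = {b0:.1f} + {b1:.1f}x, SSE = {sse:.2f}")
print(f"r^2 = {r2_from_sse:.3f} (from SSE), {r2_from_r:.3f} (from r)")
```

Both routes should give a value that rounds to .817, matching the hand calculation.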

Figure 9.19

Portion of SPSS printout for time-drug regression

Now Work Exercise 9.87a

Practical Interpretation of the Coefficient of Determination, $r^2$

100($r^2$)% of the sample variation in y (measured by the total sum of the squares of the deviations of the sample y values about their mean $\bar{y}$) can be explained by (or attributed to) using x to predict y in the straight-line model.

Statistics in Action Revisited

Using the Coefficients of Correlation and Determination to Assess the Dowsing Data

In the previous Statistics in Action Revisited, we discovered that using a dowser’s guess (x) in a straight-line model was not statistically useful in predicting actual pipe location (y). Both the coefficient of correlation and the coefficient of determination (highlighted on the MINITAB printouts in Figure SIA9.4) also support this conclusion. The value of the correlation coefficient, $r = .314$, indicates a fairly weak positive linear relationship between the variables. This value, however, is not statistically significant (p-value = .118). In other words, there is no evidence to indicate that the population correlation coefficient is different from 0. The coefficient of determination, $r^2 = .099$, implies that only about 10% of the sample variation in pipe location values can be explained by the simple linear model.

Figure SIA9.4

MINITAB printouts with coefficients of correlation and determination for the dowsing data

Exercises 9.77–9.100

Understanding the Principles

  1. 9.77 True or False. The correlation coefficient is a measure of the strength of the linear relationship between x and y.

  2. 9.78 Describe the slope of the least squares line if

    1. $r = .7$

    2. $r = -.7$

    3. $r = 0$

    4. $r^2 = .64$

  3. 9.79 Explain what each of the following sample correlation coefficients tells you about the relationship between the x and y values in the sample:

    1. r=1

    2. r=−1

    3. r=0

    4. r=.90

    5. r=.10

    6. r=.88

  4. 9.80 True or False. A value of the correlation coefficient near 1 or near −1 implies a causal relationship between x and y.

Learning the Mechanics

  1. 9.81 Construct a scatterplot for each data set. Then calculate r and $r^2$ for each data set.

    1. a.

      x 2 1 0 1 2
      y 2 1 2 5 6
    2. b.

      x 2 1 0 1 2
      y 6 5 3 2 0
    3. c.

      x 1 2 2 3 3 3 4
      y 2 1 3 1 2 3 2
    4. d.

      x 0 1 3 5 6
      y 0 1 2 1 0
  2. 9.82 Calculate $r^2$ for the least squares line in Exercise 9.18 (p. 512).

  3. 9.83 Calculate $r^2$ for the least squares line in Exercise 9.21 (p. 512).

Applet Exercise 9.2

Use the applet entitled Correlation by the Eye to explore the relationship between the pattern of data in a scatterplot and the corresponding correlation coefficient.

  1. Run the applet several times. Each time, guess the value of the correlation coefficient. Then click Show r to see the actual correlation coefficient. How close is your value to the actual value of r? Click New data to reset the applet.

  2. Click the trash can to clear the graph. Use the mouse to place five points on the scatterplot that are approximately in a straight line. Then guess the value of the correlation coefficient. Click Show r to see the actual correlation coefficient. How close were you this time?

  3. Continue to clear the graph and plot sets of five points with different patterns among the points. Guess the value of r. How close do you come to the actual value of r each time?

  4. On the basis of your experiences with the applet, explain why we need to use more reliable methods of finding the correlation coefficient than just “eyeing” it.

Applying the Concepts—Basic

  1. 9.84 RateMyProfessors.com. A popular Web site among college students is RateMyProfessors.com (RMP). Established over 10 years ago, RMP allows students to post quantitative ratings of their instructors. In Practical Assessment, Research & Evaluation (May 2007), University of Maine researchers investigated whether instructor ratings posted on RMP are correlated with the formal in-class student evaluations of teaching (SET) that all universities are required to administer at the end of the semester. Data collected for n=426 University of Maine instructors yielded a correlation between RMP and SET ratings of .68.

    1. Give the equation of a linear model relating SET rating (y) to RMP rating (x).

    2. Give a practical interpretation of the value r=.68.

    3. Is the estimated slope of the line, part a, positive or negative? Explain.

    4. A test of the null hypothesis H0:ρ=0 yielded a p-value of .001. Interpret this result.

    5. Compute the coefficient of determination, $r^2$, for the regression analysis. Interpret the result.

  2. 9.85 Last name and acquisition timing. Refer to the Journal of Consumer Research (Aug. 2011) study of the speed with which consumers decide to purchase a product, Exercise 7.12 (p. 382). Recall that the researchers theorized that consumers with last names that begin with letters later in the alphabet will tend to acquire items faster than those whose last names are earlier in the alphabet (i.e., the last name effect). Each in a sample of 50 MBA students was offered free tickets to attend a college basketball game for which there was a limited supply of tickets. The first letter of the last name of those who responded to an e-mail offer in time to receive the tickets was noted and given a numerical value (e.g., “A”=1, “B”=2, etc.). Each student’s response time (measured in minutes) was also recorded.

    1. a. The researchers computed the correlation between the two variables as $r = -.271$. Interpret this result.

    2. b. The observed significance level for testing for a negative correlation in the population was reported as p-value =.018. Interpret this result for α=.05.

    3. c. Does this analysis support the researchers’ last name effect theory? Explain.

  3. TASTE 9.86 Taste-testing scales. The Journal of Food Science (Feb. 2014) published the results of a taste-testing study. The researchers evaluated the general Labeled Magnitude Scale (gLMS), used to rate the palatability of food items on a scale ranging from −100 (for strongest imaginable dislike) to +100 (for strongest imaginable like). The researchers called this rating the perceived hedonic intensity. A sample of 200 students and staff at the University of Florida used the scale to rate their most favorite and least favorite foods. In addition, each taster rated the sensory intensity of four different solutions: salt, sucrose, citric acid, and hydrochloride. The averages of these four ratings were used by the researchers to quantify individual variation in taste intensity, called perceived sensory intensity. These data are saved in the TASTE file. The accompanying MINITAB printout shows the correlation between perceived sensory intensity (PSI) and perceived hedonic intensity for both favorite (PHI-F) and least favorite (PHI-L) foods. According to the researchers, “the palatability of the favorite and least favorite foods varies depending on the perceived intensity of taste: Those who experience the greatest taste intensity (that is, supertasters) tend to experience more extreme food likes and dislikes.” Do you agree? Explain.

  4. 9.87 Going for it on fourth down in the NFL. Each week coaches in the National Football League (NFL) face a decision during the game. On fourth down, should the team punt the ball or go for a first down? To aid in the decision-making process, statisticians at California State University, Northridge, developed a regression model for predicting the number of points scored (y) by a team that has a first down with a given number of yards (x) from the opposing goal line (Chance, Winter 2009). One of the models fit to data collected on five NFL teams from a recent season was the simple linear regression model, $E(y) = \beta_0 + \beta_1 x$. The regression yielded the following results: $\hat{y} = 4.42 - .048x$, $r^2 = .18$.

    1. a. Give a practical interpretation of the coefficient of determination, $r^2$.

    2. b. Compute the value of the coefficient of correlation, r, from the value of $r^2$. Is the value of r positive or negative? Why?

  5. TRAPS 9.88 Lobster fishing study. Refer to the Bulletin of Marine Science (Apr. 2010) study of teams of fishermen fishing for the red spiny lobster in Baja California Sur, Mexico, Exercise 9.63 (p. 529). Recall that simple linear regression was used to model y= total catch of lobsters (in kilograms) during the season as a function of x= average percentage of traps allocated per day to exploring areas of unknown catch (called search frequency).

    1. Locate and interpret the coefficient of determination, $r^2$, on the SAS printout shown on p. 529.

    2. Note that the coefficient of correlation, r, is not shown on the SAS printout. Is there information on the printout to determine whether total catch (y) is negatively linearly related to search frequency (x)? Explain.

  6. 9.89 Physical activity of obese young adults. The International Journal of Obesity (Jan. 2007) published a study of the physical activity of obese young adults. For two groups of young adults—13 obese and 15 of normal weight—researchers recorded the total number of registered movements (counts) of each young adult over a period of time. Baseline physical activity was then computed as the number of counts per minute (cpm). Four years later, physical activity measurements were taken again—called physical activity at follow-up.

    1. For the 13 obese young adults, the researchers reported a correlation of r=.50 between baseline and follow-up physical activity, with an associated p-value of .07. Give a practical interpretation of this correlation coefficient and p-value.

    2. Refer to part a. Construct a scatterplot of the 13 data points that would yield a value of r=.50.

    3. For the 15 young adults of normal weight, the researchers reported a correlation of r=.12 between baseline and follow-up physical activity, with an associated p-value of .66. Give a practical interpretation of this correlation coefficient and p-value.

    4. Refer to part c. Construct a scatterplot of the 15 data points that would yield a value of r=.12.

Applying the Concepts—Intermediate

  1. 9.90 Salary linked to height. Are short people shortchanged when it comes to salary? According to business professors T. A. Judge (University of Florida) and D. M. Cable (University of North Carolina), tall people tend to earn more money over their career than short people earn (Journal of Applied Psychology, June 2004). Using data collected from participants in the National Longitudinal Surveys, the researchers computed the correlation between average earnings (in dollars) and height (in inches) for several occupations. The results are given in the following table.

    Occupation Correlation, r Sample Size, n
    Sales .41 117
    Managers .35 455
    Blue Collar .32 349
    Service Workers .31 265
    Professional/Technical .30 453
    Clerical .25 358
    Crafts/Forepersons .24 250

    Source: Judge, T. A., and Cable, D. M. “The effect of physical height on workplace success and income: Preliminary test of a theoretical model.” Journal of Applied Psychology, Vol. 89, No. 3, June 2004 (Table 5). Copyright © 2004 by the American Psychological Association. Reprinted with permission.

    1. Interpret the value of r for people in sales occupations.

    2. Compute $r^2$ for people in sales occupations. Interpret the result.

    3. Give H0 and Ha for testing whether average earnings and height are positively correlated.

    4. Compute the test statistic for testing H0 and Ha in part c for people in sales occupations.

    5. Use the result you obtained in part d to conduct the test at α=.01. State the appropriate conclusion.

    6. Select another occupation and repeat parts a–e.

  2. 9.91 View of rotated objects. Perception & Psychophysics (July 1998) reported on a study of how people view three-dimensional objects projected onto a rotating two-dimensional image. Each in a sample of 25 university students viewed various depth-rotated objects (e.g., a hairbrush, a duck, and a shoe) until they recognized the object. The recognition exposure time, that is, the minimum time (in milliseconds) required for the subject to recognize the object, was recorded for each object. In addition, each subject rated the “goodness of view” of the object on a numerical scale, with lower scale values corresponding to better views. The following table gives the correlation coefficient r between recognition exposure time and goodness of view for several different rotated objects:

    Object       r       t
    Piano        .447    2.40
    Bench        .057    .27
    Motorbike    .619    3.78
    Armchair     .294    1.47
    Teapot       .949    14.50

    1. Interpret the value of r for each object.

    2. Calculate and interpret the value of $r^2$ for each object.

    3. The table also includes the t-value for testing the null hypothesis of no correlation (i.e., for testing $H_0: \beta_1 = 0$). Interpret these results using $\alpha = .05$.

  3. 9.92 Eye anatomy of giraffes. Refer to the African Zoology (Oct. 2013) study of giraffe eye characteristics, Exercise 9.71 (p. 530). Recall that the researchers fit a simple linear regression equation of the form ln(y)=β0+β1ln(x)+ε, where y represents an eye characteristic and x represents body mass (measured in kilograms).

    1. For the eye characteristic y = eye mass (grams), the regression equation yielded $r^2 = .948$. Give a practical interpretation of this result.

    2. Refer to part a above and Exercise 9.71 part a. Find the value of the correlation coefficient, r, and interpret its value.

    3. For the eye characteristic y = orbit axis angle (degrees), the regression equation yielded $r^2 = .375$. Give a practical interpretation of this result.

    4. Refer to part c above and Exercise 9.71 part b. Find the value of the correlation coefficient, r, and interpret its value.

  4. 9.93 Do nice guys finish first or last? Refer to the Nature (Mar. 20, 2008) study of the use of punishment in cooperation games, Exercise 9.22 (p. 512). Recall that college students repeatedly played a version of the game “prisoner’s dilemma” and the researchers recorded the average payoff (y) and the number of times punishment was used (x) for each player. A negative correlation was discovered between x and y.

    1. Give the null and alternative hypotheses for testing whether average payoff and punishment use are negatively correlated.

    2. The test, part a, yielded a p-value of .001. Interpret this result using $\alpha = .05$.

    3. Does the result, part b, imply that increasing punishment causes your payoff to decrease? Explain.

  5. NAME2 9.94 The “name game.” Refer to the Journal of Experimental Psychology: Applied (June 2000) name-retrieval study, first presented in Exercise 9.34 (p. 517). Find and interpret the values of r and $r^2$ for the simple linear regression relating the proportion of names recalled (y) and the position (order) of the student (x) during the “name game.”

  6. BOXING2 9.95 Effect of massage on boxing. Refer to the British Journal of Sports Medicine (Apr. 2000) study of the effect of massage on boxing performance, presented in Exercise 9.70 (p. 530). Find and interpret the values of r and $r^2$ for the simple linear regression relating the blood lactate concentration and the boxer’s perceived recovery.

  7. MINES 9.96 Child labor in diamond mines. The role of child laborers in Africa’s colonial-era diamond mines was the subject of research published in the Journal of Family History (Vol. 35, 2010). One particular mining company lured children to the mines by offering incentives for adult male laborers to relocate their families close to the diamond mine. The success of the incentive program was examined by determining the annual accompaniment rate, i.e., the percentage of wives (or sons or daughters) who accompanied their husbands (or fathers) in relocating to the mine. The accompaniment rates over the years 1939–1947 are shown in the table below.

    1. Find the correlation coefficient relating the accompaniment rates for wives and sons. Interpret this value.

    2. Find the correlation coefficient relating the accompaniment rates for wives and daughters. Interpret this value.

    3. Find the correlation coefficient relating the accompaniment rates for sons and daughters. Interpret this value.

    Year Wives Sons Daughters
    1939 27.2 2.2 16.9
    1940 40.1 1.5 15.7
    1941 35.7 0.3 12.6
    1942 37.8 3.5 22.2
    1943 38.0 5.4 22.0
    1944 38.4 11.0 24.3
    1945 38.7 11.9 17.9
    1946 29.8 8.6 17.7
    1947 23.8 7.4 22.2

    Source: Cleveland, T. “Minors in name only: Child laborers on the diamond mines of the Companhia de Diamantes de Angola (Diamang), 1917–1975.” Journal of Family History, Vol. 35, No. 1, 2010 (Table 1).

  8. CLIFFS 9.97 Plants that grow on Swiss cliffs. Refer to the Alpine Botany (Nov. 2012) study of rare plants that grow on the limestone cliffs of the Northern Swiss Jura mountains, Exercise 2.165 (p. 97). Data on altitude above sea level (meters), plant population size (number of plants growing), and molecular variance (i.e., the variance in molecular weight of the plants) for a sample of 12 limestone cliffs are reproduced in the table. Recall that the researchers are interested in whether either altitude or population size is related to molecular variance.

    Cliff Number Altitude Population Size Molecular Variance
    1 468 147 59.8
    2 589 209 24.4
    3 700 28 42.2
    4 664 177 59.5
    5 876 248 65.8
    6 909 53 17.7
    7 1032 33 12.5
    8 952 114 27.6
    9 832 217 35.9
    10 1099 10 13.3
    11 982 8 3.6
    12 1053 15 3.2

    Source: Rusterholz, H., Aydin, D., and Baur, B. “Population structure and genetic diversity of relict populations of Alyssum montanum on limestone cliffs in the Northern Swiss Jura mountains.” Alpine Botany, Vol. 122, No. 2, Nov. 2012 (Tables 1 and 2).

    1. Use simple linear regression to investigate the relationship between molecular variance (y) and altitude (x). Find and interpret the value of $r^2$.

    2. Use simple linear regression to investigate the relationship between molecular variance (y) and population size (x). Find and interpret the value of $r^2$.

    3. What are your recommendations to the researchers?

Applying the Concepts—Advanced

  1. 9.98 Pain tolerance study. A study published in Psychosomatic Medicine (Mar./Apr. 2001) explored the relationship between reported severity of pain and actual pain tolerance in 337 patients who suffer from chronic pain. Each patient reported his/her severity of chronic pain on a seven-point scale (1 = no pain, 7 = extreme pain). To obtain a pain tolerance level, a tourniquet was applied to the arm of each patient and twisted. The maximum pain level tolerated was measured on a quantitative scale.

    1. According to the researchers, “Correlational analysis revealed a small but significant inverse relationship between [actual] pain tolerance and the reported severity of chronic pain.” On the basis of this statement, is the value of r for the 337 patients positive or negative?

    2. Suppose that the result reported in part a is significant at α=.05. Find the approximate value of r for the sample of 337 patients.
