7.7 Comparing Three or More Population Means: Analysis of Variance (Optional)

Suppose we are interested in comparing the means of three or more populations. For example, we may want to compare the mean SAT scores of three different high schools. Or, we could compare the mean income per household of residents in four census districts. Since the methods of Sections 7.2–7.6 apply to two populations only, we require an alternative technique. In this optional section, we present a method for comparing three or more population means based on independent random samples, called an analysis of variance (ANOVA).

Biography Sir Ronald A. Fisher (1890–1962)

The Founder of Modern Statistics

At a young age, Ronald Fisher demonstrated special abilities in mathematics, astronomy, and biology. (Fisher’s biology teacher once divided all his students into two groups on the basis of their “sheer brilliance”: Fisher and the rest.) Fisher graduated from prestigious Cambridge University in England in 1912 with a B.A. degree in astronomy. After several years teaching mathematics, he found work at the Rothamsted Agricultural Experiment station, where he began his extraordinary career as a statistician. Many consider Fisher to be the leading founder of modern statistics. His contributions to the field include the notion of unbiased statistics, the development of p-values for hypothesis tests, the invention of analysis of variance for designed experiments, the maximum-likelihood estimation theory, and the formulation of the mathematical distributions of several well-known statistics. Fisher’s book, Statistical Methods for Research Workers (written in 1925), revolutionized applied statistics, demonstrating with very readable and practical examples how to analyze data and interpret the results. In 1935, Fisher wrote The Design of Experiments, in which he first described his famous experiment on the “lady tasting tea.” (Fisher showed, through a designed experiment, that the lady really could tell whether the tea had been poured into the milk or the milk into the tea.) Before his death, Fisher was elected a Fellow of the Royal Statistical Society, was awarded numerous medals, and was knighted by the Queen of England.

In the jargon of ANOVA, “treatments” represent groups or populations of interest. Thus, the objective of an ANOVA is to compare the treatment (or population) means for the quantitative (or dependent) variable of interest. If we denote the true population means of the $k$ treatments as $\mu_1, \mu_2, \ldots, \mu_k$, then we will test the null hypothesis that the treatment means are all equal against the alternative that at least two of the treatment means differ:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
$$H_a: \text{At least two of the } k \text{ treatment means differ}$$

The μ's might represent the means of all female and male high school seniors’ SAT scores or the means of all households’ income in each of four census regions.

To conduct a statistical test of these hypotheses, we will use the means of the independent random samples selected from the treatment populations. That is, we obtain the $k$ sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$.

For example, suppose you select independent random samples of five female and five male high school seniors and record their SAT scores. The data are shown in Table 7.10. A MINITAB analysis of the data, shown in Figure 7.21, reveals that the sample mean SAT scores (shaded) are 590 for females and 550 for males. Can we conclude that the population of female high school students scores 40 points higher, on average, than the population of male students?

Figure 7.21

MINITAB descriptive statistics for data in Table 7.10

Table 7.10 SAT Scores for High School Students

Females Males
530 490
560 520
590 550
620 580
650 610

Data Set: TAB7_10

To answer this question, we must consider the amount of sampling variability among the experimental units (students). The SAT scores in Table 7.10 are depicted in the dot plot shown in Figure 7.22. Note that the difference between the sample means is small relative to the sampling variability of the scores within the treatments, namely, Female and Male. We would be inclined not to reject the null hypothesis of equal population means in this case.

Figure 7.22

Dot plot of SAT scores: difference between means dominated by sampling variability

In contrast, if the data are as depicted in the dot plot of Figure 7.23, then the sampling variability is small relative to the difference between the two means. In this case, we would be inclined to favor the alternative hypothesis that the population means differ.

Figure 7.23

Dot plot of SAT scores: difference between means large relative to sampling variability

Now Work Exercise 7.107a

You can see that the key is to compare the difference between the treatment means with the amount of sampling variability. Conducting a formal statistical test of the hypothesis requires numerical measures of the difference between the treatment means and of the sampling variability within each treatment. The variation between the treatment means is measured by the sum of squares for treatments (SST). To calculate it, square the distance between each treatment mean and the overall mean of all sample measurements. Then multiply each squared distance by the number of sample measurements for that treatment. Finally, add the results over all treatments. For the data in Table 7.10, the overall mean is 570. Thus, we have:

$$SST = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 = 5(550 - 570)^2 + 5(590 - 570)^2 = 4{,}000$$

In this equation, we use $\bar{x}$ to represent the overall mean response of all sample measurements, that is, the mean of the combined samples. The symbol $n_i$ denotes the sample size for the $i$th treatment. You can see that the value of SST is 4,000 for the two samples of five female and five male SAT scores depicted in Figures 7.22 and 7.23.
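As a quick check of the arithmetic, here is a minimal Python sketch that computes SST directly from its definition for the Table 7.10 data. The helper name `sum_squares_treatments` is our own and does not come from any library. The sketch uses only the standard `statistics` module and should reproduce SST = 4,000.

```python
from statistics import mean

# Table 7.10: SAT scores for the two treatments
females = [530, 560, 590, 620, 650]
males = [490, 520, 550, 580, 610]


def sum_squares_treatments(groups):
    """SST = sum over treatments of n_i * (treatment mean - overall mean)^2."""
    # Overall mean of all measurements in the combined samples
    grand_mean = mean(x for g in groups for x in g)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)


sst = sum_squares_treatments([females, males])
print(sst)
```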

Next, we must measure the sampling variability within the treatments. We call this the sum of squares for error (SSE) because it measures the variability around the treatment means that is attributed to sampling error. The value of SSE is computed by summing the squared distance between each response measurement and the corresponding treatment mean and then adding the squared differences over all measurements in the entire sample:

$$SSE = \sum_{j=1}^{n_1}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)^2 + \cdots + \sum_{j=1}^{n_k}(x_{kj} - \bar{x}_k)^2$$

Here, the symbol $x_{1j}$ is the $j$th measurement in sample 1, $x_{2j}$ is the $j$th measurement in sample 2, and so on. This rather complex-looking formula can be simplified by recalling the formula for the sample variance $s^2$ given in Chapter 2:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

Note that each sum in SSE is simply the numerator of $s^2$ for that particular treatment. Consequently, we can rewrite SSE as

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

where $s_1^2, s_2^2, \ldots, s_k^2$ are the sample variances for the $k$ treatments. For the SAT scores in Table 7.10, the MINITAB printout (Figure 7.21) shows that $s_1^2 = 2{,}250$ (for females) and $s_2^2 = 2{,}250$ (for males); then we have

$$SSE = (5 - 1)(2{,}250) + (5 - 1)(2{,}250) = 18{,}000$$
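Two forms of SSE appear above: the sum of squared deviations and the shortcut based on sample variances. They must agree. The short Python sketch below computes both for Table 7.10. The function names are illustrative and are not library functions.

```python
from statistics import mean, variance

females = [530, 560, 590, 620, 650]
males = [490, 520, 550, 580, 610]


def sse_direct(groups):
    """Sum of squared deviations of each measurement from its own treatment mean."""
    total = 0
    for g in groups:
        m = mean(g)  # treatment mean
        total += sum((x - m) ** 2 for x in g)
    return total


def sse_from_variances(groups):
    """Shortcut: (n_1 - 1)s_1^2 + ... + (n_k - 1)s_k^2 using sample variances."""
    return sum((len(g) - 1) * variance(g) for g in groups)


groups = [females, males]
print(variance(females), variance(males))
print(sse_direct(groups), sse_from_variances(groups))
```

Both functions should return 18,000, and each sample variance should be 2,250, as on the printout.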

To make the two measurements of variability comparable, we divide each by the number of degrees of freedom in order to convert the sums of squares to mean squares. First, the mean square for treatments (MST), which measures the variability among the treatment means, is equal to

$$MST = \frac{SST}{k - 1} = \frac{4{,}000}{2 - 1} = 4{,}000$$

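where the number of degrees of freedom for the $k$ treatments is $(k - 1)$. Next, the mean square for error (MSE) measures the sampling variability within the treatments. It is based on $(n - k)$ degrees of freedom, where $n$ is the total number of measurements in all $k$ samples ($n = 10$ here):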

$$MSE = \frac{SSE}{n - k} = \frac{18{,}000}{10 - 2} = 2{,}250$$

Finally, we calculate the ratio of MST to MSE—an F-statistic:

$$F = \frac{MST}{MSE} = \frac{4{,}000}{2{,}250} = 1.78$$

These quantities—MST, MSE, and F—are shown (highlighted) on the MINITAB printout displayed in Figure 7.24.
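The steps above (SST, SSE, the two mean squares, and their ratio) can be collected into one small function. This is a minimal sketch, and `one_way_anova` is a hypothetical helper name. For Table 7.10 it should give MST = 4,000, MSE = 2,250, and F ≈ 1.78.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (MST, MSE, F) for k independent samples."""
    k = len(groups)                      # number of treatments
    n = sum(len(g) for g in groups)      # total number of measurements
    grand_mean = mean(x for g in groups for x in g)
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    mst = sst / (k - 1)                  # numerator df: k - 1
    mse = sse / (n - k)                  # denominator df: n - k
    return mst, mse, mst / mse


females = [530, 560, 590, 620, 650]
males = [490, 520, 550, 580, 610]
mst, mse, f_stat = one_way_anova([females, males])
print(round(mst, 2), round(mse, 2), round(f_stat, 2))  # 4000.0 2250.0 1.78
```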

Figure 7.24

MINITAB printout with ANOVA results for data in Table 7.10

Values of the F-statistic near 1 indicate that the two sources of variation, between treatment means and within treatments, are approximately equal. In this case, the difference between the treatment means may well be attributable to sampling error, which provides little support for the alternative hypothesis that the population treatment means differ. Values of F well in excess of 1 indicate that the variation among treatment means well exceeds the variation within treatments. Such values therefore support the alternative hypothesis that the population treatment means differ.

When does F exceed 1 by enough to reject the null hypothesis that the means are equal? This depends on the sampling distribution of F. Figure 7.25 shows an F-distribution, which depends on $\nu_1 = (k - 1)$ numerator degrees of freedom and $\nu_2 = (n - k)$ denominator degrees of freedom. As you can see, the distribution is skewed to the right, since F = MST/MSE cannot be less than 0 and can increase without bound.

The rejection region for the F-test is taken from a table (see Tables VII through X of Appendix B). For the example of the SAT scores, the F-statistic has $\nu_1 = (2 - 1) = 1$ numerator degree of freedom and $\nu_2 = (10 - 2) = 8$ denominator degrees of freedom. Thus, for $\alpha = .05$, we find (from Table VIII of Appendix B) that

$$F_{.05} = 5.32$$

The implication is that MST would have to be more than 5.32 times as large as MSE before we could conclude, at the .05 level of significance, that the two population treatment means differ. Since the data yielded $F = 1.78$, our initial impressions of the dot plot in Figure 7.22 are confirmed: There is insufficient information to conclude that the mean SAT scores differ for the populations of female and male high school seniors. The rejection region and the calculated F value are shown in Figure 7.25.

Figure 7.25

Rejection region and calculated F-values for SAT score samples
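Statistical software can stand in for Tables VII through X. The sketch below assumes SciPy is available. In it, `scipy.stats.f.ppf` returns a quantile of the F-distribution, so the upper-tail value $F_\alpha$ is `f.ppf(1 - alpha, nu1, nu2)`, and `f.sf` returns the upper-tail area, which is the p-value. The helper `f_critical` is our own name.

```python
from scipy.stats import f


def f_critical(alpha, nu1, nu2):
    """Upper-tail critical value F_alpha: the (1 - alpha) quantile of F(nu1, nu2)."""
    return f.ppf(1 - alpha, nu1, nu2)


crit_sat = f_critical(0.05, 1, 8)    # SAT example: nu1 = 1, nu2 = 8
crit_golf = f_critical(0.10, 3, 36)  # golf ball example (Example 7.10)
p_sat = f.sf(4000 / 2250, 1, 8)      # P(F > 1.78) for the Table 7.10 data

print(round(crit_sat, 2), round(crit_golf, 2), round(p_sat, 3))
```

The first value should round to the tabled 5.32. The p-value for F = 1.78 comes out well above .05, which agrees with the decision not to reject $H_0$.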

In contrast, consider the dot plot in Figure 7.23. The SAT scores depicted in this dot plot are listed in Table 7.11, followed by MINITAB descriptive statistics in Figure 7.26. Note that the sample means for females and males, 590 and 550, respectively, are the same as in the previous example. Consequently, the variation between the means is the same, namely, $MST = 4{,}000$. However, the variation within the two treatments appears to be considerably smaller. In fact, Figure 7.26 shows that $s_1^2 = 62.5$ and $s_2^2 = 62.5$.

Table 7.11 SAT Scores for High School Students Shown in Figure 7.23

Females Males
580 540
585 545
590 550
595 555
600 560

Data Set: TAB7_11

Figure 7.26

MINITAB descriptive statistics and ANOVA results for data in Table 7.11

Thus, the variation within the treatments is measured by

$$SSE = (5 - 1)(62.5) + (5 - 1)(62.5) = 500$$
$$MSE = \frac{SSE}{n - k} = \frac{500}{8} = 62.5 \quad \text{(shaded on Figure 7.26)}$$

Then the F-ratio is

$$F = \frac{MST}{MSE} = \frac{4{,}000}{62.5} = 64.0 \quad \text{(shaded on Figure 7.26)}$$

Again, our visual analysis of the dot plot is confirmed statistically: F=64.0 well exceeds the table’s F value, 5.32, corresponding to the .05 level of significance. We would therefore reject the null hypothesis at that level and conclude that the SAT mean score of males differs from that of females.

Now Work Exercise 7.107b–h

The analysis of variance F-test for comparing treatment means is summarized in the following box. Note that the test is one-tailed (upper-tailed) even though we are testing for differences in means in any direction. Only if the numerator of the F-statistic (MST) is large relative to the denominator (MSE) will we conclude that treatment means differ.

ANOVA F-Test to Compare k Treatment Means: Independent Sampling Design

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

$H_a$: At least two treatment means differ.

Test statistic: $F_c = \dfrac{MST}{MSE}$

Rejection region: $F_c > F_\alpha$

p-value: $P(F > F_c)$

where $F_\alpha$ is based on $\nu_1 = (k - 1)$ numerator degrees of freedom (associated with MST) and $\nu_2 = (n - k)$ denominator degrees of freedom (associated with MSE).

Decision: Reject $H_0$ if $\alpha >$ p-value or if the test statistic ($F_c$) falls in the rejection region.

Conditions Required for a Valid ANOVA F-Test:

  1. The samples are randomly selected in an independent manner from the k treatment populations. (This can be accomplished by randomly assigning the experimental units to the treatments.)

  2. All k sampled populations have distributions that are approximately normal.

  3. The $k$ population variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \cdots = \sigma_k^2$).

Computational formulas for MST and MSE are given in Appendix C. We will rely on statistical software to compute the F-statistic, concentrating on the interpretation of the results rather than their calculation.
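The boxed test can be sketched as a single Python function. This is an illustrative implementation: `anova_f_test` is our own name, not a library function. It uses the shortcut SSE formula and SciPy for the F-distribution, and it computes both the critical value and the p-value from the box. Run on the two SAT data sets, it should reproduce F ≈ 1.78 for Table 7.10 (do not reject $H_0$ at α = .05) and F = 64.0 for Table 7.11 (reject $H_0$).

```python
from statistics import mean, variance

from scipy.stats import f


def anova_f_test(groups, alpha=0.05):
    """ANOVA F-test for k treatment means, independent sampling design.

    Returns (F_c, F_alpha, p_value, reject_h0).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    mst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    mse = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)
    f_c = mst / mse
    f_alpha = f.ppf(1 - alpha, k - 1, n - k)  # rejection region: F_c > F_alpha
    p_value = f.sf(f_c, k - 1, n - k)         # upper-tail area P(F > F_c)
    return f_c, f_alpha, p_value, bool(f_c > f_alpha)


table_7_10 = [[530, 560, 590, 620, 650], [490, 520, 550, 580, 610]]
table_7_11 = [[580, 585, 590, 595, 600], [540, 545, 550, 555, 560]]

for name, data in [("Table 7.10", table_7_10), ("Table 7.11", table_7_11)]:
    f_c, f_alpha, p, reject = anova_f_test(data)
    print(f"{name}: F = {f_c:.2f}, F_.05 = {f_alpha:.2f}, p = {p:.4f}, reject H0: {reject}")
```

The two decision rules in the box always agree. $F_c$ exceeds $F_\alpha$ exactly when the upper-tail area beyond $F_c$ is smaller than α.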

GOLFCRD Example 7.10 Conducting an ANOVA F-Test: Comparing Golf Ball Brands

Problem

  1. Suppose the USGA wants to compare the mean distances traveled by four different brands of golf balls struck with a driver. A completely randomized design is employed, with Iron Byron, the USGA’s robotic golfer, using a driver to hit a random sample of 10 balls of each brand in a random sequence. The distance is recorded for each hit, and the results are shown in Table 7.12, organized by brand.

    Table 7.12 Results of Completely Randomized Design: Iron Byron Driver

    Brand A Brand B Brand C Brand D
    251.2 263.2 269.7 251.6
    245.1 262.9 263.2 248.6
    248.0 265.0 277.5 249.4
    251.1 254.5 267.4 242.0
    260.5 264.3 270.5 246.5
    250.0 257.0 265.5 251.3
    253.9 262.8 270.7 261.8
    244.6 264.4 272.9 249.0
    254.6 260.6 275.6 247.1
    248.8 255.9 266.5 245.9
    Sample means 250.8 261.1 270.0 249.3

    Data Set: GOLFCRD

    1. Set up the test to compare the mean distances for the four brands. Use α=.10.

    2. Use statistical software to obtain the test statistic and p-value. Give the appropriate conclusion.

Solution

  1. To compare the mean distances of the $k = 4$ brands, we first specify the hypotheses to be tested. Denoting the population mean of the $i$th brand by $\mu_i$, we test

    $$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
    $$H_a: \text{The mean distances differ for at least two of the brands.}$$

    The test statistic compares the variation among the four treatment (Brand) means with the sampling variability within each of the treatments:

    Test statistic: $F = \dfrac{MST}{MSE}$

    Rejection region: $F > F_\alpha = F_{.10}$ with $\nu_1 = (k - 1) = 3$ df and $\nu_2 = (n - k) = 36$ df

    From Table VII of Appendix B, we find that $F_{.10} \approx 2.25$ for 3 and 36 df. Thus, we will reject $H_0$ if $F > 2.25$. (See Figure 7.27.)

    Figure 7.27

    F-test for completely randomized design: golf ball experiment

    The assumptions necessary to ensure the validity of the test are as follows:

    1. The samples of 10 golf balls for each brand are selected randomly and independently.

    2. The probability distributions of the distances for each brand are normal.

    3. The variances of the distance probability distributions for each brand are equal.

  2. The MINITAB printout for the data in Table 7.12 resulting from this completely randomized design is given in Figure 7.28. The values $MST = 931.5$, $MSE = 21.2$, and $F = 43.99$ are highlighted on the printout. Since $F > 2.25$, we reject $H_0$. The p-value of the test (.000) is also highlighted on the printout. Since $\alpha = .10$ exceeds the p-value, we draw the same conclusion: Reject $H_0$. Therefore, at the .10 level of significance, we conclude that at least two of the brands differ with respect to mean distance traveled when struck by the driver.

    Figure 7.28

    MINITAB ANOVA for completely randomized design
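Any package that performs a one-way ANOVA can carry out part b. One possibility (not the MINITAB run shown in Figure 7.28) is SciPy's `scipy.stats.f_oneway`. It takes each sample as a separate argument and returns the F-statistic and its p-value. With the data as listed in Table 7.12, it gives F of about 44.0 and a p-value far below .10. This is essentially the printout's F = 43.99; the small difference presumably reflects the data as transcribed here.

```python
from scipy.stats import f_oneway

# Table 7.12: Iron Byron driving distances, 10 balls per brand
brand_a = [251.2, 245.1, 248.0, 251.1, 260.5, 250.0, 253.9, 244.6, 254.6, 248.8]
brand_b = [263.2, 262.9, 265.0, 254.5, 264.3, 257.0, 262.8, 264.4, 260.6, 255.9]
brand_c = [269.7, 263.2, 277.5, 267.4, 270.5, 265.5, 270.7, 272.9, 275.6, 266.5]
brand_d = [251.6, 248.6, 249.4, 242.0, 246.5, 251.3, 261.8, 249.0, 247.1, 245.9]

# One-way ANOVA F-test of H0: mu_A = mu_B = mu_C = mu_D
result = f_oneway(brand_a, brand_b, brand_c, brand_d)
alpha = 0.10
print(f"F = {result.statistic:.2f}, p-value = {result.pvalue:.2g}")
print("Reject H0" if result.pvalue < alpha else "Do not reject H0")
```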

Look Ahead

Now that we know that mean distances differ, a logical follow-up question is: “Which ball brand travels farther, on average, when hit with a driver?” We discuss a method for ranking treatment means in an ANOVA later in this section.

Now Work Exercise 7.110

The results of an analysis of variance (ANOVA) can be summarized in a simple tabular format similar to that obtained from the MINITAB program in Example 7.10. The general form of the table is shown in Table 7.13, where the symbols df, SS, and MS stand for degrees of freedom, sum of squares, and mean square, respectively. Note that the two sources of variation, Treatments and Error, add to the total sum of squares, SS(Total). The ANOVA summary table for Example 7.10 is given in Table 7.14, and the partitioning of the total sum of squares into its two components is illustrated in Figure 7.29.

Table 7.13 General ANOVA Summary Table for a Completely Randomized Design

Source df SS MS F
Treatments $k-1$ SST $MST = SST/(k-1)$ $MST/MSE$
Error $n-k$ SSE $MSE = SSE/(n-k)$
Total $n-1$ SS(Total)

Table 7.14 ANOVA Summary Table for Example 7.10

Source df SS MS F p-Value
Brands  3 2,794.39 931.46 43.99 .000
Error 36 762.30  21.18
Total 39 3,556.69

Figure 7.29

Partitioning of the total sum of squares for the completely randomized design

GOLFCRD Example 7.11 Checking the ANOVA Assumptions

Problem

  1. Refer to the completely randomized ANOVA design conducted in Example 7.10. Are the assumptions required for the test approximately satisfied?

Solution

  1. The assumptions for the test are repeated as follows:

    1. The samples of golf balls for each brand are selected randomly and independently.

    2. The probability distributions of the distances for each brand are normal.

    3. The variances of the distance probability distributions for each brand are equal.

    Since the sample consisted of 10 randomly selected balls of each brand and since the robotic golfer Iron Byron was used to drive all the balls, the first assumption of independent random samples is satisfied. To check the next two assumptions, we will employ two graphical methods presented in Chapter 2: histograms and box plots. A MINITAB histogram of driving distances for each brand of golf ball is shown in Figure 7.30, and SAS box plots are shown in Figure 7.31.

    Figure 7.30

    MINITAB histograms for golf ball driving distances

    Figure 7.31

    SAS box plots for golf ball distances

    The normality assumption can be checked by examining the histograms in Figure 7.30. With only 10 sample measurements for each brand, however, the displays are not very informative. More data would need to be collected for each brand before we could assess whether the distances come from normal distributions. Fortunately, analysis of variance has been shown to be a very robust method when the assumption of normality is not satisfied exactly. That is, moderate departures from normality do not have much effect on the significance level of the ANOVA F-test or on confidence coefficients. Rather than spend the time, energy, or money to collect additional data for this experiment in order to verify the normality assumption, we will rely on the robustness of the ANOVA methodology.

    Box plots are a convenient way to obtain a rough check on the assumption of equal variances. With the exception of a possible outlier for Brand D, the box plots in Figure 7.31 show that the spread of the distance measurements is about the same for each brand. Since the sample variances appear to be the same, the assumption of equal population variances for the brands is probably satisfied. Although robust with respect to the normality assumption, ANOVA is not robust with respect to the equal-variances assumption. Departures from the assumption of equal population variances can affect the associated measures of reliability (e.g., p-values and confidence levels). Fortunately, the effect is slight when the sample sizes are equal, as in this experiment.
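Box plots give a visual check of equal spread, and the sample standard deviations give a rough numerical companion. The sketch below is only descriptive. It is not a formal test of equal variances; such tests are beyond the scope of this text.

```python
from statistics import stdev

# Table 7.12 driving distances by brand
brands = {
    "A": [251.2, 245.1, 248.0, 251.1, 260.5, 250.0, 253.9, 244.6, 254.6, 248.8],
    "B": [263.2, 262.9, 265.0, 254.5, 264.3, 257.0, 262.8, 264.4, 260.6, 255.9],
    "C": [269.7, 263.2, 277.5, 267.4, 270.5, 265.5, 270.7, 272.9, 275.6, 266.5],
    "D": [251.6, 248.6, 249.4, 242.0, 246.5, 251.3, 261.8, 249.0, 247.1, 245.9],
}

# Sample standard deviation s for each brand
sds = {name: stdev(x) for name, x in brands.items()}
for name, s in sds.items():
    print(f"Brand {name}: s = {s:.2f}")

# Ratio of the largest to the smallest sample standard deviation
ratio = max(sds.values()) / min(sds.values())
print(f"largest s / smallest s = {ratio:.2f}")
```

With these data, the standard deviations run from about 3.9 (Brand B) to about 5.2 (Brand D). Much of Brand D's spread comes from the single value 261.8, presumably the possible outlier noted in Figure 7.31.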

Now Work Exercise 7.114

Although graphs can be used to check the ANOVA assumptions, as in Example 7.11, no measures of reliability can be attached to these graphs. When a plot leaves it unclear whether an assumption is satisfied, you can use formal statistical tests, which are beyond the scope of this text. Consult the references at the end of the chapter for information on these tests. When the validity of the ANOVA assumptions is in doubt, nonparametric statistical methods are useful.

What Do You Do When the Assumptions Are Not Satisfied for the Analysis of Variance for a Completely Randomized Design?

Answer: Use a nonparametric statistical method such as the Kruskal-Wallis H-Test. Consult the references to learn more about this method.
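This text names the Kruskal-Wallis H-test without developing it. As a pointer only, SciPy provides it as `scipy.stats.kruskal`, which compares the groups through the ranks of the pooled observations rather than through the raw values. The sketch applies it to the golf data purely for illustration. In Example 7.11 the ANOVA assumptions looked reasonable, so the F-test remains appropriate there.

```python
from scipy.stats import kruskal

# Table 7.12 driving distances by brand
brand_a = [251.2, 245.1, 248.0, 251.1, 260.5, 250.0, 253.9, 244.6, 254.6, 248.8]
brand_b = [263.2, 262.9, 265.0, 254.5, 264.3, 257.0, 262.8, 264.4, 260.6, 255.9]
brand_c = [269.7, 263.2, 277.5, 267.4, 270.5, 265.5, 270.7, 272.9, 275.6, 266.5]
brand_d = [251.6, 248.6, 249.4, 242.0, 246.5, 251.3, 261.8, 249.0, 247.1, 245.9]

# Rank-based test of whether the four distance distributions differ in location
h_stat, p_value = kruskal(brand_a, brand_b, brand_c, brand_d)
print(f"H = {h_stat:.2f}, p-value = {p_value:.2g}")
```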

We conclude this section by making two important points about an analysis of variance. First, recall that we performed a hypothesis test for the difference between two means in Section 7.2 using a two-sample t-statistic for two independent samples. When two independent samples are being compared, the t- and F-tests are equivalent. To see this, apply the formula for t to the two samples of SAT scores in Table 7.11:

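$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{590 - 550}{\sqrt{(62.5)\left(\frac{1}{5} + \frac{1}{5}\right)}} = \frac{40}{5} = 8$$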

Here, we used the fact that $s_p^2 = MSE$, which you can verify by comparing the formulas. Recall that the calculated F for the two samples in Table 7.11 is $F = 64$. This value equals the square of the calculated t for the same samples ($t = 8$). Likewise, the critical F-value (5.32) equals the square of the critical t-value at the two-sided .05 level of significance ($t_{.025} = 2.306$ with 8 df, and $2.306^2 \approx 5.32$). Since both the rejection region and the calculated values are related in the same way, the tests are equivalent. Moreover, the assumptions that must be met to ensure the validity of the t- and F-tests are the same:

  1. The probability distributions of the populations of responses associated with each treatment must all be normal.

  2. The probability distributions of the populations of responses associated with each treatment must have equal variances.

  3. The samples of experimental units selected for the treatments must be random and independent.

In fact, the only real difference between the tests is that the F-test can be used to compare more than two treatment means, whereas the t-test is applicable to two samples only.
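The equivalence is easy to confirm numerically. The sketch below assumes SciPy for the critical values. It computes the pooled-variance t-statistic for Table 7.11 and squares it. It then compares the squared two-sided critical t-value with the upper-tail F critical value for 1 and 8 df.

```python
from math import sqrt
from statistics import mean, variance

from scipy.stats import f, t

# Table 7.11 SAT scores
females = [580, 585, 590, 595, 600]
males = [540, 545, 550, 555, 560]
n1, n2 = len(females), len(males)
df = n1 + n2 - 2

# Pooled variance s_p^2; for two samples this equals MSE
sp2 = ((n1 - 1) * variance(females) + (n2 - 1) * variance(males)) / df
t_stat = (mean(females) - mean(males)) / sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = t.ppf(0.975, df)    # two-sided .05 level: t_.025
f_crit = f.ppf(0.95, 1, df)  # upper-tail .05 level: F_.05 with 1 and df

print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print(f"t_.025^2 = {t_crit ** 2:.4f}, F_.05 = {f_crit:.4f}")
```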

For our second point, refer to Example 7.10. Our conclusion that at least two of the brands of golf balls have different mean distances traveled when struck with a driver leads naturally to the following questions: Which of the brands differ? and How are the brands ranked with respect to mean distance?

One way to obtain this information is to construct a confidence interval for the difference between the means of any pair of treatments, using the method of Section 7.2. For example, if a 95% confidence interval for $\mu_A - \mu_C$ in Example 7.10 is found to be $(-24, -13)$, we are confident that the mean distance for Brand C exceeds the mean for Brand A (since all differences in the interval are negative). Constructing these confidence intervals for all possible pairs of brands allows you to rank the brand means. A method for conducting these multiple comparisons that controls for Type I errors is beyond the scope of this introductory text. Consult the references to learn more about this methodology.
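The pooled-variance interval of Section 7.2 takes only a few lines of Python. In this sketch, the helper name `pooled_t_interval` is ours, and SciPy is assumed for the t quantile. It computes a 95% interval for $\mu_A - \mu_C$ from the Table 7.12 data, pooling only the two samples being compared. As the text warns, running such intervals for all six pairs without adjustment does not control the overall Type I error rate.

```python
from math import sqrt
from statistics import mean, variance

from scipy.stats import t

# Table 7.12: Brand A and Brand C driving distances
brand_a = [251.2, 245.1, 248.0, 251.1, 260.5, 250.0, 253.9, 244.6, 254.6, 248.8]
brand_c = [269.7, 263.2, 277.5, 267.4, 270.5, 265.5, 270.7, 272.9, 275.6, 266.5]


def pooled_t_interval(x, y, conf=0.95):
    """Pooled-variance t confidence interval for mu_x - mu_y (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    margin = t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2) * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x) - mean(y)
    return diff - margin, diff + margin


lo, hi = pooled_t_interval(brand_a, brand_c)
print(f"95% CI for mu_A - mu_C: ({lo:.1f}, {hi:.1f})")
```

With these data, the interval comes out at roughly (−23.5, −14.8). It lies entirely below zero, so the reasoning in the paragraph above would apply.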

Exercises 7.103–7.121

Understanding the Principles

  1. 7.103 Explain how to collect the data for an independent sampling design.

  2. 7.104 What conditions are required for a valid ANOVA F-test?

  3. 7.105 True or False. The ANOVA method is robust when the assumption of normality is not exactly satisfied in a completely randomized design.

Learning the Mechanics

  1. 7.106 Use Tables VII, VIII, IX, and X of Appendix B or statistical software to find each of the following F values:

    1. $F_{.05}$, $\nu_1 = 3$, $\nu_2 = 4$

    2. $F_{.01}$, $\nu_1 = 3$, $\nu_2 = 4$

    3. $F_{.10}$, $\nu_1 = 20$, $\nu_2 = 40$

    4. $F_{.025}$, $\nu_1 = 12$, $\nu_2 = 9$

  2. 7.107 Consider dot plots A and B shown below. Assume that the two samples represent independent random samples corresponding to two treatments in a completely randomized design.

    1. In which dot plot is the difference between the sample means small relative to the variability within the sample observations? Justify your answer.

    2. Calculate the treatment means (i.e., the means of samples 1 and 2) for both dot plots.

    3. Use the means to calculate the sum of squares for treatments (SST) for each dot plot.

    4. Calculate the sample variance for each sample and use these values to obtain the sum of squares for error (SSE) for each dot plot.

    5. Calculate the total sum of squares [SS(Total)] for the two dot plots by adding the sums of squares for treatment and error. What percentage of SS(Total) is accounted for by the treatments—that is, what percentage of the total sum of squares is the sum of squares for treatment—in each case?

    6. Convert the sums of squares for treatment and error to mean squares by dividing each by the appropriate number of degrees of freedom. Calculate the F-ratio of the mean square for treatment (MST) to the mean square for error (MSE) for each dot plot.

    7. Use the F-ratios to test the null hypothesis that the two samples are drawn from populations with equal means. Take α=.05.

    8. What assumptions must be made about the probability distributions corresponding to the responses for each treatment in order to ensure the validity of the F-tests conducted in part g?

    9. Conduct a two-sample t-test (Section 7.2) of the null hypothesis that the two treatment means are equal for each dot plot. Use α=.05 and two-tailed tests. Verify that the F-test and t-test results are equivalent.

    10. Complete the following ANOVA table for each of the two dot plots:

      Source df SS MS F
      Treatments
      Error
      Total
  3. L07108 7.108 The data in the following table resulted from an experiment that utilized a completely randomized design:

    Treatment 1 Treatment 2 Treatment 3
    3.9 5.4 1.3
    1.4 2.0  .7
    4.1 4.8 2.2
    5.5 3.8
    2.3 3.5
    1. Use statistical software (or the formulas in Appendix B) to complete the following ANOVA table:

      Source df SS MS F
      Treatments
      Error
      Total
    2. Test the null hypothesis that μ1=μ2=μ3, where μi represents the true mean for treatment i, against the alternative that at least two of the means differ. Use α=.01.

Applying the Concepts—Basic

  1. 7.109 Treating cancer with yoga. According to a study funded by the National Institutes of Health, yoga classes can help cancer survivors sleep better. The study results were presented at the June 2010 American Society of Clinical Oncology’s annual meeting. Researchers randomly assigned 410 cancer patients (who had finished cancer therapy) to receive either their usual follow-up care or attend a 75-minute yoga class twice per week. After four weeks, the researchers measured the level of fatigue and sleepiness experienced by each cancer survivor. Those who took yoga were less fatigued than those who did not.

    1. Assume the patients are numbered 1 through 410. Use the random number generator of a statistical software package to randomly assign each patient to either receive the usual follow-up care or to attend yoga classes. Assign 205 patients to each treatment.

    2. Consider the following treatment assignment scheme. The patients are ranked according to severity of cancer and the most severe patients are assigned to the yoga class while the others are assigned to receive their usual follow-up care. Comment on the validity of the results obtained from such an assignment.

  2. 7.110 Whales entangled in fishing gear. Entanglement of marine mammals (e.g., whales) in fishing gear is considered a significant threat to the species. A study published in Marine Mammal Science (Apr. 2010) investigated the type of net most likely to entangle a certain species of whale inhabiting the East Sea of Korea. A sample of 207 entanglements of whales in the area formed the data for the study. These entanglements were caused by one of three types of fishing gear: set nets, pots, and gill nets. One of the variables investigated was body length (in meters) of the entangled whale.

    1. Set up the null and alternative hypotheses for determining whether the average body length of entangled whales differs for the three types of fishing gear.

    2. An ANOVA F-test yielded the following results: $F = 34.81$, p-value $< .0001$. Interpret the results for $\alpha = .05$.

  3. 7.111 Does the media influence your attitude toward tanning? Dermatologists’ primary recommendation to prevent skin cancer is minimal exposure to the sun. Yet models used in product advertisements are typically well-tanned. Do such advertisements influence a consumer’s attitude toward tanning? University of California and California State University researchers designed an experiment to investigate this phenomenon and published their results in Basic and Applied Social Psychology (May 2010). College student participants were randomly assigned to one of three conditions: (1) view product advertisements featuring models with a tan, (2) view product advertisements featuring models without a tan, or (3) view products advertised with no models (control group). The objective was to determine whether the mean attitude toward tanning differs across the three conditions. A tanning attitude index (measured on a scale of 0 to 5 points) was recorded for each participant. The results are summarized in the accompanying table.

    Tanned Models Models with No Tan No Models
    Sample size 56 56 56
    Mean 2.40 2.11 2.50
    Standard deviation  .85  .73  .82

    Source: Mahler, H., et al. “Effects of media images on attitudes toward tanning.” Basic and Applied Social Psychology, Vol. 32, No. 2, May 2010 (adapted from Table 1).

    1. Identify the type of experimental design utilized by the researchers.

    2. Identify the experimental units, dependent variable, and treatments for the design.

    3. Set up the null hypothesis for a test to compare the treatment means.

    4. The sample means shown in the table are obviously different. Explain why the researchers should not use these means alone to test the hypothesis, part c.

    5. The researchers conducted an ANOVA on the data and reported the following results:

      $F = 3.60$, p-value $= .03$. Carry out the test, part c. Use $\alpha = .05$ to draw your conclusion.

    6. What assumptions are required for the inferences derived from the test to be valid?

  4. 7.112 Evaluation of flexography printing plates. Flexography is a printing process used in the packaging industry. The process is popular because it is cost-effective and can be used to print on a variety of surfaces (e.g., paperboard, foil, and plastic). A study was conducted to determine if flexography exposure time has an impact on the quality of the printing (Journal of Graphic Engineering and Design, Vol. 3, 2012). Four different exposure times were studied: 8, 10, 12, and 14 minutes. A sample of 36 print images was collected at each exposure time level, for a total of 144 print images. The measure of print quality used was dot area (hundreds of dots per square millimeter). The data were subjected to an analysis of variance, with partial results shown in the accompanying table.

    Source df SS MS F p-value
    Exposure   3 .010 <.001
    Error 140 .029
    Total 143 .039
    1. Compute the missing entries in the ANOVA table.

    2. Is there sufficient evidence to indicate that mean dot area differs depending on the exposure time? Use α=.05.

  5. 7.113 Robots trained to behave like ants. Robotics researchers investigated whether robots could be trained to behave like ants in an ant colony (Nature, Aug. 2000). Robots were trained and randomly assigned to “colonies” (i.e., groups) consisting of 3, 6, 9, or 12 robots. The robots were assigned the tasks of foraging for “food” and recruiting another robot when they identified a resource-rich area. One goal of the experiment was to compare the mean energy expended (per robot) of the four different sizes of colonies.

    1. What type of experimental design was employed?

    2. Identify the treatments and the dependent variable.

    3. Set up the null and alternative hypotheses of the test.

    4. The following ANOVA results were reported: F=7.70, numerator df=3, denominator df=56, p-value<.001. Conduct the test at a significance level of α=.05 and interpret the result.

  6. SCOURING 7.114 Soil scouring and overturned trees. Trees that grow in flood plains are susceptible to overturning. This is typically due to floodwaters exposing the tree roots (called soil scouring). Environmental engineers at Saitama University (Japan) investigated the impact of soil scouring on the characteristics of overturned and uprooted trees (Landscape Ecology Engineering, Jan. 2013). Tree pulling experiments were conducted in the floodplains of the Komagama river. Trees were randomly selected to be uprooted in each of three areas that had different scouring conditions: no scouring (NS), shallow scouring (SS), and deep scouring (DS). During the uprooting of the trees, the maximum resistive bending moment at the trunk base (kilonewton-meters) was measured. Simulated data for five medium-sized trees selected at each area are shown in the table, followed by a MINITAB printout of the analysis of the data. Interpret the results. Does soil scouring affect the mean maximum resistive bending moment at the tree trunk base?

    None Shallow Deep
    23.68 11.13  4.27
     8.88 29.19  2.36
     7.52 13.66  8.48
    25.89 20.47 12.09
    22.58 23.24  3.46

Applying the Concepts—Intermediate

  1. TVADS 7.115 Study of recall of TV commercials. Do TV shows with violence and sex impair memory for commercials? To answer this question, researchers conducted a designed experiment in which 324 adults were randomly assigned to one of three viewer groups of 108 participants each (Journal of Applied Psychology, June 2002). One group watched a TV program with a violent content code (V) rating, the second group viewed a show with a sex content code (S) rating, and the last group watched a neutral TV program. Nine commercials were embedded into each TV show. After viewing the program, each participant was scored on his/her recall of the brand names in the commercial messages, with scores ranging from 0 (no brands recalled) to 9 (all brands recalled). The data (simulated from information provided in the article) are saved in the TVADS file. The researchers compared the mean recall scores of the three viewing groups with an analysis of variance for a completely randomized design.

    1. Identify the experimental units in the study.

    2. Identify the dependent (response) variable in the study.

    3. Identify the factor and treatments in the study.

    4. The sample mean recall scores for the three groups were $\bar{x}_V = 2.08$, $\bar{x}_S = 1.71$, and $\bar{x}_{\text{Neutral}} = 3.17$. Explain why one should not draw an inference about differences in the population mean recall scores on the basis of only these summary statistics.

    5. An ANOVA on the data yielded the results shown in the accompanying MINITAB printout. Locate the test statistic and p-value on the printout.

    6. Interpret the results from part e, using α=.01. What can the researchers conclude about the three groups of TV ad viewers?

  2. FATIGUE 7.116 Improving driving performance while fatigued. Can a secondary task—such as a word association task—improve your performance when driving while fatigued? This was the question of interest in a Human Factors (May 2014) study. The researchers used a driving simulator to obtain their data. Each of 40 college students was assigned to drive a long distance in the simulator. The student-drivers were divided into four groups of 10 drivers each. Group 1 performed the verbal task continuously (continuous verbal condition); Group 2 performed the task only at the end of the drive (late verbal condition); Group 3 did not perform the task at all (no verbal condition); and Group 4 listened to a program on the car radio (radio show condition). At the end of the simulated drive, drivers were asked to recall billboards that they saw along the way. The percentage of billboards recalled by each student-driver is provided in the next table. Use the information in the accompanying SPSS printout to determine if the mean recall percentage differs for student-drivers in the four groups. Test using α=.01.

    Continuous Verbal Late Verbal No Verbal Radio Show
    14 57 64 37
    63 64 83 45
    10 66 54 87
    29 18 59 62
    37 95 60 14
    60 52 39 46
    43 58 56 59
     4 92 73 45
    36 85 78 45
    47 47 73 50
  3. FERMENT 7.117 Effects of temperature on ethanol production. A low-cost and highly productive bio-fuel production method is high-temperature fermentation. However, heat stress can inhibit the amount of ethanol produced during the process. In Engineering Life Sciences (Mar. 2013), biochemical engineers carried out a series of experiments to assess the effect of temperature on ethanol production during fermentation. The maximum inhibitory concentration of ethanol (grams per liter) was measured at four different temperatures (30°, 35°, 40°, and 45° Celsius). The experiment was replicated 3 times, with the data shown in the table. (Note: The data are simulated based on information provided in the journal article.) Do the data indicate that high temperatures inhibit mean concentration of ethanol? Test using α=.10.

    Temperature
    30° 35° 40° 45°
    103.3 101.7 97.2 55.0
    103.4 102.0 96.9 56.4
    101.0 101.1 96.2 54.9
  4. DRINKS 7.118 Restoring self-control when intoxicated. Does coffee or some other form of stimulation really allow a person suffering from alcohol intoxication to “sober up”? Psychologists investigated the matter in Experimental and Clinical Psychopharmacology (Feb. 2005). A sample of 44 healthy male college students participated in the experiment. Each student was asked to memorize a list of 40 words (20 words on a green list and 20 words on a red list). The students were then randomly assigned to one of four different treatment groups (11 students in each group). Students in three of the groups were each given two alcoholic beverages to drink prior to performing a word completion task. Students in Group A received only the alcoholic drinks. Participants in Group AC had caffeine powder dissolved in their drinks. Group AR participants received a monetary award for correct responses on the word completion task. Students in Group P (the placebo group) were told that they would receive alcohol, but instead received two drinks containing a carbonated beverage (with a few drops of alcohol on the surface to provide an alcoholic scent). After consuming their drinks and resting for 25 minutes, the students performed the word completion task. Their scores (simulated on the basis of summary information from the article) are reported in the table. (Note: A task score represents the difference between the proportion of correct responses on the green list of words and the proportion of incorrect responses on the red list of words.)

    AR AC A P
    .51 .50 .16 .58
    .58 .30 .10 .12
    .52 .47 .20 .62
    .47 .36 .29 .43
    .61 .39 .14 .26
    .00 .22 .18 .50
    .32 .20 .35 .44
    .53 .21 .31 .20
    .50 .15 .16 .42
    .46 .10 .04 .43
    .34 .02 .25 .40

    Based on Grattan-Miscio, K. E., and Vogel-Sprott, M. “Alcohol, intentional control, and inappropriate behavior: Regulation by caffeine or an incentive.” Experimental and Clinical Psychopharmacology, Vol. 13, No. 1, Feb. 2005 (Table 1).

    1. What type of experimental design is employed in this study?

    2. Analyze the data for the researchers, using α=.05. Are there differences among the mean task scores for the four groups?

    3. What assumptions must be met in order to ensure the validity of the inference you made in part b?

  5. COUGH 7.119 Is honey a cough remedy? Pediatric researchers carried out a designed study to test whether a teaspoon of honey before bed calms a child’s cough and published their results in Archives of Pediatrics and Adolescent Medicine (Dec. 2007). (This experiment was first described in Exercise 2.40, p. 51.) A sample of 105 children who were ill with an upper respiratory tract infection and their parents participated in the study. On the first night, the parents rated their children’s cough symptoms on a scale from 0 (no problems at all) to 6 (extremely severe) in five different areas. The total symptoms score (ranging from 0 to 30 points) was the variable of interest for the 105 patients. On the second night, the parents were instructed to give their sick child a dosage of liquid “medicine” prior to bedtime. Unknown to the parents, some were given a dosage of dextromethorphan (DM)—an over-the-counter cough medicine—while others were given a similar dose of honey. Also, a third group of parents (the control group) gave their sick children no dosage at all. Again, the parents rated their children’s cough symptoms, and the improvement in total cough symptoms score was determined for each child. The data (improvement scores) for the study are shown in the next table. The goal of the researchers was to compare the mean improvement scores for the three treatment groups.

    1. Identify the type of experimental design employed. What are the treatments?

    2. Conduct an analysis of variance on the data and interpret the results.

      Honey Dosage: 12 11 15 11 10 13 10 4 15 16 9 14 10
      6 10 8 11 12 12 8 12 9 11 15 10 15
      9 13 8 12 10 8 9 5 12
      DM Dosage: 4 6 9 4 7 7 7 9 12 10 11 6 3
      4 9 12 7 6 8 12 12 4 12 13 7 10
      13 9 4 4 10 15 9
      No Dosage (Control): 5 8 6 1 0 8 12 8 7 7 1 6 7
      7 12 7 9 7 9 5 11 9 5 6 8 8
      6 7 10 9 4 8 7 3 1 4 3

      Based on Paul, I. M., et al. “Effect of honey, dextromethorphan, and no treatment on nocturnal cough and sleep quality for coughing children and their parents,” Archives of Pediatrics and Adolescent Medicine, Vol. 161, No. 12, Dec. 2007 (data simulated).

  6. NAME 7.120 The “name game.” Psychologists evaluated three methods of name retrieval in a controlled setting (Journal of Experimental Psychology—Applied, June 2000). A sample of 139 students was randomly divided into three groups, and each group of students used a different method to learn the names of the other students in the group. Group 1 used the “simple name game,” in which a student states his/her name, and the names of all students who preceded him/her. Group 2 used the “elaborate name game,” a modification of the simple name game such that the students state not only their names, but also their favorite activity (e.g., sports). Group 3 used “pairwise introductions,” according to which students are divided into pairs and each student must introduce the other member of the pair. One year later, all subjects were sent pictures of the students in their group and asked to state the full name of each. The researchers measured the percentage of names recalled by each student respondent. The data (simulated on the basis of summary statistics provided in the research article) are shown in the table. Conduct an analysis of variance to determine whether the mean percentages of names recalled differ for the three name-retrieval methods. Use α=.05.

    Simple Name Game
    24 43 38 65 35 15 44 44 18 27 0 38 50 31
    7 46 33 31 0 29 0 0 52 0 29 42 39 29
    51 0 42 20 37 51 0 30 43 30 99 39 35 19
    24 34 3 60 0 29 40 40
    Elaborate Name Game
    39 71 9 86 26 45 0 38 5 53 29 0 62 0
    1 35 10 6 33 48 9 26 83 33 12 5 0 0
    25 36 39 1 37 2 13 26 7 35 3 8 55 50
    Pairwise Introductions
    5 21 22 3 32 29 32 0 4 41 0 27 5 9
    66 54 1 15 0 26 1 30 2 13 0 2 17 14
    5 29 0 45 35 7 11 4 9 23 4 0 8 2
    18 0 5 21 14

    Source: Morris, P. E., and Fritz, C. O. “The name game: Using retrieval practice to improve the learning of names,” Journal of Experimental Psychology—Applied, Vol. 6, No. 2, June 2000 (data simulated from Figure 1).
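Exercises 7.116 through 7.118 each ask for the same one-way ANOVA F test at a stated α. The following self-contained Python sketch runs that test on the three tables; the helper names `one_way_anova` and `f_upper_tail` and the dictionary layout are hypothetical, and the p-value is computed from a continued-fraction evaluation of the regularized incomplete beta function, which is this sketch's choice and not the SPSS or MINITAB procedure.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor applied was (nearly) 1: converged
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df1, df2):
    """p-value P(F > f_stat) for an F distribution with (df1, df2) degrees of freedom."""
    return _reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def one_way_anova(groups):
    """One-way ANOVA for a completely randomized design: (SST, SSE, df1, df2, F, p)."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df1, df2 = k - 1, n - k
    f_stat = (sst / df1) / (sse / df2)
    return sst, sse, df1, df2, f_stat, f_upper_tail(f_stat, df1, df2)


# Hypothetical layout: exercise -> (alpha stated in the exercise, list of treatment samples)
exercises = {
    # 7.116: % of billboards recalled (continuous verbal, late verbal, no verbal, radio show)
    "7.116": (0.01, [
        [14, 63, 10, 29, 37, 60, 43, 4, 36, 47],
        [57, 64, 66, 18, 95, 52, 58, 92, 85, 47],
        [64, 83, 54, 59, 60, 39, 56, 73, 78, 73],
        [37, 45, 87, 62, 14, 46, 59, 45, 45, 50],
    ]),
    # 7.117: maximum inhibitory ethanol concentration (g/L) at 30, 35, 40, 45 degrees C
    "7.117": (0.10, [
        [103.3, 103.4, 101.0],
        [101.7, 102.0, 101.1],
        [97.2, 96.9, 96.2],
        [55.0, 56.4, 54.9],
    ]),
    # 7.118: word-completion task scores for groups AR, AC, A, P
    "7.118": (0.05, [
        [.51, .58, .52, .47, .61, .00, .32, .53, .50, .46, .34],
        [.50, .30, .47, .36, .39, .22, .20, .21, .15, .10, .02],
        [.16, .10, .20, .29, .14, .18, .35, .31, .16, .04, .25],
        [.58, .12, .62, .43, .26, .50, .44, .20, .42, .43, .40],
    ]),
}

results = {}
for label, (alpha, groups) in exercises.items():
    _, _, df1, df2, f_stat, p = one_way_anova(groups)
    results[label] = (df1, df2, f_stat, p)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{label}: F({df1},{df2}) = {f_stat:.2f}, p = {p:.3g}, {decision} at alpha = {alpha}")
```

Each decision compares the p-value with the α given in the exercise; the ANOVA assumptions (independent random samples from normal populations with equal variances) still have to be judged separately.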

Applying the Concepts—Advanced

  1. 7.121 Animal-assisted therapy for heart patients. Refer to the American Heart Association Conference (Nov. 2005) study to gauge whether animal-assisted therapy can improve the physiological responses of heart failure patients, presented in Exercise 2.112 (p. 78). Recall that 76 heart patients were randomly assigned to one of three groups. Each patient in group T was visited by a human volunteer accompanied by a trained dog, each patient in group V was visited by a volunteer only, and the patients in group C were not visited at all. The anxiety level of each patient was measured (in points) both before and after the visits. The accompanying table gives summary statistics for the drop in anxiety level for patients in the three groups. The mean drops in anxiety levels of the three groups of patients were compared with the use of an analysis of variance. Although the ANOVA table was not provided in the article, sufficient information is given to reconstruct it.

    Group                                Sample Size   Mean Drop   Std. Dev.
    Group T: Volunteer + Trained Dog          26          10.5         7.6
    Group V: Volunteer only                   25           3.9         7.5
    Group C: Control group (no visit)         25           1.4         7.5

    Based on Cole, K., et al. “Animal assisted therapy decreases hemodynamics, plasma epinephrine and state anxiety in hospitalized heart failure patients.” American Journal of Critical Care, 2007, 16: 575–585.

    1. Compute SST for the ANOVA, using the formula (on p. 513)

      $SST = \sum_{i=1}^{3} n_i (\bar{x}_i - \bar{x})^2$

      where $\bar{x}$ is the overall mean drop in anxiety level of all 76 subjects. [Hint: $\bar{x} = \left(\sum_{i=1}^{3} n_i \bar{x}_i\right)/76$.]

    2. Recall that SSE for the ANOVA can be written as

      $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$

      where $s_1^2$, $s_2^2$, and $s_3^2$ are the sample variances associated with the three treatments. Compute SSE for the ANOVA.

    3. Use the results from parts a and b to construct the ANOVA table.

    4. Is there sufficient evidence (at α=.01) of differences among the mean drops in anxiety levels by the patients in the three groups?

    5. Comment on the validity of the ANOVA assumptions. How might this affect the results of the study?
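The arithmetic in parts a through c of Exercise 7.121 can be checked with a short Python sketch that uses only the summary statistics in the table. It applies the SST and SSE formulas given in the exercise; the function name `anova_from_summary` is hypothetical, and the printed layout is a plain-text table, not MINITAB's.

```python
def anova_from_summary(stats):
    """Rebuild a one-way ANOVA table from (n_i, mean_i, sd_i) summary triples."""
    n_total = sum(n for n, _, _ in stats)
    k = len(stats)
    # Overall mean: sample-size-weighted average of the group means
    grand_mean = sum(n * xbar for n, xbar, _ in stats) / n_total
    sst = sum(n * (xbar - grand_mean) ** 2 for n, xbar, _ in stats)
    sse = sum((n - 1) * sd ** 2 for n, _, sd in stats)
    df_t, df_e = k - 1, n_total - k
    mst, mse = sst / df_t, sse / df_e
    return {
        "Treatments": (df_t, sst, mst),
        "Error": (df_e, sse, mse),
        "Total": (n_total - 1, sst + sse, None),
        "F": mst / mse,
    }


# Exercise 7.121: (sample size, mean drop, std. dev.) for groups T, V, C
summary = [(26, 10.5, 7.6), (25, 3.9, 7.5), (25, 1.4, 7.5)]
table = anova_from_summary(summary)

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}")
for source in ("Treatments", "Error", "Total"):
    df, ss, ms = table[source]
    ms_text = f"{ms:12.2f}" if ms is not None else ""
    print(f"{source:<12}{df:>4}{ss:12.2f}{ms_text}")
print(f"F = {table['F']:.2f}")
```

For part d, the printed F can be compared with the tabulated critical value F.01 for ν1 = 2 and ν2 = 73 degrees of freedom.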
