Chapter 9

Analysis of Variance

Preview: Testing for the difference between two means is a relatively straightforward exercise, but what happens when there are three or more groups? These groups may have means that differ significantly from one another, which makes the comparison process much more complicated. Fortunately, there is a way to test the means no matter how many groups are involved. For example, if a researcher wanted to study the impact of listening to music on student test scores, she could randomly divide the class into three groups of students. One group would listen to popular music while they study or do homework. The second group would listen to classical music while studying, and the third group would study in silence. At the conclusion of the study period, the performance of students in each group is evaluated. The analysis of variance (ANOVA) procedure is then used to determine whether there is a significant difference among a group of means, and Excel will do all calculations for you automatically.

Learning Objectives: At the conclusion of this chapter, you should be able to:

  1. Use a factor to partition an experiment into multiple groups
  2. Understand the assumptions to check before using a one-way ANOVA
  3. Use the one-way ANOVA to test for differences among the means of several groups
  4. Understand how to obtain information about two-factor ANOVA

Introduction

We have already introduced a test for the difference of (two) means. But in many cases there are three or more groups whose means may or may not differ significantly.

Example: Many parents complain that students listen to pop music while they study. To test the impact of music on students’ concentration levels, we divide 40 students randomly into three groups: Group A does not listen to any music, group B listens to pop music, and group C listens to classical music. All students study a text for 30 min and then they take a test about their understanding of the text.

There are two types of variables in this example: the performance of students on a test and the type of music students listen to. The first variable (quiz_score) is numerical, while the second (music_type) is categorical and is used to partition the first variable into three groups. This setup, as it turns out, is typical for an ANOVA test comparing multiple means. Note that to decide whether the means of several groups differ from one another, we could run a difference of means test for every pair of means, but the number of tests would quickly escalate:

  • For three groups we would need three comparisons: μ1 vs. μ2, μ1 vs. μ3, and μ2 vs. μ3.
  • For four groups we would need six comparisons: μ1 vs. μ2, μ1 vs. μ3, μ1 vs. μ4, μ2 vs. μ3, μ2 vs. μ4, and μ3 vs. μ4.
  • For five groups we would need 10 comparisons, and so on.

This quickly becomes a lot of work. In addition, and perhaps more importantly, each time we conduct a difference of means test we accept an error, typically 5 percent. These errors accumulate, at least approximately, so that checking three group means pairwise would push the overall error to about 15 percent. Thus, we need a new procedure that keeps the error at a constant level no matter how many means we are comparing.
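To see how quickly the workload and the error grow, here is a minimal Python sketch (Python is not part of the chapter's Excel workflow; this is only an illustration). It counts the pairwise comparisons for k groups and computes both the rough additive error and the exact familywise error, assuming independent tests at the 5 percent level:

    from math import comb

    alpha = 0.05  # error rate of each individual difference of means test

    for k in range(3, 6):
        pairs = comb(k, 2)                     # number of pairwise comparisons for k groups
        familywise = 1 - (1 - alpha) ** pairs  # chance of at least one false rejection,
                                               # assuming the tests were independent
        print(f"{k} groups: {pairs} comparisons, "
              f"additive error {pairs * alpha:.2f}, exact {familywise:.3f}")

For three groups this prints 3 comparisons, an additive error of 0.15, and an exact familywise error of about 0.143, in line with the 15 percent mentioned above.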

Definition: The ANOVA procedure is used to test for a significant difference among a group of means. The null hypothesis is H0: μ1 = μ2 = ... = μk, and the alternative is that at least two means differ significantly: Ha: μi ≠ μj for some i ≠ j.

Note that when the ANOVA procedure is applied to only two means, it should reduce to a difference of means test. It might sound strange that a procedure which claims to test for differences of means is called ANOVA, that is, analysis of variance. However, it turns out that we can analyze the variances and yet draw conclusions about any difference of means.

The procedure works by comparing the variance SSB between group means against the variance SSW within groups to determine whether the groups are all part of one larger population (no difference between means) or separate populations with different characteristics (at least two means differ significantly).

One-Factor ANOVA

Suppose we have quiz scores of students in a statistics course as follows:

2 (male), 5 (female), 1 (male), 3 (male), 6 (female), 7 (female).

We want to know if there is a difference in mean scores between males and females. Since these represent two samples only, we could use our familiar difference of means test (which incidentally results in a t-value of t ≈ 4.9 and a p-value of p ≈ 0.008), but we want to use a different method that will readily extend to more than two groups and their means. First, we use sex to divide the scores into two groups, as follows.

Male    Female
2       6
1       5
3       7

Definition: The variable we are interested in analyzing is called the dependent variable, while the variable we use to divide the dependent variable into groups is called the independent variable or factor. For ANOVA, the dependent variable should be numerical while the factor should be categorical. A single-factor ANOVA uses one factor to divide the dependent variable into groups.

For a single-factor ANOVA we will assume that all groups are approximately normal with roughly the same standard deviations. In the preceding example we are using one factor (sex) to define the groups; consequently we will perform a single-factor ANOVA. Both groups have sample variances of 1, as you can readily compute. The samples are really too small to check for normality but for this example we will simply assume everything is normal. If the variable used as factor had more values (categories), it would result in that many groups; still it would be a single-factor ANOVA.
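With only three observations per group these checks carry little weight, but for readers who also work outside of Excel, the following Python sketch (using SciPy, which is not part of the chapter's workflow) illustrates how one might examine the two ANOVA assumptions, roughly equal spread and approximate normality:

    import statistics
    from scipy import stats

    male = [2, 1, 3]
    female = [6, 5, 7]

    # Roughly equal spread: compare the sample standard deviations directly
    # and back this up with Levene's test for equal variances.
    print(statistics.stdev(male), statistics.stdev(female))  # both 1.0 here
    print(stats.levene(male, female))                        # large p-value: no evidence of unequal spread

    # Approximate normality: Shapiro-Wilk test per group
    # (with n = 3 per group this is purely illustrative).
    print(stats.shapiro(male))
    print(stats.shapiro(female))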

We now compare the variance of our overall data with the variances of each group. In fact, since the variance is a function of the squared differences of the data points from their respective mean, we compute only the sum of the square differences to the mean instead of the full variance (see Table 9.1).

As you can see, the means for the two groups are quite different. The sums of square differences within each group are 2. Adding them together, we get SSW = 4. If we ignore group membership and compute the total sum of square differences SST based on the overall mean 4, we get SST = 28. In other words, the variance (aka sums of squares) SSW within the groups is much smaller than the total variability SST, which indicates that the means are indeed different.

More precisely, under the null hypothesis that there are no mean differences between the groups in the population, we would expect only minor random fluctuations in the means of the two groups. Therefore, under the null hypothesis, it turns out that the variance SSW within groups divided by the degrees of freedom within groups should be about the same as the variance SSB between groups divided by the degrees of freedom between groups. We can compare these two quantities via the F distribution and test whether their ratio is significantly greater than 1. Here is the formal definition of a single-factor ANOVA.

Table 9.1 Group and total variances or sum of square (SS) differences to the mean

            Male                                    Female
Data        2, 1, 3                                 6, 5, 7
Mean        2                                       6
Group SS    (2 − 2)² + (1 − 2)² + (3 − 2)² = 2      (6 − 6)² + (5 − 6)² + (7 − 6)² = 2
Total mean  4
Total SS    (2 − 4)² + (1 − 4)² + (3 − 4)² + (6 − 4)² + (5 − 4)² + (7 − 4)² = 28

Definition: Suppose we have k groups and each group contains nj measurements, 1 ≤ j ≤ k. Assume that all groups are approximately normal and that their standard deviations are approximately the same. The four components of a single-factor ANOVA are:

H0: μ1 = μ2 = ... = μk,

Ha: μi ≠ μj for some i, j.

Test statistic: f0 = (SSB/dfB) / (SSW/dfW), where dfB = k − 1, dfW = (n1 − 1) + (n2 − 1) + ... + (nk − 1), the sum of squares within groups is SSW = SS1 + SS2 + ... + SSk, the total sum of squares SST is computed from the overall mean, and the sum of squares between groups is SSB = SST − SSW.

Rejection region: Reject H0 if p = P(f > f0) < α, where P(f > f0) = FDIST(f0, dfB, dfW) using the F distribution.

Now we can finish the preceding example: dfB = 2 − 1 = 1, dfW = (3 − 1) + (3 − 1) = 4, SST = 28, SSW = 2 + 2 = 4, and SSB = SST − SSW = 28 − 4 = 24. At this point we know the variance within groups SSW as well as the variance between groups SSB, so that f0 = (SSB/dfB) / (SSW/dfW) = (24/1) / (4/4) = 24 and p = P(f > f0) = FDIST(24, 1, 4) = 0.008.
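To double-check this hand calculation outside of Excel, here is a minimal Python sketch (an illustration only, not part of the chapter's Excel workflow). It rebuilds SSW, SSB, and f0 from the definitions above; scipy.stats.f.sf plays the role of Excel's FDIST:

    from scipy import stats

    male, female = [2, 1, 3], [6, 5, 7]
    scores = male + female

    def ss(values):
        """Sum of squared differences from the mean of `values`."""
        mean = sum(values) / len(values)
        return sum((x - mean) ** 2 for x in values)

    SSW = ss(male) + ss(female)          # 2 + 2 = 4
    SST = ss(scores)                     # 28
    SSB = SST - SSW                      # 24
    dfB, dfW = 2 - 1, (3 - 1) + (3 - 1)  # 1 and 4

    f0 = (SSB / dfB) / (SSW / dfW)       # 24.0
    p = stats.f.sf(f0, dfB, dfW)         # upper tail of the F distribution, like FDIST
    print(f0, p)                         # 24.0 and about 0.008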

Hence, assuming as usual that a = 0.05, we reject the null hypothesis and conclude that there is a difference between the means. Incidentally, this would have been our conclusion had we used the difference of means procedure. Now we are ready for a true ANOVA, that is, for an example with more than two groups.

Example: We want to determine if a new drug is effective in lowering blood pressure, and what dosage might work best. So we give three different levels of the drug (zero drugs, low amount, high amount) to 20 patients and measure the difference in blood pressure before and 30 min after administering the drug. The 20 people were randomly assigned to receive one of the three dosages. The results are:

Zero dosage: 4, 1, 7, 8, 2, 10

Low dosage: 11, 15, 12, 13, 18, 16, 14

High dosage: 15, 17, 12, 13, 10, 11, 10.

Is there a significant difference between the three averages? Test at the α = 0.05 level.

The computations for our ANOVA test are as follows:

The group means are 32/6 ≈ 5.33 for the zero dosage, 99/7 ≈ 14.14 for the low dosage, and 88/7 ≈ 12.57 for the high dosage, while the overall mean of all 20 measurements is 219/20 = 10.95.

The sums of square differences within the groups are

SS1 = (4 − 5.33)² + (1 − 5.33)² + (7 − 5.33)² + (8 − 5.33)² + (2 − 5.33)² + (10 − 5.33)² = 63.33,

SS2 = (11 − 14.14)² + (15 − 14.14)² + (12 − 14.14)² + (13 − 14.14)² + (18 − 14.14)² + (16 − 14.14)² + (14 − 14.14)² = 34.86,

SS3 = (15 − 12.57)² + (17 − 12.57)² + (12 − 12.57)² + (13 − 12.57)² + (10 − 12.57)² + (11 − 12.57)² + (10 − 12.57)² = 41.71,

so that SSW = SS1 + SS2 + SS3 = 63.33 + 34.86 + 41.71 = 139.90.

The total sum of square differences, computed from the overall mean 10.95, is SST = 418.95,

so that SSB = SST − SSW = 418.95 − 139.90 = 279.05.

The degrees of freedom are dfB = 3 − 1 = 2 and dfW = (6 − 1) + (7 − 1) + (7 − 1) = 17, so that, finally:

f0 = (SSB/dfB) / (SSW/dfW) = (279.05/2) / (139.90/17) = 139.52/8.23 = 16.95 and therefore

p = P(f > f0) = FDIST(16.95, 2, 17) = 0.000089 < 0.05 = α.

Thus, we reject the null hypothesis and accept the alternative, that is, there is a significant difference between (at least two of) the means.
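For readers who want to verify this lengthy hand calculation outside of Excel, SciPy bundles the same one-way ANOVA into a single call; this sketch is only a cross-check, not part of the chapter's Excel workflow:

    from scipy import stats

    zero = [4, 1, 7, 8, 2, 10]
    low = [11, 15, 12, 13, 18, 16, 14]
    high = [15, 17, 12, 13, 10, 11, 10]

    # One-way ANOVA across the three dosage groups
    f0, p = stats.f_oneway(zero, low, high)
    print(f0, p)  # about 16.95 and 0.000089, matching the values computed above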

This is a lot of work and it is very easy to make mistakes. Fortunately, Excel has a procedure to perform these calculations automatically.

Exercise: Given the data from the previous example, use Excel to perform an ANOVA to decide whether the means differ significantly.

First, we enter the data into Excel in three columns, one column per group (see Figure 9.1).

Next, click on “Data Analysis” on the Data ribbon, select “ANOVA: Single Factor,” and define as the input range the data you entered, including the data labels in the first row. Make sure that “Labels in First Row” is checked and hit OK. The output of the ANOVA procedure is shown in Figure 9.2.


Figure 9.1 Data entered in columns for each group


Figure 9.2 Output of single-factor ANOVA routine

The SUMMARY block in Figure 9.2 shows the means and variances of the three groups. In the more interesting ANOVA section we can see that SSB = 279.04 and SSW = 139.90. It lists the degrees of freedom next and finally shows the f0 value of 16.95 together with the associated probability p = 0.000089, computed using the F distribution. Our conclusion is, just as before, to reject the null hypothesis and accept that some means differ significantly from each other.

We will show another, carefully worked out problem in the last section of this chapter; you can check that if you have any questions. You might want to work out that example manually, as we did earlier, and compare your answers.

But the idea of the ANOVA is even more powerful and applies to more complex experiments.

Two-Way ANOVA

As we mentioned, a one-factor ANOVA uses one categorical variable, called a factor, to divide the dependent variable into groups. But complex situations often depend on several factors, which leads to a two-way ANOVA or higher. As it turns out, the ANOVA procedure can not only decide whether means are different but also detect interaction effects between factors, and it can therefore be used to test more complex hypotheses about reality. Indeed, in many areas of research five-way or higher interactions are not that uncommon.

However, this discussion is difficult and, while most certainly useful, is beyond the scope of this text. We therefore refer the reader to more advanced statistical textbooks or, better yet, appropriate online resources, and we will not discuss two-way (or higher) ANOVA here.

Excel Demonstration

Company S, the accounting firm, has an HR department that has suggested managers learn different leadership styles to help decrease employee stress levels. An experiment was conducted to determine if leadership styles (transformational and transactional) in management significantly impacted employee stress levels compared to no intervention. An employee survey was given to three groups: one group under a manager who started using a transformational motivation technique, one group under a manager who began using a transactional motivation technique, and then a control group with no new technique used. Stress levels were self-reported by the employees as shown in Table 9.2. Use alpha = 0.05.

Step 1: Start a new Excel worksheet and enter the preceding data (see Figure 9.3).

Step 2: Click on DATA, then Data Analysis, and then ANOVA: Single Factor. We choose Single Factor because there is a single factor (leadership style) whose three categories (transformational, transactional, control) divide the dependent variable (stress level) into groups.

Step 3: In the dialog that comes up, select all of the data including the labels at once as shown in Figure 9.3. Check the box called “Labels in First Row.” Notice that Alpha is automatically set to 0.05. Click OK. The output of this procedure is shown in Figure 9.4.

Reviewing the p-value in the Between Groups row, we see p = 0.02, which is less than our alpha level of 0.05, so we conclude that there is a significant difference in stress levels between the groups.

Table 9.2 Data for stress levels

Transformational    Transactional    Control
0                   2                6
7                   5                5
3                   3                8
5                   0                9
2                   1                5


Figure 9.3 Data for stress levels of employees together with ANOVA dialog


Figure 9.4 Output of the single-factor ANOVA procedure

Now that we know there is an overall difference among the groups, we would like to run a post hoc test to determine whether there is a significant difference between particular pairs of groups, such as between transformational and control, or between control and transactional, and so on. This is difficult to do directly in Excel; however, a quick and easy way is to run a two-sample t-test between the two groups that you wish to check (only do this after the overall ANOVA has shown a significant difference among the groups). A scripted version of this approach is sketched below.
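For readers who prefer to script this step, the sketch below (Python with SciPy, not part of the chapter's Excel workflow) reruns the overall ANOVA and then performs the three pairwise two-sample t-tests. A common safeguard, not used in this chapter, is a Bonferroni correction, that is, multiplying each pairwise p-value by the number of comparisons before judging significance:

    from itertools import combinations
    from scipy import stats

    groups = {
        "Transformational": [0, 7, 3, 5, 2],
        "Transactional": [2, 5, 3, 0, 1],
        "Control": [6, 5, 8, 9, 5],
    }

    # Overall one-way ANOVA; should agree with the Excel output (p of about 0.02)
    print(stats.f_oneway(*groups.values()))

    # Post hoc pairwise two-sample t-tests between each pair of groups
    for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
        t, p = stats.ttest_ind(data1, data2)
        print(f"{name1} vs. {name2}: t = {t:.2f}, p = {p:.3f}")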
