Chapter 9
t Tests for Related and Unrelated Data

The chi-square discussion in Chapter 8 provides a means of assessing whether two or more categorical variables are statistically independent of one another. This chapter takes up the t test. The t test is a means of assessing whether a numerical variable (either continuous or discrete) is independent of a categorical variable that takes on only two values. The t test also makes it possible to assess whether a value found from a sample could have come from a population in which a hypothesized value is true. This chapter addresses both of these issues, beginning with the latter.

9.1 What Is a t Test?

To begin with a discussion of the t test, recall the question addressed by the hospital financial officer in Chapter 7. The CFO wished to know the true average cost of a hospital discharge. He decided to take a sample of 100 discharge records. He then calculated the true cost of each hospital stay. In turn, he used the mean cost of this sample of discharges as the mean cost for all 12,000 hospital stays. Suppose that before the financial officer had taken the sample and determined the cost of each, the chief executive officer of the hospital had said to him, “If the average cost of a discharge is $6,000, our charges schedule is fine. If the average cost is more than $6,000, we are going to have to renegotiate our charges schedule with all our payers, and that will be a big hassle. On the other hand, if our cost is less than $6,000, our payers are going to demand that we renegotiate our charges schedule, and that is going to be a hassle also. So I want you to use your sample to determine if our average cost per discharge is $6,000 or not. But I want your best estimate of the true situation, because if it is not $6,000, we need to get to work.”

If the financial officer has had some statistics in graduate school (which, of course, he has), he could tell the CEO that there is no way to determine from a sample if the average cost per discharge is exactly $6,000. He can get only an estimate of the average cost. But the financial officer decides to try to answer the CEO's question in the best way he can, using statistical analysis.

The t test can produce a result that may range from −∞ to +∞, although practically speaking it usually falls in a much narrower range. The =TDIST() function works only with positive values of t. Therefore, when using the =TDIST() function to determine the probability of a negative t value, it is necessary to change the negative value to a positive one. Because the t distribution is symmetrical, the probability of being at −t is the same as the probability of being at t, so the conversion makes no difference. Consequently, if the two-tail probability of a value of −2.3 were desired, the way to determine this would be by using =TDIST(2.3,df,2).

t Tests: Setting Up the Hypotheses

The financial officer posits the CEO's concern in the following two hypotheses:

  H0: The average cost per discharge is $6,000.
  or
  H0: μ = $6,000
  and
  H1: The average cost per discharge is not $6,000.
  or
  H1: μ ≠ $6,000

The financial officer knows that the sample mean he obtains for the cost of hospital discharges will be only one of many means he could have gotten, had he taken other samples instead of the one he took. He also knows that the distribution of the means from all those other samples will be approximately a t distribution, with degrees of freedom equal to the sample size minus 1. Can he use this information to test whether the mean cost he discovers from the one sample he gets would lead him to accept or reject H0? The answer is yes. The way this test is carried out is given in Equation 9.1.

\[ t = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} \tag{9.1} \]

where \(\bar{x}\) is the sample mean, \(\mu_0\) is the population value hypothesized by H0, and \(SE_{\bar{x}}\) is the standard error of the sample mean.

Chapter 7 discussed a sample of 100 discharges drawn from the hospital discharge records. The average cost per discharge in that sample was $6,586.30. Although not mentioned there, the standard deviation for the sample was $5,262.73. The estimate of the standard error of the mean, based on a sample size of 100, is then $5,262.73/sqrt(100), or $526.27. Using this sample information, the t test can be used to assess whether the sample is consistent with a true cost per case of $6,000, as shown in Equation 9.2.

\[ t = \frac{6586.30 - 6000}{526.27} = 1.11 \tag{9.2} \]
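
The arithmetic behind the standard error and the t value can be checked outside Excel. The following is a minimal Python sketch using the sample figures quoted above; the function name one_sample_t is an illustrative choice and does not come from the text.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (standard error, t) for a one-sample t test."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return standard_error, t

# Sample figures from the discharge example: mean, H0 value, SD, sample size.
se, t = one_sample_t(6586.30, 6000.00, 5262.73, 100)
print(f"standard error = {se:.2f}")
print(f"t = {t:.2f} with {100 - 1} degrees of freedom")
```

The degrees of freedom are simply the sample size minus 1, which is why they are printed alongside t rather than computed inside the function.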

Finding t Values in Excel: The =TDIST() Function

The value of t = 1.11 can now be assessed using the =TDIST() function in Microsoft Excel. The =TDIST() function takes three arguments: the t value, the number of degrees of freedom, and a 1 or a 2. The last argument depends on whether the test is a one-tail test or a two-tail test. Because the CEO is interested in whether the true population mean cost per case is either greater or less than $6,000, a two-tail test is appropriate. If the CEO had cared only if the true cost per case was greater than $6,000, we would have conducted a one-tail test.

Two-tailed tests are always associated with a hypothesis that is testing whether something is either greater or less than a value. One-tailed tests are always associated with a hypothesis that is testing whether something is solely greater than a value or solely less than a value.

The result of the =TDIST() function is shown in Equation 9.3.

  =TDIST(1.11,99,2) = 0.268    (9.3)

where 1.11 is the value of t, 99 is the degrees of freedom (n − 1), and 2 indicates the two-tailed test.
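
For readers working outside Excel, the tail probability that =TDIST() reports can be approximated directly from the density of the t distribution. The sketch below is one way to do so with only the Python standard library; it integrates the density numerically and is not how Excel computes the value. The function names are hypothetical, and in practice a library routine such as scipy.stats.t.sf would normally be preferred.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def t_tail_probability(t, df, tails=2, steps=2000):
    """Rough stand-in for =TDIST(t, df, tails), for t >= 0.

    Integrates the density from 0 to t with Simpson's rule and subtracts
    the result from 0.5 to obtain the area in the upper tail.
    """
    if t < 0:
        raise ValueError("t must be non-negative, as with =TDIST()")
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3
    return tails * upper_tail

# Unrounded t value from the discharge sample, 99 degrees of freedom.
t = (6586.30 - 6000) / (5262.73 / math.sqrt(100))
p = t_tail_probability(t, 99, tails=2)
print(f"two-tail probability = {p:.3f}")
```

Passing tails=1 returns the one-tail probability, mirroring the third argument of =TDIST().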

Interpreting the t Test Results

The result found in Equation 9.3 indicates that, if H0 is true (i.e., the true cost per case is $6,000), there is a 0.268 probability of obtaining a sample mean at least as far from $6,000 as the observed $6,586.30, given a standard error of the mean of $526.27. Stated another way, there is a 0.268 probability that we could have gotten a sample value this far from $6,000 from a population in which the true average cost per case is $6,000. As was discussed in Chapter 7, it is usually the case that we are interested in having a small Type I error, that is, a small likelihood of rejecting H0 when it is true. In this case, we would want a small likelihood of rejecting the belief that the cost-per-case average is $6,000 if that belief were true. Usually, we would want the result of the t test to produce a probability of 0.05 or less before we would reject H0. And this is exactly what we will do in this case. Because the probability is as large as 0.268, we will not reject H0 (even though, having all the data, we know that the true mean is $5,905.75). This is a t test.

Where Does a t Test Come From?

Where does the t test come from and what does it mean? To try to answer these questions, let us consider another example. Suppose the CEO had said that if the true cost per case is not $5,905.75, change must occur in the hospital. Now we know the true mean is $5,905.75. But the financial officer doesn't know this. He still has to take a sample, calculate the true average cost of the sample, and run a t test to see if the sample mean was likely to have come from a population in which the true mean is $5,905.75. But it is highly unlikely that he will find the sample mean equal to $5,905.75. As was discussed earlier, one possible sample outcome is $6,586.30.

Now, it is essential to remember that in real life, one only ever takes a single sample and one must be satisfied, in general, with the results of that single sample. However, in our case, instead of taking a single sample, let's take a large number of samples, say, 250. We can do this by using the Data Analysis add-in in Excel as was presented earlier in the text. In turn, we can calculate estimates of the mean and standard error for each of these samples and conduct a t test by the formula in Equation 9.1.

The Shape of the t Distribution

It was indicated earlier that the t test results are distributed as a t distribution. If we examined the results of 250 t tests, we would expect a large number of those to produce values in the range of −1 to +1. (This corresponds to 1 standard deviation on each side of 0, the value that would be generated if there were no difference between the sample mean and H0.) In fact, we would expect that about 68 percent of the results of the t tests from our 250 samples would lie between −1 and +1. Furthermore, we would expect about 95 percent to lie between −2 and +2, for the same reason.

Figure 9.1 is a graph showing the distribution of the 250 t tests, based on the samples taken from the 12,000 hospital discharge records. Exactly 68 percent of the t tests provide results in the range −1 to +1. Ninety-three percent (rather than the expected 95 percent) of t tests provide results in the range −2 to +2. Seven percent of the results are outside the range −2 to +2 for the means of these 250 samples when H0 is the true mean. So for a perfect t distribution, this means that when H0 is the true mean, there is about a 5 percent probability of rejecting H0 when it is true. However, in this example, there is a 7 percent probability of rejecting the null hypothesis, H0. But, in general, the probability of rejecting H0 when it is true is small. The consequence of this is that if the t test produced a value outside the range −2 to +2, we would typically conclude that the sample did not come from a population in which H0 was true (even though we recognize that there is a small likelihood that it could have).
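
The repeated-sampling exercise can be imitated in a few lines of code. The sketch below is an illustration only: the hospital's 12,000 discharge records are not available here, so a skewed synthetic population (a gamma distribution whose shape and scale are assumptions) stands in for them, and the percentages it prints will not match those in Figure 9.1.

```python
import math
import random
import statistics

random.seed(9)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the 12,000 discharge costs: a skewed (gamma)
# population. The shape and scale values are illustrative assumptions.
population = [random.gammavariate(1.3, 4500) for _ in range(12000)]
true_mean = statistics.fmean(population)

def t_value(sample, hypothesized_mean):
    """One-sample t: (sample mean - hypothesized mean) / standard error."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - hypothesized_mean) / se

# Draw 250 samples of 100 and test each against the true mean (H0 is true).
t_values = [t_value(random.sample(population, 100), true_mean)
            for _ in range(250)]

within_1 = sum(abs(t) <= 1 for t in t_values) / len(t_values)
within_2 = sum(abs(t) <= 2 for t in t_values) / len(t_values)
print(f"share within -1 to +1: {within_1:.2f}")
print(f"share within -2 to +2: {within_2:.2f}")
print(f"share outside -2 to +2 (Type I errors): {1 - within_2:.2f}")
```

Changing the seed gives a different set of 250 samples, which is a quick way to see how much the observed percentages wander around 68 and 95.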

Figure 9.1 Distribution of 250 t tests when H0 = $5,905

What Does the t Test Mean? The Link to Type I and Type II Errors

Let us look at this result from another perspective. Suppose H0 had been $4,500. That is, suppose the CEO had said, “If the average cost of a hospital stay is not $4,500, we are going to have to make some changes.” Now, when we draw the 250 samples and carry out the t tests, the distribution of the t values is as shown in Figure 9.2. This figure shows that the distribution of t values is centered on 3.5. A t value greater than 2 would lead us to reject H0 and accept the alternative. Thus, we would (correctly) conclude that the population mean is not $4,500.

Bar graph of the distribution of 250 t tests when H0 = $4,500, depicting the distribution of t values centered at 3.5. Values 4 and 5 on x-axis have the same value on its y-axis.

Figure 9.2 Distribution of 250 t tests when H0 = $4,500

But the calculation of the 250 t tests and their distribution as shown in Figure 9.2 also gives us one other type of information of interest. In general, we would accept H0 for any value of t within the range −2 to +2. Looking at Figure 9.2, it is possible to see that some of the t tests produced t values of less than 2 (the column labels in Figures 9.1 and 9.2 reflect the Excel =FREQUENCY() function convention of putting the frequency between two values at the point of the higher value). In fact, about 9 percent of the 250 t tests produced values of less than 2. In testing the null hypothesis H0: μ = $4,500, those 9 percent of samples would have led us to accept H0, producing the Type II error of accepting H0 when it is false. So, in this case (where cost is posited to be $4,500), beta is about 0.09.
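
The same kind of simulation gives a feel for beta. In the sketch below, the population is again a synthetic, skewed stand-in for the discharge records (its gamma shape and scale are assumptions), so the estimate it prints should not be expected to reproduce the 9 percent read from Figure 9.2. What it illustrates is the counting rule: beta is the share of samples whose t value falls inside −2 to +2 even though H0 is false.

```python
import math
import random
import statistics

random.seed(9)

# Hypothetical skewed population of costs; its mean is near $5,850.
population = [random.gammavariate(1.3, 4500) for _ in range(12000)]

def t_value(sample, hypothesized_mean):
    """One-sample t: (sample mean - hypothesized mean) / standard error."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - hypothesized_mean) / se

h0 = 4500  # false by construction for this population
t_values = [t_value(random.sample(population, 100), h0) for _ in range(250)]

# A sample leads us to accept H0 when its t value lies inside -2 to +2.
beta = sum(abs(t) < 2 for t in t_values) / len(t_values)
print(f"estimated Type II error (beta) = {beta:.2f}")
```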

One-Tail and Two-Tail t Tests

The t test that is discussed in the previous section is a two-tailed test. As was discussed in Chapter 7, assessments of hypotheses can be one-tailed or two-tailed. In a two-tailed test, the region of rejection of H0 can be at either end of the continuum of possible t values. In a one-tailed test, the region of rejection of H0 is at only one end of the continuum. In regard to the question of cost per hospital stay discussed earlier, the discussion is of a two-tail test. This is because the CEO indicated he was concerned about the possibility that the mean value might be either more or less than H0 (in either case, adjustments would have to be made).

But suppose, for example, that the CEO is not interested in knowing whether the true cost is less than $6,000. His only concern is that if the true costs are greater than $6,000, he will have to take steps to renegotiate charges to bring them into line with true costs. So, in this case, the hypotheses posed by the financial officer would be as follows:

  H0: The average cost per discharge is $6,000 or less.
  or
  H0: μ ≤ $6,000
  and
  H1: The average cost per discharge is greater than $6,000.
  or
  H1: μ > $6,000

Now the initial hypothesis and the alternative hypothesis are stated in such a way that the null will be rejected only if the sample mean is greater than $6,000. Any sample mean less than $6,000 produces a t test value that will never be in the region of rejection of H0. This is logical, because any sample value less than $6,000 cannot be used to reject H0. The t test to assess H0 and H1 is given in Equation 9.1. But, now, the number of standard deviation units required for rejection of H0 will not be approximately 2 (depending on sample size) but will be closer to 1.7. The lower number of standard deviation units is due to the one-tail nature of the test.

Figures 9.3 and 9.4 show the comparative regions of rejection for a two-tail test (Figure 9.3) and a one-tail test (Figure 9.4). The region of rejection for the two-tail test is outside the two vertical dashed lines at approximately −2 and +2. If the t test result falls outside the range −2 to +2, H0 will be rejected. The region of rejection for the one-tail test is to the right of the dashed line in Figure 9.4 at about 1.7. Any t value larger than 1.7 will lead to the rejection of H0, but a value smaller than −1.7 will not. In the one-tail test, any value lower than 1.7, no matter how low it may be, will not lead to the rejection of H0.
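
The cutoffs of roughly 2 and roughly 1.7 can be recovered numerically. The sketch below is an assumption-laden illustration, not a description of how Excel finds these values: it integrates the t density with Simpson's rule and then uses bisection to locate the t value that leaves 5 percent in the rejection region, split across two tails or placed in one. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def upper_tail(t, df, steps=2000):
    """Area to the right of t (for t >= 0), via Simpson's rule."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(alpha, df, tails):
    """t value at which the rejection region holds a total area of alpha."""
    target = alpha / tails
    low, high = 0.0, 50.0
    for _ in range(60):  # bisection on the upper-tail area
        mid = (low + high) / 2
        if upper_tail(mid, df) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

two_tail = critical_t(0.05, 99, tails=2)
one_tail = critical_t(0.05, 99, tails=1)
print(f"two-tail cutoff: +/-{two_tail:.2f}")
print(f"one-tail cutoff: {one_tail:.2f}")
```

With 99 degrees of freedom the two cutoffs come out near 1.98 and 1.66, consistent with the rounded values of 2 and 1.7 used in the discussion.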

Figure 9.3 Region of rejection for two-tail test

Figure 9.4 Region of rejection for one-tail test

It is important to keep in mind the direction of H0 when interpreting the result of a one-tail t test. In the case of the hypothesis H0: μ ≤ $6,000, any value of t in the negative range will not lead to the rejection of H0. However, if H0 had been stated as H0: μ ≥ $6,000, then any value of t in the positive range would not lead to the rejection of H0.

9.2 A t Test for Comparing Two Groups

Thus far, we have discussed a t test concerned with comparing a sample mean with an a priori hypothesized value. With that in mind, we tested whether the result of a sample could have been taken from a population in which the true mean was H0. But t tests can also be used to compare samples from two potentially different groups to determine if, statistically, they can be seen as different or not.

For example, suppose the financial officer of our hypothetical hospital were asked by the CEO to determine whether the costs of a hospital discharge differed between males and females. Although it may not be possible to negotiate differing charges on the basis of sex, it might still be important for the CEO to know if actual costs differed. So now, the financial officer might be addressing the following two hypotheses:

  H0: The average cost per discharge is equal between men and women.
  or
  H0: \(\mu_m = \mu_w\)
  and
  H1: The average cost per discharge is not equal between men and women.
  or
  H1: \(\mu_m \neq \mu_w\)

Selecting the Groups to Test: A Stratified Sampling Approach

Now, the financial officer must be somewhat careful about how he selects his sample of discharges. If the selection is done strictly randomly, there is a small possibility that either no men or no women might be selected for the sample, making it impossible to assess the hypotheses. Luckily, the financial officer knows that he can divide the total 12,000 records into those for men and those for women, so that he can take a sample of each group. This is what is known as a stratified sample, because he will be stratifying the population into men and women before taking a sample from each group.

In general, with a stratified sample, it is best to take samples of equal size from each stratum. This is true if the desire is to compare the two strata and if there is no perception that the variances of the two strata differ. There is no particular reason, a priori, for the financial officer to expect that the costs for men are more or less variable than the costs for women (even though the means may differ). Therefore, the best decision is to take equal-sized samples of men and women. The financial officer believes he can still muster the resources to ascertain the true costs of hospital stays for a total of 100 discharges. Therefore, he decides to take a sample of discharges for 50 men and another for 50 women.
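
The mechanics of a stratified sample are simple to sketch in code. The example below assumes a hypothetical list of (sex, cost) records with made-up costs; only the idea of splitting the records into strata and sampling 50 from each comes from the text. The function name stratified_sample is illustrative.

```python
import random

random.seed(9)

# Hypothetical records as (sex, cost) pairs; the costs are invented.
records = ([("M", random.gammavariate(1.3, 4500)) for _ in range(4560)]
           + [("F", random.gammavariate(1.3, 4500)) for _ in range(7440)])

def stratified_sample(records, key, size_per_stratum):
    """Split records into strata by key, then sample each stratum equally."""
    strata = {}
    for record in records:
        strata.setdefault(key(record), []).append(record)
    return {label: random.sample(group, size_per_stratum)
            for label, group in strata.items()}

samples = stratified_sample(records, key=lambda r: r[0], size_per_stratum=50)
for label, group in samples.items():
    print(label, len(group))
```

Because each stratum is sampled separately, neither group can be left out of the sample by chance, which is the concern raised about a strictly random draw.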

Using a t Test to Compare Two Means

To determine if the cost for men and women is the same (the assessment of H0), it will first be necessary to determine the average cost for each sample of 50. It will then be necessary to use the appropriate t test to determine whether the means of the two samples could have come from two populations in which the true mean of costs was the same. The t test that will be used to determine this when sample sizes are the same is given in Equation 9.4.

\[ t = \frac{\bar{x}_m - \bar{x}_w}{\sqrt{\dfrac{s_m^2 + s_w^2}{n}}} \tag{9.4} \]

where \(\bar{x}_m\) is the sample mean for men, \(\bar{x}_w\) is the sample mean for women, \(s_m^2\) is the sample variance for men, \(s_w^2\) is the sample variance for women, and n is equal to the sample size in either group.

Assume that the financial officer has taken a sample of 50 men and 50 women and has recorded an average cost of $6,460.04 for men and $6,177.30 for women, with variances (the square of the standard deviation, or \(s^2\)) of 23,837,156 and 24,477,861, respectively. The t test to compare men and women is then found by the formula in Equation 9.5.

\[ t = \frac{6460.04 - 6177.30}{\sqrt{\dfrac{23837156 + 24477861}{50}}} = \frac{282.74}{983.01} = 0.288 \tag{9.5} \]
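
The two-group calculation for equal sample sizes can be verified with a short script. This is a minimal sketch using the sample means and variances quoted above; the function name equal_n_t is an illustrative choice.

```python
import math

def equal_n_t(mean_1, mean_2, var_1, var_2, n):
    """t for two groups of equal size n: difference in means over its standard error."""
    standard_error = math.sqrt((var_1 + var_2) / n)
    return (mean_1 - mean_2) / standard_error

# Men first, women second: means, variances, and the common sample size.
t = equal_n_t(6460.04, 6177.30, 23837156, 24477861, 50)
df = 2 * 50 - 2  # one degree of freedom is used up by each sample
print(f"t = {t:.3f} with {df} degrees of freedom")
```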

Using the =TDIST Function: Finding the Probability of a t Value

The t value of 0.288 given in Equation 9.5 will not lead to rejection of H0. In general, if the level of the Type I error (α) is set at 0.05, H0 will be rejected only when the absolute value of t is approximately 2 or greater. (Recall that t can be a negative or a positive number.) However, the exact probability of finding a t as large as 0.288 can be determined by using the =TDIST() function. For this t test, because there are actually 100 (2n) observations, and because one degree of freedom is used up for each sample, there are 2n − 2 = 98 degrees of freedom. With 98 degrees of freedom, the exact probability of finding a t value as large as 0.288 or larger is given by Equation 9.6.

  =TDIST(0.288,98,2) ≈ 0.77    (9.6)

The interpretation of the probability given in Equation 9.6 is as follows:

  • We begin by assuming that the population mean for cost per discharge is exactly the same for both men and women.
  • Under that assumption, the t value calculated for a large number of samples of discharges for 50 men and 50 women would be as large as 0.288 (in absolute value) in about 77 percent of the samples selected.

Because the probability of getting a t value as large as 0.288 is very high, H0 is accepted and the financial officer reaches the conclusion that there is no cost difference between men and women.

Type I and Type II Errors in Comparing Means

Let us consider further what it means to say that there is no difference in the cost between men and women. For the 12,000 hospital discharges, there are 4,560 men and 7,440 women. The true mean of cost for men is $5,825.63, and for women it is $5,954.85. So, in fact, there is a real difference in the mean cost between men and women, although the difference is only $129.22. This is a real difference, but is it an important difference? Such a question is not one that statistics can answer. The question of what may be an important difference must be left to the CEO to decide. However, we still can use statistics to assist in the decision-making process.

Type I Error

But suppose there were truly no population-level difference between the cost for men and the cost for women. Suppose the true mean for men were exactly equal to the true mean for women, so that both were equal to $5,905.75. Suppose we were to select a large number of samples of size 50 for men and size 50 for women from these two populations and then calculate the t value for each pair of samples by using Equation 9.4. The distribution of those t values would be a t distribution with 98 degrees of freedom. Furthermore, the distribution would look essentially identical to that shown in Figure 9.3. Similarly, we would expect that about 5 percent of all t values calculated by using Equation 9.4 would have values outside the range −2 to +2. These would be the t values that would lead us to reject H0 even when it was true, and they would represent a Type I error.

Type II Error

Consider now the Type II error. Rather than look at the real average difference between the two populations, which is only $129.22, assume that the true population mean for men was $5,000 and the true population mean for women was $7,000. Let's also assume the same variance as that which is true of the actual data. Therefore, the distribution of the t test values from a large number of samples would look very much like the distribution shown in Figure 9.5. In Figure 9.5, about half the distribution is to the left of the dashed vertical line. That is the region in which we would reject H0 and conclude, in fact, that the true cost is different between men and women.

Figure 9.5 Type II error for true means of $5,000 and $7,000

To the right of the dashed vertical line, which also includes about half the distribution, is the region in which we will erroneously accept H0, even when the true average difference between the two groups is as large as $2,000. In this case, then, the Type II error is about 0.5. As previously discussed, the only way to reduce the Type II error in this setting is to increase the size of the sample.

If we had the resources to increase the sample size to, for example, 150 for men and 150 for women, then the pooled standard error would be reduced considerably over a sample of 50 each. That would produce a graph of all possible t test values as shown in Figure 9.6. In this figure, the bulk of the t distribution is to the left of the vertical dashed line and only about 7 percent of the distribution is to the right of the dashed line. Thus, in this case, the probability of a Type II error has been reduced to about 0.07.
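
The effect of sample size on the Type II error can be approximated without drawing any samples. The sketch below rests on two assumptions that are not from the text: it treats the t value as normally distributed around the true difference divided by its standard error, and it uses the two sample variances from Equation 9.5 as stand-ins for the population variances. The function name approx_beta is hypothetical.

```python
import math
from statistics import NormalDist

def approx_beta(true_difference, var_1, var_2, n, alpha=0.05):
    """Approximate two-tail Type II error for two groups of equal size n.

    Treats t as normal with unit variance, centered on
    true_difference / standard_error (a large-sample simplification).
    """
    z = NormalDist()
    standard_error = math.sqrt((var_1 + var_2) / n)
    shift = abs(true_difference) / standard_error
    critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(critical - shift) - z.cdf(-critical - shift)

beta_50 = approx_beta(2000, 23837156, 24477861, 50)
beta_150 = approx_beta(2000, 23837156, 24477861, 150)
print(f"n = 50 per group:  beta is roughly {beta_50:.2f}")
print(f"n = 150 per group: beta is roughly {beta_150:.2f}")
```

Under these assumptions the approximation lands close to the values read from Figures 9.5 and 9.6: a little under one half for 50 per group and well under one tenth for 150 per group.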

Figure 9.6 Type II error for true means of $5,000 and $7,000 and sample size 150 for each group

Comparing Samples of Unequal Size

Thus far, the discussion has centered on equal sample sizes. This section expands the t test for two groups to unequal sample sizes. In addition, the section introduces the notion of an experimental design.

A new resident physician has been asked by the senior physician with whom she is working to assist in a study. The study investigates the effect of physician interaction with patients on the degree of knowledge gained by the patients about breast cancer. When the patients come to the clinic, they will be given a brochure about breast cancer with information about its etiology, detection, treatment, and prognosis. They will all have waiting time adequate to read the brochure if they wish to do so. They will then be randomly divided into two groups. One group (the control group) receives no further information unless they ask for it. The second group (the experimental group) will have a 5- to 10-minute one-on-one discussion with the physician about breast cancer, providing the same information as was contained in the brochure. At the end of their visit, all patients will complete a short questionnaire that assesses their knowledge of breast cancer on a 20-point scale. The purpose of the study is to determine if the patients who have the one-on-one discussion with the physician score higher on the questionnaire than those who do not.

This is a classic experimental design (although not a double-blind random clinical trial). In this experiment we will introduce one modification in the classic design: The study will be carried out over a one-month period. During that time, the physicians expect to have about 65 women come to the clinic who are appropriate for the study. But because of the time that is involved in interacting with the experimental group, they do not feel that they can include more than 25 women in the experimental group. One decision, then, would be to say that only 25 women would be included in the control group. But this may not be the best decision. Because the control group costs no extra resources (the space required to take the brief test and its scoring is not a significant resource drain), it is a better strategy to include all women who are not in the experimental group as controls. Given that the physicians expect 65 women in total, the projected control group will consist of 40 women.

Conducting the t Test for the Unequal Sample Sizes: Using Pooled Variance

Let us now assume that the month has passed, the experiment has taken place, and post-intervention knowledge tests have been obtained for 65 women. How do we conduct the t test that will determine whether the two groups are different? Because the two group (or sample) sizes are different, we cannot use the t test as given in Equation 9.4. Instead, we need a formula for the t test that takes into account the fact that the two sample sizes are not the same. A pooled variance statistic adjusts for unequal sample sizes. The appropriate t test is as given in Equation 9.7.

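\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{9.7} \]

where \( s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \), and \(s_1^2\) is the sample variance for group 1, \(s_2^2\) is the sample variance for group 2, \(n_1\) is the sample size for group 1, and \(n_2\) is the sample size for group 2.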

The complete data set from this experiment (wholly fictional) is given in Chpt 9-1.xls. The results of the experiment, using those data, are shown in Figure 9.7. The figure shows, in columns A and B, some of the data from the study for both the Experimental group (the one that received the intervention) and the Control group (the one that did not). Columns E and F give the sample sizes, the mean, and the variance for each group, as well as the degrees of freedom for each group, \(n_1 - 1\) and \(n_2 - 1\). Cells E8:E12 give the actual results of the t test. The value in cell E8 is the pooled variance following the formula for \(s_p^2\) given in Equation 9.7. Cell E9 gives the pooled standard error, which is the denominator of the equation for t in Equation 9.7. The t value in cell E10 is calculated by using the formula in Equation 9.7. The degrees of freedom in cell E11 are the sum of cells E5 and F5 but are also \(n_1 + n_2 - 2\). Cell E12 gives the two-tail probability of finding a t value as large as 3.3 when there is actually no difference between the experimental group and the control group. This probability, at approximately 2 chances in 1,000, is very small. Therefore, the conclusion is drawn that there is a difference between the two groups. Furthermore, because the mean for the experimental group is larger than the mean for the control group, the conclusion is extended to say that the one-on-one discussion with the physician made a difference in the amount of knowledge that the women have. Table 9.1 contains the formulas for Figure 9.7.

Figure 9.7 Results of a breast cancer experiment

Table 9.1 Formulas for Figure 9.7

Cell Formula
E1 =COUNT(A2:A26)
F1 =COUNT(B2:B41)
E2 =AVERAGE(A2:A26)
F2 =AVERAGE(B2:B41)
E3 =VAR(A2:A26)
F3 =VAR(B2:B41)
E5 =E1-1
F5 =F1-1
E6 =E5*E3
F6 =F5*F3
E8 =(E6+F6)/(E5+F5)
E9 =(E8*((1/E1)+(1/F1)))^0.5
E10 =(E2-F2)/E9
E11 =E5+F5
E12 =TDIST(E10,E11,2)
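
The spreadsheet steps in Table 9.1 translate directly into a short function. The sketch below mirrors cells E8 through E11; the scores it uses are made up for illustration and are not the data in Chpt 9-1.xls, and the function name pooled_t_test is hypothetical.

```python
import math
import statistics

def pooled_t_test(group_1, group_2):
    """Pooled-variance t test for two groups of possibly unequal size."""
    n1, n2 = len(group_1), len(group_2)
    var_1 = statistics.variance(group_1)  # sample variance, like =VAR()
    var_2 = statistics.variance(group_2)
    df = (n1 - 1) + (n2 - 1)                                          # cell E11
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df      # cell E8
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))   # cell E9
    mean_difference = statistics.fmean(group_1) - statistics.fmean(group_2)
    return mean_difference / standard_error, df                       # cell E10

# Made-up knowledge scores, for illustration only.
experimental = [15, 17, 14, 18, 16]
control = [12, 14, 11, 15, 13, 12, 14]
t, df = pooled_t_test(experimental, control)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The two lists have different lengths on purpose, since handling unequal group sizes is the reason for pooling the variance.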

The two t tests discussed thus far have different implications for what is actually known. The first t test, which compared discharges for men with discharges for women, was concerned only with whether there was a difference. The test does not and cannot say anything about the source of that difference. The difference might be due to differences in age between the men and women who were discharged, differences in their diagnoses or severity of illness, differences in length of stay, or other factors that have not been considered in the test. Other statistical procedures might shed light on the source of difference, but the t test cannot identify those possible differences. Conversely, if the assignment to the experimental and the control groups for the second t test was adequately carried out, the t test does have the capability of indicating whether the experimental intervention has the ability to affect the knowledge women have about breast cancer. However, the ability of the t test to allow that conclusion to be drawn depends entirely on the adequacy of the random assignment process.

An Assumption of Equal Variance

The t tests discussed thus far carry with them an assumption that the variances of the two groups being compared are equal. With regard to the test for the difference between costs for men and women, the variances of the two groups (shown as the denominator in Equation 9.5) differ by only about 3 percent. With regard to the variances in the experimental setting, the values, shown as cells E3 and F3 in Figure 9.7, differ by about 41 percent. Is there a statistical test for the equivalence of variance between the two groups? In turn, if they are not equal, what can be done about that?

First, a test does exist for equivalence of variance in the t test setting. To examine this test, we will use the data from the experimental setting discussed in the second subsection of Section 9.2. The variances presented there differ by about 41 percent. The test of equivalence of variance is an F test. Whereas a t test is typically a test comparing mean values, an F test is typically a test comparing variances. The F test for the equivalence of variance for two groups in a t test setting is given in Equation 9.8. Chapter 10 contains detailed discussion on F tests and analysis of variance (ANOVA). The concept is introduced here for comparison to t tests.

\[ F = \frac{s_1^2}{s_2^2} \tag{9.8} \]

where \(s_1^2\) is the variance for one sample and \(s_2^2\) is the variance for the other.

Testing for Equal Variance: The F Test

Refer to the example discussed in the second subsection of Section 9.2. The experimental group of 25 women has a variance of 7.12 (see Figure 9.7), whereas the control group has a variance of 10.05. So the F test for the difference between these two variances is that given in Equation 9.9 and has the value of 0.71.

\[ F = \frac{7.12}{10.05} = 0.71 \tag{9.9} \]

Whereas a t test has a single number representing degrees of freedom, an F test has two numbers representing degrees of freedom: one number for the numerator and one for the denominator. In this case, the degrees of freedom are 24 and 39 (\(n_1 - 1 = 24\) for the experimental group and \(n_2 - 1 = 39\) for the control group).

The F test is a test of equality of the variances, or H0: \(\sigma_1^2 = \sigma_2^2\). The F distribution is a nonsymmetrical distribution that generally looks like Figure 9.8. Figure 9.8 represents the F distribution for 24 and 39 degrees of freedom, but other F distributions are quite similar in appearance. The two dashed vertical lines indicate the lower and upper 95 percent limits. For 24 and 39 degrees of freedom, the F value must be approximately 0.5 or less (if the smaller variance is divided by the larger) or 2 or greater (if the larger variance is divided by the smaller) before the difference between the two variances is considered large enough to have come from populations in which the true variances are not equal. The actual value of the F test in Equation 9.9 (0.71) has a probability of occurrence of about 0.81. Because this probability is relatively high, the conclusion that we would reach is that the two variances are equal.
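
The variance ratio and its probability can also be checked numerically. The sketch below assumes that the probability of about 0.81 quoted above is the area to the right of the observed F value; it obtains that area by integrating the F density with Simpson's rule, which is an illustrative shortcut and not how a statistics package would compute it. The function names are hypothetical.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom.

    Returning 0 at x <= 0 is correct only when d1 > 2, as it is here.
    """
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_density = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                          - (d1 + d2) * math.log(d1 * x + d2))
                   - math.log(x) - log_beta)
    return math.exp(log_density)

def f_upper_tail(f, d1, d2, steps=2000):
    """P(F >= f): one minus the Simpson's-rule integral of the density on [0, f]."""
    h = f / steps
    total = f_pdf(0, d1, d2) + f_pdf(f, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1 - total * h / 3

f_ratio = 7.12 / 10.05  # experimental variance over control variance
p = f_upper_tail(f_ratio, 24, 39)
print(f"F = {f_ratio:.2f}, upper-tail probability is roughly {p:.2f}")
```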


Figure 9.8 F distribution
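The variance comparison described above can be sketched in a few lines of Python. This is only an illustration built from the rounded variances and group sizes quoted in the text. The function name `variance_ratio` is an assumption made for this sketch, and the approximate limits of 0.5 and 2 are taken from the discussion above rather than computed.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, numerator df, denominator df) for the ratio var_a / var_b."""
    return var_a / var_b, n_a - 1, n_b - 1


# Rounded values quoted in the text: experimental group (n = 25, variance 7.12)
# and control group (n = 40, variance 10.05).
f_small_over_large, df_num, df_den = variance_ratio(7.12, 25, 10.05, 40)
f_large_over_small, _, _ = variance_ratio(10.05, 40, 7.12, 25)

print(round(f_small_over_large, 2), df_num, df_den)
print(round(f_large_over_small, 2))

# Approximate 95 percent limits as stated in the text (not computed here):
# about 0.5 when the smaller variance is on top, about 2 when the larger is.
within_limits = f_small_over_large > 0.5 and f_large_over_small < 2.0
print(within_limits)
```

Either way of forming the ratio leads to the same conclusion here, because both ratios stay inside the approximate limits.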

Unequal Variances: Now What?

In the case discussed earlier, the conclusion was drawn that the variances were equal. But what happens if the two variances prove not to be equal? In the case of unequal variances, the pooled variance formula in Equation 9.7 is no longer appropriate. Instead, the t test is conducted not by pooling the variance but by simply adding the variances divided by the sample sizes. In that case, the appropriate t test is given by Equation 9.10. Note that when sample sizes are the same, the formula in Equation 9.10 gives exactly the same result as the formula in Equation 9.4. The practical value of this is that if sample sizes are the same, the formula in Equation 9.4 can be used regardless of whether variances are equal. If the sample sizes are not the same, the formulas in Equation 9.7 and Equation 9.10 will produce the same decision most of the time, although not the same numerical value of t. That is, if Equation 9.7 leads to the rejection of H0, it is highly likely that Equation 9.10 would lead to the same decision.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (9.10)$$

To see the effect of the changes in the computation of the t test for unequal variance and unequal sample size, the t test as indicated in Equation 9.10 would be employed. The equation would produce the results shown in Equation 9.11 for the data from the breast cancer education study.
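As a rough check on how the two approaches differ, the sketch below computes the standard error of the mean difference both ways from the rounded variances and sample sizes quoted in the text. It assumes the usual pooled-variance form for the equal variance case, and the function names are invented for this illustration. The group means are not used, because they are not reproduced in this section.

```python
import math


def pooled_standard_error(var1, n1, var2, n2):
    """Standard error of the mean difference using a pooled variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def unpooled_standard_error(var1, n1, var2, n2):
    """Standard error of the mean difference without pooling the variances."""
    return math.sqrt(var1 / n1 + var2 / n2)


# Variances and sample sizes quoted in the text (rounded values).
se_pooled = pooled_standard_error(7.12, 25, 10.05, 40)
se_unpooled = unpooled_standard_error(7.12, 25, 10.05, 40)
print(round(se_pooled, 3), round(se_unpooled, 3))

# With equal sample sizes (30 per group is a made-up figure) the two
# standard errors coincide, so the two t values coincide as well.
same_n_pooled = pooled_standard_error(7.12, 30, 10.05, 30)
same_n_unpooled = unpooled_standard_error(7.12, 30, 10.05, 30)
print(math.isclose(same_n_pooled, same_n_unpooled))
```

The unpooled standard error is the smaller of the two for these data, which is why the unequal variance t (3.43) comes out somewhat larger than the equal variance t (3.30).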

If variances are known to be unequal and sample sizes are either equal or unequal, the appropriate degrees of freedom for the t test are reduced by the degree to which the variances differ. This reduction in the degrees of freedom for the t test may be carried out by several different formulas, all of which give approximately the same answer. But the most widely accepted method—the one most likely used by Excel—is given by the formula in Equation 9.12.

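$$f = \frac{(g_1 + g_2)^2}{\dfrac{g_1^2}{n_1 - 1} + \dfrac{g_2^2}{n_2 - 1}} \qquad (9.12)$$

where f is the adjusted degrees of freedom,

$$g_1 = \frac{s_1^2}{n_1}$$

and

$$g_2 = \frac{s_2^2}{n_2}$$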

The formula in Equation 9.12 gives 57.48 as the degrees of freedom for the t test in Figure 9.7. We have already concluded that there is no difference in the variance of the two samples. However, a more conservative test would be to use 57 degrees of freedom (the nearest whole integer to the calculated value) to make the test. In that case, we would conclude that the probability of a t value as large as 3.43 would be 0.0011.
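The adjusted degrees of freedom can be reproduced approximately with a short calculation. The sketch below assumes that the adjustment is the Welch-Satterthwaite approximation, which is the standard choice for the unequal variance t test. The function name is invented for this illustration, and the inputs are the rounded variances quoted in the text, so the result agrees with 57.48 only to within rounding.

```python
def adjusted_degrees_of_freedom(var1, n1, var2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    g1 = var1 / n1  # variance of the first sample mean
    g2 = var2 / n2  # variance of the second sample mean
    return (g1 + g2) ** 2 / (g1 ** 2 / (n1 - 1) + g2 ** 2 / (n2 - 1))


# Rounded values quoted in the text.
adjusted_df = adjusted_degrees_of_freedom(7.12, 25, 10.05, 40)
equal_variance_df = 25 + 40 - 2

print(round(adjusted_df, 2), equal_variance_df)
```

The adjusted value always lies between the smaller of the two single-sample degrees of freedom (24 here) and the equal variance value of 63.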

An important question with regard to presumed unequal variance is how much difference it makes if the variances are unequal. First, it is useful to note that if sample sizes are equal, using the equal or unequal variance formulas will generally result in the same decision. Only in rare situations, when the calculated t value is close to 2 (for equal sample sizes, the t from either formula is the same), would the unequal variance formula lead to a different decision. In such a situation, the unequal variance t test will always be less likely to reject H0. This is because the lower degrees of freedom for the unequal variance t test require a slightly larger value of t for statistical significance.

However, with both unequal variance and unequal sample sizes, it is essential to use the unequal variance t test. If the smaller sample also has the larger variance, the unequal variance t test is more likely to result in the acceptance of H0 than the equal variance t test. But if the smaller sample also has the smaller variance, the unequal variance t test is more likely than the equal variance t test to result in the rejection of H0. Thus, it is important, with the case of unequal sample sizes, to use the appropriate t test.
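The direction of these effects can be illustrated with made-up numbers. In the sketch below, the mean difference, variances, and sample sizes are all hypothetical, and the function names are invented for this illustration. Only the t values are compared; the accompanying change in degrees of freedom is not shown.

```python
import math


def pooled_t(mean_diff, var1, n1, var2, n2):
    """t value using the pooled (equal variance) standard error."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def unequal_variance_t(mean_diff, var1, n1, var2, n2):
    """t value using the unpooled (unequal variance) standard error."""
    return mean_diff / math.sqrt(var1 / n1 + var2 / n2)


mean_diff = 2.0  # hypothetical difference between the two group means

# Case A: the smaller sample (n = 10) has the larger variance.
t_pooled_a = pooled_t(mean_diff, 16.0, 10, 4.0, 40)
t_unequal_a = unequal_variance_t(mean_diff, 16.0, 10, 4.0, 40)

# Case B: the smaller sample (n = 10) has the smaller variance.
t_pooled_b = pooled_t(mean_diff, 4.0, 10, 16.0, 40)
t_unequal_b = unequal_variance_t(mean_diff, 4.0, 10, 16.0, 40)

print(round(t_pooled_a, 2), round(t_unequal_a, 2))
print(round(t_pooled_b, 2), round(t_unequal_b, 2))
```

In Case A the unequal variance t is the smaller of the two, making rejection of H0 less likely. In Case B it is the larger, making rejection more likely.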

The Excel t Test Add-In

Excel provides two add-ins for a t test of two groups. To access them, go to the Data ribbon and, in the Analysis group, click the Data Analysis option. In the Data Analysis menu the tests are called t-Test: Two-Sample Assuming Equal Variances and t-Test: Two-Sample Assuming Unequal Variances. Let us look at each of these with the data from the breast cancer education experiment as the example. After selecting Data Analysis, select t-Test: Two-Sample Assuming Equal Variances. The dialog box that appears is shown in Figure 9.9. In this dialog box, the data for the experimental group are entered as the Variable 1 Range, whereas the data for the control group are entered as the Variable 2 Range. Leaving the Hypothesized Mean Difference blank is equivalent to entering a value of zero. The Labels box is checked because the first row contains the labels for the two data columns. The output will go on a new worksheet. Table 9.2 displays the steps for using Excel to calculate a t test.


Figure 9.9 Dialog box for t test for equal variance

Table 9.2 Formulas for Figure 9.12

Cell Formula
F2 =AVERAGE(C2:C41)
F3 =STDEV(C2:C41)
F4 =COUNT(C2:C41)
F5 =F3/SQRT(F4)
F6 =F2/F5
F7 =TDIST(F6,F4-1,1)

Clicking OK will produce the result of this t test on a new worksheet. It can be seen, by comparing Figures 9.7 and 9.10, that the results computed by the formula in Equation 9.7 and the results of the add-in in Figure 9.10 are identical. The add-in also provides some additional information, particularly the level at which H0 would be rejected for one- and two-tail tests (t Critical one tail and t Critical two tail).

Screenshot of the t-Test: Two-Sample Assuming Equal Variances table including experimental value for pooled variance.

Figure 9.10 Results of Excel t test for equal variance

Differences in Results of Excel t Tests

Consider, now, the add-in for a t test assuming unequal variance. As before, go to the Data ribbon and, in the Analysis group, click the Data Analysis option. Then select t-Test: Two-Sample Assuming Unequal Variances. The dialog box that appears is nearly identical to the one shown in Figure 9.9; the only difference is the title, which is now t-Test: Two-Sample Assuming Unequal Variances. When the add-in is invoked for the experimental data under consideration here, it produces the result shown in Figure 9.11.

Screenshot of the t-Test: Two-Sample Assuming Unequal Variances table without the experimental value for pooled variance.

Figure 9.11 Results of Excel t test for unequal variance

There are three differences between the results shown in Figure 9.10 and those shown in Figure 9.11. The first is in the calculated value of the t test. It is 3.30 in Figure 9.10 and 3.43 in Figure 9.11. As noted previously, the value of 3.30 was calculated using Equation 9.7, whereas the value of 3.43 was calculated using Equation 9.10. A second is the absence of a pooled variance ($s_p^2$) value in Figure 9.11. This is because no pooled variance is calculated for the unequal variance t test. Finally, there is a difference in the degrees of freedom between the two figures. The equation for the equal variance degrees of freedom is simply n1 + n2 − 2, whereas the equation for the unequal variance degrees of freedom is found by using Equation 9.12.

Here are the steps for using the Excel t test add-in:

  1. Go to the Data ribbon ⇨ Analysis Group ⇨ Data Analysis, and from the menu that appears select t-Test: Two-Sample Assuming Equal Variances.
  2. In the Variable 1 Range input box type or highlight $A$1:$A$26.
  3. In the Variable 2 Range input box type or highlight $B$1:$B$41.
  4. Check the Labels checkbox (the header labels were included in the Variable 1 Range and the Variable 2 Range).
  5. Click the New Worksheet Ply radio button.
  6. Click OK, and a table that resembles Figure 9.10 will appear.
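For readers who want to check the add-in's output outside Excel, the following sketch computes the main quantities reported by the equal variance test from two columns of raw data. The function name and the two short data lists are hypothetical and are used only to show the mechanics. The probability and critical values that Excel also reports are not computed here.

```python
import math
import statistics


def equal_variance_t_test(sample1, sample2):
    """Summary quantities for a two-sample t test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return {
        "mean1": mean1,
        "mean2": mean2,
        "variance1": var1,
        "variance2": var2,
        "pooled_variance": pooled_var,
        "df": n1 + n2 - 2,
        "t_stat": t_stat,
    }


# Hypothetical data, not the breast cancer education data from the text.
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [9.0, 10.0, 8.0, 11.0, 10.0, 12.0]

result = equal_variance_t_test(group1, group2)
for name, value in result.items():
    print(name, round(value, 4))
```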

9.3 A t Test for Related Data

Thus far, the discussion of t tests has focused on two questions. The first was whether a sample value could have come from a population with a value hypothesized a priori. The second was whether two groups or samples are the same or different. This section addresses a t test that can be used when two sets of data are related. In particular, this is a test of whether two measurements for the same group of people (or organizations, or any other identifiable entities) are similar or different.

Calculation of the Test

To examine this t test, consider a group of persons with adult-onset diabetes who have come under care at a diabetes clinic. The clinic is confident that the symptoms of diabetes—especially high blood sugar—can be controlled through a combination of exercise, weight loss, and diet and lifestyle changes. To assess this belief, the clinic has enrolled a group of 40 volunteers, all with hemoglobin A1c (HbA1c) measures above 7.8 (in general, HbA1c levels above 7.8 indicate some difficulties in controlling blood sugar levels). The average HbA1c level for this group of 40 persons, at the time of enrollment in the study, was 8.49. The persons were counseled on diet, exercise, and lifestyle changes and had monthly meetings with their counselors to track progress toward agreed-upon goals. At the end of a six-month period, the HbA1c levels for these persons were again measured. The data for this study (entirely fabricated) are given in Chpt 9-2.xls. The mean HbA1c level for the entire group after the six-month study period was 8.25. The question is whether this reduction in HbA1c represents a statistically significant change.

The initial hypothesis for this study would typically be:

  H0: The combination of exercise, weight loss, and diet and lifestyle changes has no effect on HbA1c levels
  or
  H0: $\mathrm{HbA1c}_b = \mathrm{HbA1c}_a$, where b refers to before and a to after.

Because there is some expectation that the HbA1c levels will be lowered by the combination of exercise, weight loss, and diet and lifestyle changes, the alternative hypothesis is:

  H1: HbA1c levels will be lower after the intervention than before
  or
  H1: $\mathrm{HbA1c}_b > \mathrm{HbA1c}_a$

The test of the hypothesis, H0, just given, is carried out using the formula in Equation 9.13.

$$t = \frac{\bar{d}}{s_{\bar{d}}} \qquad (9.13)$$

where $\bar{d}$ is the mean difference between the before and after measures, $s_{\bar{d}}$ is the standard error of the differences, and the degrees of freedom are n − 1, where n is the number of difference scores.
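The calculation for related data can be sketched as follows: take the difference for each pair, then divide the mean difference by its standard error. The function name and the five before and after values are hypothetical and serve only to show the steps. They are not the clinic data described in the text.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference: sample standard deviation / sqrt(n)
    std_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical HbA1c-like values for five people.
before = [8.4, 8.9, 8.1, 9.0, 8.6]
after = [8.1, 8.7, 8.2, 8.6, 8.5]

t_value, df = paired_t(before, after)
print(round(t_value, 2), df)
```

A positive t here means the before values are, on average, higher than the after values.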

A partial file of the data for this before and after t test is given in Figure 9.12. The figure shows the before-data measure in column A, the after-data measure in column B, and the difference between the before- and after-data measures in column C. The mean difference value is given in cell F2 as 0.24, and the standard error of the difference is given in cell F5. The t test is shown in cell F6 (its calculation is shown in the formula line at the top). The probability of the occurrence of a t value as large as that shown in cell F6 is given in cell F7. The =TDIST() function is used with the t value of 2.4, 39 degrees of freedom and one tail (because the only result of interest is a result where after is less than before). The probability of getting a t value as large as 2.4 with a one-tail test is about 0.01. This means that we would reject the hypothesis that the HbA1c levels are equal before and after the intervention, and we would, in fact, accept the alternative, which is that HbA1c levels are lower after the intervention. Table 9.2 displays the formulas for Figure 9.12.


Figure 9.12 Calculation of before and after t test
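The probability that =TDIST() returns can be approximated without Excel. The sketch below is one possible approach, not Excel's own algorithm: it integrates the density of the t distribution numerically with Simpson's rule. The function names and the `steps` parameter are assumptions made for this illustration.

```python
import math


def t_density(x, df):
    """Probability density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))


def t_tail_probability(t_value, df, tails=1, steps=2000):
    """Approximate P(T > t) for tails=1 or P(|T| > t) for tails=2, t >= 0."""
    if t_value < 0 or tails not in (1, 2):
        raise ValueError("t_value must be >= 0 and tails must be 1 or 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t_value / steps
    total = t_density(0.0, df) + t_density(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    # Area between 0 and t, subtracted from the 0.5 that lies above zero.
    upper_tail = 0.5 - total * h / 3
    return tails * upper_tail


# Values quoted in the text: t = 2.4 with 39 degrees of freedom, one tail.
p_paired = t_tail_probability(2.4, 39, tails=1)
print(round(p_paired, 3))
```

The result rounds to about 0.01, in line with the probability reported for the before and after test.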

Excel Add-In for Related Data

Excel provides an add-in for a t test of the difference between related observations of this type. It is called t-Test: Paired Two Sample for Means. The test is invoked in the same way as the t tests discussed in the fourth subsection of Section 9.2. When the test is invoked, you first see a dialog box like the one shown in Figure 9.9. The t test that results from the data partially shown in Figure 9.12 is shown in Figure 9.13. That figure shows the mean before and the mean after, although these are not used directly in the calculation of the t test. It also shows the before and after variances, again not used in the t test. Also shown is the Pearson correlation between the before and after measures. The t statistic and the probability of t are the same as those given in Figure 9.12.

Screenshot of the t-test: Paired Two Sample for Means table listing before and after values, including Pearson correlation.

Figure 9.13 Excel add-in for before and after t test
