Chapter 13

Comparing Proportions and Analyzing Cross-Tabulations

In This Chapter

arrow Testing for association between categorical variables with the Pearson chi-square and Fisher Exact tests

arrow Adjusting for confounders with the Mantel-Haenszel test for stratified fourfold tables

arrow Spotting trends across ordinal (sequenced) categories with the Kendall tau test

arrow Estimating sample sizes for tests of association

Suppose you’re conducting a clinical trial of a new treatment for an acute disease with a high mortality rate, for which no effective treatment currently exists. You study 100 consecutive subjects with this condition and randomly assign 60 of them to receive the new treatment and 40 to receive a placebo or sham treatment. Then you record whether each subject lives or dies. Your data file has two dichotomous categorical variables: the treatment group (drug or placebo) and the outcome (lives or dies).

You find that 30 of the 40 untreated (placebo) subjects died (a 75 percent mortality rate), while only 27 of the 60 treated subjects died (45 percent mortality). The drug appears to reduce mortality by about 30 percentage points. But can you be sure this isn’t just a random sampling fluctuation?

Data from two (possibly associated) categorical variables is generally summarized as a cross-tabulation (also called a cross-tab or a two-way table). The rows of the cross-tab represent the different categories (or levels) of one variable, and the columns represent the different levels of the other variable. The cells of the table contain the count of the number of subjects with the indicated levels for the row and column variables. If one variable can be thought of as the “cause” or “predictor” of the other, the cause variable becomes the rows, and the “outcome” or “effect” variable becomes the columns. If the cause and outcome variables are both dichotomous (have only two levels), as they are in this example, then the cross-tab has two rows and two columns (and therefore four cells of counts) and is referred to as a 2-by-2 (or 2x2) cross-tab, or a fourfold table. Cross-tabs are usually displayed with an extra row at the bottom and an extra column at the right to contain the sums of the cells in the rows and columns of the table. These sums are called marginal totals, or just marginals.

Comparing proportions based on a fourfold table is the simplest example of testing the association between two categorical variables. More generally, the variables can have any number of categories, so the cross-tab can be larger than 2x2, with many rows and many columns. But the basic question to be answered is always the same: Is the spread of numbers across the columns so different from one row to the next that the numbers can’t be reasonably explained away as random fluctuations?

In this chapter, I describe a variety of tests you can use to answer this question: the Pearson chi-square test, the Fisher Exact test, the Mantel-Haenszel test, and the Kendall test. I also explain how to estimate power and sample sizes for the chi-square and Fisher Exact tests.

remember.eps You can run all the tests in this chapter either from case-level data in a database (one record per subject) or from data that has already been summarized in the form of a cross-tab:

check.png Most statistical software is set up to work with case-level data. Your file needs to have two categorical variables representing the row and column variables whose relationship you want to test. If you’re running a Mantel-Haenszel test, your file also needs to have another variable representing the stratum (see the Mantel-Haenszel section later in this chapter). You merely have to tell the software which test (or tests) you want to run and identify the variables to be used for the test. Flip to Chapter 4 for an introduction to statistical software.

check.png Most online calculators expect you to have already cross-tabulated the data. These calculators usually present a screen showing an empty table, and you enter the counts into the table’s cells.
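To make this concrete, here's a minimal R sketch (R is one of the statistical packages described in Chapter 4) showing both starting points for this chapter's drug-trial example. The variable and object names (treatment, outcome, xtab, and so on) are just placeholders for whatever your own data file uses; think of this as a sketch of the workflow rather than required code.

# Case-level data: one record per subject, two categorical variables
treatment <- c(rep("Drug", 60), rep("Placebo", 40))
outcome   <- c(rep("Lived", 33), rep("Died", 27),    # the 60 drug-treated subjects
               rep("Lived", 10), rep("Died", 30))    # the 40 placebo subjects

# Cross-tabulate the two variables; addmargins() adds the marginal totals
xtab <- table(treatment, outcome)
addmargins(xtab)

# If you already have the cross-tab, just enter the counts directly
xtab2 <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE,
                dimnames = list(c("Drug", "Placebo"), c("Lived", "Died")))

Either form can then be handed to the test functions described in the rest of this chapter (for example, chisq.test(xtab) or fisher.test(xtab2)).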

Examining Two Variables with the Pearson Chi-Square Test

The most commonly used statistical test of association between two categorical variables is called the chi-square test of association. This classic test was developed around 1900 by Karl Pearson and has been a mainstay of practical statistical analysis ever since. It’s called the chi-square test because it involves calculating a number (a test statistic) that fluctuates in accordance with the chi-square distribution (see Chapter 25). Many other statistical tests also use the chi-square distribution, but the test of association is by far the most popular, so whenever I refer to a chi-square test without specifying which one, I’m referring to the Pearson chi-square test of association between two categorical variables.

In the following sections, I explain how the chi-square test works, list some pros and cons of the test, and describe a modification you can make to the test.

Understanding how the chi-square test works

You don’t have to know the details of the chi-square test if you have a computer do the calculations for you (which I always recommend), so technically, you don’t have to read this section. But you’ll have a better appreciation for the strengths and limitations of this test if you know how it works. Here, I walk you through the steps of conducting a chi-square test.

Calculating observed and expected counts

remember.eps All statistical significance tests start with a null hypothesis (H0) that asserts that no real effect is present in the population, and any effect you think you see in your sample is due only to random fluctuations. (See Chapter 3 for more information.) The H0 for the chi-square test asserts that there’s no association between the row variable and the column variable, so you should expect the relative spread of cell counts across the columns to be the same for each row.

Figure 13-1 shows how this works out for the observed data taken from the example in this chapter’s introduction. You can see from the marginal “Total” row that the overall mortality rate (for both treatment groups combined) is 57/100, or 57 percent.


Figure 13-1: The observed results of a trial of a new drug for a high-mortality disease.

What if the true mortality rate for this condition is 57 percent, and the drug truly has no effect on mortality?

check.png In the drug-treated group, you’d expect about 34.2 deaths (57 percent of 60), with the remaining 25.8 subjects surviving. (Expected outcomes are usually not whole numbers.)

check.png In the placebo group, you’d expect about 22.8 deaths (57 percent of 40), with the remaining 17.2 subjects surviving.

These expected outcomes are displayed in Figure 13-2.


Figure 13-2: Expected cell counts if the null hypothesis is true (the drug does not affect survival).

Notice that the expected counts table in Figure 13-2 has the same marginal totals (row totals and column totals) as the observed counts table in Figure 13-1; the difference is that under H0 (no association between row variable and column variable), the relative spread of expected counts across the columns is the same for each row (and the relative spread of counts down the rows is the same for each column). In other words, the “expected” numbers in both rows (drug and placebo) have the same relative spread between lived and died.
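If you'd like to reproduce these expected counts yourself, each one is simply the cell's row total times its column total, divided by the grand total (the same general formula given later in this chapter). In R, that's one line of matrix arithmetic; the object names are arbitrary:

obs <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE,
              dimnames = list(c("Drug", "Placebo"), c("Lived", "Died")))

# Expected count for each cell = (row total x column total) / grand total
exp_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)
exp_counts    # Drug: 25.8 lived, 34.2 died; Placebo: 17.2 lived, 22.8 died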

Now that you have observed and expected counts, you’re no doubt curious as to how they differ. You can subtract each expected count from the observed count in each cell to get a difference table (observed – expected), as shown in Figure 13-3.

Because the observed and expected tables in Figures 13-1 and 13-2 always have the same marginal totals, the marginal totals in the difference table are all equal to zero. All four cells in the center of this difference table have the same absolute value (7.2), with a plus and a minus value in each row and each column.

remember.eps The pattern just described is always the case for 2x2 tables. For larger tables, the difference numbers aren’t all the same, but they always sum up to zero for each row and each column.


Figure 13-3: Differences between observed and expected cell counts if the null hypothesis is true.

The values in the difference table in Figure 13-3 show how far off from H0 your observed data is. The question remains: Are those difference values larger than what may have arisen from random fluctuations alone if H0 is really true? You need some kind of “yardstick” by which to judge how unlikely those difference values are. Recall from Chapter 9 that the standard error (SE) expresses the general magnitude of random sampling fluctuations, so the SE makes a good yardstick for judging the size of the differences you may expect to see from random fluctuations alone. It turns out that the SE of the differences is approximately equal to the square root of the expected counts. The rigorous proof of this is too complicated for most mortals to understand, but a pretty simple informal explanation is based on the idea that random event occurrences often follow the Poisson distribution, for which the SE of the event count equals the square root of the expected count (as I explain in Chapter 9).

Summarizing and combining scaled differences

For the upper-left cell in the cross-tab (drug-treated subjects who lived), you see the following:

check.png The observed count (Ob) is 33.

check.png The expected count (Ex) is 25.8.

check.png The difference (Diff) is 33 – 25.8, or +7.2.

check.png The SE of the difference is the square root of 25.8, which is about 5.08.

You can “scale” the Ob–Ex difference by dividing it by the SE yardstick, getting the ratio (Diff/SE) = +7.2/5.08, or 1.42. This means that the difference between the observed number of drug-treated subjects who lived and the number you would have expected if the drug had no effect on survival is about 1.42 times as large as you would have expected from random sampling fluctuations alone. You can do the same calculation for the other three cells and summarize these scaled differences, as shown in Figure 13-4.


Figure 13-4: Differences between observed and expected cell counts, scaled according to the estimated standard errors of the differences.
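To reproduce the scaled differences in Figure 13-4, you can carry the same calculation one step further in R, dividing each (observed – expected) difference by the square root of its expected count:

obs        <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE,
                     dimnames = list(c("Drug", "Placebo"), c("Lived", "Died")))
exp_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Scaled difference for each cell = (observed - expected) / sqrt(expected)
(obs - exp_counts) / sqrt(exp_counts)    # roughly 1.42, -1.23, -1.74, and 1.51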

The next step is to combine these individual scaled differences into an overall measure of the difference between what you observed and what you would have expected if the drug truly did not affect survival. You can’t just add them up, because the negative and positive differences would tend to cancel each other out. You want all differences (positive and negative) to contribute to the overall measure of how far your observations are from what you expected under H0. Statisticians love to sum the squares of differences (because squares are always positive), and that’s exactly what’s done in the chi-square test. Figure 13-5 shows the squared scaled differences, which are calculated from the observed and expected counts in Figures 13-1 and 13-2 using the formula (Ob – Ex)²/Ex, not by squaring the rounded-off numbers in Figure 13-4, which would be less accurate.


Figure 13-5: Components of the chi-square statistic: squares of the scaled differences.

remember.eps You then add up these squared scaled differences: 2.01 + 1.52 + 3.01 + 2.27 = 8.81. This sum is an excellent test statistic to measure the overall departure of your data from the null hypothesis:

check.png If the null hypothesis is true (the drug does not affect survival), this statistic should be quite small.

check.png The more effect the drug has on survival (in either direction), the larger this statistic should be.

Determining the p value

The only remaining task is to determine the p value — the probability that random fluctuations alone, in the absence of any true effect of the drug on survival, could lead to a value of 8.81 or greater for this test statistic. (I introduce p values in Chapter 3.) Once again, the rigorous proof is very complicated, but an informal explanation goes something like this:

When the expected cell counts are very large, the Poisson distribution becomes very close to a normal distribution (see Chapter 25 for more on the Poisson distribution). If the H0 is true, each scaled difference should be (approximately) a normally distributed random variable with a mean of zero (because you subtract the expected value from the observed value) and a standard deviation of 1 (because you divide by the standard error). The sum of the squares of one or more normally distributed random numbers is a number that follows the chi-square distribution (also covered in Chapter 25). So the test statistic from this test should follow the chi-square distribution (which is why this is called the chi-square test), and you should be able to look up your 8.81 in a chi-square table to get the p value for the test.

Now, the chi-square distribution is really a family of distributions, depending on a number called the degrees of freedom, usually abbreviated d.f. or df, or by the Greek lowercase letter nu (ν), which tells how many independent normally distributed numbers were squared and added up.

What’s the df for the chi-square test? It depends on the number of rows and columns in the cross-tab. For the 2x2 cross-tab (fourfold table) in this example, you added up the four values in Figure 13-5, so you may think that you should look up the 8.81 chi-square value with 4 df. But you’d be wrong. Note the italicized word independent in the preceding paragraph. And keep in mind that the differences (Ob – Ex) in any row or column always add up to zero. The four terms making up the 8.81 total aren’t independent of each other. It turns out that the chi-square test statistic for a fourfold table has only 1 df, not 4. In general, an N-by-M table, with N rows, M columns, and therefore N × M cells, has only (N – 1)(M – 1) df because of the constraints on the row and column sums. Don’t feel bad if this wrinkle caught you by surprise — even Karl Pearson (the guy who invented the chi-square test) got that part wrong!

So, referring to a chi-square table (or, better yet, having the computer calculate the p value for you in any statistical software package), the p value for chi-square = 8.81, with 1 df, is 0.003. This means that there’s only a 0.003 probability, or about 1 chance in 333 (because 1/0.003 = 333), that random fluctuations could produce such an impressive apparent performance if the drug truly had no effect on survival. So your conclusion would be that the drug is associated with a significant reduction in mortality.
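If you'd like to check this lookup yourself, R's pchisq function returns the upper-tail probability of the chi-square distribution directly, and running chisq.test without its continuity correction reproduces the whole Pearson calculation from the observed counts:

# Probability that a chi-square variable with 1 df comes out 8.81 or larger
pchisq(8.81, df = 1, lower.tail = FALSE)    # about 0.003

# The built-in test gives the same statistic and p value
obs <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE)
chisq.test(obs, correct = FALSE)            # chi-square about 8.81, df = 1, p about 0.003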

Putting it all together with some notation and formulas

tip.eps The calculations of the Pearson chi-square test can be summarized concisely using the cell-naming conventions in Figure 13-6, along with the standard summation notation described in Chapter 2.


Figure 13-6: A general way of naming the cells of a cross-tab table.

Using these conventions, the basic formulas for the Pearson chi-square test are as follows:

check.png Expected values: 9781118553992-eq13002.eps, i = 1, 2, … N; j = 1, 2, … M

check.png Chi-square statistic: 9781118553992-eq13003.eps

check.png Degrees of freedom: df = (N – 1)(M – 1)

where i and j are array indices that indicate the row and column, respectively, of each cell.
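In code, these formulas translate almost line for line. Here's a compact R function, written just for illustration rather than taken from any particular package, that computes the Pearson chi-square statistic, its degrees of freedom, and its p value for a cross-tab of any size:

pearson_chisq <- function(obs) {
  # obs: a matrix of observed counts, with N rows and M columns
  exp_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)  # expected counts
  statistic  <- sum((obs - exp_counts)^2 / exp_counts)        # chi-square statistic
  df         <- (nrow(obs) - 1) * (ncol(obs) - 1)             # degrees of freedom
  p_value    <- pchisq(statistic, df, lower.tail = FALSE)
  list(statistic = statistic, df = df, p.value = p_value)
}

# The fourfold table from Figure 13-1 gives chi-square = 8.81, df = 1, p = 0.003
pearson_chisq(matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE))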

Pointing out the pros and cons of the chi-square test

The Pearson chi-square test is so popular for several reasons:

check.png The calculations are fairly simple and can even be carried out by hand, although I’d never recommend that. They can easily be programmed in Excel; several web pages can perform the test; and it has been implemented on PDAs, smartphones, and tablets. Almost every statistical software package (including the ones in Chapter 4) can perform the chi-square test for cross-tabulated data.

check.png The test works for tables with any number of rows and columns, and it easily handles cell counts of any magnitude. The calculations are almost instantaneous on a computer for tables of any size and counts of any magnitude.

warning_bomb.eps But the chi-square test has some shortcomings:

check.png It’s not an exact test. The p value it produces is only approximate, so using p < 0.05 as your criterion for significance doesn’t necessarily guarantee that your Type I error rate (the chance of falsely claiming significance) will be only 5 percent (see Chapter 3 for an introduction to Type I errors). It’s quite accurate when all the cells in the table have large counts, but it becomes unreliable when one or more cell counts is very small (or zero). There are different suggestions as to how many counts you need in order to confidently use the chi-square test, but the simplest rule is that you should have at least five observations in each cell of your table (or better yet, at least five expected counts in each cell).

check.png The chi-square test isn’t good at detecting small but steady progressive trends across the successive categories of an ordinal variable (see Chapter 4 if you’re not sure what ordinal is). It may give a significant result if the trend is strong enough, but it’s not designed specifically to work with ordinal categorical data.

Modifying the chi-square test: The Yates continuity correction

For the special case of fourfold tables, a simple modification to the chi-square test, called the Yates continuity correction, gives more reliable p values. The correction consists of subtracting 0.5 from the magnitude of the (Ob – Ex) difference before squaring it.

remember.eps The Yates correction to the Pearson chi-square test should always be used for fourfold tables but should not be used for tables with more than two rows or more than two columns.

For the sample data in the earlier section Understanding how the chi-square test works, the application of the Yates correction changes the 7.20 (or –7.20) difference in each cell to 6.70 (or –6.70). This lowers the chi-square value from 8.81 down to 7.63 and increases the p value from 0.0030 to 0.0057, which is still very significant — the chance of random fluctuations producing such an apparent effect in your sample is only about 1 in 175 (because 1/0.0057 = 175).
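In most statistical software, the Yates correction is either applied automatically to fourfold tables or controlled by a single option. In R, for instance, chisq.test applies it to 2x2 tables unless you tell it not to, so you can see both versions of this example side by side:

obs <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE)

chisq.test(obs, correct = FALSE)   # uncorrected Pearson: chi-square about 8.81, p about 0.0030
chisq.test(obs)                    # Yates-corrected:     chi-square about 7.63, p about 0.0057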

Focusing on the Fisher Exact Test

The Pearson chi-square test that I describe in the earlier section Examining Two Variables with the Pearson Chi-Square Test isn’t the only way to analyze cross-tabulated data. R. A. Fisher (probably the most famous statistician of all time) invented another test in the 1920s that gives the exact p value for tables with large or small cell counts (even cell counts of zero!). Not surprisingly, this test is called the Fisher Exact test. In the following sections, I show you how the Fisher Exact test works, and I note both its pros and cons.

Understanding how the Fisher Exact test works

You don’t have to know the details of the Fisher Exact test if you have a computer do the calculations for you (which I always recommend), so you don’t have to read this section. But you’ll have a better appreciation for the strengths and limitations of this test if you know how it works.

remember.eps This test is, conceptually, pretty simple. You look at every possible table that has the same marginal totals as your observed table. You calculate the exact probability (Pr) of getting each individual table using a formula that, for a fourfold table (using the notation from Figure 13-6), is

9781118553992-eq13004.eps

Those exclamation points indicate calculating the factorials of the cell counts (see Chapter 2). For the example in Figure 13-1, the observed table has a probability of

9781118553992-eq13005.eps

Other possible tables with the same marginal totals as the observed table have their own Pr values, which may be larger than, smaller than, or equal to the Pr value of the observed table. The Pr values for all possible tables with a specified set of marginal totals always add up to exactly 1.

The Fisher Exact test p value is obtained by adding up the Pr values for all tables that are at least as different from the H0 as your observed table. For a fourfold table, that means adding up all the Pr values that are less than (or equal to) the Pr value for your observed table.

For the example in Figure 13-1, the p value comes out to 0.00385, which means that there’s only 1 chance in 260 (because 1/0.00385 = 260) that random fluctuations could have produced such an apparent effect in your sample.
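Running the test in software is no harder than running the chi-square test. In R, for example, fisher.test accepts the same matrix of counts and reproduces the p value quoted above (it also reports an odds ratio and confidence interval, which Chapter 14 discusses):

obs <- matrix(c(33, 27, 10, 30), nrow = 2, byrow = TRUE)
fisher.test(obs)    # two-sided p value of about 0.0038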

Noting the pros and cons of the Fisher Exact test

The big advantages of the Fisher Exact test are as follows:

check.png It gives the exact p value.

check.png It is exact for all tables, with large or small (or even zero) cell counts.

warning_bomb.eps So why do people still use the chi-square test, which is approximate and doesn’t work for tables with small cell counts? Why doesn’t everyone always use the Fisher test? Nowadays many statisticians are recommending that everyone use the Fisher test instead of the chi-square test whenever possible. But there are several problems with the Fisher Exact test:

check.png The calculations are a lot more complicated, especially for tables larger than 2x2. Many statistical software packages either don't offer the Fisher Exact test or offer it only for fourfold tables. Several interactive web pages perform the Fisher Exact test for fourfold tables (including StatPages.info/ctab2x2.html), but at this time I'm not aware of any web pages that offer that test for larger tables. Only the major statistical software packages (like SAS, SPSS, and R, described in Chapter 4) offer the Fisher Exact test for tables larger than 2x2.

check.png The calculations can become numerically unstable for large cell counts, even in a 2x2 table. The equations involve the factorials of the cell counts and marginal totals, and these can get very large — even for modest sample sizes — often exceeding the largest number that a computer is capable of dealing with. Many programs and web pages that offer the Fisher Exact test for fourfold tables fail with data from more than 100 subjects. (The web page StatPages.info/ctab2x2.html works with cell counts of any size.)

check.png The exact calculations can become impossibly time consuming for larger tables and larger cell counts. Even if a program can, in theory, do the calculations, it may take hours — or even centuries — to carry them out!

check.png The Fisher Exact test is no better than the chi-square test at detecting gradual trends across ordinal categories.

Calculating Power and Sample Size for Chi-Square and Fisher Exact Tests

Note: The basic ideas of power and sample-size calculations are described in Chapter 3, and you should review that information before going further here.

Suppose you’re planning a study to test whether giving a certain dietary supplement to a pregnant woman reduces her chances of developing morning sickness during the first trimester (the first three months) of pregnancy. This condition normally occurs in 80 percent of pregnant women, and if the supplement can reduce that incidence rate to only 60 percent, it’s certainly worth knowing about. So you plan to enroll a group of pregnant women and randomize them to receive either the dietary supplement or a placebo that looks, smells, and tastes exactly like the supplement. You’ll have them take the product during their first trimester, and you’ll record whether they experience morning sickness during that time (using explicit criteria for what constitutes morning sickness). Then you’ll tabulate the results in a 2x2 cross-tab (with “supplement” and “placebo” defining the two rows, and “did” and “did not” experience morning sickness heading the two columns). And you’ll test for a significant effect with a chi-square or Fisher Exact test. How many subjects must you enroll to have at least an 80 percent chance of getting p < 0.05 on the test if the supplement truly can reduce the incidence from 80 percent to 60 percent?

tip.eps You have several ways to estimate the required sample size. The quickest one is to refer to the sample-size table for comparison of proportions in this book's Cheat Sheet at www.dummies.com/cheatsheet/biostatistics. But the most general and most accurate way is to use power/sample-size software such as PS or GPower (see Chapter 4). Or, you can use the online sample-size calculator at StatPages.info/proppowr.html, which produces the same results. Using that web page, the calculation is set up as shown in Figure 13-7.


Figure 13-7: The online sample-size calculator for comparing two proportions with a chi-square or Fisher Exact test.

You fill in the five parameters in the upper block, hit the Compute button, and see the results in the lower block. This web page provides two different calculations:

check.png The classical calculation, which applies to the uncorrected chi-square test, says that 81 analyzable subjects in each group (162 analyzable subjects altogether) are required.

check.png The continuity-corrected calculation, which applies to the Yates chi-square or Fisher Exact test, says that 91 analyzable subjects in each group (182 analyzable subjects altogether) are required.

You should base your planning on the larger value from the continuity-corrected calculation, just to be on the safe side.

tip.eps You need to enroll additional subjects to allow for possible attrition during the study. If you expect x percent of the subjects to drop out, your enrollment should be:

Enrollment = 100 × Analyzable Number/(100 – x)

So if you expect 15 percent of enrolled subjects to drop out and therefore be non-analyzable, you need to enroll 100 × 182/(100 – 15), or about 214, subjects.
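If you'd rather script this calculation than use a web page, base R's power.prop.test function performs the classical (uncorrected) calculation, and the usual continuity-correction adjustment can then be applied to its answer. The sketch below gives approximately the same numbers quoted above; results from the web page or from PS and GPower may differ by a subject or so because of rounding:

# Classical (uncorrected) sample size per group: about 81 analyzable subjects
ss <- power.prop.test(p1 = 0.80, p2 = 0.60, sig.level = 0.05, power = 0.80)
n_classical <- ss$n

# Standard continuity-correction adjustment (Yates chi-square or Fisher Exact): about 91
d <- abs(0.80 - 0.60)
n_corrected <- (n_classical / 4) * (1 + sqrt(1 + 4 / (d * n_classical)))^2

# Inflate for an expected 15 percent dropout rate: about 214 subjects to enroll
100 * 2 * ceiling(n_corrected) / (100 - 15)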

Analyzing Ordinal Categorical Data with the Kendall Test

remember.eps Neither the chi-square nor the Fisher Exact test (I describe both earlier in this chapter) is designed for testing the association between two ordinal categorical variables — categories that can be put into a natural order. These tests are insensitive to the order of the categories; if you shuffle the columns (or rows) into some other sequence, they produce the same p value. This characteristic makes the chi-square and Fisher Exact tests insensitive to gradual trends across the ordinal categories.

As an example, consider a study in which you test a new drug for a chronic progressive disease (one that tends to get worse over time) at two different doses along with a placebo. You record the outcome after six months of treatment as a three-way classification: improved, unchanged, or worsened. You can think of treatment as an ordinal categorical variable — placebo < low-dose < high-dose — and outcome as an ordinal variable — worsened < unchanged < improved. A study involving 100 test subjects may produce the results shown in Figure 13-8.


Figure 13-8: An association between two ordinal variables: dose level and response.

Notice that

check.png Most placebo subjects got worse, some stayed the same, and a few got better. This reflects the general downward course of an untreated progressive disease.

check.png The low-dose subjects didn’t seem to change much, on average, with roughly equal numbers getting better, getting worse, and remaining the same. The low-dose drug may at least be showing some tendency to counteract the general progressive nature of the disease.

check.png The high-dose subjects seemed to be getting better more often than getting worse, indicating that at higher doses, the drug may be able to actually reverse the usual downward course of the disease.

So an encouraging pattern does appear in the data. But both the chi-square and Fisher Exact tests conclude that there’s no significant association between dose level of the drug and outcome (p = 0.153 by chi-square test and 0.158 by Fisher Exact test). Why can’t the tests see what you can see by looking at the table? Because both of these tests, by the way they calculate their p value, are unable to notice a progressive trend across the three rows and the three columns.

Fortunately, other tests are designed specifically to spot trends in ordinal data. One of the most common ones involves calculating a test statistic called Kendall’s tau. The basic idea is to consider each possible pair of subjects, determining whether those two subjects are concordant or discordant with the hypothesis that the two variables are positively correlated. In this example, it’s like asking whether the subject who received a higher dose of the drug also had a better outcome.

For example, if one subject in the pair received the placebo and was unchanged while the other subject received the low dose and got better, that pair would be concordant. But if one subject received a low dose and got better while another subject received a high dose and remained unchanged, that pair would be considered discordant.

The Kendall test counts how many pairs are concordant, discordant, or noninformative (where both subjects are in the same category for one or both variables). The test statistic is based on the difference between the number of concordant and discordant pairs divided by a theoretical estimate of the standard error of that difference. The test statistic is then looked up in a table of the normal distribution to obtain a p value.

For the sample data in Figure 13-8, the Kendall test (using the R statistical software package) gives p = 0.010, which, being less than 0.05, indicates a significant association between dose level and outcome. The Kendall test can spot the slight but consistent trend across the columns and down the rows of the table, whereas the chi-square and Fisher Exact tests can’t.
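Most statistical packages run this test on case-level data, with each ordinal variable coded as a number that respects its natural order. The R sketch below shows the general recipe: expand the cross-tab into one row per subject and call cor.test with method = "kendall". The cell counts here are made-up placeholders (the actual counts behind Figure 13-8 aren't reproduced in the text), so the p value from your own data will differ:

# Hypothetical 3x3 dose-by-outcome counts -- NOT the actual Figure 13-8 data
counts <- matrix(c(10, 6, 4,     # placebo:   worsened, unchanged, improved
                    7, 7, 6,     # low dose
                    5, 6, 9),    # high dose
                 nrow = 3, byrow = TRUE)

# Expand to one row per subject, coding each ordinal variable as 1, 2, 3
dose    <- rep(rep(1:3, each = 3),  times = as.vector(t(counts)))
outcome <- rep(rep(1:3, times = 3), times = as.vector(t(counts)))

# Kendall tau test for a trend between the two ordinal variables
cor.test(dose, outcome, method = "kendall")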

Studying Stratified Data with the Mantel-Haenszel Chi-Square Test

All the tests I describe earlier in this chapter examine the relationship between two categorical variables. Sometimes, however, one or more “nuisance” variables can get in the way of your analysis. Building on the example I use at the beginning of this chapter, suppose you’ve tested your new drug in three countries. And suppose that because of differences in demographics, healthcare, climate, and so on, the mortality of the disease tends to be different in each of the three countries. Furthermore, suppose that there’s a slight imbalance between the number of drug and placebo subjects in each country. The country would be considered a confounder of the relationship between treatment and survival. Confounding can obscure real effects or produce spurious apparent effects when none are truly present. So you want some way to control for this confounder (that is, mathematically compensate for any effect it might have on the observed mortality) in your analysis.

The most general way to handle confounding variables is with multivariate regression techniques that I describe in Chapter 19. Another way is by stratification, in which you split your data file into two or more strata on the basis of the values of the confounder so that cases within each stratum have the same (or nearly the same) value for the confounder (or confounders). You then analyze the data within each stratum and pool the results for all the strata.

When you analyze the relationship between two dichotomous categorical variables, you can control for one or more confounders using the Mantel-Haenszel (MH) chi-square test. This test is simple to set up, and the results are usually easy to interpret, so it’s often the preferred way to analyze fourfold tables when you want to adjust for confounding variables.

To run an MH test, you first create a separate fourfold table for each stratum. Suppose that your data, broken down by country, looks like Figure 13-9.


Figure 13-9: Results of a trial of a new drug for a high-mortality disease, stratified by country.

Conceptually, the MH test works by estimating an odds ratio for each country, pooling those estimates into an overall odds ratio for all countries, and testing whether the pooled odds ratio is significantly different from 1. (An odds ratio is a measure of how much the spread of counts across the columns differs between the rows, with a value of 1 indicating no difference at all; see Chapter 14 for details.)

Using the R statistical package, an MH test on the data in Figure 13-9 produces a p value of 0.0068, which indicates that there’s only about 1 chance in 147 (because 1/0.0068 = 147) that random fluctuations could produce such an apparent effect in your sample.

Like the chi-square test, the Mantel-Haenszel test is only an approximation. It’s most commonly used for 2x2 tables, although some software can run an extended form of the test for tables larger than 2x2, provided the categorical variables are ordinal (see the section Analyzing Ordinal Categorical Data with the Kendall Test).
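In R, the mantelhaen.test function expects the stratified fourfold tables as a 2x2xK array, one 2x2 layer per stratum. The counts below are made-up placeholders; they add up to the overall table in Figure 13-1, but they are not the actual per-country breakdown shown in Figure 13-9 and are here only to show the mechanics:

# Hypothetical treatment x outcome x country array -- NOT the actual Figure 13-9 data
strata <- array(c(20,  5, 15, 12,    # country A: lived (drug, placebo), died (drug, placebo)
                   8,  3,  7, 10,    # country B
                   5,  2,  5,  8),   # country C
                dim = c(2, 2, 3),
                dimnames = list(Treatment = c("Drug", "Placebo"),
                                Outcome   = c("Lived", "Died"),
                                Country   = c("A", "B", "C")))

# Pooled (common) odds ratio and Mantel-Haenszel chi-square p value
mantelhaen.test(strata)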
