8.4 Testing Categorical Probabilities: Two-Way (Contingency) Table

In Section 8.3, we introduced the multinomial probability distribution and considered data classified according to a single criterion. We now consider multinomial experiments in which the data are classified according to two criteria—that is, classification with respect to two qualitative factors.

Consider a study similar to one in the Journal of Marketing on the impact of using celebrities in television advertisements. The researchers investigated the relationship between the gender of a viewer and the viewer’s brand awareness. Three hundred TV viewers were randomly selected and each asked to identify products advertised by male celebrity spokespersons. The data are summarized in the two-way table shown in Table 8.5. This table, called a contingency table, presents multinomial count data classified on two scales, or dimensions, of classification: gender of viewer and brand awareness.

Table 8.5

		Gender
		Male	Female	Totals
Brand Awareness	Could Identify Product	95	41	136
	Could Not Identify Product	50	114	164
	Totals	145	155	300

Data Set: CELEB

Teaching Tip

Point out that the data have been collected under two classifications. Point out the difference between this type of data collection and the type used in the one-way tables.

The symbols representing the cell counts for the multinomial experiment in Table 8.5 are shown in Table 8.6a, and the corresponding cell, row, and column probabilities are shown in Table 8.6b. Thus, $n_{11}$ represents the number of viewers who are male and could identify the brand, and $p_{11}$ represents the corresponding cell probability. Note the symbols for the row and column totals and also the symbols for the probability totals. The latter are called marginal probabilities for each row and column. The marginal probability $p_{r1}$ is the probability that a TV viewer identifies the product; the marginal probability $p_{c1}$ is the probability that a TV viewer is male. Thus,

$p_{r1} = p_{11} + p_{12}$ and $p_{c1} = p_{11} + p_{21}$

Table 8.6a

		Gender
		Male	Female	Totals
Brand Awareness	Could Identify Product	$n_{11}$	$n_{12}$	$R_{1}$
	Could Not Identify Product	$n_{21}$	$n_{22}$	$R_{2}$
	Totals	$C_{1}$	$C_{2}$	$n$

Table 8.6b

		Gender
		Male	Female	Totals
Brand Awareness	Could Identify Product	$p_{11}$	$p_{12}$	$p_{r1}$
	Could Not Identify Product	$p_{21}$	$p_{22}$	$p_{r2}$
	Totals	$p_{c1}$	$p_{c2}$	1

We can see, then, that this really is a multinomial experiment with a total of 300 trials, $(2)(2) = 4$ cells or possible outcomes, and probabilities for each cell as shown in Table 8.6b. Since the 300 TV viewers are randomly chosen, the trials are considered independent and the probabilities are viewed as remaining constant from trial to trial.

Suppose we want to know whether the two classifications of gender and brand awareness are dependent. That is, if we know the gender of the TV viewer, does that information give us a clue about the viewer’s brand awareness? In a probabilistic sense, we know (Chapter 3) that the independence of events A and B implies that $P(A \cap B) = P(A) P(B)$. Similarly, in the contingency table analysis, if the two classifications are independent, the probability that an item is classified into any particular cell of the table is the product of the corresponding marginal probabilities. Thus, under the hypothesis of independence, in Table 8.6b we must have

$\begin{array}{ll} p_{11} = p_{r1} p_{c1} & p_{12} = p_{r1} p_{c2} \\ p_{21} = p_{r2} p_{c1} & p_{22} = p_{r2} p_{c2} \end{array}$

Teaching Tip

Explain that the expected counts are derived under the assumption of independence. (H₀ is true.) Do some examples to show the students how to calculate the expected counts.

To test the hypothesis of independence, we use the same reasoning employed in the one-dimensional tests of Section 8.3. First, we calculate the expected, or mean, count in each cell, assuming that the null hypothesis of independence is true. We do this by noting that the expected count in a cell of the table is just the total number of multinomial trials, n, times the cell probability. Recall that $n_{ij}$ represents the observed count in the cell located in the ith row and jth column. Then the expected cell count for the upper left-hand cell (first row, first column) is

$E_{11} = n p_{11}$

or, when the null hypothesis (the classifications are independent) is true,

$E_{11} = n p_{r1} p_{c1}$

Since these true probabilities are not known, we estimate $p_{r1}$ and $p_{c1}$ by the sample proportions $\hat{p}_{r1} = R_{1}/n$ and $\hat{p}_{c1} = C_{1}/n$. Thus, the estimate of the expected value $E_{11}$ is

$\hat{E}_{11} = n \left(\frac{R_{1}}{n}\right) \left(\frac{C_{1}}{n}\right) = \frac{R_{1} C_{1}}{n}$

Similarly, for each i, j,

$\hat{E}_{ij} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$

Hence,

$\begin{array}{lcl} \hat{E}_{12} & = & \frac{R_{1} C_{2}}{n} \\ \hat{E}_{21} & = & \frac{R_{2} C_{1}}{n} \\ \hat{E}_{22} & = & \frac{R_{2} C_{2}}{n} \end{array}$

Finding Expected Cell Counts for a Two-Way Contingency Table

The estimate of the expected number of observations falling into the cell in row i and column j is given by

$\hat{E}_{ij} = \frac{R_{i} C_{j}}{n}$

where $R_{i}$ = total for row i, $C_{j}$ = total for column j, and n = sample size.

Using the data in Table 8.5, we find that

$\begin{array}{lcl} \hat{E}_{11} & = & \frac{R_{1} C_{1}}{n} = \frac{(136)(145)}{300} = 65.73 \\ \hat{E}_{12} & = & \frac{R_{1} C_{2}}{n} = \frac{(136)(155)}{300} = 70.27 \\ \hat{E}_{21} & = & \frac{R_{2} C_{1}}{n} = \frac{(164)(145)}{300} = 79.27 \\ \hat{E}_{22} & = & \frac{R_{2} C_{2}}{n} = \frac{(164)(155)}{300} = 84.73 \end{array}$

These estimated expected values are more easily obtained using computer software. Figure 8.5 is a MINITAB printout of the analysis, with the expected values highlighted.

Figure 8.5

MINITAB contingency table analysis of data in Table 8.5
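For readers without access to a statistical package, the expected-count formula can also be sketched in a few lines of Python. This is only an illustration, not the MINITAB procedure; the function name `expected_counts` and the list-of-rows layout are assumptions made for the sketch.

```python
# Observed counts from Table 8.5.
# Rows: could identify / could not identify; columns: male, female.
observed = [[95, 41],
            [50, 114]]

def expected_counts(table):
    """Estimated expected counts R_i * C_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The printed values should agree with the hand calculations for Table 8.5 (65.73, 70.27, 79.27, and 84.73).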

We now use the $\chi^{2}$ statistic to compare the observed and expected (estimated) counts in each cell of the contingency table:

$\begin{array}{lcl} \chi^{2} & = & \frac{[n_{11} - \hat{E}_{11}]^{2}}{\hat{E}_{11}} + \frac{[n_{12} - \hat{E}_{12}]^{2}}{\hat{E}_{12}} + \frac{[n_{21} - \hat{E}_{21}]^{2}}{\hat{E}_{21}} + \frac{[n_{22} - \hat{E}_{22}]^{2}}{\hat{E}_{22}} \\ & = & \sum \frac{[n_{ij} - \hat{E}_{ij}]^{2}}{\hat{E}_{ij}} \end{array}$

(Note: The use of $\sum$ in the context of a contingency table analysis refers to a sum over all cells in the table.)

Teaching Tip

Point out the similarities between the test for independence and the one-way test in the last section. Give a computer example to illustrate how the p-value can be used in both types of problems.

Substituting the data of Table 8.5 and the expected values into this expression, we get

$\chi^{2} = \frac{(95 - 65.73)^{2}}{65.73} + \frac{(41 - 70.27)^{2}}{70.27} + \frac{(50 - 79.27)^{2}}{79.27} + \frac{(114 - 84.73)^{2}}{84.73} = 46.14$

Note that this value is also shown (highlighted) in Figure 8.5.
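The same sum over cells can be written as a short Python sketch. It is an illustration only, and the variable names are assumptions. Because it keeps the expected counts unrounded, the result is about 46.13, slightly different in the second decimal place from the hand calculation, which uses expected counts rounded to two decimals.

```python
# Observed counts from Table 8.5.
observed = [[95, 41],
            [50, 114]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells,
# with expected = (row total)(column total) / n.
chi_sq = 0.0
for i, r_total in enumerate(row_totals):
    for j, c_total in enumerate(col_totals):
        e_ij = r_total * c_total / n
        chi_sq += (observed[i][j] - e_ij) ** 2 / e_ij

print(round(chi_sq, 2))
```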

Large values of $\chi^{2}$ imply that the observed and expected counts do not closely agree and hence that the hypothesis of independence is false. To determine how large $\chi^{2}$ must be before it is too large to be attributed to chance, we make use of the fact that the sampling distribution of $\chi^{2}$ is approximately a $\chi^{2}$ probability distribution when the classifications are independent.

When testing the null hypothesis of independence in a two-way contingency table, the appropriate degrees of freedom will be $(r - 1)(c - 1)$, where r is the number of rows and c is the number of columns in the table. For the brand awareness example, the number of degrees of freedom for $\chi^{2}$ is $(r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. Then, for $\alpha = .05$, we reject the hypothesis of independence when

$\chi^{2} > \chi_{.05}^{2} = 3.84146$

Since the computed $\chi^{2} = 46.14$ exceeds the value 3.84146, we conclude that viewer gender and brand awareness are dependent events. This result may also be obtained by noting that the p-value of the test (highlighted on Figure 8.5) is approximately 0.
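The critical value and p-value for this 1-df case can be checked without a table. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail area can be written with the complementary error function. The helper name `chi2_upper_tail_df1` is an assumption, and the shortcut applies only when df = 1.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using chi-square(1) = Z^2."""
    return math.erfc(math.sqrt(x / 2.0))

critical_value = 3.84146   # chi-square value with upper-tail area .05, 1 df
chi_sq = 46.14             # computed test statistic from the text

print(chi_sq > critical_value)              # rejection-region check
print(chi2_upper_tail_df1(critical_value))  # should be close to .05
print(chi2_upper_tail_df1(chi_sq))          # p-value, essentially 0
```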

The pattern of dependence can be seen more clearly by expressing the data as percentages. We first select one of the two classifications to be used as the base variable. In the preceding example, suppose we select gender of the TV viewer as the classificatory variable to be the base. Next, we represent the responses for each level of the second categorical variable (brand awareness here) as a percentage of the subtotal for the base variable. For example, from Table 8.5, we convert the response for males who identify the brand (95) to a percentage of the total number of male viewers (145). That is,

$\left(\frac{95}{145}\right) 100\% = 65.5\%$

All of the entries in Table 8.5 are similarly converted, and the values are shown in Table 8.7. The value shown at the right of each row is the row’s total, expressed as a percentage of the total number of responses in the entire table. Thus, the percentage of TV viewers who identify the product is $\left(\frac{136}{300}\right) 100\% = 45.3\%$ (rounded to the nearest tenth of a percent).

Table 8.7

		Gender
		Male	Female	Totals
Brand Awareness	Could Identify Product	65.5	26.5	45.3
	Could Not Identify Product	34.5	73.5	54.7
	Totals	100	100	100
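The percentage conversion can be reproduced with a brief Python sketch, taking gender (the columns) as the base variable. The variable names are assumptions made for illustration.

```python
# Observed counts from Table 8.5 (columns: male, female).
observed = [[95, 41],
            [50, 114]]

col_totals = [sum(col) for col in zip(*observed)]
n = sum(col_totals)

# Each cell as a percentage of its column (gender) total.
col_pct = [[100 * count / col_totals[j] for j, count in enumerate(row)]
           for row in observed]

# Each row total as a percentage of the overall sample size.
row_pct = [100 * sum(row) / n for row in observed]

for pct_row, total_pct in zip(col_pct, row_pct):
    print([round(p, 1) for p in pct_row], round(total_pct, 1))
```

The output should match the entries of the percentage table: 65.5 and 26.5 with a row figure of 45.3, then 34.5 and 73.5 with a row figure of 54.7.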

If the gender and brand awareness variables are independent, then the percentages in the cells of the table are expected to be approximately equal to the corresponding row percentages. Thus, we would expect the percentage of viewers who identify the brand for each gender to be approximately 45% if the two variables are independent. The extent to which each gender’s percentage departs from this value determines the dependence of the two classifications, with greater variability of the row percentages meaning a greater degree of dependence. A plot of the percentages helps summarize the observed pattern. In the SPSS bar graph in Figure 8.6, we show the gender of the viewer (the base variable) on the horizontal axis and the percentage of TV viewers who identify the brand (green bars) on the vertical axis. The “expected” percentage under the assumption of independence is shown as a horizontal line.

Figure 8.6 clearly indicates the reason that the test resulted in the conclusion that the two classifications in the contingency table are dependent. The percentage of male TV viewers who identify the brand promoted by a male celebrity is more than twice as high as the percentage of female TV viewers who identify the brand. Statistical measures of the degree of dependence and procedures for making comparisons of pairs of levels for classifications are beyond the scope of this text, but can be found in the references. We will utilize descriptive summaries such as Figure 8.6 to examine the degree of dependence exhibited by the sample data.

Figure 8.6

SPSS bar graph showing percent of viewers who identify TV product

The general form of a two-way contingency table containing r rows and c columns (called an $r \times c$ contingency table) is shown in Table 8.8. Note that the observed count in the ijth cell is denoted by $n_{ij}$, the ith row total is $R_{i}$, the jth column total is $C_{j}$, and the total sample size is n. Using this notation, we give the general form of the contingency table test for independent classifications in the box.

Table 8.8

		Column
		1	2	$\cdots$	c	Row Totals
Row	1	$n_{11}$	$n_{12}$	$\cdots$	$n_{1c}$	$R_{1}$
	2	$n_{21}$	$n_{22}$	$\cdots$	$n_{2c}$	$R_{2}$
	$\vdots$	$\vdots$	$\vdots$		$\vdots$	$\vdots$
	r	$n_{r1}$	$n_{r2}$	$\cdots$	$n_{rc}$	$R_{r}$
	Column Totals	$C_{1}$	$C_{2}$	$\cdots$	$C_{c}$	$n$

General Form of a Two-Way (Contingency) Table Analysis: A Test for Independence

$H_{0}$: The two classifications are independent
$H_{a}$: The two classifications are dependent

Test statistic: $\chi_{c}^{2} = \sum \frac{[n_{ij} - \hat{E}_{ij}]^{2}}{\hat{E}_{ij}}$

where $\hat{E}_{ij} = \frac{R_{i} C_{j}}{n}$

Rejection region: $\chi_{c}^{2} > \chi_{\alpha}^{2}$, where $\chi_{\alpha}^{2}$ has $(r - 1)(c - 1)$ df

p-value: $P(\chi^{2} > \chi_{c}^{2})$

Conditions Required for a Valid $\chi^{2}$ Test: Contingency Tables

1. The n observed counts are a random sample from the population of interest. We may then consider this to be a multinomial experiment with $r \times c$ possible outcomes.
2. The sample size n will be large enough so that, for every cell, the estimated expected count $\hat{E}_{ij}$ will be equal to 5 or more.
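The general procedure and the expected-count condition can be sketched together in Python for any $r \times c$ table. This is a minimal illustration, not a replacement for a statistical package: the function name `independence_test` and its return format are assumptions, the second table is hypothetical data invented only to show the condition failing, and the random-sampling condition cannot be checked by code.

```python
def independence_test(table):
    """Chi-square quantities for an r x c table of observed counts.

    Returns the statistic, its degrees of freedom, and whether every
    estimated expected count is at least 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    min_expected = float("inf")
    for i, r_total in enumerate(row_totals):
        for j, c_total in enumerate(col_totals):
            e_ij = r_total * c_total / n
            chi_sq += (table[i][j] - e_ij) ** 2 / e_ij
            min_expected = min(min_expected, e_ij)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return {"chi_sq": chi_sq, "df": df, "counts_ok": min_expected >= 5}

celeb = independence_test([[95, 41], [50, 114]])   # brand awareness data
sparse = independence_test([[2, 3], [4, 41]])      # hypothetical sparse table
print(celeb["df"], celeb["counts_ok"])
print(sparse["counts_ok"])
```

For the hypothetical sparse table the smallest expected count is well below 5, so the chi-square approximation would not be trusted there.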

Example 8.6 Conducting a Two-Way Analysis—Marital Status and Religion

Problem

A social scientist wants to determine whether the marital status (divorced or not divorced) of U.S. men is independent of their religious affiliation (or lack thereof). A sample of 500 U.S. men is surveyed, and the results are tabulated as shown in Table 8.9.

Table 8.9 Survey Results (Observed Counts), Example 8.6

		Religious Affiliation
		A	B	C	D	None	Totals
Marital Status	Divorced	39	19	12	28	18	116
	Married, never divorced	172	61	44	70	37	384
	Totals	211	80	56	98	55	500

Data Set: MARREL

a. Test to see whether there is sufficient evidence to indicate that the marital status of men who have been or are currently married is dependent on religious affiliation. Take $\alpha = .01$.
b. Graph the data and describe the patterns revealed. Is the result of the test supported by the graph?

Solution

Figure 8.7

SAS contingency table printout for Example 8.6

The first step is to calculate estimated expected cell frequencies under the assumption that the classifications are independent. Rather than compute these values by hand, we resort to a computer. The SAS printout of the analysis of Table 8.9 is displayed in Figure 8.7, each cell of which contains the observed (top) and expected (bottom) frequency in that cell. Note that $\hat{E}_{11}$, the estimated expected count for the Divorced, A cell, is 48.952. Similarly, the estimated expected count for the Divorced, B cell, is $\hat{E}_{12} = 18.56$. Since all the estimated expected cell frequencies are greater than 5, the $\chi^{2}$ approximation for the test statistic is appropriate. Assuming that the men chosen were randomly selected from all married or previously married American men, the characteristics of the multinomial probability distribution are satisfied.

Figure 8.8

SAS side-by-side bar graphs showing percentage of divorced and never divorced males by religion

a. The null and alternative hypotheses we want to test are
$H_{0}$: The marital status of U.S. men and their religious affiliation are independent
$H_{a}$: The marital status of U.S. men and their religious affiliation are dependent
The test statistic, $\chi^{2} = 7.135$, is highlighted at the bottom of the printout, as is the observed significance level (p-value) of the test. Since $\alpha = .01$ is less than $p = .129$, we fail to reject $H_{0}$; that is, we cannot conclude that the marital status of U.S. men depends on their religious affiliation. (Note that we could not reject $H_{0}$ even with $\alpha = .10$.)
b. The marital status frequencies can be expressed as percentages of the number of men in each religious affiliation category. The expected percentage of divorced men under the assumption of independence is $\left(\frac{116}{500}\right) 100\% = 23\%$. A SAS graph of the percentages is shown in Figure 8.8. Note that the percentages of divorced men (see the bars in the “DIVORCED” block of the SAS graph) deviate only slightly from that expected under the assumption of independence, supporting the result of the test in part a. That is, neither the descriptive bar graph nor the statistical test provides evidence that the male divorce rate depends on (varies with) religious affiliation.
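The figures quoted from the SAS printout can be cross-checked with a short Python sketch. It is illustrative only. The p-value helper `chi2_upper_tail_even_df` is an assumption of this sketch; it uses a closed-form expression for the chi-square upper tail that holds only when the degrees of freedom are even, which is the case here (df = 4).

```python
import math

# Observed counts from Table 8.9 (columns: A, B, C, D, None).
observed = [[39, 19, 12, 28, 18],
            [172, 61, 44, 70, 37]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

p_value = chi2_upper_tail_even_df(chi_sq, df)
print(round(chi_sq, 3), df, round(p_value, 3))
```

The statistic and p-value should agree with the printout values of 7.135 and .129, and the p-value exceeds both .01 and .10.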

Now Work Exercises 8.61 & 8.62

Contingency Tables with Fixed Marginals

In the Journal of Marketing study on celebrities in TV ads, a single random sample was selected from the target population of all TV viewers and the outcomes (values of gender and brand awareness) were recorded for each viewer. For this type of study, the researchers had no a priori knowledge of how many observations would fall into the categories of the qualitative variables. In other words, prior to obtaining the sample, the researchers did not know how many males or how many brand identifiers would make up the sample. Oftentimes, it is advantageous to select a random sample from each of the levels of one of the qualitative variables.

For example, in the Journal of Marketing study, the researchers may want to be sure of an equivalent number of males and females in their sample. Consequently, they will select independent random samples of 150 males and 150 females. (In fact, this was the sampling plan for the actual study.) Summary data for this type of study yield a contingency table with fixed marginals since the column totals for one qualitative variable (e.g., gender) are known in advance.* The goal of the analysis does not change—determine whether the two qualitative variables (e.g., gender and brand awareness) are dependent.

The procedure for conducting a chi-square analysis for a contingency table with fixed marginals is identical to the one outlined above, since it can be shown (proof omitted) that the $\chi^{2}$ test statistic for this type of sampling also has an approximate chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom. One reason why you might choose this alternative sampling plan is to obtain sufficient observations in each cell of the contingency table to ensure that the chi-square approximation is valid. Remember, this will usually occur when the expected cell counts are all greater than or equal to 5. By selecting a large sample (150 observations) for each gender in the Journal of Marketing study, the researchers improved the odds of obtaining large expected cell counts in the contingency table.

Statistics in Action Revisited

Testing whether Likelihood of a Lawsuit Is Related to Recall Notice Sender

We return to the case involving tainted transplant tissue (see p. 450). Recall that a processor of the tainted tissue filed a lawsuit against a tissue distributor, claiming that the distributor was more responsible for paying damages to litigating transplant patients. Why? Because the distributor in question had sent recall notices (as required by the FTC) to hospitals and surgeons with unsolicited newspaper articles describing in graphic detail the “ghoulish” acts that had been committed. According to the processor, by including the articles in the recall package, this distributor inflamed the tissue recipients, increasing the likelihood that patients would file a lawsuit.

To prove its case in court, the processor needed to establish a statistical link between the likelihood of a lawsuit and the sender of the recall notice. More specifically, can the processor show that the probability of a lawsuit is higher for those patients of surgeons who received the recall notice with the inflammatory articles than for those patients of surgeons who received only the recall notice?

A statistician, serving as an expert consultant for the processor, reviewed data for the 7,914 patients who received recall notices (of which 708 filed suit). These data are saved in the GHOUL1 file. For each patient, the file contains information on the SENDER of the recall notice (Processor or Distributor) and whether a LAWSUIT was filed (Yes or No). Since both of these variables are qualitative and we want to know whether the probability of a LAWSUIT depends on the SENDER of the recall notice, a contingency table analysis is appropriate.

Figure SIA8.1 shows the MINITAB contingency table analysis. The null and alternative hypotheses for the test are

$H_{0}$: Lawsuit and Sender are independent
$H_{a}$: Lawsuit and Sender are dependent

Both the chi-square test statistic (100.5) and p-value of the test (.000) are highlighted on the printout. If we conduct the test at $\alpha = .01$, there is sufficient evidence to reject $H_{0}$. That is, the data provide evidence to indicate that the likelihood of a tainted transplant patient filing a lawsuit is associated with the sender of the recall notice.

To determine which sender had the higher percentage of patients filing a lawsuit, examine the row percentages (highlighted) in the contingency table of Figure SIA8.1. You can see that of the 1,751 patients sent recall notices by the processor, 51 (or 2.91%) filed lawsuits. In contrast, of the 6,163 patients sent recall notices by the distributor in question, 657 (or 10.66%) filed lawsuits. Thus, the probability of a patient filing a lawsuit is more than three and a half times as high for the distributor’s patients as for the processor’s patients.

MINITAB Contingency Table Analysis—Likelihood of Lawsuit vs. Recall Notice Sender

Before testifying on these results in court, the statistician decided to do one additional analysis: He eliminated from the sample data any patients whose surgeon had been sent notices by both parties. Why? Since these patients’ surgeons received both recall notices, the underlying reason for filing a lawsuit would be unclear. Did the patient file simply because he or she received tainted transplant tissue, or was the filing motivated by the inflammatory articles that accompanied the recall notice? After eliminating these patients, the data looked like those shown in Table SIA8.2. A MINITAB contingency table analysis on this reduced data set (saved in the GHOUL2 file) is shown in Figure SIA8.2.

As in the previous analysis, the chi-square test statistic (110.2) and p-value of the test (.000), both highlighted on the printout, imply that the likelihood of a tainted transplant patient filing a lawsuit is associated with the sender of the recall notice, at $\alpha = .01$. Also, the percentage of patients filing lawsuits when sent a recall notice by the distributor (10.62%) is about five times as high as the percentage of patients filing lawsuits when sent a recall notice by the processor (2.04%).

The results of both analyses were used to successfully support the processor’s claim in court. Nonetheless, we need to point out one caveat to the contingency table analyses. Be careful not to conclude that the data are proof that the inclusion of the inflammatory articles caused the probability of litigation to increase. Without controlling all possible variables that may be related to filing a lawsuit (e.g., a patient’s socioeconomic status, whether a patient has filed a lawsuit in the past), we can only say that the two qualitative variables, lawsuit status and recall notice sender, are statistically associated. However, the fact that the likelihood of a lawsuit is almost five times higher when the notice is sent by the distributor shifts the burden of proof to the distributor to explain why this occurred and to convince the court that it should not be held accountable for paying the majority of the damages.

Recall Notice Sender	Number of Patients	Number of Lawsuits
Processor/Other Distributor	1,522	31
Distributor in Question	5,705	606
Totals:	7,227	637
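The two chi-square statistics quoted in this analysis can be checked from the patient and lawsuit counts reported in the text and in the table above. The sketch below is an illustration only; the function name `chi_square_from_groups` is an assumption, and it rebuilds each contingency table by subtracting the lawsuit counts from the group sizes.

```python
def chi_square_from_groups(group_sizes, event_counts):
    """Pearson chi-square for a 2 x k table built from group sizes and event counts."""
    # Row 1: patients who filed suit; row 2: patients who did not.
    table = [list(event_counts),
             [size - events for size, events in zip(group_sizes, event_counts)]]
    row_totals = [sum(row) for row in table]
    col_totals = list(group_sizes)
    n = sum(group_sizes)
    total = 0.0
    for i, r_total in enumerate(row_totals):
        for j, c_total in enumerate(col_totals):
            e_ij = r_total * c_total / n
            total += (table[i][j] - e_ij) ** 2 / e_ij
    return total

# Groups are ordered (processor, distributor in question).
full = chi_square_from_groups([1751, 6163], [51, 657])      # all 7,914 patients
reduced = chi_square_from_groups([1522, 5705], [31, 606])   # dual notices removed
print(round(full, 1), round(reduced, 1))
```

To one decimal place, the results should match the printout values of 100.5 and 110.2.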

Alternative analysis: As mentioned in Section 8.3, a $2 \times 2$ contingency table analysis is equivalent to a comparison of two population proportions. In the tainted tissue case, we want to compare $p_{1}$, the proportion of lawsuits filed by patients who were sent recall notices by the processor, to $p_{2}$, the proportion of lawsuits filed by patients who were sent recall notices by the distributor that included the inflammatory articles. Both a test of the null hypothesis, $H_{0}: (p_{1} - p_{2}) = 0$, and a 95% confidence interval for the difference, $(p_{1} - p_{2})$, using the reduced sample data are shown (highlighted) on the MINITAB printout, Figure SIA8.3.

MINITAB Contingency Table Analysis, with Dual Recall Notices Eliminated

The p-value for the test (.000) indicates that the two proportions are significantly different at $\alpha = .05$. The 95% confidence interval, $(-.097, -.075)$, shows that the proportion of lawsuits associated with patients who were sent recall notices from the distributor ranges between .075 and .097 higher than the corresponding proportion for the processor. Both results support the processor’s case, namely, that the patients who were sent recall notices with the inflammatory news articles were more likely to file a lawsuit than those who were sent only recall notices.
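The equivalence between the $2 \times 2$ analysis and the two-proportion comparison can be illustrated with a short Python sketch on the reduced data. Several choices here are assumptions of the sketch rather than details taken from the printout: the large-sample interval with unpooled standard error, the rounded normal multiplier 1.96, and the pooled standard error for the z statistic.

```python
import math

# Reduced data: (patients, lawsuits) for each sender.
n1, x1 = 1522, 31     # processor
n2, x2 = 5705, 606    # distributor in question

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Large-sample 95% confidence interval for p1 - p2 (unpooled standard error).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se, diff + 1.96 * se)

# z statistic for H0: p1 - p2 = 0 (pooled standard error).
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pooled

print(round(ci[0], 3), round(ci[1], 3))
print(round(z ** 2, 1))   # squared z equals the 2 x 2 chi-square statistic
```

Under these assumptions the interval rounds to (-.097, -.075), and the squared z statistic reproduces the chi-square value of 110.2 from the reduced contingency table.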

Exercises 8.54–8.78

Understanding the Principles

8.54 What is a two-way (contingency) table?
8.55 What is a contingency table with fixed marginals?
8.56 True or False. One goal of a contingency table analysis is to determine whether the two classifications are independent or dependent.

True
8.57 What conditions are required for a valid chi-square test of data from a contingency table?

Learning the Mechanics

8.58 Find the rejection region for a test of independence of two classifications for which the contingency table contains r rows and c columns and
a. $r = 5$, $c = 5$, $\alpha = .05$
  
  $\chi^{2} > 26.2962$
b. $r = 3$, $c = 6$, $\alpha = .10$
  
  $\chi^{2} > 15.9871$
c. $r = 2$, $c = 3$, $\alpha = .01$
  
  $\chi^{2} > 9.21034$
8.59 Consider the following $2 \times 3$ (i.e., $r = 2$ and $c = 3$) contingency table:

		Column
		1	2	3
Row	1	9	34	53
	2	16	30	25

a. Specify the null and alternative hypotheses that should be used in testing the independence of the row and column classifications.
  
  $H_{0}$: Row and Column are independent
b. Specify the test statistic and the rejection region that should be used in conducting the hypothesis test of part a. Use $\alpha = .01$.
  
  $\chi^{2} > 9.21034$
c. Assuming that the row classification and the column classification are independent, find estimates for the expected cell counts.
d. Conduct the hypothesis test of part a. Interpret your result.
  
  $\chi^{2} = 8.71$
8.60 Refer to Exercise 8.59.
a. Convert the frequency responses to percentages by calculating the percentage of each column total falling in each row. Also, convert the row totals to percentages of the total number of responses. Display the percentages in a table.
b. Create a bar graph with row 1 percentage on the vertical axis and column number on the horizontal axis. Show the row 1 total percentage as a horizontal line on the graph.
c. What pattern do you expect to see if the rows and columns are independent? Does the plot support the result of the test of independence in Exercise 8.59?

8.61 Test the null hypothesis of independence of the two classifications A and B of the $3 \times 3$ contingency table shown here. Use $\alpha = .05$.

		B
		$B_{1}$	$B_{2}$	$B_{3}$
	$A_{1}$	40	72	42
A	$A_{2}$	63	53	70
	$A_{3}$	31	38	30

8.62 Refer to Exercise 8.61. Convert the responses to percentages by calculating the percentage of each B class total falling into each A classification. Also, calculate the percentage of the total number of responses that constitute each of the A classification totals.
a. Create a bar graph with row $A_{1}$ percentage on the vertical axis and B classification on the horizontal axis. Does the graph support the result of the test of hypothesis in Exercise 8.61? Explain.
b. Repeat part a for the row $A_{2}$ percentages.
c. Repeat part a for the row $A_{3}$ percentages.

Applying the Concepts—Basic

MAPDOG MAPTV 8.63 Children’s perceptions of their neighborhood. In Health Education Research (Feb. 2005), nutrition scientists at Deakin University (Australia) investigated children’s perceptions of their environments. Each in a sample of 147 ten-year-old children drew maps of their home and neighborhood environment. The researchers examined the maps for certain themes (e.g., presence of a pet, television in the bedroom, opportunities for physical activity). The results, broken down by gender, for two themes (presence of a dog and TV in the bedroom) are shown in the next two tables.
1. Find the sample proportion of boys who drew a dog on their maps.
  
  .078
2. Find the sample proportion of girls who drew a dog on their maps.
  
  .157
3. Compare the proportions you found in parts a and b. Does it appear that the likelihood of drawing a dog on the neighborhood map depends on gender?
4. Give the null hypothesis for testing whether the likelihood of drawing a dog on the neighborhood map depends on gender.
  
  $H_{0}$: Dog and Gender are independent
5. Use the MINITAB printout below to conduct the test mentioned in part d at $α = .05$.
  
  $χ^{2} = 2.25$
6. Conduct a test to determine whether the likelihood of drawing a TV in the bedroom is different for boys and girls. Use $α = .05$.
  
  $χ^{2} = .064$; do not reject $H_{0}$
Presence of a Dog	Number of Boys	Number of Girls
Yes	6	11
No	71	59
Total	77	70

Presence of TV in Bedroom	Number of Boys	Number of Girls
Yes	11	9
No	66	61
Total	77	70

Based on Hume, C., Salmon, J., and Ball, K. “Children’s perceptions of their home and neighborhood environments, and their association with objectively measured physical activity: A qualitative and quantitative study.” Health Education Research, Vol. 20, No. 1, Feb. 2005 (Table III).
8.64 Eyewitnesses and mug shots. Applied Psychology in Criminal Justice (Apr. 2010) published a study of mug shot choices by eyewitnesses to a crime (see also Exercise 10.107, p. 570). A sample of 96 college students was shown a video of a simulated theft, then asked to select the mug shot that most closely resembled the thief. The students were randomly assigned to view either 3, 6, or 12 mug shots at a time, with 32 students in each group. The numbers of students in the 3-, 6-, and 12-photos-per-page groups who selected the target mug shot were 19, 19, and 15, respectively.
1. For each photo group, compute the proportion of students who selected the target mug shot. Which group yielded the lowest proportion?
  
  .594, .594, .469
2. Create a contingency table for these data, with photo group in the rows and whether or not the target mug shot was selected in the columns.
3. Refer to part b. Are there differences in the proportions who selected the target mug shot among the three photo groups? Test, using $α = .10$.
  
  $χ^{2} = 1.35$; do not reject $H_{0}$
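For a 3 × 2 table such as the one in part b, the test has (3 − 1)(2 − 1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the chi-square upper-tail area has the closed form exp(−χ²/2). The sketch below uses that shortcut to get a p-value with the Python standard library only; the names are illustrative, and the shortcut does not apply to other degrees of freedom.

```python
import math

group_size = 32
selected = {"3 photos": 19, "6 photos": 19, "12 photos": 15}

# Rows = photo groups; columns = (selected target, did not select target).
observed = [[k, group_size - k] for k in selected.values()]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Closed-form upper-tail probability, valid ONLY when df == 2.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The p-value is well above .10, which agrees with the "do not reject" conclusion given for part c.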
NEWS 8.65 Stereotyping deceptive and authentic news stories. Major newspapers lose their credibility (and subscribers) when they are found to have published deceptive or misleading news stories. In Journalism and Mass Communication Quarterly (Summer 2007), University of Texas researchers investigated whether certain stereotypes (e.g., negative references to certain nationalities) occur more often in deceptive news stories than in authentic news stories. The researchers analyzed 183 news stories that were proven to be deceptive in nature and 128 news stories that were considered authentic. Specifically, the researchers determined whether each story was negative, neutral, or positive in tone. The accompanying table gives the number of news stories found in each tone category.

	Authentic News Stories	Deceptive News Stories
Negative Tone	59	111
Neutral Tone	49	61
Positive Tone	20	11
Total	128	183

Based on Lasorsa, D., and Dai, J. “When news reporters deceive: The production of stereotypes.” Journalism and Mass Communication Quarterly, Vol. 84, No. 2, Summer 2007 (Table 2).
1. Find the sample proportion of negative tone news stories that is deceptive.
  
  .653
2. Find the sample proportion of neutral news stories that is deceptive.
  
  .555
3. Find the sample proportion of positive news stories that is deceptive.
  
  .355
4. Compare the sample proportions, parts a–c. Does it appear that the proportion of news stories that is deceptive depends on story tone?
  
  Yes
5. Give the null hypothesis for testing whether the authenticity of a news story depends on tone.
6. Use the SPSS printout in the next column to conduct the test, part e. Test at $α = .05$.
  
  $χ^{2} = 10.43$; reject $H_{0}$
HEAL 8.66 Healing heart patients with music, imagery, touch, and prayer. “Frontier medicine” is a term used to describe medical therapies (e.g., energy healing, therapeutic prayer, spiritual healing) for which there is no plausible explanation. The Lancet (July 16, 2005) published the results of a study designed to test the effectiveness of two types of frontier medicine—music, imagery, and touch (MIT) therapy and therapeutic prayer—in healing cardiac care patients. Patients were randomly assigned to receive one of four types of treatment: (1) prayer, (2) MIT, (3) prayer and MIT, and (4) standard care (no prayer and no MIT). Six months after therapy, the patients were evaluated for a major adverse cardiovascular event (e.g., a heart attack). The results of the study are summarized in the accompanying table.

Therapy	Number of Patients with Major Cardiovascular Events	Number of Patients with No Events	Total
Prayer	43	139	182
MIT	47	138	185
Prayer and MIT	39	150	189
Standard	50	142	192

Based on Krucoff, M. W., et al. “Music, imagery, touch, and prayer as adjuncts to interventional cardiac care: The Monitoring and Actualization of Noetic Trainings (MANTRA) II randomized study.” The Lancet, Vol. 366, July 16, 2005 (Table 4).
1. Identify the two qualitative variables (and associated levels) measured in the study.
2. State $H_{0}$ and $H_{a}$ for testing whether a major adverse cardiovascular event depends on type of therapy.
3. Use the MINITAB printout on p. 794 to conduct the test mentioned in part b at $α = .10$. On the basis of this test, what can the researchers infer about the effectiveness of music, imagery, and touch therapy and the effectiveness of healing prayer in heart patients?
MINITAB Output for Exercise 8.66
FOODQL 8.67 Package design influences taste. Can the package design of a food product influence how the consumer will rate the taste of the product? A team of experimental psychologists reported on a study that examined how rounded or angular package shapes and high- or low-pitched sounds can convey information about the taste (sweetness and sourness) of a product (Food Quality and Preference, June 2014). Study participants were presented with one of two types of packaging displayed on a computer screen monitor: rounded shape with a low-pitched sound or angular shape with a high-pitched sound. Assume that half of the participants viewed the rounded packaging and half viewed the angular packaging. After viewing the product, each participant rated whether the packaging was more appropriate for either a sweet- or a sour-tasting food product. A summary of the results (numbers of participants) for a sample of 80 participants is shown in the following contingency table. (These data are simulated based on the results reported in the article.)

		Package Design/Pitch
		Angular/High	Rounded/Low
Taste Choice	Sweet	35	7
	Sour	5	33
1. Specify the null and alternative hypotheses for testing whether the package design and sound pitch combination influences the consumer’s opinion on the product taste.
  
  $H_{0}$: Design/Pitch and taste choice are independent
2. Assuming the null hypothesis is true, find the expected number in each cell of the table.
3. Use the expected numbers and observed counts in the table to compute the chi-square test statistic.
  
  39.3
4. The observed significance level (p-value) of the test is approximately 0. (You can verify this result using statistical software.) For any reasonably chosen value of $α$, give the appropriate conclusion.
  
  Reject $H_{0}$
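Parts 2 and 3 rest on two formulas: under independence, the expected count in a cell is (row total × column total) / n, and the test statistic sums (observed − expected)² / expected over all cells. A minimal Python sketch for the package-design counts follows; the function names are invented for this illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Sweet, Sour. Columns: Angular/High, Rounded/Low.
package = [[35, 7], [5, 33]]

print(expected_counts(package))        # expected count for each cell
print(round(chi_square(package), 1))   # test statistic, 1 decimal place
```

Because both column totals are 40, each row's expected counts are equal (21 and 21 for Sweet, 19 and 19 for Sour), and the statistic rounds to the 39.3 given in part 3.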

Applying the Concepts—Intermediate

ATC 8.68 “Cry wolf” effect in air traffic controlling. Researchers at Alion Science Corporation and New Mexico State University collaborated on a study of how air traffic controllers respond to false alarms (Human Factors, Aug. 2009). The researchers theorize that the high rate of false alarms regarding midair collisions leads to the “cry wolf” effect, i.e., the tendency for air traffic controllers to ignore true alerts in the future. The investigation examined data on a random sample of 437 conflict alerts. Each alert was first classified as a “true” or “false” alert. Then, each was classified according to whether there was a human controller response to the alert. The number of the 437 alerts that fall into each of the combined categories is given as follows: True alert/No response–3; True alert/Response–231; False alert/No response–37; False alert/Response–166. This summary information is saved in the ATC file. Do the data indicate that the response rate of air traffic controllers to midair collision alarms differs for true and false alerts? Test using $α = .05$. What inference can you make concerning the “cry wolf” effect?

Yes; $χ^{2} = 37.53$

Based on Wickens, C. D., Rice, S., Keller, D., Hutchins, S., Hughes, J., and Clayton, K., “False alerts in air traffic control conflict alerting system: Is there a ‘cry wolf’ effect?” Human Factors, Vol. 51, Issue 4, Aug. 2009 (Table 2).

ADD 8.69 Influencing performance in a serial addition task. Refer to the Advances in Cognitive Psychology (Jan. 2013) study of performance in a classic psychological test involving adding a set of numbers, Exercise 8.12 (p. 456). Recall that 300 undergraduate students were given a serial addition task, with the numbers $(1{,}000 + 40 + 1{,}000 + 30 + 1{,}000 + 20 + 1{,}000 + 10)$ presented on a computer screen. However, the students were divided into five groups of 60 students each, and the presentation of the numbers (total display time and color of the number 1,000) was varied in each group: Group 1—2 seconds/black, Group 2—2 seconds/bright red, Group 3—15 seconds/bright red, Group 4—1 second/black, and Group 5—15 seconds/black. The number of students who gave the correct response of 4,100 was determined as well as the number who gave an incorrect response. The results are summarized (number of responses) in the accompanying table. Does presentation of the numbers to be added influence performance (correct response rate) on the serial addition task? Test, using $α = .01$.

Presentation	Responses of 4,100	Responses of 5,000	Other Incorrect Responses
Group 1 (2 sec/black)	17	33	10
Group 2 (2 sec/red)	8	42	10
Group 3 (15 sec/red)	12	36	12
Group 4 (1 sec/black)	13	42	5
Group 5 (15 sec/black)	20	31	9

Source: Giannouli, V. “Number perseveration in healthy subjects: Does prolonged stimulus exposure influence performance on a serial addition task?” Advances in Cognitive Psychology, Vol. 9, No. 1, Jan. 2013 (adapted from Table 3).

NAWIC 8.70 Job satisfaction of women in construction. The hiring of women in construction and construction-related jobs has steadily increased over the years. A study was conducted to provide employers with information designed to reduce the potential for turnover of female employees (Journal of Professional Issues in Engineering Education & Practice, Apr. 2013). A survey questionnaire was e-mailed to members of the National Association of Women in Construction (NAWIC). A total of 447 women responded to survey questions on job challenge and satisfaction with life as an employee. The results (number of females responding in the different categories) are summarized in the accompanying table. What conclusions can you draw from the data regarding the association between an NAWIC member’s satisfaction with life as an employee and her satisfaction with job challenge?

$χ^{2} = 73.98$

		Life as an Employee
		Satisfied	Dissatisfied
Job Challenge	Satisfied	364	33
	Dissatisfied	24	26

Source: Malone, E. K., and Issa, R. A. “Work-life balance and organizational commitment of women in the U.S. construction industry.” Journal of Professional Issues in Engineering Education & Practice, Vol. 139, No. 2, Apr. 2013 (Table 11).

E4ALL 8.71 Detecting Alzheimer’s disease at an early age. Refer to the Neuropsychology (Jan. 2007) study of whether the cognitive effects of Alzheimer’s disease can be detected at an early age, Exercise 8.49 (p. 468). Recall that a particular strand of DNA was classified into one of three genotypes: $E4^{+}/E4^{+}$, $E4^{+}/E4^{-}$, and $E4^{-}/E4^{-}$. In addition to a sample of 2,097 young adults (20–24 years), two other age groups were studied: a sample of 2,182 middle-aged adults (40–44 years) and a sample of 2,281 elderly adults (60–64 years). The accompanying table gives a breakdown of the number of adults with the three genotypes in each age category for the total sample of 6,560 adults. The researchers concluded that “there were no significant genotype differences across the three age groups” using $α = .05$. Do you agree?

Age Group	$E4^{+}/E4^{+}$ Genotype	$E4^{+}/E4^{-}$ Genotype	$E4^{-}/E4^{-}$ Genotype	Sample Size
20–24	56	517	1,524	2,097
40–44	45	566	1,571	2,182
60–64	48	564	1,669	2,281

Source: Jorm, A. F., et al. “APOE genotype and cognitive functioning in a large age-stratified population sample.” Neuropsychology, Vol. 21, No. 1, Jan. 2007 (Table 1). Copyright © 2007 by the American Psychological Association. Reprinted with permission.

TEXT 8.72 Mobile device typing strategies. Refer to the Applied Ergonomics (Mar. 2012) study of mobile device typing strategies, Exercise 8.46 (p. 467). Recall that typing style of mobile device users was categorized as (1) device held with both hands/both thumbs typing, (2) device held with right hand/right thumb typing, (3) device held with left hand/left thumb typing, (4) device held with both hands/right thumb typing, (5) device held with left hand/right index finger typing, or (6) other. The researchers’ main objective was to determine if there are gender differences in typing strategies. Typing strategy and gender were observed for each in a sample of 859 college students observed typing on their mobile devices. The data are summarized in the accompanying table. Is there sufficient evidence to conclude that the proportions of mobile device users in the six texting style categories depend on whether a male or a female is texting? Use $α = .10$ to answer the question.

Typing Strategy	Number of Males	Number of Females
Both hands hold/both thumbs type	161	235
Right hand hold/right thumb type	118	193
Left hand hold/left thumb type	29	41
Both hands hold/right thumb type	10	29
Left hand hold/right index type	6	12
Other	11	14

Source: Gold, J. E., et al. “Postures, typing strategies, and gender differences in mobile device usage: An observational study.” Applied Ergonomics, Vol. 43, No. 2, Mar. 2012 (Table 2).

TTBC 8.73 Classifying air threats with heuristics. The Journal of Behavioral Decision Making (Jan. 2007) published a study on the use of heuristics to classify the threat level of approaching aircraft. Of special interest was the use of a fast and frugal heuristic—a computationally simple procedure for making judgments with limited information—named “Take-the-Best-for-Classification” (TTB-C). The subjects were 48 men and women, some from a Canadian Forces reserve unit, others university students. Each subject was presented with a radar screen on which simulated approaching aircraft were identified with asterisks. By using the computer mouse to click on the asterisk, one could receive further information about the aircraft. The goal was to identify the aircraft as “friend” or “foe” as fast as possible. Half the subjects were given cue-based instructions for determining the type of aircraft, while the other half were given pattern-based instructions. The researcher also classified the heuristic strategy used by the subject as TTB-C, Guess, or Other. Data on the two variables Instruction type and Strategy, measured for each of the 48 subjects, are saved in the TTBC file. (Data on the first and last five subjects are shown in the accompanying table.) Do the data provide sufficient evidence at $α = .05$ to indicate that choice of heuristic strategy depends on type of instruction provided? How about at $α = .01$?

Instruction	Strategy
Pattern	Other
Pattern	Other
Pattern	Other
Cue	TTBC
Cue	TTBC
⋮	⋮
Pattern	TTBC
Cue	Guess
Cue	TTBC
Cue	Guess
Pattern	Guess

Based on Bryant, D. J. “Classifying simulated air threats with fast and frugal heuristics.” Journal of Behavioral Decision Making, Vol. 20, Jan. 2007 (Appendix C).
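Because the TTBC data are recorded one subject per row rather than as counts, the first step is to tally the (Instruction, Strategy) pairs into a two-way table. The sketch below does this for the ten rows printed above only; the other 38 subjects are in the TTBC file and are not reproduced, so the resulting counts are not the full contingency table. The helper name `crosstab` is made up for this illustration.

```python
from collections import Counter

# Only the ten (Instruction, Strategy) pairs shown in the excerpt above.
# The remaining 38 subjects live in the TTBC file and are NOT included here.
shown_rows = [
    ("Pattern", "Other"), ("Pattern", "Other"), ("Pattern", "Other"),
    ("Cue", "TTBC"), ("Cue", "TTBC"),
    ("Pattern", "TTBC"), ("Cue", "Guess"), ("Cue", "TTBC"),
    ("Cue", "Guess"), ("Pattern", "Guess"),
]


def crosstab(rows, row_levels, col_levels):
    """Tally raw (row category, column category) pairs into a table of counts."""
    counts = Counter(rows)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


instructions = ["Cue", "Pattern"]
strategies = ["TTBC", "Guess", "Other"]
partial_table = crosstab(shown_rows, instructions, strategies)

for name, row in zip(instructions, partial_table):
    print(name, row)
```

Running the same tally on all 48 rows of the data file would give the 2 × 3 table needed for the test of independence.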

TXEDUC 8.74 Reading comprehension of Texas students. An analysis of reading test scores of students at a rural Texas school district was carried out in Current Issues in Education (Jan. 2014). Students were classified as attending elementary, middle, or high school and whether they passed a reading comprehension test. The data for the sample of 1,012 students are summarized in the accompanying table. Does the passing rate on the reading comprehension test in Texas differ for elementary, middle, and high school students? Use $α = .10$.

Yes, $χ^{2} = 7.66$

	Elementary School	Middle School	High School
Number Passing	372	418	143
Number Failing	44	25	10
Totals	416	443	153

Source: Bigham, G. D., and Riney, M. R. “Trend analysis techniques to assist school leaders in making critical curriculum and instruction decisions.” Current Issues in Education, Vol. 17, No. 1, Jan. 2014 (Table 6).

CIRCUIT4 8.75 Versatility with resistor-capacitor circuits. Research published in the International Journal of Electrical Engineering Education (Oct. 2012) investigated the versatility of engineering students’ knowledge of circuits with one resistor and one capacitor connected in series. Students were shown four different configurations of a resistor-capacitor circuit and then given two tasks. First, each student was asked to state the voltage at the nodes on the circuit and, second, each student was asked to graph the dynamic behavior of the circuit. Suppose that in a sample of 160 engineering students, 40 were randomly assigned to analyze Circuit 1, 40 assigned to Circuit 2, 40 assigned to Circuit 3, and 40 assigned to Circuit 4. The researchers categorized task grades as follows: correct voltages and graph, incorrect voltages but correct graph, incorrect graph but correct voltages, incorrect voltages and incorrect graph. A summary of the results (based on information provided in the journal article) is shown in the table. Does any one circuit appear to be more difficult to analyze than any other circuit? Support your answer with a statistical test of hypothesis.

INTERACT 8.76 Interactions in a children’s museum. Refer to the Early Childhood Education Journal (Mar. 2014) study of interactions in a children’s museum, Exercise 8.51 (p. 469). Summary information for the 170 meaningful interactions sampled is reproduced in the following table. Do the proportions associated with the different types of interactions depend on whether the interaction was child-led or adult-led? Test, using $α = .01$.

Yes, $χ^{2} = 55.4$

Type of Interaction	Child-Led	Adult-Led
Show-and-tell	26	0
Learning/Teaching	21	64
Refocusing	21	10
Participatory Play	12	9
Advocating/Disciplining	1	6
Totals	81	89

Source: McMunn-Dooley, C., and Welch, M. M. “Nature of interactions among young children and adult caregivers in a children’s museum.” Early Childhood Education Journal, Vol. 42, No. 2, Mar. 2014 (adapted from Figure 2).

		Circuit 1	Circuit 2	Circuit 3	Circuit 4
Answer	Both Correct	31	10	5	4
	Incorrect Voltage	0	3	11	12
	Incorrect Graph	5	17	16	14
	Both Incorrect	4	10	8	10
	Total Number of Students	40	40	40	40

Applying the Concepts—Advanced

HIVVAC1 HIVVAC2 HIVVAC3 8.77 Efficacy of an HIV vaccine. New, effective AIDS vaccines have been developed through the process of “sieving”—that is, sifting out infections with some strains of HIV. Consider a vaccine designed to eliminate a particular strain of the virus. To test the efficacy of the vaccine, a clinical trial was conducted. The trial consisted of 7 AIDS patients vaccinated with the new drug and 31 AIDS patients who were treated with a placebo (no vaccination). Of the 7 vaccinated patients, 2 tested positive and 5 tested negative at the end of a follow-up period. Of the 31 unvaccinated patients, 22 tested positive and 9 tested negative. (These data are saved in the HIVVAC1 file.)
1. Construct a contingency table for the data. Then, conduct a test to determine whether the vaccine is effective in treating this strain of HIV. Use $α = .05$.
  
  $χ^{2} = 4.41$
2. Are the assumptions for the test you carried out in part a satisfied? What are the consequences if the assumptions are violated?
  
  No
3. In the case of a $2 \times 2$ contingency table, R. A. Fisher (1935) developed a procedure for computing the exact p-value for the test (called Fisher’s exact test). The method utilizes the hypergeometric probability distribution of Chapter 4 (p. 220). Consider the hypergeometric probability
  
  $\dfrac{\binom{7}{2} \binom{31}{22}}{\binom{38}{24}}$
  
  which represents the probability that 2 out of 7 vaccinated AIDS patients test positive and 22 out of 31 unvaccinated patients test positive—that is, the probability of the result of the clinical trial, given that the null hypothesis of independence is true. Compute this probability (called the probability of the contingency table).
  
  .0438
4. Refer to part c. Now consider two other possible results from the clinical trial that are different from the original results. Result 2: Only 1 of the 7 vaccinated patients tests positive; 23 of the 31 unvaccinated patients test positive. Result 3: None of the 7 vaccinated patients tests positive; 24 of the 31 unvaccinated patients test positive. Create two contingency tables, one for each of these results. (These data are saved in the HIVVAC2 and HIVVAC3 files respectively.) Note that these two contingency tables have the same marginal totals as the original table in part a. Explain why each of these tables provides more evidence to reject the null hypothesis of independence than does the original table. Then, compute the probability of each table, using the hypergeometric formula.
5. The p-value of Fisher’s exact test is the probability of observing a result at least as unsupportive of the null hypothesis as is the observed contingency table, given the same marginal totals. Sum the probabilities of parts c and d to obtain the p-value of Fisher’s exact test. (To verify your calculations, check the p-value labeled Left-sided Pr <= F at the bottom of the SAS printout shown below.) Interpret this value in the context of the vaccine trial.
  
  .0498; reject $H_{0}$
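The hypergeometric calculations in parts c through e can be checked with Python's built-in `math.comb`. This is a sketch under the marginal totals stated in the exercise (7 vaccinated, 31 unvaccinated, 24 positive in all); the function name and its default arguments are chosen for this illustration.

```python
from math import comb


def table_probability(k, n_vacc=7, n_unvacc=31, n_pos=24):
    """Hypergeometric probability that exactly k vaccinated patients test
    positive, with the row and column totals held fixed."""
    return (
        comb(n_vacc, k) * comb(n_unvacc, n_pos - k)
        / comb(n_vacc + n_unvacc, n_pos)
    )


# k = 2 is the observed table (part c); k = 1 and k = 0 are Results 2 and 3
# (part d), which are more extreme in the direction favoring the vaccine.
probs = {k: table_probability(k) for k in (2, 1, 0)}
p_value = sum(probs.values())

for k, p in probs.items():
    print(f"P({k} vaccinated positive) = {p:.4f}")
print(f"Left-sided Fisher exact p-value = {p_value:.4f}")
```

The observed table has probability of about .0438, and adding the two more extreme tables brings the left-sided p-value to about .0498, just under $α = .05$.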
MONTY 8.78 Examining the “Monty Hall Dilemma.” In Exercise 3.145 (p. 164) you solved the game show problem of whether to switch your choice of three doors, one of which hides a prize, after the host reveals what is behind a door that is not chosen. (Despite the natural inclination of many to keep one’s first choice, the correct answer is that you should switch your choice of doors.) This problem is sometimes called the “Monty Hall Dilemma,” named for Monty Hall, the host of the popular TV game show Let’s Make a Deal. In Thinking & Reasoning (July 2007), Wichita State University professors set up an experiment designed to influence subjects to switch their original choice of doors. Each subject participated in 23 trials. In trial 1, three boxes (representing doors) were presented on a computer screen; only one box hid a prize. In each subsequent trial, an additional box was presented, so that in trial 23, twenty-five boxes were presented. In each trial, after a box was selected, all of the remaining boxes except for one either (1) were shown to be empty (Empty condition), (2) disappeared (Vanish condition), (3) disappeared and the chosen box was enlarged (Steroids condition), or (4) disappeared and the remaining box not chosen was enlarged (Steroids2 condition). Twenty-seven subjects were assigned to each condition. The number of subjects who ultimately switched boxes is tallied, by condition, in the following table for both the first trial and the last trial.

	First Trial (1)		Last Trial (23)	
Condition	Switch Boxes	No Switch	Switch Boxes	No Switch
Empty	10	17	23	4
Vanish	3	24	12	15
Steroids	5	22	21	6
Steroids2	8	19	19	8

Based on Howard, J. N., Lambdin, C. G., and Datteri, D. L. “Let’s make a deal: Quality and availability of second-stage information as a catalyst for change.” Thinking & Reasoning, Vol. 13, No. 3, July 2007 (Table 2).
1. For a selected trial, does the likelihood of switching boxes depend on condition?
  
  1st trial: $χ^{2} = 5.88$
2. For a given condition, does the likelihood of switching boxes depend on trial number?
  
  Empty: $χ^{2} = 13.17$
3. On the basis of the results you obtained in parts a and b, what factors influence a subject to switch choices?
