This chapter studies the main types of nonparametric tests and identifies the situations in which each should be applied. Nonparametric tests are an alternative to parametric ones when the assumptions of the latter are violated or when the variables are qualitative. The main differences between parametric and nonparametric tests are presented in this chapter, as well as their respective advantages and disadvantages. The assumptions inherent to nonparametric hypothesis tests are also listed here. As a result, it is possible to identify when to use each of the nonparametric tests. Each test is solved analytically and via IBM SPSS Statistics Software® and Stata Statistical Software®, and the results obtained are interpreted.
Nonparametric tests; Binomial test; Chi-square test; Sign test; McNemar test; Wilcoxon test; Mann-Whitney U test; Cochran’s Q Test; Friedman’s test; Kruskal-Wallis test
Mathematics has wonderful strength that is capable of making us understand many mysteries of our faith.
Saint Jerome
As studied in the previous chapter, hypothesis tests are divided into parametric and nonparametric. Applied to quantitative data, parametric tests formulate hypotheses about population parameters, such as the population mean (μ), population standard deviation (σ), population variance (σ2), population proportion (p), etc.
Parametric tests require strong assumptions regarding the data distribution. For example, in many cases, we must assume that the samples are collected from populations whose data follow a normal distribution. Moreover, for tests comparing two population means or k population means (k ≥ 3), the population variances must be homogeneous.
Conversely, nonparametric tests formulate hypotheses about qualitative characteristics of the population, so they can be applied to qualitative data on nominal or ordinal scales. Since their assumptions regarding the data distribution are fewer and weaker than those of parametric tests, they are also known as distribution-free tests.
Nonparametric tests are an alternative to parametric ones when the assumptions of the latter are violated. Given that they require fewer assumptions, they are simpler and easier to apply, but less robust when compared to parametric tests.
In short, the main advantages of nonparametric tests are:
The main disadvantages are:
Thus, since parametric tests are more powerful than nonparametric ones, that is, they have a higher probability of rejecting the null hypothesis when it is really false, they must be chosen as long as all the assumptions are confirmed. On the other hand, nonparametric tests are an alternative to parametric ones when the hypotheses are violated or in cases in which the variables are qualitative.
Nonparametric tests are classified according to the variables’ level of measurement and to the sample size. For a single sample, we will study the binomial, chi-square (χ2), and sign tests. The binomial test is applied to binary variables; the χ2 test can be applied to both nominal and ordinal variables; and the sign test is applied only to ordinal variables.
In the case of two paired samples, the main tests are the McNemar test, the sign test, and the Wilcoxon test. The McNemar test is applied to qualitative variables that assume only two categories (binary), while the sign test and the Wilcoxon test are applied to ordinal variables.
Considering two independent samples, we can highlight the χ2 test and the Mann-Whitney U test. The χ2 test can be applied to nominal or ordinal variables, while the Mann-Whitney U test only considers ordinal variables.
For k paired samples (k ≥ 3), we have Cochran’s Q test that considers binary variables and Friedman’s test that considers ordinal variables.
Finally, in the case of more than two independent samples, we will study the χ2 test for nominal or ordinal variables and the Kruskal-Wallis test for ordinal variables.
Table 10.1 shows this classification.
Table 10.1
| Dimension | Level of Measurement | Nonparametric Test |
|---|---|---|
| One sample | Binary | Binomial |
| One sample | Nominal or ordinal | χ2 |
| One sample | Ordinal | Sign test |
| Two paired samples | Binary | McNemar test |
| Two paired samples | Ordinal | Sign test, Wilcoxon test |
| Two independent samples | Nominal or ordinal | χ2 |
| Two independent samples | Ordinal | Mann-Whitney U |
| K paired samples | Binary | Cochran’s Q |
| K paired samples | Ordinal | Friedman’s test |
| K independent samples | Nominal or ordinal | χ2 |
| K independent samples | Ordinal | Kruskal-Wallis test |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro.
Nonparametric tests whose variables’ level of measurement is ordinal can also be applied to quantitative variables; however, in these cases, they should only be used when the assumptions of the parametric tests are violated.
In this case, a random sample is taken from the population and we test the hypothesis that the sample data have a certain characteristic or distribution. Among the nonparametric statistical tests for a single sample, we can highlight the binomial test, the χ2 test, and the sign test. The binomial test is applied to binary data, the χ2 test to nominal or ordinal data, while the sign test is applied to ordinal data.
The binomial test is applied to an independent sample in which the variable that the researcher is interested in (X) is binary (dummy) or dichotomous, that is, it only has two possibilities: success or failure. By convention, we call result X = 1 a success and result X = 0 a failure. The probability of success in choosing a certain observation is represented by p and the probability of failure by q, that is:
For a bilateral test, we must consider the following hypotheses:
According to Siegel and Castellan (2006), the number of successes (Y), that is, the number of results of type [X = 1] in a sequence of N observations, is:
For the authors, in a sample of size N, the probability of obtaining k objects in a category and N − k objects in the other category is given by:
where:
Table F1 in the Appendix provides the probability P(Y = k) for several values of N, k, and p.
However, when we test hypotheses, we must use the probability of obtaining values that are greater than or equal to the value observed:
Or the probability of obtaining values that are less than or equal to the value observed:
According to Siegel and Castellan (2006), when p = q = ½, instead of calculating the probabilities based on the expressions presented, it is more convenient to use Table F2 in the Appendix. This table provides the unilateral probabilities, under the null hypothesis H0: p = 1/2, of obtaining values that are as extreme as or more extreme than k, where k is the lowest of the frequencies observed (P(Y ≤ k)). Due to the symmetry of a binomial distribution, when p = ½, we have P(Y ≥ k) = P(Y ≤ N − k). A unilateral test is used when we predict, in advance, which of both categories must contain the smallest number of cases. For a bilateral test (when the estimate simply refers to the fact that both frequencies will differ), we just need to double the values from Table F2 in the Appendix.
This final value is called the P-value, which, as discussed in Chapter 9, corresponds to the probability (unilateral or bilateral) associated with the value observed in the sample. The P-value indicates the lowest significance level at which the null hypothesis would be rejected. Thus, we reject H0 if P ≤ α.
In the case of large samples (N > 25), the sampling distribution of variable Y approaches a normal distribution, and the probability can be calculated by the following statistic:
where p̂ refers to the sample estimate of the proportion of successes used to test H0.
The value of Zcal calculated by using Expression (10.4) must be compared to the critical value of the standard normal distribution (see Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = α/2 = P(Zcal > zc).
Therefore, for a right-tailed unilateral test, the null hypothesis is rejected if Zcal > zc. Now, for a bilateral test, we reject H0 if Zcal < − zc or Zcal > zc.
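The logic above can be sketched in Python with scipy (an assumption — the chapter itself uses SPSS and Stata, and the sample counts below are hypothetical). The sketch shows the exact bilateral test and a plain large-sample normal approximation without continuity correction:

```python
import math
from scipy.stats import binomtest, norm

# Hypothetical sample: N = 30 trials with k = 20 successes; H0: p = 0.5
N, k, p0 = 30, 20, 0.5
q0 = 1 - p0

# Exact bilateral binomial test
result = binomtest(k, n=N, p=p0, alternative="two-sided")
print(f"exact P-value = {result.pvalue:.4f}")

# Large-sample normal approximation: Z = (k - N*p0) / sqrt(N*p0*q0)
z = (k - N * p0) / math.sqrt(N * p0 * q0)
p_approx = 2 * norm.sf(abs(z))  # bilateral P-value
print(f"Z = {z:.3f}, approximate P-value = {p_approx:.4f}")
```

We reject H0 when the resulting P-value does not exceed the chosen significance level α.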
Example 10.1 will be solved using IBM SPSS Statistics Software®. The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data are available in the file Binomial_Test.sav. The procedure for solving the binomial test using SPSS is described. Let’s select Analyze → Nonparametric Tests → Legacy Dialogs → Binomial … (Fig. 10.2).
First, let’s insert variable Method into the Test Variable List. In Test Proportion, we must define p = 0.50, since the probability of success and failure is the same (Fig. 10.3).
Finally, let’s click on OK. The results can be seen in Fig. 10.4.
The associated probability for a bilateral test is P = 0.481, similar to the value calculated in Example 10.1. Since P > α (0.481 > 0.05), we do not reject H0, which allows us to conclude, with a 95% confidence level, that p = q = ½.
Example 10.1 will also be solved using Stata Statistical Software®. The use of the images presented in this section has been authorized by Stata Corp LP©. The data are available in the file Binomial_Test.dta.
The syntax of the binomial test on Stata is:
where the term variable⁎ must be replaced by the variable considered in the analysis and #p by the probability of success specified in the null hypothesis.
In Example 10.1, our studied variable is method and, through the null hypothesis, there are no differences in the choice between both methods, so, the command to be typed is:
The result of the binomial test is shown in Fig. 10.5. We can see that the associated probability for a bilateral test is P = 0.481, similar to the value calculated in Example 10.1, and also obtained via SPSS software. Since P > 0.05, we do not reject H0, which allows us to conclude, with a 95% confidence level, that p = q = ½.
The χ2 test presented in this section is an extension of the binomial test and is applied to a single sample in which the variable being studied assumes two or more categories. The variables can be nominal or ordinal. The test compares the frequencies observed to the frequencies expected in each category.
The χ2 test assumes the following hypotheses:
The statistic for the test, analogous to Expression (4.1) in Chapter 4, is given by:
where:
The values of χcal2 approximately follow a χ2 distribution with ν = k − 1 degrees of freedom. The critical values of the chi-square (χc2) statistic can be found in Table D in the Appendix, which provides the critical values of χc2, where P(χcal2 > χc2) = α (for a right-tailed unilateral test). In order for the null hypothesis H0 to be rejected, the value of the χcal2 statistic must be in the critical region (CR), that is, χcal2 > χc2. Otherwise, we do not reject H0 (Fig. 10.6).
The P-value (the probability associated with the value of the χcal2 statistic calculated from the sample) can also be obtained from Table D. In this case, we reject H0 if P ≤ α.
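As a sketch of the one-sample χ2 computation in Python, scipy's chisquare compares observed frequencies with expected frequencies that default to equal counts per category (scipy and the frequencies below are assumptions, not the data from Example 10.3):

```python
from scipy.stats import chisquare

# Hypothetical observed frequencies over k = 7 equally likely categories
observed = [18, 22, 19, 25, 21, 17, 18]

# Expected frequencies default to uniform (total / k per category)
result = chisquare(observed)
print(f"chi2 = {result.statistic:.3f}, P-value = {result.pvalue:.3f}")
# Reject H0 if the P-value is less than or equal to alpha
```

The statistic has ν = k − 1 = 6 degrees of freedom here, matching the rule given above.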
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.3 are available in the file Chi-Square_One_Sample.sav. The procedure for applying the χ2 test on SPSS is described. First, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → Chi-Square …, as shown in Fig. 10.8.
After that, we should insert the variable Day_week into the Test Variable List. The variable being studied has seven categories. The options Get from data and Use specified range (Lower = 1 and Upper = 7) in Expected Range generate the same results. The frequencies expected for the seven categories are exactly the same. Thus, we must select the option All categories equal in Expected Values, as shown in Fig. 10.9.
Finally, let’s click on OK to obtain the results of the χ2 test, as shown in Fig. 10.10.
Therefore, the value of the χ2 statistic is 4.533, similar to the value calculated in Example 10.3. Since the P-value = 0.605 > 0.05 (in Example 10.3, we saw that 0.1 < P < 0.9), we do not reject H0, which allows us to conclude, with a 95% confidence level, that the sales do not depend on the day of the week.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.3 are available in the file Chi-Square_One_Sample.dta. The variable being studied is day_week.
The χ2 test for one sample on Stata can be obtained from the command csgof (chi-square goodness of fit), which allows us to compare the distribution of frequencies observed to the ones expected of a certain categorical variable with more than two categories.
In order for this command to be used, first, we must type:
and install it through the link csgof from http://www.ats.ucla.edu/stat/stata/ado/analysis.
After doing this, we can type the following command:
The result is shown in Fig. 10.11. We can see that the test statistic is similar to the one calculated in Example 10.3 and on SPSS, as is its associated probability.
The sign test is an alternative to the t-test for a single random sample when the data distribution of the population does not follow a normal distribution. The only assumption required by the sign test is that the distribution of the variable be continuous.
The sign test is based on the population median (μ). The probability of obtaining a sample value that is less than the median and the probability of obtaining a sample value that is greater than the median are the same (p = ½). The null hypothesis of the test is that μ is equal to a certain value specified by the investigator (μ0). For a bilateral test, we have:
The quantitative data are converted into signs, (+) or (−), that is, values greater than the median (μ0) start being represented by (+) and values less than μ0 by (−). Data with values equal to μ0 are excluded from the sample. Thus, the sign test is applied to ordinal data and offers little power to the researcher, since this conversion results in a considerable loss of information regarding the original data.
Let’s establish that N is the number of positive and negative signs (sample size disregarding any ties) and k is the number of signs that corresponds to the lowest frequency.
For small samples (N ≤ 25), we will use the binomial test with p = ½ to calculate P(Y ≤ k). This probability can be obtained directly from Table F2 in the Appendix.
When N > 25, the binomial distribution approaches a normal distribution. The value of Z is given by:
where X corresponds to the lowest or highest frequency. If X represents the lowest frequency, we must calculate X + 0.5. On the other hand, if X represents the highest frequency, we must calculate X − 0.5.
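The procedure described above (convert the data to signs, drop ties, and apply the binomial test with p = ½) can be sketched in Python; scipy and the sample data are assumptions:

```python
from scipy.stats import binomtest

# Hypothetical data; H0: the population median equals 65
data = [63, 66, 67, 62, 68, 70, 64, 65, 69, 66, 61, 67]
mu0 = 65

# Convert to signs; values equal to mu0 (ties) are excluded
signs = [x - mu0 for x in data if x != mu0]
n_pos = sum(1 for d in signs if d > 0)
n_neg = sum(1 for d in signs if d < 0)
N, k = n_pos + n_neg, min(n_pos, n_neg)

# Small-sample exact test: bilateral P-value = 2 * P(Y <= k) under p = 1/2
p_value = min(1.0, 2 * binomtest(k, n=N, p=0.5, alternative="less").pvalue)
print(f"+: {n_pos}, -: {n_neg}, P-value = {p_value:.4f}")
```

For N > 25 one would switch to the normal approximation with the ±0.5 continuity correction described above.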
The use of the images in this section has been authorized by the International Business Machines Corporation©.
SPSS makes the sign test available only for two related samples (2 Related Samples). Thus, in order for us to use the test for a single sample, we must generate a new variable with n values (sample size including ties), all of them equal to μ0. The data in Example 10.4 are available in the file Sign_Test_One_Sample.sav.
The procedure for applying the sign test on SPSS is shown. First of all, we must click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.12.
After that, we must insert variable 1 (Age_pop) and variable 2 (Age_sample) into Test Pairs. Let’s select the option regarding the sign test (Sign) in Test Type, as shown in Fig. 10.13.
Next, let’s click on OK to obtain the results of the sign test, as shown in Figs. 10.14 and 10.15.
Fig. 10.14 shows the frequencies of negative and positive signs, the total number of ties, and the total frequency.
Fig. 10.15 shows the associated probability for a bilateral test, which is similar to the value found in Example 10.4. Since P = 0.648 > 0.10, we do not reject the null hypothesis, which allows us to conclude, with a 90% confidence level, that the median retirement age is 65.
The use of the images presented in this section has been authorized by Stata Corp LP©.
Different from SPSS software, Stata makes the sign test for one sample available. On Stata, the sign test for a single sample as well as for two paired samples can be obtained from the command signtest.
The syntax of the test for one sample is:
where the term variable⁎ must be replaced by the variable considered in the analysis and # by the value of the population median to be tested.
The data in Example 10.4 are available in the file Sign_Test_One_Sample.dta. The variable analyzed is age and the main objective is to verify if the median retirement age is 65. The command to be typed is:
The result of the test is shown in Fig. 10.16. Analogous to the results presented in Example 10.4 and also generated on SPSS, the number of positive signs is 11, the number of negative signs is 8, and the associated probability for a bilateral test is 0.648. Since P > 0.10, we do not reject the null hypothesis, which allows us to conclude, with a 90% confidence level, that the median retirement age is 65.
These tests investigate if two samples are somehow related. The most common examples analyze a situation before and after a certain event. We will study the following tests: the McNemar test for binary variables and the sign and Wilcoxon tests for ordinal variables.
The McNemar test is applied to assess the significance of changes in two related samples with qualitative or categorical variables that assume only two categories (binary variables). The main goal of the test is to verify if there are any significant changes before and after the occurrence of a certain event. In order to do that, let’s use a 2 × 2 contingency table, as shown in Table 10.2.
According to Siegel and Castellan (2006), the + and − signs are used to represent the possible changes in the answers before and after. The frequencies of each occurrence are represented in their respective cells in Table 10.2.
For example, if there are changes from the first answer (+) to the second answer (−), the result will be written in the upper right cell, so B represents the total number of observations that presented changes in their behavior from (+) to (−).
Analogously, if there are changes from the first answer (−) to the second answer (+), the result will be written in the lower left cell, so C represents the total number of observations that presented changes in their behavior from (−) to (+).
On the other hand, while A represents the total number of observations that remained with the same answer (+) before and after, D represents the total number of observations with the same answer (−) in both periods.
Thus, the total number of individuals that change their answer can be represented by B + C.
Through the null hypothesis of the test, the total number of changes in each direction is equally likely, that is:
According to Siegel and Castellan (2006), the McNemar statistic is calculated based on the chi-square (χ2) statistic presented in Expression (10.5), that is:
According to the same authors, a correction factor must be used in order for a continuous χ2 distribution to become more similar to a discrete χ2 distribution, so:
The value calculated must be compared to the critical value of the χ2 distribution (Table D in the Appendix). This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). If the value of the statistic is in the critical region, that is, if χcal2 > χc2, we reject H0. Otherwise, we should not reject H0.
The probability associated with the χcal2 statistic (P-value) can also be obtained from Table D. In this case, the null hypothesis is rejected if P ≤ α. Otherwise, we do not reject H0.
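The McNemar statistic and its continuity-corrected version can be computed directly in Python as a cross-check (scipy is an assumption; the cell counts are those that appear in the chapter's mcci command for this example):

```python
from scipy.stats import chi2

# Contingency counts for Example 10.5 (from the command mcci 14 21 3 22);
# B and C are the discordant cells
A, B, C, D = 14, 21, 3, 22

# McNemar statistic without the continuity correction
stat = (B - C) ** 2 / (B + C)
# With the continuity correction for the discrete distribution
stat_corr = (abs(B - C) - 1) ** 2 / (B + C)

# P-values from a chi-square distribution with 1 degree of freedom
p = chi2.sf(stat, df=1)
p_corr = chi2.sf(stat_corr, df=1)
print(f"chi2 = {stat:.2f} (P = {p:.4f}), "
      f"corrected chi2 = {stat_corr:.2f} (P = {p_corr:.4f})")
```

The uncorrected value, 13.5, matches the statistic reported for the Stata output in this section.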
Example 10.5 will be solved using SPSS software. The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data are available in the file McNemar_Test.sav. The procedure for applying the McNemar test on SPSS is presented. Let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.17.
After that, we should insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s select the McNemar test option in Test Type, as shown in Fig. 10.18.
Finally, we must click on OK to obtain Figs. 10.19 and 10.20. Fig. 10.19 shows the frequencies observed before and after the reform (Contingency Table). The result of the McNemar test is shown in Fig. 10.20.
According to Fig. 10.20, the significance level observed in the McNemar test is 0.000, a value lower than 5%, so the null hypothesis is rejected. Hence, we may conclude, with a 95% confidence level, that there was a significant change in the choice to work at a public or a private organization after the social security reform.
Example 10.5 will also be solved using Stata software. The use of the images presented in this section has been authorized by Stata Corp LP©. The data are available in the file McNemar_Test.dta.
The McNemar test can be calculated on Stata by using the command mcc followed by the paired variables. In our example, the paired variables are called before and after, so, the command to be typed is:
The result of the McNemar test is shown in Fig. 10.21. We can see that the value of the statistic is 13.5, similar to the value calculated by Expression (10.7), without the correction factor. The significance level observed from the test is 0.000, lower than 5%, which allows us to conclude, with a 95% confidence level, that there was a significant change before and after the reform.
The result of the McNemar test could have also been obtained by using the command mcci 14 21 3 22.
The sign test can also be applied to two paired samples. In this case, the sign is given by the difference between the pairs, that is, if the difference results in a positive number, each pair of values is replaced by a (+) sign. On the other hand, if the result of the difference is negative, each pair of values is replaced by a (−) sign. In case of a tie, the data will be excluded from the sample.
Analogous to the sign test for a single sample, the sign test presented in this section is also an alternative to the t-test for comparing two related samples when the data distribution is not normal. In this case, the quantitative data are transformed into ordinal data. Thus, the sign test is much less powerful than the t-test, because it uses only the sign of the difference between the pairs as information.
Through the null hypothesis, the population median of the differences (μd) is zero. Therefore, for a bilateral test, we have:
In other words, we test the hypothesis that there are no differences between both samples (the samples come from populations with the same median and the same continuous distribution), that is, the number of (+) signs is the same as the number of (−) signs.
The same procedure presented in Section 10.2.3 for a single sample will be used in order to calculate the sign statistic in the case of two paired samples.
We say that N is the number of positive and negative signs (sample size disregarding the ties) and k is the number of signs that corresponds to the lowest frequency. If N ≤ 25, we will use the binomial test with p = ½ to calculate P(Y ≤ k). This probability can be obtained directly from Table F2 in the Appendix.
When N > 25, the binomial distribution approaches a normal distribution, and the value of Z is given by Expression (10.6):
where X corresponds to the lowest or highest frequency. If X represents the lowest frequency, we must use X + 0.5. On the other hand, if X represents the highest frequency, we must use X − 0.5.
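In Python, the paired version differs from the one-sample case only in how the signs are obtained — from the differences between the pairs (scipy and the data below are assumptions):

```python
from scipy.stats import binomtest

# Hypothetical paired observations (e.g., productivity before and after)
before = [10, 12, 11, 14, 9, 13, 10, 15]
after  = [12, 11, 13, 14, 11, 15, 9, 16]

# Signs of the differences; ties (zero differences) are excluded
diffs = [a - b for a, b in zip(after, before) if a != b]
n_pos = sum(1 for d in diffs if d > 0)
n_neg = sum(1 for d in diffs if d < 0)
N, k = n_pos + n_neg, min(n_pos, n_neg)

# Bilateral P-value = 2 * P(Y <= k) under H0: p = 1/2
p_value = min(1.0, 2 * binomtest(k, n=N, p=0.5, alternative="less").pvalue)
print(f"+: {n_pos}, -: {n_neg}, P-value = {p_value:.4f}")
```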
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.6 can be found in the file Sign_Test_Two_Paired_Samples.sav. The procedure for applying the sign test to two paired samples on SPSS is shown. We have to click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.23.
After that, let’s insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s also select the option regarding the sign test (Sign) in Test Type, as shown in Fig. 10.24.
Finally, let’s click on OK to obtain the results of the sign test for two paired samples (Figs. 10.25 and 10.26).
Fig. 10.25 shows the frequencies of negative and positive signs, the total number of ties, and the total frequency.
Fig. 10.26 shows the result of the z test, as well as the associated probability P for a bilateral test, values similar to the ones calculated in Example 10.6. Since P = 0.556 > 0.05, the null hypothesis is not rejected, which allows us to conclude, with a 95% confidence level, that there is no difference in productivity before and after the training course.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.6 also are available on Stata in the file Sign_Test_Two_Paired_Samples.dta. The paired variables are before and after.
As discussed in Section 10.2.3.2 for a single sample, the sign test on Stata is carried out from the command signtest. In the case of two paired samples, we must use the same command. However, it must be followed by the names of the paired variables, with the equal sign between them, since the objective is to test the equality of the respective medians. Thus, the command to be typed for our example is:
The result of the test is shown in Fig. 10.27 and includes the number of positive signs (15), the number of negative signs (11), as well as the probability associated with the statistic for a bilateral test (P = 0.557). These values are similar to the ones calculated in Example 10.6 and also generated on SPSS. Since P > 0.05, we do not reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is no difference in productivity before and after the training course.
Analogous to the sign test for two paired samples, the Wilcoxon test is an alternative to the t-test when the data distribution does not follow a normal distribution.
The Wilcoxon test is an extension of the sign test; however, it is more powerful. Besides the information about the direction of the differences for each pair, the Wilcoxon test considers the magnitude of the difference within the pairs (Fávero et al., 2009). The logical foundations and the method used in the Wilcoxon test are described, based on Siegel and Castellan (2006).
Let’s assume that di is the difference between the values for each pair of data. First of all, we have to place all of the di’s in ascending order according to their absolute value (without considering the sign) and calculate the respective ranks using this order. For example, position 1 is attributed to the lowest | di |, position 2 to the second lowest, and so on. At the end, we must attribute the di difference sign for each rank. The sum of all positive ranks is represented by Sp and the sum of all negative ranks by Sn.
Occasionally, the values for a certain pair of data are the same (di = 0). In this case, they are excluded from the sample. It is the same procedure used in the sign test, so, the value of N represents the sample size disregarding these ties.
Another type of tie may happen, in which two or more differences have the same absolute value. In this case, the same rank will be attributed to the ties, corresponding to the mean of the ranks that would have been attributed had the differences been different. For example, suppose that three pairs of data indicate the following differences: − 1, 1, and 1. Rank 2 is attributed to each pair, which corresponds to the mean of ranks 1, 2, and 3. The next value in order will then receive rank 4, since ranks 1, 2, and 3 have already been used.
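scipy's rankdata reproduces this mid-rank rule for ties (scipy is an assumption; the example reuses the differences −1, 1, and 1 from the text, plus one extra hypothetical value):

```python
from scipy.stats import rankdata

# Differences -1, 1, 1 from the text, plus a hypothetical fourth value
diffs = [-1, 1, 1, 3]

# Rank by absolute value; tied values receive the mean of the ranks
# they would occupy (method="average" is the default)
ranks = rankdata([abs(d) for d in diffs])
print(ranks)  # the three |d| = 1 values share rank 2; the next gets rank 4
```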
The null hypothesis assumes that the median of the differences in the population (μd) is zero, that is, the populations do not differ in location. For a bilateral test, we have:
In other words, we must test the hypothesis that there are no differences between both samples (the samples come from populations with the same median and the same continuous distribution), that is, the sum of the positive ranks (Sp) is the same as the sum of the negative ranks (Sn).
If N ≤ 15, Table I in the Appendix shows the unilateral probabilities associated with the critical values of Sc (P(Sp > Sc) = α). For a bilateral test, this value must be doubled. If the probability obtained (P-value) is less than or equal to α, we must reject H0.
As N grows, the sampling distribution of the Wilcoxon statistic approaches a normal distribution. Thus, for N > 15, we must calculate the value of the z variable that, according to Siegel and Castellan (2006), Fávero et al. (2009), and Maroco (2014), is:
where:
The value calculated must be compared to the critical value of the standard normal distribution (Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = P(Zcal > zc) = α/2. The null hypothesis H0 of a bilateral test is rejected if the value of the Zcal statistic is in the critical region, that is, if Zcal < − zc or Zcal > zc. Otherwise, we do not reject H0.
The unilateral probabilities associated with the Zcal statistic (P1) can also be obtained from Table E. For a unilateral test, we consider P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Thus, for both tests, we reject H0 if P ≤ α.
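A sketch of the Wilcoxon test in Python (scipy and the paired scores are assumptions; scipy chooses between an exact and an approximate method automatically, falling back to the normal approximation when tied ranks are present):

```python
from scipy.stats import wilcoxon

# Hypothetical paired scores before and after a course (N = 16 pairs)
before = [55, 60, 58, 62, 65, 59, 61, 57, 63, 64, 56, 60, 62, 58, 61, 59]
after  = [60, 63, 57, 66, 70, 64, 65, 60, 68, 66, 58, 65, 66, 61, 64, 63]

# Two-sided test; the reported statistic is the smaller of the
# positive and negative rank sums (Sp, Sn)
stat, p = wilcoxon(before, after)
print(f"S = {stat}, bilateral P-value = {p:.4f}")
```

Because the test uses the magnitudes of the differences through their ranks, not just their signs, it is more powerful than the sign test applied to the same pairs.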
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.7 are available in the file Wilcoxon_Test.sav. The procedure for applying the Wilcoxon test to two paired samples on SPSS is shown. Let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.29.
First of all, let’s insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s also select the option related to the Wilcoxon test in Test Type, as shown in Fig. 10.30.
Finally, let’s click on OK to obtain the results of the Wilcoxon test for two paired samples (Figs. 10.31 and 10.32).
Fig. 10.31 shows the number of negative, positive, and tied ranks, besides the mean and the sum of all positive and negative ranks.
Fig. 10.32 shows the result of the z test, as well as the associated probability P for a bilateral test, values similar to the ones found in Example 10.7. Since P = 0.003 < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in the students’ performance before and after the course.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.7 are available in the file Wilcoxon_Test.dta. The paired variables are called before and after.
The Wilcoxon test on Stata is carried out from the command signrank followed by the name of the paired variables with an equal sign between them. For our example, we must type the following command:
The result of the test is shown in Fig. 10.33. Since P < 0.05, we reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in the students’ performance before and after the course.
In these tests, we try to compare two populations represented by their respective samples. Unlike the tests for two paired samples, here it is not necessary for the samples to have the same size. Among the tests for two independent samples, we can highlight the chi-square test (for nominal or ordinal variables) and the Mann-Whitney U test (for ordinal variables).
In Section 10.2.2, the χ2 test was applied to a single sample in which the variable being studied was qualitative (nominal or ordinal). Here the test will be applied to two independent samples, from nominal or ordinal qualitative variables. This test has already been studied in Chapter 4 (Section 4.2.2), in order to verify if there is an association between two qualitative variables, and it will be described once again in this section.
The test compares the frequencies observed in each one of the cells of a contingency table to the frequencies expected. The χ2 test for two independent samples assumes the following hypotheses:
Therefore, the χ2 statistic measures the discrepancy between the observed contingency table and the expected contingency table, the latter built under the hypothesis that there is no association between the categories of the two variables studied. If the distribution of observed frequencies is exactly equal to the distribution of expected frequencies, the χ2 statistic is zero. Thus, a low value of χ2 indicates independence between the variables.
As already presented in Expression (4.1) in Chapter 4, the χ2 statistic for two independent samples is given by:
where:
The values of χcal2 approximately follow an χ2 distribution with ν = (I − 1)·(J − 1) degrees of freedom. The critical values of the chi-square statistic (χc2) can be found in Table D, in the Appendix. This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). In order for the null hypothesis H0 to be rejected, the value of the χcal2 statistic must be in the critical region, that is, χcal2 > χc2. Otherwise, we do not reject H0 (Fig. 10.34).
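As a quick cross-check of this calculation in Python, scipy.stats.chi2_contingency computes the χ2 statistic from a table of observed frequencies, returning the P-value, the degrees of freedom ν = (I − 1)·(J − 1), and the table of expected frequencies. The contingency table below is hypothetical, not the data from Example 10.8:

```python
from scipy import stats

# Hypothetical 2x3 contingency table (rows: two agencies; columns:
# three satisfaction levels) -- NOT the data from Example 10.8
observed = [[20, 30, 50],
            [45, 25, 30]]

# chi2_contingency builds the expected table from the marginal totals
# and returns the statistic, the P-value, and the degrees of freedom
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: the variables are associated")
```

For this table, ν = (2 − 1)·(3 − 1) = 2, and the expected frequency of each cell is (row total × column total)/N, exactly as in Expression (4.1).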
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.8 are available in the file HealthInsurance.sav. In order to calculate the χ2 statistic for two independent samples, we must click on Analyze → Descriptive Statistics → Crosstabs … Let’s insert variable Agency in Row(s) and variable Satisfaction in Column(s), as shown in Fig. 10.36.
In Statistics …, let’s select option Chi-square, as shown in Fig. 10.37. Then, we must finally click on Continue and OK. The result is shown in Fig. 10.38.
From Fig. 10.38, we can see that the value of χ2 is 15.861, similar to what was calculated in Example 10.8. For the confidence level of 95%, as P = 0.003 < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is an association between the variable categories, that is, the frequencies observed differ from the frequencies expected in at least one pair of categories.
The use of the images presented in this section has been authorized by Stata Corp LP©.
As presented in Chapter 4, the calculation of the χ2 statistic on Stata is done by using the command tabulate, or simply tab, followed by the names of the variables being studied, with the option chi2, or simply ch. The syntax of the test is:

tab variable1⁎ variable2⁎, chi2
The data in Example 10.8 are also available in the file HealthCareInsurance.dta. The variables being studied are agency and satisfaction. Thus, we must type the following command:

tab agency satisfaction, chi2
The results can be seen in Fig. 10.39 and are similar to the ones presented in Example 10.8 and generated on SPSS.
The Mann-Whitney U test is one of the most powerful nonparametric tests, applied to quantitative or qualitative variables in an ordinal scale, and it aims at verifying if two nonpaired or independent samples are drawn from the same population. It is an alternative to Student’s t-test when the normality hypothesis is violated or when the sample is small. In addition, it may be considered a nonparametric version of the t-test for two independent samples.
Since the original data are transformed into ranks (orders), we lose some information, so, the Mann-Whitney U test is not as powerful as the t-test.
Different from the t-test, which verifies the equality of the means of two independent populations with continuous data, the Mann-Whitney U test verifies the equality of the medians. For a bilateral test, the null hypothesis is that the medians of both populations are equal, that is:
H0: μ1 = μ2
H1: μ1 ≠ μ2
The calculation of the Mann-Whitney U statistic is specified below, separately for small and large samples.
Method:
Table J in the Appendix shows the critical values of U in a way that P(Ucal < Uc) = α (for a left-tailed unilateral test), for values of N2 ≤ 20 and significance levels of 0.05, 0.025, 0.01, and 0.005. In order for the null hypothesis H0 of the left-tailed unilateral test to be rejected, the value of the Ucal statistic must be in the critical region, that is, Ucal < Uc. Otherwise, we do not reject H0. For a bilateral test, we must consider P(Ucal < Uc) = α/2, since P(Ucal < Uc) + P(Ucal > Uc) = α.
The unilateral probabilities associated to the Ucal statistic (P1) can also be obtained from Table J. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Thus, we reject H0 if P ≤ α.
As the sample size grows (N2 > 20), the Mann-Whitney distribution becomes more similar to a standard normal distribution.
The value of the Z statistic is given by:
where:
The value calculated must be compared to the critical value of the standard normal distribution (see Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = P(Zcal > zc) = α/2. Therefore, for a bilateral test, the null hypothesis is rejected if Zcal < − zc or Zcal > zc.
Unilateral probabilities associated to the Zcal (P1 = P) statistic can also be obtained from Table E. For a bilateral test, this probability must be doubled (P = 2P1). Thus, the null hypothesis is rejected if P ≤ α.
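In Python, the function scipy.stats.mannwhitneyu covers both cases discussed above: it uses the exact distribution of U for small samples without ties and the normal approximation otherwise. A minimal sketch with hypothetical diameters, not the data from Examples 10.9 and 10.10:

```python
from scipy import stats

# Hypothetical part diameters (mm) from two machines
# (NOT the data from Examples 10.9 and 10.10)
machine_a = [49.8, 50.1, 50.3, 49.9, 50.2, 50.0, 50.4]
machine_b = [50.6, 50.8, 50.5, 50.9, 50.7, 51.0]

# Bilateral Mann-Whitney U test
u, p = stats.mannwhitneyu(machine_a, machine_b, alternative='two-sided')
print(f"U = {u}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: the population medians are different")
```

Because every diameter of machine_b exceeds every diameter of machine_a in this hypothetical dataset, U reaches its minimum possible value of zero, which yields the smallest attainable P-value for these sample sizes.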
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.9 are available in the file Mann-Whitney_Test.sav. Since group 1 is the one with the smallest number of observations, in Data → Define Variable Properties …, we assign value 1 to group B and value 2 to group A for variable Machine.
In order to elaborate the Mann-Whitney test on SPSS, we must click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples …, as shown in Fig. 10.40.
After that, we should insert the variable Diameter in the box Test Variable List and the variable Machine in Grouping Variable, defining the respective groups. Let’s select the option Mann-Whitney U in Test Type, as shown in Fig. 10.41.
Finally, let’s click on OK to obtain Figs. 10.42 and 10.43. Fig. 10.42 shows the mean and the sum of the ranks for each group, while Fig. 10.43 shows the statistic of the test.
The results in Fig. 10.42 are similar to the ones calculated in Example 10.9. According to Fig. 10.43, the result of the Mann-Whitney U statistic is 3.50, similar to the value calculated in Example 10.9. The bilateral probability associated to the U statistic is P = 0.002 (we saw in Example 10.9 that this probability is less than 0.01). For the same data in Example 10.9, if we had to calculate the Z statistic and the respective associated bilateral probability, the result would be Zcal = − 2.840 and P = 0.005, similar to the values calculated in Example 10.10. For both tests, as the associated bilateral probability is less than 0.05, the null hypothesis is rejected, which allows us to conclude that the medians of both populations are different.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The Mann-Whitney test is carried out on Stata through the command ranksum (equality test for nonpaired data), using the following syntax:
ranksum variable⁎, by (groups⁎)
where the term variable⁎ must be replaced by the quantitative variable studied and the term groups⁎ by the categorical variable that represents the groups.
Let’s open the file Mann-Whitney_Test.dta that contains the data from Examples 10.9 and 10.10. Both groups are represented by the variable machine and the quality characteristic by the variable diameter. Thus, the command to be typed is:
ranksum diameter, by (machine)
The results obtained are shown in Fig. 10.44. We can see that the calculated value of the statistic (2.840) corresponds to the value calculated in Example 10.10, for large samples, from Expression (10.13). The probability associated to the statistic for a bilateral test is 0.0045. Since P < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that the population medians are different.
These tests analyze the differences between k (three or more) paired or related samples. According to Siegel and Castellan (2006), the null hypothesis to be tested is that k samples have been drawn from the same population. The main tests for k paired samples are Cochran’s Q test (for binary variables) and Friedman’s test (for ordinal variables).
Cochran’s Q test for k paired samples is an extension of the McNemar test for two samples, and it aims to test the hypothesis that the frequencies or proportions of three or more related groups differ significantly from one another. In the same way as in the McNemar test, the data are binary.
According to Siegel and Castellan (2006), Cochran’s Q test compares the characteristics of several individuals or characteristics of the same individual observed under different conditions. For example, we can analyze if k items differ significantly for N individuals. Or, we may have only one item to analyze and the objective is to compare the answer of N individuals under k different conditions.
Let’s suppose that the study data are organized in a table with N rows and k columns, in which N is the number of cases and k is the number of groups or conditions. Under the null hypothesis of Cochran’s Q test, there are no differences between the frequencies or proportions of success (p) of the k related groups, that is, the proportion of a desired answer (success) is the same in each column. Under the alternative hypothesis, there are differences between at least two groups, so:
Cochran’s Q statistic is given by:
which approximately follows a χ2 distribution with k − 1 degrees of freedom, where:
The value calculated must be compared to the critical value of the χ2 distribution (Table D in the Appendix). This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). If the value of the statistic is in the critical region, that is, if Qcal > χc2, we must reject H0. Otherwise, we do not reject H0.
The probability associated to the calculated value of the statistic (P-value) can also be obtained from Table D. In this case, the null hypothesis is rejected if P ≤ α; otherwise we do not reject H0.
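The same statistic can be obtained in Python from the function cochrans_q of the statsmodels package, which receives the N × k table of binary responses. The responses below are hypothetical, not the data from Example 10.11:

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

# Hypothetical binary answers (1 = satisfied, 0 = dissatisfied) of
# N = 10 clients under k = 3 conditions (NOT the data from Example 10.11)
x = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [0, 0, 0],
              [1, 1, 0]])

# Cochran's Q test: H0 states that the proportion of success is the
# same in each of the k columns
res = cochrans_q(x)
print(f"Q = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

The resulting Q is compared to the χ2 distribution with k − 1 = 2 degrees of freedom; here Q ≈ 8.857, so H0 would be rejected at the 5% significance level for these hypothetical data.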
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.11 are available in the file Cochran_Q_Test.sav. The procedure for elaborating Cochran’s Q test on SPSS is shown. First of all, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples …, as shown in Fig. 10.46.
After that, we must insert variables A, B, and C in the box Test Variables, and select option Cochran’s Q in Test Type, as shown in Fig. 10.47.
Finally, let’s click on OK to obtain the results of the test. Fig. 10.48 shows the frequencies of each group and Fig. 10.49 shows the result of the statistic.
The value of Cochran’s Q statistic is 4.167, similar to the value calculated in Example 10.11. The probability associated to the statistic is 0.125 (we saw in Example 10.11 that P > 0.10). Since P > α, the null hypothesis is not rejected, which allows us to conclude, with a 90% level of confidence, that there are no differences in the proportion of satisfied clients for all three supermarkets.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data from Example 10.11 are also available in the file Cochran_Q_Test.dta. The command used to run the test is cochran, followed by the k paired variables, in our case, the variables that represent the three groups of supermarkets: a, b, and c. So, the command to be typed is:

cochran a b c
The results of Cochran’s Q test on Stata are in Fig. 10.50. We can verify that the result of the statistic and the respective associated probability are similar to the results calculated in Example 10.11, and also generated on SPSS, which allows us to conclude, with a 90% level of confidence, that the proportion of satisfied clients is the same for all three supermarkets.
Friedman’s test is applied to quantitative or qualitative variables in an ordinal scale, and its main objective is to verify whether k paired samples are drawn from the same population. It is an extension of the Wilcoxon test for three or more paired samples. It is also an alternative to the analysis of variance when its hypotheses (normality of the data and homogeneity of the variances) are violated or when the sample size is too small.
The data are represented in one table with double entry, with N rows and k columns, in which the rows represent the several individuals or corresponding sets of individuals, and the columns represent the different conditions.
Therefore, the null hypothesis of Friedman’s test assumes that the k samples (columns) come from the same population or from populations with the same median (μ). For a bilateral test, we have:
To apply Friedman’s statistic, we must attribute ranks from 1 to k to the elements of each row. For example, rank 1 is attributed to the lowest observation in the row and rank k to the highest. If there are ties, we attribute the mean of the corresponding ranks. Friedman’s statistic is given by:
where:
However, according to Siegel and Castellan (2006), whenever there are ties between the ranks of the same group or row, Friedman’s statistic must be corrected in a way that considers the changes in the sample distribution, as follows:
where:
The value calculated must be compared to the critical value of the sample distribution. When N and k are small (k = 3 and 3 < N < 13, or k = 4 and 2 < N < 8, or k = 5 and 3 < N < 5), we must use Table K in the Appendix, which shows the critical values of Friedman’s statistic (Fc), where P(Fcal > Fc) = α (for a right-tailed unilateral test). For high values of N and k, the sample distribution can be approximated by the χ2 distribution with ν = k − 1 degrees of freedom.
Therefore, if the value of the Fcal statistic is in the critical region, that is, if Fcal > Fc for small N and k, or Fcal > χc2 for large N and k, we must reject the null hypothesis. Otherwise, we do not reject H0.
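In Python, Friedman's test is available as scipy.stats.friedmanchisquare, which receives one sequence per condition (column) and applies the χ2 approximation with the correction for ties. The scores below are hypothetical, not the data from Example 10.12:

```python
from scipy import stats

# Hypothetical scores of N = 8 patients under k = 3 conditions, e.g.,
# before treatment, after treatment, and after 3 months
# (NOT the data from Example 10.12)
bt = [60, 55, 70, 62, 58, 65, 59, 61]
at = [68, 60, 74, 70, 63, 72, 66, 69]
a3m = [72, 66, 80, 75, 70, 78, 71, 74]

# Friedman's test: one argument per paired sample (column)
stat, p = stats.friedmanchisquare(bt, at, a3m)
print(f"F = {stat:.3f}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: at least two conditions have different medians")
```

In this hypothetical dataset every patient's score increases across the three conditions, so the within-row ranks are always 1, 2, 3 and the statistic reaches its maximum for N = 8 and k = 3, leading to the rejection of H0.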
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data from Example 10.12 are available in the file Friedman_Test.sav. To elaborate Friedman’s test on SPSS, let’s first click on Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples …, as shown in Fig. 10.52.
After that, we must insert variables BT, AT, and A3M in the box Test Variables and select the option Friedman in Test Type, as shown in Fig. 10.53.
Finally, let’s click on OK to obtain the results of Friedman’s test. Fig. 10.54 shows the means of the ranks, similar to the values calculated in Table 10.E.18.
The value of Friedman’s statistic and the significance level of the test are in Fig. 10.55.
The value of the test is 27.527, similar to the one calculated in Example 10.12. The probability associated to the statistic is 0.000 (we saw in Example 10.12 that this probability is less than 0.005). Since P < 0.05, we reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there are differences between the periods evaluated, that is, the treatment changes the patients' results.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.12 are available in the file Friedman_Test.dta. The variables being studied are bt, at, and a3m. Friedman's test on Stata is carried out through the command friedman. Since this command is not native to Stata, it must first be installed from the link snp2 at http://www.stata.com/stb/stb3.
Running Friedman's test on Stata requires the data to be transposed. Before transposing them, we can preserve the original dataset by typing preserve. After that, let's type the command xpose, which transposes all the variables into observations and all the observations into variables:

xpose, clear
After the command xpose, we can see that the data were transformed into N variables (the number of initial observations). Let's now type the following command:

friedman v1-v15

since the current dataset contains 15 variables after the transposition. Through Fig. 10.56, we can verify that Friedman's statistic on Stata (25.233) is calculated from Expression (10.15), without the correction factor. The probability associated to the statistic is 0.000 (the null hypothesis is rejected), which allows us to conclude, with a 95% confidence level, that there are differences between the treatments. To restore the original dataset, we must type restore.