This chapter studies the main types of nonparametric tests and identifies the situations in which each should be applied. Nonparametric tests are an alternative to parametric ones when the assumptions of the latter are violated or when the variables are qualitative. The main differences between parametric and nonparametric tests are presented in this chapter, as well as their respective advantages and disadvantages. The assumptions inherent to nonparametric hypothesis tests are also listed here. As a result, it is possible to identify when to use each of the nonparametric tests. Each test is solved analytically and via IBM SPSS Statistics Software® and Stata Statistical Software®, and the results obtained are interpreted.
Nonparametric tests; Binomial test; Chi-square test; Sign test; McNemar test; Wilcoxon test; Mann-Whitney U test; Cochran’s Q Test; Friedman’s test; Kruskal-Wallis test
Mathematics has wonderful strength that is capable of making us understand many mysteries of our faith.
Saint Jerome
As studied in the previous chapter, hypothesis tests are divided into parametric and nonparametric. Applied to quantitative data, parametric tests formulate hypotheses about population parameters, such as the population mean (μ), population standard deviation (σ), population variance (σ2), population proportion (p), etc.
Parametric tests require strong assumptions regarding the data distribution. For example, in many cases, we must assume that the samples are collected from populations whose data follow a normal distribution. Moreover, for tests comparing two population means or k population means (k ≥ 3), the population variances must be homogeneous.
Conversely, nonparametric tests formulate hypotheses about qualitative characteristics of the population, so they can be applied to qualitative data on nominal or ordinal scales. Since their assumptions regarding the data distribution are fewer and weaker than those of parametric tests, they are also known as distribution-free tests.
Nonparametric tests are an alternative to parametric ones when the assumptions of the latter are violated. Given that they require fewer assumptions, they are simpler and easier to apply, but less robust when compared to parametric tests.
In short, the main advantages of nonparametric tests are:
The main disadvantages are:
Thus, since parametric tests are more powerful than nonparametric ones, that is, they have a higher probability of rejecting the null hypothesis when it is really false, they must be chosen as long as all the assumptions are confirmed. On the other hand, nonparametric tests are an alternative to parametric ones when the hypotheses are violated or in cases in which the variables are qualitative.
Nonparametric tests are classified according to the variables’ level of measurement and to the sample size. For a single sample, we will study the binomial, chi-square (χ2), and sign tests. The binomial test is applied to binary variables; the χ2 test can be applied to both nominal and ordinal variables; and the sign test is applied only to ordinal variables.
In the case of two paired samples, the main tests are the McNemar test, the sign test, and the Wilcoxon test. The McNemar test is applied to qualitative variables that assume only two categories (binary), while the sign test and the Wilcoxon test are applied to ordinal variables.
Considering two independent samples, we can highlight the χ2 test and the Mann-Whitney U test. The χ2 test can be applied to nominal or ordinal variables, while the Mann-Whitney U test only considers ordinal variables.
For k paired samples (k ≥ 3), we have Cochran’s Q test that considers binary variables and Friedman’s test that considers ordinal variables.
Finally, in the case of more than two independent samples, we will study the χ2 test for nominal or ordinal variables and the Kruskal-Wallis test for ordinal variables.
Table 10.1 shows this classification.
Table 10.1
| Dimension | Level of Measurement | Nonparametric Test |
|---|---|---|
| One sample | Binary | Binomial |
| One sample | Nominal or ordinal | χ2 |
| One sample | Ordinal | Sign test |
| Two paired samples | Binary | McNemar test |
| Two paired samples | Ordinal | Sign test, Wilcoxon test |
| Two independent samples | Nominal or ordinal | χ2 |
| Two independent samples | Ordinal | Mann-Whitney U |
| K paired samples | Binary | Cochran’s Q |
| K paired samples | Ordinal | Friedman’s test |
| K independent samples | Nominal or ordinal | χ2 |
| K independent samples | Ordinal | Kruskal-Wallis test |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro.
Nonparametric tests whose variables’ level of measurement is ordinal can also be applied to quantitative variables; however, in these cases, they should only be used when the assumptions of the parametric tests are violated.
In this case, a random sample is taken from the population and we test the hypothesis that the sample data have a certain characteristic or distribution. Among the nonparametric statistical tests for a single sample, we can highlight the binomial test, the χ2 test, and the sign test. The binomial test is applied to binary data, the χ2 test to nominal or ordinal data, while the sign test is applied to ordinal data.
The binomial test is applied to an independent sample in which the variable that the researcher is interested in (X) is binary (dummy) or dichotomous, that is, it only has two possibilities: success or failure. By convention, we call result X = 1 a success and result X = 0 a failure. The probability of success in choosing a certain observation is represented by p and the probability of failure by q, that is:
For a bilateral test, we must consider the following hypotheses:
According to Siegel and Castellan (2006), the number of successes (Y), that is, the number of results of type [X = 1] in a sequence of N observations, is:
For the authors, in a sample of size N, the probability of obtaining k objects in a category and N − k objects in the other category is given by:
where:
Table F1 in the Appendix provides the probability P(Y = k) for several values of N, k, and p.
However, when we test hypotheses, we must use the probability of obtaining values that are greater than or equal to the value observed:
Or the probability of obtaining values that are less than or equal to the value observed:
According to Siegel and Castellan (2006), when p = q = ½, instead of calculating the probabilities based on the expressions presented, it is more convenient to use Table F2 in the Appendix. This table provides the unilateral probabilities, under the null hypothesis H0: p = 1/2, of obtaining values that are as extreme as or more extreme than k, where k is the lowest of the frequencies observed (P(Y ≤ k)). Due to the symmetry of a binomial distribution, when p = ½, we have P(Y ≥ k) = P(Y ≤ N − k). A unilateral test is used when we predict, in advance, which of both categories must contain the smallest number of cases. For a bilateral test (when the estimate simply refers to the fact that both frequencies will differ), we just need to double the values from Table F2 in the Appendix.
This final value is called the P-value, which, as discussed in Chapter 9, corresponds to the probability (unilateral or bilateral) associated with the value observed in the sample. The P-value indicates the lowest significance level at which the null hypothesis would be rejected. Thus, we reject H0 if P ≤ α.
In the case of large samples (N > 25), the sampling distribution of variable Y approaches a normal distribution, and the probability can be calculated by the following statistic:
where p̂ refers to the sample estimate of the proportion of successes used to test H0.
The value of Zcal calculated by using Expression (10.4) must be compared to the critical value of the standard normal distribution (see Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = α/2 = P(Zcal > zc).
Therefore, for a right-tailed unilateral test, the null hypothesis is rejected if Zcal > zc. Now, for a bilateral test, we reject H0 if Zcal < − zc or Zcal > zc.
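The logic above can be sketched in Python with scipy (an assumption — the chapter itself uses SPSS and Stata, and the sample counts below are hypothetical). The sketch shows the exact bilateral test and a plain large-sample normal approximation without continuity correction:

```python
import math
from scipy.stats import binomtest, norm

# Hypothetical sample: N = 30 trials with k = 20 successes; H0: p = 0.5
N, k, p0 = 30, 20, 0.5
q0 = 1 - p0

# Exact bilateral binomial test
result = binomtest(k, n=N, p=p0, alternative="two-sided")
print(f"exact P-value = {result.pvalue:.4f}")

# Large-sample normal approximation: Z = (k - N*p0) / sqrt(N*p0*q0)
z = (k - N * p0) / math.sqrt(N * p0 * q0)
p_approx = 2 * norm.sf(abs(z))  # bilateral P-value
print(f"Z = {z:.3f}, approximate P-value = {p_approx:.4f}")
```

We reject H0 when the resulting P-value does not exceed the chosen significance level α.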
Example 10.1 will be solved using IBM SPSS Statistics Software®. The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data are available in the file Binomial_Test.sav. The procedure for solving the binomial test using SPSS is described. Let’s select Analyze → Nonparametric Tests → Legacy Dialogs → Binomial … (Fig. 10.2).
First, let’s insert variable Method into the Test Variable List. In Test Proportion, we must define p = 0.50, since the probability of success and failure is the same (Fig. 10.3).
Finally, let’s click on OK. The results can be seen in Fig. 10.4.
The associated probability for a bilateral test is P = 0.481, similar to the value calculated in Example 10.1. Since P > α (0.481 > 0.05), we do not reject H0, which allows us to conclude, with a 95% confidence level, that p = q = ½.
Example 10.1 will also be solved using Stata Statistical Software®. The use of the images presented in this section has been authorized by Stata Corp LP©. The data are available in the file Binomial_Test.dta.
The syntax of the binomial test on Stata is:
where the term variable⁎ must be replaced by the variable considered in the analysis and #p by the probability of success specified in the null hypothesis.
In Example 10.1, our studied variable is method and, through the null hypothesis, there are no differences in the choice between both methods, so, the command to be typed is:
The result of the binomial test is shown in Fig. 10.5. We can see that the associated probability for a bilateral test is P = 0.481, similar to the value calculated in Example 10.1, and also obtained via SPSS software. Since P > 0.05, we do not reject H0, which allows us to conclude, with a 95% confidence level, that p = q = ½.
The χ2 test presented in this section is an extension of the binomial test and is applied to a single sample in which the variable being studied assumes two or more categories. The variables can be nominal or ordinal. The test compares the frequencies observed to the frequencies expected in each category.
The χ2 test assumes the following hypotheses:
The statistic for the test, analogous to Expression (4.1) in Chapter 4, is given by:
where:
The values of χcal2 approximately follow a χ2 distribution with ν = k − 1 degrees of freedom. The critical values of the chi-square (χc2) statistic can be found in Table D in the Appendix, which provides the critical values of χc2, where P(χcal2 > χc2) = α (for a right-tailed unilateral test). In order for the null hypothesis H0 to be rejected, the value of the χcal2 statistic must be in the critical region (CR), that is, χcal2 > χc2. Otherwise, we do not reject H0 (Fig. 10.6).
The P-value (the probability associated with the value of the χcal2 statistic calculated from the sample) can also be obtained from Table D. In this case, we reject H0 if P ≤ α.
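As a sketch of the one-sample χ2 computation in Python, scipy's chisquare compares observed frequencies with expected frequencies that default to equal counts per category (scipy and the frequencies below are assumptions, not the data from Example 10.3):

```python
from scipy.stats import chisquare

# Hypothetical observed frequencies over k = 7 equally likely categories
observed = [18, 22, 19, 25, 21, 17, 18]

# Expected frequencies default to uniform (total / k per category)
result = chisquare(observed)
print(f"chi2 = {result.statistic:.3f}, P-value = {result.pvalue:.3f}")
# Reject H0 if the P-value is less than or equal to alpha
```

The statistic has ν = k − 1 = 6 degrees of freedom here, matching the rule given above.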
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.3 are available in the file Chi-Square_One_Sample.sav. The procedure for applying the χ2 test on SPSS is described. First, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → Chi-Square …, as shown in Fig. 10.8.
After that, we should insert the variable Day_week into the Test Variable List. The variable being studied has seven categories. The options Get from data and Use specified range (Lower = 1 and Upper = 7) in Expected Range generate the same results. The frequencies expected for the seven categories are exactly the same. Thus, we must select the option All categories equal in Expected Values, as shown in Fig. 10.9.
Finally, let’s click on OK to obtain the results of the χ2 test, as shown in Fig. 10.10.
Therefore, the value of the χ2 statistic is 4.533, similar to the value calculated in Example 10.3. Since the P-value = 0.605 > 0.05 (in Example 10.3, we saw that 0.1 < P < 0.9), we do not reject H0, which allows us to conclude, with a 95% confidence level, that the sales do not depend on the day of the week.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.3 are available in the file Chi-Square_One_Sample.dta. The variable being studied is day_week.
The χ2 test for one sample on Stata can be obtained from the command csgof (chi-square goodness of fit), which allows us to compare the distribution of frequencies observed to the ones expected of a certain categorical variable with more than two categories.
In order for this command to be used, first, we must type:
and install it through the link csgof from http://www.ats.ucla.edu/stat/stata/ado/analysis.
After doing this, we can type the following command:
The result is shown in Fig. 10.11. We can see that the test statistic is similar to the one calculated in Example 10.3 and on SPSS, as is its associated probability.
The sign test is an alternative to the t-test for a single random sample when the data distribution of the population does not follow a normal distribution. The only assumption required by the sign test is that the distribution of the variable be continuous.
The sign test is based on the population median (μ). The probability of obtaining a sample value that is less than the median and the probability of obtaining a sample value that is greater than the median are the same (p = ½). The null hypothesis of the test is that μ is equal to a certain value specified by the investigator (μ0). For a bilateral test, we have:
The quantitative data are converted into signs, (+) or (−), that is, values greater than the median (μ0) start being represented by (+) and values less than μ0 by (−). Data with values equal to μ0 are excluded from the sample. Thus, the sign test is applied to ordinal data and offers little power to the researcher, since this conversion results in a considerable loss of information regarding the original data.
Let’s establish that N is the number of positive and negative signs (sample size disregarding any ties) and k is the number of signs that corresponds to the lowest frequency.
For small samples (N ≤ 25), we will use the binomial test with p = ½ to calculate P(Y ≤ k). This probability can be obtained directly from Table F2 in the Appendix.
When N > 25, the binomial distribution approaches a normal distribution. The value of Z is given by:
where X corresponds to the lowest or highest frequency. If X represents the lowest frequency, we must calculate X + 0.5. On the other hand, if X represents the highest frequency, we must calculate X − 0.5.
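The procedure described above (convert the data to signs, drop ties, and apply the binomial test with p = ½) can be sketched in Python; scipy and the sample data are assumptions:

```python
from scipy.stats import binomtest

# Hypothetical data; H0: the population median equals 65
data = [63, 66, 67, 62, 68, 70, 64, 65, 69, 66, 61, 67]
mu0 = 65

# Convert to signs; values equal to mu0 (ties) are excluded
signs = [x - mu0 for x in data if x != mu0]
n_pos = sum(1 for d in signs if d > 0)
n_neg = sum(1 for d in signs if d < 0)
N, k = n_pos + n_neg, min(n_pos, n_neg)

# Small-sample exact test: bilateral P-value = 2 * P(Y <= k) under p = 1/2
p_value = min(1.0, 2 * binomtest(k, n=N, p=0.5, alternative="less").pvalue)
print(f"+: {n_pos}, -: {n_neg}, P-value = {p_value:.4f}")
```

For N > 25 one would switch to the normal approximation with the ±0.5 continuity correction described above.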
The use of the images in this section has been authorized by the International Business Machines Corporation©.
SPSS makes the sign test available only for two related samples (2 Related Samples). Thus, in order for us to use the test for a single sample, we must generate a new variable with n values (sample size including ties), all of them equal to μ0. The data in Example 10.4 are available in the file Sign_Test_One_Sample.sav.
The procedure for applying the sign test on SPSS is shown. First of all, we must click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.12.
After that, we must insert variable 1 (Age_pop) and variable 2 (Age_sample) into Test Pairs. Let’s select the option regarding the sign test (Sign) in Test Type, as shown in Fig. 10.13.
Next, let’s click on OK to obtain the results of the sign test, as shown in Figs. 10.14 and 10.15.
Fig. 10.14 shows the frequencies of negative and positive signs, the total number of ties, and the total frequency.
Fig. 10.15 shows the associated probability for a bilateral test, which is similar to the value found in Example 10.4. Since P = 0.648 > 0.10, we do not reject the null hypothesis, which allows us to conclude, with a 90% confidence level, that the median retirement age is 65.
The use of the images presented in this section has been authorized by Stata Corp LP©.
Different from SPSS software, Stata makes the sign test for one sample available. On Stata, the sign test for a single sample as well as for two paired samples can be obtained from the command signtest.
The syntax of the test for one sample is:
where the term variable⁎ must be replaced by the variable considered in the analysis and # by the value of the population median to be tested.
The data in Example 10.4 are available in the file Sign_Test_One_Sample.dta. The variable analyzed is age and the main objective is to verify if the median retirement age is 65. The command to be typed is:
The result of the test is shown in Fig. 10.16. Analogous to the results presented in Example 10.4 and also generated on SPSS, the number of positive signs is 11, the number of negative signs is 8, and the associated probability for a bilateral test is 0.648. Since P > 0.10, we do not reject the null hypothesis, which allows us to conclude, with a 90% confidence level, that the median retirement age is 65.
These tests investigate if two samples are somehow related. The most common examples analyze a situation before and after a certain event. We will study the following tests: the McNemar test for binary variables and the sign and Wilcoxon tests for ordinal variables.
The McNemar test is applied to assess the significance of changes in two related samples with qualitative or categorical variables that assume only two categories (binary variables). The main goal of the test is to verify if there are any significant changes before and after the occurrence of a certain event. In order to do that, let’s use a 2 × 2 contingency table, as shown in Table 10.2.
According to Siegel and Castellan (2006), the + and − signs are used to represent the possible changes in the answers before and after. The frequencies of each occurrence are represented in their respective cells in Table 10.2.
For example, if there are changes from the first answer (+) to the second answer (−), the result will be written in the upper right cell, so B represents the total number of observations that presented changes in their behavior from (+) to (−).
Analogously, if there are changes from the first answer (−) to the second answer (+), the result will be written in the lower left cell, so C represents the total number of observations that presented changes in their behavior from (−) to (+).
On the other hand, while A represents the total number of observations that remained with the same answer (+) before and after, D represents the total number of observations with the same answer (−) in both periods.
Thus, the total number of individuals that change their answer can be represented by B + C.
Through the null hypothesis of the test, the total number of changes in each direction is equally likely, that is:
According to Siegel and Castellan (2006), the McNemar statistic is calculated based on the chi-square (χ2) statistic presented in Expression (10.5), that is:
According to the same authors, a correction factor must be used in order for a continuous χ2 distribution to become more similar to a discrete χ2 distribution, so:
The value calculated must be compared to the critical value of the χ2 distribution (Table D in the Appendix). This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). If the value of the statistic is in the critical region, that is, if χcal2 > χc2, we reject H0. Otherwise, we should not reject H0.
The probability associated with the χcal2 statistic (P-value) can also be obtained from Table D. In this case, the null hypothesis is rejected if P ≤ α. Otherwise, we do not reject H0.
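The McNemar statistic and its continuity-corrected version can be computed directly in Python as a cross-check (scipy is an assumption; the cell counts are those that appear in the chapter's mcci command for this example):

```python
from scipy.stats import chi2

# Contingency counts for Example 10.5 (from the command mcci 14 21 3 22);
# B and C are the discordant cells
A, B, C, D = 14, 21, 3, 22

# McNemar statistic without the continuity correction
stat = (B - C) ** 2 / (B + C)
# With the continuity correction for the discrete distribution
stat_corr = (abs(B - C) - 1) ** 2 / (B + C)

# P-values from a chi-square distribution with 1 degree of freedom
p = chi2.sf(stat, df=1)
p_corr = chi2.sf(stat_corr, df=1)
print(f"chi2 = {stat:.2f} (P = {p:.4f}), "
      f"corrected chi2 = {stat_corr:.2f} (P = {p_corr:.4f})")
```

The uncorrected value, 13.5, matches the statistic reported for the Stata output in this section.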
Example 10.5 will be solved using SPSS software. The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data are available in the file McNemar_Test.sav. The procedure for applying the McNemar test on SPSS is presented. Let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.17.
After that, we should insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s select the McNemar test option in Test Type, as shown in Fig. 10.18.
Finally, we must click on OK to obtain Figs. 10.19 and 10.20. Fig. 10.19 shows the frequencies observed before and after the reform (Contingency Table). The result of the McNemar test is shown in Fig. 10.20.
According to Fig. 10.20, the significance level observed in the McNemar test is 0.000, a value lower than 5%, so the null hypothesis is rejected. Hence, we may conclude, with a 95% confidence level, that there was a significant change in the choice to work at a public or a private organization after the social security reform.
Example 10.5 will also be solved using Stata software. The use of the images presented in this section has been authorized by Stata Corp LP©. The data are available in the file McNemar_Test.dta.
The McNemar test can be calculated on Stata by using the command mcc followed by the paired variables. In our example, the paired variables are called before and after, so, the command to be typed is:
The result of the McNemar test is shown in Fig. 10.21. We can see that the value of the statistic is 13.5, similar to the value calculated by Expression (10.7), without the correction factor. The significance level observed from the test is 0.000, lower than 5%, which allows us to conclude, with a 95% confidence level, that there was a significant change before and after the reform.
The result of the McNemar test could have also been obtained by using the command mcci 14 21 3 22.
The sign test can also be applied to two paired samples. In this case, the sign is given by the difference between the pairs, that is, if the difference results in a positive number, each pair of values is replaced by a (+) sign. On the other hand, if the result of the difference is negative, each pair of values is replaced by a (−) sign. In case of a tie, the data will be excluded from the sample.
Analogous to the sign test for a single sample, the sign test presented in this section is also an alternative to the t-test for comparing two related samples when the data distribution is not normal. In this case, the quantitative data are transformed into ordinal data. Thus, the sign test is much less powerful than the t-test, because it uses only the sign of the difference between the pairs as information.
Through the null hypothesis, the population median of the differences (μd) is zero. Therefore, for a bilateral test, we have:
In other words, we test the hypothesis that there are no differences between both samples (the samples come from populations with the same median and the same continuous distribution), that is, the number of (+) signs is the same as the number of (−) signs.
The same procedure presented in Section 10.2.3 for a single sample will be used in order to calculate the sign statistic in the case of two paired samples.
We say that N is the number of positive and negative signs (sample size disregarding the ties) and k is the number of signs that corresponds to the lowest frequency. If N ≤ 25, we will use the binomial test with p = ½ to calculate P(Y ≤ k). This probability can be obtained directly from Table F2 in the Appendix.
When N > 25, the binomial distribution approaches a normal distribution, and the value of Z is given by Expression (10.6):
where X corresponds to the lowest or highest frequency. If X represents the lowest frequency, we must use X + 0.5. On the other hand, if X represents the highest frequency, we must use X − 0.5.
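In Python, the paired version differs from the one-sample case only in how the signs are obtained — from the differences between the pairs (scipy and the data below are assumptions):

```python
from scipy.stats import binomtest

# Hypothetical paired observations (e.g., productivity before and after)
before = [10, 12, 11, 14, 9, 13, 10, 15]
after  = [12, 11, 13, 14, 11, 15, 9, 16]

# Signs of the differences; ties (zero differences) are excluded
diffs = [a - b for a, b in zip(after, before) if a != b]
n_pos = sum(1 for d in diffs if d > 0)
n_neg = sum(1 for d in diffs if d < 0)
N, k = n_pos + n_neg, min(n_pos, n_neg)

# Bilateral P-value = 2 * P(Y <= k) under H0: p = 1/2
p_value = min(1.0, 2 * binomtest(k, n=N, p=0.5, alternative="less").pvalue)
print(f"+: {n_pos}, -: {n_neg}, P-value = {p_value:.4f}")
```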
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.6 can be found in the file Sign_Test_Two_Paired_Samples.sav. The procedure for applying the sign test to two paired samples on SPSS is shown. We have to click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.23.
After that, let’s insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s also select the option regarding the sign test (Sign) in Test Type, as shown in Fig. 10.24.
Finally, let’s click on OK to obtain the results of the sign test for two paired samples (Figs. 10.25 and 10.26).
Fig. 10.25 shows the frequencies of negative and positive signs, the total number of ties, and the total frequency.
Fig. 10.26 shows the result of the z test, as well as the associated probability P for a bilateral test, values similar to the ones calculated in Example 10.6. Since P = 0.556 > 0.05, the null hypothesis is not rejected, which allows us to conclude, with a 95% confidence level, that there is no difference in productivity before and after the training course.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.6 also are available on Stata in the file Sign_Test_Two_Paired_Samples.dta. The paired variables are before and after.
As discussed in Section 10.2.3.2 for a single sample, the sign test on Stata is carried out from the command signtest. In the case of two paired samples, we must use the same command. However, it must be followed by the names of the paired variables, with the equal sign between them, since the objective is to test the equality of the respective medians. Thus, the command to be typed for our example is:
The result of the test is shown in Fig. 10.27 and includes the number of positive signs (15), the number of negative signs (11), as well as the probability associated with the statistic for a bilateral test (P = 0.557). These values are similar to the ones calculated in Example 10.6 and also generated on SPSS. Since P > 0.05, we do not reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is no difference in productivity before and after the training course.
Analogous to the sign test for two paired samples, the Wilcoxon test is an alternative to the t-test when the data distribution does not follow a normal distribution.
The Wilcoxon test is an extension of the sign test; however, it is more powerful. Besides the information about the direction of the differences for each pair, the Wilcoxon test considers the magnitude of the difference within the pairs (Fávero et al., 2009). The logical foundations and the method used in the Wilcoxon test are described, based on Siegel and Castellan (2006).
Let’s assume that di is the difference between the values for each pair of data. First of all, we have to place all of the di’s in ascending order according to their absolute value (without considering the sign) and calculate the respective ranks using this order. For example, position 1 is attributed to the lowest | di |, position 2 to the second lowest, and so on. At the end, we must attribute the di difference sign for each rank. The sum of all positive ranks is represented by Sp and the sum of all negative ranks by Sn.
Occasionally, the values for a certain pair of data are the same (di = 0). In this case, they are excluded from the sample. It is the same procedure used in the sign test, so, the value of N represents the sample size disregarding these ties.
Another type of tie may happen, in which two or more differences have the same absolute value. In this case, the same rank will be attributed to the ties, corresponding to the mean of the ranks that would have been attributed had the differences been different. For example, suppose that three pairs of data indicate the following differences: − 1, 1, and 1. Rank 2 is attributed to each pair, which corresponds to the mean of ranks 1, 2, and 3. The next value in order will then receive rank 4, since ranks 1, 2, and 3 have already been used.
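scipy's rankdata reproduces this mid-rank rule for ties (scipy is an assumption; the example reuses the differences −1, 1, and 1 from the text, plus one extra hypothetical value):

```python
from scipy.stats import rankdata

# Differences -1, 1, 1 from the text, plus a hypothetical fourth value
diffs = [-1, 1, 1, 3]

# Rank by absolute value; tied values receive the mean of the ranks
# they would occupy (method="average" is the default)
ranks = rankdata([abs(d) for d in diffs])
print(ranks)  # the three |d| = 1 values share rank 2; the next gets rank 4
```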
The null hypothesis assumes that the median of the differences in the population (μd) is zero, that is, the populations do not differ in location. For a bilateral test, we have:
In other words, we must test the hypothesis that there are no differences between both samples (the samples come from populations with the same median and the same continuous distribution), that is, the sum of the positive ranks (Sp) is the same as the sum of the negative ranks (Sn).
If N ≤ 15, Table I in the Appendix shows the unilateral probabilities associated with the critical values of Sc (P(Sp > Sc) = α). For a bilateral test, this value must be doubled. If the probability obtained (P-value) is less than or equal to α, we must reject H0.
As N grows, the sampling distribution of the Wilcoxon statistic approaches a normal distribution. Thus, for N > 15, we must calculate the value of the z variable that, according to Siegel and Castellan (2006), Fávero et al. (2009), and Maroco (2014), is:
where:
The value calculated must be compared to the critical value of the standard normal distribution (Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = P(Zcal > zc) = α/2. The null hypothesis H0 of a bilateral test is rejected if the value of the Zcal statistic is in the critical region, that is, if Zcal < − zc or Zcal > zc. Otherwise, we do not reject H0.
The unilateral probabilities associated with the Zcal statistic (P1) can also be obtained from Table E. For a unilateral test, we consider P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Thus, for both tests, we reject H0 if P ≤ α.
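A sketch of the Wilcoxon test in Python (scipy and the paired scores are assumptions; scipy chooses between an exact and an approximate method automatically, falling back to the normal approximation when tied ranks are present):

```python
from scipy.stats import wilcoxon

# Hypothetical paired scores before and after a course (N = 16 pairs)
before = [55, 60, 58, 62, 65, 59, 61, 57, 63, 64, 56, 60, 62, 58, 61, 59]
after  = [60, 63, 57, 66, 70, 64, 65, 60, 68, 66, 58, 65, 66, 61, 64, 63]

# Two-sided test; the reported statistic is the smaller of the
# positive and negative rank sums (Sp, Sn)
stat, p = wilcoxon(before, after)
print(f"S = {stat}, bilateral P-value = {p:.4f}")
```

Because the test uses the magnitudes of the differences through their ranks, not just their signs, it is more powerful than the sign test applied to the same pairs.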
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.7 are available in the file Wilcoxon_Test.sav. The procedure for applying the Wilcoxon test to two paired samples on SPSS is shown. Let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples …, as shown in Fig. 10.29.
First of all, let’s insert variable 1 (Before) and variable 2 (After) into Test Pairs. Let’s also select the option related to the Wilcoxon test in Test Type, as shown in Fig. 10.30.
Finally, let’s click on OK to obtain the results of the Wilcoxon test for two paired samples (Figs. 10.31 and 10.32).
Fig. 10.31 shows the number of negative, positive, and tied ranks, besides the mean and the sum of all positive and negative ranks.
Fig. 10.32 shows the result of the z test, as well as the associated probability P for a bilateral test, values similar to the ones found in Example 10.7. Since P = 0.003 < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in the students’ performance before and after the course.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.7 are available in the file Wilcoxon_Test.dta. The paired variables are called before and after.
The Wilcoxon test on Stata is carried out from the command signrank followed by the name of the paired variables with an equal sign between them. For our example, we must type the following command:
The result of the test is shown in Fig. 10.33. Since P < 0.05, we reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in the students’ performance before and after the course.
In these tests, we try to compare two populations represented by their respective samples. Unlike the tests for two paired samples, here it is not necessary for the samples to have the same size. Among the tests for two independent samples, we can highlight the chi-square test (for nominal or ordinal variables) and the Mann-Whitney U test (for ordinal variables).
In Section 10.2.2, the χ2 test was applied to a single sample in which the variable being studied was qualitative (nominal or ordinal). Here the test will be applied to two independent samples, from nominal or ordinal qualitative variables. This test has already been studied in Chapter 4 (Section 4.2.2), in order to verify if there is an association between two qualitative variables, and it will be described once again in this section.
The test compares the frequencies observed in each one of the cells of a contingency table to the frequencies expected. The χ2 test for two independent samples assumes the following hypotheses:
Therefore, the χ2 statistic measures the discrepancy between the observed contingency table and the expected contingency table, the latter built under the hypothesis that there is no association between the categories of the two variables studied. If the distribution of observed frequencies is exactly equal to the distribution of expected frequencies, the χ2 statistic is zero. Thus, a low value of χ2 indicates independence between the variables.
As already presented in Expression (4.1) in Chapter 4, the χ2 statistic for two independent samples is given by:
where:
The values of χcal2 approximately follow an χ2 distribution with ν = (I − 1)·(J − 1) degrees of freedom. The critical values of the chi-square statistic (χc2) can be found in Table D, in the Appendix. This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). In order for the null hypothesis H0 to be rejected, the value of the χcal2 statistic must be in the critical region, that is, χcal2 > χc2. Otherwise, we do not reject H0 (Fig. 10.34).
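As a quick cross-check of this calculation in Python, scipy.stats.chi2_contingency computes the χ2 statistic from a table of observed frequencies, returning the P-value, the degrees of freedom ν = (I − 1)·(J − 1), and the table of expected frequencies. The contingency table below is hypothetical, not the data from Example 10.8:

```python
from scipy import stats

# Hypothetical 2x3 contingency table (rows: two agencies; columns:
# three satisfaction levels) -- NOT the data from Example 10.8
observed = [[20, 30, 50],
            [45, 25, 30]]

# chi2_contingency builds the expected table from the marginal totals
# and returns the statistic, the P-value, and the degrees of freedom
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: the variables are associated")
```

For this table, ν = (2 − 1)·(3 − 1) = 2, and the expected frequency of each cell is (row total × column total)/N, exactly as in Expression (4.1).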
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.8 are available in the file HealthInsurance.sav. In order to calculate the χ2 statistic for two independent samples, we must click on Analyze → Descriptive Statistics → Crosstabs … Let’s insert variable Agency in Row(s) and variable Satisfaction in Column(s), as shown in Fig. 10.36.
In Statistics …, let’s select option Chi-square, as shown in Fig. 10.37. Then, we must finally click on Continue and OK. The result is shown in Fig. 10.38.
From Fig. 10.38, we can see that the value of χ2 is 15.861, similar to what was calculated in Example 10.8. For the confidence level of 95%, as P = 0.003 < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is an association between the variable categories, that is, the frequencies observed differ from the frequencies expected in at least one pair of categories.
The use of the images presented in this section has been authorized by Stata Corp LP©.
As presented in Chapter 4, the calculation of the χ2 statistic on Stata is done by using the command tabulate, or simply tab, followed by the names of the variables being studied, with the option chi2, or simply ch. The syntax of the test is:

tab variable1⁎ variable2⁎, chi2
The data in Example 10.8 are also available in the file HealthCareInsurance.dta. The variables being studied are agency and satisfaction. Thus, we must type the following command:

tab agency satisfaction, chi2
The results can be seen in Fig. 10.39 and are similar to the ones presented in Example 10.8 and generated on SPSS.
The Mann-Whitney U test is one of the most powerful nonparametric tests, applied to quantitative or qualitative variables in an ordinal scale, and it aims at verifying if two nonpaired or independent samples are drawn from the same population. It is an alternative to Student’s t-test when the normality hypothesis is violated or when the sample is small. In addition, it may be considered a nonparametric version of the t-test for two independent samples.
Since the original data are transformed into ranks (orders), we lose some information, so, the Mann-Whitney U test is not as powerful as the t-test.
Different from the t-test, which verifies the equality of the means of two independent populations with continuous data, the Mann-Whitney U test verifies the equality of the medians. For a bilateral test, the null hypothesis is that the medians of both populations are equal, that is:
H0: μ1 = μ2
H1: μ1 ≠ μ2
The calculation of the Mann-Whitney U statistic is specified below, separately for small and large samples.
Method:
Table J in the Appendix shows the critical values of U in a way that P(Ucal < Uc) = α (for a left-tailed unilateral test), for values of N2 ≤ 20 and significance levels of 0.05, 0.025, 0.01, and 0.005. In order for the null hypothesis H0 of the left-tailed unilateral test to be rejected, the value of the Ucal statistic must be in the critical region, that is, Ucal < Uc. Otherwise, we do not reject H0. For a bilateral test, we must consider P(Ucal < Uc) = α/2, since P(Ucal < Uc) + P(Ucal > Uc) = α.
The unilateral probabilities associated to the Ucal statistic (P1) can also be obtained from Table J. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Thus, we reject H0 if P ≤ α.
As the sample size grows (N2 > 20), the Mann-Whitney distribution becomes more similar to a standard normal distribution.
The value of the Z statistic is given by:
where:
The value calculated must be compared to the critical value of the standard normal distribution (see Table E in the Appendix). This table provides the critical values of zc where P(Zcal > zc) = α (for a right-tailed unilateral test). For a bilateral test, we have P(Zcal < − zc) = P(Zcal > zc) = α/2. Therefore, for a bilateral test, the null hypothesis is rejected if Zcal < − zc or Zcal > zc.
Unilateral probabilities associated to the Zcal (P1 = P) statistic can also be obtained from Table E. For a bilateral test, this probability must be doubled (P = 2P1). Thus, the null hypothesis is rejected if P ≤ α.
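In Python, the function scipy.stats.mannwhitneyu covers both cases discussed above: it uses the exact distribution of U for small samples without ties and the normal approximation otherwise. A minimal sketch with hypothetical diameters, not the data from Examples 10.9 and 10.10:

```python
from scipy import stats

# Hypothetical part diameters (mm) from two machines
# (NOT the data from Examples 10.9 and 10.10)
machine_a = [49.8, 50.1, 50.3, 49.9, 50.2, 50.0, 50.4]
machine_b = [50.6, 50.8, 50.5, 50.9, 50.7, 51.0]

# Bilateral Mann-Whitney U test
u, p = stats.mannwhitneyu(machine_a, machine_b, alternative='two-sided')
print(f"U = {u}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: the population medians are different")
```

Because every diameter of machine_b exceeds every diameter of machine_a in this hypothetical dataset, U reaches its minimum possible value of zero, which yields the smallest attainable P-value for these sample sizes.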
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.9 are available in the file Mann-Whitney_Test.sav. Since group 1 is the one with the smallest number of observations, in Data → Define Variable Properties …, we assign value 1 to group B and value 2 to group A for variable Machine.
In order to elaborate the Mann-Whitney test on SPSS, we must click on Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples …, as shown in Fig. 10.40.
After that, we should insert the variable Diameter in the box Test Variable List and the variable Machine in Grouping Variable, defining the respective groups. Let’s select the option Mann-Whitney U in Test Type, as shown in Fig. 10.41.
Finally, let’s click on OK to obtain Figs. 10.42 and 10.43. Fig. 10.42 shows the mean and the sum of the ranks for each group, while Fig. 10.43 shows the statistic of the test.
The results in Fig. 10.42 are similar to the ones calculated in Example 10.9. According to Fig. 10.43, the result of the Mann-Whitney U statistic is 3.50, similar to the value calculated in Example 10.9. The bilateral probability associated to the U statistic is P = 0.002 (we saw in Example 10.9 that this probability is less than 0.01). For the same data in Example 10.9, if we had to calculate the Z statistic and the respective associated bilateral probability, the result would be Zcal = − 2.840 and P = 0.005, similar to the values calculated in Example 10.10. For both tests, as the associated bilateral probability is less than 0.05, the null hypothesis is rejected, which allows us to conclude that the medians of both populations are different.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The Mann-Whitney test is carried out on Stata through the command ranksum (equality test for nonpaired data), using the following syntax:
ranksum variable⁎, by (groups⁎)
where the term variable⁎ must be replaced by the quantitative variable studied and the term groups⁎ by the categorical variable that represents the groups.
Let’s open the file Mann-Whitney_Test.dta that contains the data from Examples 10.9 and 10.10. Both groups are represented by the variable machine and the quality characteristic by the variable diameter. Thus, the command to be typed is:
ranksum diameter, by (machine)
The results obtained are shown in Fig. 10.44. We can see that the calculated value of the statistic (2.840) corresponds to the value calculated in Example 10.10, for large samples, from Expression (10.13). The probability associated to the statistic for a bilateral test is 0.0045. Since P < 0.05, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that the population medians are different.
These tests analyze the differences between k (three or more) paired or related samples. According to Siegel and Castellan (2006), the null hypothesis to be tested is that k samples have been drawn from the same population. The main tests for k paired samples are Cochran’s Q test (for binary variables) and Friedman’s test (for ordinal variables).
Cochran’s Q test for k paired samples is an extension of the McNemar test for two samples, and it aims to test the hypothesis that the frequencies or proportions of three or more related groups differ significantly from one another. In the same way as in the McNemar test, the data are binary.
According to Siegel and Castellan (2006), Cochran’s Q test compares the characteristics of several individuals or characteristics of the same individual observed under different conditions. For example, we can analyze if k items differ significantly for N individuals. Or, we may have only one item to analyze and the objective is to compare the answer of N individuals under k different conditions.
Let’s suppose that the study data are organized in a table with N rows and k columns, in which N is the number of cases and k is the number of groups or conditions. Under the null hypothesis of Cochran’s Q test, there are no differences between the frequencies or proportions of success (p) of the k related groups, that is, the proportion of a desired answer (success) is the same in each column. Under the alternative hypothesis, there are differences between at least two groups, so:
Cochran’s Q statistic is given by:
which approximately follows a χ2 distribution with k − 1 degrees of freedom, where:
The value calculated must be compared to the critical value of the χ2 distribution (Table D in the Appendix). This table provides the critical values of χc2 where P(χcal2 > χc2) = α (for a right-tailed unilateral test). If the value of the statistic is in the critical region, that is, if Qcal > χc2, we must reject H0. Otherwise, we do not reject H0.
The probability associated to the calculated value of the statistic (P-value) can also be obtained from Table D. In this case, the null hypothesis is rejected if P ≤ α; otherwise we do not reject H0.
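The same statistic can be obtained in Python from the function cochrans_q of the statsmodels package, which receives the N × k table of binary responses. The responses below are hypothetical, not the data from Example 10.11:

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

# Hypothetical binary answers (1 = satisfied, 0 = dissatisfied) of
# N = 10 clients under k = 3 conditions (NOT the data from Example 10.11)
x = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [0, 0, 0],
              [1, 1, 0]])

# Cochran's Q test: H0 states that the proportion of success is the
# same in each of the k columns
res = cochrans_q(x)
print(f"Q = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

The resulting Q is compared to the χ2 distribution with k − 1 = 2 degrees of freedom; here Q ≈ 8.857, so H0 would be rejected at the 5% significance level for these hypothetical data.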
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.11 are available in the file Cochran_Q_Test.sav. The procedure for elaborating Cochran’s Q test on SPSS is shown. First of all, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples …, as shown in Fig. 10.46.
After that, we must insert variables A, B, and C in the box Test Variables, and select option Cochran’s Q in Test Type, as shown in Fig. 10.47.
Finally, let’s click on OK to obtain the results of the test. Fig. 10.48 shows the frequencies of each group and Fig. 10.49 shows the result of the statistic.
The value of Cochran’s Q statistic is 4.167, similar to the value calculated in Example 10.11. The probability associated to the statistic is 0.125 (we saw in Example 10.11 that P > 0.10). Since P > α, the null hypothesis is not rejected, which allows us to conclude, with a 90% level of confidence, that there are no differences in the proportion of satisfied clients for all three supermarkets.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data from Example 10.11 are also available in the file Cochran_Q_Test.dta. The command used to run the test is cochran, followed by the k paired variables, in our case, the variables that represent the three groups of supermarkets: a, b, and c. So, the command to be typed is:

cochran a b c
The results of Cochran’s Q test on Stata are in Fig. 10.50. We can verify that the result of the statistic and the respective associated probability are similar to the results calculated in Example 10.11, and also generated on SPSS, which allows us to conclude, with a 90% level of confidence, that the proportion of satisfied clients is the same for all three supermarkets.
Friedman’s test is applied to quantitative or qualitative variables in an ordinal scale, and its main objective is to verify whether k paired samples are drawn from the same population. It is an extension of the Wilcoxon test for three or more paired samples. It is also an alternative to the analysis of variance when its hypotheses (normality of the data and homogeneity of the variances) are violated or when the sample size is too small.
The data are represented in one table with double entry, with N rows and k columns, in which the rows represent the several individuals or corresponding sets of individuals, and the columns represent the different conditions.
Therefore, the null hypothesis of Friedman’s test assumes that the k samples (columns) come from the same population or from populations with the same median (μ). For a bilateral test, we have:
To apply Friedman’s statistic, we must attribute ranks from 1 to k to the elements of each row. For example, rank 1 is attributed to the lowest observation in the row and rank k to the highest. If there are ties, we attribute the mean of the corresponding ranks. Friedman’s statistic is given by:
where:
However, according to Siegel and Castellan (2006), whenever there are ties between the ranks of the same group or row, Friedman’s statistic must be corrected in a way that considers the changes in the sample distribution, as follows:
where:
The value calculated must be compared to the critical value of the sample distribution. When N and k are small (k = 3 and 3 < N < 13, or k = 4 and 2 < N < 8, or k = 5 and 3 < N < 5), we must use Table K in the Appendix, which shows the critical values of Friedman’s statistic (Fc), where P(Fcal > Fc) = α (for a right-tailed unilateral test). For high values of N and k, the sample distribution can be approximated by the χ2 distribution with ν = k − 1 degrees of freedom.
Therefore, if the value of the Fcal statistic is in the critical region, that is, if Fcal > Fc for small N and k, or Fcal > χc2 for large N and k, we must reject the null hypothesis. Otherwise, we do not reject H0.
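In Python, Friedman's test is available as scipy.stats.friedmanchisquare, which receives one sequence per condition (column) and applies the χ2 approximation with the correction for ties. The scores below are hypothetical, not the data from Example 10.12:

```python
from scipy import stats

# Hypothetical scores of N = 8 patients under k = 3 conditions, e.g.,
# before treatment, after treatment, and after 3 months
# (NOT the data from Example 10.12)
bt = [60, 55, 70, 62, 58, 65, 59, 61]
at = [68, 60, 74, 70, 63, 72, 66, 69]
a3m = [72, 66, 80, 75, 70, 78, 71, 74]

# Friedman's test: one argument per paired sample (column)
stat, p = stats.friedmanchisquare(bt, at, a3m)
print(f"F = {stat:.3f}, p = {p:.4f}")

if p < 0.05:
    print("Reject H0: at least two conditions have different medians")
```

In this hypothetical dataset every patient's score increases across the three conditions, so the within-row ranks are always 1, 2, 3 and the statistic reaches its maximum for N = 8 and k = 3, leading to the rejection of H0.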
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data from Example 10.12 are available in the file Friedman_Test.sav. To elaborate Friedman’s test on SPSS, let’s first click on Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples …, as shown in Fig. 10.52.
After that, we must insert variables BT, AT, and A3M in the box Test Variables and select the option Friedman in Test Type, as shown in Fig. 10.53.
Finally, let’s click on OK to obtain the results of Friedman’s test. Fig. 10.54 shows the means of the ranks, similar to the values calculated in Table 10.E.18.
The value of Friedman’s statistic and the significance level of the test are in Fig. 10.55.
The value of the test is 27.527, similar to the one calculated in Example 10.12. The probability associated to the statistic is 0.000 (we saw in Example 10.12 that this probability is less than 0.005). Since P < 0.05, we reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there are differences between the periods evaluated, that is, the treatment changes the patients' results.
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.12 are available in the file Friedman_Test.dta. The variables being studied are bt, at, and a3m. Friedman's test on Stata is carried out through the command friedman. Since this command is not native to Stata, it must first be installed from the link snp2 at http://www.stata.com/stb/stb3.
Running Friedman's test on Stata requires the data to be transposed. Before transposing them, we can preserve the original dataset by typing preserve. After that, let's type the command xpose, which transposes all the variables into observations and all the observations into variables:

xpose, clear
After the command xpose, we can see that the data were transformed into N variables (the number of initial observations). Let's now type the following command:

friedman v1-v15

since the current dataset contains 15 variables after the transposition. Through Fig. 10.56, we can verify that Friedman's statistic on Stata (25.233) is calculated from Expression (10.15), without the correction factor. The probability associated to the statistic is 0.000 (the null hypothesis is rejected), which allows us to conclude, with a 95% confidence level, that there are differences between the treatments. To restore the original dataset, we must type restore.