These tests assess whether k independent samples come from the same population. Among the most common tests for more than two independent samples are the χ2 test, for nominal or ordinal variables, and the Kruskal-Wallis test, for ordinal variables.
While in Section 10.2.2 the χ2 test was applied to a single sample, in Section 10.4.1 it was applied to two independent samples. In both cases, the variables are qualitative (nominal or ordinal). The χ2 test for k independent samples (k ≥ 3) is a direct extension of the test for two independent samples.
The data are arranged in an I × J contingency table, in which the rows represent the categories of the variable analyzed and the columns represent the different groups. The null hypothesis of the test is that the frequencies or proportions in each category of the variable analyzed are the same in every group; the alternative hypothesis is that at least one group differs in at least one category.
The chi-square statistic is given by Expression (10.10) and is not repeated here.
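Although the book solves this test in SPSS and Stata, the same computation can be sketched in Python with scipy.stats.chi2_contingency. The counts below are hypothetical, purely for illustration; they are not the data of Example 10.13.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical I x J contingency table: 3 categories (rows) by 4 groups
# (columns). These counts are illustrative only.
observed = np.array([
    [18, 22, 15, 25],
    [30, 28, 33, 29],
    [12, 10, 12,  6],
])

# chi2_contingency computes the statistic of Expression (10.10) together
# with the expected frequencies under H0
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
# dof = (I - 1) * (J - 1) = 2 * 3 = 6
```

If p falls below the chosen significance level, H0 (equal proportions across groups) is rejected.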
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data from Example 10.13 are available in the file Chi-Square_k_Independent_Samples.sav. Let’s click on Analyze → Descriptive Statistics → Crosstabs … After that, we should insert the variable Productivity in Row(s) and the variable Shift in Column(s), as shown in Fig. 10.58.
In Statistics …, let’s select the option Chi-square, as shown in Fig. 10.59. If we wish to obtain the observed and expected frequency distribution table, in Cells …, we must select the options Observed and Expected in Counts, as shown in Fig. 10.60. Finally, let’s click on Continue and OK. The results can be seen in Figs. 10.61 and 10.62.
From Fig. 10.62, we can see that the value of χ2 is 13.143, the same as the one calculated in Example 10.13. For a confidence level of 95%, since P = 0.041 < 0.05 (we saw in Example 10.13 that this probability lies between 0.025 and 0.05), we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in productivity among the four shifts.
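The bracketing of this p-value can be checked numerically. Assuming the table of Example 10.13 has ν = (3 − 1)(4 − 1) = 6 degrees of freedom (three productivity categories by four shifts; the degrees of freedom are not stated in this excerpt), the right tail of the χ2 distribution reproduces P ≈ 0.041:

```python
from scipy.stats import chi2

h = 13.143   # chi2 statistic reported by SPSS for Example 10.13
dof = 6      # assumed: (3 - 1) * (4 - 1); not stated explicitly in this excerpt

p = chi2.sf(h, dof)    # right-tail probability P(chi2 > 13.143)
print(f"p = {p:.4f}")  # ~0.041, between 0.025 and 0.05
```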
The use of the images presented in this section has been authorized by Stata Corp LP©.
The data in Example 10.13 are available in the file Chi-Square_k_Independent_Samples.dta. The variables being studied are productivity and shift. The syntax of the χ2 test for k independent samples is similar to the one presented in Section 10.4.1 for two independent samples. Thus, we must use the command tabulate, or simply tab, followed by the name of the variables being studied, besides the option chi2, or simply ch. The difference is that, in this case, the categorical variable that represents the groups has more than two categories. Therefore, the syntax of the test for the data in Example 10.13 is:
tabulate productivity shift, chi2
or simply:

tab productivity shift, ch
The results can be seen in Fig. 10.63. The value of the χ2 statistic, as well as the probability associated with it, matches the results presented in Example 10.13 and also generated in SPSS.
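The tabulate-style workflow, building the contingency table from raw observations and then testing it, can be mimicked in Python with pandas; the raw data below are hypothetical stand-ins for the variables in the .dta file:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical raw data: one row per observation (labels are illustrative)
df = pd.DataFrame({
    "productivity": ["low", "high", "low", "medium", "high", "medium",
                     "low", "high", "medium", "low", "high", "medium"],
    "shift": ["1", "1", "2", "2", "3", "3", "4", "4", "1", "2", "3", "4"],
})

# Rough equivalent of Stata's: tabulate productivity shift, chi2
table = pd.crosstab(df["productivity"], df["shift"])
chi2, p, dof, _ = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```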
The Kruskal-Wallis test verifies whether k independent samples (k > 2) come from the same population. It is an alternative to the analysis of variance when the assumptions of data normality and equality of variances are violated, when the sample is small, or when the variable is measured on an ordinal scale. For k = 2, the Kruskal-Wallis test is equivalent to the Mann-Whitney test.
The data are represented in a double-entry table with N rows and k columns, in which the rows represent the observations and the columns represent the different samples or groups.
The null hypothesis of the Kruskal-Wallis test assumes that all k samples come from the same population, or from identical populations with the same median (μ). For a bilateral test, we have H0: μ1 = μ2 = … = μk versus H1: μi ≠ μj for at least one pair (i, j).
In the Kruskal-Wallis test, all N observations (N being the total number of observations in the global sample) are arranged in a single series, and ranks are assigned to each element of the series. Thus, rank 1 is assigned to the lowest observation in the global sample, rank 2 to the second lowest, and so on, up to rank N. In case of ties, the mean of the corresponding ranks is assigned. The Kruskal-Wallis statistic (H) is given by:

H = [12 / (N(N + 1))] · Σj=1..k (Rj² / nj) − 3(N + 1)

where:
Rj is the sum of the ranks of group j;
nj is the number of observations in group j.
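As a sketch of the ranking procedure and of the H statistic, the computation can be carried out directly in Python on three hypothetical tie-free samples and checked against scipy.stats.kruskal, which coincides with the plain formula when there are no ties:

```python
import numpy as np
from scipy.stats import kruskal, rankdata

# Hypothetical samples with no tied values
g1 = [2.1, 3.4, 5.6]
g2 = [1.2, 4.5, 6.7, 7.8]
g3 = [0.9, 2.8, 8.9]

data = np.concatenate([g1, g2, g3])
ranks = rankdata(data)          # rank 1 = smallest observation in the pooled series
n = [len(g1), len(g2), len(g3)]
N = len(data)

# Sum of ranks per group, then H = 12/(N(N+1)) * sum(Rj^2/nj) - 3(N+1)
splits = np.split(ranks, np.cumsum(n)[:-1])
H = 12 / (N * (N + 1)) * sum(r.sum() ** 2 / nj for r, nj in zip(splits, n)) - 3 * (N + 1)

H_scipy = kruskal(g1, g2, g3).statistic
print(H, H_scipy)               # equal, since there are no ties
```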
However, according to Siegel and Castellan (2006), whenever there are ties between two or more ranks, regardless of the group, the Kruskal-Wallis statistic must be corrected to account for the change in the sampling distribution, so:

H′ = H / [1 − Σ(ti³ − ti) / (N³ − N)]

where:
ti is the number of tied observations in the i-th group of ties.
According to Siegel and Castellan (2006), the effect of this correction for ties is to increase the value of H, making the result more significant than it would be without the correction.
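This effect can be verified numerically. With the hypothetical tied samples below, the uncorrected H divided by the correction factor is larger than the uncorrected value and matches scipy.stats.kruskal, which applies the tie correction automatically:

```python
from collections import Counter
import numpy as np
from scipy.stats import kruskal, rankdata

# Hypothetical samples containing ties
g1 = [1.0, 2.0, 2.0, 3.0]
g2 = [2.0, 3.0, 4.0]
g3 = [1.0, 4.0, 4.0, 5.0]

groups = [g1, g2, g3]
data = np.concatenate(groups)
N = len(data)
ranks = rankdata(data)          # tied values receive the mean of their ranks

# Uncorrected H
n = [len(g) for g in groups]
splits = np.split(ranks, np.cumsum(n)[:-1])
H = 12 / (N * (N + 1)) * sum(r.sum() ** 2 / nj for r, nj in zip(splits, n)) - 3 * (N + 1)

# Correction factor 1 - sum(t^3 - t)/(N^3 - N), t = size of each group of ties
ties = Counter(data.tolist())
C = 1 - sum(t**3 - t for t in ties.values()) / (N**3 - N)
H_corr = H / C

H_scipy = kruskal(*groups).statistic
print(f"H = {H:.4f}, corrected H = {H_corr:.4f}, scipy = {H_scipy:.4f}")
```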
The calculated value must be compared to the critical value of the sampling distribution. If k = 3 and n1, n2, n3 ≤ 5, we must use Table L in the Appendix, which shows the critical values of the Kruskal-Wallis statistic (Hc), where P(Hcal > Hc) = α (for a right-tailed unilateral test). Otherwise, the sampling distribution can be approximated by the χ2 distribution with ν = k − 1 degrees of freedom.
Therefore, if the value of the Hcal statistic lies in the critical region, that is, if Hcal > Hc for k = 3 and n1, n2, n3 ≤ 5, or Hcal > χc² otherwise, the null hypothesis is rejected, which allows us to conclude that there is a difference between the samples. Otherwise, we do not reject H0.
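For sample sizes beyond the scope of Table L, the decision rule based on the χ2 approximation can be sketched as follows (the values α = 0.05, k = 4, and the helper name kruskal_decision are arbitrary choices for illustration):

```python
from scipy.stats import chi2

def kruskal_decision(H_cal: float, k: int, alpha: float = 0.05) -> str:
    """Decision via the chi2 approximation with nu = k - 1 degrees of freedom."""
    H_c = chi2.ppf(1 - alpha, df=k - 1)   # critical value chi2_c
    if H_cal > H_c:
        return "reject H0: at least one sample differs"
    return "do not reject H0"

print(kruskal_decision(13.5, k=4))   # 13.5 > chi2_c(0.95; 3) = 7.815 -> reject
print(kruskal_decision(2.0, k=4))
```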
The use of the images in this section has been authorized by the International Business Machines Corporation©.
The data in Example 10.14 are available in the file Kruskal-Wallis_Test.sav. In order to elaborate the Kruskal-Wallis test on SPSS, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples …, as shown in Fig. 10.65.
After that, we should insert the variable Result in the box Test Variable List, define the groups of the variable Treatment and select the Kruskal-Wallis test, as shown in Fig. 10.66.
Let’s click on OK to obtain the results of the Kruskal-Wallis test. Fig. 10.67 shows the mean of the ranks for each group, similar to the values calculated in Table 10.E.21.
The value of the Kruskal-Wallis statistic and the significance level of the test are in Fig. 10.68.
The value of the statistic is 22.662, the same as the value calculated in Example 10.14. The probability associated with the statistic is 0.000 (we saw in Example 10.14 that this probability is less than 0.005). Since P < 0.01, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.
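The significance of 0.000 reported by SPSS can be checked numerically. Assuming ν = 3 − 1 = 2 degrees of freedom (the Stata section mentions three treatment groups), the tail probability is on the order of 10⁻⁵, which SPSS rounds to 0.000:

```python
from scipy.stats import chi2

H = 22.662     # corrected Kruskal-Wallis statistic from Fig. 10.68
dof = 2        # assumed: k - 1 = 3 - 1 for the three treatment groups

p = chi2.sf(H, dof)      # right-tail probability
print(f"p = {p:.6f}")    # ~0.000012, displayed by SPSS as 0.000
```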
The use of the images presented in this section has been authorized by Stata Corp LP©.
On Stata, the Kruskal-Wallis test is elaborated through the command kwallis, using the following syntax:
kwallis variable⁎, by(groups⁎)
where the term variable⁎ must be replaced by the quantitative or ordinal variable being studied and the term groups⁎ by the categorical variable that represents the groups.
Let’s open the file Kruskal-Wallis_Test.dta, which contains the data from Example 10.14. The three groups are represented by the variable treatment and the characteristic analyzed by the variable result. Thus, the command to be typed is:

kwallis result, by(treatment)
The result of the test can be seen in Fig. 10.69. Analogous to the results presented in Example 10.14 and generated in SPSS, Stata reports the value of the statistic without the correction for ties (22.181) and with the correction factor applied whenever there are ties (22.662). Since the probability associated with the statistic is 0.000, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.
In the previous chapter, we studied parametric tests. This chapter, however, was totally dedicated to the study of nonparametric tests.
Nonparametric tests are classified according to the variables’ level of measurement and to the sample size. So, for each situation, the main types of nonparametric tests were studied. In addition, the advantages and disadvantages of each test as well as their assumptions were also established.
For each nonparametric test, we presented the main inherent concepts, the null and alternative hypotheses, the respective statistics, and the solution of the proposed examples in SPSS and Stata. Whatever the main objective of their application, nonparametric tests can yield sound and interesting research results that are useful in any decision-making process. The correct use of each test, together with a conscious choice of the modeling software, must always be grounded in the underlying theory, without ignoring the researcher’s experience and intuition.
Bank Branch A | Bank Branch B |
---|---|
6.24 | 8.14 |
8.47 | 6.54 |
6.54 | 6.66 |
6.87 | 7.85 |
2.24 | 8.03 |
5.36 | 5.68 |
7.09 | 3.05 |
7.56 | 5.78 |
6.88 | 6.43 |
8.04 | 6.39 |
7.05 | 7.64 |
6.58 | 6.97 |
8.14 | 8.07 |
8.30 | 8.33 |
2.69 | 7.14 |
6.14 | 6.58 |
7.14 | 5.98 |
7.22 | 6.22 |
7.58 | 7.08 |
6.11 | 7.62 |
7.25 | 5.69 |
7.5 | 8.04 |
Student | A | B | C |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 1 | 1 |
3 | 0 | 0 | 0 |
4 | 0 | 1 | 0 |
5 | 0 | 1 | 1 |
6 | 1 | 1 | 1 |
7 | 1 | 0 | 1 |
8 | 0 | 1 | 1 |
9 | 0 | 0 | 0 |
10 | 0 | 0 | 0 |
11 | 1 | 1 | 1 |
12 | 0 | 0 | 1 |
13 | 1 | 0 | 1 |
14 | 0 | 1 | 1 |
15 | 0 | 0 | 1 |
16 | 1 | 1 | 1 |
17 | 0 | 0 | 1 |
18 | 1 | 1 | 1 |
19 | 0 | 1 | 1 |
20 | 1 | 1 | 1 |