10.6 Tests for k Independent Samples

These tests aim to assess if k independent samples come from the same population. Among the most common tests for more than two independent samples, we have the χ2 test for nominal or ordinal variables and the Kruskal-Wallis test for ordinal variables.

10.6.1 The χ2 Test for k Independent Samples

While in Section 10.2.2, the χ2 test was applied to a single sample, in Section 10.4.1 this test was applied to two independent samples. In both cases, the variable(s) is(are) qualitative (nominal or ordinal). The χ2 test for k independent samples (k ≥ 3) is a direct extension of the test for two independent samples.

The data are arranged in an I × J contingency table, in which the rows represent the categories of the variable analyzed and the columns represent the different groups. The null hypothesis of the test is that the frequencies or proportions in each category of the variable analyzed are the same in each group, so:

  • H0: there is no significant difference among the k groups
  • H1: there is a significant difference among the k groups

The chi-square statistic is given by Expression (10.10), presented earlier in this chapter.

Example 10.13

Applying the χ2 Test to k Independent Samples

A company would like to assess whether the productivity of its employees depends or not on their work shift. In order to do that, it collects data on the productivity (low, average, and high) of all employees in each shift. These data can be found in Table 10.E.19. Test the hypothesis that the groups come from the same population, considering a significance level of 5%.

Table 10.E.19

Frequency of Answers Per Shift (the Expected Values Are in Parentheses)
Productivity | Shift 1     | Shift 2     | Shift 3     | Shift 4     | Total
Low          | 50 (59.3)   | 60 (51.9)   | 40 (44.4)   | 50 (44.4)   | 200 (200)
Average      | 80 (97.8)   | 90 (85.6)   | 80 (73.3)   | 80 (73.3)   | 330 (330)
High         | 270 (243.0) | 200 (212.6) | 180 (182.2) | 170 (182.2) | 820 (820)
Total        | 400 (400)   | 350 (350)   | 300 (300)   | 300 (300)   | 1350 (1350)


Solution

  • Step 1: The most suitable test to compare k independent samples (k ≥ 3), in the case of qualitative data in nominal or ordinal scale, is the χ2 test for k independent samples.
  • Step 2: Under the null hypothesis, the frequency of individuals in each of the productivity categories is the same for each shift, so:

H0: there is no significant difference in productivity among the four shifts

H1: there is a significant difference in productivity among the four shifts

  • Step 3: The significance level to be considered is 5%.
  • Step 4: The calculation of the χ2 statistic is given by:

χ²_cal = (50 − 59.3)²/59.3 + (60 − 51.9)²/51.9 + (40 − 44.4)²/44.4 + (50 − 44.4)²/44.4 + (80 − 97.8)²/97.8 + (90 − 85.6)²/85.6 + (80 − 73.3)²/73.3 + (80 − 73.3)²/73.3 + (270 − 243.0)²/243.0 + (200 − 212.6)²/212.6 + (180 − 182.2)²/182.2 + (170 − 182.2)²/182.2 = 13.143

  • Step 5: The critical region (CR) of the χ2 distribution (Table D in the Appendix), considering α = 5% and ν = (3 − 1) ∙ (4 − 1) = 6 degrees of freedom, is shown in Fig. 10.57.
    Fig. 10.57
    Fig. 10.57 Critical region of Example 10.13.
  • Step 6: Decision: since the value calculated is in the critical region, that is, χcal2 > 12.592, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in productivity among the four shifts.

If we use P-value instead of the critical value of the statistic, Steps 5 and 6 will be:

  • Step 5: According to Table D in the Appendix, the probability associated to the statistic χcal2 = 13.143, for ν = 6 degrees of freedom, is between 0.05 and 0.025.
  • Step 6: Decision: since P < 0.05, we reject H0.
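The calculation above can also be reproduced programmatically. The sketch below (assuming Python with NumPy and SciPy installed) applies `scipy.stats.chi2_contingency` to the observed frequencies of Table 10.E.19 and obtains the same statistic, degrees of freedom, critical value, and P-value:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed frequencies from Table 10.E.19
# (rows: low, average, high productivity; columns: shifts 1 to 4)
observed = np.array([
    [50,  60,  40,  50],
    [80,  90,  80,  80],
    [270, 200, 180, 170],
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
critical = chi2.ppf(0.95, dof)  # critical value for alpha = 5%

print(round(chi2_stat, 3))  # 13.143
print(dof)                  # 6
print(round(critical, 3))   # 12.592
print(round(p_value, 3))    # 0.041 -> reject H0 at the 5% level
```

The returned `expected` array reproduces the expected frequencies shown in parentheses in Table 10.E.19.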

10.6.1.1 Solving the χ2 Test for k Independent Samples on SPSS

The use of the images in this section has been authorized by the International Business Machines Corporation©.

The data from Example 10.13 are available in the file Chi-Square_k_Independent_Samples.sav. Let’s click on Analyze → Descriptive Statistics → Crosstabs … After that, we should insert the variable Productivity in Row(s) and the variable Shift in Column(s), as shown in Fig. 10.58.

Fig. 10.58
Fig. 10.58 Selecting the variables.

In Statistics …, let’s select the option Chi-square, as shown in Fig. 10.59. If we wish to obtain the observed and expected frequency distribution table, in Cells …, we must select the options Observed and Expected in Counts, as shown in Fig. 10.60. Finally, let’s click on Continue and OK. The results can be seen in Figs. 10.61 and 10.62.

Fig. 10.59
Fig. 10.59 Selecting the χ2 statistic.
Fig. 10.60
Fig. 10.60 Selecting the observed and expected frequencies distribution table.
Fig. 10.61
Fig. 10.61 Distribution of the observed and expected frequencies.
Fig. 10.62
Fig. 10.62 Results of the χ2 test for Example 10.13 on SPSS.

From Fig. 10.62, we can see that the value of χ2 is 13.143, similar to the one calculated in Example 10.13. For a confidence level of 95%, since P = 0.041 < 0.05 (we saw in Example 10.13 that this probability is between 0.025 and 0.05), we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in productivity among the four shifts.

10.6.1.2 Solving the χ2 Test for k Independent Samples on Stata

The use of the images presented in this section has been authorized by Stata Corp LP©.

The data in Example 10.13 are available in the file Chi-Square_k_Independent_Samples.dta. The variables being studied are productivity and shift. The syntax of the χ2 test for k independent samples is similar to the one presented in Section 10.4.1 for two independent samples. Thus, we must use the command tabulate, or simply tab, followed by the name of the variables being studied, besides the option chi2, or simply ch. The difference is that, in this case, the categorical variable that represents the groups has more than two categories. Therefore, the syntax of the test for the data in Example 10.13 is:

tabulate productivity shift, chi2

or simply:

tab productivity shift, ch

The results can be seen in Fig. 10.63. The value of the χ2 statistic as well as the probability associated to it is similar to the results presented in Example 10.13, and also generated on SPSS.

Fig. 10.63
Fig. 10.63 Results of the χ2 test for Example 10.13 on Stata.

10.6.2 Kruskal-Wallis Test

The Kruskal-Wallis test aims at verifying if k independent samples (k > 2) come from the same population. It is an alternative to the analysis of variance when the hypotheses of data normality and equality of variances are violated, or when the sample is small, or even when the variable is measured in an ordinal scale. For k = 2, the Kruskal-Wallis test is equivalent to the Mann-Whitney test.

The data are represented in a table with double entry with N rows and k columns, in which the rows represent the observations and the columns represent the different samples or groups.

The null hypothesis of the Kruskal-Wallis test assumes that all k samples come from the same population or from identical populations with the same median (μ). For a bilateral test, we have:

  • H0: μ1 = μ2 = … = μk
  • H1: ∃(i,j) μi ≠ μj, i ≠ j

In the Kruskal-Wallis test, all N observations (N is the total number of observations in the global sample) are organized in a single series, and ranks are attributed to each element in the series. Thus, position 1 is attributed to the lowest observation in the global sample, position 2 to the second lowest observation, and so on, up to position N. If there are ties, we attribute the mean of the corresponding ranks. The Kruskal-Wallis statistic (H) is given by:

H_cal = [12 / (N(N + 1))] · Σ_{j=1}^{k} (R_j² / n_j) − 3(N + 1)  (10.17)

where:

  • k: the number of samples or groups;
  • nj: the number of observations in the sample or group j;
  • N: the number of observations in the global sample;
  • Rj: sum of the ranks in the sample or group j.

However, according to Siegel and Castellan (2006), whenever there are ties between two or more ranks, regardless of the group, the Kruskal-Wallis statistic must be corrected in a way that considers the changes in the sample distribution, so:

H'_cal = H_cal / (1 − [Σ_{j=1}^{g} (t_j³ − t_j)] / (N³ − N))  (10.18)

where:

  • g: the number of clusters with different tied ranks;
  • tj: the number of tied ranks in the jth cluster.

According to Siegel and Castellan (2006), the purpose of the correction for ties is to increase the value of H, making the result more significant.

The value calculated must be compared to the critical value of the sampling distribution. If k = 3 and n1, n2, n3 ≤ 5, we must use Table L in the Appendix, which shows the critical values of the Kruskal-Wallis statistic (Hc), where P(Hcal > Hc) = α (for a right-tailed unilateral test). Otherwise, the sampling distribution can be approximated by the χ2 distribution with ν = k − 1 degrees of freedom.

Therefore, if the value of the Hcal statistic is in the critical region, that is, if Hcal > Hc for k = 3 and n1, n2, n3 ≤ 5, or Hcal > χc2 for other values, the null hypothesis is rejected, which allows us to conclude that there is a difference between the samples. Otherwise, we do not reject H0.
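Expressions (10.17) and (10.18) can be implemented directly. The sketch below (in Python with NumPy and SciPy; the function name and sample data are illustrative, not from the text) ranks the pooled observations, computes H, and divides it by the tie correction:

```python
import numpy as np
from scipy.stats import rankdata

def kruskal_wallis_h(*groups):
    """H of Expression (10.17), divided by the tie correction of (10.18)."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = len(pooled)
    ranks = rankdata(pooled)  # tied observations receive the mean rank

    # Expression (10.17): H = 12/(N(N+1)) * sum_j(R_j^2 / n_j) - 3(N+1)
    h, start = 0.0, 0
    for group in groups:
        rank_sum = ranks[start:start + len(group)].sum()
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

    # Expression (10.18): divide H by 1 - sum(t_j^3 - t_j)/(N^3 - N),
    # with one term t_j per cluster of tied values
    _, tie_sizes = np.unique(pooled, return_counts=True)
    correction = 1.0 - (tie_sizes ** 3 - tie_sizes).sum() / (n_total ** 3 - n_total)
    return h / correction

# Hypothetical ordinal scores for three groups (illustrative data only)
g1 = [2, 4, 3, 5, 4]
g2 = [5, 6, 4, 7, 6]
g3 = [8, 6, 9, 7, 8]
print(round(kruskal_wallis_h(g1, g2, g3), 3))
```

The same tie-corrected statistic is what library routines such as `scipy.stats.kruskal` report.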

Example 10.14

Applying the Kruskal-Wallis Test

A group of 36 patients with the same level of stress was submitted to three different treatments, that is, 12 patients were submitted to treatment A, 12 patients to treatment B, and the remaining 12 to treatment C. At the end of the treatment, each patient answered a questionnaire that evaluates a person’s stress level, which is classified in three phases: the resistance phase, for those who got a maximum of three points, the warning phase, for those who got more than 6 points, and the exhaustion phase, for those who got more than 8 points. The results can be seen in Table 10.E.20. Verify if the three treatments lead to the same results. Consider a significance level of 1%.

Table 10.E.20

Stress Level After the Treatment
Treatment A | 6 | 5 | 4 | 5 | 3 | 4 | 5 | 2 | 4 | 3 | 5 | 2
Treatment B | 6 | 7 | 5 | 8 | 7 | 8 | 6 | 9 | 8 | 6 | 8 | 8
Treatment C | 5 | 9 | 8 | 7 | 9 | 11 | 7 | 8 | 9 | 10 | 7 | 8


Solution

  • Step 1: Since the variable is measured in an ordinal scale, the most suitable test to verify if the three independent samples are drawn from the same population is the Kruskal-Wallis test.
  • Step 2: Under the null hypothesis, there is no difference among the treatments; under the alternative hypothesis, there is a difference between at least two treatments, so:

H0: μ1 = μ2 = μ3

H1: ∃(i,j) μi ≠ μj, i ≠ j

  • Step 3: The significance level to be considered is 1%.
  • Step 4: In order to calculate the Kruskal-Wallis statistic, first of all, we must attribute ranks from 1 to 36 to each element in the global sample, as shown in Table 10.E.21. In case of ties, we attribute the mean of the corresponding ranks.

Table 10.E.21

Attributing Ranks
  | Ranks                                                                    | Sum   | Mean
A | 15.5 | 10.5 | 6    | 10.5 | 3.5  | 6    | 10.5 | 1.5  | 6    | 3.5  | 10.5 | 1.5  | 85.5  | 7.13
B | 15.5 | 20   | 10.5 | 26.5 | 20   | 26.5 | 15.5 | 32.5 | 26.5 | 15.5 | 26.5 | 26.5 | 262   | 21.83
C | 10.5 | 32.5 | 26.5 | 20   | 32.5 | 36   | 20   | 26.5 | 32.5 | 35   | 20   | 26.5 | 318.5 | 26.54


Since there are ties, the Kruskal-Wallis statistic is calculated from Expression (10.18). First of all, we calculate the value of H:

H_cal = [12 / (N(N + 1))] · Σ_{j=1}^{k} (R_j² / n_j) − 3(N + 1) = [12 / (36 · 37)] · (85.5²/12 + 262²/12 + 318.5²/12) − 3 · 37 = 22.181

From Tables 10.E.20 and 10.E.21, we can verify that there are eight clusters of tied ranks. For example, there are two observations with 2 points (tied at rank 1.5), two observations with 3 points (rank 3.5), three observations with 4 points (rank 6), and so on, successively, up to four observations with 9 points (rank 32.5). The Kruskal-Wallis statistic is corrected to:

H'_cal = H_cal / (1 − [Σ_{j=1}^{g} (t_j³ − t_j)] / (N³ − N)) = 22.181 / {1 − [(2³ − 2) + (2³ − 2) + (3³ − 3) + … + (4³ − 4)] / (36³ − 36)} = 22.662

  • Step 5: Since n1, n2, n3 > 5, let’s use the χ2 distribution. The critical region (CR) of the χ2 distribution (Table D in the Appendix), considering α = 1% and ν = k − 1 = 2 degrees of freedom, is shown in Fig. 10.64.
    Fig. 10.64
    Fig. 10.64 Critical region of Example 10.14.
  • Step 6: Decision: since the value calculated is in the critical region, that is, Hcal > 9.210, we must reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

If we use P-value instead of the critical value of the statistic, Steps 5 and 6 will be:

  • Step 5: According to Table D in the Appendix, for ν = 2 degrees of freedom, the probability associated to the statistic Hcal = 22.662 is less than 0.005 (P < 0.005).
  • Step 6: Decision: since P < 0.01, we reject H0.
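The results of Example 10.14 can be verified with `scipy.stats.kruskal` (a sketch assuming Python with SciPy installed), which applies the tie correction of Expression (10.18) automatically:

```python
from scipy.stats import chi2, kruskal

# Stress scores after each treatment (Table 10.E.20)
treatment_a = [6, 5, 4, 5, 3, 4, 5, 2, 4, 3, 5, 2]
treatment_b = [6, 7, 5, 8, 7, 8, 6, 9, 8, 6, 8, 8]
treatment_c = [5, 9, 8, 7, 9, 11, 7, 8, 9, 10, 7, 8]

statistic, p_value = kruskal(treatment_a, treatment_b, treatment_c)
critical = chi2.ppf(0.99, 2)  # critical value for alpha = 1%, nu = 2

print(round(statistic, 3))  # 22.662 (tie-corrected H, as in Step 4)
print(round(critical, 3))   # 9.21
print(p_value < 0.01)       # True -> reject H0 at the 1% level
```

The P-value is computed from the χ2 approximation with ν = k − 1 = 2 degrees of freedom, consistent with Step 5.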

10.6.2.1 Solving the Kruskal-Wallis Test by Using SPSS Software

The use of the images in this section has been authorized by the International Business Machines Corporation©.

The data in Example 10.14 are available in the file Kruskal-Wallis_Test.sav. In order to elaborate the Kruskal-Wallis test on SPSS, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples …, as shown in Fig. 10.65.

Fig. 10.65
Fig. 10.65 Procedure for elaborating the Kruskal-Wallis test on SPSS.

After that, we should insert the variable Result in the box Test Variable List, define the groups of the variable Treatment and select the Kruskal-Wallis test, as shown in Fig. 10.66.

Fig. 10.66
Fig. 10.66 Selecting the variable and defining the groups for the Kruskal-Wallis test.

Let’s click on OK to obtain the results of the Kruskal-Wallis test. Fig. 10.67 shows the mean of the ranks for each group, similar to the values calculated in Table 10.E.21.

Fig. 10.67
Fig. 10.67 Ranks.

The value of the Kruskal-Wallis statistic and the significance level of the test are in Fig. 10.68.

Fig. 10.68
Fig. 10.68 Results of the Kruskal-Wallis test for Example 10.14 on SPSS.

The value of the test is 22.662, similar to the value calculated in Example 10.14. The probability associated to the statistic is 0.000 (we saw in Example 10.14 that this probability is less than 0.005). Since P < 0.01, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

10.6.2.2 Solving the Kruskal-Wallis Test by Using Stata

The use of the images presented in this section has been authorized by Stata Corp LP©.

On Stata, the Kruskal-Wallis test is elaborated through the command kwallis, using the following syntax:

kwallis variable⁎, by(groups⁎)

where the term variable⁎ must be replaced by the quantitative or ordinal variable being studied and the term groups⁎ by the categorical variable that represents the groups.

Let’s open the file Kruskal-Wallis_Test.dta that contains the data from Example 10.14. All three groups are represented by the variable treatment and the characteristic analyzed by the variable result. Thus, the command to be typed is:

kwallis result, by(treatment)

The result of the test can be seen in Fig. 10.69. In line with the results presented in Example 10.14 and generated on SPSS, Stata reports the value of the statistic both without the correction for ties (22.181) and with it (22.662). Since the probability associated to the statistic is 0.000, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

Fig. 10.69
Fig. 10.69 Results of the Kruskal-Wallis test for Example 10.14 on Stata.

10.7 Final Remarks

In the previous chapter, we studied parametric tests. This chapter, however, was totally dedicated to the study of nonparametric tests.

Nonparametric tests are classified according to the variables’ level of measurement and to the sample size. So, for each situation, the main types of nonparametric tests were studied. In addition, the advantages and disadvantages of each test as well as their assumptions were also established.

For each nonparametric test, the main inherent concepts, the null and alternative hypotheses, the respective statistics, and the solution of the examples proposed on SPSS and on Stata were presented. Whatever the main objective for their application, nonparametric tests can provide the collection of good and interesting research results that will be useful in any decision-making process. The correct use of each test, from a conscious choice of the modeling software, must always be done based on the underlying theory, without ignoring the researcher’s experience and intuition.

10.8 Exercises

  (1) In what situations are nonparametric tests applied?
  (2) What are the advantages and disadvantages of nonparametric tests?
  (3) What are the differences between the sign test and the Wilcoxon test for two paired samples?
  (4) Which test is an alternative to the t-test for one sample when the data distribution does not follow a normal distribution?
  (5) A group of 20 consumers tasted two types of coffee (A and B). At the end, they chose one of the brands, as shown in the table. Test the null hypothesis that there is no difference in these consumers’ preference, with a significance level of 5%.
Events     | Brand A | Brand B | Total
Frequency  | 8       | 12      | 20
Proportion | 0.40    | 0.60    | 1.00


  (6) A group of 60 readers evaluated three novels and, at the end, they chose one of the three options, as shown in the table. Test the null hypothesis that there is no difference in these readers’ preference, with a significance level of 5%.
Events     | Book A | Book B | Book C | Total
Frequency  | 29     | 15     | 16     | 60
Proportion | 0.483  | 0.250  | 0.267  | 1.00


  (7) A group of 20 teenagers went on the Points Diet for 30 days. Check and see if there was weight loss after the diet. Assume that α = 5%.

Before | After
58     | 56
67     | 62
72     | 65
88     | 84
77     | 72
67     | 68
75     | 76
69     | 62
104    | 97
66     | 65
58     | 59
59     | 60
61     | 62
67     | 63
73     | 65
58     | 58
67     | 62
67     | 64
78     | 72
85     | 80
  (8) Aiming to compare the average service times in two bank branches, data on 22 clients from each bank branch were collected, as shown in the table. Use the most suitable test, with a significance level of 5%, to test whether both samples come or do not come from populations with the same medians.

Bank Branch A | Bank Branch B
6.24          | 8.14
8.47          | 6.54
6.54          | 6.66
6.87          | 7.85
2.24          | 8.03
5.36          | 5.68
7.09          | 3.05
7.56          | 5.78
6.88          | 6.43
8.04          | 6.39
7.05          | 7.64
6.58          | 6.97
8.14          | 8.07
8.30          | 8.33
2.69          | 7.14
6.14          | 6.58
7.14          | 5.98
7.22          | 6.22
7.58          | 7.08
6.11          | 7.62
7.25          | 5.69
7.5           | 8.04
  (9) A group of 20 Business Administration students evaluated their level of learning based on three subjects studied in the field of Applied Quantitative Methods, by answering if their level of learning was high (1) or low (0). The results can be seen in the table. Check and see if the proportion of students with a high level of learning is the same for each subject. Consider a significance level of 2.5%.

Student | A | B | C
1       | 0 | 1 | 1
2       | 1 | 1 | 1
3       | 0 | 0 | 0
4       | 0 | 1 | 0
5       | 0 | 1 | 1
6       | 1 | 1 | 1
7       | 1 | 0 | 1
8       | 0 | 1 | 1
9       | 0 | 0 | 0
10      | 0 | 0 | 0
11      | 1 | 1 | 1
12      | 0 | 0 | 1
13      | 1 | 0 | 1
14      | 0 | 1 | 1
15      | 0 | 0 | 1
16      | 1 | 1 | 1
17      | 0 | 0 | 1
18      | 1 | 1 | 1
19      | 0 | 1 | 1
20      | 1 | 1 | 1


  (10) A group of 15 consumers evaluated their level of satisfaction (1 = somewhat dissatisfied, 2 = somewhat satisfied, and 3 = very satisfied) with three different bank services. The results can be seen in the table. Verify if there is a difference between the three services. Assume a significance level of 5%.

Consumer | A | B | C
1        | 3 | 2 | 3
2        | 2 | 2 | 2
3        | 1 | 2 | 1
4        | 3 | 2 | 2
5        | 1 | 1 | 1
6        | 3 | 2 | 1
7        | 3 | 3 | 2
8        | 2 | 2 | 1
9        | 3 | 2 | 2
10       | 2 | 1 | 1
11       | 1 | 1 | 2
12       | 3 | 1 | 1
13       | 3 | 2 | 1
14       | 2 | 1 | 2
15       | 3 | 1 | 2


References

Fávero L.P., Belfiore P., Silva F.L., Chan B.L. Análise de dados: modelagem multivariada para tomada de decisões. Rio de Janeiro: Campus Elsevier; 2009.

Maroco J. Análise estatística com o SPSS Statistics. sixth ed. Lisboa: Edições Sílabo; 2014.

Siegel S., Castellan Jr. N.J. Estatística não-paramétrica para ciências do comportamento. second ed. Porto Alegre: Bookman; 2006.


"To view the full reference list for the book, click here"

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset