10.6 Tests for k Independent Samples

These tests aim to assess if k independent samples come from the same population. Among the most common tests for more than two independent samples, we have the χ2 test for nominal or ordinal variables and the Kruskal-Wallis test for ordinal variables.

10.6.1 The χ2 Test for k Independent Samples

While in Section 10.2.2, the χ2 test was applied to a single sample, in Section 10.4.1 this test was applied to two independent samples. In both cases, the variable(s) is(are) qualitative (nominal or ordinal). The χ2 test for k independent samples (k ≥ 3) is a direct extension of the test for two independent samples.

The data are arranged in an I × J contingency table, in which the rows represent the categories of the variable analyzed and the columns represent the different groups. The null hypothesis of the test is that the frequencies or proportions in each category of the variable analyzed are the same in each group, so:

  • H0: there is no significant difference among the k groups
  • H1: there is a significant difference among the k groups

The chi-square statistic is given by Expression (10.10), presented earlier in this chapter.

Example 10.13

Applying the χ2 Test to k Independent Samples

A company would like to assess whether the productivity of its employees depends or not on their work shift. In order to do that, it collects data on the productivity (low, average, and high) of all employees in each shift. These data can be found in Table 10.E.19. Test the hypothesis that the groups come from the same population, considering a significance level of 5%.

Table 10.E.19

Frequency of Answers Per Shift (the Expected Values Are in Parentheses)
Productivity | Shift 1     | Shift 2     | Shift 3     | Shift 4     | Total
Low          | 50 (59.3)   | 60 (51.9)   | 40 (44.4)   | 50 (44.4)   | 200 (200)
Average      | 80 (97.8)   | 90 (85.6)   | 80 (73.3)   | 80 (73.3)   | 330 (330)
High         | 270 (243.0) | 200 (212.6) | 180 (182.2) | 170 (182.2) | 820 (820)
Total        | 400 (400)   | 350 (350)   | 300 (300)   | 300 (300)   | 1350 (1350)


Solution

  • Step 1: The most suitable test to compare k independent samples (k ≥ 3), in the case of qualitative data in nominal or ordinal scale, is the χ2 test for k independent samples.
  • Step 2: Under the null hypothesis, the frequency of individuals in each of the productivity categories is the same for each shift, so:

H0: there is no significant difference in productivity among the four shifts

H1: there is a significant difference in productivity among the four shifts

  • Step 3: The significance level to be considered is 5%.
  • Step 4: The calculation of the χ2 statistic is given by:

χ²_cal = (50 − 59.3)²/59.3 + (60 − 51.9)²/51.9 + (40 − 44.4)²/44.4 + (50 − 44.4)²/44.4 + (80 − 97.8)²/97.8 + (90 − 85.6)²/85.6 + (80 − 73.3)²/73.3 + (80 − 73.3)²/73.3 + (270 − 243.0)²/243.0 + (200 − 212.6)²/212.6 + (180 − 182.2)²/182.2 + (170 − 182.2)²/182.2 = 13.143

  • Step 5: The critical region (CR) of the χ2 distribution (Table D in the Appendix), considering α = 5% and ν = (3 − 1) ∙ (4 − 1) = 6 degrees of freedom, is shown in Fig. 10.57.
    Fig. 10.57
    Fig. 10.57 Critical region of Example 10.13.
  • Step 6: Decision: since the value calculated is in the critical region, that is, χcal2 > 12.592, we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in productivity among the four shifts.

If we use P-value instead of the critical value of the statistic, Steps 5 and 6 will be:

  • Step 5: According to Table D in the Appendix, the probability associated to the statistic χcal2 = 13.143, for ν = 6 degrees of freedom, is between 0.05 and 0.025.
  • Step 6: Decision: since P < 0.05, we reject H0.
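The calculation above can also be reproduced programmatically. The sketch below (assuming Python with NumPy and SciPy installed) applies `scipy.stats.chi2_contingency` to the observed frequencies of Table 10.E.19 and obtains the same statistic, degrees of freedom, critical value, and P-value:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed frequencies from Table 10.E.19
# (rows: low, average, high productivity; columns: shifts 1 to 4)
observed = np.array([
    [50,  60,  40,  50],
    [80,  90,  80,  80],
    [270, 200, 180, 170],
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
critical = chi2.ppf(0.95, dof)  # critical value for alpha = 5%

print(round(chi2_stat, 3))  # 13.143
print(dof)                  # 6
print(round(critical, 3))   # 12.592
print(round(p_value, 3))    # 0.041 -> reject H0 at the 5% level
```

The returned `expected` array reproduces the expected frequencies shown in parentheses in Table 10.E.19.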

10.6.1.1 Solving the χ2 Test for k Independent Samples on SPSS

The use of the images in this section has been authorized by the International Business Machines Corporation©.

The data from Example 10.13 are available in the file Chi-Square_k_Independent_Samples.sav. Let’s click on Analyze → Descriptive Statistics → Crosstabs … After that, we should insert the variable Productivity in Row(s) and the variable Shift in Column(s), as shown in Fig. 10.58.

Fig. 10.58
Fig. 10.58 Selecting the variables.

In Statistics …, let’s select the option Chi-square, as shown in Fig. 10.59. If we wish to obtain the observed and expected frequency distribution table, in Cells …, we must select the options Observed and Expected in Counts, as shown in Fig. 10.60. Finally, let’s click on Continue and OK. The results can be seen in Figs. 10.61 and 10.62.

Fig. 10.59
Fig. 10.59 Selecting the χ2 statistic.
Fig. 10.60
Fig. 10.60 Selecting the observed and expected frequencies distribution table.
Fig. 10.61
Fig. 10.61 Distribution of the observed and expected frequencies.
Fig. 10.62
Fig. 10.62 Results of the χ2 test for Example 10.13 on SPSS.

From Fig. 10.62, we can see that the value of χ2 is 13.143, similar to the one calculated in Example 10.13. For a confidence level of 95%, since P = 0.041 < 0.05 (we saw in Example 10.13 that this probability is between 0.025 and 0.05), we must reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that there is a difference in productivity among the four shifts.

10.6.1.2 Solving the χ2 Test for k Independent Samples on Stata

The use of the images presented in this section has been authorized by Stata Corp LP©.

The data in Example 10.13 are available in the file Chi-Square_k_Independent_Samples.dta. The variables being studied are productivity and shift. The syntax of the χ2 test for k independent samples is similar to the one presented in Section 10.4.1 for two independent samples. Thus, we must use the command tabulate, or simply tab, followed by the name of the variables being studied, besides the option chi2, or simply ch. The difference is that, in this case, the categorical variable that represents the groups has more than two categories. Therefore, the syntax of the test for the data in Example 10.13 is:

tabulate productivity shift, chi2

or simply:

tab productivity shift, ch

The results can be seen in Fig. 10.63. The value of the χ2 statistic as well as the probability associated to it is similar to the results presented in Example 10.13, and also generated on SPSS.

Fig. 10.63
Fig. 10.63 Results of the χ2 test for Example 10.13 on Stata.

10.6.2 Kruskal-Wallis Test

The Kruskal-Wallis test aims at verifying if k independent samples (k > 2) come from the same population. It is an alternative to the analysis of variance when the hypotheses of data normality and equality of variances are violated, or when the sample is small, or even when the variable is measured in an ordinal scale. For k = 2, the Kruskal-Wallis test is equivalent to the Mann-Whitney test.

The data are represented in a table with double entry with N rows and k columns, in which the rows represent the observations and the columns represent the different samples or groups.

The null hypothesis of the Kruskal-Wallis test assumes that all k samples come from the same population or from identical populations with the same median (μ). For a bilateral test, we have:

  • H0: μ1 = μ2 = … = μk
  • H1: ∃(i,j) μi ≠ μj, i ≠ j

In the Kruskal-Wallis test, all N observations (N is the total number of observations in the global sample) are organized in a single series, and ranks are attributed to each element in the series. Thus, position 1 is attributed to the lowest observation in the global sample, position 2 to the second lowest observation, and so on, up to position N. If there are ties, we attribute the mean of the corresponding ranks. The Kruskal-Wallis statistic (H) is given by:

H_cal = [12 / (N(N + 1))] · Σ_{j=1}^{k} (R_j² / n_j) − 3(N + 1)  (10.17)

where:

  • k: the number of samples or groups;
  • nj: the number of observations in the sample or group j;
  • N: the number of observations in the global sample;
  • Rj: sum of the ranks in the sample or group j.

However, according to Siegel and Castellan (2006), whenever there are ties between two or more ranks, regardless of the group, the Kruskal-Wallis statistic must be corrected in a way that considers the changes in the sample distribution, so:

H'_cal = H_cal / (1 − [Σ_{j=1}^{g} (t_j³ − t_j)] / (N³ − N))  (10.18)

where:

  • g: the number of clusters with different tied ranks;
  • tj: the number of tied ranks in the jth cluster.

According to Siegel and Castellan (2006), the purpose of the correction for ties is to increase the value of H, making the result more significant.

The value calculated must be compared to the critical value of the sampling distribution. If k = 3 and n1, n2, n3 ≤ 5, we must use Table L in the Appendix, which shows the critical values of the Kruskal-Wallis statistic (Hc), where P(Hcal > Hc) = α (for a right-tailed unilateral test). Otherwise, the sampling distribution can be approximated by the χ2 distribution with ν = k − 1 degrees of freedom.

Therefore, if the value of the Hcal statistic is in the critical region, that is, if Hcal > Hc for k = 3 and n1, n2, n3 ≤ 5, or Hcal > χc2 for other values, the null hypothesis is rejected, which allows us to conclude that there is a difference between the samples. Otherwise, we do not reject H0.
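Expressions (10.17) and (10.18) can be implemented directly. The sketch below (in Python with NumPy and SciPy; the function name and sample data are illustrative, not from the text) ranks the pooled observations, computes H, and divides it by the tie correction:

```python
import numpy as np
from scipy.stats import rankdata

def kruskal_wallis_h(*groups):
    """H of Expression (10.17), divided by the tie correction of (10.18)."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = len(pooled)
    ranks = rankdata(pooled)  # tied observations receive the mean rank

    # Expression (10.17): H = 12/(N(N+1)) * sum_j(R_j^2 / n_j) - 3(N+1)
    h, start = 0.0, 0
    for group in groups:
        rank_sum = ranks[start:start + len(group)].sum()
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

    # Expression (10.18): divide H by 1 - sum(t_j^3 - t_j)/(N^3 - N),
    # with one term t_j per cluster of tied values
    _, tie_sizes = np.unique(pooled, return_counts=True)
    correction = 1.0 - (tie_sizes ** 3 - tie_sizes).sum() / (n_total ** 3 - n_total)
    return h / correction

# Hypothetical ordinal scores for three groups (illustrative data only)
g1 = [2, 4, 3, 5, 4]
g2 = [5, 6, 4, 7, 6]
g3 = [8, 6, 9, 7, 8]
print(round(kruskal_wallis_h(g1, g2, g3), 3))
```

The same tie-corrected statistic is what library routines such as `scipy.stats.kruskal` report.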

Example 10.14

Applying the Kruskal-Wallis Test

A group of 36 patients with the same level of stress was submitted to three different treatments, that is, 12 patients were submitted to treatment A, 12 patients to treatment B, and the remaining 12 to treatment C. At the end of the treatment, each patient answered a questionnaire that evaluates a person’s stress level, which is classified in three phases: the resistance phase, for those who got a maximum of three points, the warning phase, for those who got more than 6 points, and the exhaustion phase, for those who got more than 8 points. The results can be seen in Table 10.E.20. Verify if the three treatments lead to the same results. Consider a significance level of 1%.

Table 10.E.20

Stress Level After the Treatment
Treatment A | 6 | 5 | 4 | 5 | 3 | 4 | 5 | 2 | 4 | 3 | 5 | 2
Treatment B | 6 | 7 | 5 | 8 | 7 | 8 | 6 | 9 | 8 | 6 | 8 | 8
Treatment C | 5 | 9 | 8 | 7 | 9 | 11 | 7 | 8 | 9 | 10 | 7 | 8


Solution

  • Step 1: Since the variable is measured in an ordinal scale, the most suitable test to verify if the three independent samples are drawn from the same population is the Kruskal-Wallis test.
  • Step 2: Under the null hypothesis, there is no difference among the treatments; under the alternative hypothesis, there is a difference between at least two treatments, so:

H0: μ1 = μ2 = μ3

H1: ∃(i,j) μi ≠ μj, i ≠ j

  • Step 3: The significance level to be considered is 1%.
  • Step 4: In order to calculate the Kruskal-Wallis statistic, first of all, we must attribute ranks from 1 to 36 to each element in the global sample, as shown in Table 10.E.21. In case of ties, we attribute the mean of the corresponding ranks.

Table 10.E.21

Attributing Ranks
  | Ranks                                                                    | Sum   | Mean
A | 15.5 | 10.5 | 6    | 10.5 | 3.5  | 6    | 10.5 | 1.5  | 6    | 3.5  | 10.5 | 1.5  | 85.5  | 7.13
B | 15.5 | 20   | 10.5 | 26.5 | 20   | 26.5 | 15.5 | 32.5 | 26.5 | 15.5 | 26.5 | 26.5 | 262   | 21.83
C | 10.5 | 32.5 | 26.5 | 20   | 32.5 | 36   | 20   | 26.5 | 32.5 | 35   | 20   | 26.5 | 318.5 | 26.54


Since there are ties, the Kruskal-Wallis statistic is calculated from Expression (10.18). First of all, we calculate the value of H:

H_cal = [12 / (N(N + 1))] · Σ_{j=1}^{k} (R_j² / n_j) − 3(N + 1) = [12 / (36 · 37)] · (85.5²/12 + 262²/12 + 318.5²/12) − 3 · 37 = 22.181

From Tables 10.E.20 and 10.E.21, we can verify that there are eight clusters of tied ranks. For example, there are two observations with 2 points (tied at rank 1.5), two observations with 3 points (rank 3.5), three observations with 4 points (rank 6), and so on, successively, up to four observations with 9 points (rank 32.5). The Kruskal-Wallis statistic is corrected to:

H'_cal = H_cal / (1 − [Σ_{j=1}^{g} (t_j³ − t_j)] / (N³ − N)) = 22.181 / {1 − [(2³ − 2) + (2³ − 2) + (3³ − 3) + … + (4³ − 4)] / (36³ − 36)} = 22.662

  • Step 5: Since n1, n2, n3 > 5, let’s use the χ2 distribution. The critical region (CR) of the χ2 distribution (Table D in the Appendix), considering α = 1% and ν = k − 1 = 2 degrees of freedom, is shown in Fig. 10.64.
    Fig. 10.64
    Fig. 10.64 Critical region of Example 10.14.
  • Step 6: Decision: since the value calculated is in the critical region, that is, Hcal > 9.210, we must reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

If we use P-value instead of the critical value of the statistic, Steps 5 and 6 will be:

  • Step 5: According to Table D in the Appendix, for ν = 2 degrees of freedom, the probability associated to the statistic Hcal = 22.662 is less than 0.005 (P < 0.005).
  • Step 6: Decision: since P < 0.01, we reject H0.
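The results of Example 10.14 can be verified with `scipy.stats.kruskal` (a sketch assuming Python with SciPy installed), which applies the tie correction of Expression (10.18) automatically:

```python
from scipy.stats import chi2, kruskal

# Stress scores after each treatment (Table 10.E.20)
treatment_a = [6, 5, 4, 5, 3, 4, 5, 2, 4, 3, 5, 2]
treatment_b = [6, 7, 5, 8, 7, 8, 6, 9, 8, 6, 8, 8]
treatment_c = [5, 9, 8, 7, 9, 11, 7, 8, 9, 10, 7, 8]

statistic, p_value = kruskal(treatment_a, treatment_b, treatment_c)
critical = chi2.ppf(0.99, 2)  # critical value for alpha = 1%, nu = 2

print(round(statistic, 3))  # 22.662 (tie-corrected H, as in Step 4)
print(round(critical, 3))   # 9.21
print(p_value < 0.01)       # True -> reject H0 at the 1% level
```

The P-value is computed from the χ2 approximation with ν = k − 1 = 2 degrees of freedom, consistent with Step 5.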

10.6.2.1 Solving the Kruskal-Wallis Test by Using SPSS Software

The use of the images in this section has been authorized by the International Business Machines Corporation©.

The data in Example 10.14 are available in the file Kruskal-Wallis_Test.sav. In order to elaborate the Kruskal-Wallis test on SPSS, let’s click on Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples …, as shown in Fig. 10.65.

Fig. 10.65
Fig. 10.65 Procedure for elaborating the Kruskal-Wallis test on SPSS.

After that, we should insert the variable Result in the box Test Variable List, define the groups of the variable Treatment and select the Kruskal-Wallis test, as shown in Fig. 10.66.

Fig. 10.66
Fig. 10.66 Selecting the variable and defining the groups for the Kruskal-Wallis test.

Let’s click on OK to obtain the results of the Kruskal-Wallis test. Fig. 10.67 shows the mean of the ranks for each group, similar to the values calculated in Table 10.E.21.

Fig. 10.67
Fig. 10.67 Ranks.

The value of the Kruskal-Wallis statistic and the significance level of the test are in Fig. 10.68.

Fig. 10.68
Fig. 10.68 Results of the Kruskal-Wallis test for Example 10.14 on SPSS.

The value of the test is 22.662, similar to the value calculated in Example 10.14. The probability associated to the statistic is 0.000 (we saw in Example 10.14 that this probability is less than 0.005). Since P < 0.01, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

10.6.2.2 Solving the Kruskal-Wallis Test by Using Stata

The use of the images presented in this section has been authorized by Stata Corp LP©.

On Stata, the Kruskal-Wallis test is elaborated through the command kwallis, using the following syntax:

kwallis variable⁎, by(groups⁎)

where the term variable⁎ must be replaced by the quantitative or ordinal variable being studied and the term groups⁎ by the categorical variable that represents the groups.

Let’s open the file Kruskal-Wallis_Test.dta that contains the data from Example 10.14. All three groups are represented by the variable treatment and the characteristic analyzed by the variable result. Thus, the command to be typed is:

kwallis result, by(treatment)

The result of the test can be seen in Fig. 10.69. In line with the results presented in Example 10.14 and generated on SPSS, Stata reports the value of the statistic both without the correction for ties (22.181) and with it (22.662). Since the probability associated to the statistic is 0.000, we reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there is a difference among the treatments.

Fig. 10.69
Fig. 10.69 Results of the Kruskal-Wallis test for Example 10.14 on Stata.

10.7 Final Remarks

In the previous chapter, we studied parametric tests. This chapter, however, was totally dedicated to the study of nonparametric tests.

Nonparametric tests are classified according to the variables’ level of measurement and to the sample size. So, for each situation, the main types of nonparametric tests were studied. In addition, the advantages and disadvantages of each test as well as their assumptions were also established.

For each nonparametric test, the main inherent concepts, the null and alternative hypotheses, the respective statistics, and the solution of the examples proposed on SPSS and on Stata were presented. Whatever the main objective for their application, nonparametric tests can provide the collection of good and interesting research results that will be useful in any decision-making process. The correct use of each test, from a conscious choice of the modeling software, must always be done based on the underlying theory, without ignoring the researcher’s experience and intuition.

10.8 Exercises

  (1) In what situations are nonparametric tests applied?
  (2) What are the advantages and disadvantages of nonparametric tests?
  (3) What are the differences between the sign test and the Wilcoxon test for two paired samples?
  (4) Which test is an alternative to the t-test for one sample when the data distribution does not follow a normal distribution?
  (5) A group of 20 consumers tasted two types of coffee (A and B). At the end, they chose one of the brands, as shown in the table. Test the null hypothesis that there is no difference in these consumers’ preference, with a significance level of 5%.
Events     | Brand A | Brand B | Total
Frequency  | 8       | 12      | 20
Proportion | 0.40    | 0.60    | 1.00


  (6) A group of 60 readers evaluated three novels and, at the end, they chose one of the three options, as shown in the table. Test the null hypothesis that there is no difference in these readers’ preference, with a significance level of 5%.
Events     | Book A | Book B | Book C | Total
Frequency  | 29     | 15     | 16     | 60
Proportion | 0.483  | 0.250  | 0.267  | 1.00


  (7) A group of 20 teenagers went on the Points Diet for 30 days. Check and see if there was weight loss after the diet. Assume that α = 5%.

Before | After
58     | 56
67     | 62
72     | 65
88     | 84
77     | 72
67     | 68
75     | 76
69     | 62
104    | 97
66     | 65
58     | 59
59     | 60
61     | 62
67     | 63
73     | 65
58     | 58
67     | 62
67     | 64
78     | 72
85     | 80
  (8) Aiming to compare the average service times in two bank branches, data on 22 clients from each bank branch were collected, as shown in the table. Use the most suitable test, with a significance level of 5%, to test whether both samples come or do not come from populations with the same medians.

Bank Branch A | Bank Branch B
6.24          | 8.14
8.47          | 6.54
6.54          | 6.66
6.87          | 7.85
2.24          | 8.03
5.36          | 5.68
7.09          | 3.05
7.56          | 5.78
6.88          | 6.43
8.04          | 6.39
7.05          | 7.64
6.58          | 6.97
8.14          | 8.07
8.30          | 8.33
2.69          | 7.14
6.14          | 6.58
7.14          | 5.98
7.22          | 6.22
7.58          | 7.08
6.11          | 7.62
7.25          | 5.69
7.5           | 8.04
  (9) A group of 20 Business Administration students evaluated their level of learning based on three subjects studied in the field of Applied Quantitative Methods, by answering if their level of learning was high (1) or low (0). The results can be seen in the table. Check and see if the proportion of students with a high level of learning is the same for each subject. Consider a significance level of 2.5%.

Student | A | B | C
1       | 0 | 1 | 1
2       | 1 | 1 | 1
3       | 0 | 0 | 0
4       | 0 | 1 | 0
5       | 0 | 1 | 1
6       | 1 | 1 | 1
7       | 1 | 0 | 1
8       | 0 | 1 | 1
9       | 0 | 0 | 0
10      | 0 | 0 | 0
11      | 1 | 1 | 1
12      | 0 | 0 | 1
13      | 1 | 0 | 1
14      | 0 | 1 | 1
15      | 0 | 0 | 1
16      | 1 | 1 | 1
17      | 0 | 0 | 1
18      | 1 | 1 | 1
19      | 0 | 1 | 1
20      | 1 | 1 | 1


  (10) A group of 15 consumers evaluated their level of satisfaction (1 = somewhat dissatisfied, 2 = somewhat satisfied, and 3 = very satisfied) with three different bank services. The results can be seen in the table. Verify if there is a difference between the three services. Assume a significance level of 5%.

Consumer | A | B | C
1        | 3 | 2 | 3
2        | 2 | 2 | 2
3        | 1 | 2 | 1
4        | 3 | 2 | 2
5        | 1 | 1 | 1
6        | 3 | 2 | 1
7        | 3 | 3 | 2
8        | 2 | 2 | 1
9        | 3 | 2 | 2
10       | 2 | 1 | 1
11       | 1 | 1 | 2
12       | 3 | 1 | 1
13       | 3 | 2 | 1
14       | 2 | 1 | 2
15       | 3 | 1 | 2


References

Fávero L.P., Belfiore P., Silva F.L., Chan B.L. Análise de dados: modelagem multivariada para tomada de decisões. Rio de Janeiro: Campus Elsevier; 2009.

Maroco J. Análise estatística com o SPSS Statistics. sixth ed. Lisboa: Edições Sílabo; 2014.

Siegel S., Castellan Jr. N.J. Estatística não-paramétrica para ciências do comportamento. second ed. Porto Alegre: Bookman; 2006.


"To view the full reference list for the book, click here"

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset