This chapter discusses the role of hypothesis tests in statistical inference. It presents the concept and goals of hypothesis tests, as well as the procedures for constructing them. Hypothesis tests are classified as parametric or nonparametric, and this chapter focuses on parametric tests (nonparametric tests will be discussed in the following chapter). We define the concepts and assumptions of parametric tests, along with their respective advantages and disadvantages. We will study the main types of parametric hypothesis tests and their inherent assumptions, including tests for normality, homogeneity of variance tests, Student’s t-test and its applications, and ANOVA and its extensions. This makes it possible to know when to use each parametric test. Each test is solved analytically and also through IBM SPSS Statistics Software and Stata Statistical Software, and the results obtained are then interpreted.
Hypothesis tests; Parametric tests; Normality tests; Homogeneity of variance tests; Student’s t-test; ANOVA
We must conduct research and then accept the results. If they don’t stand up to experimentation, Buddha’s own words must be rejected.
Tenzin Gyatso, 14th Dalai Lama
As discussed previously, one of the problems to be solved by statistical inference is hypothesis testing. A statistical hypothesis is an assumption about a certain population parameter, such as the mean, the standard deviation, or the correlation coefficient. A hypothesis test is a procedure for deciding whether a certain hypothesis is true or false. In order for a statistical hypothesis to be validated or rejected with certainty, it would be necessary to examine the entire population, which in practice is not feasible. As an alternative, we draw a random sample from the population we are interested in. Since the decision is made based on the sample, errors may occur (rejecting a hypothesis when it is true, or failing to reject a hypothesis when it is false), as we will study later on.
The procedures and concepts necessary to construct a hypothesis test are presented next. Let X be a variable associated with a population and θ a certain parameter of this population. We must define the hypothesis to be tested about parameter θ, which is called the null hypothesis:
H0:θ=θ0
Let’s also define the alternative hypothesis (H1), in case H0 is rejected, which can be characterized as follows:
H1:θ≠θ0
and the test is called a bilateral test (or two-tailed test).
The significance level of a test (α) represents the probability of rejecting the null hypothesis when it is true (one of the two errors that may occur, as we will see later). The critical region (CR), or rejection region (RR), of a bilateral test is represented by two tails of the same size at the left and right extremities of the distribution curve, each corresponding to half of the significance level α, as shown in Fig. 9.1.
Another way to define the alternative hypothesis (H1) would be:
H1:θ<θ0
and the test is called a unilateral test to the left (or left-tailed test).
In this case, the critical region is in the left tail of the distribution and corresponds to significance level α, as shown in Fig. 9.2.
Or the alternative hypothesis could be:
H1:θ>θ0
and the test is called a unilateral test to the right (or right-tailed test). In this case, the critical region is in the right tail of the distribution and corresponds to significance level α, as shown in Fig. 9.3.
Thus, if the main objective is to check whether a parameter is significantly higher or lower than a certain value, we use a unilateral test. On the other hand, if the objective is to check whether a parameter is simply different from a certain value, we use a bilateral test.
After defining the null hypothesis to be tested, we use a random sample collected from the population to decide whether or not to reject it. Since the decision is made based on the sample, two types of errors may occur:
Type I error: rejecting the null hypothesis when it is true. The probability of this type of error is represented by α:
P(type I error) = P(rejecting H0 | H0 is true) = α
Type II error: not rejecting the null hypothesis when it is false. The probability of this type of error is represented by β:
P(type II error) = P(not rejecting H0 | H0 is false) = β
Table 9.1 shows the types of errors that may happen in a hypothesis test.
Table 9.1
Decision | H0 Is True | H0 Is False |
---|---|---|
Not rejecting H0 | Correct decision (1 − α) | Type II error (β) |
Rejecting H0 | Type I error (α) | Correct decision (1 − β) |
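The meaning of α can be checked numerically. As a hypothetical sketch in Python (not part of the chapter’s SPSS/Stata material; μ0, σ, n, and the random seed are arbitrary choices), repeatedly sampling from a population in which H0 is true and applying a bilateral z-test at level α should reject H0 — a Type I error — in roughly a proportion α of the trials:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
mu0, sigma, n = 50.0, 10.0, 36          # illustrative values only
z_c = stats.norm.ppf(1 - alpha / 2)     # bilateral critical value

trials, rejections = 20_000, 0
for _ in range(trials):
    sample = rng.normal(mu0, sigma, n)  # H0 is true by construction
    z_cal = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z_cal) > z_c:                # Z_cal falls in the critical region
        rejections += 1                 # Type I error

print(rejections / trials)              # fluctuates around alpha
```

The observed rejection rate fluctuates around 0.05, which is exactly what the definition P(rejecting H0 | H0 is true) = α states.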
The procedure for defining hypotheses tests includes the following phases:
According to Fávero et al. (2009), most statistical software packages, among them SPSS and Stata, calculate the P-value, which corresponds to the probability associated with the value of the statistic calculated from the sample. The P-value indicates the lowest significance level observed that would lead to the rejection of the null hypothesis. Thus, we reject H0 if P ≤ α.
If we use the P-value instead of the statistic’s critical value, Steps 5 and 6 of the construction of hypothesis tests become:
Hypotheses tests are divided into parametric and nonparametric tests. In this chapter, we will study parametric tests. Nonparametric tests will be studied in the next chapter.
Parametric tests involve population parameters. A parameter is any numerical measure or quantitative characteristic that describes a population. Parameters are fixed values, usually unknown, and are represented by Greek characters, such as the population mean (μ), the population standard deviation (σ), and the population variance (σ2), among others.
When hypotheses are formulated about population parameters, the hypothesis test is called parametric. In nonparametric tests, hypotheses are formulated about qualitative characteristics of the population.
Therefore, parametric methods are applied to quantitative data and require strong assumptions in order to be valid, including:
We will study the main parametric tests, including tests for normality, homogeneity of variance tests, Student’s t-test and its applications, in addition to the analysis of variance (ANOVA) and its extensions. All of them will be solved analytically and also through the statistical software packages SPSS and Stata.
To verify the univariate normality of the data, the most common tests used are Kolmogorov-Smirnov and Shapiro-Wilk. To compare the variance homogeneity between populations, we have Bartlett’s χ2 (1937), Cochran’s C (1947a,b), Hartley’s Fmax (1950), and Levene’s F (1960) tests.
We will describe Student’s t-test for three situations: testing hypotheses about the population mean, comparing two independent means, and comparing two paired means.
ANOVA is an extension of Student’s t-test used to compare the means of more than two populations. In this chapter, one-factor ANOVA, two-factor ANOVA, and its extension to more than two factors will be described.
Among all univariate tests for normality, the most common are Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia.
The Kolmogorov-Smirnov test (K-S) is an adherence (goodness-of-fit) test, that is, it compares the cumulative frequency distribution of a set of sample values (observed values) to a theoretical distribution. The main goal is to test whether the sample values come from a population with the supposed theoretical or expected distribution, in this case, the normal distribution. The statistic is given by the point with the largest difference (in absolute value) between the two distributions.
To use the K-S test, the population mean and standard deviation must be known. For small samples the test loses power, so it should be used with large samples (n ≥ 30).
The K-S test assumes the following hypotheses:
As specified in Fávero et al. (2009), let Fexp(X) be an expected distribution function (normal) of cumulative relative frequencies of variable X, where Fexp(X) ~ N(μ,σ), and Fobs(X) the observed cumulative relative frequency distribution of variable X. The objective is to test whether Fobs(X) = Fexp(X), in contrast with the alternative that Fobs(X) ≠ Fexp(X).
The statistic can be calculated through the following expression:
Dcal = max{|Fexp(Xi) − Fobs(Xi)|; |Fexp(Xi) − Fobs(Xi−1)|}, for i = 1, …, n
where:
The critical values of Kolmogorov-Smirnov statistic (Dc) are shown in Table G in the Appendix. This table provides the critical values of Dc considering that P(Dcal > Dc) = α (for a right-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Dcal statistic must be in the critical region, that is, Dcal > Dc. Otherwise, we do not reject H0.
P-value (the probability associated to the value of Dcal statistic calculated from the sample) can also be seen in Table G. In this case, we reject H0 if P ≤ α.
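In software, the same decision rule can be sketched as follows (Python with SciPy rather than the chapter’s SPSS/Stata; the mean, standard deviation, and sample below are illustrative assumptions, not Example 9.1’s data). Note that `scipy.stats.kstest` expects the parameters of the expected distribution to be supplied, matching the requirement that μ and σ be known:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=42.6, scale=7.1, size=36)   # hypothetical sample, n >= 30

# D_cal: largest absolute distance between the observed and the expected
# cumulative frequency distributions, the expected one being N(42.6, 7.1)
d_cal, p_value = stats.kstest(x, "norm", args=(42.6, 7.1))

alpha = 0.05
reject_h0 = p_value <= alpha   # reject normality only when P <= alpha
print(d_cal, p_value, reject_h0)
```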
The Shapiro-Wilk test (S-W), based on Shapiro and Wilk (1965), can be applied to samples with 4 ≤ n ≤ 2000 observations, and it is an alternative to the Kolmogorov-Smirnov normality test (K-S) in the case of small samples (n < 30).
Analogous to the K-S test, the S-W test for normality assumes the following hypotheses:
The calculation of the Shapiro-Wilk statistic (Wcal) is given by:
Wcal = b² / ∑i (Xi − X̄)², for i = 1, …, n
b = ∑i ai,n ⋅ (X(n−i+1) − X(i)), for i = 1, …, n/2
where:
X(i) are the sample statistics of order i, that is, the i-th ordered observation, so, X(1) ≤ X(2) ≤ … ≤ X(n);
X̄ is the mean of X;
ai, n are constants generated from the means, variances, and covariances of the statistics of order i of a random sample of size n from a normal distribution. Their values can be seen in Table H2 in the Appendix.
Small values of Wcal indicate that the distribution of the variable being studied is not normal. The critical values of the Shapiro-Wilk statistic (Wc) are shown in Table H1 in the Appendix. Unlike most tables, this table provides the critical values of Wc considering that P(Wcal < Wc) = α (for a left-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Wcal statistic must lie in the critical region, that is, Wcal < Wc. Otherwise, we do not reject H0.
P-value (the probability associated to the value of Wcal statistic calculated from the sample) can also be seen in Table H1. In this case, we reject H0 if P ≤ α.
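As a sketch (Python/SciPy rather than the chapter’s software; the sample is illustrative, not Example 9.2’s data), `scipy.stats.shapiro` implements the same W statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=100.0, scale=15.0, size=25)  # hypothetical small sample, n < 30

w_cal, p_value = stats.shapiro(x)   # small W (and small P) suggest non-normality

alpha = 0.05
reject_h0 = p_value <= alpha
print(w_cal, p_value, reject_h0)
```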
This test is based on Shapiro and Francia (1972). According to Sarkadi (1975), the Shapiro-Wilk (S-W) and Shapiro-Francia tests (S-F) have the same format, being different only when it comes to defining the coefficients. Moreover, calculating the S-F test is much simpler and it can be considered a simplified version of the S-W test. Despite its simplicity, it is as robust as the Shapiro-Wilk test, making it a substitute for the S-W.
The Shapiro-Francia test can be applied to samples with 5 ≤ n ≤ 5000 observations, and it is similar to the Shapiro-Wilk test for large samples.
Analogous to the S-W test, the S-F test assumes the following hypotheses:
The calculation of the Shapiro-Francia statistic (Wcal′) is given by:
W′cal = [∑i mi ⋅ X(i)]² / [∑i mi² ⋅ ∑i (Xi − X̄)²], for i = 1, …, n
where:
mi = Φ−1(i/(n + 1))
where Φ−1 corresponds to the inverse of the cumulative standard normal distribution function (mean zero and standard deviation 1). These values can be obtained from Table E in the Appendix.
Small values of Wcal′ indicate that the distribution of the variable being studied is not normal. The critical values of the Shapiro-Francia statistic (Wc′) are shown in Table H1 in the Appendix. Unlike most tables, this table provides the critical values of Wc′ considering that P(Wcal′ < Wc′) = α (for a left-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Wcal′ statistic must lie in the critical region, that is, Wcal′ < Wc′. Otherwise, we do not reject H0.
P-value (the probability associated to Wcal′ statistic calculated from the sample) can also be seen in Table H1. In this case, we reject H0 if P ≤ α.
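Since the S-F statistic is a simple ratio, it can be computed directly from the definitions above. The sketch below (Python/NumPy/SciPy, with an illustrative sample; `shapiro_francia_w` is a hypothetical helper, not a library function) orders the observations, builds the coefficients mi = Φ−1(i/(n + 1)), and evaluates W′:

```python
import numpy as np
from scipy import stats

def shapiro_francia_w(x):
    """W' = [sum(m_i * X_(i))]^2 / (sum(m_i^2) * sum((X_i - mean)^2))."""
    x = np.sort(np.asarray(x, dtype=float))   # order statistics X_(1) <= ... <= X_(n)
    n = x.size
    i = np.arange(1, n + 1)
    m = stats.norm.ppf(i / (n + 1))           # m_i from the inverse standard normal
    numerator = (m @ x) ** 2
    denominator = (m @ m) * np.sum((x - x.mean()) ** 2)
    return numerator / denominator

rng = np.random.default_rng(3)
w_cal = shapiro_francia_w(rng.normal(50, 5, size=40))
print(w_cal)   # values near 1 are consistent with normality
```

By the Cauchy-Schwarz inequality W′ never exceeds 1, and small values indicate departure from normality, mirroring the decision rule stated above.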
The Kolmogorov-Smirnov and Shapiro-Wilk tests for normality can be solved by using IBM SPSS Statistics Software. The Shapiro-Francia test, on the other hand, will be run through the Stata software, as we will see in the next section.
Based on the procedure that will be described, SPSS shows the results of the K-S and the S-W tests for the sample selected. The use of the images in this section has been authorized by the International Business Machines Corporation©.
Let’s consider the data presented in Example 9.1, which are available in the file Production_FarmingEquipment.sav. Let’s open the file and select Analyze → Descriptive Statistics → Explore …, as shown in Fig. 9.4.
From the Explore dialog box, we must select the variable we are interested in on the Dependent List, as shown in Fig. 9.5. Let’s click on Plots … (the Explore: Plots dialog box will open) and select the option Normality plots with tests (Fig. 9.6). Finally, let’s click on Continue and on OK.
The results of the Kolmogorov-Smirnov and Shapiro-Wilk tests for normality for the data in Example 9.1 are shown in Fig. 9.7.
According to Fig. 9.7, the result of the K-S statistic was 0.118, similar to the value calculated in Example 9.1. Since the sample has more than 30 elements, we should only use the K-S test to verify the normality of the data (the S-W test was applied to Example 9.2). Nevertheless, SPSS also makes the result of the S-W statistic available for the sample selected.
As presented in the introduction of this chapter, SPSS calculates the P-value, which corresponds to the lowest significance level observed that would lead to the rejection of the null hypothesis. For the K-S and S-W tests, the P-value corresponds to the lowest value of P from which Dcal > Dc and Wcal < Wc, respectively. As shown in Fig. 9.7, the value of P for the K-S test was 0.200 (this probability can also be obtained from Table G in the Appendix, as shown in Example 9.1). Since P > 0.05, we do not reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that the data distribution is normal. The S-W test also indicates that the data follow a normal distribution.
Applying the same procedure to verify the normality of the data in Example 9.2 (the data are available in the file Production_Aircraft.sav), we get the results shown in Fig. 9.8.
As in Example 9.2, the result of the S-W statistic was 0.978. The K-S test was not applied to this example due to the sample size (n < 30). The P-value of the S-W test is 0.857 (in Example 9.2, we saw that this probability would be between 0.50 and 0.90, closer to 0.90) and, since P > 0.01, the null hypothesis is not rejected, which allows us to conclude that the data distribution in the population follows a normal distribution. We will use this test when estimating regression models in Chapter 13.
Although the K-S test is not recommended for this sample size, the K-S statistic reported by SPSS also indicates that the data distribution follows a normal distribution.
The Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia tests for normality can be solved by using Stata Statistical Software. The Kolmogorov-Smirnov test will be applied to Example 9.1, the Shapiro-Wilk test to Example 9.2, and the Shapiro-Francia test to Example 9.3. The use of the images in this section has been authorized by StataCorp LP©.
The data presented in Example 9.1 are available in the file Production_FarmingEquipment.dta. Let’s open this file and verify that the name of the variable being studied is production.
To run the Kolmogorov-Smirnov test on Stata, we must specify the mean and the standard deviation of the variable of interest in the test syntax. Therefore, the command summarize, or simply sum, must be typed first, followed by the respective variable:
and we get Fig. 9.9, from which we can see that the mean is 42.63889 and the standard deviation is 7.099911.
The Kolmogorov-Smirnov test is given by the following command:
ksmirnov production = normal((production-42.63889)/7.099911)
The result of the test can be seen in Fig. 9.10. We can see that the value of the statistic is similar to the one calculated in Example 9.1 and by SPSS software. Since P > 0.05, we conclude that the data distribution is normal.
The data presented in Example 9.2 are available in the file Production_Aircraft.dta. To run the Shapiro-Wilk test on Stata, the syntax of the command is:
where the term variables⁎ should be replaced with the list of variables being considered. For the data in Example 9.2, we have a single variable called production, so the command to be typed is:
The result of the Shapiro-Wilk test can be seen in Fig. 9.11. Since P > 0.05, we can conclude that the sample comes from a population with a normal distribution.
The data presented in Example 9.3 are available in the file Production_Bicycles.dta. To run the Shapiro-Francia test on Stata, the syntax of the command is:
where the term variables⁎ should be replaced with the list of variables being considered. For the data in Example 9.3, we have a single variable called production, so the command to be typed is:
The result of the Shapiro-Francia test can be seen in Fig. 9.12. We can see that the value is similar to the one calculated in Example 9.3 (W ′ = 0.989). Since P > 0.05, we conclude that the sample comes from a population with a normal distribution.
We will use this test when estimating regression models in Chapter 13.
One of the conditions to apply a parametric test to compare k population means is that the population variances, estimated from k representative samples, be homogeneous or equal. The most common tests to verify variance homogeneity are Bartlett’s χ2 (1937), Cochran’s C (1947a,b), Hartley’s Fmax (1950), and Levene’s F (1960) tests.
In the null hypothesis of variance homogeneity tests, the variances of k populations are homogeneous. In the alternative hypothesis, at least one population variance is different from the others. That is:
H0: σ1² = σ2² = … = σk²
H1: ∃ i, j: σi² ≠ σj² (i, j = 1, …, k)
The original test proposed to verify variance homogeneity among groups is Bartlett’s χ2 test (1937). This test is very sensitive to normality deviations, and Levene’s test is an alternative in this case.
Bartlett’s statistic is calculated from q:
q = (N − k) ⋅ ln(Sp²) − ∑i (ni − 1) ⋅ ln(Si²), for i = 1, …, k
where:
and
Sp² = [∑i (ni − 1) ⋅ Si²] / (N − k), for i = 1, …, k
A correction factor c is applied to q statistic, with the following expression:
c = 1 + {1 / [3 ⋅ (k − 1)]} ⋅ [∑i 1/(ni − 1) − 1/(N − k)], for i = 1, …, k
where Bartlett’s statistic (Bcal) approximately follows a chi-square distribution with k − 1 degrees of freedom:
Bcal = q / c ∼ χ²k−1
From the previous expressions, we can see that the higher the difference between the variances, the higher the value of B. On the other hand, if all the sample variances are equal, its value will be zero. To confirm if the null hypothesis of variance homogeneity will be rejected or not, the value calculated must be compared to the statistic’s critical value (χc2), which is available in Table D in the Appendix.
This table provides the critical values of χc2 considering that P(χcal2 > χc2) = α (for a right-tailed test). Therefore, we reject the null hypothesis if Bcal > χc2. On the other hand, if Bcal ≤ χc2, we do not reject H0.
P-value (the probability associated to χcal2 statistic) can also be obtained from Table D. In this case, we reject H0 if P ≤ α.
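As a sketch (Python/SciPy instead of the chapter’s software; the three groups are simulated with a common σ purely for illustration), `scipy.stats.bartlett` returns the Bcal statistic and its χ² P-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(12, 2, 30)
g3 = rng.normal(11, 2, 30)   # all three groups share sigma = 2

b_cal, p_value = stats.bartlett(g1, g2, g3)   # B_cal ~ chi-square with k - 1 df

alpha = 0.05
reject_h0 = p_value <= alpha   # reject homogeneity of variances when P <= alpha
print(b_cal, p_value, reject_h0)
```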
Cochran’s C test (1947a,b) compares the group with the highest variance in relation to the others. The test demands that the data have a normal distribution.
Cochran’s C statistic is given by:
Ccal = Smax² / ∑i Si², for i = 1, …, k
where:
Smax2 is the highest variance in the sample;
Si2 is the variance in sample i, i = 1, …, k.
According to Expression (9.17), if all the variances are equal, the value of the Ccal statistic is 1/k. The greater the difference between Smax² and the other variances, the closer the value of Ccal gets to 1. To confirm whether the null hypothesis will be rejected or not, the value calculated must be compared to the critical value of Cochran’s statistic (Cc), which is available in Table M in the Appendix.
The values of Cc vary depending on the number of groups (k), the number of degrees of freedom ν = max(ni − 1), and the value of α. Table M provides the critical values of Cc considering that P(Ccal > Cc) = α (for a right-tailed test). Thus, we reject H0 if Ccal > Cc. Otherwise, we do not reject H0.
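SciPy has no built-in Cochran’s C, but the statistic follows directly from its definition. The sketch below (illustrative data; `cochran_c` is a hypothetical helper, not a library function) computes Ccal, which is bounded between 1/k (all variances equal) and 1 (one dominant variance):

```python
import numpy as np

def cochran_c(*groups):
    """C_cal = largest sample variance / sum of the k sample variances."""
    variances = np.array([np.var(g, ddof=1) for g in groups])
    return variances.max() / variances.sum()

rng = np.random.default_rng(8)
groups = [rng.normal(0, 1, 20) for _ in range(4)]   # k = 4 groups, equal sigma
c_cal = cochran_c(*groups)
print(c_cal)   # bounded between 1/k (all equal) and 1 (one dominant variance)
```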
The statistic of Hartley’s Fmax test (1950) is the ratio between the highest group variance (Smax²) and the lowest group variance (Smin²):
Fmax,cal = Smax² / Smin²
The test assumes that the number of observations per group is equal (n1 = n2 = … = nk = n). If all the variances are equal, the value of Fmax will be 1. The greater the difference between Smax² and Smin², the higher the value of Fmax. To confirm whether the null hypothesis of variance homogeneity will be rejected or not, the value calculated must be compared to the critical value of the statistic (Fmax,c), which is available in Table N in the Appendix. The critical values vary depending on the number of groups (k), the number of degrees of freedom ν = n − 1, and the value of α, and this table provides the critical values of Fmax,c considering that P(Fmax,cal > Fmax,c) = α (for a right-tailed test). Therefore, we reject the null hypothesis H0 of variance homogeneity if Fmax,cal > Fmax,c. Otherwise, we do not reject H0.
P-value (the probability associated to Fmax,cal statistic) can also be obtained from Table N in the Appendix. In this case, we reject H0 if P ≤ α.
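Likewise, Hartley’s Fmax reduces to a single ratio. A minimal sketch (illustrative data; `hartley_fmax` is a hypothetical helper) that also enforces the equal-group-size assumption:

```python
import numpy as np

def hartley_fmax(*groups):
    """F_max = largest sample variance / smallest sample variance."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's test assumes n1 = n2 = ... = nk")
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)

rng = np.random.default_rng(11)
groups = [rng.normal(0, 1, 15) for _ in range(3)]   # equal sizes, equal sigma
f_max_cal = hartley_fmax(*groups)
print(f_max_cal)   # equal population variances keep F_max near 1
```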
The advantage of Levene’s F-test in relation to other homogeneity of variance tests is that it is less sensitive to deviations from normality, in addition to being considered a more robust test.
Levene’s statistic is given by Expression (9.19) and approximately follows an F-distribution with ν1 = k − 1 and ν2 = N − k degrees of freedom, for a significance level α:
Fcal = [(N − k)/(k − 1)] ⋅ [∑i ni ⋅ (Z̄i − Z̄)²] / [∑i ∑j (Zij − Z̄i)²] ∼ Fk−1,N−k,α under H0, for i = 1, …, k and j = 1, …, ni
where:
An expansion of Levene’s test can be found in Brown and Forsythe (1974).
From the F-distribution table (Table A in the Appendix), we can determine the critical values of Levene’s statistic (Fc = Fk − 1,N − k,α). Table A provides the critical values of Fc considering that P(Fcal > Fc) = α (right-tailed table). In order for the null hypothesis H0 to be rejected, the value of the statistic must be in the critical region, that is, Fcal > Fc. If Fcal ≤ Fc, we do not reject H0.
P-value (the probability associated to Fcal statistic) can also be obtained from Table A. In this case, we reject H0 if P ≤ α.
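As a sketch (Python/SciPy; the groups are simulated, with the third group’s variance deliberately inflated), `scipy.stats.levene` with `center="mean"` reproduces the classic 1960 statistic, while its default, `center="median"`, is the Brown and Forsythe (1974) variant mentioned above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
g1 = rng.normal(10, 1, 25)
g2 = rng.normal(10, 1, 25)
g3 = rng.normal(10, 4, 25)   # deliberately inflated variance

f_cal, p_value = stats.levene(g1, g2, g3, center="mean")

alpha = 0.05
reject_h0 = p_value <= alpha
print(f_cal, p_value, reject_h0)   # expect rejection: the variances clearly differ
```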
The use of the images in this section has been authorized by the International Business Machines Corporation©. To test the variance homogeneity between the groups, SPSS uses Levene’s test. The data presented in Example 9.4 are available in the file CustomerServices_Store.sav. In order to elaborate the test, we must click on Analyze → Descriptive Statistics → Explore …, as shown in Fig. 9.13.
Let’s include the variable Customer_services in the list of dependent variables (Dependent List) and the variable Store in the factor list (Factor List), as shown in Fig. 9.14.
Next, we must click on Plots … and select the option Untransformed in Spread vs Level with Levene Test, as shown in Fig. 9.15.
Finally, let’s click on Continue and on OK. The result of Levene’s test can also be obtained through the ANOVA test, by clicking on Analyze → Compare Means → One-Way ANOVA …. In Options …, we must select the option Homogeneity of variance test (Fig. 9.16).
The value of Levene’s statistic is 8.427, exactly the same as the one calculated previously. Since the significance level observed is 0.001, a value lower than 0.05, the test shows the rejection of the null hypothesis, which allows us to conclude, with a 95% confidence level, that the population variances are not homogeneous.
The use of the images in this section has been authorized by StataCorp LP©.
Levene’s test for equality of variances is calculated on Stata by using the command robvar (robust test for equality of variances), which has the following syntax:
in which the term variable⁎ should be replaced with the quantitative variable studied and the term groups⁎ with the categorical variable that represents the groups.
Let’s open the file CustomerServices_Store.dta that contains the data of Example 9.7. The three groups are represented by the variable store and the number of customers served by the variable services. Therefore, the command to be typed is:
The result of the test can be seen in Fig. 9.17. We can verify that the value of the statistic (8.427) is similar to the one calculated in Example 9.7 and to the one generated on SPSS, as well as the calculation of the probability associated to the statistic (0.001). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that the variances are not homogeneous.
The main goal is to test if a population mean assumes a certain value or not.
This test is applied when a random sample of size n is obtained from a population with a normal distribution, whose mean (μ) is unknown and whose standard deviation (σ) is known. If the distribution of the population is not known, it is necessary to work with large samples (n > 30), because the central limit theorem guarantees that, as the sample size grows, the sample distribution of its mean gets closer and closer to a normal distribution.
For a bilateral test, the hypotheses are:
The test statistic here refers to the sample mean (X̄). In order for the sample mean to be compared to the tabulated value, it must be standardized, so:
Zcal = (X̄ − μ0) / σX̄ ∼ N(0, 1), where σX̄ = σ/√n
The critical values of the zc statistic are shown in Table E in the Appendix. This table provides the critical values of zc considering that P(Zcal > zc) = α (for a right-tailed test). For a bilateral test, we must consider P(Zcal > zc) = α/2, since P(Zcal < − zc) + P(Zcal > zc) = α. The null hypothesis H0 of a bilateral test is rejected if the value of the Zcal statistic lies in the critical region, that is, if Zcal < − zc or Zcal > zc. Otherwise, we do not reject H0.
The unilateral probabilities associated with the Zcal statistic (P1) can also be obtained from Table E. For a unilateral test, we consider P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
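A worked numerical sketch of the bilateral z-test (Python/SciPy; X̄, μ0, σ, and n are illustrative values, not taken from the chapter):

```python
import numpy as np
from scipy import stats

x_bar, mu0, sigma, n = 52.0, 50.0, 6.0, 36   # illustrative values

z_cal = (x_bar - mu0) / (sigma / np.sqrt(n))      # standardized sample mean
z_c = stats.norm.ppf(1 - 0.05 / 2)                # bilateral critical value, alpha = 0.05
p_value = 2 * (1 - stats.norm.cdf(abs(z_cal)))    # doubled unilateral probability

reject_h0 = abs(z_cal) > z_c                      # equivalently, p_value <= 0.05
print(z_cal, p_value, reject_h0)   # z_cal = 2.0, P close to 0.0455: reject H0
```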
Student’s t-test for one sample is applied when we do not know the population standard deviation (σ); its value is estimated from the sample standard deviation (S). However, when S replaces σ in Expression (9.20), the distribution of the statistic is no longer normal; it becomes a Student’s t-distribution with n − 1 degrees of freedom.
Analogous to the z test, Student’s t-test for one sample assumes the following hypotheses for a bilateral test:
And the calculation of the statistic becomes:
Tcal = (X̄ − μ0) / (S/√n) ∼ tn−1
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.18.
Therefore, for a bilateral test, the null hypothesis is rejected if Tcal < − tc or Tcal > tc. If − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated to Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
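A sketch with SciPy (the data are illustrative, not Example 9.9’s): `scipy.stats.ttest_1samp` returns Tcal and the bilateral P-value, which can then be halved for a unilateral test:

```python
import numpy as np
from scipy import stats

x = np.array([17.2, 18.5, 16.9, 17.8, 18.1, 17.4, 16.8, 17.9, 18.3, 17.1])
mu0 = 18.0   # hypothesized population mean (illustrative)

t_cal, p_bilateral = stats.ttest_1samp(x, popmean=mu0)
# left-tailed P-value (H1: mu < mu0), derived from the bilateral one
p_left = p_bilateral / 2 if t_cal < 0 else 1 - p_bilateral / 2

alpha = 0.05
reject_h0 = p_bilateral <= alpha
print(t_cal, p_bilateral, reject_h0)
```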
The use of the images in this section has been authorized by the International Business Machines Corporation©.
If we wish to test the mean of a single sample, SPSS makes Student’s t-test available. The data in Example 9.9 are available in the file T_test_One_Sample.sav. The procedure to apply the test from Example 9.9 will be described. Initially, let’s select Analyze → Compare Means → One-Sample T Test …, as shown in Fig. 9.19.
We must select the variable Time and specify the value 18 that will be tested in Test Value, as shown in Fig. 9.20.
Now, we must click on Options … to define the desired confidence level (Fig. 9.21).
Finally, let’s click on Continue and on OK. The results of the test are shown in Fig. 9.22.
This figure shows the result of the t-test (similar to the value calculated in Example 9.9) and the associated probability (P-value) for a bilateral test. For a unilateral test, the associated probability is 0.0195 (we saw in Example 9.9 that this probability would be between 0.01 and 0.025). Since 0.0195 > 0.01, we do not reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there was no improvement in the average processing time.
The use of the images in this section has been authorized by StataCorp LP©.
Student’s t-test is run on Stata by using the command ttest. For one population mean, the test syntax is:
where the term variable⁎ should be replaced with the name of the variable considered in the analysis and # with the value of the population mean to be tested.
The data in Example 9.9 are available in the file T_test_One_Sample.dta. In this case, the variable being analyzed is called time and the goal is to verify if the average processing time is still 18 min, so, the command to be typed is:
The result of the test can be seen in Fig. 9.23. We can see that the calculated value of the statistic (− 2.180) is similar to the one calculated in Example 9.9 and also generated on SPSS, as well as the associated probability for a left-tailed test (0.0196). Since P > 0.01, we do not reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there was no improvement in the processing time.
The t-test for two independent samples is applied to compare the means of two independent random samples (X1i, i = 1, …, n1; X2j, j = 1, …, n2); under the null hypothesis, both come from the same population. In this test, the population variances are unknown.
For a bilateral test, the null hypothesis of the test states that the population means are the same. If the population means are different, the null hypothesis is rejected, so:
The calculation of the T statistic depends on the comparison of the population variances between the groups.
Considering that the population variances are different, the calculation of the T statistic is given by:
Tcal = (X̄1 − X̄2) / √(S1²/n1 + S2²/n2)
with the following degrees of freedom:
ν = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)]
When the population variances are homogeneous, the T statistic is calculated as:
Tcal = (X̄1 − X̄2) / [Sp ⋅ √(1/n1 + 1/n2)]
where:
Sp = √{[(n1 − 1) ⋅ S1² + (n2 − 1) ⋅ S2²] / (n1 + n2 − 2)}
and Tcal follows Student’s t-distribution with ν = n1 + n2 − 2 degrees of freedom.
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.24.
Therefore, for a bilateral test, if the value of the statistic lies in the critical region, that is, if Tcal < − tc or Tcal > tc, the test allows us to reject the null hypothesis. On the other hand, if − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated to Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
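Both forms of the statistic are available in SciPy’s `ttest_ind`: `equal_var=True` uses the pooled Sp with ν = n1 + n2 − 2, while `equal_var=False` uses the separate-variance (Welch) form with the corrected ν above. A sketch on simulated, illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
x1 = rng.normal(100, 10, 30)
x2 = rng.normal(115, 10, 30)   # shifted mean, same spread (illustrative)

t_pooled, p_pooled = stats.ttest_ind(x1, x2, equal_var=True)   # pooled variance
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)    # Welch correction

alpha = 0.05
print(t_pooled, p_pooled, t_welch, p_welch)   # both bilateral P-values are tiny here
```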
The data in Example 9.10 are available in the file T_test_Two_Independent_Samples.sav. The procedure for solving Student’s t-test to compare two population means from two independent random samples on SPSS is described. The use of the images in this section has been authorized by the International Business Machines Corporation©.
We must click on Analyze → Compare Means → Independent-Samples T Test …, as shown in Fig. 9.26.
Let’s include the variable Time in Test Variable(s) and the variable Supplier in Grouping Variable. Next, let’s click on Define Groups … to define the groups (categories) of the variable Supplier, as shown in Fig. 9.27.
If the confidence level desired by the researcher is different from 95%, the button Options … must be selected to change it. Finally, let’s click on OK. The results of the test are shown in Fig. 9.28.
The value of the t statistic for the test is − 9.708 and the associated bilateral probability is 0.000 (P < 0.05), which leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that the population means are different. Note that Fig. 9.28 also shows the result of Levene’s test. Since the observed significance level is 0.694, which is greater than 0.05, we can also conclude, with a 95% confidence level, that the variances are homogeneous.
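Outside SPSS, the same homogeneity check can be sketched with SciPy's implementation of Levene's test (the two groups below are hypothetical, not the Example 9.10 data; `center='mean'` gives the classical mean-based version of the test):

```python
import numpy as np
from scipy import stats

# Hypothetical groups (illustrative only)
group_a = np.array([22.0, 24.5, 23.1, 25.2, 24.8, 23.9])
group_b = np.array([27.3, 26.1, 28.4, 27.9, 26.7, 28.0])

# Levene's test: H0 states that the group variances are homogeneous
w, p = stats.levene(group_a, group_b, center='mean')
print(w, p)  # if p > alpha, do not reject homogeneity of variances
```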
The use of the images in this section has been authorized by StataCorp LP©.
The t-test to compare the means of two independent groups on Stata is elaborated by using the following syntax:
where the term variable⁎ must be replaced with the quantitative variable being analyzed, and the term groups⁎ with the categorical variable that defines the groups.
The data in Example 9.10 are available in the file T_test_Two_Independent_Samples.dta. The variable supplier shows the groups of suppliers. The values for each group of suppliers are specified in the variable time. Thus, we must type the following command:
The result of the test can be seen in Fig. 9.29. We can see that the calculated value of the statistic (− 9.708) matches the one calculated in Example 9.10 and also generated on SPSS, as does the associated probability for a bilateral test (0.000). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that the population means are different.
This test is applied to check whether the means of two paired or related samples, obtained from the same population (before and after) and normally distributed, are significantly different. Besides the normality of the data of each sample, the test requires homogeneity of the variances between the groups.
Unlike the t-test for two independent samples, here we must first calculate the difference between each pair of values in position i ($d_i = X_{before,i} - X_{after,i}$, i = 1, …, n) and then test the null hypothesis that the mean of the differences in the population is zero.
For a bilateral test, we have:
The Tcal statistic for the test is given by:
$$T_{cal} = \frac{\bar{d} - \mu_d}{S_d / \sqrt{n}} \sim t_{\nu = n-1}$$
where:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$
and
$$S_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$$
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.30.
Therefore, for a bilateral test, the null hypothesis is rejected if Tcal < − tc or Tcal > tc. If − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated with the Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, in both cases, we reject H0 if P ≤ α.
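The paired procedure above can be sketched with hypothetical before/after measurements (not the Example 9.11 file), computing the differences and the Tcal statistic by hand and checking the result against SciPy's paired t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after times (illustrative only)
before = np.array([32.1, 29.8, 31.5, 30.2, 33.0, 28.9, 31.1, 30.6])
after  = np.array([29.5, 28.1, 30.0, 28.8, 30.9, 27.6, 29.4, 29.0])

d = before - after                     # differences d_i
n = len(d)
d_bar = d.mean()                       # mean of the differences
s_d = d.std(ddof=1)                    # S_d, sample std. dev. of differences
t_manual = d_bar / (s_d / np.sqrt(n))  # Tcal, with mu_d = 0 under H0

# SciPy's paired test; its p-value is bilateral (P = 2*P1)
t_scipy, p_bilateral = stats.ttest_rel(before, after)
print(t_manual, t_scipy, p_bilateral)
```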
First, we must test the normality of the data in each sample, as well as the variance homogeneity between the groups. Using the same procedures described in Sections 9.3.3 and 9.4.5 (the data must be placed in a table the same way as in Section 9.4.5), we obtain Figs. 9.32 and 9.33.
Based on Fig. 9.32, we conclude that there is normality of the data for each sample. From Fig. 9.33, we can conclude that the variances between the samples are homogeneous.
The use of the images in this section has been authorized by the International Business Machines Corporation©. To solve Student’s t-test for two paired samples on SPSS, we must open the file T_test_Two_Paired_Samples.sav. Then, we have to click on Analyze → Compare Means → Paired-Samples T Test …, as shown in Fig. 9.34.
We must select the variable Before and move it to Variable1 and the variable After to Variable2, as shown in Fig. 9.35.
If the desired confidence level is different from 95%, we must click on Options … to change it. Finally, let’s click on OK. The results of the test are shown in Fig. 9.36.
The value of the t statistic is 4.385 and the significance level observed for a bilateral test is 0.002, which is less than 0.05. This leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that there is a significant difference between the times spent by the operators before and after the training course.
The t-test to compare the means of two paired groups will be solved on Stata for the data in Example 9.11. The use of the images in this section has been authorized by StataCorp LP©.
Therefore, let’s open the file T_test_Two_Paired_Samples.dta. The paired variables are called before and after. In this case, we must type the following command:
The result of the test can be seen in Fig. 9.37. We can see that the calculated value of the statistic (4.385) is similar to the one calculated in Example 9.11 and on SPSS, as well as the probability associated to the statistic for a bilateral test (0.0018). Since P < 0.05, we reject the null hypothesis that the times spent by the operators before and after the training course are the same, with a 95% confidence level.
ANOVA is a test used to compare the means of three or more populations through the analysis of sample variances. The test is based on a sample obtained from each population and aims to determine whether the differences between the sample means suggest significant differences between the population means, or whether such differences are only a result of sampling variability.
ANOVA’s assumptions are:
One-way ANOVA is an extension of Student’s t-test for two population means, allowing the researcher to compare three or more population means.
The null hypothesis of the test states that the population means are the same. If there is at least one group with a mean that is different from the others, the null hypothesis is rejected.
As stated in Fávero et al. (2009), the one-way ANOVA allows the researcher to verify the effect of a qualitative explanatory variable (factor) on a quantitative dependent variable. Each group includes the observations of the dependent variable in one category of the factor.
Assuming that independent samples of sizes n1, n2, …, nk are obtained from k populations (k ≥ 3) and that the means of these populations can be represented by μ1, μ2, …, μk, the analysis of variance tests the following hypotheses:
$$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$$
$$H_1: \exists\,(i, j)\ \mu_i \neq \mu_j,\ i \neq j$$
According to Maroco (2014), in general, the observations for this type of problem can be represented according to Table 9.2.
Table 9.2
Samples or Groups

| 1 | 2 | … | k |
|---|---|---|---|
| $Y_{11}$ | $Y_{12}$ | … | $Y_{1k}$ |
| $Y_{21}$ | $Y_{22}$ | … | $Y_{2k}$ |
| … | … | … | … |
| $Y_{n_1 1}$ | $Y_{n_2 2}$ | … | $Y_{n_k k}$ |
where $Y_{ij}$ represents observation i of sample or group j (i = 1, …, nj; j = 1, …, k) and $n_j$ is the dimension of sample or group j. The dimension of the global sample is $N = \sum_{j=1}^{k} n_j$. Pestana and Gageiro (2008) present the following model:
$$Y_{ij} = \mu_i + \varepsilon_{ij}$$
$$Y_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij}$$
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where:
Therefore, ANOVA assumes that each group comes from a population with a normal distribution, mean μi, and a homogeneous variance, that is, Yij ~ N(μi,σ), resulting in the hypothesis that the errors (residuals) have a normal distribution with a mean equal to zero and a constant variance, that is, ɛij ~ N(0,σ), besides being independent (Fávero et al., 2009).
The technique’s hypotheses are tested through the calculation of the group variances, and that is where the name ANOVA comes from. The technique involves the calculation of the variations between the groups $(\bar{Y}_i - \bar{Y})$ and within each group $(Y_{ij} - \bar{Y}_i)$. The residual sum of squares within groups (RSS) is calculated by:
$$RSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
The residual sum of squares between groups, or the sum of squares of the factor (SSF), is given by:
$$SSF = \sum_{i=1}^{k} n_i \cdot (\bar{Y}_i - \bar{Y})^2$$
Therefore, the total sum is:
$$TSS = RSS + SSF = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$$
According to Fávero et al. (2009) and Maroco (2014), the ANOVA statistic is given by the division between the variance of the factor (SSF divided by k − 1 degrees of freedom) and the variance of the residuals (RSS divided by N − k degrees of freedom):
$$F_{cal} = \frac{SSF / (k-1)}{RSS / (N-k)} = \frac{MSF}{MSR}$$
where:
Table 9.3 summarizes the calculations of the one-way ANOVA.
Table 9.3
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Between the groups | $SSF = \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2$ | $k - 1$ | $MSF = \dfrac{SSF}{k-1}$ | $F = \dfrac{MSF}{MSR}$ |
| Within the groups | $RSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$ | $N - k$ | $MSR = \dfrac{RSS}{N-k}$ | |
| Total | $TSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$ | $N - 1$ | | |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.
The value of F can be zero or positive, but never negative. Accordingly, the F-distribution used by ANOVA is asymmetrical, skewed to the right.
The calculated value (Fcal) must be compared to the value in the F-distribution table (Table A in the Appendix). This table provides the critical values of Fc = Fk − 1,N − k,α where P(Fcal > Fc) = α (right-tailed test). Therefore, one-way ANOVA’s null hypothesis is rejected if Fcal > Fc. Otherwise, if Fcal ≤ Fc, we do not reject H0.
We will use these concepts when we study the estimation of regression models in Chapter 13.
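The calculations in Table 9.3 can be sketched with three hypothetical groups (not the Example 9.12 data), computing SSF, RSS, and Fcal from the formulas above and checking the decomposition TSS = SSF + RSS, with SciPy's one-way ANOVA as a cross-check:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (k = 3), illustrative only
groups = [np.array([5.1, 4.8, 5.5, 5.0, 4.9]),
          np.array([5.6, 5.9, 5.7, 6.1, 5.8]),
          np.array([4.6, 4.9, 4.4, 4.7, 4.5])]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups (factor) and within groups (residual)
ssf = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
rss = sum(((g - g.mean()) ** 2).sum() for g in groups)
tss = ((np.concatenate(groups) - grand_mean) ** 2).sum()  # TSS = SSF + RSS

msf = ssf / (k - 1)   # mean square of the factor
msr = rss / (N - k)   # mean square of the residuals
f_manual = msf / msr  # Fcal with (k - 1, N - k) degrees of freedom

f_scipy, p = stats.f_oneway(*groups)  # SciPy reproduces Fcal and its P-value
print(f_manual, f_scipy, p)
```

As with the t-tests, H0 is rejected when Fcal exceeds the tabulated critical value, or equivalently when the associated P-value is at most α.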
The use of the images in this section has been authorized by the International Business Machines Corporation©. The data in Example 9.12 are available in the file One_Way_ANOVA.sav. First of all, let’s click on Analyze → Compare Means → One-Way ANOVA …, as shown in Fig. 9.40.
Let's include the variable Sucrose in the list of dependent variables (Dependent List) and the variable Supplier in the box Factor, according to Fig. 9.41.
After that, we must click on Options … and select the option Homogeneity of variance test (Levene’s test for variance homogeneity). Finally, let’s click on Continue and on OK to obtain the result of Levene’s test, besides the ANOVA table. Since ANOVA does not make the normality test available, it must be obtained by applying the same procedure described in Section 9.3.3.
According to Fig. 9.42, we can verify that each one of the groups has data that follow a normal distribution. Moreover, through Fig. 9.43, we can conclude that the variances between the groups are homogeneous.
From the ANOVA table (Fig. 9.44), we can see that the value of the F-test is 4.676 and the respective P-value is 0.017 (we saw in Example 9.12 that this value would be between 0.01 and 0.025), value less than 0.05. This leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that at least one of the population means is different from the others (there are differences in the percentage of sucrose in the honey of the three suppliers).
The use of the images in this section has been authorized by StataCorp LP©.
The one-way ANOVA on Stata is generated from the following syntax:
in which the term variabley⁎ should be replaced with the quantitative dependent variable and the term factor⁎ with the qualitative explanatory variable.
The data in Example 9.12 are available in the file One_Way_Anova.dta. The quantitative dependent variable is called sucrose and the factor is represented by the variable supplier. Thus, we must type the following command:
The result of the test can be seen in Fig. 9.45. We can see that the calculated value of the statistic (4.68) is similar to the one calculated in Example 9.12 and also generated on SPSS, as well as the probability associated to the value of the statistic (0.017). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that at least one of the population means is different from the others.