This chapter discusses the role of hypothesis tests in statistical inference. It presents the concept and goals of hypothesis tests, as well as the procedures for constructing them. Hypothesis tests are classified as parametric or nonparametric, and this chapter focuses on parametric tests (nonparametric tests will be discussed in the following chapter). We define the concepts and assumptions of parametric tests, along with their respective advantages and disadvantages. We will study the main types of parametric hypothesis tests and their inherent assumptions, including tests for normality, homogeneity of variance tests, Student’s t-test and its applications, and ANOVA and its extensions. This makes it possible to know when to use each parametric test. Each test is solved analytically and also through IBM SPSS Statistics Software and Stata Statistical Software, and the results obtained are then interpreted.
Hypothesis tests; Parametric tests; Normality tests; Homogeneity of variance tests; Student’s t-test; ANOVA
We must conduct research and then accept the results. If they don’t stand up to experimentation, Buddha’s own words must be rejected.
Tenzin Gyatso, 14th Dalai Lama
As discussed previously, one of the problems to be solved by statistical inference is hypothesis testing. A statistical hypothesis is an assumption about a certain population parameter, such as the mean, the standard deviation, or the correlation coefficient. A hypothesis test is a procedure for deciding whether a certain hypothesis is true or false. In order for a statistical hypothesis to be validated or rejected with certainty, it would be necessary to examine the entire population, which in practice is not feasible. As an alternative, we draw a random sample from the population we are interested in. Since the decision is made based on the sample, errors may occur (rejecting a hypothesis when it is true, or failing to reject a hypothesis when it is false), as we will study later on.
The procedures and concepts necessary to construct a hypothesis test are presented next. Let X be a variable associated with a population and θ a certain parameter of this population. We must define the hypothesis to be tested about parameter θ, which is called the null hypothesis:
H0:θ=θ0
Let’s also define the alternative hypothesis (H1), in case H0 is rejected, which can be characterized as follows:
H1:θ≠θ0
and the test is called a bilateral test (or two-tailed test).
The significance level of a test (α) represents the probability of rejecting the null hypothesis when it is true (one of the two errors that may occur, as we will see later). The critical region (CR), or rejection region (RR), of a bilateral test is represented by two tails of the same size at the left and right extremities of the distribution curve, each corresponding to half of the significance level α, as shown in Fig. 9.1.
Another way to define the alternative hypothesis (H1) would be:
H1:θ<θ0
and the test is called a unilateral test to the left (or left-tailed test).
In this case, the critical region is in the left tail of the distribution and corresponds to significance level α, as shown in Fig. 9.2.
Or the alternative hypothesis could be:
H1:θ>θ0
and the test is called a unilateral test to the right (or right-tailed test). In this case, the critical region is in the right tail of the distribution and corresponds to significance level α, as shown in Fig. 9.3.
Thus, if the main objective is to check whether a parameter is significantly higher or lower than a certain value, we use a unilateral test. On the other hand, if the objective is to check whether a parameter is simply different from a certain value, we use a bilateral test.
After defining the null hypothesis to be tested, we use a random sample collected from the population to decide whether or not to reject it. Since the decision is made based on the sample, two types of errors may occur:
Type I error: rejecting the null hypothesis when it is true. The probability of this type of error is represented by α:
P(type I error) = P(rejecting H0 | H0 is true) = α
Type II error: not rejecting the null hypothesis when it is false. The probability of this type of error is represented by β:
P(type II error) = P(not rejecting H0 | H0 is false) = β
Table 9.1 shows the types of errors that may happen in a hypothesis test.
Table 9.1
Decision | H0 Is True | H0 Is False |
---|---|---|
Not rejecting H0 | Correct decision (1 − α) | Type II error (β) |
Rejecting H0 | Type I error (α) | Correct decision (1 − β) |
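The meaning of α can be checked numerically. As a hypothetical sketch in Python (not part of the chapter’s SPSS/Stata material; μ0, σ, n, and the random seed are arbitrary choices), repeatedly sampling from a population in which H0 is true and applying a bilateral z-test at level α should reject H0 — a Type I error — in roughly a proportion α of the trials:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
mu0, sigma, n = 50.0, 10.0, 36          # illustrative values only
z_c = stats.norm.ppf(1 - alpha / 2)     # bilateral critical value

trials, rejections = 20_000, 0
for _ in range(trials):
    sample = rng.normal(mu0, sigma, n)  # H0 is true by construction
    z_cal = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z_cal) > z_c:                # Z_cal falls in the critical region
        rejections += 1                 # Type I error

print(rejections / trials)              # fluctuates around alpha
```

The observed rejection rate fluctuates around 0.05, which is exactly what the definition P(rejecting H0 | H0 is true) = α states.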
The procedure for defining hypotheses tests includes the following phases:
According to Fávero et al. (2009), most statistical software packages, among them SPSS and Stata, calculate the P-value, which corresponds to the probability associated with the value of the statistic calculated from the sample. The P-value indicates the lowest significance level observed that would lead to the rejection of the null hypothesis. Thus, we reject H0 if P ≤ α.
If we use the P-value instead of the statistic’s critical value, Steps 5 and 6 of the construction of hypothesis tests become:
Hypotheses tests are divided into parametric and nonparametric tests. In this chapter, we will study parametric tests. Nonparametric tests will be studied in the next chapter.
Parametric tests involve population parameters. A parameter is any numerical measure or quantitative characteristic that describes a population. Parameters are fixed values, usually unknown, and are represented by Greek characters, such as the population mean (μ), the population standard deviation (σ), and the population variance (σ2), among others.
When hypotheses are formulated about population parameters, the hypothesis test is called parametric. In nonparametric tests, hypotheses are formulated about qualitative characteristics of the population.
Therefore, parametric methods are applied to quantitative data and require strong assumptions in order to be valid, including:
We will study the main parametric tests, including tests for normality, homogeneity of variance tests, Student’s t-test and its applications, in addition to the analysis of variance (ANOVA) and its extensions. All of them will be solved analytically and also through the statistical software packages SPSS and Stata.
To verify the univariate normality of the data, the most common tests used are Kolmogorov-Smirnov and Shapiro-Wilk. To compare the variance homogeneity between populations, we have Bartlett’s χ2 (1937), Cochran’s C (1947a,b), Hartley’s Fmax (1950), and Levene’s F (1960) tests.
We will describe Student’s t-test for three situations: testing hypotheses about the population mean, comparing two independent means, and comparing two paired means.
ANOVA is an extension of Student’s t-test used to compare the means of more than two populations. In this chapter, one-factor ANOVA, two-factor ANOVA, and its extension to more than two factors will be described.
Among all univariate tests for normality, the most common are Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia.
The Kolmogorov-Smirnov test (K-S) is an adherence (goodness-of-fit) test, that is, it compares the cumulative frequency distribution of a set of sample values (observed values) to a theoretical distribution. The main goal is to test whether the sample values come from a population with the supposed theoretical or expected distribution, in this case, the normal distribution. The statistic is given by the point with the largest difference (in absolute value) between the two distributions.
To use the K-S test, the population mean and standard deviation must be known. For small samples the test loses power, so it should be used with large samples (n ≥ 30).
The K-S test assumes the following hypotheses:
As specified in Fávero et al. (2009), let Fexp(X) be an expected distribution function (normal) of cumulative relative frequencies of variable X, where Fexp(X) ~ N(μ,σ), and Fobs(X) the observed cumulative relative frequency distribution of variable X. The objective is to test whether Fobs(X) = Fexp(X), in contrast with the alternative that Fobs(X) ≠ Fexp(X).
The statistic can be calculated through the following expression:
Dcal = max{|Fexp(Xi) − Fobs(Xi)|; |Fexp(Xi) − Fobs(Xi−1)|}, for i = 1, …, n
where:
The critical values of Kolmogorov-Smirnov statistic (Dc) are shown in Table G in the Appendix. This table provides the critical values of Dc considering that P(Dcal > Dc) = α (for a right-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Dcal statistic must be in the critical region, that is, Dcal > Dc. Otherwise, we do not reject H0.
P-value (the probability associated to the value of Dcal statistic calculated from the sample) can also be seen in Table G. In this case, we reject H0 if P ≤ α.
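In software, the same decision rule can be sketched as follows (Python with SciPy rather than the chapter’s SPSS/Stata; the mean, standard deviation, and sample below are illustrative assumptions, not Example 9.1’s data). Note that `scipy.stats.kstest` expects the parameters of the expected distribution to be supplied, matching the requirement that μ and σ be known:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=42.6, scale=7.1, size=36)   # hypothetical sample, n >= 30

# D_cal: largest absolute distance between the observed and the expected
# cumulative frequency distributions, the expected one being N(42.6, 7.1)
d_cal, p_value = stats.kstest(x, "norm", args=(42.6, 7.1))

alpha = 0.05
reject_h0 = p_value <= alpha   # reject normality only when P <= alpha
print(d_cal, p_value, reject_h0)
```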
The Shapiro-Wilk test (S-W), based on Shapiro and Wilk (1965), can be applied to samples with 4 ≤ n ≤ 2000 observations, and it is an alternative to the Kolmogorov-Smirnov normality test (K-S) in the case of small samples (n < 30).
Analogous to the K-S test, the S-W test for normality assumes the following hypotheses:
The calculation of the Shapiro-Wilk statistic (Wcal) is given by:
Wcal = b² / ∑i (Xi − X̄)², for i = 1, …, n
b = ∑i ai,n ⋅ (X(n−i+1) − X(i)), for i = 1, …, n/2
where:
X(i) are the sample statistics of order i, that is, the i-th ordered observation, so, X(1) ≤ X(2) ≤ … ≤ X(n);
X̄ is the mean of X;
ai, n are constants generated from the means, variances, and covariances of the statistics of order i of a random sample of size n from a normal distribution. Their values can be seen in Table H2 in the Appendix.
Small values of Wcal indicate that the distribution of the variable being studied is not normal. The critical values of the Shapiro-Wilk statistic (Wc) are shown in Table H1 in the Appendix. Unlike most tables, this table provides the critical values of Wc considering that P(Wcal < Wc) = α (for a left-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Wcal statistic must lie in the critical region, that is, Wcal < Wc. Otherwise, we do not reject H0.
P-value (the probability associated to the value of Wcal statistic calculated from the sample) can also be seen in Table H1. In this case, we reject H0 if P ≤ α.
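As a sketch (Python/SciPy rather than the chapter’s software; the sample is illustrative, not Example 9.2’s data), `scipy.stats.shapiro` implements the same W statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=100.0, scale=15.0, size=25)  # hypothetical small sample, n < 30

w_cal, p_value = stats.shapiro(x)   # small W (and small P) suggest non-normality

alpha = 0.05
reject_h0 = p_value <= alpha
print(w_cal, p_value, reject_h0)
```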
This test is based on Shapiro and Francia (1972). According to Sarkadi (1975), the Shapiro-Wilk (S-W) and Shapiro-Francia tests (S-F) have the same format, being different only when it comes to defining the coefficients. Moreover, calculating the S-F test is much simpler and it can be considered a simplified version of the S-W test. Despite its simplicity, it is as robust as the Shapiro-Wilk test, making it a substitute for the S-W.
The Shapiro-Francia test can be applied to samples with 5 ≤ n ≤ 5000 observations, and it is similar to the Shapiro-Wilk test for large samples.
Analogous to the S-W test, the S-F test assumes the following hypotheses:
The calculation of the Shapiro-Francia statistic (Wcal′) is given by:
W′cal = [∑i mi ⋅ X(i)]² / [∑i mi² ⋅ ∑i (Xi − X̄)²], for i = 1, …, n
where:
mi = Φ−1(i/(n + 1))
where Φ−1 corresponds to the inverse of the cumulative standard normal distribution function (mean zero and standard deviation 1). These values can be obtained from Table E in the Appendix.
Small values of Wcal′ indicate that the distribution of the variable being studied is not normal. The critical values of the Shapiro-Francia statistic (Wc′) are shown in Table H1 in the Appendix. Unlike most tables, this table provides the critical values of Wc′ considering that P(Wcal′ < Wc′) = α (for a left-tailed test). In order for the null hypothesis H0 to be rejected, the value of the Wcal′ statistic must lie in the critical region, that is, Wcal′ < Wc′. Otherwise, we do not reject H0.
P-value (the probability associated to Wcal′ statistic calculated from the sample) can also be seen in Table H1. In this case, we reject H0 if P ≤ α.
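Since the S-F statistic is a simple ratio, it can be computed directly from the definitions above. The sketch below (Python/NumPy/SciPy, with an illustrative sample; `shapiro_francia_w` is a hypothetical helper, not a library function) orders the observations, builds the coefficients mi = Φ−1(i/(n + 1)), and evaluates W′:

```python
import numpy as np
from scipy import stats

def shapiro_francia_w(x):
    """W' = [sum(m_i * X_(i))]^2 / (sum(m_i^2) * sum((X_i - mean)^2))."""
    x = np.sort(np.asarray(x, dtype=float))   # order statistics X_(1) <= ... <= X_(n)
    n = x.size
    i = np.arange(1, n + 1)
    m = stats.norm.ppf(i / (n + 1))           # m_i from the inverse standard normal
    numerator = (m @ x) ** 2
    denominator = (m @ m) * np.sum((x - x.mean()) ** 2)
    return numerator / denominator

rng = np.random.default_rng(3)
w_cal = shapiro_francia_w(rng.normal(50, 5, size=40))
print(w_cal)   # values near 1 are consistent with normality
```

By the Cauchy-Schwarz inequality W′ never exceeds 1, and small values indicate departure from normality, mirroring the decision rule stated above.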
The Kolmogorov-Smirnov and Shapiro-Wilk tests for normality can be solved by using IBM SPSS Statistics Software. The Shapiro-Francia test, on the other hand, will be run through the Stata software, as we will see in the next section.
Based on the procedure that will be described, SPSS shows the results of the K-S and the S-W tests for the sample selected. The use of the images in this section has been authorized by the International Business Machines Corporation©.
Let’s consider the data presented in Example 9.1, which are available in the file Production_FarmingEquipment.sav. Let’s open the file and select Analyze → Descriptive Statistics → Explore …, as shown in Fig. 9.4.
From the Explore dialog box, we must select the variable we are interested in on the Dependent List, as shown in Fig. 9.5. Let’s click on Plots … (the Explore: Plots dialog box will open) and select the option Normality plots with tests (Fig. 9.6). Finally, let’s click on Continue and on OK.
The results of the Kolmogorov-Smirnov and Shapiro-Wilk tests for normality for the data in Example 9.1 are shown in Fig. 9.7.
According to Fig. 9.7, the result of the K-S statistic was 0.118, similar to the value calculated in Example 9.1. Since the sample has more than 30 elements, we should only use the K-S test to verify the normality of the data (the S-W test was applied to Example 9.2). Nevertheless, SPSS also makes the result of the S-W statistic available for the sample selected.
As presented in the introduction of this chapter, SPSS calculates the P-value, which corresponds to the lowest significance level observed that would lead to the rejection of the null hypothesis. For the K-S and S-W tests, the P-value corresponds to the lowest value of P from which Dcal > Dc and Wcal < Wc, respectively. As shown in Fig. 9.7, the value of P for the K-S test was 0.200 (this probability can also be obtained from Table G in the Appendix, as shown in Example 9.1). Since P > 0.05, we do not reject the null hypothesis, which allows us to conclude, with a 95% confidence level, that the data distribution is normal. The S-W test also indicates that the data follow a normal distribution.
Applying the same procedure to verify the normality of the data in Example 9.2 (the data are available in the file Production_Aircraft.sav), we get the results shown in Fig. 9.8.
As in Example 9.2, the result of the S-W statistic was 0.978. The K-S test was not applied to this example due to the sample size (n < 30). The P-value of the S-W test is 0.857 (in Example 9.2, we saw that this probability would be between 0.50 and 0.90, closer to 0.90) and, since P > 0.01, the null hypothesis is not rejected, which allows us to conclude that the data distribution in the population follows a normal distribution. We will use this test when estimating regression models in Chapter 13.
Although the K-S test is not recommended for this sample size, the K-S statistic reported by SPSS also indicates that the data distribution follows a normal distribution.
The Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia tests for normality can be solved by using Stata Statistical Software. The Kolmogorov-Smirnov test will be applied to Example 9.1, the Shapiro-Wilk test to Example 9.2, and the Shapiro-Francia test to Example 9.3. The use of the images in this section has been authorized by StataCorp LP©.
The data presented in Example 9.1 are available in the file Production_FarmingEquipment.dta. Let’s open this file and verify that the name of the variable being studied is production.
To run the Kolmogorov-Smirnov test on Stata, we must specify the mean and the standard deviation of the variable of interest in the test syntax. Therefore, the command summarize, or simply sum, must be typed first, followed by the respective variable:
and we get Fig. 9.9, from which we can see that the mean is 42.63889 and the standard deviation is 7.099911.
The Kolmogorov-Smirnov test is given by the following command:
ksmirnov production = normal((production-42.63889)/7.099911)
The result of the test can be seen in Fig. 9.10. We can see that the value of the statistic is similar to the one calculated in Example 9.1 and by SPSS software. Since P > 0.05, we conclude that the data distribution is normal.
The data presented in Example 9.2 are available in the file Production_Aircraft.dta. To run the Shapiro-Wilk test on Stata, the syntax of the command is:
where the term variables⁎ should be replaced with the list of variables being considered. For the data in Example 9.2, we have a single variable called production, so the command to be typed is:
The result of the Shapiro-Wilk test can be seen in Fig. 9.11. Since P > 0.05, we can conclude that the sample comes from a population with a normal distribution.
The data presented in Example 9.3 are available in the file Production_Bicycles.dta. To run the Shapiro-Francia test on Stata, the syntax of the command is:
where the term variables⁎ should be replaced with the list of variables being considered. For the data in Example 9.3, we have a single variable called production, so the command to be typed is:
The result of the Shapiro-Francia test can be seen in Fig. 9.12. We can see that the value is similar to the one calculated in Example 9.3 (W ′ = 0.989). Since P > 0.05, we conclude that the sample comes from a population with a normal distribution.
We will use this test when estimating regression models in Chapter 13.
One of the conditions to apply a parametric test to compare k population means is that the population variances, estimated from k representative samples, be homogeneous or equal. The most common tests to verify variance homogeneity are Bartlett’s χ2 (1937), Cochran’s C (1947a,b), Hartley’s Fmax (1950), and Levene’s F (1960) tests.
In the null hypothesis of variance homogeneity tests, the variances of k populations are homogeneous. In the alternative hypothesis, at least one population variance is different from the others. That is:
H0: σ1² = σ2² = … = σk²
H1: ∃ i, j: σi² ≠ σj² (i, j = 1, …, k)
The original test proposed to verify variance homogeneity among groups is Bartlett’s χ2 test (1937). This test is very sensitive to normality deviations, and Levene’s test is an alternative in this case.
Bartlett’s statistic is calculated from q:
q = (N − k) ⋅ ln(Sp²) − ∑i (ni − 1) ⋅ ln(Si²), for i = 1, …, k
where:
and
Sp² = [∑i (ni − 1) ⋅ Si²] / (N − k), for i = 1, …, k
A correction factor c is applied to q statistic, with the following expression:
c = 1 + {1 / [3 ⋅ (k − 1)]} ⋅ [∑i 1/(ni − 1) − 1/(N − k)], for i = 1, …, k
where Bartlett’s statistic (Bcal) approximately follows a chi-square distribution with k − 1 degrees of freedom:
Bcal = q / c ∼ χ²k−1
From the previous expressions, we can see that the higher the difference between the variances, the higher the value of B. On the other hand, if all the sample variances are equal, its value will be zero. To confirm if the null hypothesis of variance homogeneity will be rejected or not, the value calculated must be compared to the statistic’s critical value (χc2), which is available in Table D in the Appendix.
This table provides the critical values of χc2 considering that P(χcal2 > χc2) = α (for a right-tailed test). Therefore, we reject the null hypothesis if Bcal > χc2. On the other hand, if Bcal ≤ χc2, we do not reject H0.
P-value (the probability associated to χcal2 statistic) can also be obtained from Table D. In this case, we reject H0 if P ≤ α.
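As a sketch (Python/SciPy instead of the chapter’s software; the three groups are simulated with a common σ purely for illustration), `scipy.stats.bartlett` returns the Bcal statistic and its χ² P-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(12, 2, 30)
g3 = rng.normal(11, 2, 30)   # all three groups share sigma = 2

b_cal, p_value = stats.bartlett(g1, g2, g3)   # B_cal ~ chi-square with k - 1 df

alpha = 0.05
reject_h0 = p_value <= alpha   # reject homogeneity of variances when P <= alpha
print(b_cal, p_value, reject_h0)
```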
Cochran’s C test (1947a,b) compares the group with the highest variance in relation to the others. The test demands that the data have a normal distribution.
Cochran’s C statistic is given by:
Ccal = Smax² / ∑i Si², for i = 1, …, k
where:
Smax2 is the highest variance in the sample;
Si2 is the variance in sample i, i = 1, …, k.
According to Expression (9.17), if all the variances are equal, the value of the Ccal statistic is 1/k. The greater the difference between Smax² and the other variances, the closer the value of Ccal gets to 1. To confirm whether the null hypothesis will be rejected or not, the value calculated must be compared to the critical value of Cochran’s statistic (Cc), which is available in Table M in the Appendix.
The values of Cc vary depending on the number of groups (k), the number of degrees of freedom ν = max(ni − 1), and the value of α. Table M provides the critical values of Cc considering that P(Ccal > Cc) = α (for a right-tailed test). Thus, we reject H0 if Ccal > Cc. Otherwise, we do not reject H0.
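SciPy has no built-in Cochran’s C, but the statistic follows directly from its definition. The sketch below (illustrative data; `cochran_c` is a hypothetical helper, not a library function) computes Ccal, which is bounded between 1/k (all variances equal) and 1 (one dominant variance):

```python
import numpy as np

def cochran_c(*groups):
    """C_cal = largest sample variance / sum of the k sample variances."""
    variances = np.array([np.var(g, ddof=1) for g in groups])
    return variances.max() / variances.sum()

rng = np.random.default_rng(8)
groups = [rng.normal(0, 1, 20) for _ in range(4)]   # k = 4 groups, equal sigma
c_cal = cochran_c(*groups)
print(c_cal)   # bounded between 1/k (all equal) and 1 (one dominant variance)
```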
The statistic of Hartley’s Fmax test (1950) is the ratio between the highest group variance (Smax²) and the lowest group variance (Smin²):
Fmax,cal = Smax² / Smin²
The test assumes that the number of observations per group is equal (n1 = n2 = … = nk = n). If all the variances are equal, the value of Fmax will be 1. The greater the difference between Smax² and Smin², the higher the value of Fmax. To confirm whether the null hypothesis of variance homogeneity will be rejected or not, the value calculated must be compared to the critical value of the statistic (Fmax,c), which is available in Table N in the Appendix. The critical values vary depending on the number of groups (k), the number of degrees of freedom ν = n − 1, and the value of α, and this table provides the critical values of Fmax,c considering that P(Fmax,cal > Fmax,c) = α (for a right-tailed test). Therefore, we reject the null hypothesis H0 of variance homogeneity if Fmax,cal > Fmax,c. Otherwise, we do not reject H0.
P-value (the probability associated to Fmax,cal statistic) can also be obtained from Table N in the Appendix. In this case, we reject H0 if P ≤ α.
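Likewise, Hartley’s Fmax reduces to a single ratio. A minimal sketch (illustrative data; `hartley_fmax` is a hypothetical helper) that also enforces the equal-group-size assumption:

```python
import numpy as np

def hartley_fmax(*groups):
    """F_max = largest sample variance / smallest sample variance."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's test assumes n1 = n2 = ... = nk")
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)

rng = np.random.default_rng(11)
groups = [rng.normal(0, 1, 15) for _ in range(3)]   # equal sizes, equal sigma
f_max_cal = hartley_fmax(*groups)
print(f_max_cal)   # equal population variances keep F_max near 1
```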
The advantage of Levene’s F-test in relation to other homogeneity of variance tests is that it is less sensitive to deviations from normality, in addition to being considered a more robust test.
Levene’s statistic is given by Expression (9.19) and approximately follows an F-distribution with ν1 = k − 1 and ν2 = N − k degrees of freedom, for a significance level α:
Fcal = [(N − k)/(k − 1)] ⋅ [∑i ni ⋅ (Z̄i − Z̄)²] / [∑i ∑j (Zij − Z̄i)²] ∼ Fk−1,N−k,α under H0, for i = 1, …, k and j = 1, …, ni
where:
An expansion of Levene’s test can be found in Brown and Forsythe (1974).
From the F-distribution table (Table A in the Appendix), we can determine the critical values of Levene’s statistic (Fc = Fk − 1,N − k,α). Table A provides the critical values of Fc considering that P(Fcal > Fc) = α (right-tailed table). In order for the null hypothesis H0 to be rejected, the value of the statistic must be in the critical region, that is, Fcal > Fc. If Fcal ≤ Fc, we do not reject H0.
P-value (the probability associated to Fcal statistic) can also be obtained from Table A. In this case, we reject H0 if P ≤ α.
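As a sketch (Python/SciPy; the groups are simulated, with the third group’s variance deliberately inflated), `scipy.stats.levene` with `center="mean"` reproduces the classic 1960 statistic, while its default, `center="median"`, is the Brown and Forsythe (1974) variant mentioned above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
g1 = rng.normal(10, 1, 25)
g2 = rng.normal(10, 1, 25)
g3 = rng.normal(10, 4, 25)   # deliberately inflated variance

f_cal, p_value = stats.levene(g1, g2, g3, center="mean")

alpha = 0.05
reject_h0 = p_value <= alpha
print(f_cal, p_value, reject_h0)   # expect rejection: the variances clearly differ
```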
The use of the images in this section has been authorized by the International Business Machines Corporation©. To test the variance homogeneity between the groups, SPSS uses Levene’s test. The data presented in Example 9.4 are available in the file CustomerServices_Store.sav. In order to elaborate the test, we must click on Analyze → Descriptive Statistics → Explore …, as shown in Fig. 9.13.
Let’s include the variable Customer_services in the list of dependent variables (Dependent List) and the variable Store in the factor list (Factor List), as shown in Fig. 9.14.
Next, we must click on Plots … and select the option Untransformed in Spread vs Level with Levene Test, as shown in Fig. 9.15.
Finally, let’s click on Continue and on OK. The result of Levene’s test can also be obtained through the ANOVA test, by clicking on Analyze → Compare Means → One-Way ANOVA …. In Options …, we must select the option Homogeneity of variance test (Fig. 9.16).
The value of Levene’s statistic is 8.427, exactly the same as the one calculated previously. Since the significance level observed is 0.001, a value lower than 0.05, the test shows the rejection of the null hypothesis, which allows us to conclude, with a 95% confidence level, that the population variances are not homogeneous.
The use of the images in this section has been authorized by StataCorp LP©.
Levene’s test for equality of variances is calculated on Stata by using the command robvar (robust test for equality of variances), which has the following syntax:
in which the term variable⁎ should be replaced with the quantitative variable studied and the term groups⁎ with the categorical variable that represents the groups.
Let’s open the file CustomerServices_Store.dta that contains the data of Example 9.7. The three groups are represented by the variable store and the number of customers served by the variable services. Therefore, the command to be typed is:
The result of the test can be seen in Fig. 9.17. We can verify that the value of the statistic (8.427) is similar to the one calculated in Example 9.7 and to the one generated on SPSS, as well as the calculation of the probability associated to the statistic (0.001). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that the variances are not homogeneous.
The main goal is to test if a population mean assumes a certain value or not.
This test is applied when a random sample of size n is obtained from a population with a normal distribution, whose mean (μ) is unknown and whose standard deviation (σ) is known. If the distribution of the population is not known, it is necessary to work with large samples (n > 30), because the central limit theorem guarantees that, as the sample size grows, the sample distribution of its mean gets closer and closer to a normal distribution.
For a bilateral test, the hypotheses are:
The test statistic here refers to the sample mean (X̄). In order for the sample mean to be compared to the tabulated value, it must be standardized, so:
Zcal = (X̄ − μ0) / σX̄ ∼ N(0, 1), where σX̄ = σ/√n
The critical values of the zc statistic are shown in Table E in the Appendix. This table provides the critical values of zc considering that P(Zcal > zc) = α (for a right-tailed test). For a bilateral test, we must consider P(Zcal > zc) = α/2, since P(Zcal < − zc) + P(Zcal > zc) = α. The null hypothesis H0 of a bilateral test is rejected if the value of the Zcal statistic lies in the critical region, that is, if Zcal < − zc or Zcal > zc. Otherwise, we do not reject H0.
The unilateral probabilities associated with the Zcal statistic (P1) can also be obtained from Table E. For a unilateral test, we consider P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
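A worked numerical sketch of the bilateral z-test (Python/SciPy; X̄, μ0, σ, and n are illustrative values, not taken from the chapter):

```python
import numpy as np
from scipy import stats

x_bar, mu0, sigma, n = 52.0, 50.0, 6.0, 36   # illustrative values

z_cal = (x_bar - mu0) / (sigma / np.sqrt(n))      # standardized sample mean
z_c = stats.norm.ppf(1 - 0.05 / 2)                # bilateral critical value, alpha = 0.05
p_value = 2 * (1 - stats.norm.cdf(abs(z_cal)))    # doubled unilateral probability

reject_h0 = abs(z_cal) > z_c                      # equivalently, p_value <= 0.05
print(z_cal, p_value, reject_h0)   # z_cal = 2.0, P close to 0.0455: reject H0
```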
Student’s t-test for one sample is applied when we do not know the population standard deviation (σ); its value is estimated from the sample standard deviation (S). However, when S replaces σ in Expression (9.20), the distribution of the statistic is no longer normal; it becomes a Student’s t-distribution with n − 1 degrees of freedom.
Analogous to the z test, Student’s t-test for one sample assumes the following hypotheses for a bilateral test:
And the calculation of the statistic becomes:
Tcal = (X̄ − μ0) / (S/√n) ∼ tn−1
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.18.
Therefore, for a bilateral test, the null hypothesis is rejected if Tcal < − tc or Tcal > tc. If − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated to Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
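A sketch with SciPy (the data are illustrative, not Example 9.9’s): `scipy.stats.ttest_1samp` returns Tcal and the bilateral P-value, which can then be halved for a unilateral test:

```python
import numpy as np
from scipy import stats

x = np.array([17.2, 18.5, 16.9, 17.8, 18.1, 17.4, 16.8, 17.9, 18.3, 17.1])
mu0 = 18.0   # hypothesized population mean (illustrative)

t_cal, p_bilateral = stats.ttest_1samp(x, popmean=mu0)
# left-tailed P-value (H1: mu < mu0), derived from the bilateral one
p_left = p_bilateral / 2 if t_cal < 0 else 1 - p_bilateral / 2

alpha = 0.05
reject_h0 = p_bilateral <= alpha
print(t_cal, p_bilateral, reject_h0)
```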
The use of the images in this section has been authorized by the International Business Machines Corporation©.
If we wish to test the mean of a single sample, SPSS makes Student’s t-test available. The data in Example 9.9 are available in the file T_test_One_Sample.sav. The procedure to apply the test from Example 9.9 will be described. Initially, let’s select Analyze → Compare Means → One-Sample T Test …, as shown in Fig. 9.19.
We must select the variable Time and specify the value 18 that will be tested in Test Value, as shown in Fig. 9.20.
Now, we must click on Options … to define the desired confidence level (Fig. 9.21).
Finally, let’s click on Continue and on OK. The results of the test are shown in Fig. 9.22.
This figure shows the result of the t-test (similar to the value calculated in Example 9.9) and the associated probability (P-value) for a bilateral test. For a unilateral test, the associated probability is 0.0195 (we saw in Example 9.9 that this probability would be between 0.01 and 0.025). Since 0.0195 > 0.01, we do not reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there was no improvement in the average processing time.
The use of the images in this section has been authorized by StataCorp LP©.
Student’s t-test is run on Stata by using the command ttest. For one population mean, the test syntax is:
where the term variable⁎ should be replaced with the name of the variable considered in the analysis and # with the value of the population mean to be tested.
The data in Example 9.9 are available in the file T_test_One_Sample.dta. In this case, the variable being analyzed is called time and the goal is to verify if the average processing time is still 18 min, so, the command to be typed is:
The result of the test can be seen in Fig. 9.23. We can see that the calculated value of the statistic (− 2.180) is similar to the one calculated in Example 9.9 and also generated on SPSS, as well as the associated probability for a left-tailed test (0.0196). Since P > 0.01, we do not reject the null hypothesis, which allows us to conclude, with a 99% confidence level, that there was no improvement in the processing time.
The t-test for two independent samples is applied to compare the means of two independent random samples (X1i, i = 1, …, n1; X2j, j = 1, …, n2); under the null hypothesis, both come from the same population. In this test, the population variances are unknown.
For a bilateral test, the null hypothesis of the test states that the population means are the same. If the population means are different, the null hypothesis is rejected, so:
The calculation of the T statistic depends on the comparison of the population variances between the groups.
Considering that the population variances are different, the calculation of the T statistic is given by:
Tcal = (X̄1 − X̄2) / √(S1²/n1 + S2²/n2)
with the following degrees of freedom:
ν = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)]
When the population variances are homogeneous, the T statistic is calculated as:
Tcal = (X̄1 − X̄2) / [Sp ⋅ √(1/n1 + 1/n2)]
where:
Sp = √{[(n1 − 1) ⋅ S1² + (n2 − 1) ⋅ S2²] / (n1 + n2 − 2)}
and Tcal follows Student’s t-distribution with ν = n1 + n2 − 2 degrees of freedom.
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.24.
Therefore, for a bilateral test, if the value of the statistic lies in the critical region, that is, if Tcal < − tc or Tcal > tc, the test allows us to reject the null hypothesis. On the other hand, if − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated to Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, for both tests, we reject H0 if P ≤ α.
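Both forms of the statistic are available in SciPy’s `ttest_ind`: `equal_var=True` uses the pooled Sp with ν = n1 + n2 − 2, while `equal_var=False` uses the separate-variance (Welch) form with the corrected ν above. A sketch on simulated, illustrative data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
x1 = rng.normal(100, 10, 30)
x2 = rng.normal(115, 10, 30)   # shifted mean, same spread (illustrative)

t_pooled, p_pooled = stats.ttest_ind(x1, x2, equal_var=True)   # pooled variance
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)    # Welch correction

alpha = 0.05
print(t_pooled, p_pooled, t_welch, p_welch)   # both bilateral P-values are tiny here
```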
The data in Example 9.10 are available in the file T_test_Two_Independent_Samples.sav. The procedure for solving Student’s t-test to compare two population means from two independent random samples on SPSS is described. The use of the images in this section has been authorized by the International Business Machines Corporation©.
We must click on Analyze → Compare Means → Independent-Samples T Test …, as shown in Fig. 9.26.
Let’s include the variable Time in Test Variable(s) and the variable Supplier in Grouping Variable. Next, let’s click on Define Groups … to define the groups (categories) of the variable Supplier, as shown in Fig. 9.27.
If the confidence level desired by the researcher is different from 95%, the button Options … must be selected to change it. Finally, let’s click on OK. The results of the test are shown in Fig. 9.28.
The value of the t statistic for the test is − 9.708 and the associated bilateral probability is 0.000 (P < 0.05), which leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that the population means are different. Note that Fig. 9.28 also shows the result of Levene’s test. Since the observed significance level is 0.694, which is greater than 0.05, we can also conclude, with a 95% confidence level, that the variances are homogeneous.
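Outside SPSS, the same homogeneity check can be sketched with SciPy's implementation of Levene's test (the two groups below are hypothetical, not the Example 9.10 data; `center='mean'` gives the classical mean-based version of the test):

```python
import numpy as np
from scipy import stats

# Hypothetical groups (illustrative only)
group_a = np.array([22.0, 24.5, 23.1, 25.2, 24.8, 23.9])
group_b = np.array([27.3, 26.1, 28.4, 27.9, 26.7, 28.0])

# Levene's test: H0 states that the group variances are homogeneous
w, p = stats.levene(group_a, group_b, center='mean')
print(w, p)  # if p > alpha, do not reject homogeneity of variances
```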
The use of the images in this section has been authorized by StataCorp LP©.
The t-test to compare the means of two independent groups on Stata is elaborated by using the following syntax:
where the term variable⁎ must be replaced with the quantitative variable being analyzed, and the term groups⁎ with the categorical variable that defines the groups.
The data in Example 9.10 are available in the file T_test_Two_Independent_Samples.dta. The variable supplier shows the groups of suppliers. The values for each group of suppliers are specified in the variable time. Thus, we must type the following command:
The result of the test can be seen in Fig. 9.29. We can see that the calculated value of the statistic (− 9.708) matches the one calculated in Example 9.10 and also generated on SPSS, as does the associated probability for a bilateral test (0.000). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that the population means are different.
This test is applied to check whether the means of two paired or related samples, obtained from the same population (before and after) and normally distributed, are significantly different. Besides the normality of the data of each sample, the test requires homogeneity of the variances between the groups.
Unlike the t-test for two independent samples, here we must first calculate the difference between each pair of values in position i ($d_i = X_{before,i} - X_{after,i}$, i = 1, …, n) and then test the null hypothesis that the mean of the differences in the population is zero.
For a bilateral test, we have:
The Tcal statistic for the test is given by:
$$T_{cal} = \frac{\bar{d} - \mu_d}{S_d / \sqrt{n}} \sim t_{\nu = n-1}$$
where:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$
and
$$S_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$$
The value calculated must be compared to the value in Student’s t-distribution table (Table B in the Appendix). This table provides the critical values of tc considering that P(Tcal > tc) = α (for a right-tailed test). For a bilateral test, we have P(Tcal < − tc) = α/2 = P(Tcal > tc), as shown in Fig. 9.30.
Therefore, for a bilateral test, the null hypothesis is rejected if Tcal < − tc or Tcal > tc. If − tc ≤ Tcal ≤ tc, we do not reject H0.
The unilateral probabilities associated with the Tcal statistic (P1) can also be obtained from Table B. For a unilateral test, we have P = P1. For a bilateral test, this probability must be doubled (P = 2P1). Therefore, in both cases, we reject H0 if P ≤ α.
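The paired procedure above can be sketched with hypothetical before/after measurements (not the Example 9.11 file), computing the differences and the Tcal statistic by hand and checking the result against SciPy's paired t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after times (illustrative only)
before = np.array([32.1, 29.8, 31.5, 30.2, 33.0, 28.9, 31.1, 30.6])
after  = np.array([29.5, 28.1, 30.0, 28.8, 30.9, 27.6, 29.4, 29.0])

d = before - after                     # differences d_i
n = len(d)
d_bar = d.mean()                       # mean of the differences
s_d = d.std(ddof=1)                    # S_d, sample std. dev. of differences
t_manual = d_bar / (s_d / np.sqrt(n))  # Tcal, with mu_d = 0 under H0

# SciPy's paired test; its p-value is bilateral (P = 2*P1)
t_scipy, p_bilateral = stats.ttest_rel(before, after)
print(t_manual, t_scipy, p_bilateral)
```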
First, we must test the normality of the data in each sample, as well as the variance homogeneity between the groups. Using the same procedures described in Sections 9.3.3 and 9.4.5 (the data must be placed in a table the same way as in Section 9.4.5), we obtain Figs. 9.32 and 9.33.
Based on Fig. 9.32, we conclude that there is normality of the data for each sample. From Fig. 9.33, we can conclude that the variances between the samples are homogeneous.
The use of the images in this section has been authorized by the International Business Machines Corporation©. To solve Student’s t-test for two paired samples on SPSS, we must open the file T_test_Two_Paired_Samples.sav. Then, we have to click on Analyze → Compare Means → Paired-Samples T Test …, as shown in Fig. 9.34.
We must select the variable Before and move it to Variable1 and the variable After to Variable2, as shown in Fig. 9.35.
If the desired confidence level is different from 95%, we must click on Options … to change it. Finally, let’s click on OK. The results of the test are shown in Fig. 9.36.
The value of the t statistic is 4.385 and the significance level observed for a bilateral test is 0.002, which is less than 0.05. This leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that there is a significant difference between the times spent by the operators before and after the training course.
The t-test to compare the means of two paired groups will be solved on Stata for the data in Example 9.11. The use of the images in this section has been authorized by StataCorp LP©.
Therefore, let’s open the file T_test_Two_Paired_Samples.dta. The paired variables are called before and after. In this case, we must type the following command:
The result of the test can be seen in Fig. 9.37. We can see that the calculated value of the statistic (4.385) is similar to the one calculated in Example 9.11 and on SPSS, as well as the probability associated to the statistic for a bilateral test (0.0018). Since P < 0.05, we reject the null hypothesis that the times spent by the operators before and after the training course are the same, with a 95% confidence level.
ANOVA is a test used to compare the means of three or more populations through the analysis of sample variances. The test is based on a sample obtained from each population and aims to determine whether the differences between the sample means suggest significant differences between the population means, or whether such differences are only a result of sampling variability.
ANOVA’s assumptions are:
One-way ANOVA is an extension of Student’s t-test for two population means, allowing the researcher to compare three or more population means.
The null hypothesis of the test states that the population means are the same. If there is at least one group with a mean that is different from the others, the null hypothesis is rejected.
As stated in Fávero et al. (2009), the one-way ANOVA allows the researcher to verify the effect of a qualitative explanatory variable (factor) on a quantitative dependent variable. Each group includes the observations of the dependent variable in one category of the factor.
Assuming that independent samples of sizes n1, n2, …, nk are obtained from k populations (k ≥ 3) and that the means of these populations can be represented by μ1, μ2, …, μk, the analysis of variance tests the following hypotheses:
$$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$$
$$H_1: \exists\,(i, j)\ \mu_i \neq \mu_j,\ i \neq j$$
According to Maroco (2014), in general, the observations for this type of problem can be represented according to Table 9.2.
Table 9.2
Samples or Groups

| 1 | 2 | … | k |
|---|---|---|---|
| $Y_{11}$ | $Y_{12}$ | … | $Y_{1k}$ |
| $Y_{21}$ | $Y_{22}$ | … | $Y_{2k}$ |
| … | … | … | … |
| $Y_{n_1 1}$ | $Y_{n_2 2}$ | … | $Y_{n_k k}$ |
where $Y_{ij}$ represents observation i of sample or group j (i = 1, …, nj; j = 1, …, k) and $n_j$ is the dimension of sample or group j. The dimension of the global sample is $N = \sum_{j=1}^{k} n_j$. Pestana and Gageiro (2008) present the following model:
$$Y_{ij} = \mu_i + \varepsilon_{ij}$$
$$Y_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij}$$
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where:
Therefore, ANOVA assumes that each group comes from a population with a normal distribution, mean μi, and a homogeneous variance, that is, Yij ~ N(μi,σ), resulting in the hypothesis that the errors (residuals) have a normal distribution with a mean equal to zero and a constant variance, that is, ɛij ~ N(0,σ), besides being independent (Fávero et al., 2009).
The technique’s hypotheses are tested through the calculation of the group variances, and that is where the name ANOVA comes from. The technique involves the calculation of the variations between the groups $(\bar{Y}_i - \bar{Y})$ and within each group $(Y_{ij} - \bar{Y}_i)$. The residual sum of squares within groups (RSS) is calculated by:
$$RSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
The residual sum of squares between groups, or the sum of squares of the factor (SSF), is given by:
$$SSF = \sum_{i=1}^{k} n_i \cdot (\bar{Y}_i - \bar{Y})^2$$
Therefore, the total sum is:
$$TSS = RSS + SSF = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$$
According to Fávero et al. (2009) and Maroco (2014), the ANOVA statistic is given by the division between the variance of the factor (SSF divided by k − 1 degrees of freedom) and the variance of the residuals (RSS divided by N − k degrees of freedom):
$$F_{cal} = \frac{SSF / (k-1)}{RSS / (N-k)} = \frac{MSF}{MSR}$$
where:
Table 9.3 summarizes the calculations of the one-way ANOVA.
Table 9.3
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Between the groups | $SSF = \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2$ | $k - 1$ | $MSF = \dfrac{SSF}{k-1}$ | $F = \dfrac{MSF}{MSR}$ |
| Within the groups | $RSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$ | $N - k$ | $MSR = \dfrac{RSS}{N-k}$ | |
| Total | $TSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$ | $N - 1$ | | |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.
The value of F can be zero or positive, but never negative. Accordingly, the F-distribution used by ANOVA is asymmetrical, skewed to the right.
The calculated value (Fcal) must be compared to the value in the F-distribution table (Table A in the Appendix). This table provides the critical values of Fc = Fk − 1,N − k,α where P(Fcal > Fc) = α (right-tailed test). Therefore, one-way ANOVA’s null hypothesis is rejected if Fcal > Fc. Otherwise, if Fcal ≤ Fc, we do not reject H0.
We will use these concepts when we study the estimation of regression models in Chapter 13.
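The calculations in Table 9.3 can be sketched with three hypothetical groups (not the Example 9.12 data), computing SSF, RSS, and Fcal from the formulas above and checking the decomposition TSS = SSF + RSS, with SciPy's one-way ANOVA as a cross-check:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (k = 3), illustrative only
groups = [np.array([5.1, 4.8, 5.5, 5.0, 4.9]),
          np.array([5.6, 5.9, 5.7, 6.1, 5.8]),
          np.array([4.6, 4.9, 4.4, 4.7, 4.5])]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups (factor) and within groups (residual)
ssf = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
rss = sum(((g - g.mean()) ** 2).sum() for g in groups)
tss = ((np.concatenate(groups) - grand_mean) ** 2).sum()  # TSS = SSF + RSS

msf = ssf / (k - 1)   # mean square of the factor
msr = rss / (N - k)   # mean square of the residuals
f_manual = msf / msr  # Fcal with (k - 1, N - k) degrees of freedom

f_scipy, p = stats.f_oneway(*groups)  # SciPy reproduces Fcal and its P-value
print(f_manual, f_scipy, p)
```

As with the t-tests, H0 is rejected when Fcal exceeds the tabulated critical value, or equivalently when the associated P-value is at most α.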
The use of the images in this section has been authorized by the International Business Machines Corporation©. The data in Example 9.12 are available in the file One_Way_ANOVA.sav. First of all, let’s click on Analyze → Compare Means → One-Way ANOVA …, as shown in Fig. 9.40.
Let's include the variable Sucrose in the list of dependent variables (Dependent List) and the variable Supplier in the box Factor, according to Fig. 9.41.
After that, we must click on Options … and select the option Homogeneity of variance test (Levene’s test for variance homogeneity). Finally, let’s click on Continue and on OK to obtain the result of Levene’s test, besides the ANOVA table. Since ANOVA does not make the normality test available, it must be obtained by applying the same procedure described in Section 9.3.3.
According to Fig. 9.42, we can verify that each one of the groups has data that follow a normal distribution. Moreover, through Fig. 9.43, we can conclude that the variances between the groups are homogeneous.
From the ANOVA table (Fig. 9.44), we can see that the value of the F-test is 4.676 and the respective P-value is 0.017 (we saw in Example 9.12 that this value would be between 0.01 and 0.025), value less than 0.05. This leads us to reject the null hypothesis and allows us to conclude, with a 95% confidence level, that at least one of the population means is different from the others (there are differences in the percentage of sucrose in the honey of the three suppliers).
The use of the images in this section has been authorized by StataCorp LP©.
The one-way ANOVA on Stata is generated from the following syntax:
in which the term variabley⁎ should be replaced with the quantitative dependent variable and the term factor⁎ with the qualitative explanatory variable.
The data in Example 9.12 are available in the file One_Way_Anova.dta. The quantitative dependent variable is called sucrose and the factor is represented by the variable supplier. Thus, we must type the following command:
The result of the test can be seen in Fig. 9.45. We can see that the calculated value of the statistic (4.68) is similar to the one calculated in Example 9.12 and also generated on SPSS, as well as the probability associated to the value of the statistic (0.017). Since P < 0.05, the null hypothesis is rejected, which allows us to conclude, with a 95% confidence level, that at least one of the population means is different from the others.