Factorial ANOVA is an extension of the one-way ANOVA: it relies on the same assumptions, but considers two or more factors. Factorial ANOVA presumes that the quantitative dependent variable is influenced by more than one qualitative explanatory variable (factor). It also tests the possible interactions between the factors, through the effect resulting from the combination of factor A’s level i and factor B’s level j, as discussed by Pestana and Gageiro (2008), Fávero et al. (2009), and Maroco (2014).
For Pestana and Gageiro (2008) and Fávero et al. (2009), the main objective of the factorial ANOVA is to determine whether the means for each factor level are the same (an isolated effect of the factors on the dependent variable), and to verify the interaction between the factors (the joint effect of the factors on the dependent variable).
For educational purposes, the factorial ANOVA will be described for the two-way model.
According to Fávero et al. (2009) and Maroco (2014), the observations of the two-way ANOVA can be represented, in general, as shown in Table 9.4. Each cell shows the values of the dependent variable for the corresponding combination of levels of factors A and B.
Table 9.4 General representation of the observations in the two-way ANOVA

| | | Factor B | | | |
|---|---|---|---|---|---|
| | | 1 | 2 | … | b |
| Factor A | 1 | Y111, Y112, …, Y11n | Y121, Y122, …, Y12n | … | Y1b1, Y1b2, …, Y1bn |
| | 2 | Y211, Y212, …, Y21n | Y221, Y222, …, Y22n | … | Y2b1, Y2b2, …, Y2bn |
| | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| | a | Ya11, Ya12, …, Ya1n | Ya21, Ya22, …, Ya2n | … | Yab1, Yab2, …, Yabn |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.
where Yijk represents observation k (k = 1, …, n) of factor A’s level i (i = 1, …, a) and of factor B’s level j (j = 1, …, b).
First, in order to check the isolated effects of factors A and B, we must test the following hypotheses (Fávero et al., 2009; Maroco, 2014):

H0: α1 = α2 = … = αa = 0 (factor A has no effect on the dependent variable)
H1: αi ≠ 0 for at least one i

and

H0: β1 = β2 = … = βb = 0 (factor B has no effect on the dependent variable)
H1: βj ≠ 0 for at least one j

Now, in order to verify the joint effect of the factors on the dependent variable, we must test the following hypotheses (Fávero et al., 2009; Maroco, 2014):

H0: γij = 0 for every i and j (there is no interaction between the factors)
H1: γij ≠ 0 for at least one pair (i, j)

The model presented by Pestana and Gageiro (2008) can be described as:

Yijk = μ + αi + βj + γij + εijk

where:
μ is the global mean of the dependent variable;
αi is the effect of factor A’s level i;
βj is the effect of factor B’s level j;
γij is the effect of the interaction between factor A’s level i and factor B’s level j;
εijk is the random error term, assumed to follow a normal distribution with zero mean and constant variance.

To standardize the effects of the levels chosen for both factors, we must assume that:

Σi αi = 0, Σj βj = 0, and Σi γij = Σj γij = 0
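A short sketch can illustrate the sum-to-zero standardization of the level effects (the effect values below are purely hypothetical):

```python
# Hypothetical (illustrative) raw level effects for a 2 x 3 design.
raw_a = [2.0, 5.0]        # factor A levels
raw_b = [1.0, 4.0, 7.0]   # factor B levels

def center(effects):
    """Recenter the effects so that they satisfy the sum-to-zero constraint."""
    m = sum(effects) / len(effects)
    return [e - m for e in effects]

alpha = center(raw_a)  # alpha_i with sum(alpha) == 0
beta = center(raw_b)   # beta_j with sum(beta) == 0
```

Any set of effects can be recentered this way; the constraint simply fixes the parameterization so that the global mean μ absorbs the overall level.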
Let’s consider Ȳ, Ȳij, Ȳi, and Ȳj the general mean of the global sample, the mean of the sample in cell (i, j), the mean of factor A’s level i, and the mean of factor B’s level j, respectively.

We can describe the residual sum of squares (RSS) as:

RSS = Σi Σj Σk (Yijk − Ȳij)²

On the other hand, the sum of squares of factor A (SSFA), the sum of squares of factor B (SSFB), and the sum of squares of the interaction (SSFAB) are represented in Expressions (9.43)–(9.45), respectively:

SSFA = b · n · Σi (Ȳi − Ȳ)²  (9.43)

SSFB = a · n · Σj (Ȳj − Ȳ)²  (9.44)

SSFAB = n · Σi Σj (Ȳij − Ȳi − Ȳj + Ȳ)²  (9.45)

Therefore, the total sum of squares can be written as follows:

SST = Σi Σj Σk (Yijk − Ȳ)² = SSFA + SSFB + SSFAB + RSS
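The sums of squares and their decomposition can be checked numerically. The sketch below uses a small hypothetical balanced 2 × 2 design with n = 2 replications per cell (all values illustrative):

```python
# Balanced 2 x 2 design with n = 2 replications per cell (hypothetical data).
Y = {
    (1, 1): [10.0, 12.0], (1, 2): [14.0, 16.0],
    (2, 1): [11.0, 13.0], (2, 2): [20.0, 22.0],
}
a, b, n = 2, 2, 2

def mean(xs):
    return sum(xs) / len(xs)

grand = mean([y for cell in Y.values() for y in cell])          # global mean
cell_mean = {ij: mean(ys) for ij, ys in Y.items()}              # mean per cell (i, j)
a_mean = {i: mean([y for (ii, _), ys in Y.items() if ii == i for y in ys])
          for i in range(1, a + 1)}                             # mean of factor A's level i
b_mean = {j: mean([y for (_, jj), ys in Y.items() if jj == j for y in ys])
          for j in range(1, b + 1)}                             # mean of factor B's level j

# Sums of squares, following the expressions in the text.
rss = sum((y - cell_mean[ij]) ** 2 for ij, ys in Y.items() for y in ys)
ssfa = b * n * sum((a_mean[i] - grand) ** 2 for i in a_mean)
ssfb = a * n * sum((b_mean[j] - grand) ** 2 for j in b_mean)
ssfab = n * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                for i in a_mean for j in b_mean)
sst = sum((y - grand) ** 2 for ys in Y.values() for y in ys)

# The decomposition SST = SSFA + SSFB + SSFAB + RSS holds exactly.
assert abs(sst - (ssfa + ssfb + ssfab + rss)) < 1e-9
```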
Thus, the ANOVA statistic for factor A is given by:

FAcal = MSFA / MSE

where:
MSFA = SSFA / (a − 1) is the mean square of factor A;
MSE = RSS / [(n − 1) · ab] is the mean square error.

On the other hand, the ANOVA statistic for factor B is given by:

FBcal = MSFB / MSE

where:
MSFB = SSFB / (b − 1) is the mean square of factor B.

And the ANOVA statistic for the interaction is represented by:

FABcal = MSFAB / MSE

where:
MSFAB = SSFAB / [(a − 1) · (b − 1)] is the mean square of the interaction.
The calculations of the two-way ANOVA are summarized in Table 9.5.
Table 9.5 Calculations of the two-way ANOVA

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F
---|---|---|---|---
Factor A | SSFA | a − 1 | MSFA = SSFA/(a − 1) | FAcal = MSFA/MSE
Factor B | SSFB | b − 1 | MSFB = SSFB/(b − 1) | FBcal = MSFB/MSE
Interaction | SSFAB | (a − 1) · (b − 1) | MSFAB = SSFAB/[(a − 1)(b − 1)] | FABcal = MSFAB/MSE
Error | RSS | (n − 1) · ab | MSE = RSS/[(n − 1) · ab] |
Total | SST | N − 1 | |
Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.
The calculated values of the statistics (FAcal, FBcal, and FABcal) must be compared to the critical values obtained from the F-distribution table (Table A in the Appendix): FAc = F(a−1), (n−1)ab, α; FBc = F(b−1), (n−1)ab, α; and FABc = F(a−1)(b−1), (n−1)ab, α. For each statistic, if the calculated value lies in the critical region (FAcal > FAc, FBcal > FBc, or FABcal > FABc), we must reject the respective null hypothesis. Otherwise, we do not reject H0.
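Putting the pieces together, a minimal sketch (reusing hypothetical sums of squares from a balanced 2 × 2 design with n = 2 replications per cell) computes the mean squares, the F statistics, and their degrees of freedom:

```python
# Hypothetical sums of squares for a balanced 2 x 2 design with n = 2 per cell.
a, b, n = 2, 2, 2
ssfa, ssfb, ssfab, rss = 24.5, 84.5, 12.5, 8.0

# Degrees of freedom, as in Table 9.5.
df_a, df_b = a - 1, b - 1
df_ab = (a - 1) * (b - 1)
df_err = (n - 1) * a * b

# Mean squares and F statistics.
msfa, msfb = ssfa / df_a, ssfb / df_b
msfab, mse = ssfab / df_ab, rss / df_err
f_a, f_b, f_ab = msfa / mse, msfb / mse, msfab / mse

print(f_a, f_b, f_ab)  # 12.25 42.25 6.25
# Each statistic is then compared with the critical value from the
# F-distribution table for its (numerator df, df_err) pair at level alpha;
# H0 is rejected when the calculated value exceeds the critical one.
```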
The use of the images in this section has been authorized by the International Business Machines Corporation©.
First, we must verify whether the variable Time (metric) follows a normal distribution (as shown in Fig. 9.46). According to this figure, we can conclude that the variable Time follows a normal distribution, with a 95% confidence level. The hypothesis of variance homogeneity will be verified in Step 4.
For the interaction, the null hypothesis H0 states that there is no interaction between the factor Company and the factor Day_of_the_week, that is, H0: γij = 0 for all i and j.
In order to do that, let’s click on Analyze → General Linear Model → Univariate …, as shown in Fig. 9.47.
After that, let’s include the variable Time in the Dependent Variable box and the variables Company and Day_of_the_week in the Fixed Factor(s) box, as shown in Fig. 9.48.
This example is based on the two-way ANOVA, in which both factors are fixed. If one of the factors had its levels chosen randomly, it would be inserted into the Random Factor(s) box, resulting in a mixed-effects model. The button Model … defines the variance analysis model to be tested. Through the button Contrasts …, we can assess whether one category of a factor is significantly different from the other categories of the same factor. Charts can be constructed through the button Plots …, allowing us to visualize the existence or nonexistence of interactions between the factors. The button Post Hoc …, on the other hand, allows us to carry out multiple comparisons of means. Finally, through the button Options …, we can obtain descriptive statistics and the result of Levene’s variance homogeneity test, as well as select the appropriate significance level (Fávero et al., 2009; Maroco, 2014).
Therefore, since we want to test variance homogeneity, we must select, in Options …, the option Homogeneity tests, as shown in Fig. 9.49.
Finally, let’s click on Continue and on OK to obtain Levene’s variance homogeneity test and the two-way ANOVA table.
In Fig. 9.50, we can see that the variances between groups are homogeneous (P = 0.451 > 0.05).
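The homogeneity check reported here is Levene’s test, which applies a one-way ANOVA to the absolute deviations of each observation from its group mean. A minimal pure-Python sketch of the statistic (mean-based version, on hypothetical groups) is:

```python
# Levene's statistic (mean-based version) on hypothetical groups.
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [9.0, 13.0, 11.0]]

def mean(xs):
    return sum(xs) / len(xs)

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of observations

# Absolute deviations from each group's mean.
Z = [[abs(y - mean(g)) for y in g] for g in groups]
zbar_i = [mean(zi) for zi in Z]                 # mean deviation per group
zbar = mean([z for zi in Z for z in zi])        # overall mean deviation

# One-way ANOVA F statistic computed on the deviations.
num = (N - k) * sum(len(g) * (zb - zbar) ** 2 for g, zb in zip(groups, zbar_i))
den = (k - 1) * sum((z - zb) ** 2 for zi, zb in zip(Z, zbar_i) for z in zi)
W = num / den
# W is compared with the F distribution with (k - 1, N - k) degrees of freedom;
# a large W (small p-value) indicates heterogeneous variances.
```

SPSS computes the same kind of statistic for the groups defined by the factor combinations; here the data and group structure are illustrative only.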
Based on Fig. 9.51, we can conclude that there are no significant differences between the travel times of the companies analyzed, that is, the factor Company does not have a significant impact on the variable Time (P = 0.330 > 0.05).
On the other hand, we conclude that there are significant differences between the days of the week, that is, the factor Day_of_the_week has a significant effect on the variable Time (P = 0.003 < 0.05).
We finally conclude that there is no significant interaction, with a 95% confidence level, between the two factors Company and Day_of_the_week, since P = 0.898 > 0.05.
The use of the images in this section has been authorized by StataCorp LP©.
The anova command in Stata specifies the dependent variable being analyzed, as well as the respective factors. Interactions are specified by placing the character # between the factors. Thus, the two-way ANOVA is generated through the following syntax:

anova variabley⁎ factorA⁎ factorB⁎ factorA⁎#factorB⁎

or simply:

anova variabley⁎ factorA⁎##factorB⁎

in which the term variabley⁎ should be replaced by the quantitative dependent variable, and the terms factorA⁎ and factorB⁎ by the respective factors.
If we type only anova variabley⁎ factorA⁎ factorB⁎, the ANOVA will be elaborated for each factor alone, without the interaction between the factors.
The data presented in Example 9.13 are available in the file Two_Way_ANOVA.dta. The quantitative dependent variable is called time and the factors correspond to the variables company and day_of_the_week. Thus, we must type the following command:
anova time company##day_of_the_week
The results can be seen in Fig. 9.52 and are similar to those presented on SPSS, which allows us to conclude, with a 95% confidence level, that only the factor day_of_the_week has a significant effect on the variable time (P = 0.003 < 0.05), and that there is no significant interaction between the two factors analyzed (P = 0.898 > 0.05).
The two-way ANOVA can be generalized to three or more factors. According to Maroco (2014), however, the model becomes very complex, since the multiple interaction effects can make the interpretation of the factor effects confusing. The generic model with three factors presented by the author is:

Yijkl = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk + εijkl

where αi, βj, and γk are the effects of the three factors, the terms in parentheses are the two- and three-way interaction effects, and εijkl is the random error term.
This chapter presented the concepts and objectives of parametric hypotheses tests and the general procedures for constructing each one of them.
We studied the main types of tests and the situations in which each one of them must be used. Moreover, the advantages and disadvantages of each test were established, as well as their assumptions.
We studied the tests for normality (Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia), variance homogeneity tests (Bartlett’s χ2, Cochran’s C, Hartley’s Fmax, and Levene’s F), Student’s t-test for one population mean, for two independent means, and for two paired means, as well as ANOVA and its extensions.
Regardless of the application’s main goal, parametric tests can provide good and interesting research results that are useful in the decision-making process. Beyond a conscious choice of the modeling software, the correct use of each test must always be based on the underlying theory, without ever ignoring the researcher’s experience and intuition.
12.5 | 14.2 | 13.4 | 14.6 | 12.7 | 10.9 | 16.5 | 14.7 | 11.2 | 10.9 | 12.1 | 12.8 |
13.8 | 13.5 | 13.2 | 14.1 | 15.5 | 16.2 | 10.8 | 14.3 | 12.8 | 12.4 | 11.4 | 16.2 |
14.3 | 14.8 | 14.6 | 13.7 | 13.5 | 10.8 | 10.4 | 11.5 | 11.9 | 11.3 | 14.2 | 11.2 |
13.4 | 16.1 | 13.5 | 17.5 | 16.2 | 15.0 | 14.2 | 13.2 | 12.4 | 13.4 | 12.7 | 11.2 |