9.8.2 Factorial ANOVA

Factorial ANOVA extends the one-way ANOVA: it relies on the same assumptions but considers two or more factors. Factorial ANOVA presumes that the quantitative dependent variable is influenced by more than one qualitative explanatory variable (factor). It also tests the possible interactions between the factors, through the effect that results from combining factor A’s level i with factor B’s level j, as discussed by Pestana and Gageiro (2008), Fávero et al. (2009), and Maroco (2014).

For Pestana and Gageiro (2008) and Fávero et al. (2009), the main objective of the factorial ANOVA is to determine whether the means for each factor level are the same (an isolated effect of the factors on the dependent variable), and to verify the interaction between the factors (the joint effect of the factors on the dependent variable).

For educational purposes, the factorial ANOVA will be described for the two-way model.

9.8.2.1 Two-Way ANOVA

According to Fávero et al. (2009) and Maroco (2014), the observations of the two-way ANOVA can be represented, in general, as shown in Table 9.4. Each cell contains the values of the dependent variable for a combination of a level of factor A and a level of factor B.

Table 9.4

Observations of the Two-Way ANOVA

                            Factor B
Factor A    1                    2                    …    b
1           Y111 Y112 … Y11n     Y121 Y122 … Y12n     …    Y1b1 Y1b2 … Y1bn
2           Y211 Y212 … Y21n     Y221 Y222 … Y22n     …    Y2b1 Y2b2 … Y2bn
⋮           ⋮                    ⋮                         ⋮
a           Ya11 Ya12 … Ya1n     Ya21 Ya22 … Ya2n     …    Yab1 Yab2 … Yabn

Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.

where Yijk represents observation k (k = 1, …, n) of factor A’s level i (i = 1, …, a) and of factor B’s level j (j = 1, …, b).

First, in order to check the isolated effects of factors A and B, we must test the following hypotheses (Fávero et al., 2009; Maroco, 2014):

$$
\begin{aligned}
H_{0A}&: \mu_1 = \mu_2 = \cdots = \mu_a \\
H_{1A}&: \exists\,(i,j)\ \mu_i \neq \mu_j,\ i \neq j;\quad i, j = 1, \ldots, a
\end{aligned}
$$

(9.37)

and

$$
\begin{aligned}
H_{0B}&: \mu_1 = \mu_2 = \cdots = \mu_b \\
H_{1B}&: \exists\,(i,j)\ \mu_i \neq \mu_j,\ i \neq j;\quad i, j = 1, \ldots, b
\end{aligned}
$$

(9.38)

Now, in order to verify the joint effect of the factors on the dependent variable, we must test the following hypotheses (Fávero et al., 2009; Maroco, 2014):

$$
\begin{aligned}
H_0&: \gamma_{ij} = 0 \text{ for all } i, j \quad \text{(there is no interaction between factors } A \text{ and } B\text{)} \\
H_1&: \gamma_{ij} \neq 0 \text{ for some } i, j \quad \text{(there is interaction between factors } A \text{ and } B\text{)}
\end{aligned}
$$

(9.39)

The model presented by Pestana and Gageiro (2008) can be described as:

$$
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}
$$

(9.40)

where:

  • μ is the population’s global mean;
  • αi is the effect of factor A’s level i, given by μi − μ;
  • βj is the effect of factor B’s level j, given by μj − μ;
  • γij is the interaction between the factors;
  • ɛijk is the random error that follows a normal distribution with a mean equal to zero and a constant variance.
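To build intuition for model (9.40), here is a minimal simulation sketch in Python (NumPy assumed). The design dimensions and effect sizes are hypothetical; the effects are chosen to satisfy the zero-sum constraint (9.41) introduced below:

```python
import numpy as np

rng = np.random.default_rng(42)

a, b, n = 3, 2, 10   # levels of factor A, levels of factor B, replicates per cell
mu = 50.0            # global mean (hypothetical)

# Effects constrained to sum to zero over each index
alpha = np.array([2.0, -1.0, -1.0])                        # factor A effects
beta = np.array([1.5, -1.5])                               # factor B effects
gamma = np.array([[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]])   # interaction effects

# Y_ijk = mu + alpha_i + beta_j + gamma_ij + eps_ijk, with normal errors
Y = (mu
     + alpha[:, None, None]
     + beta[None, :, None]
     + gamma[:, :, None]
     + rng.normal(0.0, 2.0, size=(a, b, n)))

print(Y.shape)  # (3, 2, 10): one row of n replicates per (i, j) cell
```

The broadcasting mirrors the model’s index structure: each observation is the sum of a cell-specific systematic part and an independent normal error.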

To standardize the effects of the chosen levels of both factors, we must assume that:

$$
\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} \gamma_{ij} = \sum_{j=1}^{b} \gamma_{ij} = 0
$$

(9.41)

Let $\bar{Y}$, $\bar{Y}_{ij}$, $\bar{Y}_{i}$, and $\bar{Y}_{j}$ be the general mean of the global sample, the mean of cell (i, j), the mean of factor A’s level i, and the mean of factor B’s level j, respectively.

We can describe the residual sum of squares (RSS) as:

$$
RSS = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left( Y_{ijk} - \bar{Y}_{ij} \right)^2
$$

(9.42)

On the other hand, the sum of squares of factor A (SSFA), the sum of squares of factor B (SSFB), and the sum of squares of the interaction (SSFAB) are represented in Expressions (9.43) to (9.45), respectively:

$$
SSF_A = bn \sum_{i=1}^{a} \left( \bar{Y}_{i} - \bar{Y} \right)^2
$$

(9.43)

$$
SSF_B = an \sum_{j=1}^{b} \left( \bar{Y}_{j} - \bar{Y} \right)^2
$$

(9.44)

$$
SSF_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \left( \bar{Y}_{ij} - \bar{Y}_{i} - \bar{Y}_{j} + \bar{Y} \right)^2
$$

(9.45)

Therefore, the total sum of squares can be written as follows:

$$
TSS = RSS + SSF_A + SSF_B + SSF_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left( Y_{ijk} - \bar{Y} \right)^2
$$

(9.46)
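The decomposition in (9.46) can be verified numerically for any balanced layout. Below is a sketch in Python (NumPy assumed, data simulated) that computes each sum of squares from Expressions (9.42) to (9.45) and checks that they add up to the total:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3, 4, 5
# Balanced layout: a x b cells, n replicates per cell (simulated data)
Y = rng.normal(80.0, 5.0, size=(a, b, n))

Ybar = Y.mean()                # global mean
Ybar_ij = Y.mean(axis=2)       # cell means
Ybar_i = Y.mean(axis=(1, 2))   # factor A level means
Ybar_j = Y.mean(axis=(0, 2))   # factor B level means

RSS = ((Y - Ybar_ij[:, :, None]) ** 2).sum()    # (9.42)
SSFA = b * n * ((Ybar_i - Ybar) ** 2).sum()     # (9.43)
SSFB = a * n * ((Ybar_j - Ybar) ** 2).sum()     # (9.44)
SSFAB = n * ((Ybar_ij - Ybar_i[:, None]
              - Ybar_j[None, :] + Ybar) ** 2).sum()  # (9.45)
TSS = ((Y - Ybar) ** 2).sum()                   # (9.46), right-hand side

# The identity TSS = RSS + SSFA + SSFB + SSFAB holds exactly for balanced data
assert np.isclose(TSS, RSS + SSFA + SSFB + SSFAB)
```

Note that this identity depends on the design being balanced (the same n in every cell); with unbalanced data the sums of squares no longer partition so cleanly.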

Thus, the ANOVA statistic for factor A is given by:

$$
F_A = \frac{SSF_A / (a-1)}{RSS / \left[ (n-1)\,ab \right]} = \frac{MSF_A}{MSR}
$$

(9.47)

where:

  • MSFA is the mean square of factor A;
  • MSR is the mean square of the errors.

On the other hand, the ANOVA statistic for factor B is given by:

$$
F_B = \frac{SSF_B / (b-1)}{RSS / \left[ (n-1)\,ab \right]} = \frac{MSF_B}{MSR}
$$

(9.48)

where:

  • MSFB is the mean square of factor B.

And the ANOVA statistic for the interaction is represented by:

$$
F_{AB} = \frac{SSF_{AB} / \left[ (a-1)(b-1) \right]}{RSS / \left[ (n-1)\,ab \right]} = \frac{MSF_{AB}}{MSR}
$$

(9.49)

where:

  • MSFAB is the mean square of the interaction.

The calculations of the two-way ANOVA are summarized in Table 9.5.

Table 9.5

Calculations of the Two-Way ANOVA

Source of Variation | Sum of Squares                                  | Degrees of Freedom | Mean Squares                          | F
Factor A            | SSF_A = bn Σ_{i=1}^{a} (Ȳ_i − Ȳ)²               | a − 1              | MSF_A = SSF_A / (a − 1)               | F_A = MSF_A / MSR
Factor B            | SSF_B = an Σ_{j=1}^{b} (Ȳ_j − Ȳ)²               | b − 1              | MSF_B = SSF_B / (b − 1)               | F_B = MSF_B / MSR
Interaction         | SSF_AB = n Σ_{i=1}^{a} Σ_{j=1}^{b} (Ȳ_ij − Ȳ_i − Ȳ_j + Ȳ)² | (a − 1)(b − 1) | MSF_AB = SSF_AB / [(a − 1)(b − 1)] | F_AB = MSF_AB / MSR
Error               | RSS = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{n} (Y_ijk − Ȳ_ij)² | (n − 1) · ab  | MSR = RSS / [(n − 1) ab]              |
Total               | TSS = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{n} (Y_ijk − Ȳ)²    | N − 1         |                                       |

Source: Fávero, L.P., Belfiore, P., Silva, F.L., Chan, B.L., 2009. Análise de dados: modelagem multivariada para tomada de decisões. Campus Elsevier, Rio de Janeiro; Maroco, J., 2014. Análise estatística com o SPSS Statistics, sixth ed. Edições Sílabo, Lisboa.

The calculated values of the statistics (FAcal, FBcal, and FABcal) must be compared to the critical values obtained from the F-distribution table (Table A in the Appendix): FAc = Fa − 1, (n − 1)ab, α, FBc = Fb − 1, (n − 1)ab, α, and FABc = F(a − 1)(b − 1), (n − 1)ab, α. For each statistic, if the value lies in the critical region (FAcal > FAc, FBcal > FBc, FABcal > FABc), we must reject the null hypothesis. Otherwise, we do not reject H0.
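The full procedure (sums of squares, mean squares, F statistics, and the comparison against the F distribution) can be sketched in Python with NumPy and SciPy. The data below are simulated with a hypothetical effect on factor A; instead of looking up critical values in a table, the sketch computes the p-values directly as upper-tail areas of the F distribution:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
a, b, n = 3, 4, 6
# Simulated balanced data with a genuine factor A effect (hypothetical sizes)
alpha = np.array([3.0, 0.0, -3.0])
Y = 70.0 + alpha[:, None, None] + rng.normal(0.0, 2.0, size=(a, b, n))

Ybar = Y.mean()
Ybar_ij = Y.mean(axis=2)
Ybar_i = Y.mean(axis=(1, 2))
Ybar_j = Y.mean(axis=(0, 2))

RSS = ((Y - Ybar_ij[:, :, None]) ** 2).sum()
SSFA = b * n * ((Ybar_i - Ybar) ** 2).sum()
SSFB = a * n * ((Ybar_j - Ybar) ** 2).sum()
SSFAB = n * ((Ybar_ij - Ybar_i[:, None]
              - Ybar_j[None, :] + Ybar) ** 2).sum()

df_err = a * b * (n - 1)          # same as (n - 1) * ab in Table 9.5
MSR = RSS / df_err
FA = (SSFA / (a - 1)) / MSR                 # (9.47)
FB = (SSFB / (b - 1)) / MSR                 # (9.48)
FAB = (SSFAB / ((a - 1) * (b - 1))) / MSR   # (9.49)

# p-values: upper-tail areas under the null F distributions
pA = f_dist.sf(FA, a - 1, df_err)
pB = f_dist.sf(FB, b - 1, df_err)
pAB = f_dist.sf(FAB, (a - 1) * (b - 1), df_err)
```

Rejecting H0 when the calculated statistic exceeds the tabled critical value is equivalent to rejecting when the corresponding p-value falls below α; here, only factor A (which was simulated with a real effect) should yield a small p-value.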

Example 9.13

Using the Two-Way ANOVA

A sample of 24 passengers who traveled from Sao Paulo to Campinas in a certain week is collected. The following variables are analyzed: (1) the travel time in minutes, (2) the bus company chosen, and (3) the day of the week. The main objective is to verify whether the travel time depends on the bus company, whether it depends on the day of the week, and whether there is an interaction between the bus company and the day of the week. The levels considered for the variable bus company are Company A (1), Company B (2), and Company C (3). The levels regarding the day of the week are Monday (1), Tuesday (2), Wednesday (3), Thursday (4), Friday (5), Saturday (6), and Sunday (7). The results of the sample are shown in Table 9.E.17 and are also available in the file Two_Way_ANOVA.sav. Test these hypotheses, considering a 5% significance level.

Table 9.E.17

Data From Example 9.13 (Using the Two-Way ANOVA)
Time (min)   Company   Day of the Week
90           2         4
100          1         5
72           1         6
76           3         1
85           2         2
95           1         5
79           3         1
100          2         4
70           1         7
80           3         1
85           2         3
90           1         5
77           2         7
80           1         2
85           3         4
74           2         7
72           3         6
92           1         5
84           2         4
80           1         3
79           2         1
70           3         6
88           3         5
84           2         4

9.8.2.1.1 Solving the Two-Way ANOVA Test by Using SPSS Software

The use of the images in this section has been authorized by the International Business Machines Corporation©.

  • Step 1: In this case, the most suitable test is the two-way ANOVA.

First, we must verify if there is normality in the variable Time (metric) in the model (as shown in Fig. 9.46). According to this figure, we can conclude that variable Time follows a normal distribution, with a 95% confidence level. The hypothesis of variance homogeneity will be verified in Step 4.

  • Step 2: The null hypothesis H0 of the two-way ANOVA for this example assumes that the population means of each level of the factor Company and of each level of the factor Day_of_the_week are equal, that is, H0A : μ1 = μ2 = μ3 and H0B : μ1 = μ2 = … = μ7.
Fig. 9.46
Fig. 9.46 Results of the normality tests on SPSS.

The null hypothesis H0 also states that there is no interaction between the factor Company and the factor Day_of_the_week, that is, H0: γij = 0 for all i, j.

  • Step 3: The significance level to be considered is 5%.
  • Step 4: The F statistics in ANOVA for the factor Company, for the factor Day_of_the_week, and for the interaction Company ⁎ Day_of_the_week will be obtained through the SPSS software, according to the procedure specified below.

In order to do that, let’s click on Analyze → General Linear Model → Univariate …, as shown in Fig. 9.47.

Fig. 9.47
Fig. 9.47 Procedure for elaborating the two-way ANOVA on SPSS.

After that, let’s include the variable Time in the Dependent Variable box, and the variables Company and Day_of_the_week in the Fixed Factor(s) box, as shown in Fig. 9.48.

Fig. 9.48
Fig. 9.48 Selection of the variables to elaborate the two-way ANOVA.

In this example, both factors are fixed. If one of the factors had been chosen randomly, it would be inserted into the box Random Factor(s), resulting in a mixed-effects ANOVA. The button Model … defines the variance analysis model to be tested. Through the button Contrasts …, we can assess whether one category of a factor is significantly different from the other categories of the same factor. Charts can be constructed through the button Plots …, allowing us to visualize the existence or nonexistence of interactions between the factors. The button Post Hoc …, on the other hand, allows us to perform multiple comparisons of means. Finally, through the button Options …, we can obtain descriptive statistics and the result of Levene’s variance homogeneity test, as well as select the appropriate significance level (Fávero et al., 2009; Maroco, 2014).

Therefore, since we want to test variance homogeneity, we must select, in Options …, the option Homogeneity tests, as shown in Fig. 9.49.

Fig. 9.49
Fig. 9.49 Test of variance homogeneity.

Finally, let’s click on Continue and on OK to obtain Levene’s variance homogeneity test and the two-way ANOVA table.

In Fig. 9.50, we can see that the variances between groups are homogeneous (P = 0.451 > 0.05).

Fig. 9.50
Fig. 9.50 Results of Levene’s test on SPSS.

Based on Fig. 9.51, we can conclude that there are no significant differences between the travel times of the companies analyzed, that is, the factor Company does not have a significant impact on the variable Time (P = 0.330 > 0.05).

Fig. 9.51
Fig. 9.51 Results of the two-way ANOVA for Example 9.13 on SPSS.

On the other hand, we conclude that there are significant differences between the days of the week, that is, the factor Day_of_the_week has a significant effect on the variable Time (P = 0.003 < 0.05).

We finally conclude that there is no significant interaction, with a 95% confidence level, between the two factors Company and Day_of_the_week, since P = 0.898 > 0.05.

9.8.2.1.2 Solving the Two-Way ANOVA Test by Using Stata Software

The use of the images in this section has been authorized by StataCorp LP©.

The command anova in Stata specifies the dependent variable being analyzed, as well as the respective factors. Interactions are specified with the character # between the factors. Thus, the two-way ANOVA is generated through the following syntax:

  • anova variableY factorA factorB factorA#factorB

or simply:

  • anova variableY factorA##factorB

in which the term variableY should be substituted for the quantitative dependent variable, and the terms factorA and factorB for the respective factors.

If we type the syntax anova variableY factorA factorB, only the main effects of the two factors will be tested, without their interaction.

The data presented in Example 9.13 are available in the file Two_Way_ANOVA.dta. The quantitative dependent variable is called time and the factors correspond to the variables company and day_of_the_week. Thus, we must type the following command:

anova time company##day_of_the_week

The results can be seen in Fig. 9.52 and are similar to those presented on SPSS, which allows us to conclude, with a 95% confidence level, that only the factor day_of_the_week has a significant effect on the variable time (P = 0.003 < 0.05), and that there is no significant interaction between the two factors analyzed (P = 0.898 > 0.05).

Fig. 9.52
Fig. 9.52 Results of the two-way ANOVA for Example 9.13 on Stata.

9.8.2.2 ANOVA With More Than Two Factors

The two-way ANOVA can be generalized to three or more factors. According to Maroco (2014), the model becomes very complex, since the effects of the multiple interactions can confound the interpretation of the main effects of the factors. The generic model with three factors presented by the author is:

$$
Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}
$$

(9.50)
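In practice, a model such as (9.50) is rarely assembled by hand; statistical packages expand it from a formula. Below is a hedged sketch in Python using statsmodels (the data are simulated and the factor names f1, f2, f3 are hypothetical): the formula term f1 * f2 * f3 expands to all main effects plus all two- and three-way interactions, mirroring the terms of (9.50).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)

# Balanced 2 x 2 x 2 design with 5 replicates per cell (simulated data)
levels = [(i, j, k) for i in "AB" for j in "CD" for k in "EF"]
rows = [(i, j, k, rng.normal(10.0, 1.0))
        for (i, j, k) in levels for _ in range(5)]
df = pd.DataFrame(rows, columns=["f1", "f2", "f3", "y"])

# f1 * f2 * f3 expands to f1 + f2 + f3 + f1:f2 + f1:f3 + f2:f3 + f1:f2:f3
model = smf.ols("y ~ f1 * f2 * f3", data=df).fit()
table = anova_lm(model, typ=2)
print(table)  # seven effect rows plus the residual row
```

The resulting ANOVA table has one row per effect (three main effects, three two-way interactions, one three-way interaction) plus the residual, which illustrates Maroco’s (2014) point about how quickly the number of terms grows.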

9.9 Final Remarks

This chapter presented the concepts and objectives of parametric hypotheses tests and the general procedures for constructing each one of them.

We studied the main types of tests and the situations in which each one of them must be used. Moreover, the advantages and disadvantages of each test were established, as well as their assumptions.

We studied the tests for normality (Kolmogorov-Smirnov, Shapiro-Wilk, and Shapiro-Francia), variance homogeneity tests (Bartlett’s χ2, Cochran’s C, Hartley’s Fmax, and Levene’s F), Student’s t-test for one population mean, for two independent means, and for two paired means, as well as ANOVA and its extensions.

Regardless of the application’s main goal, parametric tests can provide good and interesting research results that will be useful in the decision-making process. From a conscious choice of the modeling software, the correct use of each test must always be made based on the underlying theory, without ever ignoring the researcher’s experience and intuition.

9.10 Exercises

  (1) In what situations should parametric tests be applied, and what are the assumptions of these tests?
  (2) What are the advantages and disadvantages of parametric tests?
  (3) What are the main parametric tests used to verify the normality of the data? In what situations must we use each one of them?
  (4) What are the main parametric tests used to verify the variance homogeneity between groups? In what situations must we use each one of them?
  (5) To test a single population mean, we can use the z-test or Student’s t-test. In what cases must each one of them be applied?
  (6) What are the main mean comparison tests? What are the assumptions of each test?
  (7) The monthly aircraft sales data throughout last year can be seen in the table below. Check whether the data are normally distributed. Consider α = 5%.
Jan.  Feb.  Mar.  Apr.  May  Jun.  Jul.  Aug.  Sept.  Oct.  Nov.  Dec.
48    52    50    49    47   50    51    54    39     56    52    55

  (8) Test the normality of the temperature data listed (α = 5%):

12.5  14.2  13.4  14.6  12.7  10.9  16.5  14.7  11.2  10.9  12.1  12.8
13.8  13.5  13.2  14.1  15.5  16.2  10.8  14.3  12.8  12.4  11.4  16.2
14.3  14.8  14.6  13.7  13.5  10.8  10.4  11.5  11.9  11.3  14.2  11.2
13.4  16.1  13.5  17.5  16.2  15.0  14.2  13.2  12.4  13.4  12.7  11.2

  1. (9) The table shows the final grades of two students in nine subjects. Check and see if there is variance homogeneity between the students (α = 5%).
Student 16.45.86.95.47.38.26.15.56.0
Student 26.57.07.56.58.19.07.56.56.8

Unlabelled Table

  (10) A fat-free yogurt manufacturer states that the number of calories in each cup is 60 cal. In order to check whether this information is true, a random sample of 36 cups is collected, and the average number of calories observed is 65 cal, with a standard deviation of 3.5. Apply the appropriate test and check whether the manufacturer’s statement is true, considering a significance level of 5%.
  (11) We would like to compare the average waiting time before being seen by a doctor (in minutes) in two hospitals. In order to do that, we collected a sample of 20 patients from each hospital. The data are shown below. Check whether there are differences between the average waiting times in the two hospitals. Consider α = 1%.

Hospital 1:
72  58  91  88  70  76  98  101  65  73
79  82  80  91  93  88  97  83   71  74

Hospital 2:
66  40  55  70  76  61  53  50  47  61
52  48  60  72  57  70  66  55  46  51

  (12) Thirty teenagers whose total cholesterol level is higher than advisable underwent a treatment that consisted of a diet and physical activities. The tables show their levels of LDL cholesterol (mg/dL) before and after the treatment. Check whether the treatment was effective (α = 5%).

Before the treatment:
220  212  227  234  204  209  211  245  237  250
208  224  220  218  208  205  227  207  222  213
210  234  240  227  229  224  204  210  215  228

After the treatment:
195  180  200  204  180  195  200  210  205  211
175  198  195  200  190  200  222  198  201  194
190  204  230  222  209  198  195  190  201  210

  (13) An aerospace company produces civilian and military helicopters at its three factories. The tables show the monthly production of helicopters in the last 12 months at each factory. Check whether there is a difference between the population means. Consider α = 5%.

Factory 1: 24  26  28  22  31  25  27  28  30  21  20  24
Factory 2: 28  26  24  30  24  27  25  29  30  27  26  25
Factory 3: 29  25  24  26  20  22  22  27  20  26  24  25

References

Brown M.B., Forsythe A.B. Robust tests for the equality of variances. J. Am. Stat. Assoc. 1974;69(346):364–367.

Fávero L.P., Belfiore P., Silva F.L., Chan B.L. Análise de dados: modelagem multivariada para tomada de decisões. Rio de Janeiro: Campus Elsevier; 2009.

Maroco J. Análise estatística com o SPSS Statistics. sixth ed. Lisboa: Edições Sílabo; 2014.

Pestana M.H., Gageiro J.N. Análise de dados para ciências sociais: a complementaridade do SPSS. 5. ed. Lisboa: Edições Sílabo; 2008.

Sarkadi K. The consistency of the Shapiro-Francia test. Biometrika. 1975;62(2):445–450.

Shapiro S.S., Francia R.S. An approximate analysis of variance test for normality. J. Am. Stat. Assoc. 1972;67:215–216.

Shapiro S.S., Wilk M.B. An analysis of variance test for normality (complete samples). Biometrika. 1965;52:591–611.

