This chapter discusses the circumstances in which the principal component factor analysis technique can be used. Several factor analysis concepts, calculations, and interpretations are presented: the concept of factor; an evaluation of the factor analysis’s overall adequacy through the KMO statistic and Bartlett’s test of sphericity; concepts of eigenvalues and eigenvectors in Pearson’s correlation matrices; the calculation and interpretation of factor scores, and, from these, the definition of factors; the calculation and interpretation of factor loadings and communalities; the construction of loading plots; concepts of factor rotation and preparation of the Varimax orthogonal rotation; and the construction of performance rankings from the joint behavior of variables. The principal component factor analysis technique is elaborated algebraically and also by using IBM SPSS Statistics Software® and Stata Statistical Software®, and the results are interpreted.
Principal component factor analysis; KMO statistic; Bartlett’s test of sphericity; Eigenvalues and eigenvectors in Pearson’s correlation matrices; Factor scores; Factor loadings and communalities; Factor rotation; Varimax orthogonal rotation; SPSS and Stata Software
Love and truth are so intertwined that it is practically impossible to disentangle and separate them.
They are like the two sides of a coin.
Mahatma Gandhi
Exploratory factor analysis techniques are very useful when we intend to work with variables that have relatively high correlation coefficients between themselves and we wish to establish new variables that capture the joint behavior of the original variables. Each one of these new variables is called a factor, which can be understood as a cluster of variables formed according to previously established criteria. Therefore, factor analysis is a multivariate technique that tries to identify a relatively small number of factors that represent the joint behavior of interdependent original variables. Thus, while cluster analysis, studied in the previous chapter, uses distance or similarity measures to group observations and form clusters, factor analysis uses correlation coefficients to group variables and generate factors.
Among the methods used to determine factors, the one known as principal components is, without a doubt, the most widely used in factor analysis, because it is based on the assumption that uncorrelated factors can be extracted from linear combinations of the original variables. Consequently, from a set of original variables correlated to one another, the principal component factor analysis allows another set of variables (factors) resulting from the linear combination of the first set to be determined.
Even though, as we know, the term confirmatory factor analysis often appears in the existing literature, factor analysis is essentially an exploratory, or interdependence, multivariate technique, since it does not have a predictive nature for observations not initially present in the sample; the inclusion of new observations in the dataset makes it necessary to reapply the technique, so that new, more accurate and updated factors can be generated. According to Reis (2001), factor analysis can be used with the main exploratory goal of reducing the data dimension, aiming at creating factors from the original variables, as well as with the objective of confirming an initial hypothesis that the data may be reduced to a certain factor, or a certain dimension, previously established. Regardless of the objective, factor analysis remains exploratory. If researchers aim to use a technique to, in fact, confirm the relationships found in the factor analysis, they can use structural equation modeling, for instance.
The principal component factor analysis has four main objectives: (1) to identify correlations between the original variables to create factors that represent the linear combination of those variables (structural reduction); (2) to verify the validity of previously established constructs, bearing in mind the allocation of the original variables to each factor; (3) to prepare rankings by generating performance indexes from the factors; and (4) to extract orthogonal factors for future use in confirmatory multivariate techniques that need the absence of multicollinearity.
Imagine that a researcher is interested in studying the interdependence between several quantitative variables that translate the socioeconomic behavior of a nation’s municipalities. In this situation, factors that may possibly explain the behavior of the original variables can be determined, and, in this regard, the factor analysis is used to reduce the data structurally and, later on, to create a socioeconomic index that captures the joint behavior of these variables. From this index, we may even propose a performance ranking of the municipalities, and the factors themselves can be used in a possible cluster analysis.
In another situation, factors extracted from the original variables can be used as explanatory variables of another variable (dependent), not initially considered in the analysis. For example, factors obtained from the joint behavior of grades in certain 12th grade subjects can be used as explanatory variables of students’ general classification in the college entrance exams, or whether students passed the exams or not. In these situations, note that the factors (orthogonal to one another) are used, instead of the original variables themselves, as explanatory variables of a certain phenomenon in confirmatory multivariate models, such as, multiple or logistic regression, in order to eliminate possible multicollinearity problems. Nevertheless, it is important to highlight that this procedure only makes sense when we intend to elaborate a diagnostic regarding the dependent variable’s behavior, without aiming at having forecasts for other observations not initially present in the sample. Since new observations do not have the corresponding values of the factors generated, obtaining these values is only possible if we include such observations in a new factor analysis.
In a third situation, imagine that a retailer is interested in assessing their clients’ level of satisfaction by applying a questionnaire in which the questions have been previously classified into certain groups. For instance, questions A, B, and C were classified into the group quality of services rendered, questions D and E, into the group positive perception of prices, and questions F, G, H, and I, into the group variety of goods. After applying the questionnaire to a significant number of customers, in which these nine variables are collected by attributing scores that vary from 0 to 10, the retailer has decided to elaborate a principal component factor analysis to verify if, in fact, the combination of variables reflects the construct previously established. If this occurs, the factor analysis will have been used to validate the construct, presenting a confirmatory objective.
In all of these situations, we can see that the original variables from which the factors will be extracted are quantitative, because a factor analysis begins with the study of the behavior of Pearson’s correlation coefficients between the variables. Nonetheless, it is common for researchers to use the incorrect arbitrary weighting procedure with qualitative variables, as, for example, variables on the Likert scale, and, from then on, to apply a factor analysis. This is a serious error! There are exploratory techniques meant exclusively for studying the behavior of qualitative variables as, for instance, the correspondence analysis and homogeneity analysis, and a factor analysis is definitely not meant for such purpose, as discussed by Fávero and Belfiore (2017).
In a historical context, the development of factor analyses is partly due to Pearson’s (1896) and Spearman’s (1904) pioneer work. While Karl Pearson developed a rigorous mathematical treatment regarding what we traditionally call correlation at the beginning of the 20th century, Charles Edward Spearman published highly original work in which the interrelationships between students’ performance in several subjects, such as, French, English, Mathematics and Music were evaluated. Since the grades in these subjects showed strong correlation, Spearman proposed that scores resulting from apparently incompatible tests shared a single general factor, and students who got good grades had a more developed psychological or intelligence component. Generally speaking, Spearman excelled in applying mathematical methods and correlation studies to the analysis of the human mind.
Decades later, in 1933, Harold Hotelling, a statistician, mathematician, and influential economics theoretician decided to call Principal Component Analysis the analysis that determines components from the maximization of the original data’s variance. Also in the first half of the 20th century, psychologist Louis Leon Thurstone, from an investigation of Spearman’s ideas and based on the application of certain psychological tests, whose results were submitted to a factor analysis, identified people’s seven primary mental abilities: spatial visualization, verbal meaning, verbal fluency, perceptual speed, numerical ability, reasoning, and rote memory. In psychology, the term mental factors is even used for variables that have greater influence over a certain behavior.
Currently, factor analysis is used in several fields of knowledge, such as, marketing, economics, strategy, finance, accounting, actuarial science, engineering, logistics, psychology, medicine, ecology and biostatistics, among others.
The principal component factor analysis must be defined based on the underlying theory and on the researcher’s experience, so that it can be possible to apply the technique correctly and to analyze the results obtained.
In this chapter, we will discuss the principal component factor analysis technique, with the following objectives: (1) to introduce the concepts; (2) to present the step by step of modeling in an algebraic and practical way; (3) to interpret the results obtained; and (4) to show the application of the technique in SPSS and Stata. Following the logic proposed in the book, first, we develop the algebraic solution of an example linked to the presentation of the concepts. Only after introducing these concepts will the procedures for running the technique in SPSS and Stata be presented and discussed.
There are many procedures inherent to factor analysis, with different methods for determining (extracting) factors from Pearson’s correlation matrix. The most frequently used method, adopted in this chapter for extracting factors, is known as principal components, in which the resulting structural reduction is also called the Karhunen-Loève transformation.
In the following sections, we will discuss the theoretical development of the technique, as well as a practical example. While the main concepts will be presented in Sections 12.2.1–12.2.5, Section 12.2.6 is meant for solving a practical example algebraically, from a dataset.
Let’s imagine a dataset that has n observations and, for each observation i (i = 1, …, n), values corresponding to each one of the k metric variables X, as shown in Table 12.1.
Table 12.1
Observation i | X1i | X2i | … | Xki |
---|---|---|---|---|
1 | X11 | X21 | … | Xk1 |
2 | X12 | X22 | … | Xk2 |
3 | X13 | X23 | … | Xk3 |
⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
n | X1n | X2n | … | Xkn |
From the dataset, and given our intention of extracting factors from k variables X, we must define correlation matrix ρ that displays the values of Pearson’s linear correlation between each pair of variables, as shown in Expression (12.1).
$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \tag{12.1}$$
Correlation matrix ρ is symmetrical in relation to the main diagonal that, obviously, shows values equal to 1. For example, for variables X1 and X2, Pearson’s correlation ρ12 can be calculated by using Expression (12.2).
$$\rho_{12} = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)\cdot(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2} \cdot \sqrt{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}} \tag{12.2}$$
where $\bar{X}_1$ and $\bar{X}_2$ represent the means of variables X1 and X2, respectively, and this expression is analogous to Expression (4.11), defined in Chapter 4.
Thus, since Pearson’s correlation is a measure of the level of linear relationship between two metric variables, which may vary between − 1 and 1, a value closer to one of these extreme values indicates the existence of a linear relationship between the two variables under analysis, which, therefore, may significantly contribute to the extraction of a single factor. On the other hand, a Pearson correlation that is very close to 0 indicates that the linear relationship between the two variables is practically nonexistent. Therefore, different factors can be extracted.
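Although this chapter elaborates the technique in SPSS and Stata, the calculation in Expression (12.2) can be sketched in a few lines of Python with NumPy (our choice, not a tool used in the book); the data matrix below is purely hypothetical, built so that X1 and X2 are strongly correlated while X3 is not:

```python
import numpy as np

# Hypothetical data matrix: n = 6 observations, k = 3 variables (columns).
# X1 and X2 are strongly related; X3 varies independently of them.
X = np.array([
    [1.0,  2.1, 5.0],
    [2.0,  3.9, 1.0],
    [3.0,  6.2, 4.0],
    [4.0,  7.8, 2.0],
    [5.0, 10.1, 6.0],
    [6.0, 12.0, 3.0],
])

# Pearson correlation matrix rho (Expression 12.1); rowvar=False tells
# np.corrcoef that variables are in columns, not rows.
rho = np.corrcoef(X, rowvar=False)

# Expression (12.2) computed explicitly for the pair (X1, X2):
x1, x2 = X[:, 0], X[:, 1]
r12 = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (
    np.sqrt(np.sum((x1 - x1.mean()) ** 2))
    * np.sqrt(np.sum((x2 - x2.mean()) ** 2))
)
```

For these hypothetical values, `r12` computed term by term matches `rho[0, 1]` and is close to 1, while the correlations involving X3 are much weaker.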
Let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). A three-dimensional scatter plot can be constructed from the values of each variable for each observation. The plot can be seen in Fig. 12.1.
Only based on the visual analysis of the chart in Fig. 12.1, it is difficult to assess the behavior of the linear relationships between each pair of variables. Thus, Fig. 12.2 shows the projection of the points that correspond to each observation in each one of the planes formed by the pairs of variables, highlighting, in the dotted line, the adjustment that represents the linear relationship between the respective variables.
While Fig. 12.2A shows that there is a significant linear relationship between variables X1 and X2 (a very high Pearson correlation), Fig. 12.2B and C make it very clear that there is no linear relationship between X3 and these variables. Fig. 12.3 displays these projections in a three-dimensional plot, with the respective linear adjustments in each plane (the dotted lines).
Thus, in this hypothetical example, while variables X1 and X2 may be represented by a single factor in a very significant way, which we will call F1, variable X3 may be represented by another factor, F2, orthogonal to F1. Fig. 12.4 illustrates the extraction of these new factors in a three-dimensional way.
So, factors can be understood as representations of latent dimensions that explain the behavior of the original variables.
Having presented these initial concepts, it is important to emphasize that in many cases researchers may choose to not extract a factor represented in a considerable way by only one variable (in this case, factor F2), and what will define the extraction of each one of the factors is the calculation of the eigenvalues from correlation matrix ρ, as we will study in Section 12.2.3. Nevertheless, before that, it will be necessary to check the overall adequacy of the factor analysis, which will be discussed in the following section.
An adequate extraction of factors from the original variables requires correlation matrix ρ to have relatively high and statistically significant values. As discussed by Hair et al. (2009), even though visually analyzing correlation matrix ρ does not reveal if the factor extraction will in fact be adequate, a significant number of values less than 0.30 represent a preliminary indication that the factor analysis may not be adequate.
In order to verify the overall adequacy of the factor extraction itself, we must use the Kaiser-Meyer-Olkin statistic (KMO) and Bartlett’s test of sphericity.
The KMO statistic gives us the proportion of variance considered common to all the variables in the sample under analysis, that is, which can be attributed to the existence of a common factor. This statistic varies from 0 to 1 and, while values closer to 1 indicate that the variables share a very high proportion of variance (high Pearson correlations), values closer to 0 are a result of low Pearson correlations between the variables, which may indicate that the factor analysis will not be adequate. The KMO statistic, presented initially by Kaiser (1970), can be calculated through Expression (12.3).
$$\mathrm{KMO} = \frac{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^{2}}{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^{2} + \sum_{l=1}^{k}\sum_{c=1}^{k} \varphi_{lc}^{2}}, \quad l \neq c \tag{12.3}$$
where l and c represent the rows and columns of correlation matrix ρ, respectively, and the terms φ represent the partial correlation coefficients between two variables. While Pearson’s correlation coefficients ρ are also called zero-order correlation coefficients, partial correlation coefficients φ are also known as higher-order correlation coefficients. For three variables, they are also called first-order correlation coefficients, for four variables, second-order correlation coefficients, and so on.
Let’s imagine a hypothetical situation in which a certain dataset once again shows three variables (k = 3). Can ρ12, in fact, reflect the level of linear relationship between X1 and X2 if variable X3 is related to the other two? In this situation, ρ12 may not represent the true level of linear relationship between X1 and X2 when X3 is present, which may give a false impression regarding the nature of the relationship between the first two. Thus, partial correlation coefficients may contribute to the analysis, since, according to Gujarati and Porter (2008), they are used when researchers wish to find out the correlation between two variables while controlling for, or ignoring, the effects of other variables present in the dataset. For our hypothetical situation, this is the correlation between X1 and X2 free of X3’s influence over them, if any.
Hence, for three variables X1, X2, and X3, we can define the first-order correlation coefficients the following way:
$$\varphi_{12,3} = \frac{\rho_{12} - \rho_{13}\cdot\rho_{23}}{\sqrt{(1-\rho_{13}^{2})\cdot(1-\rho_{23}^{2})}}$$
where φ12,3 represents the correlation between X1 and X2, maintaining X3 constant,
$$\varphi_{13,2} = \frac{\rho_{13} - \rho_{12}\cdot\rho_{23}}{\sqrt{(1-\rho_{12}^{2})\cdot(1-\rho_{23}^{2})}}$$
where φ13,2 represents the correlation between X1 and X3, maintaining X2 constant, and
$$\varphi_{23,1} = \frac{\rho_{23} - \rho_{12}\cdot\rho_{13}}{\sqrt{(1-\rho_{12}^{2})\cdot(1-\rho_{13}^{2})}}$$
where φ23,1 represents the correlation between X2 and X3, maintaining X1 constant.
In general, a first-order correlation coefficient can be obtained through the following expression:
$$\varphi_{ab,c} = \frac{\rho_{ab} - \rho_{ac}\cdot\rho_{bc}}{\sqrt{(1-\rho_{ac}^{2})\cdot(1-\rho_{bc}^{2})}}$$
where a, b, and c can assume values 1, 2, or 3, corresponding to the three variables under analysis.
Conversely, for a case in which there are four variables in the analysis, the general expression of a certain partial correlation coefficient (second-order correlation coefficient) is given by:
$$\varphi_{ab,cd} = \frac{\varphi_{ab,c} - \varphi_{ad,c}\cdot\varphi_{bd,c}}{\sqrt{(1-\varphi_{ad,c}^{2})\cdot(1-\varphi_{bd,c}^{2})}}$$
where φab,cd represents the correlation between Xa and Xb, maintaining Xc and Xd constant, bearing in mind that a, b, c, and d may take on values 1, 2, 3, or 4, which correspond to the four variables under analysis.
Obtaining higher-order correlation coefficients, in which five or more variables are considered in the analysis, should always be done based on the determination of lower-order partial correlation coefficients. In Section 12.2.6, we will propose a practical example by using four variables, in which the algebraic solution of the KMO statistic will be obtained through Expression (12.8).
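As a brief illustration (the correlation matrix below is hypothetical, and Python with NumPy is our choice of tool, not the book’s), the general first-order expression can be coded directly:

```python
import numpy as np

def partial_corr_first_order(rho, a, b, c):
    """phi_ab,c: correlation between variables a and b, holding c constant."""
    return (rho[a, b] - rho[a, c] * rho[b, c]) / np.sqrt(
        (1 - rho[a, c] ** 2) * (1 - rho[b, c] ** 2)
    )

# Hypothetical Pearson correlation matrix for k = 3 variables
rho = np.array([[1.0, 0.8, 0.4],
                [0.8, 1.0, 0.3],
                [0.4, 0.3, 1.0]])

phi12_3 = partial_corr_first_order(rho, 0, 1, 2)   # approx. 0.778
```

For k = 3 the same value can also be obtained from the inverse of ρ, since the partial correlation controlling for all remaining variables equals $-q_{12}/\sqrt{q_{11}\cdot q_{22}}$, where the q are entries of ρ⁻¹; with only one variable to control for, the two routes coincide.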
It is important to highlight that, even if Pearson’s correlation coefficient between two variables is 0, the partial correlation coefficient between them may not be equal to 0, depending on the values of Pearson’s correlation coefficients between each one of these variables and the others present in the dataset.
In order for a factor analysis to be considered adequate, the partial correlation coefficients between the variables must be low. This fact denotes that the variables share a high proportion of variance, and disregarding one or more of them in the analysis may hamper the quality of the factor extraction. Therefore, according to a widely accepted criterion found in the existing literature, Table 12.2 gives us an indication of the relationship between the KMO statistic and the overall adequacy of the factor analysis.
Table 12.2
KMO Statistic | Overall Adequacy of the Factor Analysis |
---|---|
Between 1.00 and 0.90 | Marvelous |
Between 0.90 and 0.80 | Meritorious |
Between 0.80 and 0.70 | Middling |
Between 0.70 and 0.60 | Mediocre |
Between 0.60 and 0.50 | Miserable |
Less than 0.50 | Unacceptable |
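To make Expression (12.3) concrete, the sketch below (Python/NumPy, hypothetical data) computes the KMO statistic; the partial correlations controlling for all remaining variables are obtained at once from the inverse of ρ, a standard shortcut we adopt here rather than chaining the recursive formulas:

```python
import numpy as np

def kmo(rho):
    """KMO statistic of Expression (12.3). The partial correlations phi,
    controlling for all remaining variables, come from the inverse of the
    correlation matrix: phi_lc = -q_lc / sqrt(q_ll * q_cc)."""
    q = np.linalg.inv(rho)
    phi = -q / np.sqrt(np.outer(np.diag(q), np.diag(q)))
    off = ~np.eye(rho.shape[0], dtype=bool)     # positions with l != c
    sum_rho2 = np.sum(rho[off] ** 2)
    sum_phi2 = np.sum(phi[off] ** 2)
    return sum_rho2 / (sum_rho2 + sum_phi2)

# Hypothetical correlation matrix with high pairwise correlations
rho = np.array([[1.00, 0.80, 0.75],
                [0.80, 1.00, 0.70],
                [0.75, 0.70, 1.00]])

kmo_value = kmo(rho)
```

Note that for only two variables this function returns exactly 0.50, in line with the discussion of the two-variable case later in this chapter.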
On the other hand, Bartlett’s test of sphericity (Bartlett, 1954) consists in comparing correlation matrix ρ to an identity matrix I of the same dimension. If the differences between the corresponding values outside the main diagonal of each matrix are not statistically different from 0, at a certain significance level, we may consider that the factor extraction will not be adequate. In other words, in this case, Pearson’s correlations between each pair of variables are statistically equal to 0, which makes any attempt of performing a factor extraction from the original variables unfeasible. So, we can define the null and alternative hypotheses of Bartlett’s test of sphericity the following way:
$$H_0: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
$$H_1: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \neq I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
The statistic corresponding to Bartlett’s test of sphericity is a χ² statistic, which has the following expression:
$$\chi^2_{\mathrm{Bartlett}} = -\left[(n-1) - \left(\frac{2\cdot k + 5}{6}\right)\right]\cdot \ln|D|$$
with $\frac{k\cdot(k-1)}{2}$ degrees of freedom. We know that n is the sample size and k is the number of variables. In addition, D represents the determinant of correlation matrix ρ.
Thus, for a certain number of degrees of freedom and a certain significance level, Bartlett’s test of sphericity allows us to check whether the value of the $\chi^2_{\mathrm{Bartlett}}$ statistic is higher than the statistic’s critical value. If this is true, we may state that Pearson’s correlations between the pairs of variables are statistically different from 0 and that, therefore, factors can be extracted from the original variables and the factor analysis is adequate. When we develop a practical example in Section 12.2.6, we will also discuss the calculation of the $\chi^2_{\mathrm{Bartlett}}$ statistic and the result of Bartlett’s test of sphericity.
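The test can be sketched as follows (Python with NumPy and SciPy, both assumptions of ours rather than tools used in the chapter; the sample size n = 50 and the correlation matrix are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(rho, n):
    """Bartlett's test: chi2 = -[(n - 1) - (2k + 5)/6] * ln|D|,
    with k(k - 1)/2 degrees of freedom, where |D| = det(rho)."""
    k = rho.shape[0]
    statistic = -((n - 1) - (2 * k + 5) / 6) * np.log(np.linalg.det(rho))
    df = k * (k - 1) / 2
    p_value = chi2.sf(statistic, df)     # survival function: P(chi2 > statistic)
    return statistic, df, p_value

# Hypothetical case: two variables with a strong Pearson correlation
rho = np.array([[1.0, 0.8],
                [0.8, 1.0]])
statistic, df, p_value = bartlett_sphericity(rho, n=50)
```

With ρ = I the determinant is 1, the statistic is 0, and the null hypothesis cannot be rejected, meaning that no factor should be extracted.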
It is important to emphasize that we should always favor Bartlett’s test of sphericity over the KMO statistic when deciding on the factor analysis’s overall adequacy, given that the former is a test with a certain significance level, while the latter is only a coefficient (a statistic) calculated without any associated probability distribution or hypotheses that would allow us to evaluate a corresponding significance level.
In addition, it is important to mention that, for only two original variables, the KMO statistic will always be equal to 0.50. Conversely, the $\chi^2_{\mathrm{Bartlett}}$ statistic may indicate whether the null hypothesis of the test of sphericity is rejected or not, depending on the magnitude of Pearson’s correlation between the two variables. Thus, while the KMO statistic will be 0.50 in these situations, Bartlett’s test of sphericity will allow researchers to decide whether or not to extract one factor from the two original variables. In contrast, for three original variables, it is very common for researchers to extract two factors with statistical significance in Bartlett’s test of sphericity, yet with a KMO statistic less than 0.50. These two situations further emphasize the greater relevance of Bartlett’s test of sphericity in relation to the KMO statistic in the decision-making process.
Finally, we must mention that the recommendation to study the magnitude of Cronbach’s alpha, before studying the overall adequacy of the factor analysis, is commonly found in the existing literature, so that the reliability with which a factor can be extracted from the original variables can be evaluated. We would like to highlight that Cronbach’s alpha only offers researchers an indication of the internal consistency of the variables in the dataset for the extraction of a single factor. Therefore, determining it is not a mandatory requisite for developing the factor analysis, since the technique allows more than one factor to be extracted. Nevertheless, for pedagogical purposes, we will discuss the main concepts of Cronbach’s alpha in the Appendix of this chapter, with its algebraic determination and corresponding applications in SPSS and Stata software.
Having discussed these concepts and verified the overall adequacy of the factor analysis, we can now move on to the definition of the factors.
Since a factor represents the linear combination of the original variables, for k variables, we can define a maximum number of k factors (F1, F2, …, Fk), analogous to the maximum number of clusters that can be defined from a sample with n observations, as we discussed in the previous chapter, since a factor can also be understood as the result of the clustering of variables. Therefore, for k variables, we have:
$$\begin{aligned} F_{1i} &= s_{11}\cdot X_{1i} + s_{21}\cdot X_{2i} + \cdots + s_{k1}\cdot X_{ki} \\ F_{2i} &= s_{12}\cdot X_{1i} + s_{22}\cdot X_{2i} + \cdots + s_{k2}\cdot X_{ki} \\ &\;\,\vdots \\ F_{ki} &= s_{1k}\cdot X_{1i} + s_{2k}\cdot X_{2i} + \cdots + s_{kk}\cdot X_{ki} \end{aligned} \tag{12.10}$$
where the terms s are known as factor scores, which represent the parameters of a linear model that relates a certain factor to the original variables. Calculating the factor scores is essential in the context of the factor analysis technique and is elaborated by determining the eigenvalues and eigenvectors of correlation matrix ρ. In Expression (12.11), we once again show correlation matrix ρ, which has already been presented in Expression (12.1).
$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \tag{12.11}$$
This correlation matrix, with dimensions k × k, has k eigenvalues λ² ($\lambda_1^2 \geq \lambda_2^2 \geq \cdots \geq \lambda_k^2$), which can be obtained by solving the following equation:
$$\det(\lambda^2 \cdot I - \rho) = 0 \tag{12.12}$$
where I is the identity matrix, also with dimensions k × k.
Since a certain factor represents the result of the clustering of variables, it is important to highlight that:
$$\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2 = k$$
Expression (12.12) can be rewritten as follows:
$$\begin{vmatrix} \lambda^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda^2-1 \end{vmatrix} = 0$$
from which we can define the eigenvalue matrix Λ2 the following way:
$$\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}$$
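In practice, the eigenvalues are obtained numerically rather than by expanding the determinant by hand. A brief sketch (Python/NumPy, our choice of tool; the correlation matrix is hypothetical):

```python
import numpy as np

# Hypothetical correlation matrix (k = 3)
rho = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])

# eigh is the routine for symmetric matrices; we reorder so that
# lambda_1^2 >= lambda_2^2 >= lambda_3^2
eigenvalues, eigenvectors = np.linalg.eigh(rho)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
```

The eigenvalues always add up to k (here, 3), the trace of ρ, consistent with the expression above.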
In order to define the eigenvectors of matrix ρ based on the eigenvalues, we must solve the following equation system for each eigenvalue λ2 (λ12, λ22, …, λk2):
$$\begin{pmatrix} \lambda_1^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_1^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_1^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{k1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_1^2-1)\cdot v_{11} - \rho_{12}\cdot v_{21} - \cdots - \rho_{1k}\cdot v_{k1} = 0 \\ -\rho_{21}\cdot v_{11} + (\lambda_1^2-1)\cdot v_{21} - \cdots - \rho_{2k}\cdot v_{k1} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{11} - \rho_{k2}\cdot v_{21} - \cdots + (\lambda_1^2-1)\cdot v_{k1} = 0 \end{cases}$$
Analogously, for eigenvalue $\lambda_2^2$:

$$\begin{pmatrix} \lambda_2^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_2^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_2^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{k2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_2^2-1)\cdot v_{12} - \rho_{12}\cdot v_{22} - \cdots - \rho_{1k}\cdot v_{k2} = 0 \\ -\rho_{21}\cdot v_{12} + (\lambda_2^2-1)\cdot v_{22} - \cdots - \rho_{2k}\cdot v_{k2} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{12} - \rho_{k2}\cdot v_{22} - \cdots + (\lambda_2^2-1)\cdot v_{k2} = 0 \end{cases}$$
and, finally, for eigenvalue $\lambda_k^2$:

$$\begin{pmatrix} \lambda_k^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_k^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_k^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{1k} \\ v_{2k} \\ \vdots \\ v_{kk} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_k^2-1)\cdot v_{1k} - \rho_{12}\cdot v_{2k} - \cdots - \rho_{1k}\cdot v_{kk} = 0 \\ -\rho_{21}\cdot v_{1k} + (\lambda_k^2-1)\cdot v_{2k} - \cdots - \rho_{2k}\cdot v_{kk} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{1k} - \rho_{k2}\cdot v_{2k} - \cdots + (\lambda_k^2-1)\cdot v_{kk} = 0 \end{cases}$$
Thus, we can calculate the factor scores of each factor by determining the eigenvalues and eigenvectors of correlation matrix ρ. The factor scores vectors can be defined as follows:
$$S_1 = \begin{pmatrix} s_{11} \\ s_{21} \\ \vdots \\ s_{k1} \end{pmatrix} = \begin{pmatrix} \frac{v_{11}}{\sqrt{\lambda_1^2}} \\ \frac{v_{21}}{\sqrt{\lambda_1^2}} \\ \vdots \\ \frac{v_{k1}}{\sqrt{\lambda_1^2}} \end{pmatrix}$$
$$S_2 = \begin{pmatrix} s_{12} \\ s_{22} \\ \vdots \\ s_{k2} \end{pmatrix} = \begin{pmatrix} \frac{v_{12}}{\sqrt{\lambda_2^2}} \\ \frac{v_{22}}{\sqrt{\lambda_2^2}} \\ \vdots \\ \frac{v_{k2}}{\sqrt{\lambda_2^2}} \end{pmatrix}$$
$$S_k = \begin{pmatrix} s_{1k} \\ s_{2k} \\ \vdots \\ s_{kk} \end{pmatrix} = \begin{pmatrix} \frac{v_{1k}}{\sqrt{\lambda_k^2}} \\ \frac{v_{2k}}{\sqrt{\lambda_k^2}} \\ \vdots \\ \frac{v_{kk}}{\sqrt{\lambda_k^2}} \end{pmatrix}$$
Since the factor scores of each factor are standardized by the respective eigenvalues, the factors of the set of equations presented in Expression (12.10) must be obtained by multiplying each factor score by the corresponding original variable, standardized by using the Z-scores procedure. Thus, we can obtain each one of the factors based on the following equations:
$$\begin{aligned} F_{1i} &= \frac{v_{11}}{\sqrt{\lambda_1^2}}\cdot ZX_{1i} + \frac{v_{21}}{\sqrt{\lambda_1^2}}\cdot ZX_{2i} + \cdots + \frac{v_{k1}}{\sqrt{\lambda_1^2}}\cdot ZX_{ki} \\ F_{2i} &= \frac{v_{12}}{\sqrt{\lambda_2^2}}\cdot ZX_{1i} + \frac{v_{22}}{\sqrt{\lambda_2^2}}\cdot ZX_{2i} + \cdots + \frac{v_{k2}}{\sqrt{\lambda_2^2}}\cdot ZX_{ki} \\ &\;\,\vdots \\ F_{ki} &= \frac{v_{1k}}{\sqrt{\lambda_k^2}}\cdot ZX_{1i} + \frac{v_{2k}}{\sqrt{\lambda_k^2}}\cdot ZX_{2i} + \cdots + \frac{v_{kk}}{\sqrt{\lambda_k^2}}\cdot ZX_{ki} \end{aligned}$$
where ZXi represents the standardized value of each variable X for a certain observation i. It is important to emphasize that all the factors extracted show, between themselves, Pearson correlations equal to 0, that is, they are orthogonal to one another.
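These steps can be sketched end to end in Python with NumPy (our choice of tool; the data below are hypothetical and generated only for illustration, with X2 almost a linear function of X1 and X3 unrelated to both):

```python
import numpy as np

# Hypothetical data: n = 10 observations, k = 3 variables
rng = np.random.default_rng(42)
x1 = rng.normal(size=10)
X = np.column_stack([x1,
                     2.0 * x1 + rng.normal(scale=0.2, size=10),
                     rng.normal(size=10)])

# Z-scores (standardization with the sample standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues and eigenvectors of the correlation matrix, sorted descending
rho = np.corrcoef(X, rowvar=False)
eigenvalues, V = np.linalg.eigh(rho)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

# Factor scores s_jq = v_jq / sqrt(lambda_q^2); the factors are F = Z . S,
# one factor per column
S = V / np.sqrt(eigenvalues)
F = Z @ S
```

The correlation matrix of the resulting factors is the identity, confirming that the extracted factors are orthogonal and have unit variance.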
A more perceptive researcher will notice that the factor scores of each factor correspond exactly to the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself and, as explanatory variables, the standardized variables.
Mathematically, it is also possible to verify the existing relationship between the eigenvectors, correlation matrix ρ, and eigenvalue matrix Λ2. Consequently, defining eigenvector matrix V as follows:
$$V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix}$$
we can prove that:
$$V' \cdot \rho \cdot V = \Lambda^2$$
or:
$$\begin{pmatrix} v_{11} & v_{21} & \cdots & v_{k1} \\ v_{12} & v_{22} & \cdots & v_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ v_{1k} & v_{2k} & \cdots & v_{kk} \end{pmatrix} \cdot \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix} = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}$$
In Section 12.2.6, we will discuss a practical example from which this relationship may be demonstrated.
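The identity can also be checked numerically right away (Python/NumPy, hypothetical ρ):

```python
import numpy as np

# Hypothetical correlation matrix
rho = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

eigenvalues, V = np.linalg.eigh(rho)     # columns of V are eigenvectors
Lambda2 = V.T @ rho @ V                  # V' . rho . V

# Off-diagonal entries of Lambda2 are numerically zero; the diagonal
# holds the eigenvalues of rho.
```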
While in Section 12.2.2, we discussed the factor analysis’s overall adequacy, in this section, we will discuss the procedures for carrying out the factor extraction, if the technique is considered adequate. Even knowing that the maximum number of factors is also equal to k for k variables, it is essential for researchers to define, based on a certain criterion, the adequate number of factors that, in fact, represent the original variables. In our hypothetical example in Section 12.2.1, we saw that only two factors (F1 and F2) would be enough to represent the three original variables (X1, X2, and X3).
Although researchers are free to determine the number of factors to be extracted in the analysis, in a preliminary way, since they may wish to verify the validity of a previously established construct (procedure known as a priori criterion), for instance, it is essential to carry out an analysis based on the magnitude of the eigenvalues calculated from correlation matrix ρ.
As the eigenvalues correspond to the proportion of variance shared by the original variables to form each factor, as we will discuss in Section 12.2.4, since $\lambda_1^2 \geq \lambda_2^2 \geq \cdots \geq \lambda_k^2$ and bearing in mind that factors F1, F2, …, Fk are obtained from the respective eigenvalues, factors extracted from smaller eigenvalues are formed from smaller proportions of variance shared by the original variables. Since a factor represents a certain cluster of variables, factors extracted from eigenvalues less than 1 may possibly not be able to represent the behavior of a single original variable (of course, there are exceptions to this rule, which occur when a certain eigenvalue is less than, but very close to, 1). The criterion for choosing the number of factors in which only the factors that correspond to eigenvalues greater than 1 are considered is often used and known as the latent root criterion, or Kaiser criterion.
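As a sketch of the latent root criterion (Python/NumPy, our choice of tool; the block-structured correlation matrix below is hypothetical, with two groups of correlated variables):

```python
import numpy as np

# Hypothetical 4x4 correlation matrix: variables 1-2 form one correlated
# block, variables 3-4 another, with weak cross-block correlations
rho = np.array([[1.0, 0.8, 0.1, 0.1],
                [0.8, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.7],
                [0.1, 0.1, 0.7, 1.0]])

eigenvalues = np.sort(np.linalg.eigvalsh(rho))[::-1]   # descending order

# Latent root (Kaiser) criterion: retain factors whose eigenvalues exceed 1
n_factors = int(np.sum(eigenvalues > 1))
```

Two eigenvalues exceed 1 here, so two factors would be retained, one per block of correlated variables.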
The factor extraction method presented in this chapter is known as principal components, and the first factor F1, formed by the highest proportion of variance shared by the original variables, is also called principal factor. This method is often mentioned in the existing literature and is used in practical applications whenever researchers wish to elaborate a structural reduction of the data in order to create orthogonal factors, to define observation rankings by using the factors generated, and even to confirm the validity of previously established constructs. Other factor extraction methods, such as, the generalized least squares, unweighted least squares, maximum likelihood, alpha factoring, and image factoring, have different criteria and certain specificities and, even though they can also be found in the existing literature, they will not be discussed in this book.
Moreover, it is common to discuss the need to apply the factor analysis to variables that have multivariate normal distribution, in order to show consistency when determining the factor scores. Nevertheless, it is important to emphasize that multivariate normality is a very rigid assumption, only necessary for a few factor extraction methods, such as, the maximum likelihood method. Most factor extraction methods do not require the assumption of data multivariate normality and, as discussed by Gorsuch (1983), the principal component factor analysis seems to be, in practice, very robust against breaks in normality.
Having established the factors, we can now define the factor loadings, which simply are Pearson correlations between the original variables and each one of the factors. Table 12.3 shows the factor loadings for each variable-factor pair.
Table 12.3
Variable | Factor | |||
---|---|---|---|---|
F1 | F2 | … | Fk | |
X1 | c11 | c12 | … | c1k |
X2 | c21 | c22 | … | c2k |
⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
Xk | ck1 | ck2 | … | ckk |
Based on the latent root criterion (in which only factors resulting from eigenvalues greater than 1 are considered), we assume that the factor loadings between the factors that correspond to eigenvalues less than 1 and all the original variables are low, since they will have already presented higher Pearson correlations (loadings) with factors previously extracted from greater eigenvalues. In the same way, original variables that only share a small portion of variance with the other variables will have high factor loadings in only a single factor. If this occurs for all original variables, there will not be significant differences between correlation matrix ρ and identity matrix I, making the χBartlett2 statistic very low. This fact allows us to state that the factor analysis will not be adequate, and, in this situation, researchers may choose not to extract factors from the original variables.
As the factor loadings are Pearson correlations between each variable and each factor, the sum of the squares of these loadings in each row of Table 12.3 will always be equal to 1, since each variable shares part of its variance with all k factors, and the sum of these proportions of variance (the squared factor loadings, that is, the squared Pearson correlations) will be 100%.
Conversely, if less than k factors are extracted, due to the latent root criterion, the sum of the squared factor loadings in each row will not be equal to 1. This sum is called communality, which represents the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, we can say that:
c11² + c12² + ⋯ = communality(X1)
c21² + c22² + ⋯ = communality(X2)
⋮
ck1² + ck2² + ⋯ = communality(Xk)
The main objective of the analysis of communalities is to check if any variable ends up not sharing a significant proportion of variance with the factors extracted. Even though there is no cutoff point from which a certain communality can be considered high or low, since the sample size can interfere in this assessment, the existence of considerably low communalities in relation to the others can indicate to researchers that they may need to reconsider including the respective variable into the factor analysis.
Therefore, after defining the factors based on the factor scores, we can state that the factor loadings will be exactly the same as the parameters estimated in a multiple linear regression model that shows, as a dependent variable, a certain standardized variable ZX and, as explanatory variables, the factors themselves, and the coefficient of determination R2 of each model is equal to the communality of the respective original variable.
The sum of the squared factor loadings in each column of Table 12.3, on the other hand, will be equal to the respective eigenvalue, since the ratio between each eigenvalue and the total number of variables can be understood as the proportion of variance shared by all k original variables to form each factor. So, we can say that:
c11² + c21² + ⋯ + ck1² = λ1²
c12² + c22² + ⋯ + ck2² = λ2²
⋮
c1k² + c2k² + ⋯ + ckk² = λk²
After the factors have been established and the factor loadings calculated, some variables may still have intermediate (neither very high nor very low) Pearson correlations (factor loadings) with all the factors extracted, even though their communalities are not particularly low. Although the solution of the factor analysis has already been obtained in an adequate way and can be considered concluded, in these cases researchers can elaborate a rotation of the factors, so that the Pearson correlations between the original variables and the new factors generated are increased. In the following section, we will discuss factor rotation.
Once again, let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). After preparing the principal component factor analysis, two factors, orthogonal to one another, are extracted, with factor loadings (Pearson correlations) with each one of the three original variables, according to Table 12.4.
Table 12.4
Variable | Factor | |
---|---|---|
F1 | F2 | |
X1 | c11 | c12 |
X2 | c21 | c22 |
X3 | c31 | c32 |
In order to construct a chart with the relative positions of each variable in each factor (a chart known as loading plot), we can consider the factor loadings to be coordinates (abscissas and ordinates) of the variables in a Cartesian plane formed by both orthogonal factors. The plot can be seen in Fig. 12.5.
In order to better visualize which variables are best represented by a certain factor, we can think about a rotation around the origin of the originally extracted factors F1 and F2, so as to bring the points corresponding to variables X1, X2, and X3 closer to one of the new factors, called rotated factors F1′ and F2′. Fig. 12.6 shows this process in a simplified way.
Based on Fig. 12.6, for each variable under analysis, we can see that while the loading for one factor increases, for the other, it decreases. Table 12.5 shows the loading redistribution for our hypothetical situation.
Table 12.5
Variable | Factor | |||
---|---|---|---|---|
Original Factor Loadings | Rotated Factor Loadings | |||
F1 | F2 | F1′ | F2′ | |
X1 | c11 | c12 | |c′11| > |c11| | |c′12| < |c12| |
X2 | c21 | c22 | |c′21| > |c21| | |c′22| < |c22| |
X3 | c31 | c32 | |c′31| < |c31| | |c′32| > |c32| |
Thus, for a generic situation, we can say that rotation is a procedure that maximizes the loadings of each variable in a certain factor, to the detriment of the others. In this regard, the final effect of rotation is the redistribution of factor loadings to factors that initially had smaller proportions of variance shared by all the original variables. The main objective is to minimize the number of variables with high loadings in a certain factor, since each one of the factors will start having more significant loadings only with some of the original variables. Consequently, rotation may simplify the interpretation of the factors.
Despite the fact that the communalities and the total proportion of variance shared by all the variables in all the factors are not modified by the rotation (and neither are the KMO statistic and χBartlett²), the proportion of variance shared by the original variables with each factor is redistributed and, therefore, modified. In other words, new eigenvalues λ′² (λ1′², λ2′², …, λk′²) are obtained from the rotated factor loadings. Thus, we can say that:
c′11² + c′12² + ⋯ = communality(X1)
c′21² + c′22² + ⋯ = communality(X2)
⋮
c′k1² + c′k2² + ⋯ = communality(Xk)
and that:
c′11² + c′21² + ⋯ + c′k1² = λ1′² ≠ λ1²
c′12² + c′22² + ⋯ + c′k2² = λ2′² ≠ λ2²
⋮
c′1k² + c′2k² + ⋯ + c′kk² = λk′² ≠ λk²
even if Expression (12.13) is respected, that is:
λ1² + λ2² + ⋯ + λk² = λ1′² + λ2′² + ⋯ + λk′² = k
Besides, new rotated factor scores are obtained from the rotation of factors, s′, such that the final expressions of the rotated factors will be:
F′1i = s′11⋅ZX1i + s′21⋅ZX2i + ⋯ + s′k1⋅ZXki
F′2i = s′12⋅ZX1i + s′22⋅ZX2i + ⋯ + s′k2⋅ZXki
⋮
F′ki = s′1k⋅ZX1i + s′2k⋅ZX2i + ⋯ + s′kk⋅ZXki
It is important to highlight that the overall adequacy of the factor analysis (KMO statistic and Bartlett’s test of sphericity) is not altered by the rotation, since correlation matrix ρ continues the same.
Even though there are several factor rotation methods, the orthogonal rotation method known as Varimax, proposed by Kaiser (1958), is the most frequently used, and it will be applied in this chapter to solve a practical example. Its main purpose is to minimize the number of variables that have high loadings on a certain factor, through the redistribution of the factor loadings and the maximization of the variance shared in factors that correspond to lower eigenvalues. That is where the name Varimax comes from.
The algorithm behind the Varimax rotation method consists of determining a rotation angle θ by which pairs of factors are equally rotated. Thus, as discussed by Harman (1976), for a certain pair of factors F1 and F2, for example, the rotated factor loadings c′ between the two factors and the k original variables are obtained from the original factor loadings c through the following matrix multiplication:
(c11  c12)                        (c′11  c′12)
(c21  c22) ⋅ (cos θ  −sin θ)  =  (c′21  c′22)
( ⋮    ⋮ )   (sin θ   cos θ)     ( ⋮     ⋮ )
(ck1  ck2)                       (c′k1  c′k2)
where θ, the counterclockwise rotation angle, is obtained by the following expression:
θ = 0.25 ⋅ arctan{[2 ⋅ (D⋅k − A⋅B)] / [C⋅k − (A² − B²)]}
where:
A = Σ(l=1…k) (c1l²/communalityl − c2l²/communalityl)
B = Σ(l=1…k) (2⋅c1l⋅c2l/communalityl)
C = Σ(l=1…k) [(c1l²/communalityl − c2l²/communalityl)² − (2⋅c1l⋅c2l/communalityl)²]
D = Σ(l=1…k) [(c1l²/communalityl − c2l²/communalityl) ⋅ (2⋅c1l⋅c2l/communalityl)]
In Section 12.2.6, we will use these Varimax rotation method expressions to determine the rotated factor loadings from the original loadings.
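Expressions (12.36)–(12.40) can be wrapped into a small function. The sketch below (Python/NumPy, illustrative and not part of the book's software solutions) computes the Varimax rotation angle θ for a pair of factors; the loadings and communalities used to test it are taken from the worked example later in this chapter:

```python
import numpy as np

def varimax_angle(loadings, communality):
    """Counterclockwise Varimax rotation angle for one pair of factors.

    loadings: (k, 2) array with the original loadings c1, c2
    communality: (k,) array with each variable's communality
    """
    c1, c2 = loadings[:, 0], loadings[:, 1]
    k = loadings.shape[0]
    u = (c1**2 - c2**2) / communality   # per-variable terms of A
    v = 2 * c1 * c2 / communality       # per-variable terms of B
    A, B = u.sum(), v.sum()
    C = np.sum(u**2 - v**2)
    D = np.sum(u * v)
    return 0.25 * np.arctan(2 * (D * k - A * B) / (C * k - (A**2 - B**2)))

# Loadings and communalities from Tables 12.12 and 12.13 (chapter example)
c = np.array([[ 0.895,  0.007],
              [ 0.934,  0.049],
              [-0.042,  0.999],
              [ 0.918, -0.010]])
h2 = np.array([0.802, 0.875, 1.000, 0.843])
print(round(varimax_angle(c, h2), 3))  # 0.029 (rad), matching Section 12.2.6
```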
Besides Varimax, we can also mention other orthogonal rotation methods, such as Quartimax and Equamax, even though they are less frequently mentioned in the existing literature and less used in practice. In addition, researchers may also use oblique rotation methods, in which nonorthogonal factors are generated; although they are not discussed in this chapter, we should mention the Direct Oblimin and Promax methods in this category.
Although oblique rotation methods can sometimes be used when we wish to validate a certain construct, we recommend that an orthogonal rotation method be used whenever the factors extracted will later feed other multivariate techniques, such as certain confirmatory models in which the absence of multicollinearity among the explanatory variables is a mandatory assumption, since orthogonal rotation keeps the factors uncorrelated.
Imagine that the same professor, deeply engaged in academic and pedagogical activities, is now interested in studying how his students’ grades behave so that, afterwards, he can propose the creation of a school performance ranking.
In order to do that, he collected information on the final grades, which vary from 0 to 10, of each one of his 100 students in the following subjects: Finance, Costs, Marketing, and Actuarial Science. Part of the dataset can be seen in Table 12.6.
Table 12.6
Student | Final Grade in Finance (X1i) | Final Grade in Costs (X2i) | Final Grade in Marketing (X3i) | Final Grade in Actuarial Science (X4i) |
---|---|---|---|---|
Gabriela | 5.8 | 4.0 | 1.0 | 6.0 |
Luiz Felipe | 3.1 | 3.0 | 10.0 | 2.0 |
Patricia | 3.1 | 4.0 | 4.0 | 4.0 |
Gustavo | 10.0 | 8.0 | 8.0 | 8.0 |
Leticia | 3.4 | 2.0 | 3.2 | 3.2 |
Ovidio | 10.0 | 10.0 | 1.0 | 10.0 |
Leonor | 5.0 | 5.0 | 8.0 | 5.0 |
Dalila | 5.4 | 6.0 | 6.0 | 6.0 |
Antonio | 5.9 | 4.0 | 4.0 | 4.0 |
… | ||||
Estela | 8.9 | 5.0 | 2.0 | 8.0 |
The complete dataset can be found in the file FactorGrades.xls. Through this dataset, it is possible to construct Table 12.7, which shows Pearson’s correlation coefficients between each pair of variables, calculated by using the logic presented in Expression (12.2).
Table 12.7
finance | costs | marketing | actuarial science | |
---|---|---|---|---|
finance | 1.000 | 0.756 | − 0.030 | 0.711 |
costs | 0.756 | 1.000 | 0.003 | 0.809 |
marketing | − 0.030 | 0.003 | 1.000 | − 0.044 |
actuarial science | 0.711 | 0.809 | − 0.044 | 1.000 |
Therefore, we can write the expression of the correlation matrix ρ as follows:
ρ = (1    ρ12  ρ13  ρ14)   ( 1.000  0.756  −0.030   0.711)
    (ρ21  1    ρ23  ρ24) = ( 0.756  1.000   0.003   0.809)
    (ρ31  ρ32  1    ρ34)   (−0.030  0.003   1.000  −0.044)
    (ρ41  ρ42  ρ43  1  )   ( 0.711  0.809  −0.044   1.000)
which has determinant D = 0.137.
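The correlation matrix and its determinant can be checked numerically; a minimal NumPy sketch (illustrative, not part of the book's SPSS/Stata solutions):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

D = np.linalg.det(rho)
print(round(D, 3))  # 0.137
```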
By analyzing correlation matrix ρ, it is possible to verify that only the grades in the variable marketing do not show noteworthy correlations with the grades in the other subjects. These, on the other hand, show relatively high correlations with one another (0.756 between finance and costs, 0.711 between finance and actuarial science, and 0.809 between costs and actuarial science), which indicates that they may share significant variance to form one factor. Although this preliminary analysis is important, it cannot represent more than a simple diagnostic, since the overall adequacy of the factor analysis needs to be evaluated based on the KMO statistic and, mainly, on the result of Bartlett's test of sphericity.
As we discussed in Section 12.2.2, the KMO statistic provides the proportion of variance considered common to all the variables present in the analysis, and, in order to establish its calculation, we need to determine the partial correlation coefficients φ between each pair of variables. In this case, these will be second-order partial correlation coefficients, since we are working with four variables simultaneously.
Consequently, based on Expression (12.7), we first need to determine the first-order correlation coefficients used to calculate the second-order ones. Table 12.8 shows these coefficients.
Table 12.8
φ12,3 = (ρ12 − ρ13⋅ρ23)/√[(1 − ρ13²)⋅(1 − ρ23²)] = 0.756 | φ13,2 = (ρ13 − ρ12⋅ρ23)/√[(1 − ρ12²)⋅(1 − ρ23²)] = −0.049 | φ14,2 = (ρ14 − ρ12⋅ρ24)/√[(1 − ρ12²)⋅(1 − ρ24²)] = 0.258 |
φ14,3 = (ρ14 − ρ13⋅ρ34)/√[(1 − ρ13²)⋅(1 − ρ34²)] = 0.711 | φ23,1 = (ρ23 − ρ12⋅ρ13)/√[(1 − ρ12²)⋅(1 − ρ13²)] = 0.039 | φ24,1 = (ρ24 − ρ12⋅ρ14)/√[(1 − ρ12²)⋅(1 − ρ14²)] = 0.590 |
φ24,3 = (ρ24 − ρ23⋅ρ34)/√[(1 − ρ23²)⋅(1 − ρ34²)] = 0.810 | φ34,1 = (ρ34 − ρ13⋅ρ14)/√[(1 − ρ13²)⋅(1 − ρ14²)] = −0.033 | φ34,2 = (ρ34 − ρ23⋅ρ24)/√[(1 − ρ23²)⋅(1 − ρ24²)] = −0.080 |
Hence, from these coefficients and by using Expression (12.8), we can calculate the second-order correlation coefficients considered in the KMO statistic’s expression. Table 12.9 shows these coefficients.
Table 12.9
φ12,34 = (φ12,3 − φ14,3⋅φ24,3)/√[(1 − φ14,3²)⋅(1 − φ24,3²)] = 0.438 | |
φ13,24 = (φ13,2 − φ14,2⋅φ34,2)/√[(1 − φ14,2²)⋅(1 − φ34,2²)] = −0.029 | φ23,14 = (φ23,1 − φ24,1⋅φ34,1)/√[(1 − φ24,1²)⋅(1 − φ34,1²)] = 0.072 |
φ14,23 = (φ14,2 − φ13,2⋅φ34,2)/√[(1 − φ13,2²)⋅(1 − φ34,2²)] = 0.255 | φ24,13 = (φ24,1 − φ23,1⋅φ34,1)/√[(1 − φ23,1²)⋅(1 − φ34,1²)] = 0.592 | φ34,12 = (φ34,1 − φ23,1⋅φ24,1)/√[(1 − φ23,1²)⋅(1 − φ24,1²)] = −0.069 |
So, based on Expression (12.3), we can calculate the KMO statistic. The terms of the expression are given by:
Σl Σc ρlc² = (0.756)² + (−0.030)² + (0.711)² + (0.003)² + (0.809)² + (−0.044)² = 1.734
Σl Σc φlc² = (0.438)² + (−0.029)² + (0.255)² + (0.072)² + (0.592)² + (−0.069)² = 0.619
from where we obtain:
KMO = 1.734/(1.734 + 0.619) = 0.737
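Instead of the recursive first- and second-order formulas, the same highest-order partial correlations can be obtained at once from the inverse of ρ (φij = −pij/√(pii⋅pjj), where P = ρ⁻¹), which makes the KMO statistic easy to verify numerically. An illustrative NumPy sketch:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

P = np.linalg.inv(rho)
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)      # partial correlations, controlling all other variables

iu = np.triu_indices(4, k=1)       # the 6 variable pairs above the diagonal
sum_rho2 = np.sum(rho[iu]**2)      # = 1.734 in the text
sum_phi2 = np.sum(partial[iu]**2)  # = 0.619 in the text
KMO = sum_rho2 / (sum_rho2 + sum_phi2)
print(round(KMO, 3))  # ≈ 0.737, a "middling" adequacy by Table 12.2
```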
Based on the criterion presented in Table 12.2, the value of the KMO statistic suggests that the overall adequacy of the factor analysis is middling. To test whether, in fact, correlation matrix ρ is statistically different from identity matrix I with the same dimension, we must use Bartlett’s test of sphericity, whose χBartlett2 statistic is given by Expression (12.9). For n = 100 observations, k = 4 variables, and correlation matrix ρ determinant D = 0.137, we have:
χBartlett² = −[(100 − 1) − (2⋅4 + 5)/6] ⋅ ln(0.137) = 192.335
with 4⋅(4 − 1)/2 = 6 degrees of freedom. Therefore, by using Table D in the Appendix, we have χc2 = 12.592 (critical χ2 for 6 degrees of freedom and with a significance level of 0.05). Thus, since χBartlett2 = 192.335 > χc2 = 12.592, we can reject the null hypothesis that correlation matrix ρ is statistically equal to identity matrix I, at a significance level of 0.05.
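Bartlett's test of sphericity from Expression (12.9) can likewise be reproduced in a few lines (illustrative NumPy sketch; the critical value 12.592 is the one taken from Table D in the text):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

n, k = 100, 4
D = np.linalg.det(rho)

chi2_bartlett = -((n - 1) - (2 * k + 5) / 6) * np.log(D)  # Expression (12.9)
df = k * (k - 1) // 2                                     # 6 degrees of freedom

chi2_critical = 12.592  # critical chi-square, 6 d.f., significance level 0.05
print(chi2_bartlett > chi2_critical)  # True: reject that rho equals I
```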
Software packages like SPSS and Stata do not offer the χc2 for the defined degrees of freedom and a certain significance level. However, they offer the significance level of χBartlett2 for these degrees of freedom. So, instead of analyzing if χBartlett2 > χc2, we must verify if the significance level of χBartlett2 is less than 0.05 (5%) so that we can continue performing the factor analysis. Thus:
If P-value (either Sig. χBartlett2, or Prob. χBartlett2) < 0.05, correlation matrix ρ is not statistically equal to identity matrix I with the same dimension.
The significance level of χBartlett2 can be obtained in Excel by using the command Formulas → Insert Function→ CHIDIST, which will open a dialog box, as shown in Fig. 12.7.
As we can see in Fig. 12.7, the P-value of the χBartlett2 statistic is considerably less than 0.05 (P-value = 8.11 × 10−39), that is, the Pearson correlations between the pairs of variables are statistically different from 0. Therefore, factors can be extracted from the original variables, and the factor analysis is considered highly adequate.
Having verified the factor analysis’s overall adequacy, we can move on to the definition of the factors. In order to do that, we must initially determine the four eigenvalues λ2 (λ12 ≥ λ22 ≥ λ32 ≥ λ42) of correlation matrix ρ, which can be obtained from solving Expression (12.12). Therefore, we have:
| λ² − 1   −0.756    0.030   −0.711 |
| −0.756   λ² − 1   −0.003   −0.809 |
|  0.030   −0.003   λ² − 1    0.044 | = 0
| −0.711   −0.809    0.044   λ² − 1 |
from where we obtain:
λ1² = 2.519
λ2² = 1.000
λ3² = 0.298
λ4² = 0.183
Consequently, based on Expression (12.15), eigenvalue matrix Λ2 can be written as follows:
Λ² = (2.519  0      0      0    )
     (0      1.000  0      0    )
     (0      0      0.298  0    )
     (0      0      0      0.183)
Note that Expression (12.13) is satisfied, that is:
λ1² + λ2² + λ3² + λ4² = 2.519 + 1.000 + 0.298 + 0.183 = 4
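The four eigenvalues can be verified directly with NumPy (illustrative sketch, not part of the book's software solutions):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

eigenvalues = np.sort(np.linalg.eigvalsh(rho))[::-1]  # descending: λ1² ≥ … ≥ λ4²
print(np.round(eigenvalues, 3))      # close to 2.519, 1.000, 0.298, 0.183
print(int(round(eigenvalues.sum())))  # 4, as required by Expression (12.13)
```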
Since the eigenvalues correspond to the proportion of variance shared by the original variables to form each factor, we can construct a shared variance table (Table 12.10).
Table 12.10
Factor | Eigenvalue λ2 | Shared Variance (%) | Cumulative Shared Variance (%) |
---|---|---|---|
1 | 2.519 | (2.519/4)⋅100 = 62.975 | 62.975 |
2 | 1.000 | (1.000/4)⋅100 = 25.010 | 87.985 |
3 | 0.298 | (0.298/4)⋅100 = 7.444 | 95.428 |
4 | 0.183 | (0.183/4)⋅100 = 4.572 | 100.000 |
By analyzing Table 12.10, we can say that while 62.975% of the total variance is shared to form the first factor, 25.010% is shared to form the second factor. The third and fourth factors, whose eigenvalues are less than 1, are formed through smaller proportions of shared variance. Since the most common criterion used to choose the number of factors is the latent root criterion (Kaiser criterion), in which only the factors that correspond to eigenvalues greater than 1 are taken into consideration, the researcher can choose to conduct all the subsequent analysis with only the first two factors, formed by sharing 87.985% of the total variance of the original variables, that is, with a total variance loss of 12.015%. Nonetheless, for pedagogical purposes, let’s discuss how to calculate the factor scores by determining the eigenvectors that correspond to the four eigenvalues.
Consequently, in order to define the eigenvectors of matrix ρ based on the four eigenvalues calculated, we must solve the following equation systems for each eigenvalue, based on Expressions (12.16)–(12.21):
(2.519 − 1.000)⋅v11 − 0.756⋅v21 + 0.030⋅v31 − 0.711⋅v41 = 0
−0.756⋅v11 + (2.519 − 1.000)⋅v21 − 0.003⋅v31 − 0.809⋅v41 = 0
0.030⋅v11 − 0.003⋅v21 + (2.519 − 1.000)⋅v31 + 0.044⋅v41 = 0
−0.711⋅v11 − 0.809⋅v21 + 0.044⋅v31 + (2.519 − 1.000)⋅v41 = 0
from where we obtain:
(v11)   ( 0.5641)
(v21) = ( 0.5887)
(v31)   (−0.0267)
(v41)   ( 0.5783)
(1.000 − 1.000)⋅v12 − 0.756⋅v22 + 0.030⋅v32 − 0.711⋅v42 = 0
−0.756⋅v12 + (1.000 − 1.000)⋅v22 − 0.003⋅v32 − 0.809⋅v42 = 0
0.030⋅v12 − 0.003⋅v22 + (1.000 − 1.000)⋅v32 + 0.044⋅v42 = 0
−0.711⋅v12 − 0.809⋅v22 + 0.044⋅v32 + (1.000 − 1.000)⋅v42 = 0
from where we obtain:
(v12)   ( 0.0068)
(v22) = ( 0.0487)
(v32)   ( 0.9987)
(v42)   (−0.0101)
(0.298 − 1.000)⋅v13 − 0.756⋅v23 + 0.030⋅v33 − 0.711⋅v43 = 0
−0.756⋅v13 + (0.298 − 1.000)⋅v23 − 0.003⋅v33 − 0.809⋅v43 = 0
0.030⋅v13 − 0.003⋅v23 + (0.298 − 1.000)⋅v33 + 0.044⋅v43 = 0
−0.711⋅v13 − 0.809⋅v23 + 0.044⋅v33 + (0.298 − 1.000)⋅v43 = 0
from where we obtain:
(v13)   ( 0.8008)
(v23) = (−0.2201)
(v33)   (−0.0003)
(v43)   (−0.5571)
(0.183 − 1.000)⋅v14 − 0.756⋅v24 + 0.030⋅v34 − 0.711⋅v44 = 0
−0.756⋅v14 + (0.183 − 1.000)⋅v24 − 0.003⋅v34 − 0.809⋅v44 = 0
0.030⋅v14 − 0.003⋅v24 + (0.183 − 1.000)⋅v34 + 0.044⋅v44 = 0
−0.711⋅v14 − 0.809⋅v24 + 0.044⋅v34 + (0.183 − 1.000)⋅v44 = 0
from where we obtain:
(v14)   ( 0.2012)
(v24) = (−0.7763)
(v34)   ( 0.0425)
(v44)   ( 0.5959)
After having determined the eigenvectors, a more inquisitive researcher may prove the relationship presented in Expression (12.27), that is:
V′⋅ρ⋅V=Λ2
(0.5641   0.5887  −0.0267   0.5783)   ( 1.000  0.756  −0.030   0.711)   (0.5641   0.0068   0.8008   0.2012)   (2.519  0      0      0    )
(0.0068   0.0487   0.9987  −0.0101) ⋅ ( 0.756  1.000   0.003   0.809) ⋅ (0.5887   0.0487  −0.2201  −0.7763) = (0      1.000  0      0    )
(0.8008  −0.2201  −0.0003  −0.5571)   (−0.030  0.003   1.000  −0.044)   (−0.0267  0.9987  −0.0003   0.0425)   (0      0      0.298  0    )
(0.2012  −0.7763   0.0425   0.5959)   ( 0.711  0.809  −0.044   1.000)   (0.5783  −0.0101  −0.5571   0.5959)   (0      0      0      0.183)
Based on Expressions (12.22)–(12.24), we can calculate the factor scores that correspond to each one of the standardized variables for each one of the factors. Thus, from Expression (12.25), we are able to write the expressions for factors F1, F2, F3, and F4, as follows:
F1i = (0.5641/√2.519)⋅Zfinancei + (0.5887/√2.519)⋅Zcostsi − (0.0267/√2.519)⋅Zmarketingi + (0.5783/√2.519)⋅Zactuariali
F2i = (0.0068/√1.000)⋅Zfinancei + (0.0487/√1.000)⋅Zcostsi + (0.9987/√1.000)⋅Zmarketingi − (0.0101/√1.000)⋅Zactuariali
F3i = (0.8008/√0.298)⋅Zfinancei − (0.2201/√0.298)⋅Zcostsi − (0.0003/√0.298)⋅Zmarketingi − (0.5571/√0.298)⋅Zactuariali
F4i = (0.2012/√0.183)⋅Zfinancei − (0.7763/√0.183)⋅Zcostsi + (0.0425/√0.183)⋅Zmarketingi + (0.5959/√0.183)⋅Zactuariali
from where we obtain:
F1i = 0.355⋅Zfinancei + 0.371⋅Zcostsi − 0.017⋅Zmarketingi + 0.364⋅Zactuariali
F2i = 0.007⋅Zfinancei + 0.049⋅Zcostsi + 0.999⋅Zmarketingi − 0.010⋅Zactuariali
F3i = 1.468⋅Zfinancei − 0.403⋅Zcostsi − 0.001⋅Zmarketingi − 1.021⋅Zactuariali
F4i = 0.470⋅Zfinancei − 1.815⋅Zcostsi + 0.099⋅Zmarketingi + 1.394⋅Zactuariali
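The factor score coefficients (each eigenvector divided by the square root of its eigenvalue) can be recovered numerically. In the illustrative NumPy sketch below, the sign of each eigenvector is arbitrary, so the comparison with the scores above is made in absolute value:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

lam, V = np.linalg.eigh(rho)     # eigenvalues in ascending order
order = np.argsort(lam)[::-1]    # reorder to descending
lam, V = lam[order], V[:, order]

scores = V / np.sqrt(lam)        # column j holds the score coefficients of factor Fj
print(np.round(np.abs(scores[:, 0]), 3))  # close to |0.355|, |0.371|, |-0.017|, |0.364|
```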
Based on the factor expressions and on the standardized variables, we can calculate the values corresponding to each factor for each observation. Table 12.11 shows these results for part of the dataset.
Table 12.11
Student | Zfinancei | Zcostsi | Zmarketingi | Zactuariali | F1i | F2i | F3i | F4i |
---|---|---|---|---|---|---|---|---|
Gabriela | − 0.011 | − 0.290 | − 1.650 | 0.273 | 0.016 | − 1.665 | − 0.176 | 0.739 |
Luiz Felipe | − 0.876 | − 0.697 | 1.532 | − 1.319 | − 1.076 | 1.503 | 0.342 | − 0.831 |
Patricia | − 0.876 | − 0.290 | − 0.590 | − 0.523 | − 0.600 | − 0.603 | − 0.634 | − 0.672 |
Gustavo | 1.334 | 1.337 | 0.825 | 1.069 | 1.346 | 0.887 | 0.327 | − 0.228 |
Leticia | − 0.779 | − 1.104 | − 0.872 | − 0.841 | − 0.978 | − 0.922 | 0.161 | 0.379 |
Ovidio | 1.334 | 2.150 | − 1.650 | 1.865 | 1.979 | − 1.553 | − 0.812 | − 0.841 |
Leonor | − 0.267 | 0.116 | 0.825 | − 0.125 | − 0.111 | 0.829 | − 0.312 | − 0.429 |
Dalila | − 0.139 | 0.523 | 0.118 | 0.273 | 0.242 | 0.139 | − 0.694 | − 0.623 |
Antonio | 0.021 | − 0.290 | − 0.590 | − 0.523 | − 0.281 | − 0.597 | 0.682 | − 0.250 |
⋮ | ||||||||
Estela | 0.982 | 0.113 | − 1.297 | 1.069 | 0.802 | − 1.293 | 0.305 | 1.616 |
Mean | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Standard deviation | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
For the first observation in the sample (Gabriela), for example, we can see that:
F1Gabriela=0.355⋅(−0.011)+0.371⋅(−0.290)−0.017⋅(−1.650)+0.364⋅(0.273)=0.016
F2Gabriela=0.007⋅(−0.011)+0.049⋅(−0.290)+0.999⋅(−1.650)−0.010⋅(0.273)=−1.665
F3Gabriela=1.468⋅(−0.011)−0.403⋅(−0.290)−0.001⋅(−1.650)−1.021⋅(0.273)=−0.176
F4Gabriela=0.470⋅(−0.011)−1.815⋅(−0.290)+0.099⋅(−1.650)+1.394⋅(0.273)=0.739
It is important to emphasize that all the factors extracted have Pearson correlations equal to 0, between themselves, that is, they are orthogonal to one another.
A more inquisitive researcher may also verify that the factor scores that correspond to each factor are exactly the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself, and as explanatory variables, the standardized variables.
Having established the factors, we can define the factor loadings, which correspond to Pearson’s correlation coefficients between the original variables and each one of the factors. Table 12.12 shows the factor loadings for the data in our example.
Table 12.12
Variable | Factor | |||
---|---|---|---|---|
F1 | F2 | F3 | F4 | |
finance | 0.895 | 0.007 | 0.437 | 0.086 |
costs | 0.934 | 0.049 | − 0.120 | − 0.332 |
marketing | − 0.042 | 0.999 | 0.000 | 0.018 |
actuarial science | 0.918 | − 0.010 | − 0.304 | 0.255 |
For each original variable, the highest value of the factor loading was highlighted in Table 12.12. Consequently, while the variables finance, costs, and actuarial show stronger correlations with the first factor, we can see that only the variable marketing shows stronger correlation with the second factor. This proves the need for a second factor in order for all the variables to share significant proportions of variance. However, the third and fourth factors present relatively low correlations with the original variables, which explains the fact that the respective eigenvalues are less than 1. If the variable marketing had not been inserted into the analysis, only the first factor would be necessary to explain the joint behavior of the other variables, and the other factors would also have respective eigenvalues less than 1.
Therefore, as discussed in Section 12.2.4, we can verify that factor loadings between factors corresponding to eigenvalues less than 1 are relatively low, since they have already shown stronger Pearson correlations with factors previously extracted from greater eigenvalues.
Based on Expression (12.30), we can see that the sum of the squared factor loadings in each column in Table 12.12 will be the respective eigenvalue that, as discussed before, can be understood as the proportion of variance shared by the four original variables to form each factor. Therefore, we have:
(0.895)² + (0.934)² + (−0.042)² + (0.918)² = 2.519
(0.007)² + (0.049)² + (0.999)² + (−0.010)² = 1.000
(0.437)² + (−0.120)² + (0.000)² + (−0.304)² = 0.298
(0.086)² + (−0.332)² + (0.018)² + (0.255)² = 0.183
which confirms that the second eigenvalue only reached the value 1 due to the high factor loading of the variable marketing.
Furthermore, from the factor loadings presented in Table 12.12, we can also calculate the communalities, which represent the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, based on Expression (12.29), we can write:
communality(finance) = (0.895)² + (0.007)² = 0.802
communality(costs) = (0.934)² + (0.049)² = 0.875
communality(marketing) = (−0.042)² + (0.999)² = 1.000
communality(actuarial science) = (0.918)² + (−0.010)² = 0.843
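Loadings and communalities follow from the same decomposition: each loading is the eigenvector entry multiplied by the square root of its eigenvalue. The illustrative NumPy sketch below also checks that the squared loadings reproduce the eigenvalues column-wise and the communalities row-wise:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

lam, V = np.linalg.eigh(rho)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

loadings = V * np.sqrt(lam)            # factor loadings, one column per factor

col_sums = (loadings**2).sum(axis=0)   # reproduces the eigenvalues (Expression 12.30)
h2 = (loadings[:, :2]**2).sum(axis=1)  # communalities over the two retained factors
print(np.round(h2, 3))  # close to 0.802, 0.875, 1.000, 0.843
```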
Consequently, even though the variable marketing is the only one that has a high factor loading with the second factor, it is the variable in which the lowest proportion of variance is lost to form both factors. On the other hand, the variable finance is the one that presents the highest loss of variance to form these two factors (around 19.8%). If we had considered the factor loadings of the four factors, surely, all the communalities would be equal to 1.
As we discussed in Section 12.2.4, we can see that the factor loadings are exactly the parameters estimated in a multiple linear regression model, which shows, as a dependent variable, a certain standardized variable and, as explanatory variables, the factors themselves, in which the coefficient of determination R2 of each model is equal to the communality of the respective original variable.
Therefore, for the first two factors, we can construct a chart in which the factor loadings of each variable are plotted in each one of the orthogonal axes that represent factors F1 and F2, respectively. This chart, known as a loading plot, can be seen in Fig. 12.8.
By analyzing the loading plot, the behavior of the correlations becomes clear. While the variables finance, costs, and actuarial show high correlation with the first factor (X-axis), the variable marketing shows strong correlation with the second factor (Y-axis). More inquisitive researchers may investigate the reasons why this phenomenon occurs, since, sometimes, while the subjects Finance, Costs, and Actuarial Science are taught in a more quantitative way, Marketing can be taught in a more qualitative and behavioral manner. However, it is important to mention that the definition of factors does not force researchers to name them, because, normally, this is not a simple task. Factor analysis does not have “naming factors” as one of its goals and, in case we intend to do that, researchers need to have vast knowledge about the phenomenon being studied, and confirmatory techniques can help them in this endeavor.
At this moment, we can consider the preparation of the principal component factor analysis concluded. Nevertheless, as discussed in Section 12.2.5, if researchers wish to obtain a clearer visualization of which variables are better represented by a certain factor, they can elaborate a rotation using the Varimax orthogonal method, which maximizes the loadings of each variable in a certain factor. In our example, since we already have an excellent idea of the variables with high loadings in each factor, and the loading plot (Fig. 12.8) is already very clear, rotation may be considered unnecessary. Therefore, it will be elaborated only for pedagogical purposes, since researchers may sometimes find themselves in situations in which the phenomenon is not so clear.
Consequently, based on the factor loadings for the first two factors (first two columns of Table 12.12), we will obtain rotated factor loadings c′ after rotating both factors for an angle θ. Thus, based on Expression (12.35), we can write:
( 0.895   0.007)                       (c′11  c′12)
( 0.934   0.049) ⋅ (cos θ  −sin θ)  =  (c′21  c′22)
(−0.042   0.999)   (sin θ   cos θ)     (c′31  c′32)
( 0.918  −0.010)                       (c′41  c′42)
where the counterclockwise rotation angle θ is obtained from Expression (12.36). Before that, however, we must determine the values of terms A, B, C, and D, present in Expressions (12.37)–(12.40). Constructing Tables 12.13–12.16 helps us with this task.
Table 12.13
Variable | c1 | c2 | Communality | c1l²/communalityl − c2l²/communalityl
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 1.000 |
costs | 0.934 | 0.049 | 0.875 | 0.995 |
marketing | − 0.042 | 0.999 | 1.000 | − 0.996 |
actuarial science | 0.918 | − 0.010 | 0.843 | 1.000 |
A (sum) | 1.998 |
Table 12.14
Variable | c1 | c2 | Communality | 2⋅c1l⋅c2l/communalityl
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 0.015 |
costs | 0.934 | 0.049 | 0.875 | 0.104 |
marketing | − 0.042 | 0.999 | 1.000 | − 0.085 |
actuarial science | 0.918 | − 0.010 | 0.843 | − 0.022 |
B (sum) | 0.012 |
Table 12.15
Variable | c1 | c2 | Communality | (c1l²/communalityl − c2l²/communalityl)² − (2⋅c1l⋅c2l/communalityl)²
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 1.000 |
costs | 0.934 | 0.049 | 0.875 | 0.978 |
marketing | − 0.042 | 0.999 | 1.000 | 0.986 |
actuarial science | 0.918 | − 0.010 | 0.843 | 0.999 |
C (sum) | 3.963 |
Table 12.16
Variable | c1 | c2 | Communality | (c1l²/communalityl − c2l²/communalityl) ⋅ (2⋅c1l⋅c2l/communalityl)
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 0.015 |
costs | 0.934 | 0.049 | 0.875 | 0.103 |
marketing | − 0.042 | 0.999 | 1.000 | 0.084 |
actuarial science | 0.918 | − 0.010 | 0.843 | − 0.022 |
D (sum) | 0.181 |
So, taking the k = 4 variables into consideration and based on Expression (12.36), we can calculate the counterclockwise rotation angle θ as follows:
θ = 0.25 ⋅ arctan{2 ⋅ [(0.181)⋅4 − (1.998)⋅(0.012)] / {(3.963)⋅4 − [(1.998)² − (0.012)²]}} = 0.029 rad
And, finally, we can calculate the rotated factor loadings:
( 0.895   0.007)                               (c′11  c′12)   ( 0.895  −0.019)
( 0.934   0.049) ⋅ (cos 0.029  −sin 0.029)  =  (c′21  c′22) = ( 0.935   0.021)
(−0.042   0.999)   (sin 0.029   cos 0.029)     (c′31  c′32)   (−0.013   1.000)
( 0.918  −0.010)                               (c′41  c′42)   ( 0.917  −0.037)
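The rotation itself is just this matrix product; an illustrative NumPy sketch, reusing θ = 0.029 and the loadings of Table 12.12:

```python
import numpy as np

theta = 0.029                     # rotation angle computed in the text
c = np.array([[ 0.895,  0.007],   # original loadings (Table 12.12, first two factors)
              [ 0.934,  0.049],
              [-0.042,  0.999],
              [ 0.918, -0.010]])

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
c_rot = c @ R                     # rotated loadings, compare with Table 12.17
print(np.round(c_rot, 3))
```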
Table 12.17 shows, in a consolidated way, the rotated factor loadings through the Varimax method for the data in our example.
Table 12.17
Variable | Factor | |
---|---|---|
F1′ | F2′ | |
finance | 0.895 | − 0.019 |
costs | 0.935 | 0.021 |
marketing | − 0.013 | 1.000 |
actuarial science | 0.917 | − 0.037 |
As we have already mentioned, even though the results without rotation already showed which variables presented high loadings in each factor, the rotation ended up redistributing, even if only slightly for the data in our example, the variable loadings between the rotated factors. A new loading plot (now with the rotated loadings) can also demonstrate this situation (Fig. 12.9).
Even though the plots in Figs. 12.8 and 12.9 are very similar, since rotation angle θ is very small in this example, it is common for researchers to find situations in which the rotation contributes considerably to an easier understanding of the loadings, which can, consequently, simplify the interpretation of the factors.
It is important to emphasize that the rotation does not change the communalities, that is, Expression (12.31) can be verified:
$$\begin{aligned}
\text{communality}_{\text{finance}} &= (0.895)^2 + (-0.019)^2 = 0.802 \\
\text{communality}_{\text{costs}} &= (0.935)^2 + (0.021)^2 = 0.875 \\
\text{communality}_{\text{marketing}} &= (-0.013)^2 + (1.000)^2 = 1.000 \\
\text{communality}_{\text{actuarial}} &= (0.917)^2 + (-0.037)^2 = 0.843
\end{aligned}$$
Nonetheless, rotation changes the eigenvalues corresponding to each factor. Thus, for the two rotated factors, we have:
$$\begin{aligned}
(0.895)^2 + (0.935)^2 + (-0.013)^2 + (0.917)^2 &= \lambda'^2_1 = 2.518 \\
(-0.019)^2 + (0.021)^2 + (1.000)^2 + (-0.037)^2 &= \lambda'^2_2 = 1.002
\end{aligned}$$
Table 12.18 shows, based on the new eigenvalues $\lambda'^2_1$ and $\lambda'^2_2$, the proportions of variance shared by the original variables to form both rotated factors.
Table 12.18
Factor | Eigenvalue λ′² | Shared Variance (%) | Cumulative Shared Variance (%) |
---|---|---|---|
1 | 2.518 | (2.518/4)⋅100 = 62.942 | 62.942 |
2 | 1.002 | (1.002/4)⋅100 = 25.043 | 87.985 |
In comparison to Table 12.10, we can see that even though there is no change in the sharing of 87.985% of the total variance of the original variables to form the rotated factors, the rotation redistributes the variance shared by the variables in each factor.
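Both invariance properties just discussed can be checked numerically: rotation preserves the communalities (row sums of squared loadings) while redistributing the eigenvalues (column sums of squared loadings). A minimal sketch, assuming the rotated loadings of Table 12.17:

```python
# Rotated factor loadings from Table 12.17
rotated_loadings = {"finance": (0.895, -0.019), "costs": (0.935, 0.021),
                    "marketing": (-0.013, 1.000), "actuarial": (0.917, -0.037)}

# Communalities are the row sums of squared loadings; they match the
# unrotated values (0.802, 0.875, 1.000, 0.843) up to rounding
communalities = {var: c1**2 + c2**2
                 for var, (c1, c2) in rotated_loadings.items()}

# Eigenvalues after rotation are the column sums of squared loadings
lambda1 = sum(c1**2 for c1, _ in rotated_loadings.values())
lambda2 = sum(c2**2 for _, c2 in rotated_loadings.values())

# Proportion of total variance shared by each rotated factor (k = 4)
share1 = lambda1 / 4 * 100
share2 = lambda2 / 4 * 100
```

The cumulative shared variance, share1 + share2 ≈ 87.985%, is unchanged by the rotation, even though the split between the two factors differs from Table 12.10.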
As we have already discussed, the factor loadings correspond to the parameters estimated in a multiple linear regression model that has a certain standardized variable as the dependent variable and the factors as explanatory variables. Conversely, the factor scores represent the estimated parameters of the regression models that have the factors as dependent variables and the standardized variables as explanatory variables; therefore, through algebraic operations, we can obtain the factor score expressions from the loadings. Consequently, from the rotated factor loadings (Table 12.17), we arrive at the following expressions for the rotated factors F1′ and F2′:
$$F'_{1i} = 0.355 \cdot Z_{\text{finance}_i} + 0.372 \cdot Z_{\text{costs}_i} + 0.012 \cdot Z_{\text{marketing}_i} + 0.364 \cdot Z_{\text{actuarial}_i}$$
$$F'_{2i} = -0.004 \cdot Z_{\text{finance}_i} + 0.038 \cdot Z_{\text{costs}_i} + 0.999 \cdot Z_{\text{marketing}_i} - 0.021 \cdot Z_{\text{actuarial}_i}$$
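Applying these two expressions to a student's standardized grades gives that student's rotated factor scores. A short sketch for the student Adelino, using his standardized grades as reported later in Table 12.19:

```python
# Rotated factor score coefficients, from the expressions above
w1 = {"finance": 0.355, "costs": 0.372, "marketing": 0.012, "actuarial": 0.364}
w2 = {"finance": -0.004, "costs": 0.038, "marketing": 0.999, "actuarial": -0.021}

# Standardized grades for one student (Adelino, from Table 12.19)
z = {"finance": 1.30, "costs": 2.15, "marketing": 1.53, "actuarial": 1.86}

# Each factor score is the coefficient-weighted sum of standardized variables
f1 = sum(w1[var] * z[var] for var in z)
f2 = sum(w2[var] * z[var] for var in z)
```

Up to rounding, this reproduces the scores F′1 ≈ 1.959 and F′2 ≈ 1.568 shown for Adelino in Table 12.19.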
Finally, the professor wishes to develop a school performance ranking of his students. Since the two rotated factors, F1′ and F2′, are formed by the higher proportions of variance shared by the original variables (in this case, 62.942% and 25.043% of the total variance, respectively, as shown in Table 12.18) and correspond to eigenvalues greater than 1, they will be used to create the desired school performance ranking.
A well-accepted criterion used to form rankings from factors is known as the weighted rank-sum criterion: for each observation, the values of all the factors obtained (those with eigenvalues greater than 1), weighted by their respective proportions of shared variance, are added, and the observations are then ranked based on the results. This criterion is well accepted because it considers the performance of all the original variables; considering only the first factor (the principal factor criterion) may ignore a positive performance obtained in a certain variable that shares a considerable proportion of variance with the second factor. For 10 students chosen from the sample, Table 12.19 shows the school performance ranking created by summing the factor values weighted by the respective proportions of shared variance.
Table 12.19
Student | Zfinancei | Zcostsi | Zmarketingi | Zactuariali | F′1i | F′2i | (F′1i ⋅ 0.62942) + (F′2i ⋅ 0.25043) | Ranking |
---|---|---|---|---|---|---|---|---|
Adelino | 1.30 | 2.15 | 1.53 | 1.86 | 1.959 | 1.568 | 1.626 | 1 |
Renata | 0.60 | 2.15 | 1.53 | 1.86 | 1.709 | 1.570 | 1.469 | 2 |
⋮ | ||||||||
Ovidio | 1.33 | 2.15 | − 1.65 | 1.86 | 1.932 | − 1.611 | 0.813 | 13 |
Kamal | 1.33 | 2.07 | − 1.65 | 1.86 | 1.902 | − 1.614 | 0.793 | 14 |
⋮ | ||||||||
Itamar | − 1.29 | − 0.55 | 1.53 | − 1.04 | − 1.022 | 1.536 | − 0.259 | 57 |
Luiz Felipe | − 0.88 | − 0.70 | 1.53 | − 1.32 | − 1.032 | 1.535 | − 0.265 | 58 |
⋮ | ||||||||
Gabriela | − 0.01 | − 0.29 | − 1.65 | 0.27 | − 0.032 | − 1.665 | − 0.437 | 73 |
Marina | 0.50 | − 0.50 | − 0.94 | − 1.16 | − 0.443 | − 0.939 | − 0.514 | 74 |
⋮ | ||||||||
Viviane | − 1.64 | − 1.16 | − 1.01 | − 1.00 | − 1.390 | − 1.029 | − 1.133 | 99 |
Gilmar | − 1.52 | − 1.16 | − 1.40 | − 1.44 | − 1.512 | − 1.409 | − 1.304 | 100 |
The complete ranking can be found in the file FactorGradesRanking.xls.
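The weighted rank-sum criterion itself is straightforward to implement. A minimal sketch using three of the students from Table 12.19 (the full sample would be handled identically):

```python
# Rotated factor scores (F'1, F'2) for a few students, from Table 12.19,
# and the shared-variance weights of the rotated factors (Table 12.18)
scores = {"Adelino": (1.959, 1.568),
          "Renata": (1.709, 1.570),
          "Gilmar": (-1.512, -1.409)}
weight1, weight2 = 0.62942, 0.25043

# Weighted rank-sum criterion: weight each factor score by its factor's
# proportion of shared variance, sum, then rank in descending order
weighted = {name: f1 * weight1 + f2 * weight2
            for name, (f1, f2) in scores.items()}
ranking = sorted(weighted, key=weighted.get, reverse=True)
```

Adelino's weighted sum, 1.959 ⋅ 0.62942 + 1.568 ⋅ 0.25043 ≈ 1.626, places him first, matching the ranking in Table 12.19.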
It is essential to highlight that the creation of performance rankings from original variables is a static procedure, since the inclusion of new observations or variables may alter the factor scores, making the preparation of a new factor analysis mandatory. As time goes by, the evolution of the phenomena represented by the variables may change the correlation matrix, making it necessary to reapply the technique in order to generate new factors obtained from more precise and updated scores. Here, therefore, we criticize socioeconomic indexes that use previously established static scores for each variable when calculating the factor used to define the ranking, in situations in which new observations are constantly included and, moreover, in which the phenomena evolve over time, changing the correlation matrix of the original variables in each period.
Finally, it is worth mentioning that the extracted factors are quantitative variables and, therefore, other multivariate exploratory techniques, such as cluster analysis, can be elaborated from them, depending on the researcher's objectives. Besides, each factor can also be transformed into a qualitative variable, for example, through its categorization into ranges established based on a certain criterion; from then on, a correspondence analysis could be elaborated in order to assess a possible association between the generated categories and the categories of other qualitative variables.
Factors can also be used as explanatory variables of a certain phenomenon in confirmatory multivariate models, such as multiple regression models, since their orthogonality eliminates multicollinearity problems. On the other hand, such a procedure only makes sense when we intend to elaborate a diagnostic of the behavior of the dependent variable, without aiming at forecasts. Since new observations do not have values for the factors already generated, obtaining them is only possible by including such observations in a new factor analysis, in order to obtain new factor scores, given that this is an exploratory technique.
Furthermore, a qualitative variable obtained through the categorization of a certain factor into ranges can also be inserted as the dependent variable of a multinomial logistic regression model, allowing researchers to evaluate the probability that each observation has of falling into each range, due to the behavior of other explanatory variables not initially considered in the factor analysis. We would also like to highlight that this procedure has a diagnostic nature, investigating the behavior of the variables for the existing observations in the sample, without a predictive purpose.
Next, this same example will be elaborated in the software packages SPSS and Stata. In Section 12.3, the procedures for preparing the principal component factor analysis in SPSS will be presented, as well as their results. In Section 12.4, the commands for running the technique in Stata will be presented, with their respective outputs.