Chapter 12

Principal Component Factor Analysis

Abstract

This chapter discusses the circumstances in which the principal component factor analysis technique can be used. Several factor analysis concepts, calculations, and interpretations are presented: the concept of factor; an evaluation of the factor analysis’s overall adequacy through the KMO statistic and Bartlett’s test of sphericity; concepts of eigenvalues and eigenvectors in Pearson’s correlation matrices; the calculation and interpretation of factor scores, and, from these, the definition of factors; the calculation and interpretation of factor loadings and communalities; the construction of loading plots; concepts of factor rotation and preparation of the Varimax orthogonal rotation; construction of performance rankings from the joint behavior of variables. The principal component factor analysis technique is elaborated algebraically and also by using IBM SPSS Statistics Software® and Stata Statistical Software®, and their results will be interpreted.

Keywords

Principal component factor analysis; KMO statistic; Bartlett’s test of sphericity; Eigenvalues and eigenvectors in Pearson’s correlation matrices; Factor scores; Factor loadings and communalities; Factor rotation; Varimax orthogonal rotation; SPSS and Stata Software

Love and truth are so intertwined that it is practically impossible to disentangle and separate them.

They are like the two sides of a coin.

Mahatma Gandhi

12.1 Introduction

Exploratory factor analysis techniques are very useful when we intend to work with variables that have relatively high correlation coefficients among themselves and we wish to establish new variables that capture the joint behavior of the original ones. Each of these new variables is called a factor, which can be understood as a cluster of variables grouped according to previously established criteria. Therefore, factor analysis is a multivariate technique that tries to identify a relatively small number of factors that represent the joint behavior of interdependent original variables. Thus, while cluster analysis, studied in the previous chapter, uses distance or similarity measures to group observations and form clusters, factor analysis uses correlation coefficients to group variables and generate factors.

Among the methods used to determine factors, the one known as principal components is, without a doubt, the most widely used in factor analysis, because it is based on the assumption that uncorrelated factors can be extracted from linear combinations of the original variables. Consequently, from a set of original variables correlated to one another, the principal component factor analysis allows another set of variables (factors) resulting from the linear combination of the first set to be determined.

Even though, as we know, the term confirmatory factor analysis often appears in the existing literature, factor analysis is essentially an exploratory, or interdependence, multivariate technique, since it does not have a predictive nature for observations not initially present in the sample; the inclusion of new observations in the dataset makes it necessary to reapply the technique, so that more accurate and updated factors can be generated. According to Reis (2001), factor analysis can be used with the main exploratory goal of reducing the data dimension, aiming at creating factors from the original variables, as well as with the objective of confirming an initial hypothesis that the data may be reduced to a certain factor, or a certain dimension, previously established. Regardless of the objective, factor analysis will continue to be exploratory. If researchers aim to use a technique to, in fact, confirm the relationships found in the factor analysis, they can use structural equation modeling, for instance.

The principal component factor analysis has four main objectives: (1) to identify correlations between the original variables to create factors that represent the linear combination of those variables (structural reduction); (2) to verify the validity of previously established constructs, bearing in mind the allocation of the original variables to each factor; (3) to prepare rankings by generating performance indexes from the factors; and (4) to extract orthogonal factors for future use in confirmatory multivariate techniques that need the absence of multicollinearity.

Imagine that a researcher is interested in studying the interdependence between several quantitative variables that translate the socioeconomic behavior of a nation’s municipalities. In this situation, factors that may possibly explain the behavior of the original variables can be determined, and, in this regard, the factor analysis is used to reduce the data structurally and, later on, to create a socioeconomic index that captures the joint behavior of these variables. From this index, we may even propose a performance ranking of the municipalities, and the factors themselves can be used in a possible cluster analysis.

In another situation, factors extracted from the original variables can be used as explanatory variables of another variable (dependent), not initially considered in the analysis. For example, factors obtained from the joint behavior of grades in certain 12th grade subjects can be used as explanatory variables of students’ general classification in the college entrance exams, or whether students passed the exams or not. In these situations, note that the factors (orthogonal to one another) are used, instead of the original variables themselves, as explanatory variables of a certain phenomenon in confirmatory multivariate models, such as, multiple or logistic regression, in order to eliminate possible multicollinearity problems. Nevertheless, it is important to highlight that this procedure only makes sense when we intend to elaborate a diagnostic regarding the dependent variable’s behavior, without aiming at having forecasts for other observations not initially present in the sample. Since new observations do not have the corresponding values of the factors generated, obtaining these values is only possible if we include such observations in a new factor analysis.

In a third situation, imagine that a retailer is interested in assessing their clients’ level of satisfaction by applying a questionnaire in which the questions have been previously classified into certain groups. For instance, questions A, B, and C were classified into the group quality of services rendered, questions D and E, into the group positive perception of prices, and questions F, G, H, and I, into the group variety of goods. After applying the questionnaire to a significant number of customers, in which these nine variables are collected by attributing scores that vary from 0 to 10, the retailer has decided to elaborate a principal component factor analysis to verify if, in fact, the combination of variables reflects the construct previously established. If this occurs, the factor analysis will have been used to validate the construct, presenting a confirmatory objective.

In all of these situations, we can see that the original variables from which the factors will be extracted are quantitative, because a factor analysis begins with the study of the behavior of Pearson’s correlation coefficients between the variables. Nonetheless, it is common for researchers to use the incorrect arbitrary weighting procedure with qualitative variables, as, for example, variables on the Likert scale, and, from then on, to apply a factor analysis. This is a serious error! There are exploratory techniques meant exclusively for studying the behavior of qualitative variables as, for instance, the correspondence analysis and homogeneity analysis, and a factor analysis is definitely not meant for such purpose, as discussed by Fávero and Belfiore (2017).

In a historical context, the development of factor analysis is partly due to Pearson's (1896) and Spearman's (1904) pioneering work. While Karl Pearson developed a rigorous mathematical treatment regarding what we traditionally call correlation at the beginning of the 20th century, Charles Edward Spearman published highly original work in which the interrelationships between students' performance in several subjects, such as, French, English, Mathematics, and Music, were evaluated. Since the grades in these subjects showed strong correlation, Spearman proposed that scores resulting from apparently incompatible tests shared a single general factor, and that students who got good grades had a more developed psychological or intelligence component. Generally speaking, Spearman excelled in applying mathematical methods and correlation studies to the analysis of the human mind.

Decades later, in 1933, Harold Hotelling, a statistician, mathematician, and influential economics theoretician decided to call Principal Component Analysis the analysis that determines components from the maximization of the original data’s variance. Also in the first half of the 20th century, psychologist Louis Leon Thurstone, from an investigation of Spearman’s ideas and based on the application of certain psychological tests, whose results were submitted to a factor analysis, identified people’s seven primary mental abilities: spatial visualization, verbal meaning, verbal fluency, perceptual speed, numerical ability, reasoning, and rote memory. In psychology, the term mental factors is even used for variables that have greater influence over a certain behavior.

Currently, factor analysis is used in several fields of knowledge, such as, marketing, economics, strategy, finance, accounting, actuarial science, engineering, logistics, psychology, medicine, ecology and biostatistics, among others.

The principal component factor analysis must be defined based on the underlying theory and on the researcher’s experience, so that it can be possible to apply the technique correctly and to analyze the results obtained.

In this chapter, we will discuss the principal component factor analysis technique, with the following objectives: (1) to introduce the concepts; (2) to present the step by step of the modeling in an algebraic and practical way; (3) to interpret the results obtained; and (4) to show the application of the technique in SPSS and Stata. Following the logic proposed in the book, first we develop the algebraic solution of an example linked to the presentation of the concepts. Only after introducing these concepts do we present and discuss the procedures for running the technique in SPSS and Stata.

12.2 Principal Component Factor Analysis

There are many procedures inherent to factor analysis, with different methods for determining (extracting) factors from Pearson's correlation matrix. The most frequently used method, which is adopted in this chapter for extracting factors, is known as principal components, in which the resulting structural reduction is also called the Karhunen-Loève transformation.

In the following sections, we will discuss the theoretical development of the technique, as well as a practical example. While the main concepts will be presented in Sections 12.2.1 to 12.2.5, Section 12.2.6 is meant for solving a practical example algebraically, from a dataset.

12.2.1 Pearson’s Linear Correlation and the Concept of Factor

Let’s imagine a dataset that has n observations and, for each observation i (i = 1, …, n), values corresponding to each one of the k metric variables X, as shown in Table 12.1.

Table 12.1

General Dataset Model for Developing a Factor Analysis
Observation i    X1i    X2i    ...    Xki
1                X11    X21    ...    Xk1
2                X12    X22    ...    Xk2
3                X13    X23    ...    Xk3
...              ...    ...    ...    ...
n                X1n    X2n    ...    Xkn


From the dataset, and given our intention of extracting factors from k variables X, we must define correlation matrix ρ that displays the values of Pearson’s linear correlation between each pair of variables, as shown in Expression (12.1).

\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix}    (12.1)

Correlation matrix ρ is symmetrical in relation to the main diagonal that, obviously, shows values equal to 1. For example, for variables X1 and X2, Pearson’s correlation ρ12 can be calculated by using Expression (12.2).

\rho_{12} = \frac{\sum_{i=1}^{n}(X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n}(X_{1i} - \bar{X}_1)^2} \cdot \sqrt{\sum_{i=1}^{n}(X_{2i} - \bar{X}_2)^2}}    (12.2)

where \bar{X}_1 and \bar{X}_2 represent the means of variables X1 and X2, respectively, and this expression is analogous to Expression (4.11), defined in Chapter 4.

Thus, since Pearson’s correlation is a measure of the level of linear relationship between two metric variables, which may vary between − 1 and 1, a value closer to one of these extreme values indicates the existence of a linear relationship between the two variables under analysis, which, therefore, may significantly contribute to the extraction of a single factor. On the other hand, a Pearson correlation that is very close to 0 indicates that the linear relationship between the two variables is practically nonexistent. Therefore, different factors can be extracted.

Let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). A three-dimensional scatter plot can be constructed from the values of each variable for each observation. The plot can be seen in Fig. 12.1.

Fig.12.1 Three-dimensional scatter plot for a hypothetical situation with three variables.

Only based on the visual analysis of the chart in Fig. 12.1, it is difficult to assess the behavior of the linear relationships between each pair of variables. Thus, Fig. 12.2 shows the projection of the points that correspond to each observation in each one of the planes formed by the pairs of variables, highlighting, in the dotted line, the adjustment that represents the linear relationship between the respective variables.

Fig. 12.2 Projection of the points in each plane formed by a certain pair of variables. (A) Relationship between X1 and X2: positive and very high Pearson correlation. (B) Relationship between X1 and X3: Pearson correlation very close to 0. (C) Relationship between X2 and X3: Pearson correlation very close to 0.

While Fig. 12.2A shows that there is a significant linear relationship between variables X1 and X2 (a very high Pearson correlation), Fig. 12.2B and C make it very clear that there is no linear relationship between X3 and these variables. Fig. 12.3 displays these projections in a three-dimensional plot, with the respective linear adjustments in each plane (the dotted lines).

Fig. 12.3 Projection of the points in a three-dimensional plot with linear adjustments per plane.

Thus, in this hypothetical example, while variables X1 and X2 may be represented by a single factor in a very significant way, which we will call F1, variable X3 may be represented by another factor, F2, orthogonal to F1. Fig. 12.4 illustrates the extraction of these new factors in a three-dimensional way.

Fig. 12.4 Factor extraction.

So, factors can be understood as representations of latent dimensions that explain the behavior of the original variables.

Having presented these initial concepts, it is important to emphasize that in many cases researchers may choose to not extract a factor represented in a considerable way by only one variable (in this case, factor F2), and what will define the extraction of each one of the factors is the calculation of the eigenvalues from correlation matrix ρ, as we will study in Section 12.2.3. Nevertheless, before that, it will be necessary to check the overall adequacy of the factor analysis, which will be discussed in the following section.

12.2.2 Overall Adequacy of the Factor Analysis: Kaiser-Meyer-Olkin Statistic and Bartlett’s Test of Sphericity

An adequate extraction of factors from the original variables requires correlation matrix ρ to have relatively high and statistically significant values. As discussed by Hair et al. (2009), even though visually analyzing correlation matrix ρ does not reveal if the factor extraction will in fact be adequate, a significant number of values less than 0.30 represent a preliminary indication that the factor analysis may not be adequate.

In order to verify the overall adequacy of the factor extraction itself, we must use the Kaiser-Meyer-Olkin statistic (KMO) and Bartlett’s test of sphericity.

The KMO statistic gives us the proportion of variance considered common to all the variables in the sample under analysis, that is, which can be attributed to the existence of a common factor. This statistic varies from 0 to 1 and, while values closer to 1 indicate that the variables share a very high proportion of variance (high Pearson correlations), values closer to 0 are a result of low Pearson correlations between the variables, which may indicate that the factor analysis will not be adequate. The KMO statistic, presented initially by Kaiser (1970), can be calculated through Expression (12.3).

KMO = \frac{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^2}{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^2 + \sum_{l=1}^{k}\sum_{c=1}^{k} \varphi_{lc}^2}, \quad l \neq c    (12.3)

where l and c represent the rows and columns of correlation matrix ρ, respectively, and the terms φ represent the partial correlation coefficients between two variables. While Pearson’s correlation coefficients ρ are also called zero-order correlation coefficients, partial correlation coefficients φ are also known as higher-order correlation coefficients. For three variables, they are also called first-order correlation coefficients, for four variables, second-order correlation coefficients, and so on.

Let’s imagine a hypothetical situation in which a certain dataset shows three variables once again (k = 3). Is it possible that, in fact, ρ12 reflects the level of linear relationship between X1 and X2 if variable X3 is related to the other two? In this situation, ρ12 may not represent the true level of linear relationship between X1 and X2 when X3 is present, which may provide a false impression regarding the nature of the relationship between the first two. Thus, partial correlation coefficients may contribute with the analysis, since, according to Gujarati and Porter (2008), they are used when researchers wish to find out the correlation between two variables, either by controlling or ignoring the effects of other variables present in the dataset. For our hypothetical situation, it is the correlation coefficient regardless of X3’s influence over X1 and X2, if any.

Hence, for three variables X1, X2, and X3, we can define the first-order correlation coefficients the following way:

\varphi_{12,3} = \frac{\rho_{12} - \rho_{13}\,\rho_{23}}{\sqrt{(1-\rho_{13}^2)(1-\rho_{23}^2)}}    (12.4)

where φ12,3 represents the correlation between X1 and X2, maintaining X3 constant,

\varphi_{13,2} = \frac{\rho_{13} - \rho_{12}\,\rho_{23}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{23}^2)}}    (12.5)

where φ13,2 represents the correlation between X1 and X3, maintaining X2 constant, and

\varphi_{23,1} = \frac{\rho_{23} - \rho_{12}\,\rho_{13}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)}}    (12.6)

where φ23,1 represents the correlation between X2 and X3, maintaining X1 constant.

In general, a first-order correlation coefficient can be obtained through the following expression:

\varphi_{ab,c} = \frac{\rho_{ab} - \rho_{ac}\,\rho_{bc}}{\sqrt{(1-\rho_{ac}^2)(1-\rho_{bc}^2)}}    (12.7)

where a, b, and c can assume values 1, 2, or 3, corresponding to the three variables under analysis.

Conversely, for a case in which there are four variables in the analysis, the general expression of a certain partial correlation coefficient (second-order correlation coefficient) is given by:

\varphi_{ab,cd} = \frac{\varphi_{ab,c} - \varphi_{ad,c}\,\varphi_{bd,c}}{\sqrt{(1-\varphi_{ad,c}^2)(1-\varphi_{bd,c}^2)}}    (12.8)

where φab,cd represents the correlation between Xa and Xb, maintaining Xc and Xd constant, bearing in mind that a, b, c, and d may take on values 1, 2, 3, or 4, which correspond to the four variables under analysis.

Obtaining higher-order correlation coefficients, in which five or more variables are considered in the analysis, should always be done based on the determination of lower-order partial correlation coefficients. In Section 12.2.6, we will propose a practical example by using four variables, in which the algebraic solution of the KMO statistic will be obtained through Expression (12.8).

It is important to highlight that, even if Pearson’s correlation coefficient between two variables is 0, the partial correlation coefficient between them may not be equal to 0, depending on the values of Pearson’s correlation coefficients between each one of these variables and the others present in the dataset.

In order for a factor analysis to be considered adequate, the partial correlation coefficients between the variables must be low. This fact denotes that the variables share a high proportion of variance, and disregarding one or more of them in the analysis may hamper the quality of the factor extraction. Therefore, according to a widely accepted criterion found in the existing literature, Table 12.2 gives us an indication of the relationship between the KMO statistic and the overall adequacy of the factor analysis.

Table 12.2

Relationship Between the KMO Statistic and the Overall Adequacy of the Factor Analysis
KMO Statistic            Overall Adequacy of the Factor Analysis
Between 1.00 and 0.90    Marvelous
Between 0.90 and 0.80    Meritorious
Between 0.80 and 0.70    Middling
Between 0.70 and 0.60    Mediocre
Between 0.60 and 0.50    Miserable
Less than 0.50           Unacceptable

On the other hand, Bartlett’s test of sphericity (Bartlett, 1954) consists in comparing correlation matrix ρ to an identity matrix I of the same dimension. If the differences between the corresponding values outside the main diagonal of each matrix are not statistically different from 0, at a certain significance level, we may consider that the factor extraction will not be adequate. In other words, in this case, Pearson’s correlations between each pair of variables are statistically equal to 0, which makes any attempt of performing a factor extraction from the original variables unfeasible. So, we can define the null and alternative hypotheses of Bartlett’s test of sphericity the following way:

H_0: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}

H_1: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \neq I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}

The statistic corresponding to Bartlett's test of sphericity is a χ² statistic, which has the following expression:

\chi^2_{Bartlett} = -\left[(n-1) - \left(\frac{2k+5}{6}\right)\right] \cdot \ln |D|    (12.9)

with k(k − 1)/2 degrees of freedom. We know that n is the sample size and k is the number of variables. In addition, D represents the determinant of correlation matrix ρ.

Thus, for a certain number of degrees of freedom and a certain significance level, Bartlett's test of sphericity allows us to check whether the calculated value of the χ²_Bartlett statistic is higher than the statistic's critical value. If this is true, we may state that Pearson's correlations between the pairs of variables are statistically different from 0 and that, therefore, factors can be extracted from the original variables and the factor analysis is adequate. When we develop a practical example in Section 12.2.6, we will also discuss the calculations of the χ²_Bartlett statistic and the result of Bartlett's test of sphericity.
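A short sketch of Expression (12.9), using the chi-square distribution from SciPy for the P-value (the function name is ours):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(rho: np.ndarray, n: int):
    """Chi-square statistic, degrees of freedom, and P-value of Bartlett's test of sphericity."""
    k = rho.shape[0]
    stat = -((n - 1) - (2 * k + 5) / 6) * np.log(np.linalg.det(rho))
    df = k * (k - 1) / 2
    return stat, df, chi2.sf(stat, df)

# Example of Section 12.2.6: n = 100 students and the k = 4 subjects of Table 12.7
rho = np.array([[1.000, 0.756, -0.030, 0.711],
                [0.756, 1.000, 0.003, 0.809],
                [-0.030, 0.003, 1.000, -0.044],
                [0.711, 0.809, -0.044, 1.000]])
stat, df, p = bartlett_sphericity(rho, n=100)
print(round(stat, 3), df, p)   # approximately 192.3 with 6 degrees of freedom, P-value close to zero
```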

It is important to emphasize that we should always favor Bartlett's test of sphericity over the KMO statistic when deciding on the factor analysis's overall adequacy, given that the former is a test with a defined significance level, whereas the latter is only a coefficient (a statistic) calculated without any associated probability distribution or hypotheses that would allow us to evaluate a corresponding significance level and make a decision.

In addition, it is important to mention that, for only two original variables, the KMO statistic will always be equal to 0.50. Conversely, the χ²_Bartlett statistic may indicate whether the null hypothesis of the test of sphericity is rejected or not, depending on the magnitude of Pearson's correlation between the two variables. Thus, while the KMO statistic will be 0.50 in these situations, Bartlett's test of sphericity will allow researchers to decide whether or not to extract one factor from the two original variables. In contrast, for three original variables, it is very common for researchers to extract two factors with statistical significance in Bartlett's test of sphericity, yet with a KMO statistic less than 0.50. These two situations emphasize even more the greater relevance of Bartlett's test of sphericity in relation to the KMO statistic in the decision-making process.

Finally, we must mention that the recommendation to study the magnitude of Cronbach's alpha, before studying the overall adequacy of the factor analysis, is commonly found in the existing literature, so that the reliability with which a factor can be extracted from the original variables can be evaluated. We would like to highlight that Cronbach's alpha only offers researchers indications of the internal consistency of the variables in the dataset so that a single factor can be extracted. Therefore, determining it is not a mandatory requisite for developing the factor analysis, since this technique allows more than one factor to be extracted. Nevertheless, for pedagogical purposes, we will discuss the main concepts of Cronbach's alpha in the Appendix of this chapter, with its algebraic determination and corresponding applications in SPSS and Stata software.

Having discussed these concepts and verified the overall adequacy of the factor analysis, we can now move on to the definition of the factors.

12.2.3 Defining the Principal Component Factors: Determining the Eigenvalues and Eigenvectors of Correlation Matrix ρ and Calculating the Factor Scores

Since a factor represents the linear combination of the original variables, for k variables, we can define a maximum number of k factors (F1, F2, …, Fk), analogous to the maximum number of clusters that can be defined from a sample with n observations, as we discussed in the previous chapter, since a factor can also be understood as the result of the clustering of variables. Therefore, for k variables, we have:

F_{1i} = s_{11} X_{1i} + s_{21} X_{2i} + \cdots + s_{k1} X_{ki}
F_{2i} = s_{12} X_{1i} + s_{22} X_{2i} + \cdots + s_{k2} X_{ki}
\vdots
F_{ki} = s_{1k} X_{1i} + s_{2k} X_{2i} + \cdots + s_{kk} X_{ki}    (12.10)

where the terms s are known as factor scores, which represent the parameters of a linear model that relates a certain factor to the original variables. Calculating the factor scores is essential in the context of the factor analysis technique and is elaborated by determining the eigenvalues and eigenvectors of correlation matrix ρ. In Expression (12.11), we once again show correlation matrix ρ, which has already been presented in Expression (12.1).

\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix}    (12.11)

This correlation matrix, with dimensions k × k, has k eigenvalues λ² (λ1² ≥ λ2² ≥ … ≥ λk²), which can be obtained from solving the following equation:

\det(\lambda^2 I - \rho) = 0    (12.12)

where I is the identity matrix, also with dimensions k × k.

Since a certain factor represents the result of the clustering of variables, it is important to highlight that:

\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2 = k    (12.13)

Expression (12.12) can be rewritten as follows:

\begin{vmatrix} \lambda^2 - 1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda^2 - 1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda^2 - 1 \end{vmatrix} = 0    (12.14)

from which we can define the eigenvalue matrix Λ2 the following way:

\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}    (12.15)

In order to define the eigenvectors of matrix ρ based on the eigenvalues, we must solve the following equation system for each eigenvalue λ² (λ1², λ2², …, λk²):

  •  Determining eigenvectors v11, v21, …, vk1 from the first eigenvalue (λ1²):

\begin{pmatrix} \lambda_1^2 - 1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_1^2 - 1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_1^2 - 1 \end{pmatrix} \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{k1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}    (12.16)

from where we obtain:

\begin{cases} (\lambda_1^2 - 1)\,v_{11} - \rho_{12}\,v_{21} - \cdots - \rho_{1k}\,v_{k1} = 0 \\ -\rho_{21}\,v_{11} + (\lambda_1^2 - 1)\,v_{21} - \cdots - \rho_{2k}\,v_{k1} = 0 \\ \quad\vdots \\ -\rho_{k1}\,v_{11} - \rho_{k2}\,v_{21} - \cdots + (\lambda_1^2 - 1)\,v_{k1} = 0 \end{cases}    (12.17)

  •  Determining eigenvectors v12, v22, …, vk2 from the second eigenvalue (λ2²):

\begin{pmatrix} \lambda_2^2 - 1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_2^2 - 1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_2^2 - 1 \end{pmatrix} \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{k2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}    (12.18)

from where we obtain:

\begin{cases} (\lambda_2^2 - 1)\,v_{12} - \rho_{12}\,v_{22} - \cdots - \rho_{1k}\,v_{k2} = 0 \\ -\rho_{21}\,v_{12} + (\lambda_2^2 - 1)\,v_{22} - \cdots - \rho_{2k}\,v_{k2} = 0 \\ \quad\vdots \\ -\rho_{k1}\,v_{12} - \rho_{k2}\,v_{22} - \cdots + (\lambda_2^2 - 1)\,v_{k2} = 0 \end{cases}    (12.19)

  •  Determining eigenvectors v1k, v2k, …, vkk from the kth eigenvalue (λk²):

\begin{pmatrix} \lambda_k^2 - 1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_k^2 - 1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_k^2 - 1 \end{pmatrix} \begin{pmatrix} v_{1k} \\ v_{2k} \\ \vdots \\ v_{kk} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}    (12.20)

from where we obtain:

\begin{cases} (\lambda_k^2 - 1)\,v_{1k} - \rho_{12}\,v_{2k} - \cdots - \rho_{1k}\,v_{kk} = 0 \\ -\rho_{21}\,v_{1k} + (\lambda_k^2 - 1)\,v_{2k} - \cdots - \rho_{2k}\,v_{kk} = 0 \\ \quad\vdots \\ -\rho_{k1}\,v_{1k} - \rho_{k2}\,v_{2k} - \cdots + (\lambda_k^2 - 1)\,v_{kk} = 0 \end{cases}    (12.21)

Thus, we can calculate the factor scores of each factor by determining the eigenvalues and eigenvectors of correlation matrix ρ. The factor scores vectors can be defined as follows:

  •  Factor scores of the first factor:

S_1 = \begin{pmatrix} s_{11} \\ s_{21} \\ \vdots \\ s_{k1} \end{pmatrix} = \begin{pmatrix} \dfrac{v_{11}}{\sqrt{\lambda_1^2}} \\ \dfrac{v_{21}}{\sqrt{\lambda_1^2}} \\ \vdots \\ \dfrac{v_{k1}}{\sqrt{\lambda_1^2}} \end{pmatrix}    (12.22)

  •  Factor scores of the second factor:

S_2 = \begin{pmatrix} s_{12} \\ s_{22} \\ \vdots \\ s_{k2} \end{pmatrix} = \begin{pmatrix} \dfrac{v_{12}}{\sqrt{\lambda_2^2}} \\ \dfrac{v_{22}}{\sqrt{\lambda_2^2}} \\ \vdots \\ \dfrac{v_{k2}}{\sqrt{\lambda_2^2}} \end{pmatrix}    (12.23)

  •  Factor scores of the kth factor:

S_k = \begin{pmatrix} s_{1k} \\ s_{2k} \\ \vdots \\ s_{kk} \end{pmatrix} = \begin{pmatrix} \dfrac{v_{1k}}{\sqrt{\lambda_k^2}} \\ \dfrac{v_{2k}}{\sqrt{\lambda_k^2}} \\ \vdots \\ \dfrac{v_{kk}}{\sqrt{\lambda_k^2}} \end{pmatrix}    (12.24)

Since the factor scores of each factor correspond to the terms of the respective eigenvector divided by the square root of the respective eigenvalue, the factors of the set of equations presented in Expression (12.10) must be obtained by multiplying each factor score by the corresponding original variable, standardized by using the Z-scores procedure. Thus, we can obtain each one of the factors based on the following equations:

F_{1i} = \frac{v_{11}}{\sqrt{\lambda_1^2}} ZX_{1i} + \frac{v_{21}}{\sqrt{\lambda_1^2}} ZX_{2i} + \cdots + \frac{v_{k1}}{\sqrt{\lambda_1^2}} ZX_{ki}
F_{2i} = \frac{v_{12}}{\sqrt{\lambda_2^2}} ZX_{1i} + \frac{v_{22}}{\sqrt{\lambda_2^2}} ZX_{2i} + \cdots + \frac{v_{k2}}{\sqrt{\lambda_2^2}} ZX_{ki}
\vdots
F_{ki} = \frac{v_{1k}}{\sqrt{\lambda_k^2}} ZX_{1i} + \frac{v_{2k}}{\sqrt{\lambda_k^2}} ZX_{2i} + \cdots + \frac{v_{kk}}{\sqrt{\lambda_k^2}} ZX_{ki}    (12.25)

where ZXi represents the standardized value of each variable X for a certain observation i. It is important to emphasize that all the factors extracted show, between themselves, Pearson correlations equal to 0, that is, they are orthogonal to one another.

A more perceptive researcher will notice that the factor scores of each factor correspond exactly to the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself and, as explanatory variables, the standardized variables.
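For readers who prefer to follow this sequence numerically, the sketch below condenses Expressions (12.12) to (12.25) using NumPy: it extracts the eigenvalues and eigenvectors of ρ, obtains the factor scores by dividing each eigenvector by the square root of its eigenvalue, and computes the factors from the standardized variables. The function name is ours, and X stands for any n × k data matrix.

```python
import numpy as np

def principal_component_factors(X: np.ndarray):
    """Eigenvalues, eigenvectors, factor scores, and factors of a principal component factor analysis."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Z-scores of the original variables
    rho = np.corrcoef(X, rowvar=False)                 # correlation matrix rho (Expression 12.1)
    eigenvalues, V = np.linalg.eigh(rho)               # solutions of det(lambda^2*I - rho) = 0 (12.12)
    order = np.argsort(eigenvalues)[::-1]              # sort so that lambda_1^2 >= ... >= lambda_k^2
    eigenvalues, V = eigenvalues[order], V[:, order]   # note: the sign of each eigenvector is arbitrary
    scores = V / np.sqrt(eigenvalues)                  # factor scores (Expressions 12.22 to 12.24)
    factors = Z @ scores                               # factors for each observation (Expression 12.25)
    return eigenvalues, V, scores, factors
```

Because the sign of each eigenvector is arbitrary, the factors obtained this way may come out with opposite signs across software packages, which does not affect the interpretation of the solution.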

Mathematically, it is also possible to verify the existing relationship between the eigenvectors, correlation matrix ρ, and eigenvalue matrix Λ2. Consequently, defining eigenvector matrix V as follows:

V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix}    (12.26)

we can prove that:

V' \rho V = \Lambda^2    (12.27)

or:

\begin{pmatrix} v_{11} & v_{21} & \cdots & v_{k1} \\ v_{12} & v_{22} & \cdots & v_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ v_{1k} & v_{2k} & \cdots & v_{kk} \end{pmatrix} \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix} = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}    (12.28)

In Section 12.2.6, we will discuss a practical example from which this relationship may be demonstrated.

While in Section 12.2.2, we discussed the factor analysis’s overall adequacy, in this section, we will discuss the procedures for carrying out the factor extraction, if the technique is considered adequate. Even knowing that the maximum number of factors is also equal to k for k variables, it is essential for researchers to define, based on a certain criterion, the adequate number of factors that, in fact, represent the original variables. In our hypothetical example in Section 12.2.1, we saw that only two factors (F1 and F2) would be enough to represent the three original variables (X1, X2, and X3).

Although researchers are free to determine the number of factors to be extracted in the analysis, in a preliminary way, since they may wish to verify the validity of a previously established construct (procedure known as a priori criterion), for instance, it is essential to carry out an analysis based on the magnitude of the eigenvalues calculated from correlation matrix ρ.

As we will discuss in Section 12.2.4, the eigenvalues correspond to the proportions of variance shared by the original variables to form each factor. Since λ1² ≥ λ2² ≥ … ≥ λk², and bearing in mind that factors F1, F2, …, Fk are obtained from the respective eigenvalues, factors extracted from smaller eigenvalues are formed from smaller proportions of variance shared by the original variables. Since a factor represents a certain cluster of variables, factors extracted from eigenvalues less than 1 may not even be able to represent the behavior of a single original variable (of course, there are exceptions to this rule, which occur when a certain eigenvalue is less than, but very close to, 1). The criterion for choosing the number of factors in which only the factors that correspond to eigenvalues greater than 1 are considered is frequently used and is known as the latent root criterion, or Kaiser criterion.

The factor extraction method presented in this chapter is known as principal components, and the first factor F1, formed by the highest proportion of variance shared by the original variables, is also called principal factor. This method is often mentioned in the existing literature and is used in practical applications whenever researchers wish to elaborate a structural reduction of the data in order to create orthogonal factors, to define observation rankings by using the factors generated, and even to confirm the validity of previously established constructs. Other factor extraction methods, such as, the generalized least squares, unweighted least squares, maximum likelihood, alpha factoring, and image factoring, have different criteria and certain specificities and, even though they can also be found in the existing literature, they will not be discussed in this book.

Moreover, it is common to discuss the need to apply the factor analysis to variables that have multivariate normal distribution, in order to show consistency when determining the factor scores. Nevertheless, it is important to emphasize that multivariate normality is a very rigid assumption, only necessary for a few factor extraction methods, such as, the maximum likelihood method. Most factor extraction methods do not require the assumption of data multivariate normality and, as discussed by Gorsuch (1983), the principal component factor analysis seems to be, in practice, very robust against breaks in normality.

12.2.4 Factor Loadings and Communalities

Having established the factors, we can now define the factor loadings, which are simply the Pearson correlations between the original variables and each one of the factors. Table 12.3 shows the factor loadings for each variable-factor pair.

Table 12.3

Factor Loadings Between Original Variables and Factors
Variable    Factor
            F1     F2     ...    Fk
X1          c11    c12    ...    c1k
X2          c21    c22    ...    c2k
...         ...    ...    ...    ...
Xk          ck1    ck2    ...    ckk


Based on the latent root criterion (in which only factors resulting from eigenvalues greater than 1 are considered), we assume that the factor loadings between the factors that correspond to eigenvalues less than 1 and all the original variables are low, since the variables will already have presented higher Pearson correlations (loadings) with the factors previously extracted from greater eigenvalues. In the same way, original variables that share only a small portion of variance with the other variables will have a high factor loading on only a single factor. If this occurs for all original variables, there will not be significant differences between correlation matrix ρ and identity matrix I, making the χ²_Bartlett statistic very low. This fact allows us to state that the factor analysis will not be adequate, and, in this situation, researchers may choose not to extract factors from the original variables.

As the factor loadings are Pearson’s correlations between each variable and each factor, the sum of the squares of these loadings in each row of Table 12.3 will always be equal to 1, since each variable shares part of its proportion of variance with all the k factors, and the sum of the proportions of variance (factor loadings or squared Pearson correlations) will be 100%.

Conversely, if less than k factors are extracted, due to the latent root criterion, the sum of the squared factor loadings in each row will not be equal to 1. This sum is called communality, which represents the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, we can say that:

c_{11}^2 + c_{12}^2 + \cdots = \text{communality}_{X_1}
c_{21}^2 + c_{22}^2 + \cdots = \text{communality}_{X_2}
\vdots
c_{k1}^2 + c_{k2}^2 + \cdots = \text{communality}_{X_k}    (12.29)

The main objective of the analysis of communalities is to check if any variable ends up not sharing a significant proportion of variance with the factors extracted. Even though there is no cutoff point from which a certain communality can be considered high or low, since the sample size can interfere in this assessment, the existence of considerably low communalities in relation to the others can indicate to researchers that they may need to reconsider including the respective variable into the factor analysis.

Therefore, after defining the factors based on the factor scores, we can state that the factor loadings will be exactly the same as the parameters estimated in a multiple linear regression model that shows, as a dependent variable, a certain standardized variable ZX and, as explanatory variables, the factors themselves, and the coefficient of determination R2 of each model is equal to the communality of the respective original variable.

The sum of the squared factor loadings in each column of Table 12.3, on the other hand, will be equal to the respective eigenvalue, since the ratio between each eigenvalue and the total number of variables can be understood as the proportion of variance shared by all k original variables to form each factor. So, we can say that:

c_{11}^2 + c_{21}^2 + \cdots + c_{k1}^2 = \lambda_1^2
c_{12}^2 + c_{22}^2 + \cdots + c_{k2}^2 = \lambda_2^2
\vdots
c_{1k}^2 + c_{2k}^2 + \cdots + c_{kk}^2 = \lambda_k^2    (12.30)

After the factors have been established and the factor loadings calculated, it is also possible for some variables to show intermediate (neither very high nor very low) Pearson correlations (factor loadings) with all the factors extracted, even though their communalities are not particularly low. In this case, although the solution of the factor analysis has already been obtained in an adequate way and can be considered concluded, researchers can, when the factor loadings table shows intermediate values for one or more variables in all the factors, carry out a rotation of these factors, so that Pearson's correlations between the original variables and the new factors generated can be increased. In the following section, we will discuss factor rotation.

12.2.5 Factor Rotation

Once again, let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). After preparing the principal component factor analysis, two factors, orthogonal to one another, are extracted, with factor loadings (Pearson correlations) with each one of the three original variables, according to Table 12.4.

Table 12.4

Factor Loadings Between Three Variables and Two Factors
Variable    Factor
            F1     F2
X1          c11    c12
X2          c21    c22
X3          c31    c32


In order to construct a chart with the relative positions of each variable in each factor (a chart known as loading plot), we can consider the factor loadings to be coordinates (abscissas and ordinates) of the variables in a Cartesian plane formed by both orthogonal factors. The plot can be seen in Fig. 12.5.

Fig. 12.5 Loading plot for a hypothetical situation with three variables and two factors.

In order to better visualize which variables are best represented by each factor, we can think about a rotation of the originally extracted factors F1 and F2 around the origin, so as to bring the points corresponding to variables X1, X2, and X3 closer to one of the new factors, called rotated factors F1′ and F2′. Fig. 12.6 shows this process in a simplified way.

Fig. 12.6 Defining the rotated factors from the original factors.

Based on Fig. 12.6, for each variable under analysis, we can see that while the loading for one factor increases, for the other, it decreases. Table 12.5 shows the loading redistribution for our hypothetical situation.

Table 12.5

Original and Rotated Factor Loadings for Our Hypothetical Situation
Variable    Original Factor Loadings    Rotated Factor Loadings
            F1      F2                  F1′                F2′
X1          c11     c12                 |c11′| > |c11|     |c12′| < |c12|
X2          c21     c22                 |c21′| > |c21|     |c22′| < |c22|
X3          c31     c32                 |c31′| < |c31|     |c32′| > |c32|


Thus, for a generic situation, we can say that rotation is a procedure that maximizes the loadings of each variable in a certain factor, to the detriment of the others. In this regard, the final effect of rotation is the redistribution of factor loadings to factors that initially had smaller proportions of variance shared by all the original variables. The main objective is to minimize the number of variables with high loadings in a certain factor, since each one of the factors will start having more significant loadings only with some of the original variables. Consequently, rotation may simplify the interpretation of the factors.

Despite the fact that the communalities and the total proportion of variance shared by all the variables in all the factors are not modified by the rotation (and neither are the KMO statistic or the χ²_Bartlett statistic), the proportion of variance shared by the original variables in each factor is redistributed and, therefore, modified. In other words, new eigenvalues λ′² (λ1′², λ2′², …, λk′²) are established from the rotated factor loadings. Thus, we can say that:

c_{11}'^2 + c_{12}'^2 + \cdots = \text{communality}_{X_1}
c_{21}'^2 + c_{22}'^2 + \cdots = \text{communality}_{X_2}
\vdots
c_{k1}'^2 + c_{k2}'^2 + \cdots = \text{communality}_{X_k}    (12.31)

and that:

c_{11}'^2 + c_{21}'^2 + \cdots + c_{k1}'^2 = \lambda_1'^2 \neq \lambda_1^2
c_{12}'^2 + c_{22}'^2 + \cdots + c_{k2}'^2 = \lambda_2'^2 \neq \lambda_2^2
\vdots
c_{1k}'^2 + c_{2k}'^2 + \cdots + c_{kk}'^2 = \lambda_k'^2 \neq \lambda_k^2    (12.32)

while Expression (12.13) continues to be respected, that is:

\lambda_1'^2 + \lambda_2'^2 + \cdots + \lambda_k'^2 = \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2 = k    (12.33)

Besides, new rotated factor scores are obtained from the rotation of factors, s′, such that the final expressions of the rotated factors will be:

F_{1i}' = s_{11}' ZX_{1i} + s_{21}' ZX_{2i} + \cdots + s_{k1}' ZX_{ki}
F_{2i}' = s_{12}' ZX_{1i} + s_{22}' ZX_{2i} + \cdots + s_{k2}' ZX_{ki}
\vdots
F_{ki}' = s_{1k}' ZX_{1i} + s_{2k}' ZX_{2i} + \cdots + s_{kk}' ZX_{ki}    (12.34)

It is important to highlight that the overall adequacy of the factor analysis (KMO statistic and Bartlett’s test of sphericity) is not altered by the rotation, since correlation matrix ρ continues the same.

Even though there are several factor rotation methods, the most frequently used is the orthogonal rotation method known as Varimax, proposed by Kaiser (1958), whose main purpose is to minimize the number of variables that have high loadings on a certain factor by redistributing the factor loadings and maximizing the variance shared in factors that correspond to lower eigenvalues (hence the name Varimax). It is the method we will use in this chapter to solve a practical example.

The algorithm behind the Varimax rotation method consists in determining a rotation angle θ in which pairs of factors are equally rotated. Thus, as discussed by Harman (1976), for a certain pair of factors F1 and F2, for example, the rotated factor loadings c’ between the two factors and the k original variables are obtained from the original factor loadings c, through the following matrix multiplication:

\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ \vdots & \vdots \\ c_{k1} & c_{k2} \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} c_{11}' & c_{12}' \\ c_{21}' & c_{22}' \\ \vdots & \vdots \\ c_{k1}' & c_{k2}' \end{pmatrix}    (12.35)

where θ, the counterclockwise rotation angle, is obtained by the following expression:

\theta = \frac{1}{4} \arctan\left[\frac{2\,(D \cdot k - A \cdot B)}{C \cdot k - (A^2 - B^2)}\right]    (12.36)

where:

A = \sum_{l=1}^{k} \left(\frac{c_{l1}^2}{\text{communality}_l} - \frac{c_{l2}^2}{\text{communality}_l}\right)    (12.37)

B = \sum_{l=1}^{k} \left(\frac{2\,c_{l1}\,c_{l2}}{\text{communality}_l}\right)    (12.38)

C = \sum_{l=1}^{k} \left[\left(\frac{c_{l1}^2}{\text{communality}_l} - \frac{c_{l2}^2}{\text{communality}_l}\right)^2 - \left(\frac{2\,c_{l1}\,c_{l2}}{\text{communality}_l}\right)^2\right]    (12.39)

D = \sum_{l=1}^{k} \left[\left(\frac{c_{l1}^2}{\text{communality}_l} - \frac{c_{l2}^2}{\text{communality}_l}\right) \cdot \left(\frac{2\,c_{l1}\,c_{l2}}{\text{communality}_l}\right)\right]    (12.40)

In Section 12.2.6, we will use these Varimax rotation method expressions to determine the rotated factor loadings from the original loadings.
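As an illustration, the sketch below applies Expressions (12.35) to (12.40) to a k × 2 matrix of loadings: the quantities A, B, C, and D are computed from the loadings normalized by the communalities (Kaiser normalization), the angle θ is obtained, and the two columns are rotated. We use arctan2 rather than a plain arctangent so that the angle lands in the correct quadrant, and we stress that this is a sketch for the two-factor case only; general Varimax routines iterate this pairwise rotation over all pairs of factors until convergence.

```python
import numpy as np

def varimax_two_factors(C: np.ndarray):
    """Rotate a k x 2 matrix of factor loadings by the Varimax angle of Expression (12.36)."""
    h = (C ** 2).sum(axis=1)                         # communalities over the two factors being rotated
    u = (C[:, 0] ** 2 - C[:, 1] ** 2) / h            # terms summed in A (Expression 12.37)
    v = 2.0 * C[:, 0] * C[:, 1] / h                  # terms summed in B (Expression 12.38)
    k = C.shape[0]
    A, B = u.sum(), v.sum()
    Cs, D = (u ** 2 - v ** 2).sum(), (u * v).sum()   # Expressions (12.39) and (12.40)
    theta = 0.25 * np.arctan2(2.0 * (D * k - A * B), Cs * k - (A ** 2 - B ** 2))
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])  # rotation matrix of Expression (12.35)
    return C @ R, np.degrees(theta)                  # rotated loadings and the angle in degrees
```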

Besides Varimax, we can also mention other orthogonal rotation methods, such as, Quartimax and Equamax, even though they are less frequently mentioned in the existing literature and less used in practice. In addition to them, the researcher may also use oblique rotation methods, in which nonorthogonal factors are generated. Although they are not discussed in this chapter, we should also mention the Direct Oblimin and Promax methods in this category.

While oblique rotation methods can sometimes be used when we wish to validate a certain construct in which the underlying factors are allowed to be correlated, we recommend that an orthogonal rotation method be used whenever the extracted factors are to be used later in other multivariate techniques, such as, certain confirmatory models, in which the absence of multicollinearity among the explanatory variables is a mandatory assumption.

12.2.6 A Practical Example of the Principal Component Factor Analysis

Imagine that the same professor, deeply engaged in academic and pedagogical activities, is now interested in studying how his students’ grades behave so that, afterwards, he can propose the creation of a school performance ranking.

In order to do that, he collected information on the final grades, which vary from 0 to 10, of each one of his 100 students in the following subjects: Finance, Costs, Marketing, and Actuarial Science. Part of the dataset can be seen in Table 12.6.

Table 12.6

Example: Final Grades in Finance, Costs, Marketing, and Actuarial Science
Student        Final Grade in Finance (X1i)    Final Grade in Costs (X2i)    Final Grade in Marketing (X3i)    Final Grade in Actuarial Science (X4i)
Gabriela       5.8                             4.0                           1.0                               6.0
Luiz Felipe    3.1                             3.0                           10.0                              2.0
Patricia       3.1                             4.0                           4.0                               4.0
Gustavo        10.0                            8.0                           8.0                               8.0
Leticia        3.4                             2.0                           3.2                               3.2
Ovidio         10.0                            10.0                          1.0                               10.0
Leonor         5.0                             5.0                           8.0                               5.0
Dalila         5.4                             6.0                           6.0                               6.0
Antonio        5.9                             4.0                           4.0                               4.0
Estela         8.9                             5.0                           2.0                               8.0


The complete dataset can be found in the file FactorGrades.xls. Through this dataset, it is possible to construct Table 12.7, which shows Pearson’s correlation coefficients between each pair of variables, calculated by using the logic presented in Expression (12.2).

Table 12.7

Pearson’s Correlation Coefficients for Each Pair of Variables
                     finance    costs     marketing    actuarial science
finance              1.000      0.756     -0.030       0.711
costs                0.756      1.000     0.003        0.809
marketing            -0.030     0.003     1.000        -0.044
actuarial science    0.711      0.809     -0.044       1.000


Therefore, we can write the expression of the correlation matrix ρ as follows:

\rho = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{21} & 1 & \rho_{23} & \rho_{24} \\ \rho_{31} & \rho_{32} & 1 & \rho_{34} \\ \rho_{41} & \rho_{42} & \rho_{43} & 1 \end{pmatrix} = \begin{pmatrix} 1.000 & 0.756 & -0.030 & 0.711 \\ 0.756 & 1.000 & 0.003 & 0.809 \\ -0.030 & 0.003 & 1.000 & -0.044 \\ 0.711 & 0.809 & -0.044 & 1.000 \end{pmatrix}

which has determinant D = 0.137.
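For readers following along in Python rather than SPSS or Stata, the correlation matrix and its determinant can be obtained directly from the dataset; in the sketch below the column names of FactorGrades.xls are assumptions of ours, since only the file name is given in the text.

```python
import numpy as np
import pandas as pd

# Column names are hypothetical; adjust them to the actual headers of FactorGrades.xls
df = pd.read_excel("FactorGrades.xls")
rho = df[["finance", "costs", "marketing", "actuarial science"]].corr()
print(rho.round(3))                  # Pearson correlations, as in Table 12.7
print(round(np.linalg.det(rho), 3))  # determinant D, approximately 0.137
```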

By analyzing correlation matrix ρ, it is possible to verify that only the grades corresponding to the variable marketing do not have correlations with the grades in the other subjects, represented by the other variables. On the other hand, these show relatively high correlations with one another (0.756 between finance and costs, 0.711 between finance and actuarial, and 0.809 between costs and actuarial), which indicates that they may share significant variance to form one factor. Although this preliminary analysis is important, it cannot represent more than a simple diagnostic, since the overall adequacy of the factor analysis needs to be evaluated based on the KMO statistic and, mainly, by using the result of Bartlett’s test of sphericity.

As we discussed in Section 12.2.2, the KMO statistic provides the proportion of variance considered common to all the variables present in the analysis, and, in order to calculate it, we need to determine the partial correlation coefficients φ between each pair of variables. In this case, these will be second-order correlation coefficients, since we are working with four variables simultaneously.

Consequently, based on Expression (12.7), we first need to determine the first-order correlation coefficients used to calculate the second-order correlation coefficients. Table 12.8 shows these coefficients.

Table 12.8

First-Order Correlation Coefficients
\varphi_{12,3} = \frac{\rho_{12} - \rho_{13}\,\rho_{23}}{\sqrt{(1-\rho_{13}^2)(1-\rho_{23}^2)}} = 0.756
\varphi_{13,2} = \frac{\rho_{13} - \rho_{12}\,\rho_{23}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{23}^2)}} = -0.049
\varphi_{14,2} = \frac{\rho_{14} - \rho_{12}\,\rho_{24}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{24}^2)}} = 0.258
\varphi_{14,3} = \frac{\rho_{14} - \rho_{13}\,\rho_{34}}{\sqrt{(1-\rho_{13}^2)(1-\rho_{34}^2)}} = 0.711
\varphi_{23,1} = \frac{\rho_{23} - \rho_{12}\,\rho_{13}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)}} = 0.039
\varphi_{24,1} = \frac{\rho_{24} - \rho_{12}\,\rho_{14}}{\sqrt{(1-\rho_{12}^2)(1-\rho_{14}^2)}} = 0.590
\varphi_{24,3} = \frac{\rho_{24} - \rho_{23}\,\rho_{34}}{\sqrt{(1-\rho_{23}^2)(1-\rho_{34}^2)}} = 0.810
\varphi_{34,1} = \frac{\rho_{34} - \rho_{13}\,\rho_{14}}{\sqrt{(1-\rho_{13}^2)(1-\rho_{14}^2)}} = -0.033
\varphi_{34,2} = \frac{\rho_{34} - \rho_{23}\,\rho_{24}}{\sqrt{(1-\rho_{23}^2)(1-\rho_{24}^2)}} = -0.080

Hence, from these coefficients and by using Expression (12.8), we can calculate the second-order correlation coefficients considered in the KMO statistic’s expression. Table 12.9 shows these coefficients.

Table 12.9

Second-Order Correlation Coefficients
\varphi_{12,34} = \frac{\varphi_{12,3} - \varphi_{14,3}\,\varphi_{24,3}}{\sqrt{(1-\varphi_{14,3}^2)(1-\varphi_{24,3}^2)}} = 0.438
\varphi_{13,24} = \frac{\varphi_{13,2} - \varphi_{14,2}\,\varphi_{34,2}}{\sqrt{(1-\varphi_{14,2}^2)(1-\varphi_{34,2}^2)}} = -0.029
\varphi_{23,14} = \frac{\varphi_{23,1} - \varphi_{24,1}\,\varphi_{34,1}}{\sqrt{(1-\varphi_{24,1}^2)(1-\varphi_{34,1}^2)}} = 0.072
\varphi_{14,23} = \frac{\varphi_{14,2} - \varphi_{13,2}\,\varphi_{34,2}}{\sqrt{(1-\varphi_{13,2}^2)(1-\varphi_{34,2}^2)}} = 0.255
\varphi_{24,13} = \frac{\varphi_{24,1} - \varphi_{23,1}\,\varphi_{34,1}}{\sqrt{(1-\varphi_{23,1}^2)(1-\varphi_{34,1}^2)}} = 0.592
\varphi_{34,12} = \frac{\varphi_{34,1} - \varphi_{23,1}\,\varphi_{24,1}}{\sqrt{(1-\varphi_{23,1}^2)(1-\varphi_{24,1}^2)}} = -0.069

So, based on Expression (12.3), we can calculate the KMO statistic. The terms of the expression are given by:

\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^2 = (0.756)^2 + (-0.030)^2 + (0.711)^2 + (0.003)^2 + (0.809)^2 + (-0.044)^2 = 1.734

\sum_{l=1}^{k}\sum_{c=1}^{k} \varphi_{lc}^2 = (0.438)^2 + (-0.029)^2 + (0.255)^2 + (0.072)^2 + (0.592)^2 + (-0.069)^2 = 0.619

from where we obtain:

KMO = \frac{1.734}{1.734 + 0.619} = 0.737

Based on the criterion presented in Table 12.2, the value of the KMO statistic suggests that the overall adequacy of the factor analysis is middling. To test whether, in fact, correlation matrix ρ is statistically different from identity matrix I with the same dimension, we must use Bartlett's test of sphericity, whose χ²_Bartlett statistic is given by Expression (12.9). For n = 100 observations, k = 4 variables, and correlation matrix ρ determinant D = 0.137, we have:

\chi^2_{Bartlett} = -\left[(100-1) - \left(\frac{2 \cdot 4 + 5}{6}\right)\right] \cdot \ln(0.137) = 192.335

with 4(4 − 1)/2 = 6 degrees of freedom. Therefore, by using Table D in the Appendix, we have χ²_c = 12.592 (critical χ² for 6 degrees of freedom and a significance level of 0.05). Thus, since χ²_Bartlett = 192.335 > χ²_c = 12.592, we can reject the null hypothesis that correlation matrix ρ is statistically equal to identity matrix I, at a significance level of 0.05.

Software packages like SPSS and Stata do not offer the χ²_c for the defined degrees of freedom and a certain significance level. However, they offer the significance level of χ²_Bartlett for these degrees of freedom. So, instead of analyzing whether χ²_Bartlett > χ²_c, we must verify whether the significance level of χ²_Bartlett is less than 0.05 (5%) so that we can continue performing the factor analysis. Thus:

If P-value (either Sig. χ²_Bartlett or Prob. χ²_Bartlett) < 0.05, correlation matrix ρ is not statistically equal to identity matrix I with the same dimension.

The significance level of χ²_Bartlett can be obtained in Excel by using the command Formulas → Insert Function → CHIDIST, which will open a dialog box, as shown in Fig. 12.7.

Fig. 12.7 Obtaining the significance level of χ2 (command Insert Function).

As we can see in Fig. 12.7, the P-value of the χ²_Bartlett statistic is considerably less than 0.05 (P-value = 8.11 × 10⁻³⁹), that is, Pearson's correlations between the pairs of variables are statistically different from 0 and, therefore, factors can be extracted from the original variables, and the factor analysis is quite adequate.
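Outside Excel, the same P-value can be obtained, for instance, from the survival function of the chi-square distribution in SciPy:

```python
from scipy.stats import chi2

# P-value of the Bartlett statistic of the example: chi-square = 192.335 with 6 degrees of freedom
print(chi2.sf(192.335, df=6))   # a value far below 0.05, consistent with the one reported above
```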

Having verified the factor analysis’s overall adequacy, we can move on to the definition of the factors. In order to do that, we must initially determine the four eigenvalues λ2 (λ12 ≥ λ22 ≥ λ32 ≥ λ42) of correlation matrix ρ, which can be obtained from solving Expression (12.12). Therefore, we have:

\begin{vmatrix} \lambda^2 - 1 & -0.756 & 0.030 & -0.711 \\ -0.756 & \lambda^2 - 1 & -0.003 & -0.809 \\ 0.030 & -0.003 & \lambda^2 - 1 & 0.044 \\ -0.711 & -0.809 & 0.044 & \lambda^2 - 1 \end{vmatrix} = 0

from where we obtain:

\begin{cases} \lambda_1^2 = 2.519 \\ \lambda_2^2 = 1.000 \\ \lambda_3^2 = 0.298 \\ \lambda_4^2 = 0.183 \end{cases}

Consequently, based on Expression (12.15), eigenvalue matrix Λ2 can be written as follows:

\Lambda^2 = \begin{pmatrix} 2.519 & 0 & 0 & 0 \\ 0 & 1.000 & 0 & 0 \\ 0 & 0 & 0.298 & 0 \\ 0 & 0 & 0 & 0.183 \end{pmatrix}

Note that Expression (12.13) is satisfied, that is:

\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2 = 2.519 + 1.000 + 0.298 + 0.183 = 4

Since the eigenvalues correspond to the proportion of variance shared by the original variables to form each factor, we can construct a shared variance table (Table 12.10).

Table 12.10

Variance Shared by the Original Variables to Form Each Factor
Factor    Eigenvalue λ²    Shared Variance (%)           Cumulative Shared Variance (%)
1         2.519            (2.519/4) · 100 = 62.975      62.975
2         1.000            (1.000/4) · 100 = 25.010      87.985
3         0.298            (0.298/4) · 100 = 7.444       95.428
4         0.183            (0.183/4) · 100 = 4.572       100.000


By analyzing Table 12.10, we can say that while 62.975% of the total variance are shared to form the first factor, 25.010% are shared to form the second factor. The third and fourth factors, whose eigenvalues are less than 1, are formed through smaller proportions of shared variance. Since the most common criterion used to choose the number of factors is the latent root criterion (Kaiser criterion), in which only the factors that correspond to eigenvalues greater than 1 are taken into consideration, the researcher can choose to conduct all the subsequent analysis with only the first two factors, formed by sharing 87.985% of the total variance of the original variables, that is, with a total variance loss of 12.015%. Nonetheless, for pedagogical purposes, let’s discuss how to calculate the factor scores by determining the eigenvectors that correspond to the four eigenvalues.
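These eigenvalues (and the eigenvectors determined next) can be reproduced numerically from the correlation matrix of Table 12.7; a minimal NumPy sketch follows, whose results match the values reported here up to rounding and up to the arbitrary sign of each eigenvector.

```python
import numpy as np

rho = np.array([[1.000, 0.756, -0.030, 0.711],
                [0.756, 1.000, 0.003, 0.809],
                [-0.030, 0.003, 1.000, -0.044],
                [0.711, 0.809, -0.044, 1.000]])

eigenvalues, V = np.linalg.eigh(rho)       # eigenvalues and eigenvectors of rho
order = np.argsort(eigenvalues)[::-1]      # decreasing order of the eigenvalues
eigenvalues, V = eigenvalues[order], V[:, order]
print(np.round(eigenvalues, 3))            # approximately [2.519, 1.000, 0.298, 0.183]
print(np.round(100 * eigenvalues / 4, 3))  # shared variance (%), as in Table 12.10
```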

Consequently, in order to define the eigenvectors of matrix ρ based on the four eigenvalues calculated, we must solve the following equation systems for each eigenvalue, based on Expressions (12.16) to (12.21):

  •  Determining eigenvectors v11, v21, v31, v41 from the first eigenvalue (λ1² = 2.519):

\begin{cases} (2.519 - 1.000)\,v_{11} - 0.756\,v_{21} + 0.030\,v_{31} - 0.711\,v_{41} = 0 \\ -0.756\,v_{11} + (2.519 - 1.000)\,v_{21} - 0.003\,v_{31} - 0.809\,v_{41} = 0 \\ 0.030\,v_{11} - 0.003\,v_{21} + (2.519 - 1.000)\,v_{31} + 0.044\,v_{41} = 0 \\ -0.711\,v_{11} - 0.809\,v_{21} + 0.044\,v_{31} + (2.519 - 1.000)\,v_{41} = 0 \end{cases}

from where we obtain:

\begin{pmatrix} v_{11} \\ v_{21} \\ v_{31} \\ v_{41} \end{pmatrix} = \begin{pmatrix} 0.5641 \\ 0.5887 \\ -0.0267 \\ 0.5783 \end{pmatrix}

  •  Determining eigenvectors v12, v22, v32, v42 from the second eigenvalue (λ2² = 1.000):

\begin{cases} (1.000 - 1.000)\,v_{12} - 0.756\,v_{22} + 0.030\,v_{32} - 0.711\,v_{42} = 0 \\ -0.756\,v_{12} + (1.000 - 1.000)\,v_{22} - 0.003\,v_{32} - 0.809\,v_{42} = 0 \\ 0.030\,v_{12} - 0.003\,v_{22} + (1.000 - 1.000)\,v_{32} + 0.044\,v_{42} = 0 \\ -0.711\,v_{12} - 0.809\,v_{22} + 0.044\,v_{32} + (1.000 - 1.000)\,v_{42} = 0 \end{cases}

from where we obtain:

\begin{pmatrix} v_{12} \\ v_{22} \\ v_{32} \\ v_{42} \end{pmatrix} = \begin{pmatrix} 0.0068 \\ 0.0487 \\ 0.9987 \\ -0.0101 \end{pmatrix}

  •  Determining eigenvectors v13, v23, v33, v43 from the third eigenvalue (λ3² = 0.298):

\begin{cases} (0.298 - 1.000)\,v_{13} - 0.756\,v_{23} + 0.030\,v_{33} - 0.711\,v_{43} = 0 \\ -0.756\,v_{13} + (0.298 - 1.000)\,v_{23} - 0.003\,v_{33} - 0.809\,v_{43} = 0 \\ 0.030\,v_{13} - 0.003\,v_{23} + (0.298 - 1.000)\,v_{33} + 0.044\,v_{43} = 0 \\ -0.711\,v_{13} - 0.809\,v_{23} + 0.044\,v_{33} + (0.298 - 1.000)\,v_{43} = 0 \end{cases}

from where we obtain:

\begin{pmatrix} v_{13} \\ v_{23} \\ v_{33} \\ v_{43} \end{pmatrix} = \begin{pmatrix} 0.8008 \\ -0.2201 \\ -0.0003 \\ -0.5571 \end{pmatrix}

  •  Determining eigenvectors v14, v24, v34, v44 from the fourth eigenvalue (λ4² = 0.183):

\begin{cases} (0.183 - 1.000)\,v_{14} - 0.756\,v_{24} + 0.030\,v_{34} - 0.711\,v_{44} = 0 \\ -0.756\,v_{14} + (0.183 - 1.000)\,v_{24} - 0.003\,v_{34} - 0.809\,v_{44} = 0 \\ 0.030\,v_{14} - 0.003\,v_{24} + (0.183 - 1.000)\,v_{34} + 0.044\,v_{44} = 0 \\ -0.711\,v_{14} - 0.809\,v_{24} + 0.044\,v_{34} + (0.183 - 1.000)\,v_{44} = 0 \end{cases}

from where we obtain:

\begin{pmatrix} v_{14} \\ v_{24} \\ v_{34} \\ v_{44} \end{pmatrix} = \begin{pmatrix} 0.2012 \\ -0.7763 \\ 0.0425 \\ 0.5959 \end{pmatrix}

After having determined the eigenvectors, a more inquisitive researcher may prove the relationship presented in Expression (12.27), that is:

V' \rho V = \Lambda^2

\begin{pmatrix} 0.5641 & 0.5887 & -0.0267 & 0.5783 \\ 0.0068 & 0.0487 & 0.9987 & -0.0101 \\ 0.8008 & -0.2201 & -0.0003 & -0.5571 \\ 0.2012 & -0.7763 & 0.0425 & 0.5959 \end{pmatrix} \begin{pmatrix} 1.000 & 0.756 & -0.030 & 0.711 \\ 0.756 & 1.000 & 0.003 & 0.809 \\ -0.030 & 0.003 & 1.000 & -0.044 \\ 0.711 & 0.809 & -0.044 & 1.000 \end{pmatrix} \begin{pmatrix} 0.5641 & 0.0068 & 0.8008 & 0.2012 \\ 0.5887 & 0.0487 & -0.2201 & -0.7763 \\ -0.0267 & 0.9987 & -0.0003 & 0.0425 \\ 0.5783 & -0.0101 & -0.5571 & 0.5959 \end{pmatrix} = \begin{pmatrix} 2.519 & 0 & 0 & 0 \\ 0 & 1.000 & 0 & 0 \\ 0 & 0 & 0.298 & 0 \\ 0 & 0 & 0 & 0.183 \end{pmatrix}

Based on Expressions (12.22) to (12.24), we can calculate the factor scores that correspond to each one of the standardized variables for each one of the factors. Thus, from Expression (12.25), we are able to write the expressions for factors F1, F2, F3, and F4, as follows:

F_{1i} = \frac{0.5641}{\sqrt{2.519}}\,Zfinance_i + \frac{0.5887}{\sqrt{2.519}}\,Zcosts_i - \frac{0.0267}{\sqrt{2.519}}\,Zmarketing_i + \frac{0.5783}{\sqrt{2.519}}\,Zactuarial_i
F_{2i} = \frac{0.0068}{\sqrt{1.000}}\,Zfinance_i + \frac{0.0487}{\sqrt{1.000}}\,Zcosts_i + \frac{0.9987}{\sqrt{1.000}}\,Zmarketing_i - \frac{0.0101}{\sqrt{1.000}}\,Zactuarial_i
F_{3i} = \frac{0.8008}{\sqrt{0.298}}\,Zfinance_i - \frac{0.2201}{\sqrt{0.298}}\,Zcosts_i - \frac{0.0003}{\sqrt{0.298}}\,Zmarketing_i - \frac{0.5571}{\sqrt{0.298}}\,Zactuarial_i
F_{4i} = \frac{0.2012}{\sqrt{0.183}}\,Zfinance_i - \frac{0.7763}{\sqrt{0.183}}\,Zcosts_i + \frac{0.0425}{\sqrt{0.183}}\,Zmarketing_i + \frac{0.5959}{\sqrt{0.183}}\,Zactuarial_i

from where we obtain:

F_{1i} = 0.355\,Zfinance_i + 0.371\,Zcosts_i - 0.017\,Zmarketing_i + 0.364\,Zactuarial_i
F_{2i} = 0.007\,Zfinance_i + 0.049\,Zcosts_i + 0.999\,Zmarketing_i - 0.010\,Zactuarial_i
F_{3i} = 1.468\,Zfinance_i - 0.403\,Zcosts_i - 0.001\,Zmarketing_i - 1.021\,Zactuarial_i
F_{4i} = 0.470\,Zfinance_i - 1.815\,Zcosts_i + 0.099\,Zmarketing_i + 1.394\,Zactuarial_i

Based on the factor expressions and on the standardized variables, we can calculate the values corresponding to each factor for each observation. Table 12.11 shows these results for part of the dataset.

Table 12.11

Calculation of the Factors for Each Observation
Student               Zfinance_i    Zcosts_i    Zmarketing_i    Zactuarial_i    F1i       F2i       F3i       F4i
Gabriela              -0.011        -0.290      -1.650          0.273           0.016     -1.665    -0.176    0.739
Luiz Felipe           -0.876        -0.697      1.532           -1.319          -1.076    1.503     0.342     -0.831
Patricia              -0.876        -0.290      -0.590          -0.523          -0.600    -0.603    -0.634    -0.672
Gustavo               1.334         1.337       0.825           1.069           1.346     0.887     0.327     -0.228
Leticia               -0.779        -1.104      -0.872          -0.841          -0.978    -0.922    0.161     0.379
Ovidio                1.334         2.150       -1.650          1.865           1.979     -1.553    -0.812    -0.841
Leonor                -0.267        0.116       0.825           -0.125          -0.111    0.829     -0.312    -0.429
Dalila                -0.139        0.523       0.118           0.273           0.242     0.139     -0.694    -0.623
Antonio               0.021         -0.290      -0.590          -0.523          -0.281    -0.597    0.682     -0.250
Estela                0.982         0.113       -1.297          1.069           0.802     -1.293    0.305     1.616
Mean                  0.000         0.000       0.000           0.000           0.000     0.000     0.000     0.000
Standard deviation    1.000         1.000       1.000           1.000           1.000     1.000     1.000     1.000


For the first observation in the sample (Gabriela), for example, we can see that:

F_{1\,Gabriela} = 0.355\,(-0.011) + 0.371\,(-0.290) - 0.017\,(-1.650) + 0.364\,(0.273) = 0.016

F_{2\,Gabriela} = 0.007\,(-0.011) + 0.049\,(-0.290) + 0.999\,(-1.650) - 0.010\,(0.273) = -1.665

F_{3\,Gabriela} = 1.468\,(-0.011) - 0.403\,(-0.290) - 0.001\,(-1.650) - 1.021\,(0.273) = -0.176

F_{4\,Gabriela} = 0.470\,(-0.011) - 1.815\,(-0.290) + 0.099\,(-1.650) + 1.394\,(0.273) = 0.739

It is important to emphasize that all the factors extracted have Pearson correlations equal to 0, between themselves, that is, they are orthogonal to one another.

A more inquisitive researcher may also verify that the factor scores that correspond to each factor are exactly the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself and, as explanatory variables, the standardized variables. Indeed, since each factor is, by construction, an exact linear combination of the standardized variables, such a regression has a coefficient of determination R² equal to 1 and returns precisely these factor scores.

Having established the factors, we can define the factor loadings, which correspond to Pearson’s correlation coefficients between the original variables and each one of the factors. Table 12.12 shows the factor loadings for the data in our example.

Table 12.12

Factor Loadings (Pearson's Correlation Coefficients) Between Variables and Factors

Variable             Factor
                     F1        F2        F3        F4
finance                0.895     0.007     0.437     0.086
costs                  0.934     0.049   − 0.120   − 0.332
marketing            − 0.042     0.999     0.000     0.018
actuarial science      0.918   − 0.010   − 0.304     0.255

For each original variable, the highest factor loading in Table 12.12 indicates the factor with which that variable is most strongly correlated. Consequently, while the variables finance, costs, and actuarial science show stronger correlations with the first factor, only the variable marketing shows a stronger correlation with the second factor. This proves the need for a second factor in order for all the variables to share significant proportions of variance. However, the third and fourth factors present relatively low correlations with the original variables, which explains the fact that the respective eigenvalues are less than 1. If the variable marketing had not been inserted into the analysis, only the first factor would be necessary to explain the joint behavior of the other variables, and the remaining factors would have eigenvalues less than 1.

Therefore, as discussed in Section 12.2.4, we can verify that the loadings of the variables on factors corresponding to eigenvalues less than 1 are relatively low, since these variables have already shown stronger Pearson correlations with the factors previously extracted from greater eigenvalues.

Based on Expression (12.30), we can see that the sum of the squared factor loadings in each column in Table 12.12 will be the respective eigenvalue that, as discussed before, can be understood as the proportion of variance shared by the four original variables to form each factor. Therefore, we have:

$$
\begin{aligned}
(0.895)^2 + (0.934)^2 + (-0.042)^2 + (0.918)^2 &= 2.519 \\
(0.007)^2 + (0.049)^2 + (0.999)^2 + (-0.010)^2 &= 1.000 \\
(0.437)^2 + (-0.120)^2 + (0.000)^2 + (-0.304)^2 &= 0.298 \\
(0.086)^2 + (-0.332)^2 + (0.018)^2 + (0.255)^2 &= 0.183
\end{aligned}
$$

from which we can see that the second eigenvalue only reached a value of 1 due to the high factor loading of the variable marketing.

Furthermore, from the factor loadings presented in Table 12.12, we can also calculate the communalities, which represent the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, based on Expression (12.29), we can write:

$$
\begin{aligned}
\text{communality}_{finance} &= (0.895)^2 + (0.007)^2 = 0.802 \\
\text{communality}_{costs} &= (0.934)^2 + (0.049)^2 = 0.875 \\
\text{communality}_{marketing} &= (-0.042)^2 + (0.999)^2 = 1.000 \\
\text{communality}_{actuarial} &= (0.918)^2 + (-0.010)^2 = 0.843
\end{aligned}
$$

Consequently, even though the variable marketing is the only one that has a high factor loading with the second factor, it is the variable that loses the smallest proportion of variance in the formation of the two factors. On the other hand, the variable finance is the one that loses the largest proportion of variance in the formation of these two factors (around 19.8%). If we had considered the factor loadings of all four factors, all the communalities would be equal to 1.

As we discussed in Section 12.2.4, we can see that the factor loadings are exactly the parameters estimated in a multiple linear regression model, which shows, as a dependent variable, a certain standardized variable and, as explanatory variables, the factors themselves, in which the coefficient of determination R2 of each model is equal to the communality of the respective original variable.
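The relationships among loadings, eigenvalues, and communalities discussed here can be checked directly from Table 12.12, as in the illustrative NumPy sketch below; small differences in the third decimal place come from the rounding of the loadings.

import numpy as np

# Unrotated factor loadings from Table 12.12
# (rows: finance, costs, marketing, actuarial science; columns: F1..F4)
C = np.array([
    [ 0.895,  0.007,  0.437,  0.086],
    [ 0.934,  0.049, -0.120, -0.332],
    [-0.042,  0.999,  0.000,  0.018],
    [ 0.918, -0.010, -0.304,  0.255],
])

# Column sums of squared loadings: approx. the eigenvalues 2.519, 1.000, 0.298, 0.183
print(np.round((C ** 2).sum(axis=0), 3))

# Communalities: row sums of squared loadings over the two retained factors,
# approx. 0.802, 0.875, 1.000, 0.843
print(np.round((C[:, :2] ** 2).sum(axis=1), 3))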

Therefore, for the first two factors, we can construct a chart in which the factor loadings of each variable are plotted in each one of the orthogonal axes that represent factors F1 and F2, respectively. This chart, known as a loading plot, can be seen in Fig. 12.8.

Fig. 12.8 Loading plot.

By analyzing the loading plot, the behavior of the correlations becomes clear. While the variables finance, costs, and actuarial science show high correlations with the first factor (X-axis), the variable marketing shows a strong correlation with the second factor (Y-axis). More inquisitive researchers may investigate the reasons why this phenomenon occurs: sometimes, while the subjects Finance, Costs, and Actuarial Science are taught in a more quantitative way, Marketing may be taught in a more qualitative and behavioral manner. However, it is important to mention that the definition of factors does not force researchers to name them, because, normally, this is not a simple task. Factor analysis does not have "naming factors" as one of its goals and, if researchers intend to do so, they need to have vast knowledge about the phenomenon being studied; confirmatory techniques can help them in this endeavor.
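A chart similar to Fig. 12.8 can be drawn with any plotting library. The sketch below uses matplotlib (an illustrative choice, not the software packages used in this chapter) and the unrotated loadings of Table 12.12.

import matplotlib.pyplot as plt

# Unrotated loadings of each variable on F1 (x-axis) and F2 (y-axis)
loadings = {
    "finance":           ( 0.895,  0.007),
    "costs":             ( 0.934,  0.049),
    "marketing":         (-0.042,  0.999),
    "actuarial science": ( 0.918, -0.010),
}

fig, ax = plt.subplots()
for name, (f1, f2) in loadings.items():
    ax.scatter(f1, f2)
    ax.annotate(name, (f1, f2))
ax.axhline(0, linewidth=0.5)
ax.axvline(0, linewidth=0.5)
ax.set_xlabel("F1 loading")
ax.set_ylabel("F2 loading")
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
plt.show()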

At this moment, we can consider the preparation of the principal component factor analysis concluded. Nevertheless, as discussed in Section 12.2.5, if researchers wish to obtain a clearer visualization of which variables are better represented by a certain factor, they can carry out a rotation using the Varimax orthogonal method, which maximizes the loading of each variable on one of the factors. In our example, since we already have an excellent idea of which variables have high loadings on each factor, and the loading plot (Fig. 12.8) is already very clear, the rotation may be considered unnecessary. Therefore, it will be carried out only for pedagogical purposes, since researchers may sometimes find themselves in situations in which this phenomenon is not so clear.

Consequently, based on the factor loadings of the first two factors (the first two columns of Table 12.12), we will obtain the rotated factor loadings c′ after rotating both factors through an angle θ. Thus, based on Expression (12.35), we can write:

$$
\begin{pmatrix}
0.895 & 0.007 \\
0.934 & 0.049 \\
-0.042 & 0.999 \\
0.918 & -0.010
\end{pmatrix}
\cdot
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
=
\begin{pmatrix}
c'_{11} & c'_{12} \\
c'_{21} & c'_{22} \\
\vdots & \vdots \\
c'_{k1} & c'_{k2}
\end{pmatrix}
$$

where the counterclockwise rotation angle θ is obtained from Expression (12.36). Nevertheless, before that, we must determine the values of terms A, B, C, and D present in Expressions (12.37) to (12.40). Tables 12.13 to 12.16 are constructed for this purpose.

Table 12.13

Obtaining Term A to Calculate Rotation Angle θ

Variable             c1        c2        communality   (c1l²/communalityl) − (c2l²/communalityl)
finance                0.895     0.007   0.802            1.000
costs                  0.934     0.049   0.875            0.995
marketing            − 0.042     0.999   1.000          − 0.996
actuarial science      0.918   − 0.010   0.843            1.000
A (sum)                                                    1.998

Table 12.14

Obtaining Term B to Calculate Rotation Angle θ

Variable             c1        c2        communality   2·c1l·c2l/communalityl
finance                0.895     0.007   0.802            0.015
costs                  0.934     0.049   0.875            0.104
marketing            − 0.042     0.999   1.000          − 0.085
actuarial science      0.918   − 0.010   0.843          − 0.022
B (sum)                                                    0.012

Table 12.15

Obtaining Term C to Calculate Rotation Angle θ

Variable             c1        c2        communality   [(c1l²/communalityl) − (c2l²/communalityl)]² − [2·c1l·c2l/communalityl]²
finance                0.895     0.007   0.802            1.000
costs                  0.934     0.049   0.875            0.978
marketing            − 0.042     0.999   1.000            0.986
actuarial science      0.918   − 0.010   0.843            0.999
C (sum)                                                    3.963

Table 12.16

Obtaining Term D to Calculate Rotation Angle θ

Variable             c1        c2        communality   [(c1l²/communalityl) − (c2l²/communalityl)] · [2·c1l·c2l/communalityl]
finance                0.895     0.007   0.802            0.015
costs                  0.934     0.049   0.875            0.103
marketing            − 0.042     0.999   1.000            0.084
actuarial science      0.918   − 0.010   0.843          − 0.022
D (sum)                                                    0.181

So, taking the k = 4 variables into consideration and based on Expression (12.36), we can calculate the counterclockwise rotation angle θ as follows:

$$
\theta = 0.25 \cdot \arctan\left[\frac{2 \cdot (D \cdot k - A \cdot B)}{C \cdot k - (A^{2} - B^{2})}\right]
= 0.25 \cdot \arctan\left\{\frac{2 \cdot [(0.181) \cdot 4 - (1.998) \cdot (0.012)]}{(3.963) \cdot 4 - [(1.998)^{2} - (0.012)^{2}]}\right\}
= 0.029\ \mathrm{rad}
$$
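The terms A, B, C, and D and the angle θ can be recalculated from the unrotated loadings and communalities, as in the illustrative NumPy sketch below. Because the inputs are the rounded values of Tables 12.12 and 12.13, the intermediate sums differ slightly from those reported in the text, but the resulting angle is the same 0.029 rad.

import numpy as np

# Unrotated loadings on F1 and F2 and communalities (Tables 12.12 and 12.13)
c1 = np.array([ 0.895,  0.934, -0.042,  0.918])
c2 = np.array([ 0.007,  0.049,  0.999, -0.010])
h  = np.array([ 0.802,  0.875,  1.000,  0.843])
k  = 4  # number of variables

a = (c1**2 - c2**2) / h   # column of Table 12.13
b = 2 * c1 * c2 / h       # column of Table 12.14
A = a.sum()               # approx. 2.00
B = b.sum()               # approx. 0.01
C = (a**2 - b**2).sum()   # approx. 3.96
D = (a * b).sum()         # approx. 0.18

theta = 0.25 * np.arctan(2 * (D * k - A * B) / (C * k - (A**2 - B**2)))
print(round(theta, 3))    # approx. 0.029 rad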

And, finally, we can calculate the rotated factor loadings:

$$
\begin{pmatrix}
0.895 & 0.007 \\
0.934 & 0.049 \\
-0.042 & 0.999 \\
0.918 & -0.010
\end{pmatrix}
\cdot
\begin{pmatrix}
\cos 0.029 & -\sin 0.029 \\
\sin 0.029 & \cos 0.029
\end{pmatrix}
=
\begin{pmatrix}
c'_{11} & c'_{12} \\
c'_{21} & c'_{22} \\
c'_{31} & c'_{32} \\
c'_{41} & c'_{42}
\end{pmatrix}
=
\begin{pmatrix}
0.895 & -0.019 \\
0.935 & 0.021 \\
-0.013 & 1.000 \\
0.917 & -0.037
\end{pmatrix}
$$
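The matrix multiplication above can be verified with a few lines of code, as in the illustrative sketch below; the result matches Table 12.17 up to rounding.

import numpy as np

theta = 0.029  # counterclockwise Varimax rotation angle (in radians)

# Unrotated loadings on the first two factors (Table 12.12)
C = np.array([
    [ 0.895,  0.007],
    [ 0.934,  0.049],
    [-0.042,  0.999],
    [ 0.918, -0.010],
])

# Orthogonal rotation matrix and rotated loadings
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(C @ T, 3))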

Table 12.17 shows, in a consolidated way, the rotated factor loadings through the Varimax method for the data in our example.

Table 12.17

Rotated Factor Loadings Through the Varimax Method

Variable             Factor
                     F1        F2
finance                0.895   − 0.019
costs                  0.935     0.021
marketing            − 0.013     1.000
actuarial science      0.917   − 0.037

As we have already mentioned, even though the results without rotation already showed which variables presented high loadings on each factor, the rotation ended up redistributing, even if only slightly for the data in our example, the loadings of the variables between the rotated factors. A new loading plot (now with rotated loadings) can also demonstrate this situation (Fig. 12.9).

Fig. 12.9 Loading plot with rotated loadings.

Even though the plots in Figs. 12.8 and 12.9 are very similar, since the rotation angle θ is very small in this example, it is common for researchers to find situations in which the rotation contributes considerably to an easier understanding of the loadings, which can, consequently, simplify the interpretation of the factors.

It is important to emphasize that the rotation does not change the communalities, that is, Expression (12.31) can be verified:

$$
\begin{aligned}
\text{communality}_{finance} &= (0.895)^2 + (-0.019)^2 = 0.802 \\
\text{communality}_{costs} &= (0.935)^2 + (0.021)^2 = 0.875 \\
\text{communality}_{marketing} &= (-0.013)^2 + (1.000)^2 = 1.000 \\
\text{communality}_{actuarial} &= (0.917)^2 + (-0.037)^2 = 0.843
\end{aligned}
$$

Nonetheless, rotation changes the eigenvalues corresponding to each factor. Thus, for the two rotated factors, we have:

$$
\begin{aligned}
(0.895)^2 + (0.935)^2 + (-0.013)^2 + (0.917)^2 &= \lambda'^{2}_{1} = 2.518 \\
(-0.019)^2 + (0.021)^2 + (1.000)^2 + (-0.037)^2 &= \lambda'^{2}_{2} = 1.002
\end{aligned}
$$
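The illustrative sketch below verifies, from the rotated loadings of Table 12.17, that the row sums of squares (communalities) are preserved while the column sums of squares (eigenvalues) change; small differences in the third decimal place come from rounding.

import numpy as np

# Rotated loadings (Table 12.17): rows finance, costs, marketing, actuarial science
C_rot = np.array([
    [ 0.895, -0.019],
    [ 0.935,  0.021],
    [-0.013,  1.000],
    [ 0.917, -0.037],
])

# Row sums of squares: communalities, unchanged by the rotation
print(np.round((C_rot ** 2).sum(axis=1), 3))   # approx. 0.802, 0.875, 1.000, 0.843

# Column sums of squares: eigenvalues of the rotated factors
print(np.round((C_rot ** 2).sum(axis=0), 3))   # approx. 2.518, 1.002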

Table 12.18 shows, based on the new eigenvalues λ12 and λ22, the proportions of variance shared by the original variables to form both rotated factors.

Table 12.18

Variance Shared by the Original Variables to Form Both Rotated Factors

Factor   Eigenvalue λ′²   Shared Variance (%)         Cumulative Shared Variance (%)
1        2.518            (2.518/4) × 100 = 62.942    62.942
2        1.002            (1.002/4) × 100 = 25.043    87.985

In comparison to Table 12.10, we can see that even though there is no change in the sharing of 87.985% of the total variance of the original variables to form the rotated factors, the rotation redistributes the variance shared by the variables in each factor.

As we have already discussed, the factor loadings correspond to the parameters estimated in a multiple linear regression model that has, as a dependent variable, a certain standardized variable and, as explanatory variables, the factors themselves. Therefore, through algebraic operations, we can arrive at the factor score expressions from the loadings, since the scores represent the estimated parameters of the respective regression models that have, as a dependent variable, each factor and, as explanatory variables, the standardized variables. Consequently, from the rotated factor loadings (Table 12.17), we arrive at the following expressions for the rotated factors F1′ and F2′:

$$
F'_{1i} = 0.355Z_{finance_i} + 0.372Z_{costs_i} + 0.012Z_{marketing_i} + 0.364Z_{actuarial_i}
$$

$$
F'_{2i} = -0.004Z_{finance_i} + 0.038Z_{costs_i} + 0.999Z_{marketing_i} - 0.021Z_{actuarial_i}
$$
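Equivalently, since the rotated factors are obtained by applying the same rotation to the original factors, the rotated factor score coefficients can be computed by post-multiplying the matrix of unrotated score coefficients by the rotation matrix, as in the illustrative sketch below; small differences in the third decimal place are due to rounding.

import numpy as np

theta = 0.029  # Varimax rotation angle (in radians)

# Unrotated factor score coefficients for F1 and F2
W = np.array([
    [ 0.355,  0.007],
    [ 0.371,  0.049],
    [-0.017,  0.999],
    [ 0.364, -0.010],
])

# Rotating the factors rotates their score coefficients by the same angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(W @ T, 3))
# columns approx. equal to the coefficients of F1' and F2' given above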

Finally, the professor wishes to develop a school performance ranking of his students. Since the two rotated factors, F1′ and F2′, are formed by the higher proportions of variance shared by the original variables (in this case, 62.942% and 25.043% of the total variance, respectively, as shown in Table 12.18) and correspond to eigenvalues greater than 1, they will be used to create the desired school performance ranking.

A well-accepted criterion used to create rankings from factors is known as the weighted rank-sum criterion: for each observation, the values of all the factors obtained (those with eigenvalues greater than 1) are multiplied by the respective proportions of shared variance and added, and the observations are then ranked based on the results. This criterion is well accepted because it takes the performance of all the original variables into account, whereas considering only the first factor (principal factor criterion) may ignore a positive performance in a variable that shares a considerable proportion of variance with the second factor. For 10 students chosen from the sample, Table 12.19 shows the school performance ranking obtained after adding the factor values weighted by the respective proportions of shared variance.

Table 12.19

School Performance Ranking Through the Weighted Rank-Sum Criterion

Student        Zfinancei   Zcostsi   Zmarketingi   Zactuariali   F1i′      F2i′      (F1i′ × 0.62942) + (F2i′ × 0.25043)   Ranking
Adelino          1.30        2.15      1.53          1.86          1.959     1.568      1.626                               1
Renata           0.60        2.15      1.53          1.86          1.709     1.570      1.469                               2
Ovidio           1.33        2.15    − 1.65          1.86          1.932   − 1.611      0.813                              13
Kamal            1.33        2.07    − 1.65          1.86          1.902   − 1.614      0.793                              14
Itamar         − 1.29      − 0.55      1.53        − 1.04        − 1.022     1.536    − 0.259                              57
Luiz Felipe    − 0.88      − 0.70      1.53        − 1.32        − 1.032     1.535    − 0.265                              58
Gabriela       − 0.01      − 0.29    − 1.65          0.27        − 0.032   − 1.665    − 0.437                              73
Marina           0.50      − 0.50    − 0.94        − 1.16        − 0.443   − 0.939    − 0.514                              74
Viviane        − 1.64      − 1.16    − 1.01        − 1.00        − 1.390   − 1.029    − 1.133                              99
Gilmar         − 1.52      − 1.16    − 1.40        − 1.44        − 1.512   − 1.409    − 1.304                             100

The complete ranking can be found in the file FactorGradesRanking.xls.
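The weighted rank-sum criterion itself requires only a weighted sum and a sort. The sketch below (a plain Python illustration using a few students of Table 12.19) reproduces their weighted scores and relative ordering up to rounding; the complete ranking, of course, requires the rotated factor values of all the observations in the dataset.

# Weighted rank-sum criterion: each student's rotated factors are weighted by the
# proportion of variance shared by each factor (Table 12.18) and summed
weights = (0.62942, 0.25043)

# (F1', F2') pairs for a few students of Table 12.19
factors = {
    "Adelino":  ( 1.959,  1.568),
    "Ovidio":   ( 1.932, -1.611),
    "Gabriela": (-0.032, -1.665),
    "Gilmar":   (-1.512, -1.409),
}

scores = {name: f1 * weights[0] + f2 * weights[1] for name, (f1, f2) in factors.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for position, (name, score) in enumerate(ranking, start=1):
    print(position, name, round(score, 3))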

It is essential to highlight that the creation of performance rankings from original variables is a static procedure, since the inclusion of new observations or variables may alter the factor scores, which makes the preparation of a new factor analysis mandatory. As time goes by, the evolution of the phenomena represented by the variables may change the correlation matrix, which makes it necessary to reapply the technique in order to generate new factors obtained from more precise and updated scores. Here, therefore, we criticize socioeconomic indexes that use previously established, static scores for each variable when calculating the factor used to define a ranking in situations in which new observations are constantly included and, moreover, in which there is an evolution over time that changes the correlation matrix of the original variables in each period.

Finally, it is worth mentioning that the factors extracted are quantitative variables and, therefore, other multivariate exploratory techniques can be elaborated from them, such as a cluster analysis, depending on the researcher's objectives. Besides, each factor can also be transformed into a qualitative variable, for example, through its categorization into ranges established based on a certain criterion; from then on, a correspondence analysis could be elaborated in order to assess a possible association between the generated categories and the categories of other qualitative variables.

Factors can also be used as explanatory variables of a certain phenomenon in confirmatory multivariate models, for instance, multiple regression models, since their orthogonality eliminates multicollinearity problems. On the other hand, such a procedure only makes sense when we intend to elaborate a diagnostic regarding the behavior of the dependent variable, without aiming at forecasts. Since new observations do not have corresponding values for the factors generated, obtaining them is only possible by including these observations in a new factor analysis, so that new factor scores are obtained, given that the technique is exploratory.

Furthermore, a qualitative variable obtained through the categorization of a certain factor into ranges can also be inserted as a dependent variable of a multinomial logistic regression model, allowing researchers to evaluate the probabilities each observation has of being in each range, due to the behavior of other explanatory variables not initially considered in the factor analysis. We would also like to highlight that this procedure has a diagnostic nature, trying to find out the behavior of the variables in the sample for the existing observations, without a predictive purpose.

Next, this same example will be elaborated in the software packages SPSS and Stata. In Section 12.3, the procedures for preparing the principal component factor analysis in SPSS will be presented, as well as their results. In Section 12.4, the commands for running the technique in Stata will be presented, with their respective outputs.
