This chapter discusses the circumstances in which the principal component factor analysis technique can be used. Several factor analysis concepts, calculations, and interpretations are presented: the concept of factor; an evaluation of the factor analysis’s overall adequacy through the KMO statistic and Bartlett’s test of sphericity; concepts of eigenvalues and eigenvectors in Pearson’s correlation matrices; the calculation and interpretation of factor scores, and, from these, the definition of factors; the calculation and interpretation of factor loadings and communalities; the construction of loading plots; concepts of factor rotation and preparation of the Varimax orthogonal rotation; and the construction of performance rankings from the joint behavior of variables. The principal component factor analysis technique is elaborated algebraically and also by using IBM SPSS Statistics Software® and Stata Statistical Software®, and the results are interpreted.
Principal component factor analysis; KMO statistic; Bartlett’s test of sphericity; Eigenvalues and eigenvectors in Pearson’s correlation matrices; Factor scores; Factor loadings and communalities; Factor rotation; Varimax orthogonal rotation; SPSS and Stata Software
Love and truth are so intertwined that it is practically impossible to disentangle and separate them.
They are like the two sides of a coin.
Mahatma Gandhi
Exploratory factor analysis techniques are very useful when we intend to work with variables that have relatively high correlation coefficients between themselves and we wish to establish new variables that capture the joint behavior of the original variables. Each one of these new variables is called a factor, which can be understood as a cluster of variables formed according to previously established criteria. Therefore, factor analysis is a multivariate technique that tries to identify a relatively small number of factors that represent the joint behavior of interdependent original variables. Thus, while cluster analysis, studied in the previous chapter, uses distance or similarity measures to group observations and form clusters, factor analysis uses correlation coefficients to group variables and generate factors.
Among the methods used to determine factors, the one known as principal components is, without a doubt, the most widely used in factor analysis, because it is based on the assumption that uncorrelated factors can be extracted from linear combinations of the original variables. Consequently, from a set of original variables correlated to one another, the principal component factor analysis allows another set of variables (factors) resulting from the linear combination of the first set to be determined.
Even though, as we know, the term confirmatory factor analysis often appears in the existing literature, factor analysis is essentially an exploratory, or interdependence, multivariate technique, since it does not have a predictive nature for observations not initially present in the sample; the inclusion of new observations in the dataset makes it necessary to reapply the technique, so that new, more accurate and updated factors can be generated. According to Reis (2001), factor analysis can be used with the main exploratory goal of reducing the data dimension, aiming at creating factors from the original variables, as well as with the objective of confirming an initial hypothesis that the data may be reduced to a certain factor, or a certain dimension, previously established. Regardless of the objective, factor analysis remains exploratory. If researchers aim to use a technique to, in fact, confirm the relationships found in the factor analysis, they can use structural equation modeling, for instance.
The principal component factor analysis has four main objectives: (1) to identify correlations between the original variables to create factors that represent the linear combination of those variables (structural reduction); (2) to verify the validity of previously established constructs, bearing in mind the allocation of the original variables to each factor; (3) to prepare rankings by generating performance indexes from the factors; and (4) to extract orthogonal factors for future use in confirmatory multivariate techniques that need the absence of multicollinearity.
Imagine that a researcher is interested in studying the interdependence between several quantitative variables that translate the socioeconomic behavior of a nation’s municipalities. In this situation, factors that may possibly explain the behavior of the original variables can be determined, and, in this regard, the factor analysis is used to reduce the data structurally and, later on, to create a socioeconomic index that captures the joint behavior of these variables. From this index, we may even propose a performance ranking of the municipalities, and the factors themselves can be used in a possible cluster analysis.
In another situation, factors extracted from the original variables can be used as explanatory variables of another variable (dependent), not initially considered in the analysis. For example, factors obtained from the joint behavior of grades in certain 12th grade subjects can be used as explanatory variables of students’ general classification in the college entrance exams, or whether students passed the exams or not. In these situations, note that the factors (orthogonal to one another) are used, instead of the original variables themselves, as explanatory variables of a certain phenomenon in confirmatory multivariate models, such as, multiple or logistic regression, in order to eliminate possible multicollinearity problems. Nevertheless, it is important to highlight that this procedure only makes sense when we intend to elaborate a diagnostic regarding the dependent variable’s behavior, without aiming at having forecasts for other observations not initially present in the sample. Since new observations do not have the corresponding values of the factors generated, obtaining these values is only possible if we include such observations in a new factor analysis.
In a third situation, imagine that a retailer is interested in assessing their clients’ level of satisfaction by applying a questionnaire in which the questions have been previously classified into certain groups. For instance, questions A, B, and C were classified into the group quality of services rendered, questions D and E, into the group positive perception of prices, and questions F, G, H, and I, into the group variety of goods. After applying the questionnaire to a significant number of customers, in which these nine variables are collected by attributing scores that vary from 0 to 10, the retailer has decided to elaborate a principal component factor analysis to verify if, in fact, the combination of variables reflects the construct previously established. If this occurs, the factor analysis will have been used to validate the construct, presenting a confirmatory objective.
In all of these situations, we can see that the original variables from which the factors will be extracted are quantitative, because a factor analysis begins with the study of the behavior of Pearson’s correlation coefficients between the variables. Nonetheless, it is common for researchers to use the incorrect arbitrary weighting procedure with qualitative variables, as, for example, variables on the Likert scale, and, from then on, to apply a factor analysis. This is a serious error! There are exploratory techniques meant exclusively for studying the behavior of qualitative variables as, for instance, the correspondence analysis and homogeneity analysis, and a factor analysis is definitely not meant for such purpose, as discussed by Fávero and Belfiore (2017).
In a historical context, the development of factor analyses is partly due to Pearson’s (1896) and Spearman’s (1904) pioneer work. While Karl Pearson developed a rigorous mathematical treatment regarding what we traditionally call correlation at the beginning of the 20th century, Charles Edward Spearman published highly original work in which the interrelationships between students’ performance in several subjects, such as, French, English, Mathematics and Music were evaluated. Since the grades in these subjects showed strong correlation, Spearman proposed that scores resulting from apparently incompatible tests shared a single general factor, and students who got good grades had a more developed psychological or intelligence component. Generally speaking, Spearman excelled in applying mathematical methods and correlation studies to the analysis of the human mind.
Decades later, in 1933, Harold Hotelling, a statistician, mathematician, and influential economics theoretician decided to call Principal Component Analysis the analysis that determines components from the maximization of the original data’s variance. Also in the first half of the 20th century, psychologist Louis Leon Thurstone, from an investigation of Spearman’s ideas and based on the application of certain psychological tests, whose results were submitted to a factor analysis, identified people’s seven primary mental abilities: spatial visualization, verbal meaning, verbal fluency, perceptual speed, numerical ability, reasoning, and rote memory. In psychology, the term mental factors is even used for variables that have greater influence over a certain behavior.
Currently, factor analysis is used in several fields of knowledge, such as, marketing, economics, strategy, finance, accounting, actuarial science, engineering, logistics, psychology, medicine, ecology and biostatistics, among others.
The principal component factor analysis must be defined based on the underlying theory and on the researcher’s experience, so that it can be possible to apply the technique correctly and to analyze the results obtained.
In this chapter, we will discuss the principal component factor analysis technique, with the following objectives: (1) to introduce the concepts; (2) to present the step by step of modeling in an algebraic and practical way; (3) to interpret the results obtained; and (4) to show the application of the technique in SPSS and Stata. Following the logic proposed in the book, first, we develop the algebraic solution of an example linked to the presentation of the concepts. Only after introducing these concepts will the procedures for running the technique in SPSS and Stata be presented and discussed.
There are many procedures inherent to factor analysis, with different methods for determining (extracting) factors from Pearson’s correlation matrix. The most frequently used method, adopted in this chapter for extracting factors, is known as principal components, in which the resulting structural reduction is also called the Karhunen-Loève transformation.
In the following sections, we will discuss the theoretical development of the technique, as well as a practical example. While the main concepts will be presented in Sections 12.2.1–12.2.5, Section 12.2.6 is meant for solving a practical example algebraically, from a dataset.
Let’s imagine a dataset that has n observations and, for each observation i (i = 1, …, n), values corresponding to each one of the k metric variables X, as shown in Table 12.1.
Table 12.1
Observation i | X1i | X2i | … | Xki |
---|---|---|---|---|
1 | X11 | X21 | … | Xk1 |
2 | X12 | X22 | … | Xk2 |
3 | X13 | X23 | … | Xk3 |
⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
n | X1n | X2n | … | Xkn |
From the dataset, and given our intention of extracting factors from k variables X, we must define correlation matrix ρ that displays the values of Pearson’s linear correlation between each pair of variables, as shown in Expression (12.1).
$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \tag{12.1}$$
Correlation matrix ρ is symmetrical in relation to the main diagonal that, obviously, shows values equal to 1. For example, for variables X1 and X2, Pearson’s correlation ρ12 can be calculated by using Expression (12.2).
$$\rho_{12} = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)\cdot(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2} \cdot \sqrt{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}} \tag{12.2}$$
where $\bar{X}_1$ and $\bar{X}_2$ represent the means of variables X1 and X2, respectively, and this expression is analogous to Expression (4.11), defined in Chapter 4.
Thus, since Pearson’s correlation is a measure of the level of linear relationship between two metric variables, which may vary between − 1 and 1, a value closer to one of these extreme values indicates the existence of a linear relationship between the two variables under analysis, which, therefore, may significantly contribute to the extraction of a single factor. On the other hand, a Pearson correlation that is very close to 0 indicates that the linear relationship between the two variables is practically nonexistent. Therefore, different factors can be extracted.
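Although this chapter elaborates the technique in SPSS and Stata, the calculation in Expression (12.2) can be sketched in a few lines of Python with NumPy (our choice, not a tool used in the book); the data matrix below is purely hypothetical, built so that X1 and X2 are strongly correlated while X3 is not:

```python
import numpy as np

# Hypothetical data matrix: n = 6 observations, k = 3 variables (columns).
# X1 and X2 are strongly related; X3 varies independently of them.
X = np.array([
    [1.0,  2.1, 5.0],
    [2.0,  3.9, 1.0],
    [3.0,  6.2, 4.0],
    [4.0,  7.8, 2.0],
    [5.0, 10.1, 6.0],
    [6.0, 12.0, 3.0],
])

# Pearson correlation matrix rho (Expression 12.1); rowvar=False tells
# np.corrcoef that variables are in columns, not rows.
rho = np.corrcoef(X, rowvar=False)

# Expression (12.2) computed explicitly for the pair (X1, X2):
x1, x2 = X[:, 0], X[:, 1]
r12 = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (
    np.sqrt(np.sum((x1 - x1.mean()) ** 2))
    * np.sqrt(np.sum((x2 - x2.mean()) ** 2))
)
```

For these hypothetical values, `r12` computed term by term matches `rho[0, 1]` and is close to 1, while the correlations involving X3 are much weaker.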
Let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). A three-dimensional scatter plot can be constructed from the values of each variable for each observation. The plot can be seen in Fig. 12.1.
Only based on the visual analysis of the chart in Fig. 12.1, it is difficult to assess the behavior of the linear relationships between each pair of variables. Thus, Fig. 12.2 shows the projection of the points that correspond to each observation in each one of the planes formed by the pairs of variables, highlighting, in the dotted line, the adjustment that represents the linear relationship between the respective variables.
While Fig. 12.2A shows that there is a significant linear relationship between variables X1 and X2 (a very high Pearson correlation), Fig. 12.2B and C make it very clear that there is no linear relationship between X3 and these variables. Fig. 12.3 displays these projections in a three-dimensional plot, with the respective linear adjustments in each plane (the dotted lines).
Thus, in this hypothetical example, while variables X1 and X2 may be represented by a single factor in a very significant way, which we will call F1, variable X3 may be represented by another factor, F2, orthogonal to F1. Fig. 12.4 illustrates the extraction of these new factors in a three-dimensional way.
So, factors can be understood as representations of latent dimensions that explain the behavior of the original variables.
Having presented these initial concepts, it is important to emphasize that in many cases researchers may choose to not extract a factor represented in a considerable way by only one variable (in this case, factor F2), and what will define the extraction of each one of the factors is the calculation of the eigenvalues from correlation matrix ρ, as we will study in Section 12.2.3. Nevertheless, before that, it will be necessary to check the overall adequacy of the factor analysis, which will be discussed in the following section.
An adequate extraction of factors from the original variables requires correlation matrix ρ to have relatively high and statistically significant values. As discussed by Hair et al. (2009), even though visually analyzing correlation matrix ρ does not reveal if the factor extraction will in fact be adequate, a significant number of values less than 0.30 represent a preliminary indication that the factor analysis may not be adequate.
In order to verify the overall adequacy of the factor extraction itself, we must use the Kaiser-Meyer-Olkin statistic (KMO) and Bartlett’s test of sphericity.
The KMO statistic gives us the proportion of variance considered common to all the variables in the sample under analysis, that is, which can be attributed to the existence of a common factor. This statistic varies from 0 to 1 and, while values closer to 1 indicate that the variables share a very high proportion of variance (high Pearson correlations), values closer to 0 are a result of low Pearson correlations between the variables, which may indicate that the factor analysis will not be adequate. The KMO statistic, presented initially by Kaiser (1970), can be calculated through Expression (12.3).
$$\mathrm{KMO} = \frac{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^{2}}{\sum_{l=1}^{k}\sum_{c=1}^{k} \rho_{lc}^{2} + \sum_{l=1}^{k}\sum_{c=1}^{k} \varphi_{lc}^{2}}, \quad l \neq c \tag{12.3}$$
where l and c represent the rows and columns of correlation matrix ρ, respectively, and the terms φ represent the partial correlation coefficients between two variables. While Pearson’s correlation coefficients ρ are also called zero-order correlation coefficients, partial correlation coefficients φ are also known as higher-order correlation coefficients. For three variables, they are also called first-order correlation coefficients, for four variables, second-order correlation coefficients, and so on.
Let’s imagine a hypothetical situation in which a certain dataset once again shows three variables (k = 3). Can ρ12, in fact, reflect the level of linear relationship between X1 and X2 if variable X3 is related to the other two? In this situation, ρ12 may not represent the true level of linear relationship between X1 and X2 when X3 is present, which may give a false impression regarding the nature of the relationship between the first two. Thus, partial correlation coefficients may contribute to the analysis, since, according to Gujarati and Porter (2008), they are used when researchers wish to find out the correlation between two variables while controlling for, or ignoring, the effects of other variables present in the dataset. For our hypothetical situation, this is the correlation between X1 and X2 free of X3’s influence over them, if any.
Hence, for three variables X1, X2, and X3, we can define the first-order correlation coefficients the following way:
$$\varphi_{12,3} = \frac{\rho_{12} - \rho_{13}\cdot\rho_{23}}{\sqrt{(1-\rho_{13}^{2})\cdot(1-\rho_{23}^{2})}}$$
where φ12,3 represents the correlation between X1 and X2, maintaining X3 constant,
$$\varphi_{13,2} = \frac{\rho_{13} - \rho_{12}\cdot\rho_{23}}{\sqrt{(1-\rho_{12}^{2})\cdot(1-\rho_{23}^{2})}}$$
where φ13,2 represents the correlation between X1 and X3, maintaining X2 constant, and
$$\varphi_{23,1} = \frac{\rho_{23} - \rho_{12}\cdot\rho_{13}}{\sqrt{(1-\rho_{12}^{2})\cdot(1-\rho_{13}^{2})}}$$
where φ23,1 represents the correlation between X2 and X3, maintaining X1 constant.
In general, a first-order correlation coefficient can be obtained through the following expression:
$$\varphi_{ab,c} = \frac{\rho_{ab} - \rho_{ac}\cdot\rho_{bc}}{\sqrt{(1-\rho_{ac}^{2})\cdot(1-\rho_{bc}^{2})}}$$
where a, b, and c can assume values 1, 2, or 3, corresponding to the three variables under analysis.
Conversely, for a case in which there are four variables in the analysis, the general expression of a certain partial correlation coefficient (second-order correlation coefficient) is given by:
$$\varphi_{ab,cd} = \frac{\varphi_{ab,c} - \varphi_{ad,c}\cdot\varphi_{bd,c}}{\sqrt{(1-\varphi_{ad,c}^{2})\cdot(1-\varphi_{bd,c}^{2})}}$$
where φab,cd represents the correlation between Xa and Xb, maintaining Xc and Xd constant, bearing in mind that a, b, c, and d may take on values 1, 2, 3, or 4, which correspond to the four variables under analysis.
Obtaining higher-order correlation coefficients, in which five or more variables are considered in the analysis, should always be done based on the determination of lower-order partial correlation coefficients. In Section 12.2.6, we will propose a practical example by using four variables, in which the algebraic solution of the KMO statistic will be obtained through Expression (12.8).
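As a brief illustration (the correlation matrix below is hypothetical, and Python with NumPy is our choice of tool, not the book’s), the general first-order expression can be coded directly:

```python
import numpy as np

def partial_corr_first_order(rho, a, b, c):
    """phi_ab,c: correlation between variables a and b, holding c constant."""
    return (rho[a, b] - rho[a, c] * rho[b, c]) / np.sqrt(
        (1 - rho[a, c] ** 2) * (1 - rho[b, c] ** 2)
    )

# Hypothetical Pearson correlation matrix for k = 3 variables
rho = np.array([[1.0, 0.8, 0.4],
                [0.8, 1.0, 0.3],
                [0.4, 0.3, 1.0]])

phi12_3 = partial_corr_first_order(rho, 0, 1, 2)   # approx. 0.778
```

For k = 3 the same value can also be obtained from the inverse of ρ, since the partial correlation controlling for all remaining variables equals $-q_{12}/\sqrt{q_{11}\cdot q_{22}}$, where the q are entries of ρ⁻¹; with only one variable to control for, the two routes coincide.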
It is important to highlight that, even if Pearson’s correlation coefficient between two variables is 0, the partial correlation coefficient between them may not be equal to 0, depending on the values of Pearson’s correlation coefficients between each one of these variables and the others present in the dataset.
In order for a factor analysis to be considered adequate, the partial correlation coefficients between the variables must be low. This fact denotes that the variables share a high proportion of variance, and disregarding one or more of them in the analysis may hamper the quality of the factor extraction. Therefore, according to a widely accepted criterion found in the existing literature, Table 12.2 gives us an indication of the relationship between the KMO statistic and the overall adequacy of the factor analysis.
Table 12.2
KMO Statistic | Overall Adequacy of the Factor Analysis |
---|---|
Between 1.00 and 0.90 | Marvelous |
Between 0.90 and 0.80 | Meritorious |
Between 0.80 and 0.70 | Middling |
Between 0.70 and 0.60 | Mediocre |
Between 0.60 and 0.50 | Miserable |
Less than 0.50 | Unacceptable |
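To make Expression (12.3) concrete, the sketch below (Python/NumPy, hypothetical data) computes the KMO statistic; the partial correlations controlling for all remaining variables are obtained at once from the inverse of ρ, a standard shortcut we adopt here rather than chaining the recursive formulas:

```python
import numpy as np

def kmo(rho):
    """KMO statistic of Expression (12.3). The partial correlations phi,
    controlling for all remaining variables, come from the inverse of the
    correlation matrix: phi_lc = -q_lc / sqrt(q_ll * q_cc)."""
    q = np.linalg.inv(rho)
    phi = -q / np.sqrt(np.outer(np.diag(q), np.diag(q)))
    off = ~np.eye(rho.shape[0], dtype=bool)     # positions with l != c
    sum_rho2 = np.sum(rho[off] ** 2)
    sum_phi2 = np.sum(phi[off] ** 2)
    return sum_rho2 / (sum_rho2 + sum_phi2)

# Hypothetical correlation matrix with high pairwise correlations
rho = np.array([[1.00, 0.80, 0.75],
                [0.80, 1.00, 0.70],
                [0.75, 0.70, 1.00]])

kmo_value = kmo(rho)
```

Note that for only two variables this function returns exactly 0.50, in line with the discussion of the two-variable case later in this chapter.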
On the other hand, Bartlett’s test of sphericity (Bartlett, 1954) consists in comparing correlation matrix ρ to an identity matrix I of the same dimension. If the differences between the corresponding values outside the main diagonal of each matrix are not statistically different from 0, at a certain significance level, we may consider that the factor extraction will not be adequate. In other words, in this case, Pearson’s correlations between each pair of variables are statistically equal to 0, which makes any attempt of performing a factor extraction from the original variables unfeasible. So, we can define the null and alternative hypotheses of Bartlett’s test of sphericity the following way:
$$H_0: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
$$H_1: \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \neq I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
The statistic corresponding to Bartlett’s test of sphericity is a χ² statistic, which has the following expression:
$$\chi^2_{\mathrm{Bartlett}} = -\left[(n-1) - \left(\frac{2\cdot k + 5}{6}\right)\right]\cdot \ln|D|$$
with $\frac{k\cdot(k-1)}{2}$ degrees of freedom. We know that n is the sample size and k is the number of variables. In addition, D represents the determinant of correlation matrix ρ.
Thus, for a certain number of degrees of freedom and a certain significance level, Bartlett’s test of sphericity allows us to check whether the value of the $\chi^2_{\mathrm{Bartlett}}$ statistic is higher than the statistic’s critical value. If this is true, we may state that Pearson’s correlations between the pairs of variables are statistically different from 0 and that, therefore, factors can be extracted from the original variables and the factor analysis is adequate. When we develop a practical example in Section 12.2.6, we will also discuss the calculation of the $\chi^2_{\mathrm{Bartlett}}$ statistic and the result of Bartlett’s test of sphericity.
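The test can be sketched as follows (Python with NumPy and SciPy, both assumptions of ours rather than tools used in the chapter; the sample size n = 50 and the correlation matrix are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(rho, n):
    """Bartlett's test: chi2 = -[(n - 1) - (2k + 5)/6] * ln|D|,
    with k(k - 1)/2 degrees of freedom, where |D| = det(rho)."""
    k = rho.shape[0]
    statistic = -((n - 1) - (2 * k + 5) / 6) * np.log(np.linalg.det(rho))
    df = k * (k - 1) / 2
    p_value = chi2.sf(statistic, df)     # survival function: P(chi2 > statistic)
    return statistic, df, p_value

# Hypothetical case: two variables with a strong Pearson correlation
rho = np.array([[1.0, 0.8],
                [0.8, 1.0]])
statistic, df, p_value = bartlett_sphericity(rho, n=50)
```

With ρ = I the determinant is 1, the statistic is 0, and the null hypothesis cannot be rejected, meaning that no factor should be extracted.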
It is important to emphasize that we should always favor Bartlett’s test of sphericity over the KMO statistic when deciding on the factor analysis’s overall adequacy, given that the former is a test with a certain significance level, while the latter is only a coefficient (a statistic) calculated without any associated probability distribution or hypotheses that would allow us to evaluate a corresponding significance level.
In addition, it is important to mention that, for only two original variables, the KMO statistic will always be equal to 0.50. Conversely, the $\chi^2_{\mathrm{Bartlett}}$ statistic may indicate whether the null hypothesis of the test of sphericity is rejected or not, depending on the magnitude of Pearson’s correlation between the two variables. Thus, while the KMO statistic will be 0.50 in these situations, Bartlett’s test of sphericity will allow researchers to decide whether or not to extract one factor from the two original variables. In contrast, for three original variables, it is very common for researchers to extract two factors with statistical significance in Bartlett’s test of sphericity, yet with a KMO statistic less than 0.50. These two situations further emphasize the greater relevance of Bartlett’s test of sphericity in relation to the KMO statistic in the decision-making process.
Finally, we must mention that the recommendation to study the magnitude of Cronbach’s alpha, before studying the overall adequacy of the factor analysis, is commonly found in the existing literature, so that the reliability with which a factor can be extracted from the original variables can be evaluated. We would like to highlight that Cronbach’s alpha only offers researchers an indication of the internal consistency of the variables in the dataset for the extraction of a single factor. Therefore, determining it is not a mandatory requisite for developing the factor analysis, since the technique allows more than one factor to be extracted. Nevertheless, for pedagogical purposes, we will discuss the main concepts of Cronbach’s alpha in the Appendix of this chapter, with its algebraic determination and corresponding applications in SPSS and Stata software.
Having discussed these concepts and verified the overall adequacy of the factor analysis, we can now move on to the definition of the factors.
Since a factor represents the linear combination of the original variables, for k variables, we can define a maximum number of k factors (F1, F2, …, Fk), analogous to the maximum number of clusters that can be defined from a sample with n observations, as we discussed in the previous chapter, since a factor can also be understood as the result of the clustering of variables. Therefore, for k variables, we have:
$$\begin{aligned} F_{1i} &= s_{11}\cdot X_{1i} + s_{21}\cdot X_{2i} + \cdots + s_{k1}\cdot X_{ki} \\ F_{2i} &= s_{12}\cdot X_{1i} + s_{22}\cdot X_{2i} + \cdots + s_{k2}\cdot X_{ki} \\ &\;\,\vdots \\ F_{ki} &= s_{1k}\cdot X_{1i} + s_{2k}\cdot X_{2i} + \cdots + s_{kk}\cdot X_{ki} \end{aligned} \tag{12.10}$$
where the terms s are known as factor scores, which represent the parameters of a linear model that relates a certain factor to the original variables. Calculating the factor scores is essential in the context of the factor analysis technique and is elaborated by determining the eigenvalues and eigenvectors of correlation matrix ρ. In Expression (12.11), we once again show correlation matrix ρ, which has already been presented in Expression (12.1).
$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \tag{12.11}$$
This correlation matrix, with dimensions k × k, has k eigenvalues λ² ($\lambda_1^2 \geq \lambda_2^2 \geq \cdots \geq \lambda_k^2$), which can be obtained by solving the following equation:
$$\det(\lambda^2 \cdot I - \rho) = 0 \tag{12.12}$$
where I is the identity matrix, also with dimensions k × k.
Since a certain factor represents the result of the clustering of variables, it is important to highlight that:
$$\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2 = k$$
Expression (12.12) can be rewritten as follows:
$$\begin{vmatrix} \lambda^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda^2-1 \end{vmatrix} = 0$$
from which we can define the eigenvalue matrix Λ2 the following way:
$$\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}$$
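In practice, the eigenvalues are obtained numerically rather than by expanding the determinant by hand. A brief sketch (Python/NumPy, our choice of tool; the correlation matrix is hypothetical):

```python
import numpy as np

# Hypothetical correlation matrix (k = 3)
rho = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])

# eigh is the routine for symmetric matrices; we reorder so that
# lambda_1^2 >= lambda_2^2 >= lambda_3^2
eigenvalues, eigenvectors = np.linalg.eigh(rho)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
```

The eigenvalues always add up to k (here, 3), the trace of ρ, consistent with the expression above.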
In order to define the eigenvectors of matrix ρ based on the eigenvalues, we must solve the following equation system for each eigenvalue λ2 (λ12, λ22, …, λk2):
$$\begin{pmatrix} \lambda_1^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_1^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_1^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{k1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_1^2-1)\cdot v_{11} - \rho_{12}\cdot v_{21} - \cdots - \rho_{1k}\cdot v_{k1} = 0 \\ -\rho_{21}\cdot v_{11} + (\lambda_1^2-1)\cdot v_{21} - \cdots - \rho_{2k}\cdot v_{k1} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{11} - \rho_{k2}\cdot v_{21} - \cdots + (\lambda_1^2-1)\cdot v_{k1} = 0 \end{cases}$$
Analogously, for eigenvalue $\lambda_2^2$:

$$\begin{pmatrix} \lambda_2^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_2^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_2^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{k2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_2^2-1)\cdot v_{12} - \rho_{12}\cdot v_{22} - \cdots - \rho_{1k}\cdot v_{k2} = 0 \\ -\rho_{21}\cdot v_{12} + (\lambda_2^2-1)\cdot v_{22} - \cdots - \rho_{2k}\cdot v_{k2} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{12} - \rho_{k2}\cdot v_{22} - \cdots + (\lambda_2^2-1)\cdot v_{k2} = 0 \end{cases}$$
and, finally, for eigenvalue $\lambda_k^2$:

$$\begin{pmatrix} \lambda_k^2-1 & -\rho_{12} & \cdots & -\rho_{1k} \\ -\rho_{21} & \lambda_k^2-1 & \cdots & -\rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{k1} & -\rho_{k2} & \cdots & \lambda_k^2-1 \end{pmatrix} \cdot \begin{pmatrix} v_{1k} \\ v_{2k} \\ \vdots \\ v_{kk} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
from where we obtain:
$$\begin{cases} (\lambda_k^2-1)\cdot v_{1k} - \rho_{12}\cdot v_{2k} - \cdots - \rho_{1k}\cdot v_{kk} = 0 \\ -\rho_{21}\cdot v_{1k} + (\lambda_k^2-1)\cdot v_{2k} - \cdots - \rho_{2k}\cdot v_{kk} = 0 \\ \quad\vdots \\ -\rho_{k1}\cdot v_{1k} - \rho_{k2}\cdot v_{2k} - \cdots + (\lambda_k^2-1)\cdot v_{kk} = 0 \end{cases}$$
Thus, we can calculate the factor scores of each factor by determining the eigenvalues and eigenvectors of correlation matrix ρ. The factor scores vectors can be defined as follows:
$$S_1 = \begin{pmatrix} s_{11} \\ s_{21} \\ \vdots \\ s_{k1} \end{pmatrix} = \begin{pmatrix} \frac{v_{11}}{\sqrt{\lambda_1^2}} \\ \frac{v_{21}}{\sqrt{\lambda_1^2}} \\ \vdots \\ \frac{v_{k1}}{\sqrt{\lambda_1^2}} \end{pmatrix}$$
$$S_2 = \begin{pmatrix} s_{12} \\ s_{22} \\ \vdots \\ s_{k2} \end{pmatrix} = \begin{pmatrix} \frac{v_{12}}{\sqrt{\lambda_2^2}} \\ \frac{v_{22}}{\sqrt{\lambda_2^2}} \\ \vdots \\ \frac{v_{k2}}{\sqrt{\lambda_2^2}} \end{pmatrix}$$
$$S_k = \begin{pmatrix} s_{1k} \\ s_{2k} \\ \vdots \\ s_{kk} \end{pmatrix} = \begin{pmatrix} \frac{v_{1k}}{\sqrt{\lambda_k^2}} \\ \frac{v_{2k}}{\sqrt{\lambda_k^2}} \\ \vdots \\ \frac{v_{kk}}{\sqrt{\lambda_k^2}} \end{pmatrix}$$
Since the factor scores of each factor are standardized by the respective eigenvalues, the factors of the set of equations presented in Expression (12.10) must be obtained by multiplying each factor score by the corresponding original variable, standardized by using the Z-scores procedure. Thus, we can obtain each one of the factors based on the following equations:
$$\begin{aligned} F_{1i} &= \frac{v_{11}}{\sqrt{\lambda_1^2}}\cdot ZX_{1i} + \frac{v_{21}}{\sqrt{\lambda_1^2}}\cdot ZX_{2i} + \cdots + \frac{v_{k1}}{\sqrt{\lambda_1^2}}\cdot ZX_{ki} \\ F_{2i} &= \frac{v_{12}}{\sqrt{\lambda_2^2}}\cdot ZX_{1i} + \frac{v_{22}}{\sqrt{\lambda_2^2}}\cdot ZX_{2i} + \cdots + \frac{v_{k2}}{\sqrt{\lambda_2^2}}\cdot ZX_{ki} \\ &\;\,\vdots \\ F_{ki} &= \frac{v_{1k}}{\sqrt{\lambda_k^2}}\cdot ZX_{1i} + \frac{v_{2k}}{\sqrt{\lambda_k^2}}\cdot ZX_{2i} + \cdots + \frac{v_{kk}}{\sqrt{\lambda_k^2}}\cdot ZX_{ki} \end{aligned}$$
where ZXi represents the standardized value of each variable X for a certain observation i. It is important to emphasize that all the factors extracted show, between themselves, Pearson correlations equal to 0, that is, they are orthogonal to one another.
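These steps can be sketched end to end in Python with NumPy (our choice of tool; the data below are hypothetical and generated only for illustration, with X2 almost a linear function of X1 and X3 unrelated to both):

```python
import numpy as np

# Hypothetical data: n = 10 observations, k = 3 variables
rng = np.random.default_rng(42)
x1 = rng.normal(size=10)
X = np.column_stack([x1,
                     2.0 * x1 + rng.normal(scale=0.2, size=10),
                     rng.normal(size=10)])

# Z-scores (standardization with the sample standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues and eigenvectors of the correlation matrix, sorted descending
rho = np.corrcoef(X, rowvar=False)
eigenvalues, V = np.linalg.eigh(rho)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

# Factor scores s_jq = v_jq / sqrt(lambda_q^2); the factors are F = Z . S,
# one factor per column
S = V / np.sqrt(eigenvalues)
F = Z @ S
```

The correlation matrix of the resulting factors is the identity, confirming that the extracted factors are orthogonal and have unit variance.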
A more perceptive researcher will notice that the factor scores of each factor correspond exactly to the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself and, as explanatory variables, the standardized variables.
Mathematically, it is also possible to verify the existing relationship between the eigenvectors, correlation matrix ρ, and eigenvalue matrix Λ2. Consequently, defining eigenvector matrix V as follows:
$$V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix}$$
we can prove that:
$$V' \cdot \rho \cdot V = \Lambda^2$$
or:
$$\begin{pmatrix} v_{11} & v_{21} & \cdots & v_{k1} \\ v_{12} & v_{22} & \cdots & v_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ v_{1k} & v_{2k} & \cdots & v_{kk} \end{pmatrix} \cdot \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kk} \end{pmatrix} = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k^2 \end{pmatrix}$$
In Section 12.2.6, we will discuss a practical example from which this relationship may be demonstrated.
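The identity can also be checked numerically right away (Python/NumPy, hypothetical ρ):

```python
import numpy as np

# Hypothetical correlation matrix
rho = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

eigenvalues, V = np.linalg.eigh(rho)     # columns of V are eigenvectors
Lambda2 = V.T @ rho @ V                  # V' . rho . V

# Off-diagonal entries of Lambda2 are numerically zero; the diagonal
# holds the eigenvalues of rho.
```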
While in Section 12.2.2, we discussed the factor analysis’s overall adequacy, in this section, we will discuss the procedures for carrying out the factor extraction, if the technique is considered adequate. Even knowing that the maximum number of factors is also equal to k for k variables, it is essential for researchers to define, based on a certain criterion, the adequate number of factors that, in fact, represent the original variables. In our hypothetical example in Section 12.2.1, we saw that only two factors (F1 and F2) would be enough to represent the three original variables (X1, X2, and X3).
Although researchers are free to determine the number of factors to be extracted in the analysis, in a preliminary way, since they may wish to verify the validity of a previously established construct (procedure known as a priori criterion), for instance, it is essential to carry out an analysis based on the magnitude of the eigenvalues calculated from correlation matrix ρ.
As the eigenvalues correspond to the proportion of variance shared by the original variables to form each factor, as we will discuss in Section 12.2.4, since $\lambda_1^2 \geq \lambda_2^2 \geq \cdots \geq \lambda_k^2$ and bearing in mind that factors F1, F2, …, Fk are obtained from the respective eigenvalues, factors extracted from smaller eigenvalues are formed from smaller proportions of variance shared by the original variables. Since a factor represents a certain cluster of variables, factors extracted from eigenvalues less than 1 may possibly not be able to represent the behavior of a single original variable (of course, there are exceptions to this rule, which occur when a certain eigenvalue is less than, but very close to, 1). The criterion for choosing the number of factors in which only the factors that correspond to eigenvalues greater than 1 are considered is often used and known as the latent root criterion, or Kaiser criterion.
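As a sketch of the latent root criterion (Python/NumPy, our choice of tool; the block-structured correlation matrix below is hypothetical, with two groups of correlated variables):

```python
import numpy as np

# Hypothetical 4x4 correlation matrix: variables 1-2 form one correlated
# block, variables 3-4 another, with weak cross-block correlations
rho = np.array([[1.0, 0.8, 0.1, 0.1],
                [0.8, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.7],
                [0.1, 0.1, 0.7, 1.0]])

eigenvalues = np.sort(np.linalg.eigvalsh(rho))[::-1]   # descending order

# Latent root (Kaiser) criterion: retain factors whose eigenvalues exceed 1
n_factors = int(np.sum(eigenvalues > 1))
```

Two eigenvalues exceed 1 here, so two factors would be retained, one per block of correlated variables.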
The factor extraction method presented in this chapter is known as principal components, and the first factor F1, formed by the highest proportion of variance shared by the original variables, is also called principal factor. This method is often mentioned in the existing literature and is used in practical applications whenever researchers wish to elaborate a structural reduction of the data in order to create orthogonal factors, to define observation rankings by using the factors generated, and even to confirm the validity of previously established constructs. Other factor extraction methods, such as, the generalized least squares, unweighted least squares, maximum likelihood, alpha factoring, and image factoring, have different criteria and certain specificities and, even though they can also be found in the existing literature, they will not be discussed in this book.
Moreover, it is common to discuss the need to apply the factor analysis to variables that have multivariate normal distribution, in order to show consistency when determining the factor scores. Nevertheless, it is important to emphasize that multivariate normality is a very rigid assumption, only necessary for a few factor extraction methods, such as, the maximum likelihood method. Most factor extraction methods do not require the assumption of data multivariate normality and, as discussed by Gorsuch (1983), the principal component factor analysis seems to be, in practice, very robust against breaks in normality.
Having established the factors, we can now define the factor loadings, which simply are Pearson correlations between the original variables and each one of the factors. Table 12.3 shows the factor loadings for each variable-factor pair.
Table 12.3
Variable | Factor | |||
---|---|---|---|---|
F1 | F2 | … | Fk | |
X1 | c11 | c12 | … | c1k |
X2 | c21 | c22 | … | c2k |
⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
Xk | ck1 | ck2 | … | ckk |
Based on the latent root criterion (in which only factors resulting from eigenvalues greater than 1 are considered), we assume that the factor loadings between the factors that correspond to eigenvalues less than 1 and all the original variables are low, since they will have already presented higher Pearson correlations (loadings) with factors previously extracted from greater eigenvalues. In the same way, original variables that only share a small portion of variance with the other variables will have high factor loadings in only a single factor. If this occurs for all original variables, there will not be significant differences between correlation matrix ρ and identity matrix I, making the χBartlett2 statistic very low. This fact allows us to state that the factor analysis will not be adequate, and, in this situation, researchers may choose not to extract factors from the original variables.
As the factor loadings are Pearson correlations between each variable and each factor, the sum of the squares of these loadings in each row of Table 12.3 will always be equal to 1, since each variable shares part of its variance with all k factors, and the sum of these proportions of variance (the squared factor loadings, that is, the squared Pearson correlations) will be 100%.
Conversely, if less than k factors are extracted, due to the latent root criterion, the sum of the squared factor loadings in each row will not be equal to 1. This sum is called communality, which represents the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, we can say that:
c11² + c12² + ⋯ = communality(X1)
c21² + c22² + ⋯ = communality(X2)
⋮
ck1² + ck2² + ⋯ = communality(Xk)
The main objective of the analysis of communalities is to check if any variable ends up not sharing a significant proportion of variance with the factors extracted. Even though there is no cutoff point from which a certain communality can be considered high or low, since the sample size can interfere in this assessment, the existence of considerably low communalities in relation to the others can indicate to researchers that they may need to reconsider including the respective variable into the factor analysis.
Therefore, after defining the factors based on the factor scores, we can state that the factor loadings will be exactly the same as the parameters estimated in a multiple linear regression model that shows, as a dependent variable, a certain standardized variable ZX and, as explanatory variables, the factors themselves, and the coefficient of determination R2 of each model is equal to the communality of the respective original variable.
The sum of the squared factor loadings in each column of Table 12.3, on the other hand, will be equal to the respective eigenvalue, since the ratio between each eigenvalue and the total number of variables can be understood as the proportion of variance shared by all k original variables to form each factor. So, we can say that:
c11² + c21² + ⋯ + ck1² = λ1²
c12² + c22² + ⋯ + ck2² = λ2²
⋮
c1k² + c2k² + ⋯ + ckk² = λk²
After the factors have been established and the factor loadings calculated, some variables may still have intermediate (neither very high nor very low) Pearson correlations (factor loadings) with all the factors extracted, even though their communalities are not particularly low. Although the solution of the factor analysis has already been obtained in an adequate way and can be considered concluded, in these cases researchers can elaborate a rotation of the factors, so that the Pearson correlations between the original variables and the new factors generated are increased. In the following section, we will discuss factor rotation.
Once again, let’s imagine a hypothetical situation in which a certain dataset only has three variables (k = 3). After preparing the principal component factor analysis, two factors, orthogonal to one another, are extracted, with factor loadings (Pearson correlations) with each one of the three original variables, according to Table 12.4.
Table 12.4
Variable | Factor | |
---|---|---|
F1 | F2 | |
X1 | c11 | c12 |
X2 | c21 | c22 |
X3 | c31 | c32 |
In order to construct a chart with the relative positions of each variable in each factor (a chart known as loading plot), we can consider the factor loadings to be coordinates (abscissas and ordinates) of the variables in a Cartesian plane formed by both orthogonal factors. The plot can be seen in Fig. 12.5.
In order to better visualize which variables are best represented by a certain factor, we can think about a rotation around the origin of the originally extracted factors F1 and F2, so as to bring the points corresponding to variables X1, X2, and X3 closer to one of the new factors, called rotated factors F1′ and F2′. Fig. 12.6 shows this process in a simplified way.
Based on Fig. 12.6, for each variable under analysis, we can see that while the loading for one factor increases, for the other, it decreases. Table 12.5 shows the loading redistribution for our hypothetical situation.
Table 12.5
Variable | Factor | |||
---|---|---|---|---|
Original Factor Loadings | Rotated Factor Loadings | |||
F1 | F2 | F1′ | F2′ | |
X1 | c11 | c12 | |c′11| > |c11| | |c′12| < |c12| |
X2 | c21 | c22 | |c′21| > |c21| | |c′22| < |c22| |
X3 | c31 | c32 | |c′31| < |c31| | |c′32| > |c32| |
Thus, for a generic situation, we can say that rotation is a procedure that maximizes the loadings of each variable in a certain factor, to the detriment of the others. In this regard, the final effect of rotation is the redistribution of factor loadings to factors that initially had smaller proportions of variance shared by all the original variables. The main objective is to minimize the number of variables with high loadings in a certain factor, since each one of the factors will start having more significant loadings only with some of the original variables. Consequently, rotation may simplify the interpretation of the factors.
Despite the fact that the communalities and the total proportion of variance shared by all the variables in all the factors are not modified by the rotation (and neither are the KMO statistic and χBartlett²), the proportion of variance shared by the original variables with each factor is redistributed and, therefore, modified. In other words, new eigenvalues λ′² (λ1′², λ2′², …, λk′²) are obtained from the rotated factor loadings. Thus, we can say that:
c′11² + c′12² + ⋯ = communality(X1)
c′21² + c′22² + ⋯ = communality(X2)
⋮
c′k1² + c′k2² + ⋯ = communality(Xk)
and that:
c′11² + c′21² + ⋯ + c′k1² = λ1′² ≠ λ1²
c′12² + c′22² + ⋯ + c′k2² = λ2′² ≠ λ2²
⋮
c′1k² + c′2k² + ⋯ + c′kk² = λk′² ≠ λk²
even if Expression (12.13) is respected, that is:
λ1² + λ2² + ⋯ + λk² = λ1′² + λ2′² + ⋯ + λk′² = k
Besides, new rotated factor scores are obtained from the rotation of factors, s′, such that the final expressions of the rotated factors will be:
F′1i = s′11⋅ZX1i + s′21⋅ZX2i + ⋯ + s′k1⋅ZXki
F′2i = s′12⋅ZX1i + s′22⋅ZX2i + ⋯ + s′k2⋅ZXki
⋮
F′ki = s′1k⋅ZX1i + s′2k⋅ZX2i + ⋯ + s′kk⋅ZXki
It is important to highlight that the overall adequacy of the factor analysis (KMO statistic and Bartlett’s test of sphericity) is not altered by the rotation, since correlation matrix ρ continues the same.
Even though there are several factor rotation methods, the orthogonal rotation method known as Varimax, proposed by Kaiser (1958), is the most frequently used, and it will be applied in this chapter to solve a practical example. Its main purpose is to minimize the number of variables that have high loadings on a certain factor, through the redistribution of the factor loadings and the maximization of the variance shared in factors that correspond to lower eigenvalues. That is where the name Varimax comes from.
The algorithm behind the Varimax rotation method consists of determining a rotation angle θ by which pairs of factors are equally rotated. Thus, as discussed by Harman (1976), for a certain pair of factors F1 and F2, for example, the rotated factor loadings c′ between the two factors and the k original variables are obtained from the original factor loadings c through the following matrix multiplication:
(c11  c12)                        (c′11  c′12)
(c21  c22) ⋅ (cos θ  −sin θ)  =  (c′21  c′22)
( ⋮    ⋮ )   (sin θ   cos θ)     ( ⋮     ⋮ )
(ck1  ck2)                       (c′k1  c′k2)
where θ, the counterclockwise rotation angle, is obtained by the following expression:
θ = 0.25 ⋅ arctan{[2 ⋅ (D⋅k − A⋅B)] / [C⋅k − (A² − B²)]}
where:
A = Σ(l=1…k) (c1l²/communalityl − c2l²/communalityl)
B = Σ(l=1…k) (2⋅c1l⋅c2l/communalityl)
C = Σ(l=1…k) [(c1l²/communalityl − c2l²/communalityl)² − (2⋅c1l⋅c2l/communalityl)²]
D = Σ(l=1…k) [(c1l²/communalityl − c2l²/communalityl) ⋅ (2⋅c1l⋅c2l/communalityl)]
In Section 12.2.6, we will use these Varimax rotation method expressions to determine the rotated factor loadings from the original loadings.
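Expressions (12.36)–(12.40) can be wrapped into a small function. The sketch below (Python/NumPy, illustrative and not part of the book's software solutions) computes the Varimax rotation angle θ for a pair of factors; the loadings and communalities used to test it are taken from the worked example later in this chapter:

```python
import numpy as np

def varimax_angle(loadings, communality):
    """Counterclockwise Varimax rotation angle for one pair of factors.

    loadings: (k, 2) array with the original loadings c1, c2
    communality: (k,) array with each variable's communality
    """
    c1, c2 = loadings[:, 0], loadings[:, 1]
    k = loadings.shape[0]
    u = (c1**2 - c2**2) / communality   # per-variable terms of A
    v = 2 * c1 * c2 / communality       # per-variable terms of B
    A, B = u.sum(), v.sum()
    C = np.sum(u**2 - v**2)
    D = np.sum(u * v)
    return 0.25 * np.arctan(2 * (D * k - A * B) / (C * k - (A**2 - B**2)))

# Loadings and communalities from Tables 12.12 and 12.13 (chapter example)
c = np.array([[ 0.895,  0.007],
              [ 0.934,  0.049],
              [-0.042,  0.999],
              [ 0.918, -0.010]])
h2 = np.array([0.802, 0.875, 1.000, 0.843])
print(round(varimax_angle(c, h2), 3))  # 0.029 (rad), matching Section 12.2.6
```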
Besides Varimax, we can also mention other orthogonal rotation methods, such as Quartimax and Equamax, even though they are less frequently mentioned in the existing literature and less used in practice. In addition, researchers may also use oblique rotation methods, in which nonorthogonal factors are generated; although they are not discussed in this chapter, we should mention the Direct Oblimin and Promax methods in this category.
Although oblique rotation methods can sometimes be used when we wish to validate a certain construct, we recommend that an orthogonal rotation method be used whenever the factors extracted will later feed other multivariate techniques, such as certain confirmatory models in which the absence of multicollinearity among the explanatory variables is a mandatory assumption, since orthogonal rotation keeps the factors uncorrelated.
Imagine that the same professor, deeply engaged in academic and pedagogical activities, is now interested in studying how his students’ grades behave so that, afterwards, he can propose the creation of a school performance ranking.
In order to do that, he collected information on the final grades, which vary from 0 to 10, of each one of his 100 students in the following subjects: Finance, Costs, Marketing, and Actuarial Science. Part of the dataset can be seen in Table 12.6.
Table 12.6
Student | Final Grade in Finance (X1i) | Final Grade in Costs (X2i) | Final Grade in Marketing (X3i) | Final Grade in Actuarial Science (X4i) |
---|---|---|---|---|
Gabriela | 5.8 | 4.0 | 1.0 | 6.0 |
Luiz Felipe | 3.1 | 3.0 | 10.0 | 2.0 |
Patricia | 3.1 | 4.0 | 4.0 | 4.0 |
Gustavo | 10.0 | 8.0 | 8.0 | 8.0 |
Leticia | 3.4 | 2.0 | 3.2 | 3.2 |
Ovidio | 10.0 | 10.0 | 1.0 | 10.0 |
Leonor | 5.0 | 5.0 | 8.0 | 5.0 |
Dalila | 5.4 | 6.0 | 6.0 | 6.0 |
Antonio | 5.9 | 4.0 | 4.0 | 4.0 |
… | ||||
Estela | 8.9 | 5.0 | 2.0 | 8.0 |
The complete dataset can be found in the file FactorGrades.xls. Through this dataset, it is possible to construct Table 12.7, which shows Pearson’s correlation coefficients between each pair of variables, calculated by using the logic presented in Expression (12.2).
Table 12.7
finance | costs | marketing | actuarial science | |
---|---|---|---|---|
finance | 1.000 | 0.756 | − 0.030 | 0.711 |
costs | 0.756 | 1.000 | 0.003 | 0.809 |
marketing | − 0.030 | 0.003 | 1.000 | − 0.044 |
actuarial science | 0.711 | 0.809 | − 0.044 | 1.000 |
Therefore, we can write the expression of the correlation matrix ρ as follows:
ρ = (1    ρ12  ρ13  ρ14)   ( 1.000  0.756  −0.030   0.711)
    (ρ21  1    ρ23  ρ24) = ( 0.756  1.000   0.003   0.809)
    (ρ31  ρ32  1    ρ34)   (−0.030  0.003   1.000  −0.044)
    (ρ41  ρ42  ρ43  1  )   ( 0.711  0.809  −0.044   1.000)
which has determinant D = 0.137.
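The correlation matrix and its determinant can be checked numerically; a minimal NumPy sketch (illustrative, not part of the book's SPSS/Stata solutions):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

D = np.linalg.det(rho)
print(round(D, 3))  # 0.137
```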
By analyzing correlation matrix ρ, it is possible to verify that only the grades in the variable marketing do not show noteworthy correlations with the grades in the other subjects. These, on the other hand, show relatively high correlations with one another (0.756 between finance and costs, 0.711 between finance and actuarial science, and 0.809 between costs and actuarial science), which indicates that they may share significant variance to form one factor. Although this preliminary analysis is important, it cannot represent more than a simple diagnostic, since the overall adequacy of the factor analysis needs to be evaluated based on the KMO statistic and, mainly, on the result of Bartlett's test of sphericity.
As we discussed in Section 12.2.2, the KMO statistic provides the proportion of variance considered common to all the variables present in the analysis, and, in order to establish its calculation, we need to determine the partial correlation coefficients φ between each pair of variables. In this case, these will be second-order partial correlation coefficients, since we are working with four variables simultaneously.
Consequently, based on Expression (12.7), we first need to determine the first-order correlation coefficients used to calculate the second-order ones. Table 12.8 shows these coefficients.
Table 12.8
φ12,3 = (ρ12 − ρ13⋅ρ23)/√[(1 − ρ13²)⋅(1 − ρ23²)] = 0.756 | φ13,2 = (ρ13 − ρ12⋅ρ23)/√[(1 − ρ12²)⋅(1 − ρ23²)] = −0.049 | φ14,2 = (ρ14 − ρ12⋅ρ24)/√[(1 − ρ12²)⋅(1 − ρ24²)] = 0.258 |
φ14,3 = (ρ14 − ρ13⋅ρ34)/√[(1 − ρ13²)⋅(1 − ρ34²)] = 0.711 | φ23,1 = (ρ23 − ρ12⋅ρ13)/√[(1 − ρ12²)⋅(1 − ρ13²)] = 0.039 | φ24,1 = (ρ24 − ρ12⋅ρ14)/√[(1 − ρ12²)⋅(1 − ρ14²)] = 0.590 |
φ24,3 = (ρ24 − ρ23⋅ρ34)/√[(1 − ρ23²)⋅(1 − ρ34²)] = 0.810 | φ34,1 = (ρ34 − ρ13⋅ρ14)/√[(1 − ρ13²)⋅(1 − ρ14²)] = −0.033 | φ34,2 = (ρ34 − ρ23⋅ρ24)/√[(1 − ρ23²)⋅(1 − ρ24²)] = −0.080 |
Hence, from these coefficients and by using Expression (12.8), we can calculate the second-order correlation coefficients considered in the KMO statistic’s expression. Table 12.9 shows these coefficients.
Table 12.9
φ12,34 = (φ12,3 − φ14,3⋅φ24,3)/√[(1 − φ14,3²)⋅(1 − φ24,3²)] = 0.438 | |
φ13,24 = (φ13,2 − φ14,2⋅φ34,2)/√[(1 − φ14,2²)⋅(1 − φ34,2²)] = −0.029 | φ23,14 = (φ23,1 − φ24,1⋅φ34,1)/√[(1 − φ24,1²)⋅(1 − φ34,1²)] = 0.072 |
φ14,23 = (φ14,2 − φ13,2⋅φ34,2)/√[(1 − φ13,2²)⋅(1 − φ34,2²)] = 0.255 | φ24,13 = (φ24,1 − φ23,1⋅φ34,1)/√[(1 − φ23,1²)⋅(1 − φ34,1²)] = 0.592 | φ34,12 = (φ34,1 − φ23,1⋅φ24,1)/√[(1 − φ23,1²)⋅(1 − φ24,1²)] = −0.069 |
So, based on Expression (12.3), we can calculate the KMO statistic. The terms of the expression are given by:
Σl Σc ρlc² = (0.756)² + (−0.030)² + (0.711)² + (0.003)² + (0.809)² + (−0.044)² = 1.734
Σl Σc φlc² = (0.438)² + (−0.029)² + (0.255)² + (0.072)² + (0.592)² + (−0.069)² = 0.619
from where we obtain:
KMO = 1.734/(1.734 + 0.619) = 0.737
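Instead of the recursive first- and second-order formulas, the same highest-order partial correlations can be obtained at once from the inverse of ρ (φij = −pij/√(pii⋅pjj), where P = ρ⁻¹), which makes the KMO statistic easy to verify numerically. An illustrative NumPy sketch:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

P = np.linalg.inv(rho)
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)      # partial correlations, controlling all other variables

iu = np.triu_indices(4, k=1)       # the 6 variable pairs above the diagonal
sum_rho2 = np.sum(rho[iu]**2)      # = 1.734 in the text
sum_phi2 = np.sum(partial[iu]**2)  # = 0.619 in the text
KMO = sum_rho2 / (sum_rho2 + sum_phi2)
print(round(KMO, 3))  # ≈ 0.737, a "middling" adequacy by Table 12.2
```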
Based on the criterion presented in Table 12.2, the value of the KMO statistic suggests that the overall adequacy of the factor analysis is middling. To test whether, in fact, correlation matrix ρ is statistically different from identity matrix I with the same dimension, we must use Bartlett’s test of sphericity, whose χBartlett2 statistic is given by Expression (12.9). For n = 100 observations, k = 4 variables, and correlation matrix ρ determinant D = 0.137, we have:
χBartlett² = −[(100 − 1) − (2⋅4 + 5)/6] ⋅ ln(0.137) = 192.335
with 4⋅(4 − 1)/2 = 6 degrees of freedom. Therefore, by using Table D in the Appendix, we have χc2 = 12.592 (critical χ2 for 6 degrees of freedom and with a significance level of 0.05). Thus, since χBartlett2 = 192.335 > χc2 = 12.592, we can reject the null hypothesis that correlation matrix ρ is statistically equal to identity matrix I, at a significance level of 0.05.
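Bartlett's test of sphericity from Expression (12.9) can likewise be reproduced in a few lines (illustrative NumPy sketch; the critical value 12.592 is the one taken from Table D in the text):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

n, k = 100, 4
D = np.linalg.det(rho)

chi2_bartlett = -((n - 1) - (2 * k + 5) / 6) * np.log(D)  # Expression (12.9)
df = k * (k - 1) // 2                                     # 6 degrees of freedom

chi2_critical = 12.592  # critical chi-square, 6 d.f., significance level 0.05
print(chi2_bartlett > chi2_critical)  # True: reject that rho equals I
```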
Software packages like SPSS and Stata do not offer the χc2 for the defined degrees of freedom and a certain significance level. However, they offer the significance level of χBartlett2 for these degrees of freedom. So, instead of analyzing if χBartlett2 > χc2, we must verify if the significance level of χBartlett2 is less than 0.05 (5%) so that we can continue performing the factor analysis. Thus:
If P-value (either Sig. χBartlett2, or Prob. χBartlett2) < 0.05, correlation matrix ρ is not statistically equal to identity matrix I with the same dimension.
The significance level of χBartlett2 can be obtained in Excel by using the command Formulas → Insert Function→ CHIDIST, which will open a dialog box, as shown in Fig. 12.7.
As we can see in Fig. 12.7, the P-value of the χBartlett2 statistic is considerably less than 0.05 (P-value = 8.11 × 10−39), that is, the Pearson correlations between the pairs of variables are statistically different from 0. Therefore, factors can be extracted from the original variables, and the factor analysis is considered highly adequate.
Having verified the factor analysis’s overall adequacy, we can move on to the definition of the factors. In order to do that, we must initially determine the four eigenvalues λ2 (λ12 ≥ λ22 ≥ λ32 ≥ λ42) of correlation matrix ρ, which can be obtained from solving Expression (12.12). Therefore, we have:
| λ² − 1   −0.756    0.030   −0.711 |
| −0.756   λ² − 1   −0.003   −0.809 |
|  0.030   −0.003   λ² − 1    0.044 | = 0
| −0.711   −0.809    0.044   λ² − 1 |
from where we obtain:
λ1² = 2.519
λ2² = 1.000
λ3² = 0.298
λ4² = 0.183
Consequently, based on Expression (12.15), eigenvalue matrix Λ2 can be written as follows:
Λ² = (2.519  0      0      0    )
     (0      1.000  0      0    )
     (0      0      0.298  0    )
     (0      0      0      0.183)
Note that Expression (12.13) is satisfied, that is:
λ1² + λ2² + λ3² + λ4² = 2.519 + 1.000 + 0.298 + 0.183 = 4
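The four eigenvalues can be verified directly with NumPy (illustrative sketch, not part of the book's software solutions):

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

eigenvalues = np.sort(np.linalg.eigvalsh(rho))[::-1]  # descending: λ1² ≥ … ≥ λ4²
print(np.round(eigenvalues, 3))      # close to 2.519, 1.000, 0.298, 0.183
print(int(round(eigenvalues.sum())))  # 4, as required by Expression (12.13)
```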
Since the eigenvalues correspond to the proportion of variance shared by the original variables to form each factor, we can construct a shared variance table (Table 12.10).
Table 12.10
Factor | Eigenvalue λ2 | Shared Variance (%) | Cumulative Shared Variance (%) |
---|---|---|---|
1 | 2.519 | (2.519/4)⋅100 = 62.975 | 62.975 |
2 | 1.000 | (1.000/4)⋅100 = 25.010 | 87.985 |
3 | 0.298 | (0.298/4)⋅100 = 7.444 | 95.428 |
4 | 0.183 | (0.183/4)⋅100 = 4.572 | 100.000 |
By analyzing Table 12.10, we can say that while 62.975% of the total variance is shared to form the first factor, 25.010% is shared to form the second factor. The third and fourth factors, whose eigenvalues are less than 1, are formed through smaller proportions of shared variance. Since the most common criterion used to choose the number of factors is the latent root criterion (Kaiser criterion), in which only the factors that correspond to eigenvalues greater than 1 are taken into consideration, the researcher can choose to conduct all the subsequent analysis with only the first two factors, formed by sharing 87.985% of the total variance of the original variables, that is, with a total variance loss of 12.015%. Nonetheless, for pedagogical purposes, let’s discuss how to calculate the factor scores by determining the eigenvectors that correspond to the four eigenvalues.
Consequently, in order to define the eigenvectors of matrix ρ based on the four eigenvalues calculated, we must solve the following equation systems for each eigenvalue, based on Expressions (12.16)–(12.21):
(2.519 − 1.000)⋅v11 − 0.756⋅v21 + 0.030⋅v31 − 0.711⋅v41 = 0
−0.756⋅v11 + (2.519 − 1.000)⋅v21 − 0.003⋅v31 − 0.809⋅v41 = 0
0.030⋅v11 − 0.003⋅v21 + (2.519 − 1.000)⋅v31 + 0.044⋅v41 = 0
−0.711⋅v11 − 0.809⋅v21 + 0.044⋅v31 + (2.519 − 1.000)⋅v41 = 0
from where we obtain:
(v11)   ( 0.5641)
(v21) = ( 0.5887)
(v31)   (−0.0267)
(v41)   ( 0.5783)
(1.000 − 1.000)⋅v12 − 0.756⋅v22 + 0.030⋅v32 − 0.711⋅v42 = 0
−0.756⋅v12 + (1.000 − 1.000)⋅v22 − 0.003⋅v32 − 0.809⋅v42 = 0
0.030⋅v12 − 0.003⋅v22 + (1.000 − 1.000)⋅v32 + 0.044⋅v42 = 0
−0.711⋅v12 − 0.809⋅v22 + 0.044⋅v32 + (1.000 − 1.000)⋅v42 = 0
from where we obtain:
(v12)   ( 0.0068)
(v22) = ( 0.0487)
(v32)   ( 0.9987)
(v42)   (−0.0101)
(0.298 − 1.000)⋅v13 − 0.756⋅v23 + 0.030⋅v33 − 0.711⋅v43 = 0
−0.756⋅v13 + (0.298 − 1.000)⋅v23 − 0.003⋅v33 − 0.809⋅v43 = 0
0.030⋅v13 − 0.003⋅v23 + (0.298 − 1.000)⋅v33 + 0.044⋅v43 = 0
−0.711⋅v13 − 0.809⋅v23 + 0.044⋅v33 + (0.298 − 1.000)⋅v43 = 0
from where we obtain:
(v13)   ( 0.8008)
(v23) = (−0.2201)
(v33)   (−0.0003)
(v43)   (−0.5571)
(0.183 − 1.000)⋅v14 − 0.756⋅v24 + 0.030⋅v34 − 0.711⋅v44 = 0
−0.756⋅v14 + (0.183 − 1.000)⋅v24 − 0.003⋅v34 − 0.809⋅v44 = 0
0.030⋅v14 − 0.003⋅v24 + (0.183 − 1.000)⋅v34 + 0.044⋅v44 = 0
−0.711⋅v14 − 0.809⋅v24 + 0.044⋅v34 + (0.183 − 1.000)⋅v44 = 0
from where we obtain:
(v14)   ( 0.2012)
(v24) = (−0.7763)
(v34)   ( 0.0425)
(v44)   ( 0.5959)
After having determined the eigenvectors, a more inquisitive researcher may prove the relationship presented in Expression (12.27), that is:
V′⋅ρ⋅V=Λ2
(0.5641   0.5887  −0.0267   0.5783)   ( 1.000  0.756  −0.030   0.711)   (0.5641   0.0068   0.8008   0.2012)   (2.519  0      0      0    )
(0.0068   0.0487   0.9987  −0.0101) ⋅ ( 0.756  1.000   0.003   0.809) ⋅ (0.5887   0.0487  −0.2201  −0.7763) = (0      1.000  0      0    )
(0.8008  −0.2201  −0.0003  −0.5571)   (−0.030  0.003   1.000  −0.044)   (−0.0267  0.9987  −0.0003   0.0425)   (0      0      0.298  0    )
(0.2012  −0.7763   0.0425   0.5959)   ( 0.711  0.809  −0.044   1.000)   (0.5783  −0.0101  −0.5571   0.5959)   (0      0      0      0.183)
Based on Expressions (12.22)–(12.24), we can calculate the factor scores that correspond to each one of the standardized variables for each one of the factors. Thus, from Expression (12.25), we are able to write the expressions for factors F1, F2, F3, and F4, as follows:
F1i = (0.5641/√2.519)⋅Zfinancei + (0.5887/√2.519)⋅Zcostsi − (0.0267/√2.519)⋅Zmarketingi + (0.5783/√2.519)⋅Zactuariali
F2i = (0.0068/√1.000)⋅Zfinancei + (0.0487/√1.000)⋅Zcostsi + (0.9987/√1.000)⋅Zmarketingi − (0.0101/√1.000)⋅Zactuariali
F3i = (0.8008/√0.298)⋅Zfinancei − (0.2201/√0.298)⋅Zcostsi − (0.0003/√0.298)⋅Zmarketingi − (0.5571/√0.298)⋅Zactuariali
F4i = (0.2012/√0.183)⋅Zfinancei − (0.7763/√0.183)⋅Zcostsi + (0.0425/√0.183)⋅Zmarketingi + (0.5959/√0.183)⋅Zactuariali
from where we obtain:
F1i = 0.355⋅Zfinancei + 0.371⋅Zcostsi − 0.017⋅Zmarketingi + 0.364⋅Zactuariali
F2i = 0.007⋅Zfinancei + 0.049⋅Zcostsi + 0.999⋅Zmarketingi − 0.010⋅Zactuariali
F3i = 1.468⋅Zfinancei − 0.403⋅Zcostsi − 0.001⋅Zmarketingi − 1.021⋅Zactuariali
F4i = 0.470⋅Zfinancei − 1.815⋅Zcostsi + 0.099⋅Zmarketingi + 1.394⋅Zactuariali
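The factor score coefficients (each eigenvector divided by the square root of its eigenvalue) can be recovered numerically. In the illustrative NumPy sketch below, the sign of each eigenvector is arbitrary, so the comparison with the scores above is made in absolute value:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

lam, V = np.linalg.eigh(rho)     # eigenvalues in ascending order
order = np.argsort(lam)[::-1]    # reorder to descending
lam, V = lam[order], V[:, order]

scores = V / np.sqrt(lam)        # column j holds the score coefficients of factor Fj
print(np.round(np.abs(scores[:, 0]), 3))  # close to |0.355|, |0.371|, |-0.017|, |0.364|
```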
Based on the factor expressions and on the standardized variables, we can calculate the values corresponding to each factor for each observation. Table 12.11 shows these results for part of the dataset.
Table 12.11
Student | Zfinancei | Zcostsi | Zmarketingi | Zactuariali | F1i | F2i | F3i | F4i |
---|---|---|---|---|---|---|---|---|
Gabriela | − 0.011 | − 0.290 | − 1.650 | 0.273 | 0.016 | − 1.665 | − 0.176 | 0.739 |
Luiz Felipe | − 0.876 | − 0.697 | 1.532 | − 1.319 | − 1.076 | 1.503 | 0.342 | − 0.831 |
Patricia | − 0.876 | − 0.290 | − 0.590 | − 0.523 | − 0.600 | − 0.603 | − 0.634 | − 0.672 |
Gustavo | 1.334 | 1.337 | 0.825 | 1.069 | 1.346 | 0.887 | 0.327 | − 0.228 |
Leticia | − 0.779 | − 1.104 | − 0.872 | − 0.841 | − 0.978 | − 0.922 | 0.161 | 0.379 |
Ovidio | 1.334 | 2.150 | − 1.650 | 1.865 | 1.979 | − 1.553 | − 0.812 | − 0.841 |
Leonor | − 0.267 | 0.116 | 0.825 | − 0.125 | − 0.111 | 0.829 | − 0.312 | − 0.429 |
Dalila | − 0.139 | 0.523 | 0.118 | 0.273 | 0.242 | 0.139 | − 0.694 | − 0.623 |
Antonio | 0.021 | − 0.290 | − 0.590 | − 0.523 | − 0.281 | − 0.597 | 0.682 | − 0.250 |
⋮ | ||||||||
Estela | 0.982 | 0.113 | − 1.297 | 1.069 | 0.802 | − 1.293 | 0.305 | 1.616 |
Mean | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Standard deviation | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
For the first observation in the sample (Gabriela), for example, we can see that:
F1Gabriela=0.355⋅(−0.011)+0.371⋅(−0.290)−0.017⋅(−1.650)+0.364⋅(0.273)=0.016
F2Gabriela=0.007⋅(−0.011)+0.049⋅(−0.290)+0.999⋅(−1.650)−0.010⋅(0.273)=−1.665
F3Gabriela=1.468⋅(−0.011)−0.403⋅(−0.290)−0.001⋅(−1.650)−1.021⋅(0.273)=−0.176
F4Gabriela=0.470⋅(−0.011)−1.815⋅(−0.290)+0.099⋅(−1.650)+1.394⋅(0.273)=0.739
It is important to emphasize that all the factors extracted have Pearson correlations equal to 0, between themselves, that is, they are orthogonal to one another.
A more inquisitive researcher may also verify that the factor scores that correspond to each factor are exactly the estimated parameters of a multiple linear regression model that has, as a dependent variable, the factor itself, and as explanatory variables, the standardized variables.
Having established the factors, we can define the factor loadings, which correspond to Pearson’s correlation coefficients between the original variables and each one of the factors. Table 12.12 shows the factor loadings for the data in our example.
Table 12.12
Variable | Factor | |||
---|---|---|---|---|
F1 | F2 | F3 | F4 | |
finance | 0.895 | 0.007 | 0.437 | 0.086 |
costs | 0.934 | 0.049 | − 0.120 | − 0.332 |
marketing | − 0.042 | 0.999 | 0.000 | 0.018 |
actuarial science | 0.918 | − 0.010 | − 0.304 | 0.255 |
For each original variable, the highest value of the factor loading was highlighted in Table 12.12. Consequently, while the variables finance, costs, and actuarial show stronger correlations with the first factor, we can see that only the variable marketing shows stronger correlation with the second factor. This proves the need for a second factor in order for all the variables to share significant proportions of variance. However, the third and fourth factors present relatively low correlations with the original variables, which explains the fact that the respective eigenvalues are less than 1. If the variable marketing had not been inserted into the analysis, only the first factor would be necessary to explain the joint behavior of the other variables, and the other factors would also have respective eigenvalues less than 1.
Therefore, as discussed in Section 12.2.4, we can verify that factor loadings between factors corresponding to eigenvalues less than 1 are relatively low, since they have already shown stronger Pearson correlations with factors previously extracted from greater eigenvalues.
Based on Expression (12.30), we can see that the sum of the squared factor loadings in each column in Table 12.12 will be the respective eigenvalue that, as discussed before, can be understood as the proportion of variance shared by the four original variables to form each factor. Therefore, we have:
(0.895)² + (0.934)² + (−0.042)² + (0.918)² = 2.519
(0.007)² + (0.049)² + (0.999)² + (−0.010)² = 1.000
(0.437)² + (−0.120)² + (0.000)² + (−0.304)² = 0.298
(0.086)² + (−0.332)² + (0.018)² + (0.255)² = 0.183
which confirms that the second eigenvalue only reached the value 1 due to the high factor loading of the variable marketing.
Furthermore, from the factor loadings presented in Table 12.12, we can also calculate the communalities, which represent the total shared variance of each variable in all the factors extracted from eigenvalues greater than 1. So, based on Expression (12.29), we can write:
communality(finance) = (0.895)² + (0.007)² = 0.802
communality(costs) = (0.934)² + (0.049)² = 0.875
communality(marketing) = (−0.042)² + (0.999)² = 1.000
communality(actuarial science) = (0.918)² + (−0.010)² = 0.843
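Loadings and communalities follow from the same decomposition: each loading is the eigenvector entry multiplied by the square root of its eigenvalue. The illustrative NumPy sketch below also checks that the squared loadings reproduce the eigenvalues column-wise and the communalities row-wise:

```python
import numpy as np

# Correlation matrix rho from Table 12.7
rho = np.array([[ 1.000, 0.756, -0.030,  0.711],
                [ 0.756, 1.000,  0.003,  0.809],
                [-0.030, 0.003,  1.000, -0.044],
                [ 0.711, 0.809, -0.044,  1.000]])

lam, V = np.linalg.eigh(rho)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

loadings = V * np.sqrt(lam)            # factor loadings, one column per factor

col_sums = (loadings**2).sum(axis=0)   # reproduces the eigenvalues (Expression 12.30)
h2 = (loadings[:, :2]**2).sum(axis=1)  # communalities over the two retained factors
print(np.round(h2, 3))  # close to 0.802, 0.875, 1.000, 0.843
```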
Consequently, even though the variable marketing is the only one that has a high factor loading with the second factor, it is the variable in which the lowest proportion of variance is lost to form both factors. On the other hand, the variable finance is the one that presents the highest loss of variance to form these two factors (around 19.8%). If we had considered the factor loadings of the four factors, surely, all the communalities would be equal to 1.
As we discussed in Section 12.2.4, we can see that the factor loadings are exactly the parameters estimated in a multiple linear regression model, which shows, as a dependent variable, a certain standardized variable and, as explanatory variables, the factors themselves, in which the coefficient of determination R2 of each model is equal to the communality of the respective original variable.
Therefore, for the first two factors, we can construct a chart in which the factor loadings of each variable are plotted in each one of the orthogonal axes that represent factors F1 and F2, respectively. This chart, known as a loading plot, can be seen in Fig. 12.8.
By analyzing the loading plot, the behavior of the correlations becomes clear. While the variables finance, costs, and actuarial show high correlation with the first factor (X-axis), the variable marketing shows strong correlation with the second factor (Y-axis). More inquisitive researchers may investigate the reasons why this phenomenon occurs, since, sometimes, while the subjects Finance, Costs, and Actuarial Science are taught in a more quantitative way, Marketing can be taught in a more qualitative and behavioral manner. However, it is important to mention that the definition of factors does not force researchers to name them, because, normally, this is not a simple task. Factor analysis does not have “naming factors” as one of its goals and, in case we intend to do that, researchers need to have vast knowledge about the phenomenon being studied, and confirmatory techniques can help them in this endeavor.
At this moment, we can consider the preparation of the principal component factor analysis concluded. Nevertheless, as discussed in Section 12.2.5, if researchers wish to obtain a clearer visualization of which variables are better represented by a certain factor, they can elaborate a rotation using the Varimax orthogonal method, which maximizes the loadings of each variable in a certain factor. In our example, since we already have an excellent idea of the variables with high loadings in each factor, and the loading plot (Fig. 12.8) is already very clear, rotation may be considered unnecessary. Therefore, it will be elaborated only for pedagogical purposes, since researchers may sometimes find themselves in situations in which the phenomenon is not so clear.
Consequently, based on the factor loadings for the first two factors (first two columns of Table 12.12), we will obtain rotated factor loadings c′ after rotating both factors for an angle θ. Thus, based on Expression (12.35), we can write:
( 0.895   0.007)                       (c′11  c′12)
( 0.934   0.049) ⋅ (cos θ  −sin θ)  =  (c′21  c′22)
(−0.042   0.999)   (sin θ   cos θ)     (c′31  c′32)
( 0.918  −0.010)                       (c′41  c′42)
where the counterclockwise rotation angle θ is obtained from Expression (12.36). Before that, however, we must determine the values of terms A, B, C, and D, present in Expressions (12.37)–(12.40). Constructing Tables 12.13–12.16 helps us with this task.
Table 12.13
Variable | c1 | c2 | Communality | c1l²/communalityl − c2l²/communalityl
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 1.000 |
costs | 0.934 | 0.049 | 0.875 | 0.995 |
marketing | − 0.042 | 0.999 | 1.000 | − 0.996 |
actuarial science | 0.918 | − 0.010 | 0.843 | 1.000 |
A (sum) | 1.998 |
Table 12.14
Variable | c1 | c2 | Communality | 2⋅c1l⋅c2l/communalityl
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 0.015 |
costs | 0.934 | 0.049 | 0.875 | 0.104 |
marketing | − 0.042 | 0.999 | 1.000 | − 0.085 |
actuarial science | 0.918 | − 0.010 | 0.843 | − 0.022 |
B (sum) | 0.012 |
Table 12.15
Variable | c1 | c2 | Communality | (c1l²/communalityl − c2l²/communalityl)² − (2⋅c1l⋅c2l/communalityl)²
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 1.000 |
costs | 0.934 | 0.049 | 0.875 | 0.978 |
marketing | − 0.042 | 0.999 | 1.000 | 0.986 |
actuarial science | 0.918 | − 0.010 | 0.843 | 0.999 |
C (sum) | 3.963 |
Table 12.16
Variable | c1 | c2 | Communality | (c1l²/communalityl − c2l²/communalityl) ⋅ (2⋅c1l⋅c2l/communalityl)
---|---|---|---|---|
finance | 0.895 | 0.007 | 0.802 | 0.015 |
costs | 0.934 | 0.049 | 0.875 | 0.103 |
marketing | − 0.042 | 0.999 | 1.000 | 0.084 |
actuarial science | 0.918 | − 0.010 | 0.843 | − 0.022 |
D (sum) | 0.181 |
So, taking the k = 4 variables into consideration and based on Expression (12.36), we can calculate the counterclockwise rotation angle θ as follows:
θ = 0.25 ⋅ arctan{2 ⋅ [(0.181)⋅4 − (1.998)⋅(0.012)] / {(3.963)⋅4 − [(1.998)² − (0.012)²]}} = 0.029 rad
And, finally, we can calculate the rotated factor loadings:
( 0.895   0.007)                               (c′11  c′12)   ( 0.895  −0.019)
( 0.934   0.049) ⋅ (cos 0.029  −sin 0.029)  =  (c′21  c′22) = ( 0.935   0.021)
(−0.042   0.999)   (sin 0.029   cos 0.029)     (c′31  c′32)   (−0.013   1.000)
( 0.918  −0.010)                               (c′41  c′42)   ( 0.917  −0.037)
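The rotation itself is just this matrix product; an illustrative NumPy sketch, reusing θ = 0.029 and the loadings of Table 12.12:

```python
import numpy as np

theta = 0.029                     # rotation angle computed in the text
c = np.array([[ 0.895,  0.007],   # original loadings (Table 12.12, first two factors)
              [ 0.934,  0.049],
              [-0.042,  0.999],
              [ 0.918, -0.010]])

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
c_rot = c @ R                     # rotated loadings, compare with Table 12.17
print(np.round(c_rot, 3))
```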
Table 12.17 shows, in a consolidated way, the rotated factor loadings through the Varimax method for the data in our example.
Table 12.17
Variable | Factor | |
---|---|---|
F1′ | F2′ | |
finance | 0.895 | − 0.019 |
costs | 0.935 | 0.021 |
marketing | − 0.013 | 1.000 |
actuarial science | 0.917 | − 0.037 |
As we have already mentioned, even though the results without rotation already showed which variables presented high loadings in each factor, the rotation ended up redistributing, even if only slightly for the data in our example, the variable loadings between the rotated factors. A new loading plot (now with the rotated loadings) can also demonstrate this situation (Fig. 12.9).
Even though the plots in Figs. 12.8 and 12.9 are very similar, since rotation angle θ is very small in this example, it is common for researchers to find situations in which the rotation contributes considerably to an easier understanding of the loadings, which can, consequently, simplify the interpretation of the factors.
It is important to emphasize that the rotation does not change the communalities, that is, Expression (12.31) can be verified:
$$\begin{aligned}
\text{communality}_{\text{finance}} &= (0.895)^2 + (-0.019)^2 = 0.802 \\
\text{communality}_{\text{costs}} &= (0.935)^2 + (0.021)^2 = 0.875 \\
\text{communality}_{\text{marketing}} &= (-0.013)^2 + (1.000)^2 = 1.000 \\
\text{communality}_{\text{actuarial}} &= (0.917)^2 + (-0.037)^2 = 0.843
\end{aligned}$$
Nonetheless, rotation changes the eigenvalues corresponding to each factor. Thus, for the two rotated factors, we have:
$$\begin{aligned}
(0.895)^2 + (0.935)^2 + (-0.013)^2 + (0.917)^2 &= \lambda'^2_1 = 2.518 \\
(-0.019)^2 + (0.021)^2 + (1.000)^2 + (-0.037)^2 &= \lambda'^2_2 = 1.002
\end{aligned}$$
Table 12.18 shows, based on the new eigenvalues $\lambda'^2_1$ and $\lambda'^2_2$, the proportions of variance shared by the original variables to form both rotated factors.
Table 12.18
Factor | Eigenvalue λ′² | Shared Variance (%) | Cumulative Shared Variance (%) |
---|---|---|---|
1 | 2.518 | (2.518/4)⋅100 = 62.942 | 62.942 |
2 | 1.002 | (1.002/4)⋅100 = 25.043 | 87.985 |
In comparison to Table 12.10, we can see that even though there is no change in the sharing of 87.985% of the total variance of the original variables to form the rotated factors, the rotation redistributes the variance shared by the variables in each factor.
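Both invariance properties just discussed can be checked numerically: rotation preserves the communalities (row sums of squared loadings) while redistributing the eigenvalues (column sums of squared loadings). A minimal sketch, assuming the rotated loadings of Table 12.17:

```python
# Rotated factor loadings from Table 12.17
rotated_loadings = {"finance": (0.895, -0.019), "costs": (0.935, 0.021),
                    "marketing": (-0.013, 1.000), "actuarial": (0.917, -0.037)}

# Communalities are the row sums of squared loadings; they match the
# unrotated values (0.802, 0.875, 1.000, 0.843) up to rounding
communalities = {var: c1**2 + c2**2
                 for var, (c1, c2) in rotated_loadings.items()}

# Eigenvalues after rotation are the column sums of squared loadings
lambda1 = sum(c1**2 for c1, _ in rotated_loadings.values())
lambda2 = sum(c2**2 for _, c2 in rotated_loadings.values())

# Proportion of total variance shared by each rotated factor (k = 4)
share1 = lambda1 / 4 * 100
share2 = lambda2 / 4 * 100
```

The cumulative shared variance, share1 + share2 ≈ 87.985%, is unchanged by the rotation, even though the split between the two factors differs from Table 12.10.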
As we have already discussed, the factor loadings correspond to the parameters estimated in a multiple linear regression model that has a certain standardized variable as the dependent variable and the factors as explanatory variables. Conversely, the factor scores represent the estimated parameters of the regression models that have the factors as dependent variables and the standardized variables as explanatory variables; therefore, through algebraic operations, we can obtain the factor score expressions from the loadings. Consequently, from the rotated factor loadings (Table 12.17), we arrive at the following expressions for the rotated factors F1′ and F2′:
$$F'_{1i} = 0.355 \cdot Z_{\text{finance}_i} + 0.372 \cdot Z_{\text{costs}_i} + 0.012 \cdot Z_{\text{marketing}_i} + 0.364 \cdot Z_{\text{actuarial}_i}$$
$$F'_{2i} = -0.004 \cdot Z_{\text{finance}_i} + 0.038 \cdot Z_{\text{costs}_i} + 0.999 \cdot Z_{\text{marketing}_i} - 0.021 \cdot Z_{\text{actuarial}_i}$$
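Applying these two expressions to a student's standardized grades gives that student's rotated factor scores. A short sketch for the student Adelino, using his standardized grades as reported later in Table 12.19:

```python
# Rotated factor score coefficients, from the expressions above
w1 = {"finance": 0.355, "costs": 0.372, "marketing": 0.012, "actuarial": 0.364}
w2 = {"finance": -0.004, "costs": 0.038, "marketing": 0.999, "actuarial": -0.021}

# Standardized grades for one student (Adelino, from Table 12.19)
z = {"finance": 1.30, "costs": 2.15, "marketing": 1.53, "actuarial": 1.86}

# Each factor score is the coefficient-weighted sum of standardized variables
f1 = sum(w1[var] * z[var] for var in z)
f2 = sum(w2[var] * z[var] for var in z)
```

Up to rounding, this reproduces the scores F′1 ≈ 1.959 and F′2 ≈ 1.568 shown for Adelino in Table 12.19.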
Finally, the professor wishes to develop a school performance ranking of his students. Since the two rotated factors, F1′ and F2′, are formed by the higher proportions of variance shared by the original variables (in this case, 62.942% and 25.043% of the total variance, respectively, as shown in Table 12.18) and correspond to eigenvalues greater than 1, they will be used to create the desired school performance ranking.
A well-accepted criterion used to form rankings from factors is known as the weighted rank-sum criterion: for each observation, the values of all the factors obtained (those with eigenvalues greater than 1), weighted by their respective proportions of shared variance, are added, and the observations are then ranked based on the results. This criterion is well accepted because it considers the performance of all the original variables; considering only the first factor (the principal factor criterion) may ignore a positive performance obtained in a certain variable that shares a considerable proportion of variance with the second factor. For 10 students chosen from the sample, Table 12.19 shows the school performance ranking created by summing the factor values weighted by the respective proportions of shared variance.
Table 12.19
Student | Zfinancei | Zcostsi | Zmarketingi | Zactuariali | F′1i | F′2i | (F′1i ⋅ 0.62942) + (F′2i ⋅ 0.25043) | Ranking |
---|---|---|---|---|---|---|---|---|
Adelino | 1.30 | 2.15 | 1.53 | 1.86 | 1.959 | 1.568 | 1.626 | 1 |
Renata | 0.60 | 2.15 | 1.53 | 1.86 | 1.709 | 1.570 | 1.469 | 2 |
⋮ | ||||||||
Ovidio | 1.33 | 2.15 | − 1.65 | 1.86 | 1.932 | − 1.611 | 0.813 | 13 |
Kamal | 1.33 | 2.07 | − 1.65 | 1.86 | 1.902 | − 1.614 | 0.793 | 14 |
⋮ | ||||||||
Itamar | − 1.29 | − 0.55 | 1.53 | − 1.04 | − 1.022 | 1.536 | − 0.259 | 57 |
Luiz Felipe | − 0.88 | − 0.70 | 1.53 | − 1.32 | − 1.032 | 1.535 | − 0.265 | 58 |
⋮ | ||||||||
Gabriela | − 0.01 | − 0.29 | − 1.65 | 0.27 | − 0.032 | − 1.665 | − 0.437 | 73 |
Marina | 0.50 | − 0.50 | − 0.94 | − 1.16 | − 0.443 | − 0.939 | − 0.514 | 74 |
⋮ | ||||||||
Viviane | − 1.64 | − 1.16 | − 1.01 | − 1.00 | − 1.390 | − 1.029 | − 1.133 | 99 |
Gilmar | − 1.52 | − 1.16 | − 1.40 | − 1.44 | − 1.512 | − 1.409 | − 1.304 | 100 |
The complete ranking can be found in the file FactorGradesRanking.xls.
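The weighted rank-sum criterion itself is straightforward to implement. A minimal sketch using three of the students from Table 12.19 (the full sample would be handled identically):

```python
# Rotated factor scores (F'1, F'2) for a few students, from Table 12.19,
# and the shared-variance weights of the rotated factors (Table 12.18)
scores = {"Adelino": (1.959, 1.568),
          "Renata": (1.709, 1.570),
          "Gilmar": (-1.512, -1.409)}
weight1, weight2 = 0.62942, 0.25043

# Weighted rank-sum criterion: weight each factor score by its factor's
# proportion of shared variance, sum, then rank in descending order
weighted = {name: f1 * weight1 + f2 * weight2
            for name, (f1, f2) in scores.items()}
ranking = sorted(weighted, key=weighted.get, reverse=True)
```

Adelino's weighted sum, 1.959 ⋅ 0.62942 + 1.568 ⋅ 0.25043 ≈ 1.626, places him first, matching the ranking in Table 12.19.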
It is essential to highlight that the creation of performance rankings from original variables is a static procedure, since the inclusion of new observations or variables may alter the factor scores, making the preparation of a new factor analysis mandatory. As time goes by, the evolution of the phenomena represented by the variables may change the correlation matrix, making it necessary to reapply the technique in order to generate new factors obtained from more precise and updated scores. Here, therefore, we criticize socioeconomic indexes that use previously established static scores for each variable when calculating the factor used to define the ranking, in situations in which new observations are constantly included and, moreover, in which the phenomena evolve over time, changing the correlation matrix of the original variables in each period.
Finally, it is worth mentioning that the extracted factors are quantitative variables and, therefore, other multivariate exploratory techniques, such as cluster analysis, can be elaborated from them, depending on the researcher's objectives. Besides, each factor can also be transformed into a qualitative variable, for example, through its categorization into ranges established based on a certain criterion; from then on, a correspondence analysis could be elaborated in order to assess a possible association between the generated categories and the categories of other qualitative variables.
Factors can also be used as explanatory variables of a certain phenomenon in confirmatory multivariate models, such as multiple regression models, since their orthogonality eliminates multicollinearity problems. On the other hand, such a procedure only makes sense when we intend to elaborate a diagnostic of the behavior of the dependent variable, without aiming at forecasts. Since new observations do not have values for the factors already generated, obtaining them is only possible by including such observations in a new factor analysis, in order to obtain new factor scores, given that this is an exploratory technique.
Furthermore, a qualitative variable obtained through the categorization of a certain factor into ranges can also be inserted as the dependent variable of a multinomial logistic regression model, allowing researchers to evaluate the probability that each observation has of falling into each range, due to the behavior of other explanatory variables not initially considered in the factor analysis. We would also like to highlight that this procedure has a diagnostic nature, investigating the behavior of the variables for the existing observations in the sample, without a predictive purpose.
Next, this same example will be elaborated in the software packages SPSS and Stata. In Section 12.3, the procedures for preparing the principal component factor analysis in SPSS will be presented, as well as their results. In Section 12.4, the commands for running the technique in Stata will be presented, with their respective outputs.