12.3 Principal Component Factor Analysis in SPSS

In this section, we will present the step-by-step procedure for developing our example in the IBM SPSS Statistics Software. Following the logic proposed in this book, the main objective is to give researchers an opportunity to elaborate the principal component factor analysis in this software package, given how easy and didactical its operations are. Every time we present an output, we will mention the respective result obtained in the algebraic solution of the technique in the previous section, so that researchers can compare them and broaden their knowledge and understanding of the technique. The use of the images in this section has been authorized by the International Business Machines Corporation©.

Going back to the example presented in Section 12.2.6, remember that the professor is interested in creating a school performance ranking of his students based on the joint behavior of their final grades in four subjects. The data can be found in the file FactorGrades.sav and are exactly the same as the ones partially presented in Table 12.6 in Section 12.2.6.

Therefore, in order to run the factor analysis, let’s click on Analyze → Dimension Reduction → Factor …. A dialog box like the one shown in Fig. 12.10 will open.

Fig. 12.10
Fig. 12.10 Dialog box for running a factor analysis in SPSS.

Next, we must insert the original variables finance, costs, marketing, and actuarial into Variables, as shown in Fig. 12.11.

Fig. 12.11
Fig. 12.11 Selecting the original variables.

Unlike the cluster analysis developed in the previous chapter, it is important to mention that the researcher does not need to worry about the Z-scores standardization of the original variables when elaborating the factor analysis, since the correlations between the original variables and between their corresponding standardized variables are exactly the same. Even so, if researchers choose to standardize each one of the variables, they will see that the outputs are exactly the same.

In Descriptives …, first, let’s select the option Initial solution in Statistics …, which makes all the eigenvalues of the correlation matrix be presented in the outputs, even the ones that are less than 1. In addition, let’s select the options Coefficients, Determinant, and KMO and Bartlett’s test of sphericity in Correlation Matrix, as shown in Fig. 12.12.

Fig. 12.12
Fig. 12.12 Selecting the initial options for running the factor analysis.

When we click on Continue, we will go back to the main dialog box of the factor analysis. Next, we must click on Extraction …. As shown in Fig. 12.13, we will maintain the options regarding the factor extraction method selected (Method: Principal components) and the choice criterion of the number of factors. In this case, as discussed in Section 12.2.3, only the factors that correspond to eigenvalues greater than 1 will be considered (latent root criterion or Kaiser criterion), and, therefore, we must maintain the option Based on Eigenvalue → Eigenvalues greater than: 1 in Extract selected. Moreover, we will also maintain the options Unrotated factor solution, in Display, and Correlation matrix, in Analyze, selected.

Fig. 12.13
Fig. 12.13 Choosing the factor extraction method and the criterion for determining the number of factors.

In the same way, let’s click on Continue so that we can go back to the main dialog box of the factor analysis. In Rotation …, for now, let’s select the option Loading plot(s) in Display, while still maintaining the option None in Method selected, as shown in Fig. 12.14.

Fig. 12.14
Fig. 12.14 Dialog box for selecting the rotation method and the loading plot.

Choosing the extraction of unrotated factors at this moment is didactical, since the outputs generated may be compared to the ones obtained algebraically in Section 12.2.6. Nevertheless, researchers can choose to extract rotated factors at this opportunity.

After clicking on Continue, we can select the button Scores … in the technique’s main dialog box. At this moment, let’s select the option Display factor score coefficient matrix, as shown in Fig. 12.15, which makes the factor scores that correspond to each factor extracted be presented in the outputs.

Fig. 12.15
Fig. 12.15 Selecting the option to present the factor scores.

Next, we can click on Continue and on OK.

The first output (Fig. 12.16) shows the correlation matrix ρ, equal to the one in Table 12.7 in Section 12.2.6, through which we can see that the variable marketing is the only one that shows low Pearson’s correlation coefficients with all the other variables. As we have already discussed, this is a first indication that the variables finance, costs, and actuarial may be correlated with a certain factor, while the variable marketing may correlate strongly with another one.

Fig. 12.16
Fig. 12.16 Pearson’s correlation coefficients.

We can also see that the output in Fig. 12.16 shows the value of the determinant of the correlation matrix ρ, which is used to calculate the χ²Bartlett statistic, as discussed when we presented Expression (12.9).

In order to study the overall adequacy of the factor analysis, let’s analyze the outputs in Fig. 12.17, which show the results of the KMO statistic and of the χ²Bartlett statistic. While the former suggests that the overall adequacy of the factor analysis is middling (KMO = 0.737), based on the criterion presented in Table 12.2, the latter (χ²Bartlett = 192.335, Sig. < 0.05 for 6 degrees of freedom) allows us to reject the null hypothesis of Bartlett’s test of sphericity, namely, that the correlation matrix ρ is statistically equal to the identity matrix I with the same dimension, at a significance level of 0.05. Thus, we can conclude that the factor analysis is adequate.

Fig. 12.17
Fig. 12.17 Results of the KMO statistic and Bartlett’s test of sphericity.

The values of the KMO and χ²Bartlett statistics are calculated through Expressions (12.3) and (12.9), respectively, presented in Section 12.2.2, and are exactly the same as the ones obtained algebraically in Section 12.2.6.
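For readers who want to verify these formulas outside SPSS, the sketch below computes both statistics with NumPy. The function name and the data matrix are hypothetical; the KMO statistic is implemented via the partial correlations obtained from the inverse of the correlation matrix, and Bartlett’s χ² via the determinant of that matrix.

```python
import numpy as np

def kmo_and_bartlett(X):
    """KMO statistic and Bartlett's test of sphericity for a data
    matrix X with n observations (rows) and k variables (columns)."""
    n, k = X.shape
    R = np.corrcoef(X, rowvar=False)           # Pearson correlation matrix
    inv_R = np.linalg.inv(R)
    # Partial correlations: a_ij = -inv_ij / sqrt(inv_ii * inv_jj)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)
    off = ~np.eye(k, dtype=bool)               # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    kmo = r2 / (r2 + a2)
    # Bartlett: chi2 = -[(n - 1) - (2k + 5)/6] * ln(det(R)), df = k(k-1)/2
    chi2 = -((n - 1) - (2 * k + 5) / 6) * np.log(np.linalg.det(R))
    df = k * (k - 1) // 2
    return kmo, chi2, df
```

Applied to the professor’s 100 × 4 matrix of grades, this sketch should reproduce KMO = 0.737 and χ²Bartlett = 192.335 with 6 degrees of freedom.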

Next, Fig. 12.18 shows the four eigenvalues of correlation matrix ρ that correspond to each one of the factors extracted initially, with the respective proportions of variance shared by the original variables.

Fig. 12.18
Fig. 12.18 Eigenvalues and variance shared by the original variables to form each factor.

Note that the eigenvalues are exactly the same as the ones obtained algebraically in Section 12.2.6, such that:

λ₁² + λ₂² + ⋯ + λₖ² = 2.519 + 1.000 + 0.298 + 0.183 = 4

Since in the analysis we will only consider the factors whose eigenvalues are greater than 1, the right-hand side of Fig. 12.18 shows the proportion of variance shared by the original variables to form these factors only. Therefore, analogous to what was presented in Table 12.10, we can state that, while 62.975% of the total variance is shared to form the first factor, 25.010% is shared to form the second. Thus, to form these two factors, the total loss of variance of the original variables is equal to 12.015%.
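The relationship between eigenvalues and shared variance can be sketched with NumPy. The correlation matrix below is a hypothetical stand-in, not the example’s exact matrix ρ; it only mimics its pattern of three strongly correlated variables and one nearly uncorrelated variable.

```python
import numpy as np

# Hypothetical stand-in for the correlation matrix rho (finance, costs,
# marketing, actuarial): three strongly correlated variables and one
# nearly uncorrelated variable.
R = np.array([[1.00, 0.75, 0.00, 0.76],
              [0.75, 1.00, 0.03, 0.79],
              [0.00, 0.03, 1.00, 0.01],
              [0.76, 0.79, 0.01, 1.00]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues
proportions = eigvals / eigvals.sum()            # variance shared per factor
# The eigenvalues always add up to the number of variables (trace of R).
print(eigvals, proportions)
```

Because the trace of a correlation matrix equals the number of variables, the eigenvalues of any 4 × 4 correlation matrix sum to 4, just as in the expression above.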

Having extracted two factors, Fig. 12.19 shows the factor scores that correspond to each one of the standardized variables for each one of these factors.

Fig. 12.19
Fig. 12.19 Factor scores.

Hence, we are able to write the expressions of factors F1 and F2 as follows:

F₁ᵢ = 0.355·Zfinanceᵢ + 0.371·Zcostsᵢ − 0.017·Zmarketingᵢ + 0.364·Zactuarialᵢ

F₂ᵢ = −0.007·Zfinanceᵢ + 0.049·Zcostsᵢ + 0.999·Zmarketingᵢ − 0.010·Zactuarialᵢ

Note that the expressions are identical to the ones obtained in Section 12.2.6 from the algebraic definition of unrotated factor scores.

Fig. 12.20 shows the factor loadings, which correspond to Pearson’s correlation coefficients between the original variables and each one of the factors. The values shown in Fig. 12.20 are equal to the ones presented in the first two columns of Table 12.12.

Fig. 12.20
Fig. 12.20 Factor loadings.

The highest factor loading is highlighted for each variable. Therefore, we can verify that, while the variables finance, costs, and actuarial show stronger correlations with the first factor, the variable marketing shows a stronger correlation with the second factor.

As we also discussed in Section 12.2.6, the sum of the squared factor loadings in the columns results in the eigenvalue of the corresponding factor, that is, it represents the proportion of variance shared by the four original variables to form each factor. Thus, we can verify that:

0.895² + 0.934² + 0.042² + 0.918² = 2.519

0.007² + 0.049² + 0.999² + 0.010² = 1.000

On the other hand, the sum of the squared factor loadings in the rows results in the communality of the respective variable, that is, it represents the proportion of shared variance of each original variable in the two factors extracted. Therefore, we can also see that:

communality_finance = 0.895² + 0.007² = 0.802
communality_costs = 0.934² + 0.049² = 0.875
communality_marketing = 0.042² + 0.999² = 1.000
communality_actuarial = 0.918² + 0.010² = 0.843
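These sums of squares can be checked numerically. The sketch below uses the loadings reported above (absolute values only, since the signs vanish when squared).

```python
import numpy as np

# Factor loadings reported in the outputs (absolute values; the signs
# do not affect the squared sums).
loadings = np.array([[0.895, 0.007],    # finance
                     [0.934, 0.049],    # costs
                     [0.042, 0.999],    # marketing
                     [0.918, 0.010]])   # actuarial

eigenvalues = (loadings ** 2).sum(axis=0)     # column sums of squares
communalities = (loadings ** 2).sum(axis=1)   # row sums of squares
```

The column sums recover the two eigenvalues (2.519 and 1.000, up to rounding), and the row sums recover the communalities of the four variables.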

In the SPSS outputs, the communalities table is also presented, as shown in Fig. 12.21.

Fig. 12.21
Fig. 12.21 Communalities.

The loading plot, which shows the relative position of each variable in each factor based on the respective factor loadings, is also presented in the outputs (Fig. 12.22, equivalent to Fig. 12.8 in Section 12.2.6), in which the X-axis represents factor F1, and the Y-axis, factor F2.

Fig. 12.22
Fig. 12.22 Loading plot.

Even though the relative position of the variables in each axis, that is, the magnitude of the correlations between each one of them and each factor, is very clear, for pedagogical purposes, we chose to elaborate the rotation of the axes, which can sometimes facilitate the interpretation of the factors because it provides a better distribution of the variables’ factor loadings across the factors.
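For readers curious about what the Varimax rotation does numerically, here is a minimal sketch, a hypothetical helper rather than SPSS code, of the classic algorithm: it iteratively seeks the orthogonal rotation that maximizes the variance of the squared loadings, optionally Kaiser-normalizing the rows first.

```python
import numpy as np

def varimax(L, normalize=True, tol=1e-8, max_iter=500):
    """Varimax rotation of a loading matrix L (variables x factors).
    Returns the rotated loadings and the orthogonal rotation matrix."""
    L = np.asarray(L, dtype=float).copy()
    if normalize:                               # Kaiser row normalization
        h = np.sqrt((L ** 2).sum(axis=1))
        L /= h[:, None]
    p, k = L.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        tmp = Lr ** 3 - Lr * ((Lr ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(L.T @ tmp)
        R = U @ Vt                              # best orthogonal update
        if s.sum() - var_old < tol:
            break
        var_old = s.sum()
    Lr = L @ R
    if normalize:
        Lr *= h[:, None]                        # undo the normalization
    return Lr, R
```

Because the rotation matrix R is orthogonal, the rotation preserves each variable’s communality, which is why the rotated and unrotated solutions share the same communalities.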

Thus, once again, let’s click on Analyze → Dimension Reduction → Factor … and, on the button Rotation …, select the option Varimax, as shown in Fig. 12.23.

Fig. 12.23
Fig. 12.23 Selecting the Varimax orthogonal rotation method.

When we click on Continue, we will go back to the main dialog box of the factor analysis. In Scores …, let’s select the option Save as variables, as shown in Fig. 12.24, so that the factors generated, now rotated, can be made available in the dataset as new variables. From these factors, the students’ school performance ranking will be created.

Fig. 12.24
Fig. 12.24 Selecting the option to save the factors as new variables in the dataset.

Next, we can click on Continue and on OK.

Figs. 12.25–12.29 show the outputs that differ from the previous ones due to the rotation. The results of the correlation matrix, of the KMO statistic, of Bartlett’s test of sphericity, and of the communalities table are not presented again because, even though the communalities are calculated from the rotated loadings, their values do not change.

Fig. 12.25
Fig. 12.25 Rotated factor loadings through the Varimax method.
Fig. 12.26
Fig. 12.26 Loading plot with rotated loadings.
Fig. 12.27
Fig. 12.27 Rotation angle (in radians).
Fig. 12.28
Fig. 12.28 Eigenvalues and variance shared by the original variables to form both rotated factors.
Fig. 12.29
Fig. 12.29 Rotated factor scores.

Fig. 12.25 shows these rotated factor loadings and, through them, it is possible to verify, even if very tenuously, a certain redistribution of the variable loadings in each factor.

Note that the rotated factor loadings in Fig. 12.25 are exactly the same as the ones obtained algebraically in Section 12.2.6, from Expressions (12.35) to (12.40), and presented in Table 12.17.

The new loading plot, constructed from the rotated factor loadings and equivalent to Fig. 12.9, can be seen in Fig. 12.26.

The rotation angle calculated algebraically in Section 12.2.6 is also a part of the SPSS outputs and can be found in Fig. 12.27.

As we have already discussed, from the rotated factor loadings, we can verify that there are no changes in the communality values of the variables considered in the analysis, that is:

communality_finance = 0.895² + 0.019² = 0.802
communality_costs = 0.935² + 0.021² = 0.875
communality_marketing = 0.013² + 1.000² = 1.000
communality_actuarial = 0.917² + 0.037² = 0.843

On the other hand, the new eigenvalues can be obtained as follows:

0.895² + 0.935² + 0.013² + 0.917² = λ₁² = 2.518
0.019² + 0.021² + 1.000² + 0.037² = λ₂² = 1.002

Fig. 12.28 shows the results of the eigenvalues for the first two rotated factors in Rotation Sums of Squared Loadings, with their respective proportions of variance shared by the four original variables. The results are in accordance with the ones presented in Table 12.18.

In comparison to the results obtained before the rotation, we can see that, even though there is no change in the sharing of 87.985% of the total variance of the original variables to form both rotated factors, the rotation redistributed the variance shared by the variables to each factor.

Fig. 12.29 shows the rotated factor scores, from which the expressions of the new factors can be obtained.

Therefore, we can write the following rotated factors expressions:

F′₁ᵢ = 0.355·Zfinanceᵢ + 0.372·Zcostsᵢ + 0.012·Zmarketingᵢ + 0.364·Zactuarialᵢ

F′₂ᵢ = −0.004·Zfinanceᵢ + 0.038·Zcostsᵢ + 0.999·Zmarketingᵢ − 0.021·Zactuarialᵢ

When developing the procedure described, we can verify that two new variables are generated in the dataset, called FAC1_1 and FAC2_1 by SPSS, as shown in Fig. 12.30 for the first 20 observations.

Fig. 12.30
Fig. 12.30 Dataset with the F1′ (FAC1_1) and F2′ (FAC2_1) values per observation.

These new variables, which show the values of both rotated factors for each one of the observations in the dataset, are orthogonal to one another, that is, they have a Pearson’s correlation coefficient equal to 0. This can be verified when we click on Analyze → Correlate → Bivariate …. In the dialog box that will open, we must insert the variables FAC1_1 and FAC2_1 into Variables and select the options Pearson (in Correlation Coefficients) and Two-tailed (in Test of Significance), as shown in Fig. 12.31.

Fig. 12.31
Fig. 12.31 Dialog box for determining Pearson’s correlation coefficient between both rotated factors.

When we click on OK, the output seen in Fig. 12.32 will be presented, in which it is possible to verify that Pearson’s correlation coefficient between both rotated factors is equal to 0.

Fig. 12.32
Fig. 12.32 Pearson’s correlation coefficient between both rotated factors.

According to what was studied in Sections 12.2.4 and 12.2.6, a more inquisitive researcher may also verify that the rotated factor scores can be obtained through the estimation of two multiple linear regression models, in which each factor is considered the dependent variable in one of the models, and the standardized variables are the explanatory variables. The factor scores will be the parameters estimated in each model.

In the same way, it is also possible to verify that the rotated factor loadings can be obtained by using the estimation of four multiple linear regression models as well, in which, in each one of them, a certain standardized variable is considered to be a dependent variable, and the factors, explanatory variables. While the factor loadings will be the parameters estimated in each model, the communalities will be the respective coefficients of determination R2. Therefore, the following expressions can be obtained:

Zfinanceᵢ = 0.895·F′₁ᵢ − 0.019·F′₂ᵢ + uᵢ,  R² = 0.802

Zcostsᵢ = 0.935·F′₁ᵢ + 0.021·F′₂ᵢ + uᵢ,  R² = 0.875

Zmarketingᵢ = 0.013·F′₁ᵢ + 1.000·F′₂ᵢ + uᵢ,  R² = 1.000

Zactuarialᵢ = 0.917·F′₁ᵢ − 0.037·F′₂ᵢ + uᵢ,  R² = 0.843

in which the terms ui represent additional sources of variance, besides factors F1′ and F2′, to explain the behavior of each variable, and they are also called error terms or residuals.
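This regression property is easy to verify numerically. The sketch below simulates a standardized variable from two orthogonal unit-variance factors (0.895 and 0.019 are the loading magnitudes reported for finance; the data are simulated, not the example’s) and shows that ordinary least squares recovers the loadings as slopes and the communality as R².

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
F = rng.standard_normal((n, 2))            # two orthogonal factors
true_loadings = np.array([0.895, 0.019])   # magnitudes from the example
noise_sd = np.sqrt(1 - (true_loadings ** 2).sum())
z = F @ true_loadings + noise_sd * rng.standard_normal(n)

X = np.column_stack([np.ones(n), F])       # intercept plus the two factors
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta
r2 = 1 - resid.var() / z.var()             # approximates the communality
```

With orthogonal unit-variance regressors, each slope equals the Pearson correlation between the variable and the factor, which is precisely the definition of a factor loading.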

In case there is any interest in verifying these facts, we must obtain the standardized variables by clicking on Analyze → Descriptive Statistics → Descriptives …. When we select all the original variables, we must click on Save standardized values as variables. Although this specific procedure is not shown here, after clicking on OK, the standardized variables will be generated in the dataset itself.

Therefore, based on the factors generated, we are able to create the desired school performance ranking. In order to do that, we will use the criterion described in Section 12.2.6, known as weighted rank-sum criterion, in which a new variable is generated from the multiplication of the values of each factor by the respective proportions of variance shared by the original variables. Thus, this new variable, which we call ranking, has the following expression:

rankingᵢ = 0.62942·F′₁ᵢ + 0.25043·F′₂ᵢ

in which parameters 0.62942 and 0.25043 correspond to the proportions of variance shared by the first two factors, respectively, as shown in Fig. 12.28.
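The same weighted rank-sum computation can be sketched in a few lines of NumPy; the factor values below are simulated stand-ins for FAC1_1 and FAC2_1, not the example’s actual factors.

```python
import numpy as np

rng = np.random.default_rng(42)
f1 = rng.standard_normal(10)       # stand-in for FAC1_1
f2 = rng.standard_normal(10)       # stand-in for FAC2_1

# Weighted rank-sum criterion: weight each rotated factor by its
# proportion of shared variance, then sort in descending order.
ranking = 0.62942 * f1 + 0.25043 * f2
order = np.argsort(-ranking)       # best school performance first
```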

In order for the variable to be generated in the dataset, we must click on Transform → Compute Variable …. In Target Variable, we must type the name of the new variable (ranking) and, in Numeric Expression, we must type the weighted sum expression (FAC1_1*0.62942) + (FAC2_1*0.25043), as shown in Fig. 12.33. When we click on OK, the variable ranking will appear in the dataset.

Fig. 12.33
Fig. 12.33 Creating the new variable (ranking).

Finally, to sort variable ranking, we must click on Data → Sort Cases …. In addition to selecting the option Descending, we must insert the variable ranking into Sort by, as shown in Fig. 12.34. When we click on OK, the observations will appear sorted in the dataset, from the highest to the lowest value of variable ranking, as shown in Fig. 12.35 for the 20 observations with the best school performance.

Fig. 12.34
Fig. 12.34 Dialog box for sorting the observations by variable ranking.
Fig. 12.35
Fig. 12.35 Dataset with the school performance ranking.

We can see that the ranking constructed through the weighted rank-sum criterion points to Adelino as the student with the best school performance in that set of subjects, followed by Renata, Giulia, Felipe, and Cecilia.

Having presented the procedures for applying the principal component factor analysis in SPSS, let’s now discuss the technique in Stata, following the standard used in this book.

12.4 Principal Component Factor Analysis in Stata

We now present the step-by-step procedure for developing our example in the Stata Statistical Software. In this section, our main goal is not to discuss the concepts of the principal component factor analysis once again; instead, it is to give researchers an opportunity to elaborate the technique by using this software’s commands. Every time we present an output, we will mention the respective result obtained when applying the technique algebraically and also by using SPSS. The use of the images in this section has been authorized by StataCorp LP©.

Therefore, right away, we begin with the dataset constructed by the professor from the final grades of each one of his 100 students. This dataset can be found in the file FactorGrades.dta and is exactly the same as the one partially presented in Table 12.6 in Section 12.2.6.

First of all, we can type the command desc, which makes the analysis of the dataset characteristics possible, such as the number of observations, the number of variables, and the description of each one of them. Fig. 12.36 shows this first output in Stata.

Fig. 12.36
Fig. 12.36 Description of the FactorGrades.dta dataset.

The command pwcorr …, sig generates Pearson’s correlation coefficients between each pair of variables, with their respective significance levels. Therefore, we must type the following command:

pwcorr finance costs marketing actuarial, sig

Fig. 12.37 shows the output generated.

Fig. 12.37
Fig. 12.37 Pearson’s correlation coefficients and respective significance levels.

The outputs seen in Fig. 12.37 show that the correlations between the variable marketing and each one of the other variables are relatively low and not statistically significant, at a significance level of 0.05. On the other hand, the other variables show high and statistically significant correlations between one another at this significance level, which is a first indication that the factor analysis may group them into a certain factor without any substantial loss of their variances, while the variable marketing may show a high correlation with another factor. These results are in accordance with the ones presented in Table 12.7 in Section 12.2.6 and in Fig. 12.16, when we elaborated the technique in SPSS (Section 12.3).

The factor analysis’s overall adequacy can be evaluated through the results of the KMO statistic and Bartlett’s test of sphericity, which can be obtained by using the command factortest. Thus, let’s type:

factortest finance costs marketing actuarial

The outputs generated can be seen in Fig. 12.38.

Fig. 12.38
Fig. 12.38 Results of the KMO statistic and Bartlett’s test of sphericity.

Based on the result of the KMO statistic, the overall adequacy of the factor analysis can be considered middling. However, more important than this piece of information is the result of Bartlett’s test of sphericity. From the χ²Bartlett statistic, with a significance level of 0.05 and 6 degrees of freedom, we can say that Pearson’s correlation matrix is statistically different from the identity matrix with the same dimension, since χ²Bartlett = 192.335 (χ² calculated for 6 degrees of freedom) and Prob. χ²Bartlett (P-value) < 0.05. Note that the results of these statistics are in accordance with the ones calculated algebraically in Section 12.2.6 and also shown in Fig. 12.17 of Section 12.3. Fig. 12.38 also shows the value of the determinant of the correlation matrix, used to calculate the χ²Bartlett statistic.

Stata also allows us to obtain the eigenvalues and eigenvectors of the correlation matrix. In order to do that, we must type the following command:

pca finance costs marketing actuarial

Fig. 12.39 shows these eigenvalues and eigenvectors, which are exactly the same as the ones calculated algebraically in Section 12.2.6. Since we have not yet elaborated the procedure for rotating the factors, we can verify that the proportions of variance shared by the original variables to form each factor correspond to the ones presented in Table 12.10.

Fig. 12.39
Fig. 12.39 Eigenvalues and eigenvectors of the correlation matrix.

After having presented these first outputs, we can now elaborate the principal component factor analysis itself by typing the following command, whose results are shown in Fig. 12.40.

Fig. 12.40
Fig. 12.40 Outputs of the principal component factor analysis in Stata.

factor finance costs marketing actuarial, pcf

where the term pcf refers to the principal-component factor method.

The upper part of Fig. 12.40 shows the eigenvalues of the correlation matrix once again, with the respective proportions of shared variance of the original variables, since researchers may choose not to use the command pca. The lower part of the figure shows the factor loadings, which represent the correlations between each variable and the factors with eigenvalues greater than 1. Therefore, we can see that Stata automatically considers the latent root criterion (Kaiser criterion) when choosing the number of factors. If researchers wish to extract more factors by considering a smaller eigenvalue, they must add the term mineigen(#) at the end of the command factor, in which # corresponds to the minimum eigenvalue from which factors will be extracted.

The factor loadings shown in Fig. 12.40 are equal to the ones in the first two columns of Table 12.12 in Section 12.2.6, and in Fig. 12.20 of Section 12.3. Through them, we can see that, while the variables finance, costs, and actuarial show high correlations with the first factor, the variable marketing shows a strong correlation with the second factor. Besides, the factor loadings matrix also presents a column called Uniqueness (or exclusivity), whose values represent, for each variable, the proportion of variance lost to form the factors extracted, that is, (1 − communality). Therefore, we have:

uniqueness_finance = 1 − (0.8953² + 0.0068²) = 0.1983
uniqueness_costs = 1 − (0.9343² + 0.0487²) = 0.1246
uniqueness_marketing = 1 − (0.0424² + 0.9989²) = 0.0003
uniqueness_actuarial = 1 − (0.9179² + 0.0101²) = 0.1573

Consequently, because the variable marketing has low correlations with each one of the other original variables, it ends up having a high Pearson’s correlation with the second factor. This makes its uniqueness value very low, since the proportion of its variance shared with the second factor is almost 100%.
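The uniqueness column can be checked directly from the loadings reported in Fig. 12.40 (absolute values, since only their squares matter):

```python
import numpy as np

loadings = np.array([[0.8953, 0.0068],    # finance
                     [0.9343, 0.0487],    # costs
                     [0.0424, 0.9989],    # marketing
                     [0.9179, 0.0101]])   # actuarial

# uniqueness = 1 - communality = 1 - (sum of squared loadings per row)
uniqueness = 1 - (loadings ** 2).sum(axis=1)
```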

Knowing that two factors are extracted, at this moment, we will carry out the rotation by using the Varimax method. In order to do that, we must type the following command:

rotate, varimax horst

where the term horst defines the rotation angle from the standardized factor loadings. This procedure is in accordance with the one elaborated algebraically in Section 12.2.6. The outputs generated can be seen in Fig. 12.41.

Fig. 12.41
Fig. 12.41 Rotation of factors through the Varimax method.

From Fig. 12.41, as we have already discussed, we can verify that the proportion of variance shared by all the variables to form both factors remains equal to 87.98%, even though the eigenvalue of each rotated factor is different from the one obtained previously. The same can be said about the uniqueness values of each variable, even though the rotated factor loadings differ from their unrotated counterparts, since the Varimax method maximizes the loadings of each variable in a certain factor. Fig. 12.41 also shows the rotation angle at the end. All of these outputs are identical to the ones calculated in Section 12.2.6 and were also presented when we elaborated the technique in SPSS, in Figs. 12.25, 12.27, and 12.28.

Thus, we can say that:

uniqueness_finance = 1 − (0.8951² + 0.0195²) = 0.1983
uniqueness_costs = 1 − (0.9354² + 0.0213²) = 0.1246
uniqueness_marketing = 1 − (0.0131² + 0.9997²) = 0.0003
uniqueness_actuarial = 1 − (0.9172² + 0.0370²) = 0.1573

and that:

0.8951² + 0.9354² + 0.0131² + 0.9172² = λ₁² = 2.51768
0.0195² + 0.0213² + 0.9997² + 0.0370² = λ₂² = 1.00170

If researchers wish, Stata also allows them to compare the rotated factor loadings to the ones obtained before the rotation in the same table. In order to do that, after elaborating the rotation, it is necessary to type the following command:

estat rotatecompare

The outputs generated can be seen in Fig. 12.42.

Fig. 12.42
Fig. 12.42 Comparison of the rotated and unrotated factor loadings.

At this moment, the loading plot of the rotated factor loadings can be obtained by typing the command loadingplot. This chart, which corresponds to the ones presented in Figs. 12.9 and 12.26, can be seen in Fig. 12.43.

Fig. 12.43
Fig. 12.43 Loading plot with rotated loadings.

After developing these procedures, the researcher may want to generate two new variables in the dataset, which correspond to the rotated factors obtained through the factor analysis. Therefore, it is necessary to type the following command:

predict f1 f2

where f1 and f2 are the names of the variables corresponding to the first and second factors, respectively. When we type the command, in addition to creating these two new variables in the dataset, an output similar to the one in Fig. 12.44 will also be generated, in which the rotated factor scores are presented.

Fig. 12.44
Fig. 12.44 Generating the factors in the dataset and the rotated factor scores.

The results shown in Fig. 12.44 are equivalent to the ones in SPSS (Fig. 12.29). Besides, it is also possible to verify that both factors generated are orthogonal, that is, they have a Pearson’s correlation coefficient equal to 0. In order to do that, let’s type:

estat common

which results in the output seen in Fig. 12.45.

Fig. 12.45
Fig. 12.45 Pearson’s correlation coefficient between both rotated factors.

Only for pedagogical purposes, we can also obtain the scores and the rotated factor loadings from multiple linear regression models. In order to do that, first of all, we have to generate the standardized variables by using the Z-scores procedure in the dataset, from each one of the original variables, by typing the following sequence of commands:

egen zfinance = std(finance)
egen zcosts = std(costs)
egen zmarketing = std(marketing)
egen zactuarial = std(actuarial)
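The egen std() commands above apply the usual Z-scores transformation; an equivalent sketch in Python, with hypothetical grade values, would be:

```python
import numpy as np

grades = np.array([5.8, 7.1, 6.4, 8.0, 4.9])        # hypothetical grades
z = (grades - grades.mean()) / grades.std(ddof=1)   # sample std, as in Stata
```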

Having done this, we can type the two following commands, which represent two multiple linear regression models, each of which has a certain factor as the dependent variable and the standardized variables as explanatory variables.

reg f1 zfinance zcosts zmarketing zactuarial
reg f2 zfinance zcosts zmarketing zactuarial

The results of these models can be seen in Fig. 12.46.

Fig. 12.46
Fig. 12.46 Outputs of the multiple linear regression models with factors as dependent variables.

By analyzing Fig. 12.46, we note that the parameters estimated in each model correspond to the rotated factor scores for each variable, according to what has already been shown in Fig. 12.44. Thus, since all the intercepts are practically equal to 0, we can write:

F′₁ᵢ = 0.3554795·Zfinanceᵢ + 0.3721907·Zcostsᵢ + 0.0124719·Zmarketingᵢ + 0.3639452·Zactuarialᵢ

F′₂ᵢ = −0.0036389·Zfinanceᵢ + 0.0377955·Zcostsᵢ + 0.9986053·Zmarketingᵢ − 0.020781·Zactuarialᵢ

Obviously, since each factor is formed exclusively from variance shared by the four variables, the coefficients of determination R² of each model are equal to 1.

On the other hand, to obtain the rotated factor loadings, we must type the following four commands, which represent four multiple linear regression models, in which each one of them has a certain standardized variable as a dependent variable, and the rotated factors, as explanatory variables.

reg zfinance f1 f2
reg zcosts f1 f2
reg zmarketing f1 f2
reg zactuarial f1 f2

The results of these models can be seen in Fig. 12.47.

Fig. 12.47
Fig. 12.47 Outputs of the multiple linear regression models with standardized variables as dependent variables.

By analyzing this figure, note that the parameters estimated in each model correspond to the rotated factor loadings for each factor, according to what has already been shown in Fig. 12.41. Therefore, since all the intercepts are practically equal to 0, we can write:

Zfinanceᵢ = 0.895146·F′₁ᵢ − 0.0194694·F′₂ᵢ + uᵢ,  R² = 1 − uniqueness = 0.8017

Zcostsᵢ = 0.935375·F′₁ᵢ + 0.0212916·F′₂ᵢ + uᵢ,  R² = 1 − uniqueness = 0.8754

Zmarketingᵢ = 0.013053·F′₁ᵢ + 0.9997495·F′₂ᵢ + uᵢ,  R² = 1 − uniqueness = 0.9997

Zactuarialᵢ = 0.917223·F′₁ᵢ − 0.0370175·F′₂ᵢ + uᵢ,  R² = 1 − uniqueness = 0.8427

where the terms ui represent additional sources of variance, besides factors F1′ and F2′, to explain the behavior of each variable, since two other factors with eigenvalues less than 1 could also have been extracted. The coefficients of determination R² of each model, which are different from 1, correspond to the communality of each variable, that is, to (1 − uniqueness).

Although researchers can choose not to estimate multiple linear regression models when applying the factor analysis, since it is only a verification procedure, we believe that its didactical nature is essential for fully understanding the technique.

From the rotated factors extracted (variables f1 and f2), we can define the desired school performance ranking. As elaborated when applying the technique in SPSS, we will use the criterion described in Section 12.2.6, known as the weighted rank-sum criterion, in which a new variable is generated from the multiplication of the values of each factor by the respective proportions of variance shared by the original variables. Let’s type the following command:

gen ranking = f1*0.6294 + f2*0.2504

where the terms 0.6294 and 0.2504 correspond to the proportions of variance shared by the first two factors, respectively, as shown in Fig. 12.41. The new variable generated in the dataset is called ranking. Next, we can sort the observations, from the highest to the lowest value of variable ranking, by typing the following command:

gsort -ranking

After that, just as an example, we can list the school performance ranking of the best 20 students, based on the joint behavior of the final grades in all four subjects. In order to do that, we can type the following command:

list student ranking in 1/20

Fig. 12.48 shows the ranking of the top 20 students.
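As a didactical sketch, the weighted rank-sum criterion can also be reproduced in a few lines of Python. The factor scores below are hypothetical placeholders, not the actual scores from FactorGrades; only the weights 0.6294 and 0.2504 come from the example.

```python
import numpy as np

# Hypothetical factor scores for five students (illustrative only; the
# real scores come from the factor analysis of the FactorGrades data).
students = np.array(["Gustavo", "Ovidio", "Estela", "Leonor", "Dalila"])
f1 = np.array([1.20, 1.05, 0.30, -0.10, 0.15])
f2 = np.array([0.80, -1.40, -1.10, 0.75, 0.10])

# Weighted rank-sum criterion: weight each factor score by the
# proportion of variance shared by that factor (0.6294 and 0.2504).
ranking = f1 * 0.6294 + f2 * 0.2504

# Sort from the highest to the lowest value (the gsort -ranking step)
# and list the result (the list command).
order = np.argsort(-ranking)
for name, score in zip(students[order], ranking[order]):
    print(f"{name:10s} {score:+.4f}")
```

Note that a student with a middling score on the dominant first factor (Leonor) can still outrank one with a slightly higher f1 (Dalila) thanks to the second factor's weight.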

Fig. 12.48
Fig. 12.48 School performance ranking of the best 20 students.

12.5 Final Remarks

There are many situations in which researchers wish to group variables into one or more factors, verify the validity of previously established constructs, create orthogonal factors for later use in confirmatory multivariate techniques that require the absence of multicollinearity, or create rankings by developing performance indexes. In these situations, factor analysis procedures are highly recommended, and the one most frequently used is principal components.

Therefore, factor analysis allows us to improve decision-making processes based on the behavior of, and the interdependence between, quantitative variables that show some degree of correlation. Since the factors generated from the original variables are also quantitative variables, the outputs of a factor analysis can serve as inputs to other multivariate techniques, such as cluster analysis. The stratification of each factor into ranges may also allow the association between these ranges and the categories of other qualitative variables to be evaluated through a correspondence analysis.

The use of factors in confirmatory multivariate techniques may also make sense when researchers intend to elaborate diagnostics about the behavior of a certain dependent variable and use the extracted factors as explanatory variables, a fact that eliminates possible multicollinearity problems because the factors are orthogonal. A qualitative variable obtained by stratifying a certain factor into ranges can be used, for example, in a multinomial logistic regression model, which allows the preparation of a diagnostic on the probability each observation has of being in each range, due to the behavior of other explanatory variables not initially considered in the factor analysis.

Regardless of the main goal for applying the technique, factor analysis may bear good and interesting research fruits that can be useful for the decision-making process. Its preparation must always be carried out through the correct and conscious use of the software package chosen for the modeling, based on the underlying theory and on researchers’ experience and intuition.

12.6 Exercises

  (1) From a dataset that contains certain clients' variables (individuals), analysts from a bank's Customer Relationship Management (CRM) department elaborated a principal component factor analysis aiming to study the joint behavior of these variables so that, afterwards, they can propose the creation of an investment profile index. The variables used in the modeling were:
Variable     Description
age          Client i's age (years)
fixedif      Percentage of resources invested in fixed-income funds (%)
variableif   Percentage of resources invested in variable-income funds (%)
people       Number of people who live in the residence

In a certain management report, these analysts presented the factor loadings (Pearson's correlation coefficients) between each original variable and the two factors extracted using the latent root (Kaiser) criterion. These factor loadings can be found in the following table:

Variable      Factor 1   Factor 2
age             0.917      0.047
fixedif         0.874      0.077
variableif     −0.844      0.197
people          0.031      0.979

We would like you to answer the following questions:

  (a) Which eigenvalues correspond to the two factors extracted?
  (b) What are the proportions of variance shared by all the variables to form each factor? What is the total proportion of variance lost by the four variables when extracting these two factors?
  (c) For each variable, what is the proportion of shared variance used to form both factors (communality)?
  (d) What is the expression of each standardized variable based on the two factors extracted?
  (e) Construct a loading plot from the factor loadings.
  (f) Interpret both factors based on the distribution of the loadings of each variable.
  (2) A researcher specialized in analyzing the behavior of nations' socioeconomic indexes would like to investigate the possible relationship between variables related to corruption, violence, income, and education. In order to do that, he collected data on 50 countries considered to be developed or emerging for two consecutive years. The data can be found in the files CountriesIndexes.sav and CountriesIndexes.dta, which have the following variables:
Variable      Period    Description
country                 A string variable that identifies country i
cpi1          Year 1    Corruption perception index, which corresponds to citizens' perception of abuses committed by the public sector as regards a nation's private assets, including administrative and political aspects. The lower the index, the higher the perception of corruption in the country (Source: Transparency International)
cpi2          Year 2
violence1     Year 1    Number of murders per 100,000 inhabitants (Sources: World Health Organization, United Nations Office on Drugs and Crime, and GIMD Global Burden of Injuries)
violence2     Year 2
capita_gdp1   Year 1    Per capita GDP in US$ adjusted for inflation, using 2000 as the base year (Source: World Bank)
capita_gdp2   Year 2
school1       Year 1    Average number of years in school per person over 25 years of age, including primary, secondary, and higher education (Source: Institute for Health Metrics and Evaluation)
school2       Year 2


In order to create a socioeconomic index that generates a country ranking for each year, the researcher has decided to elaborate a principal component factor analysis using the variables of each period. Based on the results obtained, we would like you to answer the following questions:

  (a) By using the KMO statistic and Bartlett's test of sphericity, is it possible to state that the principal component factor analysis is adequate for each one of the years of study? In the case of Bartlett's test of sphericity, use a significance level of 0.05.
  (b) How many factors are extracted in the analysis in each of the years, considering the latent root criterion? Which eigenvalue(s) correspond to the factor(s) extracted each year, and what proportion(s) of variance do all the variables share to form this(these) factor(s)?
  (c) For each variable, what is the proportion of shared variance used to form the factor(s) each year? Did any alterations in the communalities of each variable occur from one year to the next?
  (d) What are the expression(s) of the factor(s) extracted each year, based on the standardized variables? From one year to the next, did any alterations occur in the factor scores of the variables in each factor? Discuss the importance of developing a specific factor analysis each year in order to create indexes.
  (e) Considering the principal factor extracted as a socioeconomic index, create a country ranking from this index for each one of the years. From one year to the next, were there any changes in the countries' positions in the ranking?
  (3) The general manager of a store, which belongs to a chain of drugstores, wishes to find out its consumers' perception of eight attributes, which are described below:
Attribute (Variable)   Description
assortment             Perception of the variety of goods
replacement            Perception of the quality and speed of inventory replacement
layout                 Perception of the store's layout
comfort                Perception of thermal, acoustic, and visual comfort inside the store
cleanliness            Perception of the store's general cleanliness
services               Perception of the quality of the services rendered
prices                 Perception of the store's prices compared to the competition
discounts              Perception of the store's discount policy

In order to do that, he carried out a survey with 1700 clients at the store over a certain period. The questionnaire was structured based on groups of attributes, and each question corresponding to an attribute asked the consumer to assign a score from 0 to 10 depending on his or her perception of that attribute: 0 corresponded to an entirely negative perception, and 10, to the best perception possible. Since the store's general manager is rather experienced, he decided, in advance, to gather the questions into three groups, such that the complete questionnaire would be as follows:

Based on your perception, fill out the questionnaire below with scores from 0 to 10, in which 0 means that your perception is entirely negative in relation to a certain attribute, and 10, that your perception is the best possible. Each item receives a score.

Products and store environment
  Please rate the store's variety of goods on a scale of 0–10
  Please rate the store's quality and speed of inventory replacement on a scale of 0–10
  Please rate the store's layout on a scale of 0–10
  Please rate the store's thermal, acoustic, and visual comfort on a scale of 0–10
  Please rate the store's general cleanliness on a scale of 0–10
Services
  Please rate the quality of the services rendered in our store on a scale of 0–10
Prices and discount policy
  Please rate the store's prices compared to the competition on a scale of 0–10
  Please rate our discount policy on a scale of 0–10


The complete dataset developed by the store’s general manager can be seen in the files DrugstorePerception.sav and DrugstorePerception.dta. We would like you to:

  (a) Present the correlation matrix between each pair of variables. Based on the magnitude of the values of Pearson's correlation coefficients, is it possible to identify any indication that the factor analysis may group the variables into factors?
  (b) By using the result of Bartlett's test of sphericity, is it possible to state, at a significance level of 0.05, that the principal component factor analysis is adequate?
  (c) How many factors are extracted in the analysis considering the latent root criterion? Which eigenvalue(s) correspond to the factor(s) extracted, and what proportion(s) of variance do all the variables share to form this(these) factor(s)?
  (d) What is the total percentage of variance of the original variables lost as a result of extracting the factor(s) based on the latent root criterion?
  (e) For each variable, what are the loading and the proportion of shared variance used to form the factor(s)?
  (f) By demanding the extraction of three factors, to the detriment of the latent root criterion, and based on the new factor loadings, is it possible to confirm the construct of the questionnaire proposed by the store's general manager? In other words, do the variables of each group in the questionnaire, in fact, end up sharing more variance with a common factor?
  (g) Discuss the impact of the decision to extract three factors on the communality values.
  (h) Carry out a Varimax rotation and, based on the redistribution of the factor loadings, discuss once again the construct initially proposed in the questionnaire by the store's general manager.
  (i) Present the 3D loading plot with the rotated factor loadings.

Appendix: Cronbach’s Alpha

A.1 Brief Presentation

The alpha statistic, proposed by Cronbach (1951), is a measure used to assess the internal consistency of the variables in a dataset, that is, it measures the level of reliability with which a certain scale, adopted to define the original variables, produces consistent results about the relationship between these variables. According to Nunnally and Bernstein (1994), the level of reliability is defined from the behavior of the correlations between the original (or standardized) variables, and, therefore, Cronbach's alpha can be used to evaluate the reliability with which a factor can be extracted from variables, and is thus related to factor analysis.

According to Rogers et al. (2002), even though Cronbach's alpha is not the only existing measure of reliability, since it has constraints related to multidimensionality, that is, to the identification of multiple factors, it can be defined as the measure that makes it possible to assess the intensity with which a certain construct or factor is present in the original variables. Therefore, a dataset whose variables share a single factor tends to have a high Cronbach's alpha.

Hence, Cronbach's alpha cannot be used to assess the overall adequacy of the factor analysis, unlike the KMO statistic and Bartlett's test of sphericity, since its magnitude only gives the researcher an indication of the internal consistency of the scale used to extract a single factor. If its value is low, not even the first factor will be adequately extracted, which is the main reason why some researchers choose to study the magnitude of Cronbach's alpha before running the factor analysis, even though this is not a mandatory requisite for developing the technique.

Cronbach’s alpha can be defined by the following expression:

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{k} Var_k}{Var_{sum}}\right) \tag{12.41}
\]

where:

  • Var_k is the variance of the kth variable, and

    \[
    Var_{sum} = \frac{\displaystyle\sum_{i=1}^{n}\left(\sum_{k} X_{ki}\right)^2 - \frac{1}{n}\left(\sum_{i=1}^{n}\sum_{k} X_{ki}\right)^2}{n-1} \tag{12.42}
    \]

which represents the variance of the sum of each row in the dataset, that is, the variance of the sum of the values corresponding to each observation. Besides, we know that n is the sample size, and k, the number of variables X.

So, we can state that, if the variable values are consistent with one another (highly correlated), the term Var_sum will be large enough for alpha (α) to tend to 1. On the other hand, variables with low correlations, possibly due to random observation values, will make the term Var_sum approach the sum of the variances of each variable (Var_k), which will make alpha (α) tend to 0.

Although there is no consensus in the existing literature about the value of alpha from which there is internal consistency of the variables in the dataset, it is interesting that the result obtained is greater than 0.6 when we apply exploratory techniques.
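Expressions (12.41) and (12.42) can be implemented in a few lines. The sketch below, in Python with NumPy, computes Var_sum as the sample variance of the row sums; the two checks use hand-verifiable data (identical columns must give α = 1, and the small 3 × 2 matrix gives α = 2/3).

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha following Expressions (12.41) and (12.42).

    X is an (n, k) array with one column per variable."""
    n, k = X.shape
    sum_var_k = X.var(axis=0, ddof=1).sum()  # sum of the Var_k terms
    var_sum = X.sum(axis=1).var(ddof=1)      # Var_sum: variance of row sums
    return (k / (k - 1)) * (1 - sum_var_k / var_sum)

# Identical columns are perfectly consistent, so alpha equals 1:
col = np.array([1.0, 4.0, 2.0, 5.0, 3.0])
print(cronbach_alpha(np.column_stack([col] * 4)))

# Hand-checkable case: Var_1 = Var_2 = 1 and Var_sum = 3,
# so alpha = 2 * (1 - 2/3) = 2/3.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
print(cronbach_alpha(X))
```

Computing the variance of the row sums, rather than expanding Expression (12.42) term by term, is numerically equivalent because the sample variance already subtracts the squared mean of the sums.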

Next, we will discuss the calculation of Cronbach’s alpha for the data in the example used throughout this chapter.

A.2 Determining Cronbach’s Alpha Algebraically

From the standardized variables in the example studied throughout this chapter, we can construct Table 12.20, which helps us calculate Cronbach’s alpha.

Table 12.20

Procedure for Calculating Cronbach’s Alpha
Student        Zfinance_i   Zcosts_i   Zmarketing_i   Zactuarial_i   Σₖ₌₁⁴ Xₖᵢ   (Σₖ₌₁⁴ Xₖᵢ)²
Gabriela          −0.011      −0.290       −1.650          0.273        −1.679         2.817
Luiz Felipe       −0.876      −0.697        1.532         −1.319        −1.360         1.849
Patricia          −0.876      −0.290       −0.590         −0.523        −2.278         5.191
Gustavo            1.334       1.337        0.825          1.069         4.564        20.832
Leticia           −0.779      −1.104       −0.872         −0.841        −3.597        12.939
Ovidio             1.334       2.150       −1.650          1.865         3.699        13.682
Leonor            −0.267       0.116        0.825         −0.125         0.549         0.301
Dalila            −0.139       0.523        0.118          0.273         0.775         0.600
Antonio            0.021      −0.290       −0.590         −0.523        −1.382         1.909
Estela             0.982       0.113       −1.297          1.069         0.868         0.753
⋮
Variance           1.000       1.000        1.000          1.000    Σᵢ₌₁¹⁰⁰ ΣₖXₖᵢ = 0   Σᵢ₌₁¹⁰⁰ (ΣₖXₖᵢ)² = 832.570


Thus, based on Expression (12.42), we have:

\[
Var_{sum} = \frac{832.570}{99} = 8.410
\]

and, by using Expression (12.41), we can calculate Cronbach’s alpha:

\[
\alpha = \frac{4}{3}\left(1 - \frac{4}{8.410}\right) = 0.699
\]

We can consider this value acceptable for the internal consistency of the variables in our dataset. Nevertheless, as we will see when determining Cronbach's alpha in SPSS and in Stata, there is a considerable loss of reliability because the original variables are not all measuring the same factor, that is, the same dimension, since this statistic has constraints related to multidimensionality. Indeed, if we did not include the variable marketing when calculating Cronbach's alpha, its value would be considerably higher, which indicates that this variable does not contribute to the construct, or to the first factor, formed by the other variables (finance, costs, and actuarial).
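The arithmetic above is easy to double-check programmatically. This short Python sketch simply replays Expressions (12.41) and (12.42) with the figures from the example (n = 100, k = 4, standardized variables, so each Var_k = 1).

```python
# Replaying the algebraic steps for the grades example:
# n = 100, k = 4, and every Var_k = 1 (standardized variables).
# The double-sum (mean) term in Expression (12.42) is 0 because each
# standardized variable sums to 0 across the 100 students.
var_sum = 832.570 / 99               # Expression (12.42)
alpha = (4 / 3) * (1 - 4 / var_sum)  # Expression (12.41)
print(f"{var_sum:.3f} {alpha:.3f}")  # prints: 8.410 0.699
```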

The complete spreadsheet with the calculation of Cronbach’s alpha can be found in the file AlphaCronbach.xls.

Analogous to what was done throughout this chapter, next, we will present the procedures for obtaining Cronbach’s alpha in SPSS and in Stata.

A.3 Determining Cronbach’s Alpha in SPSS

Once again, let’s use the file FactorGrades.sav. In order for us to determine Cronbach’s alpha based on the standardized variables, first, we must standardize them by using the Z-scores procedure. To do that, let’s click on Analyze → Descriptive Statistics → Descriptives …. When we select all the original variables, we must click on Save standardized values as variables. Although this specific procedure is not shown here, after clicking on OK, the standardized variables will be generated in the dataset itself.

After that, let’s click on Analyze → Scale → Reliability Analysis …. A dialog box will open. We must insert the standardized variables into Items, as shown in Fig. 12.49.

Fig. 12.49
Fig. 12.49 Dialog box for determining Cronbach’s alpha in SPSS.

Next, in Statistics …, we must select the option Scale if item deleted, as shown in Fig. 12.50. This option calculates the different values of Cronbach’s alpha when each variable in the analysis is eliminated. The term item is often mentioned in Cronbach’s work (1951), and it is used as a synonym for variable.

Fig. 12.50
Fig. 12.50 Selecting the option to calculate alpha when excluding a certain variable.

Next, we can click on Continue and on OK.

Fig. 12.51 shows the result of Cronbach’s alpha, whose value is exactly the same as the one calculated through Expressions (12.41) and (12.42) and shown in the previous section.

Fig. 12.51
Fig. 12.51 Result of Cronbach’s alpha in SPSS.

Furthermore, Fig. 12.52 also shows, in the last column, the Cronbach's alpha values that would be obtained if a certain variable were excluded from the analysis. Therefore, we can see that the presence of the variable marketing contributes negatively to the identification of a single factor because, as we know, this variable shows a strong correlation with the second factor extracted by the principal component factor analysis elaborated throughout this chapter. Since Cronbach's alpha is a one-dimensional measure of reliability, excluding the variable marketing would raise its value to 0.904.
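The "Scale if item deleted" logic can be mimicked with a short Python sketch on simulated data. The dataset below is hypothetical: three columns share one common factor while the fourth is unrelated noise, mirroring the role that marketing plays in our example.

```python
import numpy as np

def cronbach_alpha(X):
    n, k = X.shape
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum()
                            / X.sum(axis=1).var(ddof=1))

# Hypothetical standardized scores: columns 0-2 share one common factor,
# while column 3 is unrelated noise (mirroring the role of marketing).
rng = np.random.default_rng(7)
common = rng.standard_normal(500)
cols = [common + 0.4 * rng.standard_normal(500) for _ in range(3)]
cols.append(rng.standard_normal(500))
X = np.column_stack(cols)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# "Scale if item deleted": recompute alpha with each column dropped.
# Dropping the unrelated column should yield by far the highest alpha.
for j in range(X.shape[1]):
    a = cronbach_alpha(np.delete(X, j, axis=1))
    print(f"alpha without column {j}: {a:.3f}")
```

Just as in Fig. 12.52, the alpha computed without the unrelated column is much higher than the others, flagging that column as the one that harms unidimensionality.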

Fig. 12.52
Fig. 12.52 Cronbach’s alpha when excluding each variable.

Next, we will obtain the same outputs by using specific commands in Stata.

A.4 Determining Cronbach’s Alpha in Stata

Now, let’s open the file FactorGrades.dta. In order to calculate Cronbach’s alpha, we must type the following command:

alpha finance costs marketing actuarial, asis std

where the option std causes Cronbach's alpha to be calculated from the standardized variables, even though the original (unstandardized) variables are listed in the alpha command.

The output generated can be seen in Fig. 12.53.

Fig. 12.53
Fig. 12.53 Result of Cronbach’s alpha in Stata.

If researchers wish to obtain the Cronbach's alpha values that result from excluding each one of the variables, as is done in SPSS, they may type the following command:

alpha finance costs marketing actuarial, asis std item

The new outputs are shown in Fig. 12.54, in which the values of the last column are exactly the same as the ones presented in Fig. 12.52, which corroborates the fact that the variables finance, costs, and actuarial show high internal consistency for determining a single factor.

Fig. 12.54
Fig. 12.54 Internal consistency when excluding each variable—last column.

References

Bartlett M.S. A note on the multiplying factors for various χ2 approximations. J. Roy. Stat. Soc. Ser. B. 1954;16(2):296–298.

Cronbach L.J. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16(3):297–334.

Fávero L.P., Belfiore P. Manual de análise de dados: estatística e modelagem multivariada com Excel®, SPSS® e Stata®. Rio de Janeiro: Elsevier; 2017.

Gorsuch R.L. Factor Analysis. second ed. Mahwah: Lawrence Erlbaum Associates; 1983.

Gujarati D.N., Porter D.C. Econometria básica. fifth ed. New York: McGraw-Hill; 2008.

Harman H.H. Modern Factor Analysis. third ed. Chicago: University of Chicago Press; 1976.

Kaiser H.F. A second generation little jiffy. Psychometrika. 1970;35(4):401–415.

Kaiser H.F. The varimax criterion for analytic rotation in factor analysis. Psychometrika. 1958;23(3):187–200.

Nunnally J.C., Bernstein I.H. Psychometric Theory. third ed. New York: McGraw-Hill; 1994.

Pearson K. Mathematical contributions to the theory of evolution. III. Regression, Heredity, and Panmixia. Philos. Trans. R. Soc. London. 1896;187:253–318.

Reis E. Estatística multivariada aplicada. second ed. Lisboa: Edições Sílabo; 2001.

Rogers W.M., Schmitt N., Mullins M.E. Correction for unreliability of multifactor measures: comparison of alpha and parallel forms approaches. Organ. Res. Methods. 2002;5(2):184–199.

Spearman C.E. “General intelligence,” objectively determined and measured. Am. J. Psychol. 1904;15(2):201–292.


"To view the full reference list for the book, click here"

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset