Introduction

Two or more variables can relate to one another in several different ways. One researcher may be interested in studying the interrelationship between categorical (or nonmetric) variables, for example, to assess possible associations between their categories. Another researcher may wish to create performance indicators (new variables) based on the correlations between the original metric variables. A third researcher may be interested in identifying homogeneous groups of observations in a dataset, formed on the basis of similarities in their values for certain variables. In all of these situations, researchers may use multivariate exploratory techniques.

Multivariate exploratory techniques, also known as interdependence methods, can be used in practically all fields of knowledge in which researchers aim to study the relationships between the variables of a dataset without intending to estimate confirmatory models. That is, the goal is not to draw inferences about observations other than those considered in the analysis itself, since neither models nor equations are estimated to predict data behavior. This characteristic is crucial for distinguishing the techniques studied in Part V of this book from dependence methods, such as simple and multiple regression models, binary and multinomial logistic regression models, and regression models for count data, all of which are studied in Part VI.

Therefore, exploratory models do not define a predictor variable, and their main objectives are the reduction or structural simplification of data; the classification or clustering of observations and variables; the investigation of correlations between metric variables, or of associations between categorical variables and between their categories; the creation of performance rankings of observations based on variables; and the elaboration of perceptual maps. Exploratory techniques are extremely relevant for diagnosing the behavior of the data being analyzed. Thus, their procedures are commonly applied as a preliminary step before, or even simultaneously with, the application of a confirmatory model.

Based on pedagogical and conceptual criteria, we have chosen to discuss two of the main sets of multivariate exploratory techniques in Part V, which is structured into the following chapters:

Chapter 11: Cluster Analysis

Chapter 12: Principal Component Factor Analysis

The choice of technique also depends on the measurement scale of the variables available in the dataset, which can be categorical or metric (or even binary, a special case of categorical variables). In some situations, the very way a question is asked during data collection determines whether the response is categorical or metric, which favors the use of some techniques over others. Hence, a clear, precise, and early definition of the research objectives is essential to obtain variables measured on a scale suitable for the technique that will serve as a tool for achieving the proposed objectives.

Cluster analysis techniques (Chapter 11), whose procedures can be hierarchical or nonhierarchical, are used when we wish to study similar behavior among observations (individuals, companies, municipalities, or countries, for example) with respect to certain metric or binary variables, and the possible existence of homogeneous clusters (clusters of observations). Principal component factor analysis (Chapter 12), in turn, can be chosen when the main goal is to create new variables (factors, or clusters of variables) that capture the joint behavior of the original metric variables. Chapter 11 also presents the procedures for applying the multidimensional scaling technique in SPSS and in Stata. This technique can be considered a natural extension of cluster analysis; its main objectives are to determine the relative positions (coordinates) of each observation in the dataset and to construct two-dimensional charts in which these coordinates are plotted.
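To make the idea of grouping observations more concrete, the Python sketch below implements one hierarchical (agglomerative) procedure. It assumes metric variables, Euclidean distance, and the single-linkage (nearest-neighbor) method, which is only one of several possible linkage methods. The data and the function names `single_linkage` and `cut` are hypothetical, and the sketch is not meant to reproduce SPSS or Stata output.

```python
import math
from itertools import combinations

def euclidean(a, b):
    # Euclidean distance between two observations (metric variables assumed)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(observations):
    """Hypothetical helper: agglomerative clustering with single linkage.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(observations))]
    history = []
    while len(clusters) > 1:
        best = None
        # Single linkage: distance between clusters = distance of their closest members
        for c1, c2 in combinations(clusters, 2):
            d = min(euclidean(observations[i], observations[j]) for i in c1 for j in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        history.append((sorted(c1), sorted(c2), d))
    return history

def cut(history, n_obs, n_groups):
    """Hypothetical helper: replay the merges until n_groups clusters remain."""
    groups = [{i} for i in range(n_obs)]
    for a, b, _ in history[: n_obs - n_groups]:
        ga = next(g for g in groups if a[0] in g)
        gb = next(g for g in groups if b[0] in g)
        groups.remove(ga)
        groups.remove(gb)
        groups.append(ga | gb)
    return sorted(sorted(g) for g in groups)

# Hypothetical dataset: five observations measured on two metric variables
data = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.5), (9.0, 1.0)]
history = single_linkage(data)
print(cut(history, len(data), 3))  # → [[0, 1], [2, 3], [4]]
```

Each entry in `history` records the two clusters joined and the distance at which they were joined; a large jump in that distance between consecutive merges is one informal cue when choosing the number of groups.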

Although they are not discussed in this book, correspondence analysis techniques are very useful when researchers intend to study possible associations between categorical variables and between their respective categories. Simple correspondence analysis is applied to study the interdependence relationship between only two categorical variables, which makes it a bivariate technique, whereas multiple correspondence analysis can be used for a larger number of categorical variables and is, in fact, a multivariate technique. For more details on correspondence analysis techniques, we recommend Fávero and Belfiore (2017).

Box V.1 shows the main objectives of each one of the exploratory techniques discussed in Part V.

Box V.1

Exploratory Techniques and Main Objectives

| Exploratory Technique | Measurement Scale | Main Objectives |
|---|---|---|
| Cluster Analysis: hierarchical | Metric or binary | Sorting and allocation of the observations into groups that are internally homogeneous and heterogeneous between one another. Definition of an appropriate number of groups. |
| Cluster Analysis: nonhierarchical | Metric or binary | Evaluation of the representativeness of each variable for the formation of a previously established number of groups. Identification of the allocation of each observation, given a predefined number of groups. |
| Principal Component Factor Analysis | Metric | Identification of the correlations between the original variables to create factors that represent combinations of those variables (reduction or structural simplification). Verification of the validity of previously established constructs. Construction of rankings through performance indicators created from the factors. Extraction of orthogonal factors for later use in multivariate confirmatory techniques that require the absence of multicollinearity. |
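The factor-analysis objectives listed in Box V.1 (orthogonal factors, performance rankings) can be illustrated with a short NumPy sketch. It assumes metric variables and extracts principal components from the Pearson correlation matrix; when no number of factors is given, it uses the latent root criterion (eigenvalue greater than 1) as an assumption. The indicator weights each factor score by its share of total variance, which is only one possible way of building a ranking. The function names `principal_component_factors` and `performance_indicator` and the simulated data are hypothetical.

```python
import numpy as np

def principal_component_factors(X, n_factors=None):
    """Hypothetical helper: principal component factors of the correlation matrix of X.
    Rows of X are observations; columns are metric variables."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable (z-scores with the sample standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # Pearson correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumed default: latent root criterion (keep eigenvalues greater than 1)
    k = int(np.sum(eigvals > 1)) if n_factors is None else n_factors
    # Loadings: correlation between each original variable and each factor
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # Standardized factor scores: mean 0, variance 1, mutually uncorrelated
    scores = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])
    return eigvals, loadings, scores

def performance_indicator(eigvals, scores):
    """Hypothetical helper: weight each factor score by its share of total variance."""
    k = scores.shape[1]
    weights = eigvals[:k] / eigvals.sum()
    indicator = scores @ weights
    ranking = np.argsort(indicator)[::-1]     # observation indices, highest indicator first
    return indicator, ranking

# Simulated data: 30 observations, two correlated variables and one unrelated variable
rng = np.random.default_rng(42)
base = rng.normal(size=30)
X = np.column_stack([base + 0.3 * rng.normal(size=30),
                     base + 0.3 * rng.normal(size=30),
                     rng.normal(size=30)])

eigvals, loadings, scores = principal_component_factors(X, n_factors=2)
indicator, ranking = performance_indicator(eigvals, scores)
print(np.round(eigvals, 3))
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # off-diagonal ~ 0: orthogonal factors
```

Because the sign of each eigenvector is arbitrary, a factor may come out reversed; the researcher may need to flip it (together with its loadings) before interpreting the indicator or the ranking.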


Each chapter follows the same presentation logic. First, we introduce the concepts of each technique, always followed by the algebraic solution of practical exercises based on datasets built primarily for educational purposes. Next, the same exercises are solved in the statistical software packages IBM SPSS Statistics Software and Stata Statistical Software. We believe that this logic facilitates the study and understanding of the correct use of each technique and the analysis of the results obtained. In addition, applying the techniques in SPSS and Stata benefits researchers, because the results can be compared at any time with those already obtained algebraically in the initial sections of each chapter, and it provides an opportunity to use these important software packages. At the end of each chapter, additional exercises are proposed; their answers, presented as outputs generated in SPSS, are available at the end of the book.

References

Fávero L.P., Belfiore P. Manual de análise de dados: estatística e modelagem multivariada com Excel®, SPSS® e Stata®. Rio de Janeiro: Elsevier; 2017.
