The study of statistical distributions is not recent. From the beginning of the 19th century until approximately the beginning of the 20th century, linear models based on the normal distribution practically dominated data modeling.
Nonetheless, from the period between the two World Wars onward, models began to emerge for situations that normal linear models could not represent satisfactorily. In this context, McCullagh and Nelder (1989), Turkman and Silva (2000), and Cordeiro and Demetrio (2007) mention Berkson’s (1944), Dyke and Patterson’s (1952), and Rasch’s (1960) work on logistic models involving the Bernoulli and binomial distributions; Birch’s (1963) work on models for count data involving the Poisson distribution; Feigl and Zelen’s (1965), Zippin and Armitage’s (1966), and Glasser’s (1967) work on exponential models; and Nelder’s (1966) work on polynomial models involving the Gamma distribution.
All of these models were consolidated, from a theoretical and conceptual perspective, by Nelder and Wedderburn’s (1972) seminal work, in which Generalized Linear Models were defined. They comprise linear regression models and nonlinear exponential models in which the dependent variable follows, for example, a normal, Bernoulli, binomial, Poisson, or Poisson-Gamma distribution. The following models are special cases of Generalized Linear Models: linear regression models and models with Box-Cox transformation; binary and multinomial logistic regression models; and Poisson and negative binomial regression models for count data.
The estimation of each one of them must respect the characteristics of the data and the distribution of the variable that represents the phenomenon we wish to study, called the dependent variable.
A Generalized Linear Model is defined as follows:

$$\eta_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where η is known as the canonical link function, α represents the constant, βj (j = 1, 2, ..., k) are the coefficients of the explanatory variables and correspond to the parameters to be estimated, Xj are the explanatory variables (metric or dummy), and the subscript i identifies each observation in the sample being analyzed (i = 1, 2, ..., n, where n is the sample size).
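As a minimal sketch of the definition above, the following Python function computes η for a single observation from a constant α and a list of coefficients βj. The function name and all numeric values are illustrative assumptions, not taken from the text.

```python
def linear_predictor(alpha, betas, x_row):
    """Return eta_i = alpha + sum_j beta_j * X_ji for one observation i."""
    if len(betas) != len(x_row):
        raise ValueError("one coefficient is needed per explanatory variable")
    return alpha + sum(b * x for b, x in zip(betas, x_row))


# Hypothetical values: k = 2 explanatory variables (one metric, one dummy).
alpha = 1.5
betas = [0.5, -2.0]
sample = [
    [10.0, 0],  # observation i = 1
    [15.0, 1],  # observation i = 2
]

etas = [linear_predictor(alpha, betas, row) for row in sample]
print(etas)
```

What η stands for (the expected value itself, a log-odds, or the log of an expected count) depends on the model chosen for the dependent variable.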
Box VI.1 relates each specific case of the generalized linear models to the characteristic of the dependent variable, its distribution and the respective canonical link function.
Therefore, for a given dependent variable Y that represents the phenomenon being studied (outcome variable), we can specify each one of the models presented in Box VI.1 in the following way:
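Linear regression model:

$$\hat{Y}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where Ŷ is the expected value of the dependent variable Y.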
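Regression model with Box-Cox transformation:

$$\frac{\hat{Y}_i^{\lambda} - 1}{\lambda} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where Ŷ is the expected value of the dependent variable Y and λ is the Box-Cox transformation parameter that maximizes the adherence to normality of the distribution of the new variable generated from the original variable Y.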
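Binary logistic regression model:

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where p is the probability of the event of interest occurring, defined by Y = 1, given that the dependent variable Y is a dummy.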
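Multinomial logistic regression model, written here with category 0 as the reference (m = 1, ..., M - 1):

$$\ln\left(\frac{p_{mi}}{p_{0i}}\right) = \alpha_m + \beta_{1m} X_{1i} + \beta_{2m} X_{2i} + \dots + \beta_{km} X_{ki}$$

where pm (m = 0, 1, ..., M - 1) is the probability of occurrence of each one of the M categories of the dependent variable Y.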
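Poisson regression model for count data:

$$\ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where λ is the expected number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson distribution.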
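Negative binomial regression model for count data:

$$\ln(u_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

where u is the expected number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson-Gamma distribution.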
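Each specification above ties a transformation of the expected value of Y to the linear predictor η. The following Python functions are a hedged sketch of how the expected value or probability could be recovered from η. They assume the usual link functions for these models: identity, Box-Cox, logit, baseline-category logit with category 0 as the reference, and log. All function names are illustrative.

```python
import math


def inverse_identity(eta):
    """Linear regression: the expected value of Y equals eta."""
    return eta


def inverse_box_cox(eta, lam):
    """Box-Cox: solve (Y**lam - 1) / lam = eta for Y.

    lam = 0 is treated as the logarithmic limit of the transformation.
    Assumes lam * eta + 1 > 0, so that the result is a real number.
    """
    if lam == 0:
        return math.exp(eta)
    return (lam * eta + 1.0) ** (1.0 / lam)


def inverse_logit(eta):
    """Binary logistic: probability p of the event of interest (Y = 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def multinomial_probabilities(etas):
    """Multinomial logistic, assuming category 0 is the reference.

    `etas` holds the linear predictors of categories 1..M-1; the result
    lists the probabilities p_0, p_1, ..., p_{M-1}.
    """
    denom = 1.0 + sum(math.exp(e) for e in etas)
    return [1.0 / denom] + [math.exp(e) / denom for e in etas]


def inverse_log(eta):
    """Poisson and negative binomial: expected count exp(eta)."""
    return math.exp(eta)


print(inverse_logit(0.0))                     # log-odds of 0 gives p = 0.5
print(inverse_box_cox(2.0, 0.5))              # (0.5 * 2 + 1) ** 2 = 4.0
print(multinomial_probabilities([0.0, 0.0]))  # three equally likely categories
```

The Poisson and negative binomial models share the same inverse function here because both are written with a logarithm on the left-hand side; they differ in the distribution assumed for the counts, not in how η maps to the expected value.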
Thus, Part VI discusses the Generalized Linear Models. Chapter 13 discusses linear regression models and models with Box-Cox transformation, while Chapters 14 and 15 discuss, respectively, binary and multinomial logistic regression models and Poisson and negative binomial regression models for count data. The models in Chapters 14 and 15 are nonlinear exponential models, also called log-linear or semilogarithmic (to the left) models. Fig. VI.1 represents this logic.
The chapters in Part VI follow the same presentation logic. First, the concepts behind each model and the criteria for estimating its parameters are presented, always using datasets that allow us to solve practical exercises in Excel. The same exercises are then solved, step by step, in Stata and in SPSS. At the end of each chapter, additional exercises are proposed, and their answers are available at the end of the book.