Introduction

The study of statistical distributions is not recent: from the beginning of the 19th century until approximately the beginning of the 20th century, linear models based on the normal distribution practically dominated data modeling.

Nonetheless, starting in the period between the two world wars, models began to appear for situations that normal linear models could not represent satisfactorily. McCullagh and Nelder (1989), Turkman and Silva (2000), and Cordeiro and Demétrio (2007) mention, in this context, Berkson's (1944), Dyke and Patterson's (1952), and Rasch's (1960) work on logistic models involving the Bernoulli and binomial distributions; Birch's (1963) work on models for count data involving the Poisson distribution; Feigl and Zelen's (1965), Zippin and Armitage's (1966), and Glasser's (1967) work on exponential models; and Nelder's (1966) work on inverse polynomial models involving the Gamma distribution.

All of these models were consolidated, from a theoretical and conceptual standpoint, in Nelder and Wedderburn's (1972) seminal work, in which Generalized Linear Models were defined. They comprise a family of linear regression models and nonlinear exponential models in which the dependent variable follows, for example, a normal, Bernoulli, binomial, Poisson, or Poisson-Gamma distribution. The following models are special cases of Generalized Linear Models:

  •  Linear Regression Models and Models with Box-Cox Transformation;
  •  Binary and Multinomial Logistic Regression Models;
  •  Poisson and Negative Binomial Regression Models for Count Data;

and the estimation of each of them must respect the characteristics of the data and the distribution of the variable that represents the phenomenon we wish to study, known as the dependent variable.

A Generalized Linear Model is defined as follows:

\eta_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.1)

where η is known as the canonical link function, α represents the constant, βj (j = 1, 2, ..., k) are the coefficients of each explanatory variable and correspond to the parameters to be estimated, Xj are the explanatory variables (metric or dummies), and the subscript i denotes each observation in the sample under analysis (i = 1, 2, ..., n, where n is the sample size).
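
As a purely illustrative complement (the practical work in this book is done in Excel, Stata, and SPSS), the sketch below shows how this general specification might be written in Python with the statsmodels package, in which choosing the distribution family fixes the canonical link used by default; the variable names and the simulated data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sample: n = 200 observations and k = 2 explanatory variables
rng = np.random.default_rng(42)
X1 = rng.normal(size=200)                 # metric explanatory variable
X2 = rng.integers(0, 2, size=200)         # dummy explanatory variable
eta = 0.3 + 0.9 * X1 - 0.5 * X2           # linear predictor: alpha + beta1*X1 + beta2*X2
y = rng.normal(loc=eta, scale=1.0)        # here Y is normal, so the canonical link is the identity

# The design matrix gathers the constant (alpha) and the explanatory variables
X_design = sm.add_constant(np.column_stack([X1, X2]))

# Choosing the family defines the canonical link by default (identity for Gaussian)
results = sm.GLM(y, X_design, family=sm.families.Gaussian()).fit()
print(results.params)                     # estimates of alpha, beta1, and beta2
```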

Box VI.1 relates each specific case of the generalized linear models to the characteristic of the dependent variable, its distribution and the respective canonical link function.

Box VI.1

Generalized Linear Models, Characteristics of the Dependent Variable, and Canonical Link Functions

Regression Model | Characteristic of the Dependent Variable | Distribution | Canonical Link Function (η)
Linear | Quantitative | Normal | \hat{Y}
With Box-Cox Transformation | Quantitative | Normal after the Transformation | (\hat{Y}^{\lambda} - 1)/\lambda
Binary Logistic | Qualitative with 2 Categories (Dummy) | Bernoulli | \ln[p/(1 - p)]
Multinomial Logistic | Qualitative with M (M > 2) Categories | Binomial | \ln[p_m/(1 - p_m)]
Poisson | Quantitative with Integer and Non-Negative Values (Count Data) | Poisson | \ln(\lambda)
Negative Binomial | Quantitative with Integer and Non-Negative Values (Count Data) | Poisson-Gamma | \ln(u)

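As a rough correspondence between Box VI.1 and the statsmodels families used in the sketches further below (an assumption-laden shortcut: the Box-Cox model is simply a linear regression estimated on the transformed variable, and the multinomial logistic model is fitted with MNLogit rather than through a GLM family):

```python
import statsmodels.api as sm

# Families whose default links match the link functions listed in Box VI.1
glm_families = {
    "Linear":            sm.families.Gaussian(),          # identity link, Y-hat
    "Binary Logistic":   sm.families.Binomial(),          # logit link, ln[p/(1 - p)]
    "Poisson":           sm.families.Poisson(),           # log link, ln(lambda)
    "Negative Binomial": sm.families.NegativeBinomial(),  # log link, ln(u)
}
```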

Therefore, for a given dependent variable Y that represents the phenomenon being studied (outcome variable), we can specify each one of the models presented in Box VI.1 in the following way:

  • Linear Regression Model

\hat{Y}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.2)

where Ŷ is the expected value of the dependent variable Y.
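
A minimal sketch of Eq. (VI.2) in Python, assuming the statsmodels package and entirely fictitious data (ordinary least squares here coincides with a GLM using the Gaussian family and identity link):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                    # X1 and X2
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=100)

results = sm.OLS(y, sm.add_constant(X)).fit()    # estimates alpha, beta1, beta2
y_hat = results.predict(sm.add_constant(X))      # Y-hat: expected values of Y
```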

  • Regression Model with Box-Cox Transformation

\frac{\hat{Y}_i^{\lambda} - 1}{\lambda} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.3)

where Ŷ is the expected value of the dependent variable Y and λ is the Box-Cox transformation parameter that maximizes the adherence of the distribution of the new variable, generated from the original variable Y, to normality.
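
A sketch of the transformation in Eq. (VI.3), assuming scipy and statsmodels and fictitious, strictly positive data; scipy.stats.boxcox returns the transformed series together with the λ that maximizes the log-likelihood under normality:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = np.exp(1.0 + 0.8 * X[:, 0] + rng.normal(scale=0.3, size=100))  # positive, skewed Y

# Estimate lambda and build the transformed variable (Y^lambda - 1) / lambda
y_bc, lam = stats.boxcox(y)

# Then estimate an ordinary linear regression on the transformed variable
results = sm.OLS(y_bc, sm.add_constant(X)).fit()
```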

  • Binary Logistic Regression Model

\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.4)

where p is the probability of the event of interest occurring, defined by Y = 1, given that the dependent variable Y is a dummy.
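
A sketch of Eq. (VI.4), assuming statsmodels and fictitious data; Logit estimates the parameters by maximum likelihood, and predict() returns the estimated probabilities of the event Y = 1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
eta = -0.5 + 1.2 * X[:, 0] - 0.9 * X[:, 1]       # ln[p / (1 - p)]
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p)                           # dummy dependent variable (0/1)

results = sm.Logit(y, sm.add_constant(X)).fit()
p_hat = results.predict(sm.add_constant(X))      # estimated probabilities of Y = 1
```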

  • Multinomial Logistic Regression Model

\ln\left(\frac{p_{im}}{1 - p_{im}}\right) = \alpha_m + \beta_{1m} X_{1i} + \beta_{2m} X_{2i} + \cdots + \beta_{km} X_{ki}    (VI.5)

where pm (m = 0, 1, ..., M – 1) is the probability of occurrence of each one of the M categories of the dependent variable Y.
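
A sketch of Eq. (VI.5), assuming statsmodels and fictitious data with M = 3 categories; MNLogit takes category 0 as the reference and estimates one set of parameters (α_m, β_jm) for each of the remaining categories:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))

# Hypothetical dependent variable with categories 0, 1, and 2 (0 is the reference)
logits = np.column_stack([np.zeros(500),
                          0.5 + 1.0 * X[:, 0],
                          -0.3 - 0.8 * X[:, 1]])
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])

results = sm.MNLogit(y, sm.add_constant(X)).fit()
p_hat = results.predict(sm.add_constant(X))      # one estimated probability per category
```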

  • Poisson Regression Model for Count Data

\ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.6)

where λ is the expected value of the number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson distribution.
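
A sketch of Eq. (VI.6), assuming statsmodels and fictitious count data; the Poisson family uses the log link by default, so the fitted values are the estimated expected counts λ:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
lam = np.exp(0.2 + 0.6 * X[:, 0] - 0.4 * X[:, 1])   # ln(lambda) = linear predictor
y = rng.poisson(lam)                                # count data

results = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
lam_hat = results.fittedvalues                      # estimated expected counts
```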

  • Negative Binomial Regression Model for Count Data

\ln(u_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.7)

where u is the expected value of the number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson-Gamma distribution.
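
A sketch of Eq. (VI.7), assuming statsmodels and fictitious overdispersed count data; sm.NegativeBinomial estimates the coefficients together with a dispersion parameter (which statsmodels also names alpha, not to be confused with the constant α of the model):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 2))
u = np.exp(0.1 + 0.5 * X[:, 0] - 0.3 * X[:, 1])     # ln(u) = linear predictor

# Poisson-Gamma mixture: Gamma-distributed rates with mean u, then Poisson draws
rates = rng.gamma(shape=2.0, scale=u / 2.0)
y = rng.poisson(rates)

results = sm.NegativeBinomial(y, sm.add_constant(X)).fit()
print(results.params)                               # constant, betas, and the dispersion parameter
```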

Thus, Part VI discusses the Generalized Linear Models. Chapter 13 covers the linear regression models and the models with Box-Cox transformation, while Chapters 14 and 15 cover, respectively, the binary and multinomial logistic regression models and the Poisson and negative binomial regression models for count data, which are nonlinear exponential models, also called log-linear or semilogarithmic (to the left) models. Fig. VI.1 represents this logic.

Fig. VI.1 Generalized Linear Models and Structure of the Chapters in Part VI.

The chapters in Part VI follow the same presentation logic: first, the concepts behind each model and the criteria for estimating its parameters are presented, always using datasets that allow practical exercises to be solved in Excel. The same exercises are then solved, step by step, in Stata and in SPSS. At the end of each chapter, additional exercises are proposed, whose answers are available at the end of the book.

References

Berkson J. Application of the logistic function to bioassay. J. Am. Stat. Assoc. 1944;39(227):357–365.

Birch M.W. Maximum likelihood in three-way contingency tables. J. Roy. Stat. Soc. Ser. B. 1963;25(1):220–233.

Cordeiro G.M., Demétrio C.G.B. Modelos lineares generalizados. Santa Maria: SEAGRO e Rbras; 2007.

Dyke G.V., Patterson H.D. Analysis of factorial arrangements when the data are proportions. Biometrics. 1952;8(1):1–12.

Feigl P., Zelen M. Estimation of exponential survival probabilities with concomitant information. Biometrics. 1965;21(4):826–838.

Glasser M. Exponential survival with covariance. J. Am. Stat. Assoc. 1967;62(318):561–568.

McCullagh P., Nelder J.A. Generalized Linear Models. 2nd ed. London: Chapman & Hall; 1989.

Nelder J.A. Inverse polynomials, a useful group of multi-factor response functions. Biometrics. 1966;22(1):128–141.

Nelder J.A., Wedderburn R.W.M. Generalized linear models. J. Roy. Stat. Soc. Ser. A. 1972;135(3):370–384.

Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danmarks Pædagogiske Institut; 1960.

Turkman M.A.A., Silva G.L. Modelos lineares generalizados: da teoria à prática. Lisboa: Edições SPE; 2000.

Zippin C., Armitage P. Use of concomitant variables and incomplete survival information in the estimation of an exponential survival parameter. Biometrics. 1966;22(4):665–672.
