Introduction

The study of statistical distributions is not recent: from the beginning of the 19th century until approximately the beginning of the 20th century, linear models based on the normal distribution practically dominated data modeling.

Nonetheless, starting in the period between the two world wars, models began to appear for situations that normal linear models could not represent satisfactorily. McCullagh and Nelder (1989), Turkman and Silva (2000), and Cordeiro and Demétrio (2007) mention, in this context, Berkson's (1944), Dyke and Patterson's (1952), and Rasch's (1960) work on logistic models involving the Bernoulli and binomial distributions; Birch's (1963) work on models for count data involving the Poisson distribution; Feigl and Zelen's (1965), Zippin and Armitage's (1966), and Glasser's (1967) work on exponential models; and Nelder's (1966) work on inverse polynomial models involving the Gamma distribution.

All of these models were consolidated, from a theoretical and conceptual standpoint, in Nelder and Wedderburn's (1972) seminal work, in which Generalized Linear Models were defined. They comprise a family of linear regression models and nonlinear exponential models in which the dependent variable follows, for example, a normal, Bernoulli, binomial, Poisson, or Poisson-Gamma distribution. The following models are special cases of Generalized Linear Models:

  •  Linear Regression Models and Models with Box-Cox Transformation;
  •  Binary and Multinomial Logistic Regression Models;
  •  Poisson and Negative Binomial Regression Models for Count Data;

and the estimation of each of them must respect the characteristics of the data and the distribution of the variable that represents the phenomenon we wish to study, known as the dependent variable.

A Generalized Linear Model is defined as follows:

\eta_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.1)

where η is known as the canonical link function, α represents the constant, βj (j = 1, 2, ..., k) are the coefficients of each explanatory variable and correspond to the parameters to be estimated, Xj are the explanatory variables (metric or dummies), and the subscript i denotes each observation in the sample under analysis (i = 1, 2, ..., n, where n is the sample size).
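
As a purely illustrative complement (the practical work in this book is done in Excel, Stata, and SPSS), the sketch below shows how this general specification might be written in Python with the statsmodels package, in which choosing the distribution family fixes the canonical link used by default; the variable names and the simulated data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sample: n = 200 observations and k = 2 explanatory variables
rng = np.random.default_rng(42)
X1 = rng.normal(size=200)                 # metric explanatory variable
X2 = rng.integers(0, 2, size=200)         # dummy explanatory variable
eta = 0.3 + 0.9 * X1 - 0.5 * X2           # linear predictor: alpha + beta1*X1 + beta2*X2
y = rng.normal(loc=eta, scale=1.0)        # here Y is normal, so the canonical link is the identity

# The design matrix gathers the constant (alpha) and the explanatory variables
X_design = sm.add_constant(np.column_stack([X1, X2]))

# Choosing the family defines the canonical link by default (identity for Gaussian)
results = sm.GLM(y, X_design, family=sm.families.Gaussian()).fit()
print(results.params)                     # estimates of alpha, beta1, and beta2
```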

Box VI.1 relates each specific case of the generalized linear models to the characteristic of the dependent variable, its distribution and the respective canonical link function.

Box VI.1

Generalized Linear Models, Characteristics of the Dependent Variable, and Canonical Link Functions

Regression Model | Characteristic of the Dependent Variable | Distribution | Canonical Link Function (η)
Linear | Quantitative | Normal | \hat{Y}
With Box-Cox Transformation | Quantitative | Normal after the Transformation | (\hat{Y}^{\lambda} - 1)/\lambda
Binary Logistic | Qualitative with 2 Categories (Dummy) | Bernoulli | \ln[p/(1 - p)]
Multinomial Logistic | Qualitative with M (M > 2) Categories | Binomial | \ln[p_m/(1 - p_m)]
Poisson | Quantitative with Integer and Non-Negative Values (Count Data) | Poisson | \ln(\lambda)
Negative Binomial | Quantitative with Integer and Non-Negative Values (Count Data) | Poisson-Gamma | \ln(u)

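As a rough correspondence between Box VI.1 and the statsmodels families used in the sketches further below (an assumption-laden shortcut: the Box-Cox model is simply a linear regression estimated on the transformed variable, and the multinomial logistic model is fitted with MNLogit rather than through a GLM family):

```python
import statsmodels.api as sm

# Families whose default links match the link functions listed in Box VI.1
glm_families = {
    "Linear":            sm.families.Gaussian(),          # identity link, Y-hat
    "Binary Logistic":   sm.families.Binomial(),          # logit link, ln[p/(1 - p)]
    "Poisson":           sm.families.Poisson(),           # log link, ln(lambda)
    "Negative Binomial": sm.families.NegativeBinomial(),  # log link, ln(u)
}
```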

Therefore, for a given dependent variable Y that represents the phenomenon being studied (outcome variable), we can specify each one of the models presented in Box VI.1 in the following way:

  • Linear Regression Model

\hat{Y}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.2)

where Ŷ is the expected value of the dependent variable Y.
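
A minimal sketch of Eq. (VI.2) in Python, assuming the statsmodels package and entirely fictitious data (ordinary least squares here coincides with a GLM using the Gaussian family and identity link):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                    # X1 and X2
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=100)

results = sm.OLS(y, sm.add_constant(X)).fit()    # estimates alpha, beta1, beta2
y_hat = results.predict(sm.add_constant(X))      # Y-hat: expected values of Y
```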

  • Regression Model with Box-Cox Transformation

\frac{\hat{Y}_i^{\lambda} - 1}{\lambda} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.3)

where Ŷ is the expected value of the dependent variable Y and λ is the Box-Cox transformation parameter that maximizes the adherence of the distribution of the new variable, generated from the original variable Y, to normality.
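
A sketch of the transformation in Eq. (VI.3), assuming scipy and statsmodels and fictitious, strictly positive data; scipy.stats.boxcox returns the transformed series together with the λ that maximizes the log-likelihood under normality:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = np.exp(1.0 + 0.8 * X[:, 0] + rng.normal(scale=0.3, size=100))  # positive, skewed Y

# Estimate lambda and build the transformed variable (Y^lambda - 1) / lambda
y_bc, lam = stats.boxcox(y)

# Then estimate an ordinary linear regression on the transformed variable
results = sm.OLS(y_bc, sm.add_constant(X)).fit()
```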

  • Binary Logistic Regression Model

\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.4)

where p is the probability of the event of interest occurring, defined by Y = 1, given that the dependent variable Y is a dummy.
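
A sketch of Eq. (VI.4), assuming statsmodels and fictitious data; Logit estimates the parameters by maximum likelihood, and predict() returns the estimated probabilities of the event Y = 1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
eta = -0.5 + 1.2 * X[:, 0] - 0.9 * X[:, 1]       # ln[p / (1 - p)]
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p)                           # dummy dependent variable (0/1)

results = sm.Logit(y, sm.add_constant(X)).fit()
p_hat = results.predict(sm.add_constant(X))      # estimated probabilities of Y = 1
```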

  • Multinomial Logistic Regression Model

\ln\left(\frac{p_{im}}{1 - p_{im}}\right) = \alpha_m + \beta_{1m} X_{1i} + \beta_{2m} X_{2i} + \cdots + \beta_{km} X_{ki}    (VI.5)

where pm (m = 0, 1, ..., M – 1) is the probability of occurrence of each one of the M categories of the dependent variable Y.
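
A sketch of Eq. (VI.5), assuming statsmodels and fictitious data with M = 3 categories; MNLogit takes category 0 as the reference and estimates one set of parameters (α_m, β_jm) for each of the remaining categories:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))

# Hypothetical dependent variable with categories 0, 1, and 2 (0 is the reference)
logits = np.column_stack([np.zeros(500),
                          0.5 + 1.0 * X[:, 0],
                          -0.3 - 0.8 * X[:, 1]])
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])

results = sm.MNLogit(y, sm.add_constant(X)).fit()
p_hat = results.predict(sm.add_constant(X))      # one estimated probability per category
```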

  • Poisson Regression Model for Count Data

\ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.6)

where λ is the expected value of the number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson distribution.
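
A sketch of Eq. (VI.6), assuming statsmodels and fictitious count data; the Poisson family uses the log link by default, so the fitted values are the estimated expected counts λ:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
lam = np.exp(0.2 + 0.6 * X[:, 0] - 0.4 * X[:, 1])   # ln(lambda) = linear predictor
y = rng.poisson(lam)                                # count data

results = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
lam_hat = results.fittedvalues                      # estimated expected counts
```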

  • Negative Binomial Regression Model for Count Data

\ln(u_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (VI.7)

where u is the expected value of the number of occurrences of the phenomenon represented by the dependent variable Y, which presents count data with a Poisson-Gamma distribution.
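
A sketch of Eq. (VI.7), assuming statsmodels and fictitious overdispersed count data; sm.NegativeBinomial estimates the coefficients together with a dispersion parameter (which statsmodels also names alpha, not to be confused with the constant α of the model):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 2))
u = np.exp(0.1 + 0.5 * X[:, 0] - 0.3 * X[:, 1])     # ln(u) = linear predictor

# Poisson-Gamma mixture: Gamma-distributed rates with mean u, then Poisson draws
rates = rng.gamma(shape=2.0, scale=u / 2.0)
y = rng.poisson(rates)

results = sm.NegativeBinomial(y, sm.add_constant(X)).fit()
print(results.params)                               # constant, betas, and the dispersion parameter
```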

Thus, Part VI discusses the Generalized Linear Models. Chapter 13 covers the linear regression models and the models with Box-Cox transformation, while Chapters 14 and 15 cover, respectively, the binary and multinomial logistic regression models and the Poisson and negative binomial regression models for count data, which are nonlinear exponential models, also called log-linear or semilogarithmic (to the left) models. Fig. VI.1 represents this logic.

Fig. VI.1 Generalized Linear Models and Structure of the Chapters in Part VI.

The chapters in Part VI follow the same presentation logic: first, the concepts behind each model and the criteria for estimating its parameters are presented, always using datasets that allow practical exercises to be solved in Excel. The same exercises are then solved, step by step, in Stata and in SPSS. At the end of each chapter, additional exercises are proposed, whose answers are available at the end of the book.

References

Berkson J. Application of the logistic function to bioassay. J. Am. Stat. Assoc. 1944;39(227):357–365.

Birch M.W. Maximum likelihood in three-way contingency tables. J. Roy. Stat. Soc. Ser. B. 1963;25(1):220–233.

Cordeiro G.M., Demétrio C.G.B. Modelos lineares generalizados. Santa Maria: SEAGRO e Rbras; 2007.

Dyke G.V., Patterson H.D. Analysis of factorial arrangements when the data are proportions. Biometrics. 1952;8(1):1–12.

Feigl P., Zelen M. Estimation of exponential survival probabilities with concomitant information. Biometrics. 1965;21(4):826–838.

Glasser M. Exponential survival with covariance. J. Am. Stat. Assoc. 1967;62(318):561–568.

McCullagh P., Nelder J.A. Generalized Linear Models. 2nd ed. London: Chapman & Hall; 1989.

Nelder J.A. Inverse polynomials, a useful group of multi-factor response functions. Biometrics. 1966;22(1):128–141.

Nelder J.A., Wedderburn R.W.M. Generalized linear models. J. Roy. Stat. Soc. Ser. A. 1972;135(3):370–384.

Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danmarks Pædagogiske Institut; 1960.

Turkman M.A.A., Silva G.L. Modelos lineares generalizados: da teoria à prática. Lisboa: Edições SPE; 2000.

Zippin C., Armitage P. Use of concomitant variables and incomplete survival information in the estimation of an exponential survival parameter. Biometrics. 1966;22(4):665–672.
