4.1. Introduction

Many response variables are counts of something: number of articles published by scientists, number of sex partners in the last year, number of arrests in a one-year period, number of students enrolled in a class, and so on. Some data analysts still treat count variables as continuous measures and apply ordinary linear regression. But that practice ignores two facts: the data are really discrete, and the distributions of count variables are typically highly skewed. For these reasons, it may be inappropriate to use models that assume normally distributed errors.

It's increasingly common to estimate Poisson regression models or negative binomial regression models instead, both of which are explicitly designed for count data. In this chapter we'll see how to extend these count data methods to handle multiple observations per individual, with the inclusion of fixed effects to control for all stable predictor variables. Along the way, we'll revisit many of the issues that arose for dichotomous outcomes in chapter 3, although the problems encountered there turn out to be less serious for count data models.

Let's begin by describing the example that will carry us through the chapter. The data cover 346 manufacturing firms, with yearly counts of patents received in each year from 1975 through 1979. These data were previously analyzed by Hall, Griliches, and Hausman (1986) and later by Cameron and Trivedi (1998). There is one record per firm, with variables PAT75 through PAT79 containing the patent counts for the five years. As predictors we have the logarithm of research and development expenditures for each year from 1970 through 1979 (LOGR70 through LOGR79). There are also two time-invariant predictors: LOGSIZE, which is the natural logarithm of the book value of the firm in 1972, and SCIENCE, an indicator variable equal to 1 if the firm is in the science sector and 0 otherwise.
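To make the record layout concrete, here is a minimal Python sketch of how one-record-per-firm data of this kind could be rearranged into one record per firm-year, the layout that models with multiple observations per individual usually need. Everything beyond the variable names given above is an assumption made for illustration: the ID variable, the output names YEAR, PATENT, and LOGR, and all of the numeric values, which are invented and not taken from the actual patent data. The sketch carries only the same-year R&D value into each firm-year row; the 1970 through 1974 values stay behind in the wide record.

```python
# Hypothetical wide-format records, one per firm. All values are made up
# for illustration; ID is an assumed firm identifier.
pat_years = [75, 76, 77, 78, 79]
rd_years = [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]

def make_firm(firm_id, logsize, science, patents, logrd):
    """Build one wide record with PAT75-PAT79 and LOGR70-LOGR79."""
    rec = {"ID": firm_id, "LOGSIZE": logsize, "SCIENCE": science}
    for year, count in zip(pat_years, patents):
        rec["PAT%d" % year] = count
    for year, value in zip(rd_years, logrd):
        rec["LOGR%d" % year] = value
    return rec

wide = [
    make_firm(1, 5.2, 1, [3, 0, 2, 5, 1],
              [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    make_firm(2, 3.8, 0, [0, 0, 1, 0, 0],
              [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]),
]

def wide_to_long(records, years):
    """Return one row per firm-year from one-row-per-firm records."""
    long_rows = []
    for rec in records:
        for year in years:
            long_rows.append({
                "ID": rec["ID"],
                "YEAR": year,
                "PATENT": rec["PAT%d" % year],   # count for this year
                "LOGR": rec["LOGR%d" % year],    # same-year log R&D
                "LOGSIZE": rec["LOGSIZE"],       # time-invariant
                "SCIENCE": rec["SCIENCE"],       # time-invariant
            })
    return long_rows

long_rows = wide_to_long(wide, pat_years)
for row in long_rows[:3]:
    print(row)
```

With 346 firms and five years, the same rearrangement would produce 346 × 5 = 1,730 firm-year rows, with LOGSIZE and SCIENCE repeated unchanged across the five rows for each firm.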

