4.5. Comparison with Random Effects Models and GEE Estimation

As we saw in chapters 2 and 3, random effects models and GEE estimation are widely used alternatives to fixed effects methods for longitudinal data. Both methods can be applied to count data and are readily available in SAS. The principal attractions of these alternative methods are (1) the ability to estimate effects for time-invariant covariates, and (2) more efficient use of the data (if the assumptions are met). The major disadvantage is that neither method controls for unmeasured time-invariant covariates. I'll briefly describe these methods in this section, both to serve as a point of comparison with the fixed effects methods and because they will be needed for the hybrid method discussed in the next section.

As we've seen before, GEE is a form of iterated generalized least squares that allows for correlations among the repeated observations for each individual. GEE is easily invoked with the REPEATED statement in PROC GENMOD, and can be used with either a negative binomial model or a Poisson model. Here's the SAS code for GEE estimation of a negative binomial model for the patent data, with separate records for each firm-year:

PROC GENMOD DATA=patents2;
   CLASS id t;
   MODEL patent= rd_0-rd_5 t / D=NB;
   REPEATED SUBJECT=id / TYPE=MDEP(4) CORRW;
RUN;

The TYPE=MDEP(4) option specifies that the correlation matrix for patent counts among the five years of observation has a "banded" structure. There is one correlation for counts that are one year apart, another correlation for counts that are two years apart, and so on. The correlation for counts more than four years apart is set to 0 (hence the 4 in MDEP(4)), but four years is the maximum distance for these data anyway. This imposed structure can be seen in the estimated "Working Correlation Matrix," requested with the CORRW option and shown in Output 4.13. I also tried other correlation structures, but the TYPE=UN (unstructured) specification could not be fitted with these data. The TYPE=EXCH (exchangeable) specification constrains all the inter-year correlations to be identical. Although it yielded similar results, it seems unnecessarily restrictive.
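For comparison, the exchangeable specification mentioned above could presumably be requested by changing only the TYPE= option. This is a minimal sketch that assumes the same data set and variables as the program above; its output is not reproduced in this section.

```sas
/* Sketch: same GEE negative binomial model, exchangeable working correlation */
PROC GENMOD DATA=patents2;
   CLASS id t;
   MODEL patent= rd_0-rd_5 t / D=NB;
   /* TYPE=EXCH forces a single common correlation for all pairs of years */
   REPEATED SUBJECT=id / TYPE=EXCH CORRW;
RUN;
```

Output 4.13 comes from the MDEP(4) specification, not from this variant.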

Table 4.13. Output 4.13 GEE Estimates for a Negative Binomial Model

Working Correlation Matrix

         Col1     Col2     Col3     Col4     Col5
Row1   1.0000   0.7567   0.7349   0.6655   0.6909
Row2   0.7567   1.0000   0.7567   0.7349   0.6655
Row3   0.7349   0.7567   1.0000   0.7567   0.7349
Row4   0.6655   0.7349   0.7567   1.0000   0.7567
Row5   0.6909   0.6655   0.7349   0.7567   1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                          Standard      95% Confidence
Parameter     Estimate       Error              Limits        Z   Pr>|Z|
Intercept       1.0839      0.0884    0.9106    1.2572    12.26   <.0001
rd_0            0.4969      0.1131    0.2752    0.7186     4.39   <.0001
rd_1           −0.0451      0.1162   −0.2728    0.1826    −0.39   0.6977
rd_2            0.1613      0.0855   −0.0063    0.3289     1.89   0.0593
rd_3            0.0729      0.0944   −0.1121    0.2579     0.77   0.4401
rd_4            0.1380      0.0735   −0.0061    0.2821     1.88   0.0605
rd_5            0.0247      0.0544   −0.0818    0.1313     0.45   0.6492
t 1             0.2326      0.0497    0.1351    0.3301     4.68   <.0001
t 2             0.1825      0.0465    0.0914    0.2736     3.93   <.0001
t 3             0.1855      0.0383    0.1104    0.2606     4.84   <.0001
t 4             0.1169      0.0403    0.0380    0.1958     2.90   0.0037
t 5             0.0000      0.0000    0.0000    0.0000        .        .

Parameter estimates in Output 4.13 are roughly similar to those in Output 4.12 for the fixed effects negative binomial model. But unlike the fixed effects method, two of the lagged R & D measures have GEE coefficients that approach statistical significance. Interestingly, the standard errors for the GEE estimates are generally larger than those for the fixed effects method, which is the opposite of what would ordinarily be expected.

Random effects models can be fitted with PROC NLMIXED for either the Poisson or negative binomial distributions. Let's first consider a Poisson model. As before, we begin by assuming that y_it has a Poisson distribution with expected value λ_it. As with the fixed effects model, we then assume that log λ_it = μ_t + βx_it + α_i, where μ_t is an intercept that varies by year and x_it is the vector of predictors. Now, however, instead of treating α_i as a set of fixed constants, we assume that it is a random variable, normally distributed with a mean of 0 and a variance σ². We also assume that α_i is independent of all measured variables in the model, and that the y_it terms are independent of each other, conditional on α_i. Under these assumptions, NLMIXED produces maximum likelihood estimates of all parameters. Here's the code for the patent data:

PROC NLMIXED DATA=patents2;
   /* Expected count: log-linear in the predictors plus the random firm effect */
   lambda=EXP(int+brd0*rd_0+brd1*rd_1+brd2*rd_2+brd3*rd_3+brd4*rd_4+brd5*rd_5
      +d1*(t EQ 1)+d2*(t EQ 2)+d3*(t EQ 3)+d4*(t EQ 4)+alpha);
   MODEL patent~POISSON(lambda);
   RANDOM alpha~NORMAL(0,s2) SUBJECT=id;
   /* Starting values */
   PARMS int=1 brd0=0 brd1=0 brd2=0 brd3=0 brd4=0 brd5=0
         d1=0 d2=0 d3=0 d4=0 s2=1;
RUN;

The statement that begins with LAMBDA defines the expected patent count as a function of the explanatory variables. Note the inclusion of ALPHA, which is the random, firm-level effect. The MODEL statement says that patent counts have a Poisson distribution with parameter LAMBDA. The RANDOM statement declares that ALPHA has a normal distribution with a mean of 0 and variance of S2. This variance is assumed to be constant across firms and across time. Alternatively, it could be written as a function of other variables simply by including another assignment equation similar to the one for LAMBDA.
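As a rough illustration of that last point, the variance of the random effect might be modeled as follows. This is only a sketch under stated assumptions: z is a hypothetical firm-level covariate (constant within each firm) that is not part of the programs in this section, g0 and g1 are hypothetical parameter names, and the exponential form is simply one way to keep the variance positive.

```sas
/* Sketch: random effects Poisson model with a non-constant variance for alpha */
PROC NLMIXED DATA=patents2;
   lambda=EXP(int+brd0*rd_0+brd1*rd_1+brd2*rd_2+brd3*rd_3+brd4*rd_4+brd5*rd_5
      +d1*(t EQ 1)+d2*(t EQ 2)+d3*(t EQ 3)+d4*(t EQ 4)+alpha);
   /* Hypothetical: z is an assumed firm-level variable, g0 and g1 are new parameters */
   s2=EXP(g0+g1*z);
   MODEL patent~POISSON(lambda);
   RANDOM alpha~NORMAL(0,s2) SUBJECT=id;
   PARMS int=1 brd0=0 brd1=0 brd2=0 brd3=0 brd4=0 brd5=0
         d1=0 d2=0 d3=0 d4=0 g0=0 g1=0;
RUN;
```

With g1=0 this reduces to the constant-variance model, where s2 equals EXP(g0). This variant was not fitted here; the results that follow are for the constant-variance model.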

The random effects Poisson model took about 19 seconds to estimate on my computer, as compared with about a quarter second for the GEE model with PROC GENMOD. Results are shown in Output 4.14. The coefficients are roughly similar to those we just saw with GEE estimation, but the standard errors are quite a bit smaller. This is probably because the GEE estimates presumed a negative binomial distribution, whereas the random effects model presumes a Poisson distribution, which allows for less overdispersion.

Table 4.14. Output 4.14 NLMIXED Output for a Random Effects Poisson Model

Fit Statistics

−2 Log Likelihood            10410
AIC (smaller is better)      10434
AICC (smaller is better)     10435
BIC (smaller is better)      10480

Parameter Estimates

                       Standard
Parameter   Estimate      Error   DF  t Value  Pr>|t|  Alpha      Lower      Upper   Gradient
int           0.8460    0.06729  323    12.57  <.0001   0.05     0.7136     0.9784   −0.26972
brd0          0.4762    0.04227  323    11.26  <.0001   0.05     0.3930     0.5593   0.043797
brd1        −0.00684    0.04797  323    −0.14  0.8867   0.05    −0.1012    0.08754   0.258257
brd2          0.1333    0.04473  323     2.98  0.0031   0.05    0.04532     0.2213   −0.08825
brd3         0.05825    0.04126  323     1.41  0.1589   0.05   −0.02291     0.1394   0.260459
brd4         0.02590    0.03761  323     0.69  0.4916   0.05   −0.04810    0.09989   −0.02615
brd5         0.07911    0.03100  323     2.55  0.0112   0.05    0.01812     0.1401   0.076259
d1            0.2520    0.01422  323    17.72  <.0001   0.05     0.2240     0.2799   0.048431
d2            0.2053    0.01422  323    14.43  <.0001   0.05     0.1773     0.2333   −0.03654
d3            0.1962    0.01394  323    14.07  <.0001   0.05     0.1687     0.2236   0.030349
d4           0.06218    0.01378  323     4.51  <.0001   0.05    0.03507    0.08929   0.006942
s2            0.8169    0.07580  323    10.78  <.0001   0.05     0.6677     0.9660   0.149421

To get a fairer comparison, let's estimate a random effects negative binomial model. While this can also be done with PROC NLMIXED, it's a little tricky because the parameterization of the negative binomial distribution in NLMIXED is different from the one I've used here. NLMIXED labels the parameters N and p (Johnson and Kotz 1969) while I use λ and Θ. The functional relationship is N = Θ and p = Θ/(λ+Θ). Here's how to set it up:

PROC NLMIXED DATA=patents2;
   lambda=EXP(int+brd0*rd_0+brd1*rd_1+brd2*rd_2+brd3*rd_3+brd4*rd_4+brd5*rd_5
      +d1*(t EQ 1)+d2*(t EQ 2)+d3*(t EQ 3)+d4*(t EQ 4)+alpha);
   /* NEGBIN(n,p) with n=theta and p=theta/(lambda+theta) */
   MODEL patent~NEGBIN(theta,(theta/(lambda+theta)));
   RANDOM alpha~NORMAL(0,s2) SUBJECT=id;
   PARMS int=1 brd0=0 brd1=0 brd2=0 brd3=0 brd4=0 brd5=0
         d1=0 d2=0 d3=0 d4=0 s2=1 theta=1;
RUN;
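The mapping used in the MODEL statement can be checked with a short derivation. It assumes that NEGBIN(N, p) has mean N(1 − p)/p and variance N(1 − p)/p², which is the usual form of the N, p parameterization. Under that assumption the expected count is λ, as intended:

```latex
\begin{align*}
N &= \Theta, \qquad p = \frac{\Theta}{\lambda+\Theta}, \qquad 1-p = \frac{\lambda}{\lambda+\Theta} \\
E(y) &= \frac{N(1-p)}{p} = \Theta\cdot\frac{\lambda}{\Theta} = \lambda \\
\operatorname{Var}(y) &= \frac{N(1-p)}{p^{2}} = \frac{\lambda(\lambda+\Theta)}{\Theta}
   = \lambda\left(1+\frac{\lambda}{\Theta}\right)
\end{align*}
```

As Θ grows large, the variance approaches λ, which is the Poisson case. Smaller values of Θ therefore correspond to more overdispersion.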

Results are shown in Output 4.15.

Table 4.15. Output 4.15 NLMIXED Output for a Random Effects Negative Binomial Model

Fit Statistics

−2 Log Likelihood            9703.9
AIC (smaller is better)      9729.9
AICC (smaller is better)     9730.1
BIC (smaller is better)      9779.0

Parameter Estimates

                       Standard
Parameter   Estimate      Error   DF  t Value  Pr>|t|  Alpha      Lower      Upper   Gradient
int           0.7069    0.06960  323    10.16  <.0001   0.05     0.5699     0.8438   −0.01105
brd0          0.5021    0.06226  323     8.06  <.0001   0.05     0.3796     0.6245   0.024034
brd1        −0.01835    0.07302  323    −0.25  0.8018   0.05    −0.1620     0.1253   0.015229
brd2          0.1205    0.06923  323     1.74  0.0828   0.05   −0.01573     0.2567   0.026795
brd3         0.06403    0.06473  323     0.99  0.3233   0.05   −0.06331     0.1914   0.020925
brd4          0.1044    0.06142  323     1.70  0.0901   0.05   −0.01642     0.2252   0.057457
brd5         0.07823    0.04764  323     1.64  0.1015   0.05   −0.01548     0.1720    0.08812
d1            0.2802    0.02719  323    10.31  <.0001   0.05     0.2268     0.3337   −0.00773
d2            0.2244    0.02722  323     8.24  <.0001   0.05     0.1708     0.2779   0.032592
d3            0.2074    0.02702  323     7.68  <.0001   0.05     0.1542     0.2606   −0.04431
d4           0.08709    0.02680  323     3.25  0.0013   0.05    0.03436     0.1398   0.006565
s2            0.7720    0.06956  323    11.10  <.0001   0.05     0.6351     0.9088   0.003151
theta        30.2799     3.0701  323     9.86  <.0001   0.05    24.2400    36.3199   0.000062

In Output 4.15, the coefficients are quite similar in magnitude to those in Output 4.14 for the Poisson model, but the standard errors are somewhat larger. These are about on par with those for the fixed effects negative binomial model in Output 4.12, but still not as large as those for the GEE estimates in Output 4.13. For this model, like the fixed effects model, the only significant R & D coefficient is for the contemporaneous year. A chi-square statistic for testing the Poisson random effects model versus the negative binomial random effects model can be obtained by calculating the difference in their −2 log-likelihoods: 10410 – 9704 = 706. With 1 d.f., this chi-square is highly significant, implying a strong preference for the less restrictive negative binomial model.
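The arithmetic for this test could be reproduced in a short DATA step. This is a minimal sketch, not part of the original analysis: the two −2 log-likelihood values are copied by hand from Outputs 4.14 and 4.15, and the variable names are arbitrary.

```sas
/* Sketch: likelihood ratio chi-square for Poisson vs. negative binomial */
DATA _NULL_;
   m2ll_poisson = 10410;              /* -2 log likelihood, Output 4.14 */
   m2ll_negbin  = 9703.9;             /* -2 log likelihood, Output 4.15 */
   chisq = m2ll_poisson - m2ll_negbin;
   /* Upper-tail probability of a chi-square with 1 d.f. */
   p = SDF('CHISQUARE', chisq, 1);
   PUT chisq= p=;
RUN;
```

The SDF function is used rather than 1 − PROBCHI(chisq, 1) because the latter rounds to exactly 0 for a statistic this large. The resulting p-value is far below .0001.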
