8.6 The general linear model revisited

8.6.1 An informative prior for the general linear model

This section follows on from Section 6.7 on ‘The general linear model’ and like that section presumes a knowledge of matrix theory.

We suppose as in that section that

$$\mathbf{x} \sim \mathrm{N}(A\boldsymbol{\theta}, \phi I)$$

(where $\boldsymbol{\theta}$ is r-dimensional, so $A$ is $n \times r$), but this time we take a non-trivial prior for $\boldsymbol{\theta}$, namely

$$\boldsymbol{\theta} \sim \mathrm{N}(B\boldsymbol{\mu}, \psi I)$$

(where $\boldsymbol{\mu}$ is s-dimensional, so $B$ is $r \times s$). If the hyperparameters are known, we may as well take $r = s$ and $B = I$, and in practice dispense with $B$. For the moment we assume that $\boldsymbol{\mu}$ is known, but in due course we shall let $\boldsymbol{\mu}$ have a distribution, and it will then be useful to allow other values for $B$.

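Assuming that $\boldsymbol{\mu}$, $\phi$ and $\psi$ are known, the log of the posterior density is (up to an additive constant)

$$\log p(\boldsymbol{\theta} \mid \mathbf{x}) = -\frac{(\mathbf{x} - A\boldsymbol{\theta})^{\mathrm{T}}(\mathbf{x} - A\boldsymbol{\theta})}{2\phi} - \frac{(\boldsymbol{\theta} - B\boldsymbol{\mu})^{\mathrm{T}}(\boldsymbol{\theta} - B\boldsymbol{\mu})}{2\psi}$$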

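Differentiating with respect to the components of $\boldsymbol{\theta}$, we get a set of equations which can be written as one vector equation

$$\frac{\partial \log p(\boldsymbol{\theta} \mid \mathbf{x})}{\partial \boldsymbol{\theta}} = \frac{A^{\mathrm{T}}(\mathbf{x} - A\boldsymbol{\theta})}{\phi} - \frac{\boldsymbol{\theta} - B\boldsymbol{\mu}}{\psi}$$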

Equating this to zero to find the mode of the posterior distribution, which by symmetry equals its mean, we get

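$$(A^{\mathrm{T}}A + kI)\hat{\boldsymbol{\theta}} = A^{\mathrm{T}}\mathbf{x} + kB\boldsymbol{\mu}, \qquad \text{where } k = \phi/\psi,$$

so that

$$\hat{\boldsymbol{\theta}} = (A^{\mathrm{T}}A + kI)^{-1}(A^{\mathrm{T}}\mathbf{x} + kB\boldsymbol{\mu}).$$

In particular, if $\boldsymbol{\mu}$ is taken as zero, so that the vector of prior means vanishes, then this takes the form

$$\hat{\boldsymbol{\theta}} = (A^{\mathrm{T}}A + kI)^{-1}A^{\mathrm{T}}\mathbf{x}.$$

The usual least squares estimators reappear if $\psi \to \infty$, that is, if $k = 0$.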
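The following sketch computes the posterior mean by solving the linear system above rather than forming an explicit inverse. It uses Python with NumPy, which the text itself does not use. The function name `posterior_mean` and the small data set are hypothetical illustrations. The final comparison checks that a very diffuse prior (large ψ, so k ≈ 0) recovers the least squares estimate.

```python
import numpy as np

def posterior_mean(A, x, B, mu, phi, psi):
    """Posterior mean (= mode) of theta for x ~ N(A theta, phi I), theta ~ N(B mu, psi I)."""
    k = phi / psi                       # ratio of data variance to prior variance
    r = A.shape[1]
    lhs = A.T @ A + k * np.eye(r)       # A^T A + k I
    rhs = A.T @ x + k * (B @ mu)        # A^T x + k B mu
    return np.linalg.solve(lhs, rhs)    # solve the system instead of inverting

# Hypothetical data (not from the text): n = 6 observations, r = 2 parameters.
rng = np.random.default_rng(0)
A = np.column_stack([np.ones(6), np.arange(6.0)])
x = A @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=6)
B, mu = np.eye(2), np.zeros(2)          # r = s, B = I, prior mean zero

theta_bayes = posterior_mean(A, x, B, mu, phi=0.09, psi=1.0)
theta_ls = np.linalg.lstsq(A, x, rcond=None)[0]
# A very diffuse prior (psi -> infinity, k -> 0) brings back least squares.
theta_diffuse = posterior_mean(A, x, B, mu, phi=0.09, psi=1e12)
print(np.allclose(theta_diffuse, theta_ls))  # True
```

With a zero prior mean, the posterior mean is pulled towards the origin relative to least squares. This is the shrinkage that reappears below as ridge regression.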

8.6.2 Ridge regression

This result is related to a technique, known as ridge regression, which has become popular in recent years among classical statisticians. It was originally developed by Hoerl and Kennard (1970), and a good account of it can be found in the article entitled ‘Ridge Regression’ in Kotz et al. (2006); alternatively, see Weisberg (2005, Section 11.2). Some further remarks about the connection between ridge regression and Bayesian analysis can be found in Rubin (1988).

Their starting point is that the usual (least squares) point estimator for $\boldsymbol{\theta}$ is

$$\hat{\boldsymbol{\theta}} = (A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}}\mathbf{x}.$$

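From a classical standpoint, it is then important to find the variance–covariance matrix of this estimator in repeated sampling, which is easily shown to be

$$\mathcal{V}(\hat{\boldsymbol{\theta}}) = (A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}}\,\mathcal{V}(\mathbf{x})\,A(A^{\mathrm{T}}A)^{-1} = \phi(A^{\mathrm{T}}A)^{-1}$$

(since $\mathcal{V}(\mathbf{x}) = \phi I$), so that the sum of the variances of the regression coefficients $\hat{\theta}_j$ is

$$\operatorname{Trace}\,\mathcal{V}(\hat{\boldsymbol{\theta}}) = \phi\operatorname{Trace}\{(A^{\mathrm{T}}A)^{-1}\}$$

(the trace of a matrix being defined as the sum of the elements down its main diagonal). Since $\hat{\boldsymbol{\theta}}$ is unbiased, the mean square error in estimating $\boldsymbol{\theta}$ is

$$\mathcal{E}\{(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^{\mathrm{T}}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})\} = \phi\operatorname{Trace}\{(A^{\mathrm{T}}A)^{-1}\}.$$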
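To make the formula φ Trace{(A^TA)^{-1}} concrete, here is a sketch in Python with NumPy. It compares a well-conditioned design with a hypothetical nearly collinear one. It then checks the formula by seeded Monte Carlo simulation of repeated sampling. The designs, the value φ = 1 and the helper name `ls_mse` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def ls_mse(A, phi):
    """Sampling MSE of the least squares estimator: phi * Trace{(A^T A)^{-1}}."""
    return phi * np.trace(np.linalg.inv(A.T @ A))

phi = 1.0
t = np.linspace(0.0, 1.0, 20)
A_good = np.column_stack([np.ones_like(t), t])
# Hypothetical near-collinear design: the third column is almost equal to the second.
A_bad = np.column_stack([np.ones_like(t), t, t + 1e-3 * np.sin(7 * t)])
print(ls_mse(A_good, phi), ls_mse(A_bad, phi))  # the second is larger by orders of magnitude

# Monte Carlo check of the formula in repeated sampling (seeded for reproducibility).
rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0])
X = A_good @ theta + rng.normal(scale=np.sqrt(phi), size=(20000, t.size))  # one data set per row
est = np.linalg.solve(A_good.T @ A_good, A_good.T @ X.T).T                  # LS estimate per data set
mc_mse = np.mean(np.sum((est - theta) ** 2, axis=1))
print(mc_mse)  # close to ls_mse(A_good, phi)
```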

However, there can be considerable problems in carrying out this analysis. It has been found that the least squares estimates are sometimes inflated in magnitude, sometimes have the wrong sign, and are sometimes unstable, in that radical changes to their values can result from small changes or additions to the data. Evidently if $\operatorname{Trace}\{(A^{\mathrm{T}}A)^{-1}\}$ is large, so is the mean square error. We can summarize this by saying that the poorer the conditioning of the $A^{\mathrm{T}}A$ matrix, the worse the deficiencies referred to above are likely to be. The suggestion of Hoerl and Kennard was to add small positive quantities to the main diagonal, that is, to replace $A^{\mathrm{T}}A$ by $A^{\mathrm{T}}A + kI$ where $k > 0$, so obtaining the estimator

$$\hat{\boldsymbol{\theta}} = (A^{\mathrm{T}}A + kI)^{-1}A^{\mathrm{T}}\mathbf{x},$$

which is the estimator we derived earlier from a Bayesian standpoint (with $\boldsymbol{\mu} = \mathbf{0}$ and $k = \phi/\psi$). Hoerl and Kennard, on the other hand, rely on some rather ad hoc mechanisms for deciding on a suitable value for $k$.
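The effect of adding k to the diagonal can be seen in a small sketch in Python with NumPy. The data set is a hypothetical design with two nearly collinear regressors. The values of k are chosen arbitrarily rather than by any of Hoerl and Kennard's rules; in the Bayesian reading above, k would be φ/ψ. The sketch shows that the norm of the estimate shrinks as k grows. It also shows that, after a small perturbation of the data, the ridge estimate moves less than the least squares one.

```python
import numpy as np

def ridge(A, x, k):
    """Ridge estimator (A^T A + k I)^{-1} A^T x; k = 0 gives least squares."""
    return np.linalg.solve(A.T @ A + k * np.eye(A.shape[1]), A.T @ x)

# Hypothetical ill-conditioned design: two nearly collinear regressors.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 30)
A = np.column_stack([t, t + 1e-2 * rng.normal(size=t.size)])
x = A @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=t.size)

ks = [0.0, 0.01, 0.1, 1.0]
norms = [np.linalg.norm(ridge(A, x, k)) for k in ks]  # decreases as k grows

# Stability: perturb the data slightly and see how far each estimate moves.
delta = 0.05 * rng.normal(size=t.size)
move_ls = np.linalg.norm(ridge(A, x + delta, 0.0) - ridge(A, x, 0.0))
move_ridge = np.linalg.norm(ridge(A, x + delta, 0.1) - ridge(A, x, 0.1))
print(norms)
print(move_ls, move_ridge)  # the ridge estimate moves less
```

Both effects follow from the singular value decomposition of A. The component along a singular direction with singular value d is scaled by d/(d² + k) instead of 1/d, and this scaling is never larger.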

8.6.3 A further stage to the general linear model

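We now explore a genuinely hierarchical model. We supposed that $\mathbf{x} \sim \mathrm{N}(A\boldsymbol{\theta}, \phi I)$, or slightly more generally that

$$\mathbf{x} \sim \mathrm{N}(A\boldsymbol{\theta}, \Phi)$$

(see the description of the multivariate normal distribution in Appendix A). Further, a priori $\boldsymbol{\theta} \sim \mathrm{N}(B\boldsymbol{\mu}, \psi I)$, or slightly more generally

$$\boldsymbol{\theta} \sim \mathrm{N}(B\boldsymbol{\mu}, \Psi).$$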

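At the next stage, we can suppose that our knowledge of $\boldsymbol{\mu}$ is vague, so that $p(\boldsymbol{\mu}) \propto 1$. We can then find the marginal density of $\boldsymbol{\theta}$ as

$$\begin{aligned}
p(\boldsymbol{\theta}) &\propto \int p(\boldsymbol{\mu})\,p(\boldsymbol{\theta} \mid \boldsymbol{\mu})\,\mathrm{d}\boldsymbol{\mu} \\
&\propto \int \exp\{-\tfrac{1}{2}(\boldsymbol{\theta} - B\boldsymbol{\mu})^{\mathrm{T}}\Psi^{-1}(\boldsymbol{\theta} - B\boldsymbol{\mu})\}\,\mathrm{d}\boldsymbol{\mu} \\
&= \exp\{-\tfrac{1}{2}\boldsymbol{\theta}^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta}\}\int \exp\{-\tfrac{1}{2}\boldsymbol{\mu}^{\mathrm{T}}B^{\mathrm{T}}\Psi^{-1}B\boldsymbol{\mu} + \boldsymbol{\mu}^{\mathrm{T}}B^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta}\}\,\mathrm{d}\boldsymbol{\mu} \\
&= \exp\{-\tfrac{1}{2}\boldsymbol{\theta}^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta} + \tfrac{1}{2}\boldsymbol{\mu}_0^{\mathrm{T}}B^{\mathrm{T}}\Psi^{-1}B\boldsymbol{\mu}_0\}\int \exp\{-\tfrac{1}{2}(\boldsymbol{\mu} - \boldsymbol{\mu}_0)^{\mathrm{T}}B^{\mathrm{T}}\Psi^{-1}B(\boldsymbol{\mu} - \boldsymbol{\mu}_0)\}\,\mathrm{d}\boldsymbol{\mu}
\end{aligned}$$

on completing the square by taking $\boldsymbol{\mu}_0$ such that

$$B^{\mathrm{T}}\Psi^{-1}B\boldsymbol{\mu}_0 = B^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta},$$

that is

$$\boldsymbol{\mu}_0 = (B^{\mathrm{T}}\Psi^{-1}B)^{-1}B^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta}, \qquad \boldsymbol{\mu}_0^{\mathrm{T}}B^{\mathrm{T}}\Psi^{-1}B\boldsymbol{\mu}_0 = \boldsymbol{\theta}^{\mathrm{T}}\Psi^{-1}B(B^{\mathrm{T}}\Psi^{-1}B)^{-1}B^{\mathrm{T}}\Psi^{-1}\boldsymbol{\theta}.$$

Since the second exponential is proportional to a normal density, it integrates to a constant not depending on $\boldsymbol{\theta}$, and we can deduce that

$$p(\boldsymbol{\theta}) \propto \exp\{-\tfrac{1}{2}\boldsymbol{\theta}^{\mathrm{T}}H^{-1}\boldsymbol{\theta}\},$$

that is $\boldsymbol{\theta} \sim \mathrm{N}(\mathbf{0}, H)$, where

$$H^{-1} = \Psi^{-1} - \Psi^{-1}B(B^{\mathrm{T}}\Psi^{-1}B)^{-1}B^{\mathrm{T}}\Psi^{-1}.$$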
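As a numerical sanity check on H^{-1}, here is a sketch in Python with NumPy; B and Ψ are hypothetical choices. It confirms that H^{-1}B = 0, so the marginal "density" of θ is flat (improper) along the columns of B, as one expects with a vague prior on μ. It also compares H^{-1} with the precision of θ under a proper but very diffuse prior μ ~ N(0, vI). That proper prior is an assumption introduced only for the comparison. It gives θ ~ N(0, Ψ + vBB^T), whose precision should approach H^{-1} as v grows.

```python
import numpy as np

def marginal_precision(B, Psi):
    """H^{-1} = Psi^{-1} - Psi^{-1} B (B^T Psi^{-1} B)^{-1} B^T Psi^{-1} (flat prior on mu)."""
    Pi = np.linalg.inv(Psi)
    M = B.T @ Pi @ B
    return Pi - Pi @ B @ np.linalg.solve(M, B.T @ Pi)

# Hypothetical example: r = 4 parameters, s = 2 hyperparameters.
B = np.column_stack([np.ones(4), np.arange(4.0)])
Psi = np.diag([1.0, 2.0, 0.5, 1.5])
H_inv = marginal_precision(B, Psi)

# H^{-1} annihilates the columns of B.
print(np.allclose(H_inv @ B, 0.0))  # True

# Limit check with a proper prior mu ~ N(0, v I): theta ~ N(0, Psi + v B B^T).
v = 1e6
approx = np.linalg.inv(Psi + v * B @ B.T)
print(np.allclose(approx, H_inv, atol=1e-4))  # True
```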

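We can then find the posterior distribution of $\boldsymbol{\theta}$ given $\mathbf{x}$ as

$$p(\boldsymbol{\theta} \mid \mathbf{x}) \propto p(\boldsymbol{\theta})\,p(\mathbf{x} \mid \boldsymbol{\theta}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x} - A\boldsymbol{\theta})^{\mathrm{T}}\Phi^{-1}(\mathbf{x} - A\boldsymbol{\theta}) - \tfrac{1}{2}\boldsymbol{\theta}^{\mathrm{T}}H^{-1}\boldsymbol{\theta}\}.$$

Again completing the square, it is easily seen that this posterior distribution is $\mathrm{N}(\boldsymbol{\nu}, K)$ where

$$K^{-1} = A^{\mathrm{T}}\Phi^{-1}A + H^{-1}, \qquad \boldsymbol{\nu} = KA^{\mathrm{T}}\Phi^{-1}\mathbf{x}.$$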
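The sketch below (Python with NumPy; all matrices are hypothetical) computes ν and K from these formulas. It then cross-checks them in a different way. Here (θ, μ) is treated jointly, with a flat prior on μ. The joint posterior precision is the block matrix read off from the exponent −½(x − Aθ)^TΦ^{-1}(x − Aθ) − ½(θ − Bμ)^TΨ^{-1}(θ − Bμ). Its θ-block marginal must reproduce (ν, K) exactly, by the Schur-complement formula for a block inverse.

```python
import numpy as np

def hierarchical_posterior(A, x, Phi, B, Psi):
    """Posterior N(nu, K) of theta after integrating out mu under a flat prior."""
    Pi = np.linalg.inv(Psi)
    H_inv = Pi - Pi @ B @ np.linalg.solve(B.T @ Pi @ B, B.T @ Pi)
    Phi_inv = np.linalg.inv(Phi)
    K = np.linalg.inv(A.T @ Phi_inv @ A + H_inv)  # K^{-1} = A^T Phi^{-1} A + H^{-1}
    nu = K @ A.T @ Phi_inv @ x                     # nu = K A^T Phi^{-1} x
    return nu, K

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 4))
x = rng.normal(size=8)
Phi = 0.5 * np.eye(8)
B = np.column_stack([np.ones(4), np.arange(4.0)])
Psi = np.diag([1.0, 2.0, 0.5, 1.5])
nu, K = hierarchical_posterior(A, x, Phi, B, Psi)

# Cross-check: joint posterior of (theta, mu) with a flat prior on mu.
Pi, Phi_inv = np.linalg.inv(Psi), np.linalg.inv(Phi)
P = np.block([[A.T @ Phi_inv @ A + Pi, -Pi @ B],
              [-B.T @ Pi, B.T @ Pi @ B]])        # joint posterior precision
h = np.concatenate([A.T @ Phi_inv @ x, np.zeros(B.shape[1])])
joint_cov = np.linalg.inv(P)
joint_mean = joint_cov @ h
r = A.shape[1]
print(np.allclose(joint_mean[:r], nu), np.allclose(joint_cov[:r, :r], K))  # True True
```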

8.6.4 The one way model

If we take the formulation of the general linear model much as we discussed it in Section 6.7, so that

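$$A = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}$$

(an $n \times r$ matrix whose $i$th column has 1s in the $n_i$ rows corresponding to the observations in the $i$th group and 0s elsewhere, with $n = n_1 + \dots + n_r$), we note that $A^{\mathrm{T}}A = \operatorname{diag}(n_1, n_2, \dots, n_r)$. We assume that the $x_i$ are independent and have variance $\phi$, so that $\Phi$ reduces to $\phi I$ and hence $A^{\mathrm{T}}\Phi^{-1}A = \phi^{-1}A^{\mathrm{T}}A$. The situation where we assume that the $\theta_i$ are independently $\mathrm{N}(\mu, \psi)$ fits into this framework if we take $\Psi = \psi I$ and $B = \mathbf{1}$ (an r-dimensional column vector of 1s, so that $\mathbf{1}^{\mathrm{T}}\mathbf{1} = r$, while $J = \mathbf{1}\mathbf{1}^{\mathrm{T}}$ is an $r \times r$ matrix with 1s everywhere) and have just one scalar hyperparameter μ of which we have vague prior knowledge. Then $B^{\mathrm{T}}\Psi^{-1}B$ reduces to $r/\psi$ and $H^{-1}$ to $\psi^{-1}(I - r^{-1}J)$, giving

$$K^{-1} = A^{\mathrm{T}}\Phi^{-1}A + H^{-1} = \operatorname{diag}\left(\frac{n_1}{\phi} + \frac{1}{\psi}, \dots, \frac{n_r}{\phi} + \frac{1}{\psi}\right) - \frac{1}{r\psi}J,$$

and so $K^{-1}$ has diagonal elements $a_i + b$ and all off-diagonal elements equal to $b$, where

$$a_i = \frac{n_i}{\phi} + \frac{1}{\psi}, \qquad b = -\frac{1}{r\psi}.$$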

These are, of course, the same values we found in Section 8.5. It is also possible to deduce the form of the posterior means found there from the approach used here.
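For a numerical check of the one way case, the sketch below (Python with NumPy) builds A from a hypothetical set of group sizes and computes H^{-1} from the general formula with B = 1 and Ψ = ψI. It confirms that K^{-1} has diagonal a_i + b and off-diagonal b. It then solves K^{-1}ν = A^Tx/φ for the posterior means. Row i of that system reads a_iν_i + bΣ_jν_j = n_i x̄_i/φ. With b = −1/(rψ), this gives ν_i = (n_i x̄_i/φ + ν̄/ψ)/a_i, where ν̄ is the average of the ν_j. So each posterior mean is a precision-weighted compromise between its group mean and ν̄. This shrinkage form is our own rearrangement of the formulas above; we have not checked it against the exact notation used in Section 8.5. The helper name `one_way_design` and the data are illustrative assumptions.

```python
import numpy as np

def one_way_design(sizes):
    """n x r indicator matrix A: each row has a 1 in the column of its group."""
    return np.repeat(np.eye(len(sizes)), sizes, axis=0)

# Hypothetical group sizes, variances and data (not from the text).
sizes = np.array([3, 5, 2, 4])
phi, psi = 2.0, 0.5
r = sizes.size
A = one_way_design(sizes)
rng = np.random.default_rng(4)
x = A @ np.array([1.0, 2.0, 0.0, 3.0]) + rng.normal(scale=np.sqrt(phi), size=sizes.sum())

# General H^{-1} with B = 1 and Psi = psi I, compared with psi^{-1}(I - r^{-1} J).
Pi = np.eye(r) / psi
B = np.ones((r, 1))
H_inv = Pi - Pi @ B @ np.linalg.solve(B.T @ Pi @ B, B.T @ Pi)
J = np.ones((r, r))
print(np.allclose(H_inv, (np.eye(r) - J / r) / psi))  # True

K_inv = A.T @ A / phi + H_inv                         # Phi = phi I
a, b = sizes / phi + 1.0 / psi, -1.0 / (r * psi)
print(np.allclose(K_inv, np.diag(a) + b * J))         # True: a_i + b on the diagonal, b elsewhere

# Posterior means nu = K A^T x / phi and their shrinkage form.
nu = np.linalg.solve(K_inv, A.T @ x / phi)
xbar = (A.T @ x) / sizes                              # group means
shrunk = (sizes * xbar / phi + nu.mean() / psi) / a
print(np.allclose(nu, shrunk))                        # True
```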

8.6.5 Posterior variances of the estimators


Unnumbered Display Equation

and  (remember that b< 0) it is easily seen that

Unnumbered Display Equation

and hence

Unnumbered Display Equation

using the Sherman–Morrison formula for the inverse of  with  . [This result is easily established; in case of difficulty, refer to Miller (1987, Section 3) or Horn and Johnson (1991, Section 0.7.4).]  Consequently the posterior variance of  is

Unnumbered Display Equation
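The factorization, the Sherman–Morrison step and the closed form for K_ii can all be checked numerically against a direct matrix inverse. The following sketch does this in Python with NumPy, for hypothetical group sizes and variances.

```python
import numpy as np

# Hypothetical one way setting: a_i = n_i/phi + 1/psi, b = -1/(r psi).
sizes = np.array([3.0, 5.0, 2.0, 4.0])
phi, psi = 2.0, 0.5
r = sizes.size
a = sizes / phi + 1.0 / psi
b = -1.0 / (r * psi)
K_inv = np.diag(a) + b * np.ones((r, r))

D = np.diag(np.sqrt(a))
u = np.sqrt(-b) / np.sqrt(a)                       # u = sqrt(-b) D^{-1} 1
print(np.allclose(D @ (np.eye(r) - np.outer(u, u)) @ D, K_inv))  # K^{-1} = D(I - uu^T)D

# Sherman-Morrison with v = -u: (I - uu^T)^{-1} = I + uu^T / (1 - u^T u).
sm_inv = np.eye(r) + np.outer(u, u) / (1.0 - u @ u)
D_inv = np.diag(1.0 / np.sqrt(a))
K = D_inv @ sm_inv @ D_inv
print(np.allclose(K, np.linalg.inv(K_inv)))

# Closed form for the diagonal: K_ii = 1/a_i + 1 / (a_i^2 (-1/b - sum_j 1/a_j)).
K_diag = 1.0 / a + 1.0 / (a ** 2 * (-1.0 / b - np.sum(1.0 / a)))
print(np.allclose(K_diag, np.diag(K)))
```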

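Now substituting $a_i = n_i/\phi + 1/\psi$ and $b = -1/(r\psi)$ (where $n_i$ is the number of observations in the $i$th group), we see that

$$-\frac{1}{b} - \sum_j \frac{1}{a_j} = \sum_j\left(\psi - \frac{\phi\psi}{n_j\psi + \phi}\right) = \sum_j \frac{n_j\psi^2}{n_j\psi + \phi} > \frac{n_i\psi^2}{n_i\psi + \phi} \qquad (r \geqslant 2),$$

from which it follows that

$$K_{ii} < \frac{\phi\psi}{n_i\psi + \phi} + \frac{\phi^2\psi^2}{(n_i\psi + \phi)^2}\cdot\frac{n_i\psi + \phi}{n_i\psi^2} = \frac{\phi\psi}{n_i\psi + \phi} + \frac{\phi^2}{n_i(n_i\psi + \phi)} = \frac{\phi}{n_i},$$

which is the sampling variance of the mean of the $i$th group.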

We thus confirm that the incorporation of prior information has resulted in a reduction of the variance.
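A short numerical illustration of this reduction, in Python with NumPy, follows. The group sizes and variances are hypothetical, and `one_way_posterior_variances` is an illustrative helper. It evaluates the closed form for K_ii and compares each value with the classical variance φ/n_i. It also checks that a very vague second-stage prior (large ψ) essentially removes the reduction.

```python
import numpy as np

def one_way_posterior_variances(sizes, phi, psi):
    """Diagonal of K for the one way model, via the closed form derived above."""
    sizes = np.asarray(sizes, dtype=float)
    r = sizes.size
    a = sizes / phi + 1.0 / psi
    b = -1.0 / (r * psi)
    denom = -1.0 / b - np.sum(1.0 / a)    # equals sum_j n_j psi^2 / (n_j psi + phi)
    return 1.0 / a + 1.0 / (a ** 2 * denom)

sizes = [3, 5, 2, 4]
phi, psi = 2.0, 0.5
post_var = one_way_posterior_variances(sizes, phi, psi)
classical_var = phi / np.asarray(sizes, dtype=float)   # variance of each group mean
print(np.all(post_var < classical_var))                # True: each variance is reduced

# With a very vague prior on the theta_i (psi large) the classical values return.
print(np.allclose(one_way_posterior_variances(sizes, phi, 1e9), classical_var))  # True
```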

