9.2 The EM algorithm

9.2.1 The idea of the EM algorithm

A useful numerical technique which finds the posterior mode, that is, the value at which p(η | x) or equivalently log p(η | x) is a maximum, but does not provide full information on the posterior distribution is the EM algorithm. We can exemplify this by an example on genetic linkage due to Rao (1973, Section 5g) quoted by Dempster et al. (1977), by Gelfand and Smith (1990) and by Tanner (1996, Section 4.1). We have observations x = (x1, x2, x3, x4) with cell probabilities

$$\left(\tfrac{1}{2}+\tfrac{\eta}{4},\ \tfrac{1}{4}(1-\eta),\ \tfrac{1}{4}(1-\eta),\ \tfrac{\eta}{4}\right)$$

and we want to estimate η. The likelihood is then

$$l(\eta\,|\,\boldsymbol x) \propto (2+\eta)^{x_1}(1-\eta)^{x_2+x_3}\,\eta^{x_4}$$

What we do is to augment the data x by adding further data z to produce augmented data y = (x, z). It should be noted that to a considerable extent the distinction between parameters of a model and augmentations to the data is artificial (i.e. fictitious augmentations to the data can be regarded as parameters of a model). In the example under consideration, the augmentation consists simply in splitting the first cell into two cells with probabilities 1/2 and η/4, so that the augmented data are y = (y0, y1, x2, x3, x4) with y0 + y1 = x1. The advantage of this is that the likelihood then takes the much simpler form

$$l(\eta\,|\,\boldsymbol y) \propto \eta^{y_1+x_4}(1-\eta)^{x_2+x_3}$$

and if we take the standard reference prior Be(0, 0), then the posterior is a beta distribution

$$\eta\,|\,\boldsymbol y \sim \mathrm{Be}(y_1+x_4,\ x_2+x_3)$$

Not much in life is free, and in this case we have to pay for the greater simplicity of the model by estimating the split of x1 into y0 and y1. The EM algorithm for finding the posterior mode is an iterative method starting from some plausible guess η^(0) for the value of η. At stage t, we suppose that the current guess is η^(t). Each stage has two steps. In the first, the E-step (expectation step), we compute

$$Q(\eta,\eta^{(t)}) = \mathbb E\left[\log p(\eta\,|\,\boldsymbol y)\,\middle|\,\boldsymbol x,\eta^{(t)}\right]$$

that is, the expectation of the logarithm of the posterior density given the augmented data, the expectation being computed at η = η^(t), so that

$$Q(\eta,\eta^{(t)}) = \int \log\big(p(\eta\,|\,\boldsymbol y)\big)\,p(\boldsymbol y\,|\,\eta^{(t)},\boldsymbol x)\,\mathrm d\boldsymbol y$$

In the second step, the M-step (the maximization step), we find that value η^(t+1) of η which maximizes Q(η, η^(t)). In this particular example, as y_i = x_i for i > 1,

$$Q(\eta,\eta^{(t)}) = \big(\mathbb E(y_1\,|\,\boldsymbol x,\eta^{(t)}) + x_4 - 1\big)\log\eta + (x_2+x_3-1)\log(1-\eta) + \text{const}$$

For the M-step, we note that

$$\frac{\partial Q(\eta,\eta^{(t)})}{\partial\eta} = \frac{\mathbb E(y_1\,|\,\boldsymbol x,\eta^{(t)}) + x_4 - 1}{\eta} - \frac{x_2+x_3-1}{1-\eta}$$

and equating this to zero we deduce that

$$\eta^{(t+1)} = \frac{\mathbb E(y_1\,|\,\boldsymbol x,\eta^{(t)}) + x_4 - 1}{\mathbb E(y_1\,|\,\boldsymbol x,\eta^{(t)}) + x_2 + x_3 + x_4 - 2}$$

Since y1 has a binomial B(x1, η/(2+η)) distribution with

$$\mathbb E(y_1\,|\,\boldsymbol x,\eta) = \frac{x_1\eta}{2+\eta}$$

so that

$$\mathbb E(y_1\,|\,\boldsymbol x,\eta^{(t)}) = \frac{x_1\eta^{(t)}}{2+\eta^{(t)}}$$

the iteration becomes

$$\eta^{(t+1)} = \frac{x_1\eta^{(t)}/(2+\eta^{(t)}) + x_4 - 1}{x_1\eta^{(t)}/(2+\eta^{(t)}) + x_2 + x_3 + x_4 - 2}$$

The values actually observed were x1=125, x2=18, x3=20, x4=34. We can then estimate η by iteration starting from, for example, η^(0) = 0.5 and using

$$\eta^{(t+1)} = \frac{125\,\eta^{(t)}/(2+\eta^{(t)}) + 33}{125\,\eta^{(t)}/(2+\eta^{(t)}) + 70}$$

In fact, the iteration will converge to the positive root of 195η² − 18η − 66 = 0 (obtained by setting η^(t+1) = η^(t) = η in the iteration and clearing denominators), which is 0.630.
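The arithmetic is easily checked by machine. The following sketch (our own illustration, not part of the original text; the function name and stopping rule are arbitrary choices) implements the E- and M-steps just derived:

```python
# A minimal sketch of the EM iteration for the genetic linkage example.
# E-step: replace y1 by E(y1 | x, eta) = x1*eta/(2 + eta).
# M-step: maximize Q, giving the update derived above.

def em_linkage(x, eta=0.5, tol=1e-10, max_iter=1000):
    x1, x2, x3, x4 = x
    for _ in range(max_iter):
        y1 = x1 * eta / (2 + eta)                          # E-step
        eta_new = (y1 + x4 - 1) / (y1 + x2 + x3 + x4 - 2)  # M-step
        if abs(eta_new - eta) < tol:
            return eta_new
        eta = eta_new
    return eta

print(em_linkage((125, 18, 20, 34)))  # about 0.6298, i.e. 0.630 to 3 d.p.
```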

9.2.2 Why the EM algorithm works

We will first show that p(η^(t) | x) increases with t and, presuming that η^(t) converges to some limit η̂, then show that ∂ log p(η | x)/∂η = 0 at η = η̂, so that η̂ will normally be the posterior mode.

From the obvious equation p(η | y, x) = p(η | x) p(y | η, x)/p(y | x), considered conditional on x, it follows, on noting that y includes x and hence p(η | y, x) = p(η | y), that

$$\log p(\eta\,|\,\boldsymbol x) = \log p(\eta\,|\,\boldsymbol y) - \log p(\boldsymbol y\,|\,\eta,\boldsymbol x) + \log p(\boldsymbol y\,|\,\boldsymbol x)$$

Multiplying both sides by the density p(y | η^(t), x) and integrating over y (noting that the left-hand side does not depend on y), we get for any fixed η^(t)

$$\log p(\eta\,|\,\boldsymbol x) = Q(\eta,\eta^{(t)}) - H(\eta,\eta^{(t)}) + K$$
where
$$H(\eta,\eta^{(t)}) = \int \log\big(p(\boldsymbol y\,|\,\eta,\boldsymbol x)\big)\,p(\boldsymbol y\,|\,\eta^{(t)},\boldsymbol x)\,\mathrm d\boldsymbol y, \qquad K = \int \log\big(p(\boldsymbol y\,|\,\boldsymbol x)\big)\,p(\boldsymbol y\,|\,\eta^{(t)},\boldsymbol x)\,\mathrm d\boldsymbol y$$

(we note that K clearly does not depend on η). Taking η = η^(t+1) and η = η^(t) in turn and subtracting, we get

$$\log p(\eta^{(t+1)}\,|\,\boldsymbol x) - \log p(\eta^{(t)}\,|\,\boldsymbol x) = \big[Q(\eta^{(t+1)},\eta^{(t)}) - Q(\eta^{(t)},\eta^{(t)})\big] - \big[H(\eta^{(t+1)},\eta^{(t)}) - H(\eta^{(t)},\eta^{(t)})\big]$$

Now Q(η^(t+1), η^(t)) ≥ Q(η^(t), η^(t)) because of the way in which we chose η^(t+1). Moreover,

$$H(\eta^{(t)},\eta^{(t)}) - H(\eta,\eta^{(t)}) = \int \lambda\!\left(\frac{p(\boldsymbol y\,|\,\eta,\boldsymbol x)}{p(\boldsymbol y\,|\,\eta^{(t)},\boldsymbol x)}\right) p(\boldsymbol y\,|\,\eta^{(t)},\boldsymbol x)\,\mathrm d\boldsymbol y$$

where

$$\lambda(u) = u - 1 - \log u$$

Because (as is easily seen by differentiation) λ(u) takes a minimum value of 0 when u=1, we see that H(η^(t+1), η^(t)) ≤ H(η^(t), η^(t)) for all t. Consequently,

$$\log p(\eta^{(t+1)}\,|\,\boldsymbol x) \geq \log p(\eta^{(t)}\,|\,\boldsymbol x)$$

so that

$$p(\eta^{(t+1)}\,|\,\boldsymbol x) \geq p(\eta^{(t)}\,|\,\boldsymbol x)$$

We now show that the limit is a stationary point. As

$$\left.\frac{\partial Q(\eta,\eta^{(t)})}{\partial \eta}\right|_{\eta=\eta^{(t+1)}} = 0$$

we see that provided Q is a reasonably smooth function

$$\left.\frac{\partial Q(\eta,\hat\eta)}{\partial \eta}\right|_{\eta=\hat\eta} = 0$$

Further

$$\frac{\partial \log p(\eta\,|\,\boldsymbol x)}{\partial \eta} = \frac{\partial Q(\eta,\eta^{(t)})}{\partial \eta} - \frac{\partial H(\eta,\eta^{(t)})}{\partial \eta}$$

and in particular taking η^(t) = η̂,

$$\left.\frac{\partial \log p(\eta\,|\,\boldsymbol x)}{\partial \eta}\right|_{\eta=\hat\eta} = \left.\frac{\partial Q(\eta,\hat\eta)}{\partial \eta}\right|_{\eta=\hat\eta} - \left.\frac{\partial H(\eta,\hat\eta)}{\partial \eta}\right|_{\eta=\hat\eta}$$

Since H(η, η̃) is maximized at η = η̃, so that ∂H(η, η̃)/∂η = 0 at η = η̃ for any fixed η̃, and in particular for η̃ = η̂, it now follows that

$$\left.\frac{\partial \log p(\eta\,|\,\boldsymbol x)}{\partial \eta}\right|_{\eta=\hat\eta} = 0$$

so that η̂ is a stationary point, which will usually be the posterior mode.
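The monotonicity just proved is easy to verify numerically in the linkage example: with the Be(0, 0) prior, log p(η | x) equals x1 log(2+η) + (x2+x3−1) log(1−η) + (x4−1) log η up to a constant, and evaluating it along the EM iterates shows it never decreasing. A minimal sketch, again our own illustration:

```python
import math

# Log posterior of the linkage model, up to an additive constant.
def log_post(eta, x1=125, x2=18, x3=20, x4=34):
    return (x1 * math.log(2 + eta) + (x2 + x3 - 1) * math.log(1 - eta)
            + (x4 - 1) * math.log(eta))

eta, values = 0.5, []
for _ in range(10):
    eta = (125 * eta / (2 + eta) + 33) / (125 * eta / (2 + eta) + 70)
    values.append(log_post(eta))

# Each EM step increases (never decreases) the log posterior.
print(all(b >= a for a, b in zip(values, values[1:])))  # True
```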

More information about the EM algorithm can be found in Dempster et al. (1977) or Tanner (1996).

9.2.3 Semi-conjugate prior with a normal likelihood

Suppose that we have independent normally distributed observations x1, x2, … , xn ~ N(θ, φ), where both the mean θ and the variance φ are unknown. We use the notation

$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i, \qquad S = \sum_{i=1}^n (x_i - \bar x)^2$$

In Section 2.13, we noted that the conjugate joint prior p(θ, φ) for the normal mean and variance is not a product of a function of θ and a function of φ, so that θ and φ are not independent a priori. Nevertheless, an assumption of prior independence would seem appropriate for many problems, and so we are sometimes led to consider a situation in which

$$\theta \sim \mathrm N(\theta_0, \phi_0), \qquad \phi \sim S_0\chi^{-2}_{\nu_0}$$

independently of one another. Such a prior is described as semi-conjugate by Gelman et al. (2004, Section 3.4) and as conditionally conjugate by O’Hagan (2004, Section 6.33). The latter term arises because conditional on knowledge of θ, it is a conjugate prior for φ, and vice versa.

With such a prior, we know that, conditional on knowledge of φ, the posterior of θ is

$$\theta\,|\,\boldsymbol x, \phi \sim \mathrm N(\theta_1, \phi_1)$$

where

$$\phi_1 = \left(\frac{1}{\phi_0} + \frac{n}{\phi}\right)^{-1}, \qquad \theta_1 = \phi_1\left(\frac{\theta_0}{\phi_0} + \frac{n\bar x}{\phi}\right)$$

(cf. Section 2.3) whereas, conditional on knowledge of θ, the posterior of φ is

$$\phi\,|\,\boldsymbol x, \theta \sim S_1\chi^{-2}_{\nu_1}$$

where

$$\nu_1 = \nu_0 + n, \qquad S_1 = S_0 + \sum_{i=1}^n (x_i - \theta)^2 = S_0 + S + n(\bar x - \theta)^2$$

(cf. Section 2.7).

In such a case, we can use the EM algorithm to estimate the posterior mode of θ, effectively augmenting the data with the variance φ. We begin by observing that (ignoring terms which do not depend on θ) the log posterior density is

$$\log p(\theta, \phi\,|\,\boldsymbol x) = \text{const} - \frac{(\theta-\theta_0)^2}{2\phi_0} - \frac{1}{2\phi}\sum_{i=1}^n (x_i-\theta)^2$$

To carry out the E-step we first note that since

$$S_1/\phi\,\big|\,\boldsymbol x, \theta^{(t)} \sim \chi^2_{\nu_1}$$

and the expectation of a chi-squared variate is equal to its number of degrees of freedom, we know that

$$\mathbb E\big(1/\phi\,\big|\,\boldsymbol x, \theta^{(t)}\big) = \frac{\nu_1}{S_1}, \qquad\text{so that}\qquad Q(\theta,\theta^{(t)}) = \text{const} - \frac{(\theta-\theta_0)^2}{2\phi_0} - \frac{\nu_1}{2S_1}\sum_{i=1}^n (x_i-\theta)^2$$

where ν1 = ν0 + n and S1 = S0 + S + n(x̄ − θ^(t))². For the M-step we must find that value θ^(t+1) of θ which maximizes Q(θ, θ^(t)). But Q(θ, θ^(t)) is of the form of the log of the posterior density we would have got for θ had we had the same prior N(θ0, φ0) and observations x1, x2, … , xn with mean θ and known variance S1/ν1. Now it follows from Section 2.3 on ‘Several normal observations with a normal prior’ that the posterior mean of θ in this case is at

$$\theta^{(t+1)} = \frac{\theta_0/\phi_0 + (n\nu_1/S_1)\,\bar x}{1/\phi_0 + n\nu_1/S_1}$$

(the right-hand side depends on θ^(t) through S1), and so we see that this is the value of θ giving the required maximum.
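Each cycle therefore amounts to recomputing S1 and then taking a precision-weighted average of the prior mean and x̄. The sketch below is our own illustration of the updates just derived, with argument names mirroring the notation of this subsection:

```python
# EM for the posterior mode of theta under the semi-conjugate prior.
# E-step: S1 = S0 + S + n*(xbar - theta)^2, so E(1/phi | x, theta) = nu1/S1.
# M-step: precision-weighted average of the prior mean theta0 and xbar.

def em_mode(n, xbar, S, theta0, phi0, S0, nu0, theta=None, n_iter=25):
    nu1 = nu0 + n
    theta = xbar if theta is None else theta
    for _ in range(n_iter):
        S1 = S0 + S + n * (xbar - theta) ** 2
        w = n * nu1 / S1   # data precision implied by E(1/phi | x, theta)
        theta = (theta0 / phi0 + xbar * w) / (1 / phi0 + w)
    return theta

# For the wheat example below one would call, for instance,
#   em_mode(n=12, xbar=119, S=13045, theta0=..., phi0=..., S0=2700, nu0=11)
# with theta0 and phi0 taken from the prior for the mean described there.
```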

To illustrate this case, we return to the data on wheat yield considered in the example towards the end of Section 2.13, in which n = 12, x̄ = 119 and S = 13 045. We will take the prior for the variance considered in that Section, namely, φ ~ S0χ⁻²ν0 with S0 = 2700 and ν0 = 11. For the mean, we will take a prior θ ~ N(θ0, φ0) with θ0 and φ0 chosen so that it approximates well the marginal distribution of θ in that Section. The resulting marginal densities are nearly the same, but because we now assume prior independence the joint distribution is different.

It seems that a reasonable starting point for the iteration is θ^(0) = x̄. The successive iterates θ^(1), θ^(2), … then settle down to a fixed value, the posterior mode, after only the first few steps.

9.2.4 The EM algorithm for the hierarchical normal model

The EM algorithm is well suited to the analysis of hierarchical models and we can see this by examining the important case of the hierarchical normal model. Although the formulae we shall derive do not look pretty, they are very easy to use. Suppose that

$$x_{ij} \sim \mathrm N(\theta_i, \phi) \qquad (i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, n_i)$$

with n1 + n2 + ⋯ + nr = N, where

$$\theta_i \sim \mathrm N(\mu, \psi)$$

as in Section 8.5 and that we wish to estimate the hyperparameters μ, φ and ψ. In this case η = (μ, φ, ψ) takes the place of η in the example on genetic linkage, while we augment the data by θ = (θ1, θ2, … , θr) to give augmented data y = (x, θ). We can use a reference prior for φ, but (as mentioned in Section 8.5) this will not work for ψ. Accordingly, we will adopt the prior

$$p(\mu, \phi, \psi) \propto 1/\phi$$
(that is, uniform in μ and ψ and the usual reference prior in φ).

It is then easily seen that (up to an additive constant)

$$\log p(\eta\,|\,\boldsymbol y) = -\left(\frac{N}{2}+1\right)\log\phi - \frac{1}{2\phi}\sum_i\sum_j (x_{ij}-\theta_i)^2 - \frac{r}{2}\log\psi - \frac{1}{2\psi}\sum_i (\theta_i-\mu)^2$$

To carry out the E-step, observe that conditional on the value of η^(t) = (μ^(t), φ^(t), ψ^(t)) the parameters θ1, θ2, … , θr are independently distributed as θ_i ~ N(θ_i^(t), V_i^(t)), where

$$\theta_i^{(t)} = \frac{\mu^{(t)}/\psi^{(t)} + n_i\bar x_{i\cdot}/\phi^{(t)}}{1/\psi^{(t)} + n_i/\phi^{(t)}}, \qquad V_i^{(t)} = \left(\frac{1}{\psi^{(t)}} + \frac{n_i}{\phi^{(t)}}\right)^{-1}$$

using the notation x̄_{i·} introduced in Section 6.5 for averaging over a suffix (cf. Section 2.3). We can now see that

$$\mathbb E\big[(x_{ij}-\theta_i)^2\,\big|\,\boldsymbol x, \eta^{(t)}\big] = (x_{ij}-\theta_i^{(t)})^2 + V_i^{(t)}$$

and similarly

$$\mathbb E\big[(\theta_i-\mu)^2\,\big|\,\boldsymbol x, \eta^{(t)}\big] = (\theta_i^{(t)}-\mu)^2 + V_i^{(t)}$$

so that

$$Q(\eta,\eta^{(t)}) = -\left(\frac{N}{2}+1\right)\log\phi - \frac{1}{2\phi}\sum_i\sum_j\big[(x_{ij}-\theta_i^{(t)})^2 + V_i^{(t)}\big] - \frac{r}{2}\log\psi - \frac{1}{2\psi}\sum_i\big[(\theta_i^{(t)}-\mu)^2 + V_i^{(t)}\big]$$

It is now clear that the M-step gives

$$\mu^{(t+1)} = \frac{1}{r}\sum_i \theta_i^{(t)}, \qquad \phi^{(t+1)} = \frac{1}{N+2}\sum_i\sum_j\big[(x_{ij}-\theta_i^{(t)})^2 + V_i^{(t)}\big], \qquad \psi^{(t+1)} = \frac{1}{r}\sum_i\big[(\theta_i^{(t)}-\mu^{(t+1)})^2 + V_i^{(t)}\big]$$
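In code, one cycle of these updates can be written directly from the displayed formulas. The sketch below is our own illustration and assumes the prior adopted above; `groups` holds the observations x_ij as one array per group:

```python
import numpy as np

def em_step(groups, mu, phi, psi):
    """One EM cycle for the hierarchical normal model.

    groups : list of 1-D arrays, the observations x_ij for each group i.
    Returns updated (mu, phi, psi) plus the E-step quantities theta_i, V_i.
    """
    N, r = sum(len(x) for x in groups), len(groups)
    # E-step: conditional posterior mean and variance of each theta_i
    V = np.array([1 / (1 / psi + len(x) / phi) for x in groups])
    theta = np.array([(mu / psi + len(x) * x.mean() / phi) * v
                      for x, v in zip(groups, V)])
    # M-step: closed-form maximizers of Q over mu, phi and psi in turn
    mu_new = theta.mean()
    phi_new = sum(((x - t) ** 2).sum() + len(x) * v
                  for x, t, v in zip(groups, theta, V)) / (N + 2)
    psi_new = ((theta - mu_new) ** 2 + V).sum() / r
    return mu_new, phi_new, psi_new, theta, V
```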

9.2.5 A particular case of the hierarchical normal model

Gelman et al. (2004, Table 11.2) quote the following data from Box et al. (1978, Table 6.1) on the coagulation time in seconds for blood drawn from N=24 animals randomly allocated to four different diets:

Diet A: 62, 60, 63, 59 (mean 61)
Diet B: 63, 67, 71, 64, 65, 66 (mean 66)
Diet C: 68, 66, 71, 67, 68, 68 (mean 68)
Diet D: 56, 62, 60, 61, 63, 64, 63, 59 (mean 61)

The data have been slightly adjusted, so that the averages come out to be whole numbers. It should perhaps be noted that the prior adopted in this and the preceding subsections differs slightly from that adopted by Gelman et al. The within-groups sum of squares is S = Σᵢ Σⱼ (x_ij − x̄_{i·})² = 112 and the overall mean is x̄_{··} = 64. It therefore seems reasonable to start iterations from μ^(0) = 64 and φ^(0) = S/(N − r) = 112/20 = 5.6. As for ψ^(0), we can take

$$\psi^{(0)} = \frac{1}{r-1}\sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})^2 = \frac{38}{3} = 12.67$$

This results in

$$\theta_1^{(0)} = \frac{\mu^{(0)}/\psi^{(0)} + n_1\bar x_{1\cdot}/\phi^{(0)}}{1/\psi^{(0)} + n_1/\phi^{(0)}} = \frac{64/12.67 + 4\times 61/5.6}{1/12.67 + 4/5.6} = 61.30, \qquad V_1^{(0)} = \left(\frac{1}{12.67} + \frac{4}{5.6}\right)^{-1} = 1.26$$

Similarly, we get θ2^(0) = 65.86, V2^(0) = 0.87, θ3^(0) = 67.73, V3^(0) = 0.87, θ4^(0) = 61.16 and V4^(0) = 0.66.

We can now feed these values in and commence the iteration, getting μ^(1) = 64.01, φ^(1) = 5.15 and ψ^(1) = 9.10. Continuing the iteration, we get rapid convergence of μ^(t), φ^(t), ψ^(t) and the conditional means θ_i^(t); the sketch below carries the iteration through to convergence.
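Here the em_step sketch from the previous subsection is applied to the coagulation data with the starting values just described (again our own illustration; it relies on the earlier definition, which includes the numpy import):

```python
# The coagulation data, one array per diet, with the starting values
# mu = 64, phi = 5.6 and psi = 38/3 chosen above.
diets = [np.array([62., 60., 63., 59.]),
         np.array([63., 67., 71., 64., 65., 66.]),
         np.array([68., 66., 71., 67., 68., 68.]),
         np.array([56., 62., 60., 61., 63., 64., 63., 59.])]

mu, phi, psi = 64.0, 5.6, 38 / 3
for _ in range(50):   # far more cycles than convergence requires
    mu, phi, psi, theta, V = em_step(diets, mu, phi, psi)

print(mu, phi, psi)    # converged hyperparameters
print(theta.round(2))  # conditional means of the theta_i
```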
