3.10 Approximations based on the likelihood

3.10.1 Maximum likelihood

Suppose, as usual, that we have independent observations $x = (x_1, x_2, \dots, x_n)$ whose distribution depends on an unknown parameter θ about which we want to make inferences. Sometimes it is useful to quote the posterior mode, that is, that value of θ at which the posterior density is a maximum, as a single number giving some idea of the location of the posterior distribution of θ; it could be regarded as the ultimate limit of the idea of an HDR. However, some Bayesians are opposed to the use of any single number in this way [see Box and Tiao (1992, Section A5.6)].

If the likelihood dominates the prior, the posterior mode will occur very close to the point $\hat\theta$ at which the likelihood is a maximum. Use of $\hat\theta$ is known as the method of maximum likelihood and is originally due to Fisher (1922). One notable point about maximum likelihood estimators is that if $\psi = \psi(\theta)$ is any function of θ then it is easily seen that

$$\hat\psi = \psi(\hat\theta)$$

because the point at which $p(x|\theta)$ is a maximum is not affected by how it is labelled. This invariance is not true of the exact position of the maximum of the posterior, nor indeed of HDRs, because these are affected by the Jacobian factor $|\mathrm{d}\theta/\mathrm{d}\psi|$.

You should note that the maximum likelihood estimator is often found by the Newton–Raphson method. Suppose that the likelihood is $l(\theta|x)$ and that its logarithm (in which it is often easier to work) is $L(\theta|x)$. In order to simplify the notation, we may sometimes omit explicit reference to the data and write $L(\theta)$ for $L(\theta|x)$. We seek $\hat\theta$ such that

$$\left.\frac{\partial l(\theta|x)}{\partial\theta}\right|_{\theta=\hat\theta} = 0$$

or equivalently that it satisfies the so-called likelihood equation

$$L'(\hat\theta) = 0,$$

so that the score vanishes.

3.10.2 Iterative methods

If $\theta_k$ is an approximation to $\hat\theta$ then using Taylor’s Theorem

$$0 = L'(\hat\theta) = L'(\theta_k) + (\hat\theta - \theta_k)\,L''(\theta^*)$$

where $\theta^*$ is between $\hat\theta$ and $\theta_k$. In most cases, $L''(\theta^*)$ will not differ much from $L''(\hat\theta)$ and neither will differ much from its expectation over $\tilde x$. However,

$$\mathrm{E}\,L''(\theta) = -I(\theta|x)$$

where $I(\theta|x)$ is Fisher’s information which was introduced earlier in Section 3.3 in connection with Jeffreys’ rule. We note that, although $L''(\theta)$ does depend on the value $x$ observed, the information $I(\theta|x)$ depends on the distribution of the random variable $\tilde x$ rather than on the value $x$ observed on this particular occasion, and to this extent the notation, good though it is for other purposes, is misleading. However, the value of $I(\hat\theta|x)$ does depend on $x$, because $\hat\theta$ does.

It follows that as $n \to \infty$ the value of $L''(\theta^*)$ tends to $-I(\theta|x)$, so that a better approximation than $\theta_k$ will usually be provided by either of

$$\theta_{k+1} = \theta_k - L'(\theta_k)/L''(\theta_k),$$

the Newton–Raphson method, or by

$$\theta_{k+1} = \theta_k + L'(\theta_k)/I(\theta_k|x),$$

the method of scoring for parameters. The latter method was first published in a paper by Fisher (1925a).

It has been shown by Kale (1961) that the method of scoring will usually be the quicker process for large n unless high accuracy is ultimately required. In perverse cases both methods can fail to converge or can converge to a root which does not give the absolute maximum.
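The two updating rules can be compared side by side in a few lines of Python. What follows is only a sketch: the helper `iterate` and the function names are illustrative, and the model used is the normal variance with known mean treated in the examples of Section 3.10.4, with the values n = 20 and S = 664 quoted there and an arbitrary starting value of 25.

```python
from typing import Callable, Tuple


def iterate(update: Callable[[float], float], theta0: float,
            tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, int]:
    """Apply theta -> update(theta) until successive values differ by < tol."""
    theta = theta0
    for k in range(1, max_iter + 1):
        new = update(theta)
        if abs(new - theta) < tol:
            return new, k
        theta = new
    raise RuntimeError("iteration did not converge")


# Normal variance phi with known mean: n observations, S = sum of squares.
n, S = 20, 664.0


def score(phi: float) -> float:
    """L'(phi)."""
    return -0.5 * n / phi + 0.5 * S / phi**2


def second(phi: float) -> float:
    """L''(phi), the observed second derivative."""
    return 0.5 * n / phi**2 - S / phi**3


def info(phi: float) -> float:
    """Fisher's information I(phi | x) = -E L''(phi)."""
    return 0.5 * n / phi**2


# Newton-Raphson divides the score by L''; scoring divides it by -I.
newton, newton_steps = iterate(lambda p: p - score(p) / second(p), 25.0)
scoring, scoring_steps = iterate(lambda p: p + score(p) / info(p), 25.0)
print("Newton-Raphson:", newton, "after", newton_steps, "updates")
print("Scoring:       ", scoring, "after", scoring_steps, "updates")
```

For this particular model the scoring update simplifies algebraically to S/n whatever the starting value, so it lands on the maximum at once, whereas Newton–Raphson approaches it over several updates. This is a feature of the example and should not be read as a general guarantee.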

3.10.3 Approximation to the posterior density

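We can also observe that, since $L'(\hat\theta) = 0$, in the neighbourhood of $\hat\theta$

$$L(\theta) \approx L(\hat\theta) + \tfrac{1}{2}(\theta - \hat\theta)^2 L''(\hat\theta)$$

so that approximately

$$l(\theta|x) = \exp\{L(\hat\theta) + \tfrac{1}{2}(\theta - \hat\theta)^2 L''(\hat\theta)\} \propto \exp\{\tfrac{1}{2}(\theta - \hat\theta)^2 L''(\hat\theta)\}.$$

Hence, the likelihood is approximately proportional to an $\mathrm{N}(\hat\theta, -1/L''(\hat\theta))$ density, and so approximately to an $\mathrm{N}(\hat\theta, 1/I(\hat\theta|x))$ density. We can thus construct approximate HDRs by using this approximation to the likelihood and assuming that the likelihood dominates the prior.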
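When the second derivative is awkward to write down, the same approximation can be formed numerically. The sketch below estimates $L''(\hat\theta)$ by a central second difference and turns the resulting normal density into an approximate HDR. The function names, the step size `h` and the Poisson-type log-likelihood with T = 30 and n = 10 are all assumptions made for illustration.

```python
import math
from statistics import NormalDist
from typing import Callable, Tuple


def normal_approximation(loglik: Callable[[float], float], theta_hat: float,
                         h: float = 1e-4) -> Tuple[float, float]:
    """Return (mean, variance) of the normal approximation to the likelihood.

    The variance is -1 / L''(theta_hat), with L'' estimated by a central
    second difference of step h.
    """
    second = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat)
              + loglik(theta_hat - h)) / h**2
    if second >= 0:
        raise ValueError("log-likelihood is not concave at theta_hat")
    return theta_hat, -1.0 / second


def approximate_hdr(mean: float, variance: float,
                    level: float = 0.95) -> Tuple[float, float]:
    """Symmetric interval mean +/- z * sd, which is the HDR of a normal density."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Hypothetical log-likelihood L(lam) = T log(lam) - n lam, maximised at T / n.
T, n = 30, 10
mean, variance = normal_approximation(
    lambda lam: T * math.log(lam) - n * lam, T / n)
low, high = approximate_hdr(mean, variance)
print(mean, variance, (low, high))
```

The interval is only as good as the normal approximation itself, so it should be treated with caution when n is small or the likelihood is markedly skew.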

3.10.4 Examples

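Normal variance. For the normal variance $\phi$ (with known mean θ)

$$L(\phi) = \log l(\phi|x) = -\tfrac{1}{2} n \log\phi - \tfrac{1}{2} S/\phi + \text{constant}$$

where $S = \sum (x_i - \theta)^2$, so that

$$L'(\phi) = -\tfrac{1}{2} n/\phi + \tfrac{1}{2} S/\phi^2.$$

In this case, the likelihood equation is solved without recourse to iteration to give

$$\hat\phi = S/n.$$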

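Further

$$L''(\phi) = \tfrac{1}{2} n/\phi^2 - S/\phi^3, \qquad -1/L''(\hat\phi) = 2S^2/n^3.$$

Alternatively

$$I(\phi|x) = -\mathrm{E}\,L''(\phi) = -\tfrac{1}{2} n/\phi^2 + \mathrm{E}S/\phi^3$$

and as $S \sim \phi\chi^2_n$, so that $\mathrm{E}S = n\phi$, we have

$$I(\phi|x) = \tfrac{1}{2} n/\phi^2, \qquad 1/I(\hat\phi|x) = 2S^2/n^3.$$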

Of course, there is no need to use an iterative method to find $\hat\phi$ in this case, but the difference between the formulae for $L''(\phi)$ and $I(\phi|x)$ is illustrative of the extent to which the Newton–Raphson method and the method of scoring differ from one another. The results suggest that we approximate the posterior distribution of $\phi$ [which we found to be $(S_0 + S)\chi^{-2}_{\nu+n}$ if we took a conjugate prior] by

$$\phi \sim \mathrm{N}(S/n,\ 2S^2/n^3).$$

With the data we considered in Section 2.8 on HDRs for the normal variance, we had n = 20 and S = 664, so that $S/n = 33.2$ and $2S^2/n^3 = 110.224$. The approximation would suggest a 95% HDR between $33.2 \pm 1.96\sqrt{110.224}$, that is the interval (13, 54) as opposed to the interval (19, 67) which was found in Section 2.8.

This example is deceptively simple – the method is of greatest use when analytic solutions are difficult or impossible. Further, the accuracy is greater when sample sizes are larger.
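The arithmetic for the normal variance can be checked directly. The short script below is a sketch using the quoted values n = 20 and S = 664. It also evaluates $L''(\phi)$ and $-I(\phi|x)$ at an arbitrary trial value $\phi = 25$ (an assumption made purely for illustration) to show that the two agree at the maximum but not away from it.

```python
import math

n, S = 20, 664.0                       # values quoted in the text


def second(phi: float) -> float:
    """L''(phi) for the normal variance with known mean."""
    return 0.5 * n / phi**2 - S / phi**3


def info(phi: float) -> float:
    """Fisher's information I(phi | x)."""
    return 0.5 * n / phi**2


phi_hat = S / n                        # maximum likelihood estimate
variance = 2 * S**2 / n**3             # variance of the normal approximation
half_width = 1.96 * math.sqrt(variance)
low, high = phi_hat - half_width, phi_hat + half_width

print("MLE and variance:", phi_hat, variance)
print("approximate 95% HDR:", (low, high))
# At the maximum the observed and expected curvatures coincide ...
print(-1 / second(phi_hat), 1 / info(phi_hat))
# ... but away from it they differ, which is why the two iterations differ.
print(second(25.0), -info(25.0))
```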

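Poisson distribution. We can get another deceptively simple example by supposing that $x = (x_1, x_2, \dots, x_n)$ is an n-sample from $\mathrm{P}(\lambda)$ and that $T = \sum x_i$, so that (as shown in Section 3.4)

$$\begin{aligned}
l(\lambda|x) &\propto \lambda^T \exp(-n\lambda), \\
L(\lambda) &= T\log\lambda - n\lambda + \text{constant}, \\
L'(\lambda) &= (T/\lambda) - n
\end{aligned}$$

and the likelihood equation is again solved without iteration, this time giving $\hat\lambda = T/n = \bar x$. Further

$$L''(\lambda) = -T/\lambda^2, \qquad I(\lambda|x) = n\lambda/\lambda^2 = n/\lambda$$

and $I(\hat\lambda|x) = -L''(\hat\lambda) = n^2/T$. This suggests that we can approximate the posterior of λ (which we found to be $(S_0 + 2n)^{-1}\chi^2_{\nu+2T}$ if we took a conjugate prior) by

$$\lambda \sim \mathrm{N}(T/n,\ T/n^2).$$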
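A rough check on the Poisson approximation is to compare its mean and variance with those of the conjugate posterior. The sketch below assumes that the conjugate posterior has the form $(S_0 + 2n)\lambda \sim \chi^2_{\nu+2T}$, so that its mean is $(\nu + 2T)/(S_0 + 2n)$ and its variance $2(\nu + 2T)/(S_0 + 2n)^2$. The prior parameters and the data summaries are hypothetical.

```python
from typing import Tuple


def approx_moments(n: int, T: int) -> Tuple[float, float]:
    """Mean and variance of the N(T/n, T/n^2) approximation."""
    return T / n, T / n**2


def conjugate_moments(n: int, T: int, nu: float, S0: float) -> Tuple[float, float]:
    """Mean and variance of lambda when (S0 + 2n) * lambda ~ chi-squared(nu + 2T)."""
    dof, scale = nu + 2 * T, S0 + 2 * n
    return dof / scale, 2 * dof / scale**2


nu, S0 = 2.0, 1.0                                  # hypothetical prior parameters
for n, T in [(5, 12), (50, 120), (500, 1200)]:     # hypothetical data summaries
    print(n, approx_moments(n, T), conjugate_moments(n, T, nu, S0))
```

As n grows with T/n held fixed, the two sets of moments draw together, which is what one expects when the likelihood dominates the prior.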

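Cauchy distribution. Suppose $x = (x_1, x_2, \dots, x_n)$ is an n-sample from C$(\theta, 1)$, so that

$$\begin{aligned}
l(\theta|x) &\propto \prod \{1 + (x_i - \theta)^2\}^{-1}, \\
L(\theta) &= \text{constant} - \sum \log\{1 + (x_i - \theta)^2\}, \\
L'(\theta) &= 2\sum (x_i - \theta)/\{1 + (x_i - \theta)^2\}, \\
L''(\theta) &= 2\sum \{(x_i - \theta)^2 - 1\}/\{1 + (x_i - \theta)^2\}^2.
\end{aligned}$$

It is easily seen that

$$I(\theta|x) = -\mathrm{E}\,L''(\theta) = \frac{4n}{\pi}\int_0^\infty \frac{1 - x^2}{(1 + x^2)^3}\,\mathrm{d}x.$$

On substituting $x = \tan\psi$ and using standard reduction formulae, it follows that

$$I(\theta|x) = n/2$$

from which it can be seen that successive approximations to $\hat\theta$ can be found using the method of scoring by setting

$$\theta_{k+1} = \theta_k + (4/n)\sum (x_i - \theta_k)/\{1 + (x_i - \theta_k)^2\}.$$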

The iteration could, for example, be started from the sample median, that is, the observation which is in the middle when they are arranged in increasing order. For small n the iteration may not converge, or may converge to the wrong answer (see Barnett, 1966), but the process usually behaves satisfactorily.

Real life data from a Cauchy distribution are rarely encountered, but the following values are simulated from a C$(\theta, 1)$ distribution (the value of θ being, in fact, 0):

$$-0.774,\quad 0.597,\quad 7.575,\quad 0.397,\quad -0.865,\quad -0.318,\quad -0.125,\quad 0.961,\quad 1.039$$

The sample median of the n = 9 values is 0.397. If we take this as our first approximation $\theta_1$ to $\hat\theta$, then

$$\theta_2 = 0.107, \quad \theta_3 = 0.201, \quad \theta_4 = 0.173, \quad \theta_5 = 0.181, \quad \theta_6 = 0.179$$

and all subsequent $\theta_k$ equal 0.179 which is, in fact, the correct value of $\hat\theta$. Since $I(\theta|x) = n/2 = 9/2$, an approximate 95% HDR for θ is $0.179 \pm 1.96\sqrt{2/9}$, that is the interval (–0.74, 1.10). This does include the true value, which we happen to know is 0, but of course the value of n has been chosen unrealistically small in order to illustrate the method without too much calculation.
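The scoring iteration for the Cauchy location parameter is easily programmed. In the sketch below the nine data values are an assumption of the example: they are a sample whose median is 0.397, as in the text, and they should be replaced by whatever data are actually to hand. The stopping rule and the variable names are likewise illustrative.

```python
from statistics import median

# Sample assumed for this sketch: nine values whose median is 0.397.
x = [-0.774, 0.597, 7.575, 0.397, -0.865, -0.318, -0.125, 0.961, 1.039]
n = len(x)


def score(theta: float) -> float:
    """L'(theta) for an n-sample from a Cauchy distribution with unit scale."""
    return 2.0 * sum((xi - theta) / (1.0 + (xi - theta) ** 2) for xi in x)


theta = median(x)                    # first approximation: the sample median
iterates = [theta]
for _ in range(50):
    new = theta + score(theta) / (n / 2.0)   # scoring step with I = n/2
    iterates.append(new)
    if abs(new - theta) < 1e-6:
        break
    theta = new

theta_hat = iterates[-1]
half_width = 1.96 * (2.0 / n) ** 0.5         # 1.96 / sqrt(I)
low, high = theta_hat - half_width, theta_hat + half_width
print([round(t, 3) for t in iterates[:6]])
print("approximate 95% HDR:", (round(low, 2), round(high, 2)))
```

Because the information n/2 does not depend on θ, each update costs only one evaluation of the score.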

It would also be possible in this case to carry out an iteration based on the Newton–Raphson method

$$\theta_{k+1} = \theta_k - L'(\theta_k)/L''(\theta_k)$$

using the above formula for $L''(\theta)$, but as explained earlier, it is in general better to use the method of scoring.
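For comparison, a Newton–Raphson version needs the second derivative $L''(\theta) = 2\sum\{(x_i - \theta)^2 - 1\}/\{1 + (x_i - \theta)^2\}^2$ as well as the score. The sketch below again uses an assumed sample of nine values with median 0.397 and starts from that median. Nothing here guards against $L''$ changing sign, so a poor starting value could send the iteration astray.

```python
# Sample assumed for this sketch: nine values whose median is 0.397.
x = [-0.774, 0.597, 7.575, 0.397, -0.865, -0.318, -0.125, 0.961, 1.039]


def score(theta: float) -> float:
    """L'(theta) for a Cauchy sample with unit scale."""
    return 2.0 * sum((xi - theta) / (1.0 + (xi - theta) ** 2) for xi in x)


def second(theta: float) -> float:
    """L''(theta) for a Cauchy sample with unit scale."""
    return 2.0 * sum(((xi - theta) ** 2 - 1.0) / (1.0 + (xi - theta) ** 2) ** 2
                     for xi in x)


theta = 0.397                        # start from the sample median
for k in range(1, 51):
    step = score(theta) / second(theta)
    theta -= step                    # Newton-Raphson update
    if abs(step) < 1e-10:
        break

print(theta, "after", k, "updates")
```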

3.10.5 Extension to more than one parameter

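If we have two parameters, say θ and $\phi$, which are both unknown, a similar argument shows that the maximum likelihood occurs at $(\hat\theta, \hat\phi)$, where

$$\partial L/\partial\theta = \partial L/\partial\phi = 0.$$

Similarly, if $(\theta_k, \phi_k)$ is an approximation, a better one is $(\theta_{k+1}, \phi_{k+1})$, where

$$\begin{pmatrix} \theta_{k+1} \\ \phi_{k+1} \end{pmatrix} = \begin{pmatrix} \theta_k \\ \phi_k \end{pmatrix} - \begin{pmatrix} \partial^2 L/\partial\theta^2 & \partial^2 L/\partial\theta\,\partial\phi \\ \partial^2 L/\partial\theta\,\partial\phi & \partial^2 L/\partial\phi^2 \end{pmatrix}^{-1} \begin{pmatrix} \partial L/\partial\theta \\ \partial L/\partial\phi \end{pmatrix}$$

where the derivatives are evaluated at $(\theta_k, \phi_k)$ and the matrix of second derivatives can be replaced by its expectation, which is minus the information matrix as defined in Section 3.3 on Jeffreys’ rule.

Further, the likelihood and hence the posterior can be approximated by a bivariate normal distribution of mean $(\hat\theta, \hat\phi)$ and variance–covariance matrix whose inverse is equal to minus the matrix of second derivatives (or the information matrix) evaluated at $(\hat\theta, \hat\phi)$.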

All of this extends in an obvious way to the case of more than two unknown parameters.
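A two-parameter Newton–Raphson update needs nothing more than the inverse of a 2 × 2 matrix, which can be written out explicitly. The sketch below applies it to a normal distribution with unknown mean θ and variance $\phi$, using the standard derivatives of $L = -\tfrac{1}{2} n\log\phi - \tfrac{1}{2}\{S + n(\bar x - \theta)^2\}/\phi$. The summary values n = 10, $\bar x = 5$, S = 40 and the starting point are hypothetical, and the function names are illustrative.

```python
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Vec, Vec]


def newton_step(point: Vec, grad: Vec, hess: Mat) -> Vec:
    """One update point - hess^{-1} grad, with the 2x2 inverse written out."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    g1, g2 = grad
    return (point[0] - (d * g1 - b * g2) / det,
            point[1] - (-c * g1 + a * g2) / det)


# Hypothetical summaries: sample size, sample mean, sum of squares about it.
n, xbar, S = 10, 5.0, 40.0


def gradient(theta: float, phi: float) -> Vec:
    q = S + n * (xbar - theta) ** 2
    return (n * (xbar - theta) / phi, -0.5 * n / phi + 0.5 * q / phi**2)


def hessian(theta: float, phi: float) -> Mat:
    q = S + n * (xbar - theta) ** 2
    off = -n * (xbar - theta) / phi**2
    return ((-n / phi, off), (off, 0.5 * n / phi**2 - q / phi**3))


point = (4.5, 3.0)                   # starting value near the maximum
for _ in range(50):
    new = newton_step(point, gradient(*point), hessian(*point))
    converged = max(abs(new[0] - point[0]), abs(new[1] - point[1])) < 1e-12
    point = new
    if converged:
        break

print(point)                         # compare with (xbar, S / n)
```

Replacing `hessian` by minus the information matrix in the same loop would give the method of scoring. With more than two parameters a general linear solver would take the place of the explicit inverse.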

3.10.6 Example

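We shall consider only one, very simple, case, that of a normal distribution of unknown mean and variance. In this case,

$$L(\theta, \phi) = -\tfrac{1}{2} n\log\phi - \tfrac{1}{2}\{S + n(\bar x - \theta)^2\}/\phi + \text{constant}$$

where $S = \sum (x_i - \bar x)^2$, so that

$$\begin{aligned}
\partial L/\partial\theta &= n(\bar x - \theta)/\phi, \\
\partial L/\partial\phi &= -\tfrac{1}{2} n/\phi + \tfrac{1}{2}\{S + n(\bar x - \theta)^2\}/\phi^2,
\end{aligned}$$

giving $\hat\theta = \bar x$ and $\hat\phi = S/n$.

Further, it is easily seen that

$$\begin{pmatrix} \partial^2 L/\partial\theta^2 & \partial^2 L/\partial\theta\,\partial\phi \\ \partial^2 L/\partial\theta\,\partial\phi & \partial^2 L/\partial\phi^2 \end{pmatrix} = \begin{pmatrix} -n/\phi & -n(\bar x - \theta)/\phi^2 \\ -n(\bar x - \theta)/\phi^2 & \tfrac{1}{2} n/\phi^2 - \{S + n(\bar x - \theta)^2\}/\phi^3 \end{pmatrix}$$

which at $(\hat\theta, \hat\phi) = (\bar x, S/n)$ reduces to

$$\begin{pmatrix} -n^2/S & 0 \\ 0 & -n^3/2S^2 \end{pmatrix}.$$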

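Because the off-diagonal elements vanish, the posteriors for θ and $\phi$ are approximately independent. Further, we see that approximately

$$\theta \sim \mathrm{N}(\bar x,\ S/n^2), \qquad \phi \sim \mathrm{N}(S/n,\ 2S^2/n^3).$$

In fact, we found in Section 2.12 on normal mean and variance both unknown that with standard reference priors, the posterior for θ and $\phi$ is a normal/chi-squared distribution and the marginals are such that

$$(\theta - \bar x)/(s/\sqrt{n}) \sim \mathrm{t}_{n-1}, \qquad \phi \sim S\chi^{-2}_{n-1},$$

where $s^2 = S/(n-1)$, which implies that the means and variances are

$$\begin{aligned}
\mathrm{E}\theta &= \bar x, & \mathrm{V}\theta &= \frac{n-1}{n-3}\,\frac{s^2}{n} = \frac{S}{n(n-3)}, \\
\mathrm{E}\phi &= \frac{S}{n-3}, & \mathrm{V}\phi &= \frac{2S^2}{(n-3)^2(n-5)}.
\end{aligned}$$

This shows that for large n the approximation is indeed valid.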
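How large n needs to be can be gauged by tabulating the ratio of the approximate to the exact variances. The sketch below takes the exact values to be $S/\{n(n-3)\}$ for θ and $2S^2/\{(n-3)^2(n-5)\}$ for $\phi$, as implied by t and inverse chi-squared marginals, and uses hypothetical data summaries with S = 4n.

```python
from typing import Tuple


def approximate_variances(n: int, S: float) -> Tuple[float, float]:
    """Variances of theta and phi under the normal approximation."""
    return S / n**2, 2 * S**2 / n**3


def exact_variances(n: int, S: float) -> Tuple[float, float]:
    """Variances of theta and phi under the reference posterior (needs n > 5)."""
    return S / (n * (n - 3)), 2 * S**2 / ((n - 3) ** 2 * (n - 5))


ratios = {}
for n in (10, 100, 1000):
    S = 4.0 * n                      # hypothetical sum of squares
    approx = approximate_variances(n, S)
    exact = exact_variances(n, S)
    ratios[n] = (approx[0] / exact[0], approx[1] / exact[1])
    print(n, ratios[n])
```

Both ratios approach 1 from below as n increases, so the approximation understates the posterior spread. The shortfall is more marked for $\phi$ than for θ.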
