2.2 Normal prior and likelihood
2.2.1 Posterior from a normal prior and likelihood
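We say that $x$ is normal of mean $\theta$ and variance $\phi$, and write
$$x \sim N(\theta, \phi),$$
when
$$p(x) = (2\pi\phi)^{-1/2} \exp\{-\tfrac{1}{2}(x-\theta)^2/\phi\}.$$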
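Suppose that you have an unknown parameter $\theta$ for which your prior beliefs can be expressed in terms of a normal distribution, so that
$$\theta \sim N(\theta_0, \phi_0),$$
and suppose also that you have an observation $x$ which is normally distributed with mean equal to the parameter of interest, that is
$$x \sim N(\theta, \phi),$$
where $\theta_0$, $\phi_0$ and $\phi$ are known. As mentioned in Section 1.3, there are often grounds for suspecting that an observation might be normally distributed, usually related to the Central Limit Theorem, so this assumption is not implausible. If these assumptions are valid, then
$$p(\theta) = (2\pi\phi_0)^{-1/2} \exp\{-\tfrac{1}{2}(\theta-\theta_0)^2/\phi_0\},$$
$$p(x \mid \theta) = (2\pi\phi)^{-1/2} \exp\{-\tfrac{1}{2}(x-\theta)^2/\phi\},$$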
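and hence
$$\begin{aligned}
p(\theta \mid x) &\propto p(\theta)\, p(x \mid \theta) \\
&\propto \exp\{-\tfrac{1}{2}(\theta-\theta_0)^2/\phi_0\}\, \exp\{-\tfrac{1}{2}(x-\theta)^2/\phi\} \\
&\propto \exp\{-\tfrac{1}{2}\theta^2(\phi_0^{-1}+\phi^{-1}) + \theta(\theta_0/\phi_0 + x/\phi)\},
\end{aligned}$$
regarding $p(\theta \mid x)$ as a function of $\theta$.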
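It is now convenient to write
$$\phi_1 = \frac{1}{\phi_0^{-1} + \phi^{-1}}, \qquad \theta_1 = \phi_1(\theta_0/\phi_0 + x/\phi),$$
so that
$$\phi_0^{-1} + \phi^{-1} = \phi_1^{-1}, \qquad \theta_0/\phi_0 + x/\phi = \theta_1/\phi_1,$$
and hence,
$$p(\theta \mid x) \propto \exp\{-\tfrac{1}{2}\theta^2/\phi_1 + \theta\theta_1/\phi_1\}.$$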
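Adding into the exponent
$$-\tfrac{1}{2}\theta_1^2/\phi_1,$$
which is constant as far as $\theta$ is concerned, we see that
$$p(\theta \mid x) \propto \exp\{-\tfrac{1}{2}(\theta-\theta_1)^2/\phi_1\},$$
from which it follows, since a density must integrate to unity, that
$$p(\theta \mid x) = (2\pi\phi_1)^{-1/2} \exp\{-\tfrac{1}{2}(\theta-\theta_1)^2/\phi_1\},$$
that is, the posterior distribution is
$$\theta \mid x \sim N(\theta_1, \phi_1).$$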
In terms of the precision, which we recall can be defined as the reciprocal of the variance, the relationship $\phi_0^{-1} + \phi^{-1} = \phi_1^{-1}$ can be remembered as
Posterior precision = Prior precision + Datum precision.
(It should be noted that this relationship has been derived assuming a normal prior and a normal likelihood.)
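The closed form can be checked numerically. The following is a minimal sketch, not part of the original derivation: it multiplies a normal prior density by a normal likelihood on a grid, normalizes the product so that it integrates to one, and compares the result with a normal density whose precision is the sum of the prior and datum precisions. The numerical values of `theta0`, `phi0`, `x` and `phi` are arbitrary illustrative choices, and the grid limits are assumed wide enough to make the tails negligible.

```python
import math


def normal_pdf(t, mean, var):
    """Density of N(mean, var) evaluated at t (var is a variance, not an sd)."""
    return math.exp(-0.5 * (t - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


# Illustrative values only (assumptions, not taken from the text).
theta0, phi0 = 0.0, 4.0   # prior mean and prior variance
x, phi = 3.0, 1.0         # single observation and its known variance

# Closed-form posterior: precisions add, and the mean is phi1 * (theta0/phi0 + x/phi).
phi1 = 1.0 / (1.0 / phi0 + 1.0 / phi)
theta1 = phi1 * (theta0 / phi0 + x / phi)

# Grid approximation of prior times likelihood, normalized by a Riemann sum.
lo, hi, n = -20.0, 20.0, 4001
step = (hi - lo) / (n - 1)
grid = [lo + i * step for i in range(n)]
unnormalized = [normal_pdf(t, theta0, phi0) * normal_pdf(x, t, phi) for t in grid]
total = sum(unnormalized) * step
numeric_posterior = [u / total for u in unnormalized]

# Largest pointwise gap between the grid posterior and the closed form.
closed_form = [normal_pdf(t, theta1, phi1) for t in grid]
max_abs_diff = max(abs(a - b) for a, b in zip(numeric_posterior, closed_form))

print("phi1 =", phi1, "theta1 =", theta1, "max abs difference =", max_abs_diff)
```

With these inputs the posterior variance is 0.8 and the posterior mean is 2.4, and the grid result should agree with the closed form up to a very small numerical error.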
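The relation for the posterior mean, $\theta_1$, is only slightly more complicated. We have
$$\theta_1 = \theta_0\,\frac{\phi_0^{-1}}{\phi_0^{-1}+\phi^{-1}} + x\,\frac{\phi^{-1}}{\phi_0^{-1}+\phi^{-1}},$$
which can be remembered as
Posterior mean = Weighted mean of Prior mean and Datum value, the weights being proportional to their respective precisions.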
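The two rules (precisions add, and the posterior mean is a precision-weighted mean) translate directly into a few lines of code. The sketch below is one possible way to write them; the function name `normal_update` and its argument names are hypothetical, and the inputs in the usage lines are illustrative values chosen only to show two limiting cases.

```python
def normal_update(prior_mean, prior_var, x, data_var):
    """Posterior (mean, variance) for a normal prior and one normal observation.

    Assumes both variances are known and positive.
    """
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / data_var
    post_precision = prior_precision + data_precision   # precisions add
    weight_on_prior = prior_precision / post_precision  # weights sum to one
    post_mean = weight_on_prior * prior_mean + (1.0 - weight_on_prior) * x
    return post_mean, 1.0 / post_precision


# Equal precisions: the posterior mean is the midpoint and the variance halves.
mean_eq, var_eq = normal_update(10.0, 2.0, 14.0, 2.0)

# Very vague prior: the posterior is almost entirely determined by the datum.
mean_vague, var_vague = normal_update(10.0, 1e6, 14.0, 2.0)

print(mean_eq, var_eq)
print(mean_vague, var_vague)
```

Working with precisions rather than variances keeps the weights explicit, which makes it easy to see how a vague prior (small precision) gives way to the data.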
2.2.2 Example
According to Kennett and Ross (1983), the first apparently reliable datings for the age of Ennerdale granophyre were obtained from the K/Ar method (which depends on observing the relative proportions of potassium 40 and argon 40 in the rock) in the 1960s and early 1970s, and these resulted in an estimate of $370 \pm 20$ million years. Later in the 1970s, measurements based on the Rb/Sr method (depending on the relative proportions of rubidium 87 and strontium 87) gave an age of $421 \pm 8$ million years. It appears that the errors marked are meant to be standard deviations, and it seems plausible that the errors are normally distributed. If then a scientist S had the K/Ar measurements available in the early 1970s, it could be said that (before the Rb/Sr measurements came in) S’s prior beliefs about the age of these rocks were represented by
$$\theta \sim N(370, 20^2).$$
We could then suppose that the investigations using the Rb/Sr method result in a measurement
$$x \sim N(\theta, 8^2).$$
We shall suppose for simplicity that the precisions of these measurements are known to be exactly those quoted, although this is not quite true (methods which take more of the uncertainty into account will be discussed later in the book). If we then use the above method, noting that the observation $x$ turned out to be 421, we see that S’s posterior beliefs about $\theta$ should be represented by
$$\theta \mid x \sim N(\theta_1, \phi_1),$$
where (retaining only one significant figure)
$$\phi_1 = (20^{-2} + 8^{-2})^{-1} \approx 55 \approx 7^2, \qquad \theta_1 \approx 55\,(370/20^2 + 421/8^2) \approx 413.$$
Thus the posterior for the age of the rocks is
$$\theta \mid x \sim N(413, 7^2),$$
that is, $413 \pm 7$ million years.
Of course, all this assumes that the K/Ar measurements were available. If the Rb/Sr measurements were considered by another scientist $\hat{S}$ who had no knowledge of these, but had a vague idea (in the light of knowledge of similar rocks) that their age was likely to be $400 \pm 50$ million years, that is
$$\theta \sim N(400, 50^2),$$
then $\hat{S}$ would have a posterior variance
$$\phi_1 = (50^{-2} + 8^{-2})^{-1} \approx 62 \approx 8^2$$
and a posterior mean of
$$\theta_1 \approx 62\,(400/50^2 + 421/8^2) \approx 418,$$
so that $\hat{S}$’s posterior distribution is
$$\theta \mid x \sim N(418, 8^2),$$
that is, $418 \pm 8$ million years. We note that this calculation has been carried out assuming that the prior information available is rather vague, and that this is reflected in the fact that the posterior is almost entirely determined by the data.
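The situation can be summarized as follows:

             Prior distribution    Likelihood from data    Posterior distribution
$S$          $N(370, 20^2)$        $N(421, 8^2)$           $N(413, 7^2)$
$\hat{S}$    $N(400, 50^2)$        $N(421, 8^2)$           $N(418, 8^2)$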
We note that in numerical work, it is usually more meaningful to think in terms of the standard deviation $\sqrt{\phi}$, whereas in theoretical work it is usually easier to work in terms of the variance $\phi$ itself.
We see that after this single observation the ideas of $S$ and $\hat{S}$ about $\theta$, as represented by their posterior distributions, are much closer than before, although they still differ considerably.
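The arithmetic of this example can be reproduced in a few lines. The sketch below takes the K/Ar prior as 370 ± 20, the vague prior as 400 ± 50 and the Rb/Sr datum as 421 ± 8 (all in millions of years, with the ± figures read as standard deviations); the function name `normal_update` is hypothetical. Because the code carries full precision throughout, its means come out slightly higher than a hand calculation that first rounds the posterior variance to 55 or 62 and then obtains about 413 and 418.

```python
import math


def normal_update(prior_mean, prior_sd, x, data_sd):
    """Posterior (mean, sd) for a normal prior and one normal observation.

    Standard deviations are taken as known, as assumed in the example.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision + x * data_precision)
    return post_mean, math.sqrt(post_var)


x, data_sd = 421.0, 8.0  # Rb/Sr measurement and its standard deviation

# Scientist with the K/Ar prior.
s_mean, s_sd = normal_update(370.0, 20.0, x, data_sd)

# Scientist with only a vague prior.
s_hat_mean, s_hat_sd = normal_update(400.0, 50.0, x, data_sd)

print(f"K/Ar prior:  {s_mean:.1f} +/- {s_sd:.1f}")
print(f"vague prior: {s_hat_mean:.1f} +/- {s_hat_sd:.1f}")
print(f"gap between means: prior {400.0 - 370.0:.1f}, posterior {s_hat_mean - s_mean:.1f}")
```

At full precision the two posteriors are roughly 414.0 ± 7.4 and 420.5 ± 7.9, so the gap between the two scientists' means shrinks from 30 to about 6.5 million years after the single observation.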
2.2.3 Predictive distribution
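In the case discussed in this section, it is easy to find the predictive distribution, since
$$x = (x - \theta) + \theta$$
and, independently of one another,
$$x - \theta \sim N(0, \phi), \qquad \theta \sim N(\theta_0, \phi_0),$$
from which it follows that
$$x \sim N(\theta_0, \phi + \phi_0),$$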
using the standard fact that the sum of independent normal variates has a normal distribution. (The fact that the mean is the sum of the means and the variance the sum of the variances is of course true more generally as proved in Section 1.4 on ‘Several Random Variables’.)
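A quick simulation illustrates the predictive result. This is a minimal sketch under assumed, purely illustrative values (prior mean 0, prior variance 4, observation variance 1): it draws the parameter from the prior, then draws an observation given that parameter, and compares the sample mean and variance of the observations with the prior mean and the sum of the two variances. The seed and sample size are arbitrary choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Illustrative values only (assumptions, not taken from the text).
theta0, phi0 = 0.0, 4.0  # prior mean and variance
phi = 1.0                # known observation variance

n = 100_000
draws = []
for _ in range(n):
    theta = random.gauss(theta0, math.sqrt(phi0))  # parameter drawn from the prior
    x = random.gauss(theta, math.sqrt(phi))        # observation given the parameter
    draws.append(x)

sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n - 1)

print("sample mean:", sample_mean, "(theory:", theta0, ")")
print("sample variance:", sample_var, "(theory:", phi + phi0, ")")
```

The sample moments should lie close to 0 and 5, up to Monte Carlo error. A simulation of this kind checks only the first two moments; the normality of the predictive distribution rests on the standard fact quoted above.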
2.2.4 The nature of the assumptions made
Although this example is very simple, it does exhibit the main features of Bayesian inference as outlined in the previous section. We have assumed that the distribution of the observation x is known to be normal but that there is an unknown parameter θ, in this case the mean of the normal distribution. The assumption that the variance is known is unlikely to be fully justified in a practical example, but it may provide a reasonable approximation. You should, however, beware that it is all too easy to concentrate on the parameters of a well-known family, in this case the normal family, and to forget that the assumption that the density is in that family for any values of the parameters may not be valid. The fact that the normal distribution is easy to handle, as witnessed by the way that normal prior and normal likelihood combine to give normal posterior, is a good reason for looking for a normal model when it does provide a fair approximation, but there can easily be cases where it does not.