2.10 Conjugate prior distributions

2.10.1 Definition and difficulties

When the normal variance was first mentioned, it was stated that it helps if the prior is such that the posterior is of a ‘nice’ form, and this led to the suggestion that if a reasonable approximation to your prior beliefs could be managed by using (a multiple of) an inverse chi-squared distribution, it would be sensible to employ this distribution. It is this thought which leads to the notion of conjugate families. The usual definition adopted is as follows:

Let $l$ be a likelihood function $l(\theta|x)$. A class Π of prior distributions is said to form a conjugate family if the posterior density

$$p(\theta|x) \propto p(\theta)\,l(\theta|x)$$

is in the class Π for all $x$ whenever the prior density is in Π.

There is actually a difficulty with this definition, as was pointed out by Diaconis and Ylvisaker (1979 and 1985). If Π is a conjugate family and $q(\theta)$ is any fixed function, then the family Ψ of densities proportional to $q(\theta)p(\theta)$ for $p \in \Pi$ is also a conjugate family. While this is a logical difficulty, we are in practice only interested in ‘natural’ families of distributions which are at least simply related to the standard families that are tabulated. In fact, there is a more precise definition available when we restrict ourselves to the exponential family (discussed in Section 2.11), and there are not many cases discussed in this book that are not covered by that definition. Nevertheless, the usual definition gives the idea well enough.
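The reason the enlarged family is still conjugate can be sketched in two lines. Here $q(\theta)$ denotes the fixed function, $p(\theta)$ a member of Π and $\tilde p$ the corresponding member of the enlarged family; the notation $\tilde p$ is introduced only for this sketch, and it is assumed that $q(\theta)p(\theta)$ is integrable so that it can be normalized.

$$
\begin{aligned}
\tilde p(\theta) &\propto q(\theta)\,p(\theta), \qquad p \in \Pi,\\
\tilde p(\theta|x) &\propto \tilde p(\theta)\,l(\theta|x) \propto q(\theta)\,\{p(\theta)\,l(\theta|x)\} \propto q(\theta)\,p(\theta|x).
\end{aligned}
$$

Since $p(\theta|x)$ is again in Π, the posterior is once more $q(\theta)$ times a member of Π, which is all that the definition asks for.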

2.10.2 Examples

Normal mean. In the case of several normal observations of known variance with a normal prior for the mean (discussed in Section 2.3), where

$$l(\theta|x) \propto \exp\{-\tfrac{1}{2}(\theta - \bar{x})^2/(\phi/n)\}$$

we showed that if the prior $p(\theta)$ is $N(\theta_0, \phi_0)$ then the posterior $p(\theta|x)$ is $N(\theta_1, \phi_1)$ for suitable $\theta_1$ and $\phi_1$. Consequently if Π is the class of all normal distributions, then the posterior is in Π for all $x$ whenever the prior is in Π. Note, however, that it would not do to let Π be the class of all normal distributions with any mean but fixed variance (at least unless we regard the sample size as fixed once and for all); Π must in some sense be ‘large enough.’
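As a rough numerical sketch of this example (not code from the text), the function below applies the standard updating formulas for a normal mean with known variance, which are assumed here rather than derived: precisions add, so that $1/\phi_1 = 1/\phi_0 + n/\phi$, and $\theta_1 = \phi_1(\theta_0/\phi_0 + n\bar{x}/\phi)$. The function name and the data are purely illustrative.

```python
def normal_mean_update(theta0, phi0, xs, phi):
    """Posterior N(theta1, phi1) for a normal mean with known variance phi.

    theta0, phi0: mean and variance of the normal prior.
    xs: the observations; phi: their known variance.
    """
    n = len(xs)
    xbar = sum(xs) / n
    # Precisions (reciprocal variances) add.
    phi1 = 1.0 / (1.0 / phi0 + n / phi)
    # The posterior mean is a precision-weighted average of theta0 and xbar.
    theta1 = phi1 * (theta0 / phi0 + n * xbar / phi)
    return theta1, phi1


# Hypothetical data: prior N(0, 4), four observations with known variance 1.
theta1, phi1 = normal_mean_update(0.0, 4.0, [1.2, 0.8, 1.0, 1.0], 1.0)
print(theta1, phi1)
```

The posterior variance always comes out smaller than the prior variance, which is why a family of normal distributions with one fixed variance cannot be conjugate.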

Normal variance. In the case of the normal variance, where

$$l(\phi|x) \propto \phi^{-n/2}\exp(-\tfrac{1}{2}S/\phi)$$

we showed that if the prior $p(\phi)$ is $S_0\chi_{\nu}^{-2}$ then the posterior $p(\phi|x)$ is $(S_0+S)\chi_{\nu+n}^{-2}$. Consequently, if Π is the class of distributions of constant multiples of inverse chi-squares, then the posterior is in Π whenever the prior is. Again, it is necessary to take Π as a two-parameter rather than a one-parameter family.
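A minimal sketch of this update follows. It assumes, as an illustration rather than something stated in this section, that the mean μ is known and that $S$ is the sum of squared deviations $\sum (x_i - \mu)^2$; a prior that is $S_0$ times an inverse chi-squared on ν degrees of freedom then becomes $S_0 + S$ times an inverse chi-squared on $\nu + n$ degrees of freedom. The function names and data are hypothetical.

```python
def normal_variance_update(S0, nu, xs, mu):
    """Update an S0 * inverse-chi-squared(nu) prior for a normal variance.

    Assumes the mean mu is known. Returns (S1, nu1), the scale and the
    degrees of freedom of the posterior.
    """
    S = sum((x - mu) ** 2 for x in xs)  # sum of squares about the known mean
    return S0 + S, nu + len(xs)


def scaled_inv_chi2_mean(S, nu):
    """Mean S / (nu - 2) of an S * inverse-chi-squared(nu) variable."""
    if nu <= 2:
        raise ValueError("the mean exists only for nu > 2")
    return S / (nu - 2)


# Hypothetical prior and data.
S1, nu1 = normal_variance_update(S0=10.0, nu=4, xs=[1.0, 3.0, 2.0, 4.0], mu=2.0)
print(S1, nu1, scaled_inv_chi2_mean(S1, nu1))
```

Both the scale and the degrees of freedom change, which illustrates why a one-parameter family would not be closed under updating.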

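Poisson distribution. Suppose $x = (x_1, x_2, \dots, x_n)$ is a sample from the Poisson distribution of mean λ, that is $x_i \sim P(\lambda)$. Then as we noted in the last section

$$l(\lambda|x) \propto \lambda^{T}\exp(-n\lambda)$$

where $T = \sum x_i$. If λ has a prior distribution of the form

$$p(\lambda) \propto \lambda^{\nu/2-1}\exp(-\tfrac{1}{2}S_0\lambda)$$

that is $\lambda \sim S_0^{-1}\chi_{\nu}^{2}$ so that λ is a multiple of a chi-squared random variable, then the posterior is

$$p(\lambda|x) \propto \lambda^{(\nu+2T)/2-1}\exp\{-\tfrac{1}{2}(S_0+2n)\lambda\}$$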

Consequently, if Π is the class of distributions of constant multiples of chi-squared random variables, then the posterior is in Π whenever the prior is. There are three points to be drawn to your attention. Firstly, this family is closely related to, but different from, the conjugate family in the previous example. Secondly, the conjugate family consists of a family of continuous distributions although the observations are discrete; the point is that this discrete distribution depends on a continuous parameter. Thirdly, the conjugate family in this case is usually referred to in terms of the gamma distribution, but the chi-squared distribution is preferred here in order to minimize the number of distributions you need to know about and because when you need to use tables, you are likely to refer to tables of chi-squared in any case; the two descriptions are of course equivalent.
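The updating rule for this example can be sketched in a few lines: if the prior for λ is a chi-squared variable on ν degrees of freedom divided by $S_0$, then after $n$ observations with total $T$ the posterior is a chi-squared variable on $\nu + 2T$ degrees of freedom divided by $S_0 + 2n$. The conversion to the gamma description uses the standard fact that such a distribution is a gamma distribution with shape $\nu/2$ and rate $S/2$. Function names and data are hypothetical.

```python
def poisson_update(S0, nu, xs):
    """Update a prior lambda ~ chi-squared(nu) / S0 with Poisson counts xs.

    Returns (S1, nu1) such that the posterior is chi-squared(nu1) / S1.
    """
    n = len(xs)
    T = sum(xs)
    return S0 + 2 * n, nu + 2 * T


def as_gamma(S, nu):
    """Equivalent gamma description: (shape, rate) = (nu / 2, S / 2)."""
    return nu / 2, S / 2


# Hypothetical prior chi-squared(4) / 2 (prior mean 2) and five counts.
S1, nu1 = poisson_update(S0=2, nu=4, xs=[3, 1, 4, 2, 2])
shape, rate = as_gamma(S1, nu1)
posterior_mean = shape / rate  # the same as nu1 / S1
print(S1, nu1, posterior_mean)
```

Working with the pair $(S, \nu)$ keeps the chi-squared description used here, while `as_gamma` gives the parameters that most software expects.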

Binomial distribution. Suppose that $k$ has a binomial distribution of index $n$ and parameter π. Then

$$l(\pi|k) \propto \pi^{k}(1-\pi)^{n-k}$$

We say that π has a beta distribution with parameters α and β, denoted $\mathrm{Be}(\alpha, \beta)$, if its density is of the form

$$p(\pi) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$$

(the fact that $\alpha-1$ and $\beta-1$ appear in the indices rather than α and β is for technical reasons). The beta distribution is described in more detail in Appendix A. If, then, π has a beta prior density, it is clear that it has a beta posterior density, so that the family of beta densities forms a conjugate family. It is a simple extension that this family is still conjugate if we have a sample of several observations rather than just one observation from a binomial distribution.
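Multiplying the beta density by the binomial likelihood simply adds the number of successes to the first parameter and the number of failures to the second. A minimal sketch, with hypothetical function names and data:

```python
def beta_binomial_update(alpha, beta, k, n):
    """Posterior Be(alpha + k, beta + n - k) after k successes in n trials."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return alpha + k, beta + (n - k)


def beta_mean(alpha, beta):
    """Mean alpha / (alpha + beta) of a Be(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: Be(2, 2) prior, 7 successes in 10 trials.
a1, b1 = beta_binomial_update(2, 2, k=7, n=10)
print(a1, b1, beta_mean(a1, b1))
```

Applying the function repeatedly, once per observation, gives the same result as a single update with the pooled successes and trials, which is the sense in which the family remains conjugate for a sample.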

2.10.3 Mixtures of conjugate densities

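Suppose we have a likelihood $l(\theta|x)$ and $p_1(\theta)$ and $p_2(\theta)$ are both densities in a conjugate family Π which give rise to posteriors $p_1(\theta|x)$ and $p_2(\theta|x)$ respectively. Let α and β be any non-negative real numbers summing to unity, and write

$$p(\theta) = \alpha\,p_1(\theta) + \beta\,p_2(\theta).$$

Then (taking a little more care with constants of proportionality than usual) it is easily seen that the posterior corresponding to the prior $p(\theta)$ is

$$p(\theta|x) = \alpha'\,p_1(\theta|x) + \beta'\,p_2(\theta|x)$$

where

$$\alpha' \propto \alpha\int p_1(\theta)\,l(\theta|x)\,d\theta \quad\text{and}\quad \beta' \propto \beta\int p_2(\theta)\,l(\theta|x)\,d\theta$$

with the constant of proportionality being such that

$$\alpha' + \beta' = 1.$$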

More generally, it is clearly possible to take any convex combination of more than two priors in Π and get a corresponding convex combination of the respective posteriors. Strictly in accordance with the definition given, this would allow us to extend the definition of Π to include all such convex combinations, but this would not retain the ‘naturalness’ of families such as the normal or the inverse chi-squared.
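The step described as easily seen can be spelled out for a general convex combination. The symbols $w_i$, $p_i$ and $m_i$ are introduced only for this sketch: $w_i$ are the prior weights, $p_i$ the component priors in Π, and $m_i$ the integral of the likelihood against the $i$th component.

$$
\begin{aligned}
p(\theta) &= \sum_i w_i\,p_i(\theta), \qquad w_i \geq 0,\quad \sum_i w_i = 1,\\
m_i &= \int p_i(\theta)\,l(\theta|x)\,d\theta, \qquad p_i(\theta|x) = p_i(\theta)\,l(\theta|x)/m_i,\\
p(\theta|x) &\propto \sum_i w_i\,p_i(\theta)\,l(\theta|x) = \sum_i w_i m_i\,p_i(\theta|x),\\
p(\theta|x) &= \sum_i w_i'\,p_i(\theta|x), \qquad w_i' = \frac{w_i m_i}{\sum_j w_j m_j}.
\end{aligned}
$$

Any constant factor in the likelihood multiplies every $m_i$ equally and so cancels from the new weights. Taking two components with $w_1 = \alpha$ and $w_2 = \beta$ recovers the two-component case above.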

The idea can, however, be useful if, for example, you have a bimodal prior distribution. An example quoted by Diaconis and Ylvisaker (1985) is as follows. To follow this example, it may help to refer to Section 3.1 on ‘The binomial distribution’, or to return to it after you have read that section. Diaconis and Ylvisaker observe that there is a big difference between spinning a coin on a table and tossing it in the air. While tossing often leads to about an even proportion of ‘heads’ and ‘tails’, spinning often leads to proportions like $\frac{1}{3}$ or $\frac{2}{3}$; we shall write π for the proportion of heads. They say that the reasons for this bias are not hard to infer, since the shape of the edge will be a strong determining factor – indeed magicians have coins that are slightly shaved; the eye cannot detect the shaving but the spun coin always comes up ‘heads’. Assuming that they were not dealing with one of the said magician’s coins, they thought that a fifty–fifty mixture (i.e. $\alpha = \beta = \frac{1}{2}$) of two beta densities, namely, $\mathrm{Be}(10, 20)$ (proportional to $\pi^{10-1}(1-\pi)^{20-1}$) and $\mathrm{Be}(20, 10)$ (proportional to $\pi^{20-1}(1-\pi)^{10-1}$), would seem a reasonable prior (actually they consider other possibilities as well). This is a bimodal distribution, which of course no beta density is, having modes, that is maxima of the density, near to the modes $\pi = 0.32$ and $\pi = 0.68$ of the components.

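They then spun a coin ten times, getting ‘heads’ three times. This gives a likelihood proportional to $\pi^{3}(1-\pi)^{7}$ and so

$$p(\pi|k) \propto \tfrac{1}{2}B(10,20)^{-1}\pi^{13-1}(1-\pi)^{27-1} + \tfrac{1}{2}B(20,10)^{-1}\pi^{23-1}(1-\pi)^{17-1}$$

that is, writing $B(a,b)$ for the beta function and $\mathrm{Be}(a,b)$ for the normalized beta density,

$$p(\pi|k) \propto B(10,20)^{-1}B(13,27)\,\mathrm{Be}(13,27) + B(20,10)^{-1}B(23,17)\,\mathrm{Be}(23,17)$$

or, since 13+27=23+17,

$$p(\pi|k) \propto \frac{\Gamma(13)\Gamma(27)}{\Gamma(10)\Gamma(20)}\,\mathrm{Be}(13,27) + \frac{\Gamma(23)\Gamma(17)}{\Gamma(20)\Gamma(10)}\,\mathrm{Be}(23,17)$$

From the fact that $\Gamma(n) = (n-1)!$, it is easily deduced that

$$p(\pi|k) = \tfrac{115}{129}\,\mathrm{Be}(13,27) + \tfrac{14}{129}\,\mathrm{Be}(23,17).$$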

We can deduce some properties of this posterior from those of the component betas. For example, the probability that π is greater than 0.5 is the sum of 115/129 times the probability that a $\mathrm{Be}(13,27)$ is greater than 0.5 and 14/129 times the probability that a $\mathrm{Be}(23,17)$ is greater than 0.5; and similarly the mean is an appropriately weighted average of the means.
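The arithmetic of this example can be checked with exact fractions. The sketch below is an illustration rather than anything taken from Diaconis and Ylvisaker: it starts from the fifty–fifty mixture of Be(10, 20) and Be(20, 10), reweights each component by its marginal likelihood $B(a+k, b+n-k)/B(a, b)$ for $k = 3$ heads in $n = 10$ spins, and then evaluates the mixture mean and the probability that π exceeds 0.5. All function names are hypothetical, and the tail probability relies on the standard identity linking the beta distribution with integer parameters to the binomial distribution.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_upper_tail(a, b, x):
    """P(pi > x) for pi ~ Be(a, b) with positive integer a and b.

    Uses P(Be(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a))


def update_mixture(components, k, n):
    """Update a mixture of betas after k heads in n spins.

    components: list of (weight, a, b). Each weight is multiplied by the
    marginal likelihood B(a + k, b + n - k) / B(a, b), then renormalized.
    """
    raw = [(w * beta_fn(a + k, b + n - k) / beta_fn(a, b), a + k, b + n - k)
           for w, a, b in components]
    total = sum(w for w, _, _ in raw)
    return [(w / total, a, b) for w, a, b in raw]


prior = [(Fraction(1, 2), 10, 20), (Fraction(1, 2), 20, 10)]
posterior = update_mixture(prior, k=3, n=10)
print(posterior)

half = Fraction(1, 2)
prob_above_half = sum(w * beta_upper_tail(a, b, half) for w, a, b in posterior)
mean = sum(w * Fraction(a, a + b) for w, a, b in posterior)
print(float(prob_above_half), float(mean))
```

The posterior weights come out as exactly 115/129 and 14/129, and both summaries are obtained from the component betas without any numerical integration.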

These ideas are worth bearing in mind if you have a complicated prior which is not fully dominated by the data, and yet want to obtain a posterior about which at least something can be said without complicated numerical integration.

2.10.4 Is your prior really conjugate?

The answer to this question is, almost certainly, ‘No’. Nevertheless, it is often the case that the family of conjugate priors is large enough that there is one that is sufficiently close to your real prior beliefs that the resulting posterior is barely distinguishable from the posterior that comes from using your real prior. When this is so, there are clear advantages in using a conjugate prior because of the greater simplicity of the computations. You should, however, be aware that cases can arise when no member of the conjugate family is, in the aforementioned sense, close enough, and then you may well have to proceed using numerical integration if you want to investigate the properties of the posterior.
