3.2 Reference prior for the binomial likelihood

3.2.1 Bayes’ postulate

The Rev. Thomas Bayes himself in Bayes (1763) put forward arguments in favour of a uniform prior

$$ p(\pi) = 1 \qquad (0 \leqslant \pi \leqslant 1) $$

(which, unlike the choice of a prior uniform over the whole real line, is a proper density in that it integrates to unity) as the appropriate one to use when we are ‘completely ignorant’. This choice of prior has long been known as Bayes’ postulate, as distinct from his theorem. The same prior was used by Laplace (1774). It is a member of the conjugate family, to wit Be(1, 1).

Bayes’ arguments are quite intricate, and still repay study. Nevertheless, he seems to have had some doubts about the validity of the postulate, and these doubts appear to have been partly responsible for the fact that his paper was not published in his lifetime, but rather communicated posthumously by his friend Richard Price.

The postulate seems intuitively reasonable, in that it treats all values of π on a level and thus reflects the fact that you see no reason for preferring any one value to any other. However, you should not be too hasty in endorsing it, because ignorance about the value of π presumably implies ignorance about the value of any function of π, and yet, when the change of variable rule is used, a uniform prior for π will not usually imply a uniform prior for any function of π.
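This lack of invariance is easy to see numerically. The following sketch (in Python, with φ = π² as an arbitrary illustrative choice of function) draws π uniformly and shows that the induced distribution of φ is far from uniform:

```python
import random

random.seed(0)
N = 100_000
# Draw pi uniformly on (0, 1) and look at the transformed value phi = pi**2.
phi = [random.random() ** 2 for _ in range(N)]

# If phi were also uniform, each quarter of (0, 1) would carry about 25% of the mass.
lo = sum(1 for p in phi if p < 0.25) / N   # phi < 0.25 happens iff pi < 0.5
hi = sum(1 for p in phi if p >= 0.75) / N  # phi >= 0.75 happens iff pi >= sqrt(0.75)
print(f"P(phi < 0.25)  = {lo:.3f}")   # close to 0.50, not 0.25
print(f"P(phi >= 0.75) = {hi:.3f}")   # close to 1 - sqrt(0.75) = 0.134, not 0.25
```

So ignorance expressed as uniformity in π piles up the mass of φ near 0, which is hardly ignorance about φ.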

One possible argument for it is as follows. A ‘natural’ estimator for the parameter π of a binomial distribution of index n is the observed proportion x/n of successes, and it might seem a sensible estimator to use when we have no prior information. It is in fact the maximum likelihood estimator, that is, the value of π for which the likelihood

$$ l(\pi \mid x) = \binom{n}{x}\, \pi^{x} (1-\pi)^{n-x} $$

is a maximum. In classical or sampling theory statistics it is also commended for various reasons which do not usually carry much weight with Bayesians, for example that it is unbiased, that is,

$$ \mathrm{E}(x/n) = \pi $$

(the expectation being taken over repeated sampling) whatever the value of π is. Indeed, it is not hard to show that it is a minimum variance unbiased estimator (MVUE).
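The unbiasedness of x/n is easily checked by simulation; the sketch below (with the arbitrary illustrative values n = 25 and π = 0.3) averages the estimator over repeated samples:

```python
import random
import statistics

random.seed(2)
n, pi_true, reps = 25, 0.3, 10_000  # arbitrary illustrative values
# Each estimate is the proportion of successes x/n in a fresh binomial sample.
estimates = [sum(random.random() < pi_true for _ in range(n)) / n
             for _ in range(reps)]
print(f"average of x/n over {reps} samples: {statistics.mean(estimates):.4f}")
# The average is close to pi = 0.3, reflecting E(x/n) = pi.
```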

Now if you have a Be(α, β) prior and so get a posterior which is Be(α + x, β + n − x), it might seem natural to say that a good estimator for π would be obtained by finding the value at which the posterior density is a maximum, that is, the posterior mode. This procedure is clearly related to the idea of maximum likelihood. Since the posterior mode occurs at

$$ \pi = \frac{\alpha + x - 1}{\alpha + \beta + n - 2}, $$

as is easily checked by differentiation, this posterior mode coincides with x/n if and only if α = β = 1, that is, if and only if the prior is uniform.
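The relationship between the posterior mode and x/n can be illustrated in a few lines of code (the values x = 3, n = 10 are arbitrary):

```python
def posterior_mode(x, n, alpha, beta):
    """Mode of the Be(alpha + x, beta + n - x) posterior
    (valid when both posterior parameters exceed 1)."""
    return (alpha + x - 1) / (alpha + beta + n - 2)

x, n = 3, 10  # arbitrary illustrative data
print(posterior_mode(x, n, 1, 1))  # uniform Be(1, 1) prior: 0.3, i.e. exactly x/n
print(posterior_mode(x, n, 2, 2))  # a non-uniform prior pulls the mode towards 1/2
```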

Jeffreys (1961, Section 3.1) argued that ‘Again, is there not a preponderance at the extremes? Certainly if we take the Bayes–Laplace rule right up to the extremes we are led to results that do not correspond to anybody’s way of thinking.’

3.2.2 Haldane’s prior

Another suggestion, due to Haldane (1931), is to use a Be(0, 0) prior, which has density

$$ p(\pi) \propto \pi^{-1}(1-\pi)^{-1}, $$

which is an improper density and is equivalent (by the usual change of variable argument) to a prior uniform in the log-odds

$$ \lambda = \log\{\pi/(1-\pi)\}. $$

An argument for this prior based on the ‘naturalness’ of the estimator x/n when we have no prior information is that the mean of the posterior distribution for π, namely Be(α + x, β + n − x), is

$$ \mathrm{E}(\pi \mid x) = \frac{\alpha + x}{\alpha + \beta + n}, $$

which coincides with x/n if and only if α = β = 0. (There is a connection here with the classical notion of the unbiasedness of x/n.)
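This can be confirmed with a short sketch (the values x = 3, n = 10 are again arbitrary illustrative data):

```python
def posterior_mean(x, n, alpha, beta):
    """Mean of the Be(alpha + x, beta + n - x) posterior."""
    return (alpha + x) / (alpha + beta + n)

x, n = 3, 10  # arbitrary illustrative data
print(posterior_mean(x, n, 0, 0))  # Haldane Be(0, 0) prior: exactly x/n = 0.3
print(posterior_mean(x, n, 1, 1))  # uniform Be(1, 1) prior: 4/12, pulled towards 1/2
```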

Another argument that has been used for this prior is that since any observation always increases either α or β, the greatest possible ignorance corresponds to taking α and β as small as possible. For a beta density to be proper (i.e. to have a finite integral and so be normalizable, so that its integral is unity), it is necessary and sufficient that α and β should both be strictly greater than 0. Taking both parameters right down to this limit can then be taken as an indication that the right reference prior is Be(0, 0), even though the limit itself is improper.

A point against this choice of prior is that if we have one observation with probability of success π, then use of this prior results in a posterior which is Be(1, 0) if that observation is a success and Be(0, 1) if it is a failure. However, a Be(1, 0) distribution gives infinitely more weight to values near 1 than to values away from 1, and so it would seem that a sample with just one success in it would lead us to conclude that all future observations will result in successes, which seems unreasonable on the basis of so small an amount of evidence.

3.2.3 The arc-sine distribution

A possible compromise between Be(1, 1) and Be(0, 0) is Be(1/2, 1/2), that is, the (proper) density

$$ p(\pi) \propto \pi^{-1/2}(1-\pi)^{-1/2}. $$

This distribution is sometimes called the arc-sine distribution (cf. Feller, 1968, Volume 1, Section III.4). In Section 3.3, we will see that a general principle known as Jeffreys’ rule suggests that this is the correct reference prior to use. However, Jeffreys’ rule is a guideline which cannot be followed blindly, so that in itself does not settle the matter.

The Be(1/2, 1/2) prior can easily be shown (by the usual change-of-variable rule) to imply a uniform prior for

$$ \sin^{-1}\sqrt{\pi}. $$

This transformation is related to the transformation of the data when

$$ x \sim \mathrm{B}(n, \pi), $$

in which z is defined by

$$ z = \sin^{-1}\sqrt{x/n}. $$

This transformation was first introduced in Section 1.5 on ‘Means and Variances’, where we saw that it results in the approximations

$$ \mathrm{E}z \approx \sin^{-1}\sqrt{\pi}, \qquad \mathrm{V}z \approx 1/(4n). $$

Indeed it turns out that

$$ z \approx \mathrm{N}\bigl(\sin^{-1}\sqrt{\pi},\ 1/(4n)\bigr), $$

where the symbol $\approx$ means ‘is approximately distributed as’ (see Section 3.10 on ‘Approximations based on the Likelihood’). To the extent that this is so, it follows that the transformation $z = \sin^{-1}\sqrt{x/n}$ puts the likelihood in data translated form, and hence that a uniform prior in $\sin^{-1}\sqrt{\pi}$, that is, a Be(1/2, 1/2) prior for π, is an appropriate reference prior.
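The variance-stabilizing property of the arc-sine transformation can be checked by simulation; in the sketch below (with the illustrative values n = 100 and a few values of π), the sample variance of z stays close to 1/(4n) whatever the value of π:

```python
import math
import random
import statistics

random.seed(1)
n, reps = 100, 4000  # illustrative sample size and number of replications
var_z = {}
for pi_true in (0.2, 0.5, 0.8):
    zs = []
    for _ in range(reps):
        x = sum(random.random() < pi_true for _ in range(n))  # x ~ B(n, pi)
        zs.append(math.asin(math.sqrt(x / n)))
    var_z[pi_true] = statistics.variance(zs)
    print(f"pi = {pi_true}: var(z) = {var_z[pi_true]:.5f}, 1/(4n) = {1 / (4 * n):.5f}")
# The sample variance of z stays near 1/(4n) = 0.0025 for every pi.
```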

3.2.4 Conclusion

The three aforementioned possibilities are not the only ones that have been suggested. For example, Zellner (1977) suggested the use of a prior

$$ p(\pi) \propto \pi^{\pi}(1-\pi)^{1-\pi} $$

[see also the references in Berger (1985, Section 3.3.4)]. However, this is difficult to work with because it is not in the conjugate family.

In fact, the three suggested conjugate priors Be(0, 0), Be(1/2, 1/2) and Be(1, 1) (and for that matter Zellner’s prior) do not differ enough to make much difference with even a fairly small amount of data, so the preceding discussion of the problem of a suitable reference prior may seem disproportionately lengthy; its real value is that it underlines the difficulty of giving a precise meaning to the notion of a prior distribution that represents ‘knowing nothing’. It may be worth your while trying a few examples to see how little difference there is between the possible priors in particular cases.
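For instance, the following sketch (with the arbitrary data x = 6, n = 20) computes the posterior mean and standard deviation under each of the three conjugate priors; the answers are very close:

```python
import math

def posterior_summary(x, n, alpha, beta):
    """Mean and standard deviation of the Be(alpha + x, beta + n - x) posterior."""
    a, b = alpha + x, beta + n - x
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

x, n = 6, 20  # arbitrary illustrative data
for name, (a0, b0) in [("Be(0, 0)", (0, 0)),
                       ("Be(1/2, 1/2)", (0.5, 0.5)),
                       ("Be(1, 1)", (1, 1))]:
    mean, sd = posterior_summary(x, n, a0, b0)
    print(f"{name:12s}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The posterior means range only from 0.300 to 0.318, and the standard deviations hardly move at all.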

In practice, the use of Be(0, 0) is favoured here, although it must be admitted that one reason for this is that it ties in with the use of HDRs found from tables of values of F based on HDRs for log F and hence obviates the need for a separate set of tables for the beta distribution. But in any case, we could use the method based on these tables and the results would not be very different from those based on any other appropriate tables.
