7.8 Empirical Bayes methods
7.8.1 Von Mises’ example
Only a very brief idea about empirical Bayes methods will be given in this chapter; more will be said about this topic in Chapter 8 and a full account can be found in Maritz and Lwin (1989). One of the reasons for this brief treatment is that, despite their name, very few empirical Bayes procedures are, in fact, Bayesian; for a discussion of this point see, for example, Deely and Lindley (1981).
The problems we will consider in this section are concerned with a sequence $x_i$ of observations such that the distribution of the $i$th observation $x_i$ depends on a parameter $\theta_i$, typically in such a way that $p(x_i\mid\theta_i)$ has the same functional form for all $i$. The parameters $\theta_i$ are themselves supposed to be a random sample from some (unknown) distribution, and it is this unknown distribution that plays the role of a prior distribution and so accounts for the use of the name of Bayes. There is a clear contrast with the situation in the rest of the book, where the prior distribution represents our prior beliefs, and so by definition it cannot be unknown. Further, the prior distribution in empirical Bayes methods is usually given a frequency interpretation, by contrast with the situation arising in true Bayesian methods.
One of the earliest examples of an empirical Bayes procedure was due to von Mises (1942). He supposed that in examining the quality of a batch of water for possible contamination by certain bacteria, $m = 5$ samples of a given volume were taken, and he was interested in determining the probability $\theta$ that a sample contains at least one bacterium. Evidently, the probability of $x$ positive results in the $m = 5$ samples is
$$p(x\mid\theta)=\binom{m}{x}\theta^x(1-\theta)^{m-x}$$
for a given value of $\theta$.
for a given value of θ. If the same procedure is to be used with a number of batches of different quality, then the predictive distribution (denoted to avoid ambiguity) is
where the density represents the variation of the quality θ of batches. [If comes from the beta family, and there is no particular reason why it should, then is a beta-binomial distribution, as mentioned at the end of Section 3.1 on ‘The binomial distribution’]. In his example, von Mises wished to estimate the density function on the basis of n = 3420 observations.
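If, purely for illustration, $p(\theta)$ is taken to be a beta density (which, as noted above, there is no particular reason to assume), the integral above has the closed beta-binomial form $\tilde p(x)=\binom{m}{x}B(x+\alpha,\,m-x+\beta)/B(\alpha,\beta)$. A minimal sketch, with hypothetical function names and an arbitrary $\mathrm{Be}(2,3)$ prior chosen only for the example:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(x, m, alpha, beta):
    # Predictive probability p~(x) when theta ~ Be(alpha, beta):
    # C(m, x) * B(x + alpha, m - x + beta) / B(alpha, beta)
    return comb(m, x) * exp(log_beta(x + alpha, m - x + beta) - log_beta(alpha, beta))

# m = 5 samples per batch, as in von Mises' example; the Be(2, 3)
# prior for the batch quality theta is an illustrative assumption.
probs = [beta_binomial_pmf(x, 5, 2.0, 3.0) for x in range(6)]
assert abs(sum(probs) - 1.0) < 1e-12  # a pmf must sum to one
```

With a uniform $\mathrm{Be}(1,1)$ prior this reduces, as it should, to the discrete uniform distribution on $\{0,\dots,m\}$.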
7.8.2 The Poisson case
Instead of considering the binomial distribution further, we shall consider a problem to do with the Poisson distribution which, of course, provides an approximation to the binomial distribution when the number $m$ of samples is large and the probability $\theta$ is small. Suppose that we have observations $x_i \sim \mathrm{P}(\lambda_i)$, where the $\lambda_i$ have a distribution with a density $p(\lambda)$, and that we have available $n$ past observations, among which $f_n(x)$ were equal to $x$ for $x = 0, 1, 2, \dots$. Thus, $f_n(x)$ is an empirical frequency and $f_n(x)/n$ is an estimate of the predictive density $\tilde p(x)$. As $x$ has a Poisson distribution for given $\lambda$,
$$\tilde p(x)=\int p(x\mid\lambda)\,p(\lambda)\,\mathrm{d}\lambda=\int\frac{\lambda^x\mathrm{e}^{-\lambda}}{x!}\,p(\lambda)\,\mathrm{d}\lambda.$$
Now suppose that, with this past data available, a new observation $x$ is made, and we want to say something about the corresponding value of $\lambda$. In Section 7.5 on ‘Bayesian decision theory’, we saw that the posterior mean of $\lambda$ is
$$\mathrm{E}(\lambda\mid x)=\frac{\int\lambda\,p(x\mid\lambda)\,p(\lambda)\,\mathrm{d}\lambda}{\int p(x\mid\lambda)\,p(\lambda)\,\mathrm{d}\lambda}=\frac{(x+1)\,\tilde p(x+1)}{\tilde p(x)}.$$
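The identity $\mathrm{E}(\lambda\mid x)=(x+1)\tilde p(x+1)/\tilde p(x)$ holds for any prior, which can be checked numerically. The sketch below (function names and the discretized prior are illustrative assumptions, not from the text) computes the posterior mean directly from Bayes' theorem and compares it with the ratio formula:

```python
from math import exp, log, lgamma

def poisson_pmf(x, lam):
    # p(x | lambda) = lambda^x e^{-lambda} / x!
    return exp(x * log(lam) - lam - lgamma(x + 1))

# An arbitrary discretized prior p(lambda) on a grid, for illustration only
grid = [0.05 * k for k in range(1, 401)]        # lambda in (0, 20]
weights = [exp(-lam / 3.0) for lam in grid]     # unnormalized prior density
total = sum(weights)
prior = [w / total for w in weights]

def predictive(x):
    # p~(x) = integral of p(x | lambda) p(lambda) d(lambda), here a finite sum
    return sum(poisson_pmf(x, lam) * p for lam, p in zip(grid, prior))

def posterior_mean(x):
    # E(lambda | x) computed directly from Bayes' theorem
    num = sum(lam * poisson_pmf(x, lam) * p for lam, p in zip(grid, prior))
    return num / predictive(x)

# Check: E(lambda | x) = (x + 1) p~(x + 1) / p~(x) for any prior
for x in range(5):
    assert abs(posterior_mean(x) - (x + 1) * predictive(x + 1) / predictive(x)) < 1e-9
```

The point of the identity is that the right-hand side involves only the predictive density $\tilde p$, which (unlike the prior) can be estimated from the past frequencies.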
To use this formula, we need to know the prior $p(\lambda)$, or at least to know $\tilde p(x)$ and $\tilde p(x+1)$, which we do not know. However, it is clear that a reasonable estimate of $\tilde p(x)$ is $\{f_n(x)+1\}/(n+1)$, after allowing for the latest observation. Similarly, a reasonable estimate for $\tilde p(x+1)$ is $f_n(x+1)/(n+1)$. It follows that a possible point estimate for the current value of $\lambda$, corresponding to the value resulting from a quadratic loss function, is
$$\delta_n(x)=\frac{(x+1)\,f_n(x+1)}{f_n(x)+1}.$$
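This estimator needs nothing beyond the table of past frequencies. A minimal sketch (the function name and the past counts are hypothetical, chosen only to illustrate):

```python
from collections import Counter

def empirical_bayes_poisson(past, x):
    """Empirical Bayes point estimate of lambda for a new Poisson count x,
    given a list of past counts. Computes (x + 1) f_n(x + 1) / (f_n(x) + 1),
    where f_n(x) is the number of past observations equal to x; the '+1'
    in the denominator allows for the latest observation."""
    f = Counter(past)  # missing keys count as zero
    return (x + 1) * f[x + 1] / (f[x] + 1)

# Illustrative past data (hypothetical counts, not von Mises' actual data)
past = [0] * 120 + [1] * 90 + [2] * 50 + [3] * 25 + [4] * 10 + [5] * 5
estimate = empirical_bayes_poisson(past, 1)  # (1 + 1) * 50 / (90 + 1) = 100/91
```

Note that the estimate can behave erratically for values of $x$ rarely seen in the past data (and is zero whenever $f_n(x+1)=0$), which motivates the smoother parametric alternative discussed below.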
This formula could be used in a case like that investigated by von Mises if the number m of samples taken from each batch were fairly large and the probability θ that a sample contained at least one bacterium were fairly small, so that the Poisson approximation to the binomial could be used.
This method can easily be adapted to any case where the posterior mean of the parameter of interest takes the form
$$\mathrm{E}(\lambda\mid x)=\frac{\psi(x)\,\tilde p(x+1)}{\tilde p(x)}$$
for some known function $\psi(x)$ (in the Poisson case $\psi(x)=x+1$), and there are quite a number of such cases (Maritz and Lwin, 1989, Section 1.3).
Going back to the Poisson case, if it were known that the underlying distribution were of the form $\lambda\sim S_0^{-1}\chi^2_\nu$ for some $S_0$ and $\nu$, then it is known (cf. Section 7.5) that
$$\mathrm{E}(\lambda\mid x)=\frac{\nu+2x}{S_0+2}.$$
In this case, we could use the past observations to estimate $S_0$ and $\nu$ in some way, by, say, $\hat S_0$ and $\hat\nu$, giving an alternative point estimate
$$\delta_n'(x)=\frac{\hat\nu+2x}{\hat S_0+2}$$
for the current value of $\lambda$.
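One simple way of producing $\hat S_0$ and $\hat\nu$ (an assumption for illustration; the text does not prescribe a method) is by matching moments. Since $\lambda\sim S_0^{-1}\chi^2_\nu$ gives $\mathrm{E}(x)=\nu/S_0$ and $\mathrm{var}(x)=\nu/S_0+2\nu/S_0^2$ for the predictive distribution, equating these to the sample mean and variance yields $\hat S_0=2\bar x/(s^2-\bar x)$ and $\hat\nu=\hat S_0\bar x$, valid when $s^2>\bar x$ (overdispersion). A sketch with hypothetical names:

```python
def fit_chi2_prior(past):
    """Method-of-moments estimates of S0 and nu for a prior
    lambda ~ S0^{-1} chi^2_nu (one simple fitting method among many).
    Matching E(x) = nu/S0 and var(x) = nu/S0 + 2*nu/S0**2 to the sample
    mean and variance gives S0 = 2*mean/(var - mean), nu = S0*mean,
    which requires the sample variance to exceed the mean."""
    n = len(past)
    mean = sum(past) / n
    var = sum((x - mean) ** 2 for x in past) / n
    if var <= mean:
        raise ValueError("sample variance must exceed the sample mean")
    s0 = 2 * mean / (var - mean)
    return s0, s0 * mean

def smoothed_estimate(past, x):
    # Alternative point estimate (nu_hat + 2x)/(S0_hat + 2); unlike the
    # frequency-based estimate, this is a smooth (linear) function of x.
    s0, nu = fit_chi2_prior(past)
    return (nu + 2 * x) / (s0 + 2)
```

Because $\delta_n'(x)$ is linear in $x$, successive values differ by the constant $2/(\hat S_0+2)$, in contrast to the possibly erratic frequency-based estimate.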
The advantage of an estimate like this is that, considered as a function of $x$, it is smoother than $\delta_n(x)$, and so it could be expected to do better. This is analogous to the situation in regression analysis, where a fitted regression line can be expected to give a better estimate of the mean of the dependent variable $y$ at a particular value of the independent variable $x$ than would be obtained by concentrating on the values of $y$ observed at that single value of $x$. On the other hand, the method just described does depend on assuming a particular form for the prior, which is probably not justifiable. There are, however, other methods of producing a ‘smoother’ estimate.
Empirical Bayes methods can also be used for testing whether a parameter θ lies in one or another of a number of sets, that is, for hypothesis testing and its generalizations.