8.1 The idea of a hierarchical model

8.1.1 Definition

So far, we have assumed that we have a single known form for our prior distribution. Sometimes, however, we feel uncertain about the extent of our prior knowledge. In a typical case, we have a first stage in which observations $\mathbf{x}$ have a density $p(\mathbf{x}\mid\boldsymbol{\theta})$ which depends on r unknown parameters

$$\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_r)$$

for which we have a prior density $p(\boldsymbol{\theta})$. Quite often we make one or more assumptions about the relationships between the different parameters $\theta_i$, for example, that they are independently and identically distributed [sometimes abbreviated i.i.d.] or that they are in increasing order. Such relationships are often referred to as structural.

In some cases, the structural prior knowledge is combined with a standard form of Bayesian prior belief about the parameters of the structure. Thus, in the case where the $\theta_i$ are independently and identically distributed, their common distribution might depend on a parameter $\phi$, which we often refer to as a hyperparameter. We are used to this situation in cases where $\phi$ is known, but sometimes it is unknown. When it is unknown, we have a second stage in which we suppose that we have a hyperprior $p(\phi)$ expressing our beliefs about possible values of $\phi$. In such a case, we say that we have a hierarchical prior; for the development of this idea, see Good (1980). It should be noted that the difficulty of specifying a second stage prior has made common the use of noninformative priors at the second stage (cf. Berger, 1985, Sections 3.6 and 4.6.1).

In Lindley’s words in his contribution to Godambe and Sprott (1971),

The type of problem to be discussed … is one in which there is a substantial amount of data whose probability structure depends on several parameters of the same type. For example, an agricultural trial involving many varieties, the parameters being the varietal means, or an educational test performed on many subjects, with their true scores as the unknowns. In both these situations the parameters are related, in one case by the common circumstances of the trial, in the other by the test used, so that a Bayesian solution, which is capable of including such prior feelings of relationship, promises to show improvements over the usual techniques.

There are obvious generalizations. For one thing, we might have a vector $\boldsymbol{\phi}$ of hyperparameters rather than a single hyperparameter. For another, we sometimes carry this process to a third stage, supposing that the prior for $\boldsymbol{\phi}$ depends on one or more hyper-hyperparameters $\boldsymbol{\psi}$ and so takes the form $p(\boldsymbol{\phi}\mid\boldsymbol{\psi})$. If $\boldsymbol{\psi}$ is unknown, then we have a hyper-hyperprior density $p(\boldsymbol{\psi})$ representing our beliefs about possible values of $\boldsymbol{\psi}$.
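The stages fit together multiplicatively; as a summary (in the notation just introduced), a full three-stage specification determines the joint density

```latex
% Joint density implied by a three-stage hierarchical specification
p(\boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\phi}, \boldsymbol{\psi})
  = p(\boldsymbol{x} \mid \boldsymbol{\theta})\,
    p(\boldsymbol{\theta} \mid \boldsymbol{\phi})\,
    p(\boldsymbol{\phi} \mid \boldsymbol{\psi})\,
    p(\boldsymbol{\psi})
```

and all Bayesian inferences then follow from conditioning this joint density on the observed $\mathbf{x}$.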

All of this will become clearer when we consider some examples. Examples from various fields are given to emphasize the fact that hierarchical models arise in many different contexts.

8.1.2 Examples

8.1.2.1 Hierarchical Poisson model

In Section 7.8 on ‘Empirical Bayes methods’, we considered a case where we had observations $x_i \sim P(\lambda_i)$ where the $\lambda_i$ have a distribution with a density $p(\lambda)$, and then went on to specialize to the case where $p(\lambda)$ was of the conjugate form $S_0^{-1}\chi^2_\nu$ for some $S_0$ and $\nu$. This is a structural relationship as defined earlier in which the parameters are the $\lambda_i$ and the hyperparameters are $(S_0, \nu)$. To fit this situation into the hierarchical framework we only need to take a prior distribution for the hyperparameters. Since they are both in the range $(0, \infty)$, one possibility might be to take independent reference priors $p(S_0) \propto 1/S_0$ and $p(\nu) \propto 1/\nu$, or proper priors close to these over a large range.
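As a concrete sketch of the two sampling stages (not of the full analysis), one can simulate from this model; the values of $S_0$, $\nu$ and r below are purely illustrative. Note that an $S_0^{-1}\chi^2_\nu$ variable is a gamma variable with shape $\nu/2$ and scale $2/S_0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameter values (the hierarchical analysis would
# place a prior on S0 and nu rather than fixing them).
S0, nu = 2.0, 4.0
r = 6  # number of units

# First stage of the prior: lambda_i ~ S0^{-1} chi^2_nu, i.e. a gamma
# distribution with shape nu/2 and scale 2/S0.
lam = rng.gamma(shape=nu / 2, scale=2.0 / S0, size=r)

# Sampling stage: x_i ~ P(lambda_i).
x = rng.poisson(lam)
print(lam.round(2), x)
```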

8.1.2.2 Test scores

Suppose that a number of individuals take intelligence tests (‘IQ tests’) on which their scores are normally distributed with a known variance $\phi$ but with a mean which depends on the ‘true abilities’ of the individuals concerned, so that $x_i \sim N(\theta_i, \phi)$. It may well happen that the individuals come from a population in which the true abilities are (at least to a reasonable approximation) normally distributed, so that $\theta_i \sim N(\mu, \psi)$. In this case the hyperparameters are $(\mu, \psi)$. If informative priors are taken at this stage, a possible form would be to take $\mu$ and $\psi$ as independent with $\mu \sim N(\lambda, \psi_0)$ and $\psi \sim S_0\chi_\nu^{-2}$ for suitable values of the hyper-hyperparameters $\lambda$, $\psi_0$, $S_0$ and $\nu$.
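With all the hyperparameters treated as known, the posterior mean of each true ability $\theta_i$ is a precision-weighted compromise between the observed score and the population mean, shrinking $x_i$ towards $\mu$ by the factor $\psi/(\psi+\phi)$. A minimal sketch (every numerical value here is hypothetical):

```python
import numpy as np

# Hypothetical known values: test error variance phi and an ability
# distribution N(mu, psi) for the population.
phi, mu, psi = 100.0, 100.0, 225.0

x = np.array([85.0, 100.0, 130.0])  # observed IQ scores

# Posterior for each theta_i is normal; its mean shrinks x_i towards mu
# by the factor psi / (psi + phi) (a precision-weighted average).
shrink = psi / (psi + phi)
post_mean = mu + shrink * (x - mu)
print(post_mean.round(1))
```

An observed 85 is pulled up towards 100 and an observed 130 is pulled down, which is exactly the ‘prior feeling of relationship’ Lindley refers to above.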

8.1.2.3 Baseball statistics

The batting average of a baseball player is defined as the number of hits $S_i$ divided by the number of times at bat; it is always a number between 0 and 1. We will suppose that each of r players has been n times at bat and that the batting average of the ith player $Y_i = S_i/n$ is such that $S_i \sim B(n, \pi_i)$, so that using the inverse root-sine transformation (see Section 3.2 on ‘Reference prior for the binomial likelihood’), we see that if

$$X_i = \sqrt{n}\,\sin^{-1}(2Y_i - 1)$$

then to a good approximation

$$X_i \sim N(\theta_i, 1), \qquad \text{where } \theta_i = \sqrt{n}\,\sin^{-1}(2\pi_i - 1).$$

We might then suppose that

$$\theta_i \sim N(\mu, \psi).$$

Finally, we suppose that $\psi$ is known and that the prior knowledge of μ is weak, so that over the range over which the likelihood is appreciable, the prior density of μ is constant (cf. Section 2.5 on ‘Locally uniform priors’).

This example is considered by Efron and Morris (1975 and 1977); we give further consideration to it in Section 8.3. These authors also consider an example arising from data on the incidence of toxoplasmosis (a disease of the blood that is endemic in much of Central America) among samples of various sizes from 36 cities in El Salvador.
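The point of the inverse root-sine transformation is that it makes the sampling variance approximately 1 whatever the underlying success probability, which is easy to check by simulation (the values of n and $\pi$ below are illustrative, not the Efron–Morris data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Y = S/n with S ~ B(n, pi); X = sqrt(n) * arcsin(2Y - 1) should have
# variance close to 1 for moderate n and pi not too near 0 or 1.
n, pi = 45, 0.3
S = rng.binomial(n, pi, size=200_000)
Y = S / n
X = np.sqrt(n) * np.arcsin(2 * Y - 1)
v = X.var()
print(round(v, 3))  # close to 1
```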

8.1.2.4 Poisson process with a change point

Suppose that we have observations $x_i$ for $i = 1, 2, \dots, n$ which represent the number of times a rare event has occurred in each of n equal time intervals and that we have reason to believe that the frequency of this event has changed abruptly from one level to another at some intermediate value of i. We might then be interested in deciding whether there really is evidence of such an abrupt change and, if so, then investigating when it took place.

To model this situation, we suppose that $x_i \sim P(\lambda)$ for $i \le k$ while $x_i \sim P(\mu)$ for $i > k$. We then take independent priors for the parameters $(\lambda, \mu, k)$ such that

a. $k \sim \mathrm{UD}(1, n)$, that is, k has a discrete uniform distribution on [1, n];
b. $\lambda = \gamma U$ where U has an exponential distribution of mean 1 (or equivalently a one-parameter gamma distribution with parameter 1, so that $U \sim G(1)$);
c. $\mu = \delta V$ where V is independent of U and has the same distribution.

These distributions depend on the two parameters $\gamma$ and $\delta$, so that $(\gamma, \delta)$ are hyperparameters.

Finally, we suppose that $\gamma$ and $\delta$ have independent prior distributions which are multiples of chi-squared, so that for suitable values of the parameters $\xi$, $\zeta$, α and β we have $\gamma \sim \xi\chi_\alpha^2$ and $\delta \sim \zeta\chi_\beta^2$.

This situation is a slightly simplified version of one described by Carlin et al. (1992), Tanner (1996, Sections 6.2.2 and 6.2.3) and Carlin and Louis (2008, Chapter 5, Exercises 8–10) as a model for the numbers of coal-mining disasters (defined as accidents resulting in the deaths of ten or more miners) in Great Britain for the years from 1851 to 1962 inclusive. We shall consider this example in detail in Section 9.4 on ‘The Gibbs sampler’.
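To see how the change point might be located, suppose for the moment that the two Poisson rates are known; under the discrete uniform prior, the posterior over k is then proportional to the likelihood, which can be computed for every k at once from cumulative sums. A sketch on simulated data (the rates and the true change point are hypothetical; n = 112 matches the 1851–1962 span):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated counts with an abrupt drop in rate after interval 40
# (both rates treated as known here purely for illustration).
n, lam, mu = 112, 3.0, 1.0
true_k = 40
x = np.concatenate([rng.poisson(lam, true_k), rng.poisson(mu, n - true_k)])

# Log-likelihood of each candidate change point k; the log(x_i!) terms
# are the same for every k and so can be dropped.
i = np.arange(1, n + 1)
cum = np.concatenate([[0.0], np.cumsum(x)])
loglik = (cum[i] * np.log(lam) - i * lam
          + (cum[n] - cum[i]) * np.log(mu) - (n - i) * mu)

# Discrete uniform prior on k => posterior proportional to the likelihood.
post = np.exp(loglik - loglik.max())
post /= post.sum()
k_hat = int(i[np.argmax(post)])
print(k_hat)  # posterior mode, close to true_k
```

In the full hierarchical analysis the rates are of course unknown, which is precisely why the Gibbs sampler of Section 9.4 is needed.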

8.1.2.5 Risk of tumour in a group of rats

In a study of tumours among laboratory rats of type ‘F344’, the probability of tumours in different groups of rats is believed to vary because of differences between rats and experimental conditions among the experiments. It may well be reasonable to suppose that the probabilities come from a beta distribution, but it is not clear a priori which prior beta distribution to take.

In this case, the number of rats $y_i$ that develop tumours in the ith group, which is of size $n_i$, is such that $y_i \sim B(n_i, \pi_i)$, while $\pi_i \sim \mathrm{Be}(\alpha, \beta)$. We then take some appropriate hyperprior distribution for the hyperparameters $(\alpha, \beta)$; it has been suggested that a suitable noninformative hyperprior is

$$p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}.$$

This example is discussed in detail and the above hyperprior is derived in Gelman et al. (2004, Sections 5.1 and 5.3).
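Because the binomial first stage and the beta second stage combine into a beta-binomial marginal likelihood, the joint posterior of α and β can be evaluated directly; a crude grid sketch using the $(\alpha+\beta)^{-5/2}$ hyperprior (the counts below are invented for illustration and are not the actual tumour data):

```python
import numpy as np
from math import lgamma

# Invented illustrative counts: y_i tumours out of n_i rats per group.
y = [0, 1, 2, 2, 5]
n = [20, 20, 19, 18, 25]

def betaln(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_post(alpha, beta):
    """Log posterior of (alpha, beta) up to an additive constant:
    beta-binomial marginal log-likelihood (the binomial coefficients
    do not involve alpha or beta and drop out) plus the log of the
    (alpha + beta)^(-5/2) hyperprior."""
    loglik = sum(betaln(alpha + yi, beta + ni - yi) - betaln(alpha, beta)
                 for yi, ni in zip(y, n))
    return loglik - 2.5 * np.log(alpha + beta)

# Crude grid search for the posterior mode; a real analysis would
# integrate over the grid or use MCMC.
grid = [(a, b) for a in np.linspace(0.5, 8, 50) for b in np.linspace(1, 60, 80)]
ahat, bhat = max(grid, key=lambda ab: log_post(*ab))
print(round(ahat, 2), round(bhat, 2))
```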

8.1.2.6 Vaccination against Hepatitis B

In a study of the effect of vaccination for Hepatitis B in the Gambia, it was supposed that $y_{ij}$, the log anti-HB titre (the amount of surface-antibody in blood samples) in the jth observation for the ith infant, taken at time $t_{ij}$, could be modelled as follows:

$$y_{ij} \sim N(\alpha_i + \beta_i \log t_{ij},\ \phi)$$

(cf. Gilks et al., 1993, or Spiegelhalter et al., 1996, Section 2.2). Here, the parameters are the $(\alpha_i, \beta_i)$ and the hyperparameters are $(\phi, \alpha_0, \beta_0)$. The hyperprior distributions for α and β are independent normals, $\alpha_i \sim N(\alpha_0, \phi_\alpha)$ and $\beta_i \sim N(\beta_0, \phi_\beta)$, and we take a reference prior for $\phi$. Further, we have hyper-hyperparameters $(\phi_\alpha, \phi_\beta)$ for which we take reference priors, so that

$$p(\alpha_0, \beta_0, \phi_\alpha, \phi_\beta) \propto \frac{1}{\phi_\alpha\,\phi_\beta}.$$

Actually, Gilks et al. take proper priors which over reasonable values of $\phi_\alpha$ and $\phi_\beta$ behave similarly, namely normal priors with very large variances for $\alpha_0$ and $\beta_0$ and gamma priors with very small parameters for the precisions $\phi_\alpha^{-1}$ and $\phi_\beta^{-1}$.
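A minimal forward simulation of a hierarchy of this general shape, with infant-specific intercepts and slopes drawn from normal distributions, may make the structure concrete; every numerical value and the linear-in-log-time form used here are illustrative assumptions, not the fitted Gambia model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical hyperparameter values, for illustration only.
alpha0, phi_alpha = 6.0, 0.5   # mean and variance of the intercepts
beta0, phi_beta = -1.0, 0.1    # mean and variance of the slopes
phi = 0.25                     # within-infant observation variance

n_infants, n_obs = 100, 4
t = np.array([30.0, 180.0, 365.0, 730.0])  # observation times in days

# Second stage: infant-specific intercepts alpha_i and slopes beta_i.
alpha = rng.normal(alpha0, np.sqrt(phi_alpha), n_infants)
beta = rng.normal(beta0, np.sqrt(phi_beta), n_infants)

# First stage: log titres vary (roughly) linearly in log time.
y = (alpha[:, None] + beta[:, None] * np.log(t)[None, :]
     + rng.normal(0, np.sqrt(phi), (n_infants, n_obs)))
print(y.shape)  # one row of four observations per infant
```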

8.1.3 Objectives of a hierarchical analysis

The objectives of hierarchical analyses vary. We can see this by considering the way in which the examples described in Subsection 8.1.2, headed ‘Examples’, might be analysed.

In the case of the hierarchical Poisson model, the intention is to estimate the density function $p(\lambda)$, or equivalently the hyperparameters $(S_0, \nu)$. Similarly, in the case of the example on rat tumours, the main interest lies in finding the joint posterior density of the hyperparameters α and β.

In the cases of the test example and the baseball example, the interest lies in estimating the parameters $\theta_i$ as well as possible, while the hyperparameters μ and $\psi$ are of interest mainly as tools for use in estimating the $\theta_i$.

The main interest in the case of the Poisson process with a change point could quite well lie in determining whether there really is a change point, and, assuming that there is, finding out where it occurs as closely as possible.

However, the models could be explored with other objectives. For example, in the intelligence test example we might be interested in the predictive distribution $p(\tilde{x} \mid \mathbf{x})$, which represents the overall distribution of ‘IQ’ in the population under consideration. Similarly, in the case of the Poisson distribution with a change point, we might be interested in the extent of the change (presuming that there is one), and hence in $p(\lambda/\mu \mid \mathbf{x})$, that is, in the distribution of a function of the parameters.

8.1.4 More on empirical Bayes methods

The empirical Bayes method, of which a very short account was given in Section 7.8, is often employed in cases where we have a structural relationship as described at the start of this section. Suppose for definiteness that we have a straightforward two-stage model in which the density $p(\mathbf{x} \mid \boldsymbol{\theta})$ of the observations depends on parameters $\boldsymbol{\theta} = (\theta_1, \dots, \theta_r)$ which themselves are independent observations from a density $p(\boldsymbol{\theta} \mid \phi)$, so that we have a posterior density

$$p(\boldsymbol{\theta} \mid \phi, \mathbf{x}) \propto p(\boldsymbol{\theta} \mid \phi)\, p(\mathbf{x} \mid \boldsymbol{\theta}).$$

We then estimate $\phi$ by the method of maximum likelihood or by some other method of classical statistics, and carry out the first-stage analysis treating this estimate as if it were the known true value. Note that this method makes no use of a prior distribution for the hyperparameter $\phi$.
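A small worked instance of this idea, under assumptions chosen so that the marginal likelihood has a closed form: take $\lambda_i$ exponential with mean $\phi$ (a shape-1 gamma) and $x_i \sim P(\lambda_i)$; integrating $\lambda_i$ out gives $P(x_i \mid \phi) = \phi^{x_i}/(1+\phi)^{x_i+1}$, whose maximum-likelihood estimate of $\phi$ turns out to be the sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-stage model: lambda_i ~ Exponential(mean phi), x_i ~ P(lambda_i).
true_phi, r = 2.0, 5000
lam = rng.exponential(true_phi, size=r)
x = rng.poisson(lam)

# Maximising sum_i [x_i log(phi) - (x_i + 1) log(1 + phi)] over phi
# gives phi_hat = mean(x); this classical point estimate would then be
# plugged in as if it were the known value of phi.
phi_hat = x.mean()
print(round(phi_hat, 2))  # close to true_phi
```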
