9.3 Data augmentation by Monte Carlo
9.3.1 The genetic linkage example revisited
We can illustrate this technique with the example on genetic linkage we considered in connection with the EM algorithm. Recall that we found that the likelihood of the augmented data was proportional to

    η^(y1 + x4) (1 − η)^(x2 + x3),

where y1 denotes the augmented part of the first cell count x1.
In this method, we suppose that at each stage we have a ‘current’ distribution for η, which initially is the prior distribution. At all stages, this has to be a proper distribution, so we may as well take our prior as the uniform distribution Be(1, 1), which in any case differs little from the reference prior Be(0, 0). At the tth stage, in the imputation step, we pick m possible values η(1), … , η(m) of η by some (pseudo-)random mechanism from the current density, and then for each of these values η(i) we generate a value for the augmented data, which in this particular example simply means picking a value y1(i) from a binomial distribution of index x1 and parameter η(i)/(η(i) + 2). Since we had a Be(1, 1) prior, this gives a posterior

    Be(y1(i) + x4 + 1, x2 + x3 + 1).
In the posterior step, we now take the new ‘current’ distribution of η to be a mixture of the m beta distributions so generated, all values of i being treated as equally likely (cf. the end of Section 2.10). We could then construct an estimate of the posterior density from a histogram of the values of η found at each iteration, but a better and smoother estimate results from taking the ‘current’ distribution at the end of the final iteration.
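The two steps above can be sketched in R as follows (a minimal sketch of a single iteration; the variable names are our own choices, and the complete program appears in Section 9.3.3):

```r
# One data-augmentation iteration for the linkage example (a sketch;
# the names x, m, eta.sample and y1 are our own choices)
x <- c(125, 18, 20, 34)          # the observed cell counts
m <- 20                          # number of imputations
eta.sample <- runif(m)           # draws from the initial Be(1,1) 'current' density

# Imputation step: for each eta, impute the augmented datum y1
y1 <- rbinom(m, x[1], eta.sample/(eta.sample + 2))

# Posterior step: the new 'current' density is an equally weighted mixture
# of the m beta posteriors Be(y1+x4+1, x2+x3+1); to draw from it, pick a
# mixture component at random and then draw from the corresponding beta
i <- sample(m, m, replace = TRUE)
eta.sample <- rbeta(m, y1[i] + x[4] + 1, x[2] + x[3] + 1)
```

Iterating these two steps gives the successive ‘current’ distributions described in the text.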
It is worth noting that, in practice, it is inefficient to take m large during the first few iterations, when the ‘current’ distribution is still far from the true posterior. Rather, it is suggested that m initially be small and then be increased on successive iterations (Tanner and Wong, 1987).
9.3.2 Use of R
A number of examples in this chapter and the next are set out as programs in R. The R project for statistical computing is described on the web page

    https://www.r-project.org/

and useful books covering it are Dalgaard (2008), Krause and Olson (2000), Fox (2002) and Venables and Ripley (2002). A very good book on Bayesian statistics with examples in R is Albert (2009). At a lower level, Gill (2002) is useful (although it may be noted that his highest posterior densities (HPDs) are only approximately highest density regions). Another book with much useful material is Robert and Casella (2010). For the ordinary user, R is virtually indistinguishable from S-PLUS, but has the advantage that it is free.
Even if you do not know R, these programs should be reasonably easy to follow once you realize that <- (a form of arrow) is the assignment operator. In fact, R programs are provided on the web site associated with the book for all the numerical examples in this book, making use of the programs for finding HDRs and various functions associated with Behrens’ distribution which can be found in Appendix C.
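For example, a few lines of R using this operator (a trivial illustration of the syntax, not one of the book’s programs):

```r
m <- 20                  # assign the value 20 to m
v <- c(1, 2, 3)          # c() combines values into a vector
s <- sum(v)/length(v)    # the mean of v, namely 2
```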
9.3.3 The genetic linkage example in R
A program in R for the genetic linkage example is as follows:
niter <- 50
minitial <- 20
etadivs <- 1000
mag <- 10
scale <- 8
x <- c(125,18,20,34)
pagethrow <- 12
m <- minitial
eta <- runif(m)                       # random from U(0,1)
y <- rbinom(m,x[1],eta/(eta+2))       # random binomial
for (t in 1:niter){
  mold <- m
  if (t > 30)
    m <- 200
  if (t > 40)
    m <- 1000
  i0 <- floor(runif(m,0,mold)) + 1    # random indices in 1:mold
  eta <- rbeta(m,y[i0]+x[4]+1,x[2]+x[3]+1)  # random beta
  y <- rbinom(m,x[1],eta/(eta+2))}
p <- rep(0,etadivs)                   # vector of etadivs zeroes
for (etastep in 1:(etadivs-1)){
  eta <- etastep/etadivs
  term <- exp((y+x[4])*log(eta) + (x[2]+x[3])*log(1-eta)
              + lgamma(y+x[2]+x[3]+x[4]+2) - lgamma(y+x[4]+1)
              - lgamma(x[2]+x[3]+1))  # lgamma is log gamma fn
  p[etastep] <- p[etastep] + sum(term)/m}
plot(1:etadivs/etadivs,p,pch=".",xlab="eta")
The resulting plot is shown as Figure 9.1. The function p is thus evaluated at η = 1/n, 2/n, … , 1 (where n = etadivs) and can now be used to estimate the posterior density of η. A simulation in R with T = 50 iterations, m = m(t) = 20 for t ≤ 30, m = 200 for 30 < t ≤ 40, m = 1000 for t > 40, and n = 1000 showed a posterior mode close to the value found earlier using the EM algorithm. However, it should be noted that this method gives an approximation to the whole posterior distribution, as opposed to the EM algorithm, which gives only the mode.
Programs and algorithms for generating pseudo-random numbers with many common distributions can be found in Press et al. (1986–1993) or Kennedy and Gentle (1980); it may help to note that discrete uniform variates are easily constructed by seeing which of m equal intervals a U(0, 1) variate falls in, and a Be(α, β) variate w can be constructed as w = u/(u + v), where u and v are independent gamma variates with shape parameters α and β respectively.
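These two constructions can be sketched in R as follows (the variable names and parameter values are our own; rgamma supplies the gamma variates, and the fact that u/(u + v) has a Be(α, β) distribution for independent gamma variates with shapes α and β is standard):

```r
set.seed(1)
m <- 10
n <- 10000

# Discrete uniform on 1:m by seeing which of m equal intervals
# a U(0,1) variate falls in
u <- runif(n)
i <- floor(m*u) + 1              # each value 1,...,m with probability 1/m

# Be(alpha, beta) via independent gamma variates u/(u+v)
alpha <- 3; beta <- 5
g1 <- rgamma(n, shape = alpha)
g2 <- rgamma(n, shape = beta)
w <- g1/(g1 + g2)                # approximately Be(3, 5); mean 3/8
```

A quick check on the sample mean of w against the theoretical value α/(α + β) confirms the construction.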
The data augmentation technique was originated by Tanner and Wong (1987) and is discussed in detail in Tanner (1996).
9.3.4 Other possible uses for data augmentation
Suppose that x1, x2, … , xn are independently N(θ, φ) where θ is unknown but φ is known, but that instead of a normal prior for θ you have a prior such that

    (θ − μ)/s

has a Student’s t distribution on ν degrees of freedom (where μ and s are known). The posterior for θ in this case is of a complicated form, but if we augment the data with a parameter λ such that

    θ | λ ~ N(μ, λ)   and   λ ~ s²ν χ_ν^(−2),

so that the unconditional distribution of θ is as above (cf. Section 2.12; the n of that section is here replaced by unity, so the conditional variance is λ rather than λ/n), then the conditional distribution of θ given λ and x is normal, and we can now use the data augmentation algorithm to find the posterior of θ.
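A minimal R sketch of this scheme, taking a single imputation per stage so that the algorithm reduces to successive substitution (Gibbs) sampling; the data, and the values chosen for μ, s, ν and φ, are purely illustrative assumptions:

```r
# Data augmentation for a Student's t prior on a normal mean (a sketch;
# all numerical values and the names mu, s, nu, phi are illustrative)
set.seed(1)
mu <- 0; s <- 1; nu <- 4          # t prior: (theta - mu)/s ~ t on nu d.f.
phi <- 1                          # known data variance
x <- c(1.2, 0.8, 1.5, 1.1, 0.9)   # some illustrative data
n <- length(x); xbar <- mean(x)

niter <- 5000
theta <- numeric(niter)
lambda <- s^2                     # starting value for the augmented parameter

for (t in 1:niter) {
  # theta | lambda, x is normal (N(mu, lambda) prior, normal likelihood)
  prec <- 1/lambda + n/phi
  theta[t] <- rnorm(1, (mu/lambda + n*xbar/phi)/prec, sqrt(1/prec))
  # lambda | theta is scaled inverse chi-squared on nu + 1 d.f.
  lambda <- (nu*s^2 + (theta[t] - mu)^2)/rchisq(1, nu + 1)
}
# the draws of theta (after a burn-in) approximate its posterior
```

The second update follows from conjugacy: the s²ν χ_ν^(−2) prior for λ combines with the single ‘observation’ θ to give an (s²ν + (θ − μ)²) χ_(ν+1)^(−2) conditional.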
Further examples can be found in Tanner (1996).