9.5 Rejection sampling

9.5.1 Description

An important method which does not make use of Markov chain methods, but which helps to introduce the Metropolis–Hastings algorithm, is rejection sampling or acceptance–rejection sampling. This is a method for use in connection with a density p(θ) = f(θ)/K in the case where the normalizing constant K is quite possibly unknown, which, as remarked at the beginning of this chapter, is a typical situation occurring in connection with posterior distributions in Bayesian statistics. To use this method, we need to assume that there is a candidate density h(θ) from which we can simulate samples and a constant c such that f(θ) ≤ c h(θ) for all θ. Then, to obtain a random variable θ with density f(θ)/K we proceed as follows:

1. Generate a variate Y from the density h(θ);
2. Generate a value U which is uniformly distributed on (0, 1);
3. Then if U ≤ f(Y)/{c h(Y)} we define θ = Y; otherwise go back to step 1.

In the discrete case we can argue as follows. In N trials we get the value θ on about Nh(θ) occasions, of which we retain it about Nh(θ)f(θ)/{c h(θ)} = Nf(θ)/c times, which is proportional to f(θ). Similar considerations apply in the continuous case; a formal proof can be found in Ripley (1987, Section 3.2) or Gamerman and Lopes (2011, Section 5.1).
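Expressed in R, the algorithm might look like the minimal sketch below; the function name rejection_sample and its argument names are ours rather than part of any package, and the bounding constant is called cc to avoid masking R's built-in function c.

# draw n values from the density proportional to f, using candidate
# density h (with sampler rh) and a constant cc with f(x) <= cc*h(x)
rejection_sample <- function(n, f, h, rh, cc) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rh(n)                             # step 1: candidates from h
    u <- runif(n)                          # step 2: uniforms on (0,1)
    out <- c(out, y[u <= f(y)/(cc*h(y))])  # step 3: accept or reject
  }
  out[1:n]
}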

9.5.2 Example

As a simple example, we can use this method to generate values X from a beta distribution with density f(x)/K, where f(x) = x^(α−1)(1−x)^(β−1) in the case where α > 1 and β > 1 (in fact, we know that in this case K = B(α, β) and f(x)/K is the Be(α, β) density). We simply note that the density has a maximum at x = (α−1)/(α+β−2) (see Appendix A), so that we can take c = f((α−1)/(α+β−2)) and h(x) = 1, that is, h as the density of a uniform distribution on (0, 1), as well as Y ~ U(0, 1).
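For instance, with the illustrative choices α = 2 and β = 3 (values of our own choosing, used only to make the sketch concrete), the rejection_sample function given above can be applied as follows:

alpha <- 2; beta <- 3               # illustrative values only
f <- function(x) x^(alpha-1)*(1-x)^(beta-1)
h <- function(x) dunif(x)           # candidate density h(x) = 1 on (0,1)
rh <- function(n) runif(n)          # sampler for the candidate density
cc <- f((alpha-1)/(alpha+beta-2))   # maximum of f, so f(x) <= cc*h(x)
xs <- rejection_sample(5000, f, h, rh, cc)
hist(xs, freq=FALSE, breaks=40)     # compare with the true Be(alpha,beta)
curve(dbeta(x, alpha, beta), add=TRUE)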

9.5.3 Rejection sampling for log-concave distributions

It is often found that inference problems simplify when a distribution is log-concave, that is, when the logarithm of the density function is concave (see Walther, 2009). In such a case, we can fit a piecewise linear curve above the log density, as depicted in Figure 9.4a. In that particular case, there are three pieces to the curve, but in general there could be more. Exponentiating, we find an upper bound for the density itself which consists of curves of the form exp(a+bx), as depicted in Figure 9.4b. As these curves are easily integrated, the proportions of the area under the upper bound which fall under each of the curves are easily calculated. We can then generate a random variable with a density proportional to the upper bound in two stages. We first choose a constituent curve exp(a+bx) between x = t and x = u (say) with a probability proportional to the area beneath it,

    {exp(a + bu) − exp(a + bt)}/b.

Figure 9.4 Rejection sampling: (a) log density, (b) un-normalized density, (c) empirical density before rejection and (d) empirical density of ν.

After this, a point between t and u is chosen as a random variable with a density proportional to exp(a+bx).

For the latter step, we note that such a random variable has distribution function

    F(x) = {exp(bx) − exp(bt)}/{exp(bu) − exp(bt)}     (t ≤ x ≤ u).

The inverse function of this is easily shown to be

    F⁻¹(v) = (1/b) log{exp(bt) + v[exp(bu) − exp(bt)]}.

If V is then uniformly distributed over [0, 1] and we write X = F⁻¹(V), it follows that

    P(X ≤ x) = P(F⁻¹(V) ≤ x) = P(V ≤ F(x)) = F(x),

so that X has the distribution function F(x).
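These two stages can be expressed as a short R sketch. The function below is our own illustration (the name renvelope and its arguments are not from the text); given the coefficients a[i] and b[i] of the pieces of the envelope and the knots at which the pieces join, it selects a piece with probability proportional to its area and then applies the inverse distribution function just derived within that piece. Finite knots are assumed.

# sample n values from the density proportional to the piecewise
# exponential envelope exp(a[i]+b[i]*x) on [knots[i], knots[i+1]]
renvelope <- function(n, a, b, knots) {
  t <- knots[-length(knots)]   # left endpoints of the pieces
  u <- knots[-1]               # right endpoints of the pieces
  # area under each piece; exp(a)*(u-t) if the piece is horizontal
  area <- ifelse(b == 0, exp(a)*(u - t),
                 (exp(a + b*u) - exp(a + b*t))/b)
  piece <- sample(length(area), n, replace=TRUE, prob=area)
  v <- runif(n)
  ai <- a[piece]; bi <- b[piece]; ti <- t[piece]; ui <- u[piece]
  # invert the distribution function within the chosen piece
  ifelse(bi == 0, ti + v*(ui - ti),
         (1/bi)*log(exp(bi*ti) + v*(exp(bi*ui) - exp(bi*ti))))
}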

Figure 9.4c shows that simulation using this method produces an empirical distribution which fits closely to the upper envelope of curves we have chosen. We can now use the method of rejection sampling as described earlier to find a sample from the original log-concave density. The empirical distribution thus produced is seen in Figure 9.4d to fit closely to this density.

The choice of the upper envelope can be made to evolve as the sampling proceeds; for details, see Gilks and Wild (1992).

9.5.4 A practical example

In Subsection 9.4.2 headed ‘An example with observed data’, which concerned pump failures, we assumed that the parameter ν was known to be equal to 1.4. It would clearly be better to treat ν as a parameter to be estimated in the same way that we estimated S_0. Now

    p(ν, S_0, θ | x) ∝ p(ν) p(S_0) p(θ | ν, S_0) p(x | θ),

since the distribution of x given θ does not depend on ν or S_0. Consequently

    p(ν | S_0, θ, x) ∝ p(ν) p(θ | ν, S_0) ∝ p(ν) ∏_i (S_0/2)^(ν/2) θ_i^(ν/2 − 1) exp(−S_0 θ_i/2) / Γ(ν/2).

Unfortunately, the latter expression is not proportional to any standard family for any choice of hyperprior p(ν). If, however,

    p(ν) = μ⁻¹ exp(−ν/μ)

has the form of an exponential distribution of mean μ, we find that

    p(ν | S_0, θ, x) ∝ (S_0/2)^(kν/2) Γ(ν/2)^(−k) (∏_i θ_i)^(ν/2 − 1) exp(−ν/μ),

so that

    log p(ν | S_0, θ, x) = (kν/2) log(S_0/2) − k log Γ(ν/2) + (ν/2 − 1) Σ_i log θ_i − ν/μ + constant.

The logarithm of this density is concave. A proof of this follows, but the result can be taken for granted. We observe that the first derivative of the log density is

    (k/2) log(S_0/2) − (k/2) ψ(ν/2) + (1/2) Σ_i log θ_i − 1/μ,

where

    ψ(z) = d log Γ(z)/dz

is the digamma function (cf. Appendix A.7), while the second derivative is

    −(k/4) ψ′(ν/2),

where ψ′(z) is the so-called trigamma function. Since the trigamma function satisfies

    ψ′(z) = Σ_{n=0}^∞ 1/(z + n)² > 0

(see Whittaker and Watson, 1927, Section 12.16), it is clear that this second derivative is negative for all ν > 0, from which the concavity follows.
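Because R provides the digamma and trigamma functions directly, the negativity of the second derivative is also easy to check numerically. The short sketch below is our own illustration (the function name d2logf is ours); it takes k = 10, the number of observations in the pump-failure example, and plots −(k/4)ψ′(ν/2) over a range of ν.

# second derivative of the log density of nu: -(k/4)*trigamma(nu/2)
d2logf <- function(nu, k) -(k/4)*trigamma(nu/2)
curve(d2logf(x, k=10), from=0.1, to=10,
      xlab=expression(nu), ylab="second derivative of log density")
abline(h=0, lty=2)   # the curve lies entirely below zero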

We now find a piecewise linear function which is an upper bound for the log density in the manner described earlier. In this case, it suffices to approximate by a three-piece function, as illustrated in Figure 9.4a. This function is constructed by taking a horizontal line tangent to the log density at the point n2 where its derivative vanishes and it achieves its maximum value h2, and then taking tangents at two nearby points n1 and n3. The joins between the lines occur at the points t1 and t3 where these three lines intersect. Exponentiating, this gives an upper bound for the density itself, consisting of a number of functions of the form exp(a+bx) (the central one being a constant), as illustrated in Figure 9.4b.

In this case, the procedure is very efficient, and some 87–88% of candidate points are accepted. The easiest way to understand the details of the procedure is to consider the following R program:

windows(record=T)        # open a graphics window (Windows only)
par(mfrow=c(2,2))
N <- 5000                # number of candidate points
mu <- 1                  # mean of the exponential hyperprior for nu
S <- 1.850               # the value taken for S_0
theta <- c(0.0626,0.1181,0.0937,0.1176,0.6115,
           0.6130,0.8664,0.8661,1.4958,1.9416)
k <- length(theta)
# log of the (un-normalized) density of nu given S_0, theta and x
logf <- function(nu) {
    k*(nu/2)*log(S)-k*(nu/2)*log(2)-k*log(gamma(nu/2))+
    (nu/2-1)*sum(log(theta))-nu/mu
    }
# its first derivative
dlogf <- function(nu) {
    (k/2)*log(S)-(k/2)*log(2)-0.5*k*digamma(nu/2)+
    0.5*sum(log(theta))-1/mu
    }
scale <- 3/2
f <- function(nu) exp(logf(nu))
# maximum of the log density at n2, tangents at n1 and n3
n2 <- uniroot(dlogf,lower=0.01,upper=100)$root
h2 <- logf(n2)
n1 <- n2/scale
h1 <- logf(n1)
b1 <- dlogf(n1)
a1 <- h1-n1*b1
n3 <- n2*scale
h3 <- logf(n3)
b3 <- dlogf(n3)
a3 <- h3-n3*b3
d <- exp(h2)
# First plot; log density
curve(logf,0.01,3.5,ylim=c(-8,-1),xlab="a. Log density")
abline(h=h2)
abline(a1,b1)
abline(a3,b3)
text(0.25,h2+0.3,expression(italic(h)[2]))
text(0.3,-6,
    expression(italic(a)[1]+italic(b)[1]*italic(x)))
text(3.25,-7,
    expression(italic(a)[3]+italic(b)[3]*italic(x)))
# End of first plot
f1 <- function(x) exp(a1+b1*x)
f3 <- function(x) exp(a3+b3*x)
# points where the tangents meet the horizontal line
t1 <- (h2-a1)/b1
t3 <- (h2-a3)/b3
# Second plot; un-normalized density
curve(f,0.01,3.5,ylim=c(0,0.225),
      xlab="b. Un-normalized density")
abline(h=d)
par(new=T)
curve(f1,0.01,3.5,ylim=c(0,0.225),ylab="",xlab="")
par(new=T)
curve(f3,0.01,3.5,ylim=c(0,0.225),ylab="",xlab="")
abline(v=t1)
abline(v=t3)
text(t1-0.1,0.005,expression(italic(t)[1]))
text(t3-0.1,0.005,expression(italic(t)[3]))
text(0.25,0.2,expression(italic(d)))
text(0.5,0.025,expression(italic(f)[1]))
text(2.5,0.025,expression(italic(f)[3]))
# End of second plot
# areas under the three pieces of the envelope
q1 <- exp(a1)*(exp(b1*t1)-1)/b1
q2 <- exp(h2)*(t3-t1)
q3 <- -exp(a3+b3*t3)/b3
const <- q1 + q2 + q3
p <- c(q1,q2,q3)/const
cat("p =",p,"\n")
# choose a piece, then sample within it by inversion
case <- sample(1:3,N,replace=T,prob=p)
v <- runif(N)
w <- (case==1)*(1/b1)*log(q1*b1*exp(-a1)*v+1)+
     (case==2)*(t1+v*(t3-t1))+
     (case==3)*(1/b3)*log(q3*b3*exp(-a3)*v+exp(b3*t3))
# normalized envelope density
dq <- d/const
f1q <- function(x) f1(x)/const
f3q <- function(x) f3(x)/const
h <- function(x) {
     dq +
     (x<t1)*(f1q(x)-dq) +
     (x>t3)*(f3q(x)-dq)
}
# Third plot; empirical density before rejection
plot(density(w),xlim=c(0.01,4),ylim=c(0,1.1),
     xlab="c. Empirical density before rejection",
     main="")
par(new=T)
curve(h,0.01,4,ylim=c(0,1.1),ylab="",xlab="")
# End of third plot
# rejection step: accept w with probability f(w)/(const*h(w))
u <- runif(N)
nu <- w
nu[u>f(nu)/(const*h(nu))] <- NA
nu <- nu[!is.na(nu)]
cat("Acceptances",100*length(nu)/length(w),"\n")
int <- integrate(f,0.001,100)$value
fnorm <- function(x) f(x)/int
# Fourth plot; empirical density of the accepted values
plot(density(nu),xlim=c(0.01,4),ylim=c(0,1.1),
     xlab=bquote(paste("d. Empirical density of ",nu)),
     main="")
par(new=T)
curve(fnorm,0.01,4,ylim=c(0,1.1),ylab="",xlab="")
# End of fourth plot
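Once the vector nu of accepted values is available, posterior summaries of ν can be obtained in the usual way; the two lines below are our own illustration and are not part of the program above.

mean(nu)                       # posterior mean of nu
quantile(nu, c(0.025, 0.975))  # equal-tailed 95% posterior interval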

It is possible to combine this procedure with the method used earlier to estimate the θ_i and S_0, and an R program to do this can be found on the web. We shall not discuss this matter further here, since in this case it is considerably simpler to use WinBUGS, as noted in Section 9.7.
