Chapter 11
Applications and Limitations of Perfect Simulation
There are more things in heaven and earth, Horatio, than are dreamt of in
your philosophy.
William Shakespeare
In the previous chapters, obtaining the perfect samples themselves was the goal. In this chapter, we explore applications of perfect simulation, situations where finding a perfect draw acts as a subroutine in a larger method or algorithm. In addition, it is shown how to run perfect simulation algorithms so that the running time is concentrated around the expected running time, and the limitations of perfect simulation are discussed.
11.1 Doubly intractable distributions
Again consider the basic Bayesian framework for inference. The ingredients are a prior density f_Θ on a random variable Θ that represents the parameters of the model. Next suppose there is a density f_{Y|Θ} for the data Y given the value of the parameters. Then the posterior density on Θ is

f_{\Theta|Y}(\theta \mid Y = y) \propto f_\Theta(\theta)\, f(y \mid \Theta = \theta).   (11.1)
Note that notation is often collapsed in the Bayesian literature, and θ is used for both the dummy variable and the random variable. Here our notation distinguishes between the two in order to make the notation precise. The capital theta, Θ, refers to the random variable, while the lowercase theta, θ, refers to the dummy variable in the density.
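As a quick concrete instance (an illustration, not an example from the text): take a uniform prior on θ ∈ [0, 1] and data consisting of n independent Bernoulli(θ) trials with k successes. Then (11.1) reads

f_{\Theta|Y}(\theta \mid Y = y) \propto 1 \cdot \theta^k (1 - \theta)^{n-k},

so the posterior is the Beta(k + 1, n − k + 1) density once normalized. In this case the normalizing constant is available in closed form; the remainder of this section concerns models where it is not.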
In the statistics literature, the problem of calculating the normalizing constant for f_{Θ|Y} is often referred to as intractable when no practically fast algorithm exists to find it. The meaning of fast will depend on the context of the problem and the statistical model. This is in contrast to the use of intractable in computational complexity, where it usually means that finding the normalizing constant is known to be a #P-complete problem.
As noted back in Chapter 1, even though the normalizing constant is unknown, given a random walk that uses density q_b(·) to propose the next state starting from b, a Metropolis-Hastings chain with stationary density f accepts the move from b to state θ with probability min{1, [f(θ)q_θ(b)]/[f(b)q_b(θ)]}. When the target density is the posterior density, this makes the acceptance probability
\min\left\{1,\; \frac{f_\Theta(\theta)\, f(y \mid \Theta = \theta)\, q_\theta(b)}{f_\Theta(b)\, f(y \mid \Theta = b)\, q_b(\theta)}\right\}.   (11.2)
However, when f(y|Θ = θ) is not known, but is itself given by an unnormalized density, it is not even possible to evaluate this ratio, at which point not even a Metropolis-Hastings approximate sampler can be run!
For example, suppose that the data is the quality of soil samples in a rectangular field broken down into square plots. For simplicity, the quality of each plot is measured as either being "good" or "bad." Then a natural model is the Ising model, where 1 represents a good plot and −1 a bad plot. The Ising model captures the propensity of good plots to be spatially close to other good plots. The parameter β for the Ising model measures the strength of the spatial attraction.
However, there is no closed-form solution for the normalizing constant of the Ising model for general graphs, and even for planar graphs, the O(n³) time needed to calculate the normalizing constant could be too slow.
In general, if f(y|Θ = θ) = w(y|Θ = θ)/Z_θ for a normalizing constant Z_θ, then the ratio to be computed for a proposed move from θ to θ′ is

r = \min\left\{1,\; \frac{q_{\theta'}(\theta)\, f_\Theta(\theta')\, w(y \mid \Theta = \theta')/Z_{\theta'}}{q_\theta(\theta')\, f_\Theta(\theta)\, w(y \mid \Theta = \theta)/Z_\theta}\right\}.   (11.3)
This situation, where the normalizing constant of the posterior as well as that of the statistical model are both unknown, is referred to as a doubly intractable problem in the literature [108]. Møller et al. [101] first came up with a way around this problem, and used it for Gibbs point processes in [6].
Add an auxiliary variable X to the chain, where [X|θ] follows the same distribution as [Y|θ]. The state of the chain is now (θ, x), where θ is a parameter value and x is a configuration.
Given current parameter θ and configuration x, propose a new parameter θ′, and from the statistical model propose a new configuration x′ given parameter θ′. So the density for the move from (θ, x) to (θ′, x′) is q_θ(θ′)w_{Y|Θ}(x′ | Θ = θ′)/Z_{θ′}. That makes the acceptance ratio for moving from (θ, x) to (θ′, x′) the minimum of 1 and

\frac{q_{\theta'}(\theta)\, [w_{Y|\Theta}(x \mid \Theta = \theta)/Z_\theta]\, f_\Theta(\theta')\, [w_{Y|\Theta}(y \mid \Theta = \theta')/Z_{\theta'}]}{q_\theta(\theta')\, [w_{Y|\Theta}(x' \mid \Theta = \theta')/Z_{\theta'}]\, f_\Theta(\theta)\, [w_{Y|\Theta}(y \mid \Theta = \theta)/Z_\theta]}.   (11.4)
The Z_θ and Z_{θ′} factors cancel out of this expression, and so this ratio can be computed without needing to know the (difficult to compute) normalizing constants.
Because this adds a single auxiliary random variable draw from the statistical
model, this method was later referred to as the Single Auxiliary Variable Method
(SAVM) [108].
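To make the mechanics concrete, here is a minimal Python sketch of one SAVM step. It is written under stated assumptions rather than taken from the text: `log_prior(theta)` returns ln f_Θ(θ), `log_w(z, theta)` returns ln w_{Y|Θ}(z | Θ = θ), `propose(theta)` returns a proposed θ′ along with the forward and reverse proposal log densities, and `perfect_sample(theta)` is a problem-specific perfect simulation routine for the model; all four names are hypothetical placeholders.

```python
import math
import random

def savm_step(theta, x, y, log_prior, log_w, propose, perfect_sample):
    """One Metropolis-Hastings step of the single auxiliary variable method.

    Only the unnormalized log weights log_w appear: the Z factors in
    (11.4) have already cancelled, which is the point of the method.
    """
    # Propose theta' ~ q_theta(.); also return both proposal log densities,
    # log q_theta(theta') (forward) and log q_theta'(theta) (reverse).
    theta_p, log_q_fwd, log_q_rev = propose(theta)
    # Propose x' as a perfect draw from the statistical model at theta'.
    x_p = perfect_sample(theta_p)

    # Logarithm of the ratio in (11.4) after the Z factors cancel.
    log_r = (log_q_rev + log_w(x, theta)
             + log_prior(theta_p) + log_w(y, theta_p)
             - log_q_fwd - log_w(x_p, theta_p)
             - log_prior(theta) - log_w(y, theta))

    # Accept with probability min(1, exp(log_r)).
    if random.random() < math.exp(min(0.0, log_r)):
        return theta_p, x_p
    return theta, x
```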
11.2 Approximating integrals and sums
One of the main uses of generating samples from unnormalized distributions is to approximate the partition function for the distribution. In Chapter 9, the partition function for a particular set of weights on permutations is the permanent of a matrix, a known #P-complete problem. The partition function of the Ising model is also known to be #P-complete [70].
11.2.1 TPA
One of the primary uses of perfect samplers is in constructing unbiased estimators of integrals. In computer science, this is often accomplished in polynomial time through the use of self-reducible problems; see [72] for details.
A more advanced version of the self-reducible idea is the Tootsie Pop algorithm (TPA). This is a simple means for estimating an integral given the ability to sample from a particular family of distributions. The name is a reference to a popular commercial for a Tootsie Pop, which is a candy that consists of a soft chocolate center surrounded by a hard candy shell. The commercial asks the question "How many licks does it take to get to the center of a Tootsie Pop?"
Any integral can be written as the measure ν of a set B. This set B is the shell. The center will be a set A that is a subset of B where ν(A) is known. If one can perfectly sample from B, then the percentage of times the sample falls in A forms an estimate of ν(A)/ν(B), which gives a means for approximating ν(B).
However, as has been seen time and again, this probability is typically exponentially small in the problem size for many applications. TPA offers a way around this problem. Following [57], TPA can be broken down into four ingredients.
1. A measure space (Ω, F, μ).
2. Two finite measurable sets A ⊆ B with μ(A) > 0. The set A is the center and B is the shell.
3. A family of sets B_β such that β < β′ implies B_β ⊆ B_{β′}. These sets should not jump too much, in the sense that ν(B_β) should be a continuous function of β. Also, lim_{β → −∞} ν(B_β) = 0.
4. Two special values of the parameter, β_B and β_A, such that B_{β_B} = B and B_{β_A} = A.
Given these ingredients, the outline of the TPA protocol is as follows.
TPA     Output: N ∼ Pois(ln(ν(B)/ν(A)))
1) β ← β_B, N ← −1
2) Repeat
3)     Draw X from ν conditioned to lie in B_β, N ← N + 1
4)     β ← inf{b : X ∈ B_b}
5) Until X ∈ A
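As a concrete illustration, here is a minimal Python sketch of the protocol for the nested-balls setting of Figure 11.1, where B_b is the ball of radius b in R² and ν is area; the function name and interface are illustrative assumptions, not from the text.

```python
import math
import random

def tpa_balls(beta_B, beta_A):
    """One run of TPA where B_b is the ball of radius b in the plane.

    Output: N ~ Pois(ln(nu(B)/nu(A))) = Pois(2 ln(beta_B / beta_A)).
    """
    beta, N = beta_B, -1
    while True:
        # Step 3: draw X uniformly from the disk of radius beta; only the
        # distance from the origin matters, and it has density 2r/beta^2.
        r = beta * math.sqrt(random.random())
        N += 1
        # Step 4: the smallest b with X in B_b is the norm of X, namely r.
        beta = r
        # Step 5: stop once X has landed in the center A.
        if beta <= beta_A:
            return N
```

Averaging many outputs and dividing by 2 estimates ln(β_B/β_A); with the figure's values β_B = 3 and β_A = 1, the mean output is 2 ln 3 ≈ 2.2.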
Figure 11.1 illustrates a run of TPA when B_β is the ball of radius β in R². The distribution of the output of a run of TPA turns out to be Poisson with mean ln(ν(B)/ν(A)).
Figure 11.1 A single run of TPA when B is the ball of radius 3, A is the ball of radius 1, and B_β is the ball of radius β. The number of samples needed before X lands in the center is 3, giving output N = 2.
Note that when X is drawn in TPA, the next value of β cuts off anywhere from 0% to 100% of the region being drawn from. A perhaps surprising fact is that this percentage is uniformly distributed between these two extremes.
Lemma 11.1. Suppose X ∼ Unif(B_β) and β′ = inf{b : X ∈ B_b}. Then ν(B_{β′})/ν(B_β) ∼ Unif([0, 1]).
Proof. Let U = ν(B_{β′})/ν(B_β). Fix a ∈ [0, 1) and consider P(U ≤ a). Since ν(B_β) is a continuous function of β that goes to 0 as β goes to −∞, there exists a b ∈ (−∞, β] such that ν(B_b)/ν(B_β) = a. The chance that X ∼ Unif(B_β) falls in B_b is ν(B_b)/ν(B_β) = a, and when this happens U ≤ a. So P(U ≤ a) ≥ a.
On the other hand, there exists an N such that for any n ≥ N, there is a value b_n such that ν(B_{b_n})/ν(B_β) = a + 1/n. Using the nested property of the family of sets, if X ∉ B_{b_n}, then U ≥ a + 1/n > a. Therefore, if U ≤ a, then X ∈ B_{b_n} for all n ≥ N. But that means that P(U ≤ a) ≤ a + 1/n for all n ≥ N, which means P(U ≤ a) ≤ a, completing the proof.
So starting with measure ν(B), after t steps of taking a draw X and generating a new value of β,

ν(B_{β_t}) ∼ ν(B) · U_1 U_2 U_3 ⋯ U_t,

where the U_i are independent Unif([0, 1]) random variables. Or in negative log space,

−ln(ν(B_{β_t})) ∼ (−ln(ν(B))) + (−ln(U_1)) + (−ln(U_2)) + ⋯ + (−ln(U_t)).   (11.5)
The reason for working with the negative logarithms of the uniforms is that these have an exponential distribution with rate 1.
Fact 11.1. For U ∼ Unif([0, 1]), −ln(U) ∼ Exp(1).
Lemma 11.2. The output of TPA is a Poisson distributed random variable with mean ln(ν(B)/ν(A)).
Proof. From the last fact, the points −ln(U_1), −ln(U_1) − ln(U_2), −ln(U_1) − ln(U_2) − ln(U_3), ... form a Poisson point process of rate 1. The number of such points that fall in the interval [−ln(ν(B)), −ln(ν(A))] will be Poisson distributed with mean −ln(ν(A)) − (−ln(ν(B))) = ln(ν(B)/ν(A)).
Poisson random variables, like normals, have the nice property that when independent values are added, the parameters simply add as well. So if TPA is run k times and the results are added, the total N_k is a Poisson random variable with mean k ln(ν(B)/ν(A)). For any choice of k by the simulator, it is possible to construct an exact confidence interval for the resulting estimate â of ν(B), where â = ν(A) exp(N_k/k).
So far we have assumed that ν(A) is easy to compute, but ν(B) is hard. Of course, TPA just estimates the ratio between the two, so it is just as useful when ν(B) is easy to compute and the goal is to approximate ν(A).
11.2.2 TPA for Gibbs distributions
Suppose that a density can be written as f(x) = exp(−βH(x))/Z_β for a function H(x). A density of this form is said to represent a Gibbs distribution. TPA can be applied to Gibbs distributions when H(x) ≤ 0. (Actually, any upper or lower bound on H(x) suffices, but usually H(x) ≤ 0 in practice, so that is the case considered here.) For instance, the Ising model with no external magnetic field has H(x) = −∑_{{i,j} ∈ E} 1(x(i) = x(j)) ≤ 0. As usual, let the set of configurations be Ω.
Let B_β = {(x, y) : x ∈ Ω, y ∈ [0, exp(−βH(x))]}. The restriction that H(x) ≤ 0 ensures that for β < β′, B_β ⊆ B_{β′}. Now ν(B_0) = #Ω, which is usually easy to compute. For example, in the Ising model Ω = {−1, 1}^V, so ν(B_0) = 2^{#V}.
To draw (x, y) uniformly from B_β, first draw X from the Gibbs distribution with parameter β, and then Y|X can be drawn uniformly from [0, exp(−βH(X))]. When Y ≤ 1, the next value of β is at most 0 and the center has been reached. When Y > 1, the next value for β is the smallest value of b such that Y ∈ [0, exp(−bH(X))]. So b = −ln(Y)/H(X). (When H(X) = 0, just set b = −∞.)
TPA for Gibbs     Input: β     Output: N ∼ Pois(ln(Z_β/#Ω))
1) N ← 0
2) While β > 0
3)     Draw X from the Gibbs distribution with parameter β
4)     Draw Y uniformly from [0, exp(−βH(X))]
5)     If Y ≤ 1 or H(X) = 0, then β ← 0
6)     Else β ← −ln(Y)/H(X) and N ← N + 1
11.3 Omnithermal approximation
For Gibbs distributions, it is not only possible to use TPA to generate estimates for a single Z_β in this fashion, but also to simultaneously give estimates of Z_β for all β in some interval [β_A, β_B]. Since β is also known as the inverse temperature, getting estimates at all temperatures simultaneously in this way is known as omnithermal approximation.