Chapter 11
Applications and Limitations of Perfect Simulation
There are more things in heaven and earth, Horatio, than are dreamt of in
your philosophy.
William Shakespeare
In the previous chapters, obtaining the perfect samples themselves was the goal. In this chapter, we explore applications of perfect simulation, situations where finding a perfect draw acts as a subroutine in a larger method or algorithm. In addition, it is shown how to run perfect simulation algorithms so that the running time is concentrated around the expected running time, and the limitations of perfect simulation are discussed.
11.1 Doubly intractable distributions
Again consider the basic Bayesian framework for inference. The ingredients are a prior density f_Θ on a random variable Θ that represents the parameters of the model. Next suppose there is a density f_{Y|Θ} for the data Y given the value of the parameters. Then the posterior density on Θ is

f_{\Theta|Y}(\theta \mid Y = y) \propto f_\Theta(\theta)\, f(y \mid \Theta = \theta).   (11.1)
Note that notation is often collapsed in the Bayesian literature, and θ is used for both the dummy variable and the random variable. Here our notation distinguishes between the two in order to make the notation precise. The capital theta, Θ, refers to the random variable, while the lowercase theta, θ, refers to the dummy variable in the density.
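As a quick concrete instance (an illustration, not an example from the text): take a uniform prior on θ ∈ [0, 1] and data consisting of n independent Bernoulli(θ) trials with k successes. Then (11.1) reads

f_{\Theta|Y}(\theta \mid Y = y) \propto 1 \cdot \theta^k (1 - \theta)^{n-k},

so the posterior is the Beta(k + 1, n − k + 1) density once normalized. In this case the normalizing constant is available in closed form; the remainder of this section concerns models where it is not.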
In the statistics literature, the problem of calculating the normalizing constant for f_{Θ|Y} is often referred to as intractable when no practically fast algorithm exists to find it. The meaning of fast will depend on the context of the problem and the statistical model. This is in contrast to the use of intractable in computational complexity, where it usually means that finding the normalizing constant is known to be a #P-complete problem.
As noted back in Chapter 1, even though the normalizing constant is unknown, given a random walk that uses density q_b(·) to propose the next state starting from b, a Metropolis-Hastings chain with stationary density f accepts the move from b to state θ with probability min{1, [f(θ)q_θ(b)]/[f(b)q_b(θ)]}. When the target density is the posterior density, this makes the acceptance probability
\min\left\{1,\; \frac{f_\Theta(\theta)\, f(y \mid \Theta = \theta)\, q_\theta(b)}{f_\Theta(b)\, f(y \mid \Theta = b)\, q_b(\theta)}\right\}.   (11.2)
However, when f(y|Θ = θ) is not known, but is itself given by an unnormalized density, it is not even possible to evaluate this ratio, at which point not even a Metropolis-Hastings approximate sampler can be run!
For example, suppose that the data is the quality of soil samples in a rectangular field broken down into square plots. For simplicity, the quality of each plot is measured as either being "good" or "bad." Then a natural model is the Ising model, where 1 represents a good plot and −1 a bad plot. The Ising model captures the propensity of good plots to be spatially close to other good plots. The parameter β for the Ising model measures the strength of the spatial attraction.
However, there is no closed-form solution for the normalizing constant of the Ising model for general graphs, and even for planar graphs, the O(n³) time needed to calculate the normalizing constant could be too slow.
In general, if f(y|Θ = θ) = w(y|Θ = θ)/Z_θ for a normalizing constant Z_θ, then the ratio to be computed for a proposed move from θ to θ′ is

r = \min\left\{1,\; \frac{q_{\theta'}(\theta)\, f_\Theta(\theta')\, w(y \mid \Theta = \theta')/Z_{\theta'}}{q_\theta(\theta')\, f_\Theta(\theta)\, w(y \mid \Theta = \theta)/Z_\theta}\right\}.   (11.3)
This situation, where the normalizing constant of the posterior as well as that of the statistical model are both unknown, is referred to as a doubly intractable problem in the literature [108]. Møller et al. [101] first came up with a way around this problem, and used it for Gibbs point processes in [6].
Add an auxiliary variable X to the chain, where [X|θ] follows the same distribution as [Y|θ]. The state of the chain is now (θ, x), where θ is a parameter value and x is a configuration.
Given current parameter θ and configuration x, propose a new parameter θ′, and from the statistical model propose a new configuration x′ given parameter θ′. So the density for the move from (θ, x) to (θ′, x′) is q_θ(θ′)w_{Y|Θ}(x′ | Θ = θ′)/Z_{θ′}. That makes the acceptance ratio for moving from (θ, x) to (θ′, x′) the minimum of 1 and

\frac{q_{\theta'}(\theta)\, [w_{Y|\Theta}(x \mid \Theta = \theta)/Z_\theta]\, f_\Theta(\theta')\, [w_{Y|\Theta}(y \mid \Theta = \theta')/Z_{\theta'}]}{q_\theta(\theta')\, [w_{Y|\Theta}(x' \mid \Theta = \theta')/Z_{\theta'}]\, f_\Theta(\theta)\, [w_{Y|\Theta}(y \mid \Theta = \theta)/Z_\theta]}.   (11.4)
The Z_θ and Z_{θ′} factors cancel out of this expression, and so this ratio can be computed without needing to know the (difficult to compute) normalizing constants.
Because this adds a single auxiliary random variable draw from the statistical
model, this method was later referred to as the Single Auxiliary Variable Method
(SAVM) [108].
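To make the mechanics concrete, here is a minimal Python sketch of one SAVM step. It is written under stated assumptions rather than taken from the text: `log_prior(theta)` returns ln f_Θ(θ), `log_w(z, theta)` returns ln w_{Y|Θ}(z | Θ = θ), `propose(theta)` returns a proposed θ′ along with the forward and reverse proposal log densities, and `perfect_sample(theta)` is a problem-specific perfect simulation routine for the model; all four names are hypothetical placeholders.

```python
import math
import random

def savm_step(theta, x, y, log_prior, log_w, propose, perfect_sample):
    """One Metropolis-Hastings step of the single auxiliary variable method.

    Only the unnormalized log weights log_w appear: the Z factors in
    (11.4) have already cancelled, which is the point of the method.
    """
    # Propose theta' ~ q_theta(.); also return both proposal log densities,
    # log q_theta(theta') (forward) and log q_theta'(theta) (reverse).
    theta_p, log_q_fwd, log_q_rev = propose(theta)
    # Propose x' as a perfect draw from the statistical model at theta'.
    x_p = perfect_sample(theta_p)

    # Logarithm of the ratio in (11.4) after the Z factors cancel.
    log_r = (log_q_rev + log_w(x, theta)
             + log_prior(theta_p) + log_w(y, theta_p)
             - log_q_fwd - log_w(x_p, theta_p)
             - log_prior(theta) - log_w(y, theta))

    # Accept with probability min(1, exp(log_r)).
    if random.random() < math.exp(min(0.0, log_r)):
        return theta_p, x_p
    return theta, x
```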
11.2 Approximating integrals and sums
One of the main uses of generating samples from unnormalized distributions is to approximate the partition function for the distribution. In Chapter 9, the partition function for a particular set of weights on permutations is the permanent of a matrix, a known #P-complete problem. The partition function of the Ising model is also known to be #P-complete [70].
11.2.1 TPA
One of the primary uses of perfect samplers is in constructing unbiased estimators of integrals. In computer science, this is often accomplished in polynomial time through the use of self-reducible problems; see [72] for details.
A more advanced version of the self-reducible idea is the Tootsie Pop algorithm (TPA). This is a simple means for estimating an integral given the ability to sample from a particular family of distributions. The name is a reference to a popular commercial for a Tootsie Pop, which is a candy that consists of a soft chocolate center surrounded by a hard candy shell. The commercial asks the question "How many licks does it take to get to the center of a Tootsie Pop?"
Any integral can be written as the measure ν of a set B. This set B is the shell. The center will be a set A that is a subset of B where ν(A) is known. If one can perfectly sample from B, then the percentage of times the sample falls in A forms an estimate of ν(A)/ν(B), which gives a means for approximating ν(B).
However, as has been seen time and again, this probability is typically exponentially small in the problem size for many applications. TPA offers a way around this problem. Following [57], TPA can be broken down into four ingredients.
1. A measure space (Ω, F, μ).
2. Two finite measurable sets A ⊆ B with μ(A) > 0. The set A is the center and B is the shell.
3. A family of sets B_β such that β < β′ implies B_β ⊆ B_{β′}. These sets should not jump too much, in the sense that ν(B_β) should be a continuous function of β. Also, lim_{β → −∞} ν(B_β) = 0.
4. Two special values of the parameter, β_B and β_A, such that B_{β_B} = B and B_{β_A} = A.
Given these ingredients, the outline of the TPA protocol is as follows.
TPA     Output: N ∼ Pois(ln(ν(B)/ν(A)))
1) β ← β_B, N ← −1
2) Repeat
3)     Draw X from ν conditioned to lie in B_β, N ← N + 1
4)     β ← inf{b : X ∈ B_b}
5) Until X ∈ A
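As a concrete illustration, here is a minimal Python sketch of the protocol for the nested-balls setting of Figure 11.1, where B_b is the ball of radius b in R² and ν is area; the function name and interface are illustrative assumptions, not from the text.

```python
import math
import random

def tpa_balls(beta_B, beta_A):
    """One run of TPA where B_b is the ball of radius b in the plane.

    Output: N ~ Pois(ln(nu(B)/nu(A))) = Pois(2 ln(beta_B / beta_A)).
    """
    beta, N = beta_B, -1
    while True:
        # Step 3: draw X uniformly from the disk of radius beta; only the
        # distance from the origin matters, and it has density 2r/beta^2.
        r = beta * math.sqrt(random.random())
        N += 1
        # Step 4: the smallest b with X in B_b is the norm of X, namely r.
        beta = r
        # Step 5: stop once X has landed in the center A.
        if beta <= beta_A:
            return N
```

Averaging many outputs and dividing by 2 estimates ln(β_B/β_A); with the figure's values β_B = 3 and β_A = 1, the mean output is 2 ln 3 ≈ 2.2.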
Figure 11.1 illustrates a run of TPA when B_β is the ball of radius β in R². The distribution of the output of a run of TPA turns out to be Poisson with mean ln(ν(B)/ν(A)).
Figure 11.1 A single run of TPA when B is the ball of radius 3, A is the ball of radius 1, and B_β is the ball of radius β. The number of samples needed before X lands in the center is 3, giving output N = 2.
Note that when X is drawn in TPA, the next value of β cuts off anywhere from 0% to 100% of the region being drawn from. A perhaps surprising fact is that this percentage is uniformly distributed between these two extremes.
Lemma 11.1. Suppose X ∼ Unif(B_β) and β′ = inf{b : X ∈ B_b}. Then ν(B_{β′})/ν(B_β) ∼ Unif([0, 1]).
Proof. Let U = ν(B_{β′})/ν(B_β). Fix a ∈ [0, 1) and consider P(U ≤ a). Since ν(B_β) is a continuous function of β that goes to 0 as β goes to −∞, there exists a b ∈ (−∞, β] such that ν(B_b)/ν(B_β) = a. The chance that X ∼ Unif(B_β) falls in B_b is ν(B_b)/ν(B_β) = a, and when this happens U ≤ a. So P(U ≤ a) ≥ a.
On the other hand, there exists an N such that for any n ≥ N, there is a value b_n such that ν(B_{b_n})/ν(B_β) = a + 1/n. Using the nested property of the family of sets, if X ∉ B_{b_n}, then U ≥ a + 1/n > a. Therefore, if U ≤ a, then X ∈ B_{b_n} for all n ≥ N. But that means that P(U ≤ a) ≤ a + 1/n for all n ≥ N, which means P(U ≤ a) ≤ a, completing the proof.
So starting with measure ν(B), after t steps of taking a draw X and generating a new value of β,

ν(B_{β_t}) ∼ ν(B) · U_1 U_2 U_3 ⋯ U_t,

where the U_i are independent Unif([0, 1]) random variables. Or in negative log space,

−ln(ν(B_{β_t})) ∼ (−ln(ν(B))) + (−ln(U_1)) + (−ln(U_2)) + ⋯ + (−ln(U_t)).   (11.5)
The reason for working with the negative logarithms of the uniforms is that these have an exponential distribution with rate 1.
Fact 11.1. For U ∼ Unif([0, 1]), −ln(U) ∼ Exp(1).
Lemma 11.2. The output of TPA is a Poisson distributed random variable with mean ln(ν(B)/ν(A)).
Proof. From the last fact, the points −ln(U_1), −ln(U_1) − ln(U_2), −ln(U_1) − ln(U_2) − ln(U_3), ... form a Poisson point process of rate 1. The number of such points that fall in the interval [−ln(ν(B)), −ln(ν(A))] will be Poisson distributed with mean −ln(ν(A)) − (−ln(ν(B))) = ln(ν(B)/ν(A)).
Poisson random variables, like normals, have the nice property that when independent values are added, the parameters simply add as well. So if TPA is run k times and the results are added, the total N_k is a Poisson random variable with mean k ln(ν(B)/ν(A)). For any choice of k by the simulator, it is possible to construct an exact confidence interval for the resulting estimate â of ν(B), where â = ν(A) exp(N_k/k).
So far we have assumed that ν(A) is easy to compute, but ν(B) is hard. Of course, TPA just estimates the ratio between the two, so it is just as useful when ν(B) is easy to compute and the goal is to approximate ν(A).
11.2.2 TPA for Gibbs distributions
Suppose that a density can be written as f(x) = exp(−βH(x))/Z_β for a function H(x). A density of this form is said to represent a Gibbs distribution. TPA can be applied to Gibbs distributions when H(x) ≤ 0. (Actually, any upper or lower bound on H(x) suffices, but usually H(x) ≤ 0 in practice, so that is the case considered here.) For instance, the Ising model with no external magnetic field has H(x) = −∑_{{i,j} ∈ E} 1(x(i) = x(j)) ≤ 0. As usual, let the set of configurations be Ω.
Let B_β = {(x, y) : x ∈ Ω, y ∈ [0, exp(−βH(x))]}. The restriction that H(x) ≤ 0 ensures that for β < β′, B_β ⊆ B_{β′}. Now ν(B_0) = #Ω, which is usually easy to compute. For example, in the Ising model Ω = {−1, 1}^V, so ν(B_0) = 2^{#V}.
To draw (x, y) uniformly from B_β, first draw X from the Gibbs distribution with parameter β, and then Y|X can be drawn uniformly from [0, exp(−βH(X))]. When Y ≤ 1, the next value of β is at most 0 and the center has been reached. When Y > 1, the next value for β is the smallest value of b such that Y ∈ [0, exp(−bH(X))]. So b = −ln(Y)/H(X). (When H(X) = 0, just set b = −∞.)
TPA for Gibbs     Input: β     Output: N ∼ Pois(ln(Z_β/#Ω))
1) N ← 0
2) While β > 0
3)     Draw X from the Gibbs distribution with parameter β
4)     Draw Y uniformly from [0, exp(−βH(X))]
5)     If Y ≤ 1 or H(X) = 0, then β ← 0
6)     Else β ← −ln(Y)/H(X) and N ← N + 1
11.3 Omnithermal approximation
For Gibbs distributions, it is not only possible to use TPA to generate estimates for a single Z_β in this fashion, but also to simultaneously give estimates of Z_β for all β in some interval [β_A, β_B]. Since β is also known as the inverse temperature, getting estimates at all temperatures simultaneously in this way is known as omnithermal approximation.