48 COUPLING FROM THE PAST
Lemma 3.1. The number of calls to φ needed by Coupling_from_the_past is at most t[2P(U ∈ A)^{−1} − 1].
Proof. In the call to the algorithm, if U ∉ A, then the recursive call in line 5 calls Coupling_from_the_past again. Each of these recursive calls is independent of the time before, so the number of calls is a geometric random variable with parameter P(U ∈ A). That makes the mean number of calls P(U ∈ A)^{−1}.
Line 3 is a call to φ_t, which requires t calls to φ. If line 6 is executed, then a second call to φ_t is needed. Therefore, the expected number of calls to φ in any level of the algorithm is t[1 + P(U ∉ A)] = t[2 − P(U ∈ A)]. Multiplying by the mean number of levels then gives an overall expected number of calls of t[2 − P(U ∈ A)] · P(U ∈ A)^{−1} = t[2P(U ∈ A)^{−1} − 1].
To keep the running time small, it is important that t be large enough that P(U ∈ A) is reasonably sized, but once that is achieved t should be kept as small as possible.
It is not always easy to determine what such a t should be. Fortunately, there is a way
around this problem.
3.2.1 Varying the block size
In building CFTP, a single-step update φ was composed with itself t times. Call this set of t steps a block.
Suppose there is a threshold value t∗ such that P(U ∈ A) = 0 if t < t∗. In practice, it is usually very difficult to compute t∗ explicitly; therefore, it is common to use blocks of increasing size to make sure that eventually t is greater than the threshold.
In general, there is no reason why the same number of steps needs to be used for each block. The statement and proof of Theorem 1.1 easily adapt to the more general situation where the functions b, g, and f change with each level of recursion.
Theorem 3.2 (Fundamental Theorem of perfect simulation (2nd form)). Suppose for U_1, U_2, ... independent and identically distributed, there are sequences of computable functions {b_t}, {g_t}, and {f_t} such that each b_t has range {0, 1} and ∏_{t=1}^∞ P(b_t(U) = 0) = 0. Let X be a random variable such that for all t:

X ∼ b_t(U)g_t(U) + (1 − b_t(U)) f_t(X, U).   (3.10)

Let T = inf{t : b_t(U_t) = 1}. Then

Y = f_1(f_2(··· f_{T−1}(g_T(U_T), U_{T−1}) ···, U_2), U_1)   (3.11)

has the same distribution as X and E[T] = 1/P(b(U) = 1).
Proof. The proof is nearly identical to that of Theorem 1.1.
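Read operationally, the theorem is a recipe: draw U_1, U_2, ... until the first level T with b_T(U_T) = 1, start from g_T(U_T), and fold the f_t back down to level 1. A minimal Python sketch of that recipe, where b, g, and f are assumed callables taking the level t (and, for f, the partial result) along with the uniform draw; all names here are illustrative, not from the text:

```python
import random

def perfect_sample(b, g, f, max_levels=10_000):
    """Evaluate Y = f_1(f_2(... f_{T-1}(g_T(U_T), U_{T-1}) ..., U_2), U_1),
    where T = inf{t : b_t(U_t) = 1} and the U_t are iid Unif([0,1])."""
    us = []
    while True:
        u = random.random()
        us.append(u)
        t = len(us)
        if b(t, u) == 1:        # level T reached: coalescence detected
            break
        if t >= max_levels:
            raise RuntimeError("no coalescence within max_levels")
    y = g(t, us[-1])            # start from g_T(U_T)
    for s in range(t - 1, 0, -1):
        y = f(s, y, us[s - 1])  # apply f_s with U_s, from level T-1 down to 1
    return y
```

For instance, with b_t(u) = 1(t ≥ 3), g ≡ 0, and f_s(x, u) = x + 1, the call returns 2 = f_1(f_2(g_3(U_3), U_2), U_1), independent of the random draws.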
Using this more general version of the fundamental theorem for CFTP gives the following result.
Theorem 3.3. Let t(0) < t(1) < t(2) < ··· be positive integers. Let W_0, W_1, ... be independent with W_i ∼ Unif([0,1]^{t(i)}), and let A_{t(i)} be a measurable subset of [0,1]^{t(i)} such that W_i ∈ A_{t(i)} implies φ_{t(i)}(Ω, W_i) = {y} for some single state y. Suppose ∏_{i=1}^∞ P(W_i ∉ A_{t(i)}) = 0. Then for any x_0 ∈ Ω,

Y_0 = φ_{t(0)}(x_0, W_0)1(W_0 ∈ A_{t(0)})
    + φ_{t(0)}(φ_{t(1)}(x_0, W_1), W_0)1(W_1 ∈ A_{t(1)})
    + φ_{t(0)}(φ_{t(1)}(φ_{t(2)}(x_0, W_2), W_1), W_0)1(W_2 ∈ A_{t(2)}) + ···

has Y_0 ∼ π.
A common choice is to make t(i) = 2t(i−1), so that the number of steps taken doubles at each step. If the initial t = t(0) is 1, then doubling each time reaches t∗ after only a logarithmic number of recursions. Moreover, the total work done in all of the blocks will be about 1 + 2 + 4 + ··· + t∗ = 2t∗ − 1.
Doubling_coupling_from_the_past   Input: t   Output: Y

1) Draw U ← Unif([0,1]^t)
2) If U ∈ A_t
3)     Return φ_t(·, U)
4) Else
5)     Draw X ← Doubling_coupling_from_the_past(2t)
6)     Return φ_t(X, U)
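The recursion above can be sketched directly in Python. The chain here is an illustrative stand-in (a lazy walk on {0,...,4} with reflecting barriers), and coalescence is checked by brute force over the small state space rather than through an explicit set A_t; the chain, update rule, and names are mine, not from the text:

```python
import random

STATES = range(5)  # toy state space {0,...,4}

def phi(x, u):
    """Single-step update phi: move up if u < 1/2, else down, with barriers."""
    return min(x + 1, 4) if u < 0.5 else max(x - 1, 0)

def phi_t(x, us):
    """phi_t: compose phi over an entire block of randomness us."""
    for u in us:
        x = phi(x, u)
    return x

def doubling_cftp(t=1):
    """Doubling_coupling_from_the_past: if the block of size t coalesces,
    return the common image; otherwise recurse with 2t, then push the
    recursively obtained state through this same block."""
    us = [random.random() for _ in range(t)]
    images = {phi_t(x, us) for x in STATES}   # brute-force coalescence test
    if len(images) == 1:                      # plays the role of U in A_t
        return images.pop()
    x = doubling_cftp(2 * t)                  # sample from further in the past
    return phi_t(x, us)                       # reuse the SAME us, as in line 6
```

Note that the randomness us drawn for a block is reused after the recursive call returns; drawing fresh randomness at that point would bias the output.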
It should be noted, however, that this doubling is only a rule of thumb. One could use t(i) = (3/2)^i, or 4^i, or just i, and the algorithm will still return a perfect sample, no matter how big t∗ is. Only the number of steps (on average) taken by the algorithm could change.
For a more sophisticated block size method designed to give greater concentration of the running time around the mean, see Section 11.6.
3.2.2 Recursion as history
There is another way of viewing CFTP that can make more intuitive sense, especially to those already familiar with Markov chain Monte Carlo. Suppose that a Markov chain is run forward in time, yielding X_0, X_1, X_2, .... Then as the number of steps grows towards infinity, Theorem 1.3 tells us that the state will converge towards stationarity.
Now suppose that the Markov chain steps are indexed not just by nonnegative integers, but by all integers. So the process is ..., X_{−2}, X_{−1}, X_0, X_1, X_2, ..., and it not only runs infinitely forward in time, but infinitely backwards in time as well. Then by time 0, intuitively the process has already been running for an "infinite" number of steps. Therefore X_0 should already be stationary.
This intuitive idea can be made precise using the idea of the reverse process for a Markov chain (see Section 5.2.1), but for now let us just follow where this intuition leads.
Suppose CFTP is run with t = 1. Then φ_t can be viewed as the step from X_{−1} to X_0. If coalescence occurs, the algorithm quits and returns X_0. Otherwise, recursion
occurs, and t is doubled to 2. These two steps can be viewed as the step from X_{−3} to X_{−2} and the step from X_{−2} to X_{−1}.
If coalescence occurs, then one more step in the Markov chain is taken, and recall this was the step from X_{−1} up to X_0. The two blocks together have moved the chain from step X_{−3} up to step X_0.
If coalescence does not occur, then now t = 4, and the steps X_{−7} → X_{−6} → X_{−5} → X_{−4} → X_{−3} are taken. Then as the recursion unpacks, the two steps X_{−3} → X_{−2} → X_{−1} are taken, finally followed by the step X_{−1} → X_0.
No matter how far back the recursion goes, the final output of the procedure (from this point of view) is X_0 ∼ π. This way of viewing CFTP becomes very important for designing perfect simulation algorithms for spatial point processes (see Chapter 7).
3.3 Monotonic CFTP
To use CFTP requires an update function that is stationary with respect to π, and a set A such that for U ∈ A, φ(Ω, U) = {x}. The stationarity requirement is usually easy to meet, as it can be done by constructing a Markov chain whose stationary distribution is π. The hard part is creating the sets A_{t(i)} together with a method for determining if U ∈ A_{t(i)}.
It is this difficulty that prevents CFTP from being used for every Monte Carlo application. In the remainder of this chapter, different methods for finding such A_{t(i)} will be discussed, together with applications. The first method considered is the use of a monotonic update function.
Monotonicity was used by Johnson [74] to detect the speed of convergence for
the Gibbs chain for the Ising model, although the term monotonicity was not used
there. Propp and Wilson [112] saw that this same idea could be used as the basis of a
CFTP-type algorithm for perfect simulation.
The idea is as follows. Suppose that the state space Ω admits a partial order ⪯. Recall Definition 1.24, which states that the binary relation ⪯ is a partial order if it is reflexive (so s ⪯ s), antisymmetric (so s ⪯ t and t ⪯ s gives s = t), and transitive (so s ⪯ t and t ⪯ r gives s ⪯ r). But unlike a total order (such as ≤ over ℝ), there could be elements s and t for which neither s ⪯ t nor t ⪯ s holds.
As an example, consider two Ising configurations x and y. Say that x ⪯ y if for all nodes v, x(v) ≤ y(v). So if there are four nodes, (−1, 1, −1, 1) precedes (1, 1, −1, 1). However, the states (1, −1, 1, 1) and (−1, 1, 1, 1) are incomparable; neither one precedes the other.
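The componentwise order is easy to state in code. A small sketch, with configurations as tuples of ±1 labels (the function name is mine):

```python
def precedes(x, y):
    """Componentwise partial order: x precedes y iff x(v) <= y(v) at every node v."""
    return all(xv <= yv for xv, yv in zip(x, y))

# A comparable pair on four nodes:
assert precedes((-1, 1, -1, 1), (1, 1, -1, 1))

# An incomparable pair: neither configuration precedes the other.
assert not precedes((1, -1, 1, 1), (-1, 1, 1, 1))
assert not precedes((-1, 1, 1, 1), (1, -1, 1, 1))
```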
Suppose further that Ω has a largest element x_max and a smallest element x_min. Continuing the Ising model example, the configuration of all −1 values is the smallest element, and the configuration of all 1 values is the largest element.
Definition 3.3. Let φ be an update function for a single step of a Markov chain. If φ has the property that for all x ⪯ y and u ∈ [0,1], it holds that φ(x, u) ⪯ φ(y, u), then call φ monotonic.
Consider running two Markov chains {X_t} and {Y_t} where X_0 = x_0 and Y_0 = y_0 with x_0 ⪯ y_0. Suppose at each step, X_{t+1} = φ(X_t, U_t) and Y_{t+1} = φ(Y_t, U_t), where U_1, U_2, ... is an iid stream of Unif([0,1]) random variables.
Then a simple induction gives that X_{t+1} ⪯ Y_{t+1} for all values of t. Moreover, suppose x_0 = x_min and y_0 = x_max. Then any other state w_0 satisfies x_0 ⪯ w_0 ⪯ y_0, and so building a W_t chain in similar fashion gives that X_t ⪯ W_t ⪯ Y_t for all t.
Suppose that X_t = Y_t. Since X_t ⪯ W_t ⪯ Y_t, antisymmetry of the partial order implies that X_t = W_t = Y_t as well. The starting state of W_t was arbitrary, so for any starting state w_0 ∈ Ω, φ_t(w_0, u) = X_t = Y_t.
In other words, if after taking a fixed number of steps the largest state chain and the smallest state chain (also referred to as the upper and lower chains) have reached the same state, every other state in the chain has been trapped in between them, and has also reached the same state. This gives rise to one of the most important CFTP variants, monotonic CFTP.
Monotonic_coupling_from_the_past   Input: t   Output: Y

1) Draw U ← Unif([0,1]^t)
2) Let X_t ← φ_t(x_max, U), and Y_t ← φ_t(x_min, U)
3) If X_t = Y_t
4)     Return X_t
5) Else
6)     Draw X ← Monotonic_coupling_from_the_past(2t)
7)     Return φ_t(X, U)
3.3.1 The Ising model
Consider the Ising_Gibbs update from Section 1.8.1. In this update, a node v was chosen uniformly at random, as was U ∼ Unif([0,1]). Then N_c was the number of neighbors of v labeled c. If U < exp(N_1 β)/[exp(N_1 β) + exp(N_{−1} β)], then the node was labeled 1, otherwise it was labeled −1.
In order to use this update for perfect simulation, it needs to be written as an update function. That is, the current state and the random choices become input parameters.
Ising_Gibbs   Input: x ∈ {−1,1}^V, u ∈ [0,1], v ∈ V   Output: x ∈ {−1,1}^V

1) Let N_1 be the number of neighbors of v labeled 1 in x
2) Let N_{−1} be the number of neighbors of v labeled −1 in x
3) If u < exp(N_1 β)/[exp(N_1 β) + exp(N_{−1} β)]
4)     x(v) ← 1
5) Else
6)     x(v) ← −1
When u ∼ Unif([0,1]) and v ∼ Unif(V), this is the random scan Gibbs update from Section 1.8.1.
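A direct transcription of Ising_Gibbs, assuming the graph is supplied as an adjacency list and the state as a dict from nodes to labels in {−1, 1} (these representation choices, and the parameter name beta, are mine):

```python
import math

def ising_gibbs(x, u, v, nbrs, beta):
    """Ising_Gibbs update at node v driven by u in [0,1]; modifies x in place."""
    n_plus = sum(1 for w in nbrs[v] if x[w] == 1)    # N_1
    n_minus = sum(1 for w in nbrs[v] if x[w] == -1)  # N_{-1}
    p_plus = math.exp(n_plus * beta) / (
        math.exp(n_plus * beta) + math.exp(n_minus * beta))
    x[v] = 1 if u < p_plus else -1
    return x
```

On a path 0-1-2 with both neighbors of node 1 labeled 1 and beta = 1, the threshold is e^2/(e^2 + 1) ≈ 0.88, so u = 0.5 relabels node 1 as 1 while u = 0.95 relabels it as −1.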
Fact 3.1. The Ising_Gibbs update is monotonic when β > 0.
Proof. Let x ⪯ y. Let v be any node, and let N_c(z) denote the number of neighbors of v labeled c in configuration z. Then any neighbor of v labeled 1 in x is also labeled 1 in y, so N_1(x) ≤ N_1(y). Similarly, N_{−1}(x) ≥ N_{−1}(y).
Let f(n_1, n_{−1}) = exp(n_1 β)/[exp(n_1 β) + exp(n_{−1} β)]. Since β > 0, f(n_1, n_{−1}) is increasing in n_1 and decreasing in n_{−1}. Hence if U < f(N_1(x), N_{−1}(x)), then certainly U < f(N_1(y), N_{−1}(y)). That is, if x(v) is changed to 1, then certainly y(v) is as well.
Similarly, if U > f(N_1(y), N_{−1}(y)), so that y(v) has its label changed to −1, then U > f(N_1(x), N_{−1}(x)) and x(v) is changed to −1 as well.
Therefore, no matter what the choice of v and U is in the update, if x ⪯ y then φ(x, v, U) ⪯ φ(y, v, U).
This means that monotonic CFTP can be used to generate a random sample from the stationary distribution.
Monotonic_Ising_Gibbs   Input: t   Output: X

1) Draw (U_1, ..., U_t) ← Unif([0,1]^t), draw (V_1, ..., V_t) ← Unif(V^t)
2) Let x_max ← (1, 1, ..., 1), x_min ← (−1, −1, ..., −1)
3) For i from 1 to t
4)     x_max ← Ising_Gibbs(x_max, U_i, V_i)
5)     x_min ← Ising_Gibbs(x_min, U_i, V_i)
6) If x_max ≠ x_min
7)     Draw x_max ← Monotonic_Ising_Gibbs(2t)
8)     For i from 1 to t
9)         x_max ← Ising_Gibbs(x_max, U_i, V_i)
10) X ← x_max
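Combining the monotone update with the doubling recursion gives a runnable sketch of Monotonic_Ising_Gibbs. The graph (as an adjacency list) and inverse temperature beta are assumed inputs, the Ising update is repeated so the block is self-contained, and all names are mine:

```python
import math
import random

def ising_gibbs(x, u, v, nbrs, beta):
    """Gibbs update at node v driven by u; labels lie in {-1, +1}."""
    n_plus = sum(1 for w in nbrs[v] if x[w] == 1)
    n_minus = len(nbrs[v]) - n_plus
    p_plus = math.exp(n_plus * beta) / (
        math.exp(n_plus * beta) + math.exp(n_minus * beta))
    x[v] = 1 if u < p_plus else -1
    return x

def monotonic_ising_cftp(nbrs, beta, t=1):
    """Run the all-(+1) and all-(-1) chains with shared randomness (U_i, V_i).
    If they coincide, every chain sandwiched in between has coalesced too;
    otherwise recurse with 2t and replay this block on the returned state."""
    nodes = list(nbrs)
    us = [random.random() for _ in range(t)]
    vs = [random.choice(nodes) for _ in range(t)]
    x_max = {v: 1 for v in nodes}
    x_min = {v: -1 for v in nodes}
    for u, v in zip(us, vs):
        ising_gibbs(x_max, u, v, nbrs, beta)
        ising_gibbs(x_min, u, v, nbrs, beta)
    if x_max != x_min:
        x_max = monotonic_ising_cftp(nbrs, beta, 2 * t)  # go further into the past
        for u, v in zip(us, vs):                         # replay the same block
            ising_gibbs(x_max, u, v, nbrs, beta)
    return x_max
```

As in the pseudocode, the block's randomness (us, vs) is kept and replayed after the recursive call; this reuse is what makes the output exact rather than approximate.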
3.3.2 The hard-core gas model on bipartite graphs
Consider the HCGM_Gibbs update from Section 1.8.5. This update function is not monotonic using the simple partial order x ⪯ y ⟺ (∀v ∈ V)(x(v) ≤ y(v)).
Therefore either a new partial order or a new update is needed. The problem is that the chain is repulsive: neighbors in the graph cannot both be given the label 1. However, when the graph is bipartite, it is possible to build a new partial order under which the update is monotonic.
Definition 3.4. A graph is bipartite if the vertex set V can be partitioned into V_1 and V_2 so that for every edge e, one node of e belongs to V_1 and one node belongs to V_2.
For configurations on a bipartite graph, the partial order is similar to that of the previous section, but with the order reversed on V_2:

x ⪯ y ⟺ (∀v_1 ∈ V_1)(∀v_2 ∈ V_2)(x(v_1) ≤ y(v_1) ∧ x(v_2) ≥ y(v_2)).   (3.12)
Fact 3.2. HCGM_Gibbs is monotonic with respect to (3.12).
Proof. Suppose x ⪯ y and node i ∈ V_2 is chosen to update. If any neighbor w of i has x(w) = 1, then y(w) = 1 as well, and both x(i) and y(i) are set to 0. The only way there is a chance that y(i) is changed to 1 is if y(w) = 0 for all neighbors w of i. In this case all x(w) = 0 as well, so x(i) = y(i) = 1(U < λ/(λ + 1)), and x(i) ≥ y(i) is preserved.
When i ∈ V_1 the analysis is similar.
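The crossed order (3.12) and the hard-core update can be sketched as follows. Configurations map nodes to {0, 1}, lam stands for λ, and all names are illustrative:

```python
def hcgm_gibbs(x, u, v, nbrs, lam):
    """Hard-core Gibbs update at node v driven by u in [0,1]: v may only be
    labeled 1 when every neighbor is labeled 0."""
    if any(x[w] == 1 for w in nbrs[v]):
        x[v] = 0
    else:
        x[v] = 1 if u < lam / (lam + 1) else 0
    return x

def crossed_precedes(x, y, v1, v2):
    """x precedes y in the order (3.12): x <= y on V1 and x >= y on V2."""
    return (all(x[v] <= y[v] for v in v1)
            and all(x[v] >= y[v] for v in v2))
```

On a single edge {0, 1} with V_1 = {0} and V_2 = {1}, updating node 1 in two ordered configurations with the same u leaves them ordered, illustrating Fact 3.2.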