Chapter 3
Coupling from the Past
Life can only be understood backwards; but it must be lived forwards.
Søren Kierkegaard
Coupling from the past (CFTP) allowed the creation of perfect simulation algorithms for problems where only Markov chain methods existed before. The easiest way to use this approach is to take advantage of monotonicity in the update function for a Markov chain. However, even for nonmonotonic problems, the use of bounding chains can combine with CFTP to obtain perfect samples. Basic CFTP does have drawbacks, though, the two biggest being noninterruptibility and the fact that the random choices made by the algorithm must be used twice.
3.1 What is a coupling?
Coupling from the past, hereafter referred to as CFTP, is a method that allows ideas from Markov chain theory to be used to create perfect simulation algorithms. When introduced by Propp and Wilson [112] in the mid 1990s, it greatly extended the applicability of perfect simulation, and made efficient simulation of problems such as the Ising model feasible for the first time.
The idea is deceptively simple, and begins with the notion of an update function.
Definition 3.1. Say that φ : Ω × [0,1] → Ω is an update function for a Markov chain {X_t} if for U ~ Unif([0,1]), [X_{t+1} | X_t] ~ φ(X_t, U).
The function φ is deterministic—all of the randomness is contained in the value of U. Any Markov chain that can be simulated on a computer is an example of an update function, so creating an update function representation of a Markov chain is done every time a Markov chain is actually simulated. Update functions for a given transition kernel are not unique. In fact, a given Markov chain usually has an infinite number of different update functions that result in the same transition kernel.
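For instance, the following Python sketch exhibits two distinct update functions with the same transition kernel. The chain (a simple random walk with partially reflecting boundaries on {0,1,2}) anticipates Example 3.1 below; the function names and the grid-based kernel check are our own illustration, not from the text.

```python
import itertools

# Reflecting simple random walk on {0, 1, 2}.

def phi_a(x, u):
    # Step up when u > 1/2, down when u <= 1/2, respecting the boundaries.
    return x + (x < 2 and u > 0.5) - (x > 0 and u <= 0.5)

def phi_b(x, u):
    # A different update function: the two halves of [0,1] swap roles.
    return phi_a(x, 1.0 - u)

def kernel(phi, x, y, n=10_000):
    # Approximate P(x -> y) by averaging the indicator over a fine grid of u.
    return sum(phi(x, (i + 0.5) / n) == y for i in range(n)) / n

for x, y in itertools.product(range(3), repeat=2):
    assert abs(kernel(phi_a, x, y) - kernel(phi_b, x, y)) < 1e-3
print("phi_a and phi_b induce the same transition kernel")
```

Here phi_b simply reverses which half of [0,1] triggers an up move, yet from any interior state both functions step up or down with probability 1/2, so the kernel is unchanged.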
Given an iid sequence U_0, U_1, U_2, ..., the entire process can be simulated by letting X_1 = φ(x_0, U_0), and for i > 1, X_i = φ(X_{i−1}, U_{i−1}). Let U denote the entire sequence, then let φ_t(x_0, U) = φ(φ(···(φ(x_0, U_0), U_1), ···), U_{t−1}) be the state after t steps in the chain starting from x_0.
Suppose x_0 and y_0 are any two states in Ω. Let X_t = φ_t(x_0, U) and Y_t = φ_t(y_0, U)
Figure 3.1 A completely coupled set of processes. X (represented by squares) and Y (represented by circles) are simple symmetric random walks with partially reflecting boundaries on Ω = {0,1,2}. They have coupled at time t = 3.
(using the same values of U). Then {X_t} and {Y_t} are both instances of the Markov chain, just with (possibly) different starting points x_0 and y_0.
A Markov chain can be created for every possible starting state x ∈ Ω using the U values. A family of Markov chains that share a common kernel is known as a coupling.
Note that with this coupling, if X_t = Y_t = x for some t ≥ 0, then X_{t′} = Y_{t′} for all t′ ≥ t. Just as two train cars that are coupled together stay together, two processes at the same state remain together no matter how much time passes.
An update function and the use of common U_t values gives a coupling that applies to the family of Markov chains started at every possible state in the set Ω, so this is also sometimes called a complete coupling [51].
Definition 3.2. Let S be a collection of stochastic processes defined over a common index set I and common outcome space Ω. If there is an index i ∈ I and state x ∈ Ω such that for all S ∈ S, S_i = x, then say that the stochastic processes have coupled or coalesced.
Example 3.1. Consider the simple symmetric random walk with partially reflecting boundaries on Ω = {0,1,2}. This Markov chain takes a step to the right (or left) with probability 1/2, unless the move would take the state outside of Ω. An update function for this chain is

φ(x,U) = x + 1(x < 2, U > 1/2) − 1(x > 0, U ≤ 1/2).   (3.1)
Figure 3.1 is a picture of two Markov chains, {X_t} and {Y_t}, where X_0 = 0 and Y_0 = 2. The first random choice is U_1 = 0.64..., so both chains attempt to increase by 1. The next random choice is U_2 = 0.234..., so both chains attempt to decrease by 1. The third random choice is U_3 = 0.100..., so again both chains attempt to decrease by one. At this point, both X_3 = Y_3 = 0. So the two chains have coupled at time 3.
In fact, we could go further, and consider a chain started in state 1 at time 0. The first move takes this chain up to state 2, then the second move brings it down to 1, and then the last move takes it to state 0. No matter where the chain started, after three moves the state of the chain will be state 0.
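This three-step coalescence can be replayed directly. Here is a minimal Python sketch (variable names our own) that drives one copy of the chain from each starting state using update function (3.1) and the three uniforms from the example:

```python
def phi(x, u):
    # Update function (3.1): step up if u > 1/2, down if u <= 1/2,
    # respecting the boundaries of {0, 1, 2}.
    return x + (x < 2 and u > 0.5) - (x > 0 and u <= 0.5)

# The three uniforms from the example: up, down, down.
us = [0.64, 0.234, 0.100]

# Complete coupling: one copy of the chain from every starting state,
# all driven by the same uniforms.
states = [0, 1, 2]
for u in us:
    states = [phi(x, u) for x in states]
print(states)  # prints [0, 0, 0]: all three copies have coalesced
```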
3.2 From the past
Propp and Wilson started with this notion of coupling through use of an update func-
tion, and showed how this could be used to obtain perfect samples from the target
distribution.
To see how this works, consider a stationary state X. Then for a fixed t, let Y = φ_t(X, U). Then Y is also stationary, because φ_t is the composition of t stationary updates.
The output of CFTP will always be Y. Let W = (U_1, U_2, ..., U_t) be uniform over [0,1]^t. Consider a measurable set A ⊆ [0,1]^t. Then either W ∈ A or W ∉ A. So

Y = φ_t(X, W) 1(W ∈ A) + φ_t(X, W) 1(W ∉ A).   (3.2)
Suppose we could find a set A such that, when W ∈ A, φ_t(X, W) does not depend on X. That is to say, whenever W ∈ A, it holds that φ_t(x, W) = φ_t(x′, W) for all x, x′ ∈ Ω. Such a Markov chain has completely forgotten its starting state through its random moves. If this happens, then there is no need to know the value of X in order to find the value of Y!
Recall the example of the simple symmetric random walk with partially reflecting boundaries on {0,1,2} illustrated in Figure 3.1. In that example

φ_3(0,W) = φ_3(1,W) = φ_3(2,W) = 0.   (3.3)

So φ_3({0,1,2}, W) = {0}.
So what is a valid choice of A here? Suppose that the first three steps were (up, down, down). This sequence of moves corresponds to the set of values A_1 = (1/2,1] × [0,1/2] × [0,1/2]. If W ∈ A_1, then φ_3({0,1,2}, W) = {0}.
Another set of moves that brings all starting states together is (down, down, down). Here A_2 = [0,1/2] × [0,1/2] × [0,1/2]. Of course, if A_1 and A_2 are both valid, so is A_3 = A_1 ∪ A_2. So φ_3({0,1,2}, W) = {0} for all W ∈ A_3. The point is that the set A does not have to be exactly the set of moves that coalesce down to a single point. However, the bigger A is, the more likely it will be that W ∈ A.
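These claims about A_1 and A_2 can be checked mechanically. Since the update function only looks at whether each coordinate of W is above or below 1/2, one representative point per box suffices; a short Python sketch (names our own):

```python
def phi(x, u):
    # Update function (3.1) for the reflecting random walk on {0, 1, 2}.
    return x + (x < 2 and u > 0.5) - (x > 0 and u <= 0.5)

def phi3(x, w):
    # Three steps of the chain driven by w = (w0, w1, w2).
    for u in w:
        x = phi(x, u)
    return x

# phi only depends on which half of [0,1] each coordinate falls in, so one
# representative point per interval checks the whole box.
LO, HI = 0.25, 0.75        # stand-ins for [0, 1/2] and (1/2, 1]

A1 = (HI, LO, LO)          # moves (up, down, down)
A2 = (LO, LO, LO)          # moves (down, down, down)

for w in (A1, A2):
    assert {phi3(x, w) for x in (0, 1, 2)} == {0}
print("every W in A1 or A2 collapses {0,1,2} to {0}")
```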
Back to Equation (3.2). In the first term, the value of X does not matter, so φ_t(X, W) 1(W ∈ A) = φ_t(x_0, W) 1(W ∈ A), where x_0 is an arbitrary element of the state space. That is,

Y = φ_t(x_0, W) 1(W ∈ A) + φ_t(X, W) 1(W ∉ A).   (3.4)
So when W ∈ A, there is no need to figure out what X is in order to compute Y. Okay, that is all well and good, but what happens when W ∉ A? Then it is necessary to evaluate φ_t(X, W) to obtain Y. The central idea behind CFTP is to find X ~ π by recursively calling CFTP (that is the "from the past" part of the algorithm), and then evaluate Y = φ_t(X, W) as before.
To see how this works in practice, it helps to alter our notation somewhat by adding a time index. Let Y_0 = Y, W_0 = W, and Y_{−1} = X. With this notation,

Y_0 = φ_t(x_0, W_0) 1(W_0 ∈ A) + φ_t(Y_{−1}, W_0) 1(W_0 ∉ A).   (3.5)
So if W_0 ∈ A, we are done. Otherwise, it is necessary to draw Y_{−1}. But this is done in exactly the same way as drawing Y_0:
Y_{−1} = φ_t(x_0, W_{−1}) 1(W_{−1} ∈ A) + φ_t(Y_{−2}, W_{−1}) 1(W_{−1} ∉ A),   (3.6)

where W_{−1} ~ Unif([0,1]^t) and is independent of W_0.
In general,

Y_{−i} = φ_t(x_0, W_{−i}) 1(W_{−i} ∈ A) + φ_t(Y_{−(i+1)}, W_{−i}) 1(W_{−i} ∉ A).   (3.7)
This can be taken back as far as necessary. The end result is an infinite sequence representation of a stationary state.
Theorem 3.1 (Coupling from the past). Suppose that φ is an update function for a Markov chain over Ω such that for U = (U_0, ..., U_{t−1}) ~ Unif([0,1]^t) the following holds.

1. For Y ~ π, φ(Y, U_0) ~ π.
2. There exists a set A ⊆ [0,1]^t such that P(U ∈ A) > 0 and φ_t(Ω, A) = {x} for some state x ∈ Ω.

Then for all x_0 ∈ Ω,

Y_0 = φ_t(x_0, U_0) 1(U_0 ∈ A)
    + φ_t(φ_t(x_0, U_{−1}), U_0) 1(U_0 ∉ A, U_{−1} ∈ A)
    + φ_t(φ_t(φ_t(x_0, U_{−2}), U_{−1}), U_0) 1(U_0 ∉ A, U_{−1} ∉ A, U_{−2} ∈ A) + ···

has Y_0 ~ π.
Proof. Fix x_0 ∈ Ω. Then the result follows from the Fundamental Theorem of perfect simulation (Theorem 1.1) using g(U) = φ_t(x_0, U), b(U) = 1(U ∈ A), and f(X, U) = φ_t(X, U).
The key insight of Propp and Wilson was that it is not necessary to evaluate every term in the sequence to find Y_0. We only need U_0, ..., U_{−T}, where U_{−T} ∈ A. As long as P(U ∈ A) > 0, T will be a geometric random variable with mean 1/P(U ∈ A). The following code accomplishes this task.
Coupling_from_the_past        Output: Y ~ π

1)  Draw U ← Unif([0,1]^t)
2)  If U ∈ A
3)      Return φ_t(x_0, U)  (where x_0 is an arbitrary element of Ω)
4)  Else
5)      Draw X ← Coupling_from_the_past
6)      Return φ_t(X, U)
This is a recursive algorithm; the algorithm is allowed to call itself in line 5. When the algorithm recursively calls itself, it does not pass the values of U as a parameter. The new call to Coupling_from_the_past generates in line 1 its own value of U that is independent of the choice of U in the calling function. This is important to do, as otherwise the Fundamental Theorem of Perfect Simulation would not apply.
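To make the recursion concrete, here is a minimal Python sketch specialized to the reflecting random walk on {0,1,2}, taking t = 2 steps per block with A = [0,1/2] × [0,1/2] (two "down" moves, which send every state to 0). The function names are our own; note that each recursive call draws its own fresh uniforms:

```python
import random

def phi(x, u):
    # Update function (3.1) for the reflecting random walk on {0, 1, 2}.
    return x + (x < 2 and u > 0.5) - (x > 0 and u <= 0.5)

def phi_t(x, w):
    # Run the chain forward through the whole block of uniforms w.
    for u in w:
        x = phi(x, u)
    return x

def in_A(w):
    # A = [0,1/2] x [0,1/2]: two "down" moves, which send every state to 0.
    return all(u <= 0.5 for u in w)

def cftp(t=2):
    # Line 1: draw fresh uniforms; never reuse the caller's U.
    w = [random.random() for _ in range(t)]
    if in_A(w):
        # Lines 2-3: all states have coalesced, so any x0 gives the same answer.
        return phi_t(0, w)
    # Lines 5-6: recurse for a stationary X, then run it forward through w.
    return phi_t(cftp(t), w)

# Usage: the output should be uniform over {0, 1, 2}.
random.seed(1)
counts = [0, 0, 0]
for _ in range(30_000):
    counts[cftp()] += 1
print([round(c / 30_000, 3) for c in counts])  # each close to 1/3
```

The number of recursive calls before some block lands in A is geometric with success probability P(U ∈ A) = 1/4, matching the discussion above.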
Table 3.1 For the simple symmetric random walk on {0,1,2}, this lists the possible outcomes and their probabilities for taking two steps in the chain given U ∉ A.

              Moves
state X   up-up   up-down   down-up
   0        2        0         1
   1        2        1         1
   2        2        1         2
Example 3.2. Use CFTP to draw uniformly from {0,1,2} by using the Markov chain with update function

φ(x,U) = x + 1(x < 2, U > 1/2) − 1(x > 0, U ≤ 1/2),

t = 2, and A = [0,1/2] × [0,1/2].

As noted earlier, φ_2({0,1,2}, U) = {0} whenever U ∈ A. So to use CFTP, first draw U. If U ∈ A, output 0 and quit. Otherwise, recursively call the algorithm in order to find X ~ π, then output φ_2(X, U). Note that because this second step is only executed when U ∉ A, the output of φ_2(X, U) in this second step is not stationary, but instead has the distribution of φ_2(X, U) conditioned on U ∉ A.
The theory guarantees that the output of this procedure has the correct distribution, but it can also be verified directly in this example.
What is the probability that the output of the algorithm is 0? The event U ∈ A occurs with probability (1/2)(1/2) = 1/4, so there is a 1/4 chance that the state is 0 after the first step, and a 3/4 chance that the algorithm continues to the next step. During this next step, the state X is a draw from the stationary distribution, and so has a one-third chance of being either 0, 1, or 2. Conditioned on U ∉ A, the uniforms could either result in the chain moving up-up, up-down, or down-up. Therefore there are nine possibilities to consider, the outcomes of which are collected in Table 3.1. For instance, if the state starts at 0 and the chain moves down-up, then moving down leaves the state at 0 and moving up brings the chain to 1, so the final state is 1.

Each of these outcomes occurs with probability 1/9: there is a 1/3 chance for each starting state and (conditioned on U ∉ A) a 1/3 chance for each pair of moves.
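The nine cases, and the resulting exact output distribution, can be enumerated in a few lines of Python (the representative uniforms and variable names are our own):

```python
from fractions import Fraction

def phi(x, u):
    # Update function (3.1) for the reflecting random walk on {0, 1, 2}.
    return x + (x < 2 and u > 0.5) - (x > 0 and u <= 0.5)

# Conditioned on U not in A = [0,1/2]^2, the move pair is uniformly one of
# (up, up), (up, down), (down, up).  Representative uniforms for each half:
UP, DOWN = 0.75, 0.25
move_pairs = {"up-up": (UP, UP), "up-down": (UP, DOWN), "down-up": (DOWN, UP)}

# Reproduce Table 3.1: final state for each starting state and move pair.
table = {x: [phi(phi(x, u1), u2) for (u1, u2) in move_pairs.values()]
         for x in (0, 1, 2)}
for x, row in table.items():
    print(x, row)

# Combine with P(U in A) = 1/4 to get the exact output distribution.
dist = {0: Fraction(1, 4), 1: Fraction(0), 2: Fraction(0)}
for x in (0, 1, 2):                       # X ~ pi is uniform over three states
    for (u1, u2) in move_pairs.values():  # each pair has prob 1/3 given U not in A
        dist[phi(phi(x, u1), u2)] += Fraction(3, 4) * Fraction(1, 3) * Fraction(1, 3)
print(dist)  # each state has probability exactly 1/3
```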
Therefore, the chance of ending in state 0 is

1/4 + (3/4) · (1/9) = 4/12 = 1/3.   (3.8)
The chance the algorithm ends in state 1 or 2 is

0 + (3/4) · (4/9) = 1/3.   (3.9)
So the output does have the correct distribution! Now consider the running time, as measured by the number of evaluations of the update function φ that are needed.