FROM THE PAST 45
3.2 From the past
Propp and Wilson started with this notion of coupling through use of an update func-
tion, and showed how this could be used to obtain perfect samples from the target
distribution.
To see how this works, consider a stationary state X . Then for a fixed t,letY =
φ
t
(X,U ).ThenY is also stationary, because
φ
t
is the composition of t stationary
updates.
The output of CFTP will always be Y .LetW =(U
1
,U
2
,...,U
t
) be uniform over
[0,1]
t
. Consider a measurable set A ⊂[0, 1]
t
. Th e n either W ∈ A or W /∈A.So
Y =
φ
t
(X,W )1(W ∈ A)+
φ
t
(X,W )1(W /∈ A). (3.2)
Suppose we could find a set A, such that when W ∈A,
φ
(X,W ) did not depend on
X. That is to say, whenever W ∈ A, it holds that
φ
(x,W )=
φ
(x
,W ) for all x,x
∈ Ω.
Such a Markov chain h as completely forgotten its starting state through its random
moves. If this happens, then there is no need to know the value of X in order to find
the value of Y !
Recall the example of the simple symmetric random walk with partially reflecting
boundaries on {0,1,2}illustrated in Figure 3.1. In that example
φ
3
(0,W )=
φ
3
(1,W )=
φ
3
(2,W )=0. (3.3)
So
φ
3
({0,1,2},W)={0}.
So what is a valid choice of A here? Suppose that the first three steps were
(up,down,down). This sequence of moves corresponds to the set of values A
1
=
(1/2,1] ×[0, 1/2 ] ×[0,1/2].IfW ∈ A
1
,then
φ
3
({0,1,2},W)={0}.
Another set of moves that moves all starting states together is (down,down,down).
Here A
2
=[0,1/2] ×[0, 1/2]×[0,1/2]. Of course, if A
1
and A
2
are both valid, so is
A
3
= A
1
∪A
2
.So
φ
3
({0,1,2},W)={0} for all W ∈ A
3
. The point is that the set A
does not have to be exactly the set of moves that coalesce down to a single point.
However, the bigger A is, the more likely it will be that W ∈ A.
Back to Equation (3.2). In the first term, the value of X does not matter, so
φ
t
(X,W )1(W /∈ A)=
φ
t
(x
0
,W )1(W ∈ A) where x
0
is an arbitrary element of the
state space. That is,
Y =
φ
t
(x
0
,W )1(W ∈ A)+
φ
t
(X,W )1(W /∈ A). (3.4)
So when W ∈ A, there is no need to figure out what X is, in order to compute Y .
Okay, that is all well and good, but what happens when W /∈ A?
Then it is necessary to evaluate
φ
(X,W ) to obtain Y . The central idea behind
CFTP is to find X ∼
π
by recursively calling CFTP (that is the “from the past” part
of the algorithm), and then evaluate Y =
φ
(X,W ) as before.
To see how this works in practice, it helps to alter our notation somewhat by
adding a time index. Let Y
0
= Y , W
0
= W ,andY
−1
= X . With this notation,
Y
0
=
φ
t
(x
0
,W
0
)1(W
0
∈ A)+
φ
t
(Y
−1
,W
0
)1(W
0
/∈A). (3.5)