Chapter 8
The Randomness Recycler
Of course there are many ways we can reuse something. We can dye it. We
can cut it. We can change the buttons. Those are other ways to make it alive.
But this is a new step to use anything—hats, socks, shirts. It's the first step in
the process.
Issey Miyake
The last four chapters all utilized coalescence of Markov chains in order to
achieve perfect samples. With these techniques the resulting algorithm could be either
read once or interruptible, but not both. The techniques of this and the next two chapters rely
on different ideas that move beyond Markov chains. This enables the construction
of linear time, read-once, and interruptible algorithms for sampling from high-noise
Markov random fields, along with algorithms for distributions on permutations where
coalescence is difficult to achieve.
8.1 Strong stationary stopping time
Before CFTP was introduced, a perfect simulation method using Markov chains
was introduced by Aldous and Diaconis [3], called strong stationary stopping times
(SSST). In general it is more difficult to find an SSST than a CFTP algorithm for a
Markov chain, but an SSST does give a read-once, interruptible algorithm. Moreover,
the idea leads to the far more general notion of the randomness recycler, which is
applicable in most instances that CFTP is.
Recall that a stopping time T with respect to a sequence X_1, X_2, ... is a random
variable T such that the event {T ≤ n} is measurable with respect to the information
contained in X_0, X_1, ..., X_n.
Definition 8.1. A stationary stopping time is a stopping time T such that X_T ~ π,
where π is the stationary distribution for the process. A strong stationary stopping
time is a stationary stopping time where T and X_T are independent even conditioned
on the value of X_0.
Once an SSST exists, the perfect simulation algorithm is simple: run the process
X_1, X_2, ... forward until T occurs, then report X_T. This gives a read-once, interruptible
(because T and X_T are independent) perfect simulation algorithm.
For example, consider simple random walk with partially reflecting boundaries
on {1,2,3}. In this chain the transition probabilities p(a,b) = P(X_{t+1} = b | X_t = a)
are p(1,1) = p(1,2) = p(2,1) = p(2,3) = p(3,2) = p(3,3) = 1/2.
The stationary distribution for this Markov chain is uniform over {1,2,3}.
Therefore, a simple way of generating a stationary stopping time is to let Y ~
Unif({1,2,3}) and then let T_1 = inf{t : X_t = Y}. Then X_{T_1} = Y and so has the correct
distribution, but T_1 and X_{T_1} are not independent conditioned on X_0. For instance,
when X_0 = 1, P(T_1 ≥ 2 | X_{T_1} = 3) = 1 (reaching state 3 from state 1 takes at least
two steps) but P(T_1 ≥ 2) < 1. So this is an example of a
stationary time, but not a strong stationary time.
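This dependence is easy to see by simulation. The following sketch (the helper names step and run_T1 are mine) runs the walk from X_0 = 1 until it hits an independently drawn Y, and estimates the relevant probabilities:

```python
import random

def step(x):
    """One step of the partially reflecting walk on {1, 2, 3}."""
    if random.random() < 0.5:
        return max(x - 1, 1)  # try to move left; reflect at 1
    return min(x + 1, 3)      # try to move right; reflect at 3

def run_T1(x0=1):
    """Draw Y ~ Unif({1,2,3}) independently, then run the chain
    started at x0 until it hits Y.  Return (T_1, X at time T_1)."""
    y = random.randrange(1, 4)
    x, t = x0, 0
    while x != y:
        x = step(x)
        t += 1
    return t, x

random.seed(2)
samples = [run_T1() for _ in range(50000)]

# X at time T_1 equals Y, so it is exactly uniform: T_1 is stationary
freq3 = sum(x == 3 for _, x in samples) / len(samples)

# ... but T_1 and the final state are dependent given X_0 = 1:
# ending at state 3 forces T_1 >= 2, while unconditionally
# T_1 can be 0 or 1.
always = all(t >= 2 for t, x in samples if x == 3)
p_ge2 = sum(t >= 2 for t, _ in samples) / len(samples)
print(freq3, always, p_ge2)
```

Here freq3 should be near 1/3, always is exactly True, and p_ge2 stays visibly below 1 (it is 1/2 for this chain), confirming the dependence.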
Here is an SSST for this Markov chain. Say that the chain tries to move right
when X_{t+1} > X_t, or when X_{t+1} = X_t = 3. Denote this type of step with an r. Similarly,
the chain tries to move left when X_{t+1} < X_t or X_{t+1} = X_t = 1. Denote this with an ℓ.
So if the chain started at X_0 = 2 and the first four steps were ℓrℓr, then X_4 = 2 as
well.

The SSST looks at the chain in groups of two moves. First, wait until X_{3k} = 1 for
some integer k. Next, look at the next two moves of the chain. If they are either rr,
ℓℓ, or ℓr, then since each of these is equally likely to occur, X_{3k+2} is equally likely
to be in {1,2,3}.
That is, for

T_2 = inf{3k + 2 : X_{3k} = 1, and the next two moves are either rr, ℓℓ, or ℓr},   (8.1)

then X_{T_2} ~ Unif({1,2,3}). Moreover, knowing the value of T_2 does not change the
distribution of X_{T_2}, even when conditioned on X_0.
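This SSST can be checked by simulation. In the sketch below (the function names are mine, and 'l' stands in for ℓ), the chain is stepped in blocks of three; whenever a block starts at state 1, the next two tried directions decide whether to stop:

```python
import random
from collections import Counter

def move(x):
    """One step of the walk on {1,2,3}; also report whether the
    chain tried to move right ('r') or left ('l')."""
    if random.random() < 0.5:
        return min(x + 1, 3), 'r'
    return max(x - 1, 1), 'l'

def run_T2(x0=1):
    """Return (T_2, X at time T_2) for the stopping time of (8.1)."""
    x, t = x0, 0
    while True:
        if x == 1:
            # t is a multiple of 3 and the chain is at 1: watch two moves
            x, d1 = move(x)
            x, d2 = move(x)
            if d1 + d2 in ('rr', 'll', 'lr'):
                return t + 2, x
            x, _ = move(x)  # pattern was 'rl'; finish the block of three
        else:
            for _ in range(3):
                x, _ = move(x)
        t += 3

random.seed(3)
n_trials = 30000
counts = Counter(run_T2()[1] for _ in range(n_trials))
print({s: counts[s] / n_trials for s in (1, 2, 3)})
# each frequency should be near 1/3
```

Since the three stopping patterns rr, ℓℓ, ℓr each have probability 1/4 and lead to states 3, 1, 2 respectively, the reported state is uniform conditioned on stopping in any given block.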
8.1.1 Example: SSST for simple symmetric random walk on {1,2,...,n}
Now extend the walk to {1,2,...,n} so that p(a, a−1) = p(a, a+1) = 1/2 for all
a ∈ {2,...,n−1}, and p(1,1) = p(n,n) = 1/2.
To extend the SSST to this more complicated chain, it is easiest to consider
how the distribution of X_t evolves over time. Suppose X_0 = 1. This can be rep-
resented by the probability vector that places 100% of the probability on state 1,
namely, (1, 0, 0, ..., 0). After one step in the chain, the probability vector will be
(1/2, 1/2, 0, 0, ..., 0), indicating that P(X_1 = 1 | X_0 = 1) = P(X_1 = 2 | X_0 = 1) = 1/2,
so now X_1 ~ Unif({1,2}).
Now take one more step in the chain. The probability vector is now
(1/2, 1/4, 1/4, 0, 0, ..., 0). The state X_2 is no longer uniform over a set of states.
However, the distribution of X_2 is a mixture of two uniforms: the uniform
over {1,2,3} with weight 3/4, and the uniform over {1} with weight 1/4. See Figure 8.1.
Given X_2 = x, draw Y | X_2 = x as uniform over [0, P(X_2 = x)]. Then (X_2, Y) ~
Unif({(x,y) : 0 ≤ y ≤ P(X_2 = x)}).

Note that [X_2 | Y ≤ 1/4] ~ Unif({1,2,3}). Similarly, [X_2 | Y > 1/4] ~ Unif({1}).
In general, suppose that X_t ~ Unif({1,...,i}) for i ∈ {2,...,n−1}. Then X_{t+1}
will be a mixture of the uniform distribution over {1,...,i−1} and the uniform
distribution over {1,...,i+1}. When X_{t+1} > i−1, then X_{t+1} must have been a draw
from the latter distribution. When X_{t+1} ≤ i−1, there is a 50% chance that X_{t+1} is
uniform over {1,...,i−1}, and a 50% chance that it is uniform over {1,...,i+1}.
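The evolution of the probability vector can be computed exactly. A minimal sketch (the function name evolve is mine) reproduces the vectors above for the walk on {1,...,6}, and checks the mixture decomposition of X_2:

```python
def evolve(p):
    """Push the probability vector of the partially reflecting walk
    on {1,...,n} forward one step (index 0 represents state 1)."""
    n = len(p)
    q = [0.0] * n
    for a, mass in enumerate(p):
        left = max(a - 1, 0)       # reflect at state 1
        right = min(a + 1, n - 1)  # reflect at state n
        q[left] += mass / 2
        q[right] += mass / 2
    return q

p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # X_0 = 1 on {1,...,6}
p = evolve(p)   # (1/2, 1/2, 0, 0, 0, 0)
p = evolve(p)   # (1/2, 1/4, 1/4, 0, 0, 0)
print(p)

# mixture decomposition of X_2: weight 3/4 on Unif({1,2,3})
# plus weight 1/4 on Unif({1})
mix = [0.75 / 3 + 0.25, 0.75 / 3, 0.75 / 3] + [0.0] * 3
print(mix)
```

Both printed vectors are (1/2, 1/4, 1/4, 0, 0, 0), confirming that the two-step distribution is exactly this mixture of prefix uniforms.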
Figure 8.1 The density of X_2 for simple symmetric random walk on {1,2,...,n} conditioned
on X_0 = 1. The dotted line shows that this density is actually a mixture of the uniform distri-
bution over {1} with the uniform distribution over {1,2,3}.
This gives the following SSST-based algorithm for generating from the station-
ary distribution of simple symmetric random walk.
SSST for ssrw
1) X ← 1, X′ ← 1
2) Repeat
3)   Draw U_1 ← Unif([0,1])
4)   X ← X − 1(U_1 ≤ 1/2, X > 1) + 1(U_1 > 1/2, X < n)
5)   Draw U_2 ← Unif([0,1])
6)   If X ≤ X′ − 1 and U_2 ≤ 1/2
7)     X′ ← X′ − 1
8)   Else
9)     X′ ← X′ + 1
10) Until X′ = n
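A direct Python transcription of the pseudocode may help; this is a sketch of mine, not code from the text:

```python
import random
from collections import Counter

def ssst_for_ssrw(n):
    """Run SSST for ssrw; on return, X ~ Unif({1,...,n})."""
    x, xp = 1, 1                  # X and the index X'
    while xp != n:
        u1 = random.random()
        # lines 3-4: one step of the partially reflecting walk
        if u1 <= 0.5:
            if x > 1:
                x -= 1
        elif x < n:
            x += 1
        u2 = random.random()
        # lines 5-9: update the index X'
        if x <= xp - 1 and u2 <= 0.5:
            xp -= 1
        else:
            xp += 1
    return x

random.seed(4)
n, n_trials = 5, 25000
counts = Counter(ssst_for_ssrw(n) for _ in range(n_trials))
print({s: counts[s] / n_trials for s in range(1, n + 1)})
# each frequency should be near 1/n = 0.2
```

Per Lemma 8.1, the returned state is exactly uniform over {1,...,n}, which the empirical frequencies reflect.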
Lemma 8.1. Let (X_t, X′_t) denote the value of (X, X′) after t steps through the repeat
loop. Then for any x′ ∈ {1,...,n}^{t+1} such that P((X′_0, X′_1, ..., X′_t) = x′) > 0,

[X_t | (X′_0, X′_1, ..., X′_t) = x′] ~ Unif({1,...,x′_t}).   (8.2)

Proof. The proof is by induction on t. The base case when t = 0 has (X_0, X′_0) = (1,1),
so the result holds trivially.
Now assume that (8.2) holds for t, and consider what happens at time t + 1. Then

P(X_{t+1} = i | X′_{t+1} = x′_{t+1}, ..., X′_0 = x′_0)
  = P(X_{t+1} = i, X′_{t+1} = x′_{t+1} | X′_t = x′_t, ..., X′_0 = x′_0)
    / P(X′_{t+1} = x′_{t+1} | X′_t = x′_t, ..., X′_0 = x′_0).
At this point, there are several cases to consider.

For instance, suppose that x′_{t+1} = x′_t + 1 where x′_t ∈ {2,...,n−1}.
If 1 < i < x′_t (which equals x′_{t+1} − 1 in this case), then X_t is either i − 1 or i + 1,
each with probability 1/x′_t given X′_t = x′_t by the induction hypothesis. Therefore
P(X_{t+1} = i, X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0)
  = (1/x′_t)·(1/2)·(1/2) + (1/x′_t)·(1/2)·(1/2) = (1/x′_t)·(1/2).
If i = x′_t or i = x′_t + 1, then X_t must be i − 1, and

P(X_{t+1} = i, X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0) = (1/x′_t)·(1/2).
Therefore

P(X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0)
  = Σ_{i=1}^{x′_{t+1}} (1/x′_t)·(1/2) = x′_{t+1}/(2x′_t),

and for all i ∈ {1,...,x′_t + 1},

P(X_{t+1} = i | X′_{t+1} = x′_t + 1, ..., X′_0 = x′_0)
  = [1/(2x′_t)] / [x′_{t+1}/(2x′_t)] = 1/x′_{t+1},

which completes the induction in this case.
The other cases are shown in a similar fashion.
So the value of X′ either goes up or down at each step. On average, it goes up
slightly more often than it goes down. Consider the following well-known lemma that
follows from the Optional Sampling Theorem. (For a proof, see for instance, [31].)
Lemma 8.2. Suppose {Y_t}_{t=0,1,...} is a stochastic process on {0,1,2,...} with
E[Y_{t+1} | Y_t] ≤ Y_t − δ for some constant δ > 0, and |Y_{t+1} − Y_t| ≤ M for all t. Let
T = inf{t : Y_t = 0}. If E[Y_0] is finite, then E[T] ≤ E[Y_0]/δ.
This fact can be transformed into many different results. For instance, if Y_t is on a
state space of the form {1,2,...,N}, |Y_t − Y_{t+1}| ≤ 1 for all t, and E[Y_{t+1} | Y_t] ≥ Y_t + f(i)
whenever Y_t ≤ i, then E[inf{t : Y_t = i + 1} | Y_0 = i] ≤ 1/f(i). Using this fact, it is possible
to bound the mean of the SSST for this chain.
Lemma 8.3. Starting with X′_0 = 1, let T = inf{t : X′_t = n}. Then E[T] ≤ n(n−1)/2.
Proof. How much does X′_t increase on average at each step? Well, X′_t increases by
1 in SSST for ssrw when X_{t+1} ∈ {1,...,X′_t − 1} and U_2 > 1/2, or when X_{t+1} ∈ {X′_t, X′_t + 1}.
The chance of this occurring is

P(X′_{t+1} = X′_t + 1)
  = (2/X′_t)·P(U_1 > 1/2) + (2/X′_t)·P(U_1 ≤ 1/2)·P(U_2 > 1/2)
    + ((X′_t − 2)/X′_t)·P(U_2 > 1/2)
  = 1/2 + 1/(2X′_t).

That means P(X′_{t+1} = X′_t − 1) = [1/2] − [1/(2X′_t)], and

E[X′_{t+1} − X′_t | X′_t] = 1/X′_t.   (8.3)
So that means when X′_t = 1, then on average one step is needed to move it to an
X′ value of 2. When X′_t = i, on average at most 1/(1/i) = i steps are needed to move
the process to an X′ value of i + 1. To move from 1 up to n then, on average

1 + 2 + ··· + (n−1) = n(n−1)/2

steps are needed.
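The bound of Lemma 8.3 is easy to test empirically. The sketch below (the function name is mine) counts repeat-loop iterations until X′ = n and compares the sample mean with n(n−1)/2:

```python
import random

def ssst_time(n):
    """Number of repeat-loop iterations T until X' reaches n."""
    x, xp, t = 1, 1, 0
    while xp != n:
        t += 1
        u1, u2 = random.random(), random.random()
        # one step of the partially reflecting walk
        if u1 <= 0.5:
            if x > 1:
                x -= 1
        elif x < n:
            x += 1
        # update the index X'
        if x <= xp - 1 and u2 <= 0.5:
            xp -= 1
        else:
            xp += 1
    return t

random.seed(5)
n, trials = 10, 20000
mean_T = sum(ssst_time(n) for _ in range(trials)) / trials
print(mean_T, n * (n - 1) / 2)  # sample mean vs. the bound 45
```

For n = 10 the observed mean comes in well under the bound of 45, as the lemma guarantees.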
At each step, the goal is to make the X′_t state larger, since when X′_t = n, it is
known that X_t is uniform over {1,2,...,n}. In the good case, X′_{t+1} = X′_t + 1. But
when X_{t+1} < X′_t, this good case only happens with probability 1/2.

When the good case is rejected, the state has lost some of its randomness. It
is known that the state is no longer uniform over {1,...,X′_t}. However, all is not
lost: conditioned on rejection, X_{t+1} is uniform over {1,...,X′_t − 1}. Reclaiming at
least part of the distribution of X_{t+1} after rejection is called recycling. Using this
recycling idea to keep the rejected sample as random as possible forms the basis of
the randomness recycler (RR).
The general structure of an RR algorithm has the following features.
1. There is an updating process {X_t}.
2. There is an indexing process {X′_t} so that [X_t | X′_t] has a known distribution.
3. There is a value of X′_t, call it x′_π, such that [X_t | X′_t = x′_π] ~ π.
4. At each step, the configuration X_t can be updated.
5. This allows the distribution indexed by X′_t to be updated.
6. Usually, when the distribution is closer to the target distribution, there is a chance
of rejecting the configuration as coming from the new distribution.
7. There is a sense in which the indexed distribution is either closer to or farther away
from the target distribution.
In this example X′_t is a Markov chain, in which case T = inf{t : X′_t = x′_π} gives
an SSST for the Markov chain. In general, for the randomness recycler there is no
requirement that X′_t must be a Markov chain. This gives flexibility in updating X_t that
allows for more targeted application of random moves. That in turn allows for easier
construction of linear time perfect simulation algorithms for MRFs.
8.2 Example: RR for the Ising model
Before giving the general format of the randomness recycler, consider a specific ex-
ample of the Ising model on a graph G = (V,E). Recall that here a configuration
labels each node 1 or −1, and the distribution π has density that is proportional to
exp(β·H(x)), where H(x) is the number of edges whose endpoints receive the
same label.

Begin by considering the graph that is a 4-cycle. In this graph V = {a,b,c,d},
and E = {{a,b},{b,c},{c,d},{d,a}}. The goal is to draw X ~ π, but start with an
easier goal. Can we draw X ~ π conditioned on all the values of X being 1? Of
course: just make X(a) = X(b) = X(c) = X(d) = 1. Say that nodes a, b, c, and d are
frozen at the value 1.
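The density just described can be tabulated exactly for the 4-cycle. A small sketch (the choice β = 0.5 and the function names are mine, for illustration only):

```python
from itertools import product
from math import exp

# the 4-cycle from the text: V = {a,b,c,d}
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
beta = 0.5  # arbitrary inverse temperature for illustration

def H(x):
    """Number of edges whose endpoints receive the same label."""
    return sum(x[u] == x[v] for u, v in E)

# unnormalized density exp(beta * H(x)) over all 16 configurations
weights = {}
for labels in product([-1, 1], repeat=4):
    x = dict(zip(V, labels))
    weights[labels] = exp(beta * H(x))

Z = sum(weights.values())  # normalizing constant

# the frozen configuration (all nodes at 1) has all 4 edges agreeing,
# so its weight is exp(4 * beta); alternating labels give weight 1
print(weights[(1, 1, 1, 1)], weights[(1, -1, 1, -1)], Z)
```

Conditioning on all four nodes being frozen at 1 picks out the single configuration (1,1,1,1), so the conditional draw is trivial, exactly as in the text.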
Now consider taking a step towards the final sample by "unfreezing" node a. The
goal is to make X_2 ~ π(· | X(b) = X(c) = X(d) = 1). This is also easy. Since two of
the neighbors of a are still frozen at 1, X_2(a) = 1 with probability proportional to
exp(2β), and X_2(a) = −1 with probability proportional to exp(0·β) = 1. The nor-