Chapter 8
The Randomness Recycler
Of course there are many ways we can reuse something. We can dye it. We
can cut it. We can change the buttons. Those are other ways to make it alive.
But this is a new step to use anything—hats, socks, shirts. It's the first step in
the process.
Issey Miyake
The last four chapters all utilized coalescence of Markov chains in order to
achieve perfect samples. With these techniques the resulting algorithm could be either
read once or interruptible, but not both. The techniques of this and the next two chapters rely
on different ideas that move beyond Markov chains. This enables the construction
of linear time, read-once, and interruptible algorithms for sampling from high-noise
Markov random fields, along with algorithms for distributions on permutations where
coalescence is difficult to achieve.
8.1 Strong stationary stopping time
Before CFTP was introduced, a perfect simulation method using Markov chains
was introduced by Aldous and Diaconis [3], called strong stationary stopping times
(SSST). In general it is more difficult to find an SSST than a CFTP algorithm for a
Markov chain, but an SSST does give a read-once, interruptible algorithm. Moreover,
the idea leads to the far more general notion of the randomness recycler, which is
applicable in most instances that CFTP is.
Recall that a stopping time T with respect to a sequence X_1, X_2, ... is a random
variable T such that the event {T ≤ n} is measurable with respect to the information
contained in X_0, X_1, ..., X_n.
Definition 8.1. A stationary stopping time is a stopping time T such that X_T ~ π,
where π is the stationary distribution for the process. A strong stationary stopping
time is a stationary stopping time where T and X_T are independent even conditioned
on the value of X_0.
Once an SSST exists, the perfect simulation algorithm is simple: run the process
X_1, X_2, ... forward until T occurs, then report X_T. This gives a read-once, interruptible
(because T and X_T are independent) perfect simulation algorithm.
For example, consider simple random walk with partially reflecting boundaries
on {1,2,3}. In this chain the transition probabilities p(a,b) = P(X_{t+1} = b | X_t = a)
are p(1,1) = p(1,2) = p(2,1) = p(2,3) = p(3,2) = p(3,3) = 1/2.
The stationary distribution for this Markov chain is uniform over {1,2,3}.
Therefore, a simple way of generating a stationary stopping time is to let Y ~
Unif({1,2,3}) and then let T_1 = inf{t : X_t = Y}. Then X_{T_1} = Y and so has the correct
distribution, but T_1 and X_{T_1} are not independent conditioned on X_0. For instance,
when X_0 = 1, P(T_1 ≥ 2 | X_{T_1} = 3) = 1 (reaching state 3 from state 1 takes at least
two steps) but P(T_1 ≥ 2) < 1. So this is an example of a
stationary time, but not a strong stationary time.
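This dependence is easy to see by simulation. The following sketch (the helper names step and run_T1 are mine) runs the walk from X_0 = 1 until it hits an independently drawn Y, and estimates the relevant probabilities:

```python
import random

def step(x):
    """One step of the partially reflecting walk on {1, 2, 3}."""
    if random.random() < 0.5:
        return max(x - 1, 1)  # try to move left; reflect at 1
    return min(x + 1, 3)      # try to move right; reflect at 3

def run_T1(x0=1):
    """Draw Y ~ Unif({1,2,3}) independently, then run the chain
    started at x0 until it hits Y.  Return (T_1, X at time T_1)."""
    y = random.randrange(1, 4)
    x, t = x0, 0
    while x != y:
        x = step(x)
        t += 1
    return t, x

random.seed(2)
samples = [run_T1() for _ in range(50000)]

# X at time T_1 equals Y, so it is exactly uniform: T_1 is stationary
freq3 = sum(x == 3 for _, x in samples) / len(samples)

# ... but T_1 and the final state are dependent given X_0 = 1:
# ending at state 3 forces T_1 >= 2, while unconditionally
# T_1 can be 0 or 1.
always = all(t >= 2 for t, x in samples if x == 3)
p_ge2 = sum(t >= 2 for t, _ in samples) / len(samples)
print(freq3, always, p_ge2)
```

Here freq3 should be near 1/3, always is exactly True, and p_ge2 stays visibly below 1 (it is 1/2 for this chain), confirming the dependence.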
Here is an SSST for this Markov chain. Say that the chain tries to move right
when X_{t+1} > X_t, or when X_{t+1} = X_t = 3. Denote this type of step with an r. Similarly,
the chain tries to move left when X_{t+1} < X_t or X_{t+1} = X_t = 1. Denote this with an ℓ.
So if the chain started at X_0 = 2 and the first four steps were ℓrℓr, then X_4 = 2 as
well.

The SSST looks at the chain in groups of two moves. First, wait until X_{3k} = 1 for
some integer k. Next, look at the next two moves of the chain. If they are either rr,
ℓℓ, or ℓr, then since each of these is equally likely to occur, X_{3k+2} is equally likely
to be in {1,2,3}.
That is, for

T_2 = inf{3k + 2 : X_{3k} = 1, and the next two moves are either rr, ℓℓ, or ℓr},   (8.1)

then X_{T_2} ~ Unif({1,2,3}). Moreover, knowing the value of T_2 does not change the
distribution of X_{T_2}, even when conditioned on X_0.
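This SSST can be checked by simulation. In the sketch below (the function names are mine, and 'l' stands in for ℓ), the chain is stepped in blocks of three; whenever a block starts at state 1, the next two tried directions decide whether to stop:

```python
import random
from collections import Counter

def move(x):
    """One step of the walk on {1,2,3}; also report whether the
    chain tried to move right ('r') or left ('l')."""
    if random.random() < 0.5:
        return min(x + 1, 3), 'r'
    return max(x - 1, 1), 'l'

def run_T2(x0=1):
    """Return (T_2, X at time T_2) for the stopping time of (8.1)."""
    x, t = x0, 0
    while True:
        if x == 1:
            # t is a multiple of 3 and the chain is at 1: watch two moves
            x, d1 = move(x)
            x, d2 = move(x)
            if d1 + d2 in ('rr', 'll', 'lr'):
                return t + 2, x
            x, _ = move(x)  # pattern was 'rl'; finish the block of three
        else:
            for _ in range(3):
                x, _ = move(x)
        t += 3

random.seed(3)
n_trials = 30000
counts = Counter(run_T2()[1] for _ in range(n_trials))
print({s: counts[s] / n_trials for s in (1, 2, 3)})
# each frequency should be near 1/3
```

Since the three stopping patterns rr, ℓℓ, ℓr each have probability 1/4 and lead to states 3, 1, 2 respectively, the reported state is uniform conditioned on stopping in any given block.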
8.1.1 Example: SSST for simple symmetric random walk on {1,2,...,n}
Now extend the walk to {1,2,...,n} so that p(a, a−1) = p(a, a+1) = 1/2 for all
a ∈ {2,...,n−1}, and p(1,1) = p(n,n) = 1/2.
To extend the SSST to this more complicated chain, it is easiest to consider
how the distribution of X_t evolves over time. Suppose X_0 = 1. This can be rep-
resented by the probability vector that places 100% of the probability on state 1,
namely, (1, 0, 0, ..., 0). After one step in the chain, the probability vector will be
(1/2, 1/2, 0, 0, ..., 0), indicating that P(X_1 = 1 | X_0 = 1) = P(X_1 = 2 | X_0 = 1) = 1/2,
so now X_1 ~ Unif({1,2}).
Now take one more step in the chain. The probability vector is now
(1/2, 1/4, 1/4, 0, 0, ..., 0). The state X_2 is no longer uniform over a set of states.
However, the distribution of X_2 is a mixture of two uniforms: the uniform
over {1,2,3} with weight 3/4, and the uniform over {1} with weight 1/4. See Figure 8.1.
Given X_2 = x, draw Y | X_2 = x as uniform over [0, P(X_2 = x)]. Then (X_2, Y) ~
Unif({(x,y) : 0 ≤ y ≤ P(X_2 = x)}).

Note that [X_2 | Y ≤ 1/4] ~ Unif({1,2,3}). Similarly, [X_2 | Y > 1/4] ~ Unif({1}).
In general, suppose that X_t ~ Unif({1,...,i}) for i ∈ {2,...,n−1}. Then X_{t+1}
will be a mixture of the uniform distribution over {1,...,i−1} and the uniform
distribution over {1,...,i+1}. When X_{t+1} > i−1, then X_{t+1} must have been a draw
from the latter distribution. When X_{t+1} ≤ i−1, there is a 50% chance that X_{t+1} is
uniform over {1,...,i−1}, and a 50% chance that it is uniform over {1,...,i+1}.
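The evolution of the probability vector can be computed exactly. A minimal sketch (the function name evolve is mine) reproduces the vectors above for the walk on {1,...,6}, and checks the mixture decomposition of X_2:

```python
def evolve(p):
    """Push the probability vector of the partially reflecting walk
    on {1,...,n} forward one step (index 0 represents state 1)."""
    n = len(p)
    q = [0.0] * n
    for a, mass in enumerate(p):
        left = max(a - 1, 0)       # reflect at state 1
        right = min(a + 1, n - 1)  # reflect at state n
        q[left] += mass / 2
        q[right] += mass / 2
    return q

p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # X_0 = 1 on {1,...,6}
p = evolve(p)   # (1/2, 1/2, 0, 0, 0, 0)
p = evolve(p)   # (1/2, 1/4, 1/4, 0, 0, 0)
print(p)

# mixture decomposition of X_2: weight 3/4 on Unif({1,2,3})
# plus weight 1/4 on Unif({1})
mix = [0.75 / 3 + 0.25, 0.75 / 3, 0.75 / 3] + [0.0] * 3
print(mix)
```

Both printed vectors are (1/2, 1/4, 1/4, 0, 0, 0), confirming that the two-step distribution is exactly this mixture of prefix uniforms.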
Figure 8.1 The density of X_2 for simple symmetric random walk on {1,2,...,n} conditioned
on X_0 = 1. The dotted line shows that this density is actually a mixture of the uniform distri-
bution over {1} with the uniform distribution over {1,2,3}.
This gives the following SSST-based algorithm for generating from the station-
ary distribution of simple symmetric random walk.
SSST for ssrw
1) X ← 1, X′ ← 1
2) Repeat
3)   Draw U_1 ← Unif([0,1])
4)   X ← X − 1(U_1 ≤ 1/2, X > 1) + 1(U_1 > 1/2, X < n)
5)   Draw U_2 ← Unif([0,1])
6)   If X ≤ X′ − 1 and U_2 ≤ 1/2
7)     X′ ← X′ − 1
8)   Else
9)     X′ ← X′ + 1
10) Until X′ = n
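A direct Python transcription of the pseudocode may help; this is a sketch of mine, not code from the text:

```python
import random
from collections import Counter

def ssst_for_ssrw(n):
    """Run SSST for ssrw; on return, X ~ Unif({1,...,n})."""
    x, xp = 1, 1                  # X and the index X'
    while xp != n:
        u1 = random.random()
        # lines 3-4: one step of the partially reflecting walk
        if u1 <= 0.5:
            if x > 1:
                x -= 1
        elif x < n:
            x += 1
        u2 = random.random()
        # lines 5-9: update the index X'
        if x <= xp - 1 and u2 <= 0.5:
            xp -= 1
        else:
            xp += 1
    return x

random.seed(4)
n, n_trials = 5, 25000
counts = Counter(ssst_for_ssrw(n) for _ in range(n_trials))
print({s: counts[s] / n_trials for s in range(1, n + 1)})
# each frequency should be near 1/n = 0.2
```

Per Lemma 8.1, the returned state is exactly uniform over {1,...,n}, which the empirical frequencies reflect.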
Lemma 8.1. Let (X_t, X′_t) denote the value of (X, X′) after t steps through the repeat
loop. Then for any x′ ∈ {1,...,n}^{t+1} such that P((X′_0, X′_1, ..., X′_t) = x′) > 0,

[X_t | (X′_0, X′_1, ..., X′_t) = x′] ~ Unif({1,...,x′_t}).   (8.2)

Proof. The proof is by induction on t. The base case when t = 0 has (X_0, X′_0) = (1,1),
so the result holds trivially.
Now assume that (8.2) holds for t, and consider what happens at time t + 1. Then

P(X_{t+1} = i | X′_{t+1} = x′_{t+1}, ..., X′_0 = x′_0)
  = P(X_{t+1} = i, X′_{t+1} = x′_{t+1} | X′_t = x′_t, ..., X′_0 = x′_0)
    / P(X′_{t+1} = x′_{t+1} | X′_t = x′_t, ..., X′_0 = x′_0).
At this point, there are several cases to consider.

For instance, suppose that x′_{t+1} = x′_t + 1 where x′_t ∈ {2,...,n−1}.
If 1 < i < x′_t (which equals x′_{t+1} − 1 in this case), then X_t is either i − 1 or i + 1,
each with probability 1/x′_t given X′_t = x′_t by the induction hypothesis. Therefore
P(X_{t+1} = i, X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0)
  = (1/x′_t)·(1/2)·(1/2) + (1/x′_t)·(1/2)·(1/2) = (1/x′_t)·(1/2).
If i = x′_t or i = x′_t + 1, then X_t must be i − 1, and

P(X_{t+1} = i, X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0) = (1/x′_t)·(1/2).
Therefore

P(X′_{t+1} = x′_t + 1 | X′_t = x′_t, ..., X′_0 = x′_0)
  = Σ_{i=1}^{x′_{t+1}} (1/x′_t)·(1/2) = x′_{t+1}/(2x′_t),

and for all i ∈ {1,...,x′_t + 1},

P(X_{t+1} = i | X′_{t+1} = x′_t + 1, ..., X′_0 = x′_0)
  = [1/(2x′_t)] / [x′_{t+1}/(2x′_t)] = 1/x′_{t+1},

which completes the induction in this case.
The other cases are shown in a similar fashion.
So the value of X′ either goes up or down at each step. On average, it goes up
slightly more often than it goes down. Consider the following well-known lemma that
follows from the Optional Sampling Theorem. (For a proof, see for instance, [31].)
Lemma 8.2. Suppose {Y_t}_{t=0,1,...} is a stochastic process on {0,1,2,...} with
E[Y_{t+1} | Y_t] ≤ Y_t − δ for some constant δ > 0, and |Y_{t+1} − Y_t| ≤ M for all t. Let
T = inf{t : Y_t = 0}. If E[Y_0] is finite, then E[T] ≤ E[Y_0]/δ.
This fact can be transformed into many different results. For instance, if Y_t is on a
state space of the form {1,2,...,N}, |Y_t − Y_{t+1}| ≤ 1 for all t, and E[Y_{t+1} | Y_t] ≥ Y_t + f(i)
whenever Y_t ≤ i, then E[inf{t : Y_t = i + 1} | Y_0 = i] ≤ 1/f(i). Using this fact, it is possible
to bound the mean of the SSST for this chain.
Lemma 8.3. Starting with X′_0 = 1, let T = inf{t : X′_t = n}. Then E[T] ≤ n(n−1)/2.
Proof. How much does X′_t increase on average at each step? Well, X′_t increases by
1 in SSST for ssrw when X_{t+1} ∈ {1,...,X′_t − 1} and U_2 > 1/2, or when X_{t+1} ∈ {X′_t, X′_t + 1}.
The chance of this occurring is

P(X′_{t+1} = X′_t + 1)
  = (2/X′_t)·P(U_1 > 1/2) + (2/X′_t)·P(U_1 ≤ 1/2)·P(U_2 > 1/2)
    + ((X′_t − 2)/X′_t)·P(U_2 > 1/2)
  = 1/2 + 1/(2X′_t).

That means P(X′_{t+1} = X′_t − 1) = [1/2] − [1/(2X′_t)], and

E[X′_{t+1} − X′_t | X′_t] = 1/X′_t.   (8.3)
So that means when X′_t = 1, then on average one step is needed to move it to an
X′ value of 2. When X′_t = i, on average at most 1/(1/i) = i steps are needed to move
the process to an X′ value of i + 1. To move from 1 up to n then, on average

1 + 2 + ··· + (n−1) = n(n−1)/2

steps are needed.
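The bound of Lemma 8.3 is easy to test empirically. The sketch below (the function name is mine) counts repeat-loop iterations until X′ = n and compares the sample mean with n(n−1)/2:

```python
import random

def ssst_time(n):
    """Number of repeat-loop iterations T until X' reaches n."""
    x, xp, t = 1, 1, 0
    while xp != n:
        t += 1
        u1, u2 = random.random(), random.random()
        # one step of the partially reflecting walk
        if u1 <= 0.5:
            if x > 1:
                x -= 1
        elif x < n:
            x += 1
        # update the index X'
        if x <= xp - 1 and u2 <= 0.5:
            xp -= 1
        else:
            xp += 1
    return t

random.seed(5)
n, trials = 10, 20000
mean_T = sum(ssst_time(n) for _ in range(trials)) / trials
print(mean_T, n * (n - 1) / 2)  # sample mean vs. the bound 45
```

For n = 10 the observed mean comes in well under the bound of 45, as the lemma guarantees.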
At each step, the goal is to make the X′_t state larger, since when X′_t = n, it is
known that X_t is uniform over {1,2,...,n}. In the good case, X′_{t+1} = X′_t + 1. But
when X_{t+1} < X′_t, this good case only happens with probability 1/2.

When the good case is rejected, the state has lost some of its randomness. It
is known that the state is no longer uniform over {1,...,X′_t}. However, all is not
lost: conditioned on rejection, X_{t+1} is uniform over {1,...,X′_t − 1}. Reclaiming at
least part of the distribution of X_{t+1} after rejection is called recycling. Using this
recycling idea to keep the rejected sample as random as possible forms the basis of
the randomness recycler (RR).
The general structure of an RR algorithm has the following features.
1. There is an updating process {X_t}.
2. There is an indexing process {X′_t} so that [X_t | X′_t] has a known distribution.
3. There is a value of X′_t, call it x′_π, such that [X_t | X′_t = x′_π] ~ π.
4. At each step, the configuration X_t can be updated.
5. This allows the distribution indexed by X′_t to be updated.
6. Usually, when the distribution is closer to the target distribution, there is a chance
of rejecting the configuration as coming from the new distribution.
7. There is a sense in which the indexed distribution is either closer to or farther away
from the target distribution.
In this example X′_t is a Markov chain, in which case T = inf{t : X′_t = x′_π} gives
an SSST for the Markov chain. In general, for the randomness recycler there is no
requirement that X′_t must be a Markov chain. This gives flexibility in updating X_t that
allows for more targeted application of random moves. That in turn allows for easier
construction of linear time perfect simulation algorithms for MRFs.
8.2 Example: RR for the Ising model
Before giving the general format of the randomness recycler, consider a specific ex-
ample of the Ising model on a graph G = (V,E). Recall that here a configuration
labels each node 1 or −1, and the distribution π has density that is proportional to
exp(β·H(x)), where H(x) is the number of edges whose endpoints receive the
same label.

Begin by considering the graph that is a 4-cycle. In this graph V = {a,b,c,d},
and E = {{a,b},{b,c},{c,d},{d,a}}. The goal is to draw X ~ π, but start with an
easier goal. Can we draw X ~ π conditioned on all the values of X being 1? Of
course: just make X(a) = X(b) = X(c) = X(d) = 1. Say that nodes a, b, c, and d are
frozen at the value 1.
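The density just described can be tabulated exactly for the 4-cycle. A small sketch (the choice β = 0.5 and the function names are mine, for illustration only):

```python
from itertools import product
from math import exp

# the 4-cycle from the text: V = {a,b,c,d}
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
beta = 0.5  # arbitrary inverse temperature for illustration

def H(x):
    """Number of edges whose endpoints receive the same label."""
    return sum(x[u] == x[v] for u, v in E)

# unnormalized density exp(beta * H(x)) over all 16 configurations
weights = {}
for labels in product([-1, 1], repeat=4):
    x = dict(zip(V, labels))
    weights[labels] = exp(beta * H(x))

Z = sum(weights.values())  # normalizing constant

# the frozen configuration (all nodes at 1) has all 4 edges agreeing,
# so its weight is exp(4 * beta); alternating labels give weight 1
print(weights[(1, 1, 1, 1)], weights[(1, -1, 1, -1)], Z)
```

Conditioning on all four nodes being frozen at 1 picks out the single configuration (1,1,1,1), so the conditional draw is trivial, exactly as in the text.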
Now consider taking a step towards the final sample by "unfreezing" node a. The
goal is to make X_2 ~ π(· | X(b) = X(c) = X(d) = 1). This is also easy. Since two of
the neighbors of a are still frozen at 1, X_2(a) = 1 with probability proportional to
exp(2β), and X_2(a) = −1 with probability proportional to exp(0·β) = 1. The nor-