π(dx)P(X_{t+1} ∈ dy | X_t = x) = π(dy)P(X_{t+1} ∈ dx | X_t = y), which is why in Definition 1.34 such chains were labeled reversible.
Okay, so now we know how to run the chain backwards in time. Suppose in our simulation, X_5 = 0, X_4 = 1, X_3 = 2, X_2 = 1, X_1 = 0, and X_0 = 0. Now look at how the chain can be run forward in time conditioned on the path (0,0,1,2,1,0) using the updates X_t = φ(X_{t−1}, U_t).
In the first step, X_1 = 0 given X_0 = 0. That means that U_1 ≤ 2/3. So generate U_1 uniformly from [0, 2/3]. Similarly, since X_1 = 0 and X_2 = 1, U_2 > 2/3. The rest of the U_i can be bounded similarly.
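In code, imputing a U_t amounts to checking which branch of the update function could have produced the observed transition, then drawing uniformly from that branch. Here is a minimal Python sketch; the update function phi below is an assumption, written only to be consistent with the bounds just derived (the actual φ accompanied the random walk example earlier):

    import random

    # Assumed update function for the walk on {0,1,2}: u <= 2/3 attempts a step
    # down, u > 2/3 a step up (consistent with the bounds in the text).
    def phi(x, u):
        return max(x - 1, 0) if u <= 2/3 else min(x + 1, 2)

    def impute_u(x_prev, x_next):
        # Draw U_t uniformly conditioned on phi(x_prev, U_t) = x_next.
        if phi(x_prev, 0.0) == x_next:      # a "down" draw produces x_next
            return random.uniform(0, 2/3)
        if phi(x_prev, 1.0) == x_next:      # an "up" draw produces x_next
            return random.uniform(2/3, 1)
        raise ValueError("impossible transition")

    # The path (X_0,...,X_5) = (0,0,1,2,1,0) from the text:
    path = [0, 0, 1, 2, 1, 0]
    us = [impute_u(a, b) for a, b in zip(path, path[1:])]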
Note that regardless of the value of x_0, setting x_t = φ(x_{t−1}, U_t) using U_1 ≤ 2/3, U_2 > 2/3, U_3 > 2/3, U_4 ≤ 2/3, and U_5 ≤ 2/3 gives x_5 = 0 = X_5. Hence this gives an S_x block.
On the other hand, suppose (X_0, X_1, X_2, X_3, X_4, X_5) = (1, 0, 1, 0, 1, 0). Then generating the U_i gives a final state of 1 if x_0 = 1 and 0 if x_0 = 0. So that is an example of an F block.
In practice, it is not necessary to retain the entire path (X_0,...,X_t). As each pair (X_{t−1}, X_t) is generated, the value of U_t can be imputed and stored. So only the (U_1,...,U_t) need be retained.
5.2.1 Reversing a Markov chain
In the random walk on {0,1,2} example, it was easy to determine how the reversed process behaved. For instance, if X_{t+1} = X_t + 1, then in the reverse process X_t = X_{t+1} − 1, and so on. Now the general method for reversing a Markov chain is discussed.
Recall that Definition 1.27 of a Markov kernel K requires that for all x ∈ Ω, K(x,·) is a probability distribution, and that for all measurable A and a ∈ [0,1], the set of states x such that K(x,A) ≤ a is also measurable.
In other words, what the kernel K gives us is a family of probability distributions. Given the current state x, K(x,·) is the probability distribution for the next state y. (The second part of the definition is necessary for technical reasons.)
Now the notion of a reverse kernel with respect to a distribution π can be created.
Definition 5.1. Let K be a kernel from (Ω,F) to itself, and π a probability measure on (Ω,F). Then K_rev is a reverse kernel associated with π if for all bounded measurable functions f defined on Ω × Ω,

∫_{(x,y)∈Ω×Ω} f(x,y) π(dx) K(x,dy) = ∫_{(x,y)∈Ω×Ω} f(x,y) π(dy) K_rev(y,dx).
Note that in general the reverse kernel might not even exist, and when it does exist, it is not always uniquely defined!
While the definition above is presented in an abstract fashion, the notion that it is capturing is more straightforward. Suppose X_t ∼ π, and then X_{t+1} comes from taking one step in the Markov chain. Now suppose the value of X_t is lost. What is the new distribution of X_t given X_{t+1}? This is what the reverse kernel gives us.
As was seen in the previous section, for the biased simple random walk on {0,1,2}, the kernel is also the reverse kernel. Fortunately, the basic toolkit for building Markov chains with a target stationary distribution often gives us the reversed kernel alongside the original one!
Theorem 5.1. Both Gibbs samplers and Metropolis-Hastings chains are reversible.
For more complicated chains, a reverse kernel can be constructed as long as the distribution of the next state can be written using a density with respect to π.
Theorem 5.2. Suppose that for all x ∈ Ω, there exists a density p(x,·) with respect to π such that K(x,B) = ∫_B p(x,y) π(dy). Then a reverse kernel exists where

K_rev(y,dx) = p(x,y) π(dx) / ∫_{z∈Ω} p(z,y) π(dz).
The general proof of this theorem requires the use of a tool from measure theory called Fubini's Theorem, but when the density is with respect to the counting measure things are simpler. In this case π(dy) = π(dz) = 1 for all y and z, and ∫_{z∈Ω} p(z,y) π(dz) = Σ_{z∈Ω} P(X_1 = y, X_0 = z), so this is saying

P(X_0 = x | X_1 = y) = P(X_0 = x, X_1 = y) / Σ_z P(X_0 = z, X_1 = y),

just as in our earlier discrete example.
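On a finite state space, the reverse kernel of Theorem 5.2 is just a reweighting of the transposed transition matrix. The sketch below uses my reconstruction of the biased walk on {0,1,2} (step down with probability 2/3, up with probability 1/3, moves off the ends stay put), so the matrix and stationary vector should be read as assumptions rather than quotes from the earlier example:

    import numpy as np

    def reverse_kernel(K, pi):
        # K_rev[y, x] = pi[x] K[x, y] / sum_z pi[z] K[z, y]
        flux = pi[:, None] * K            # flux[x, y] = pi(x) K(x, y)
        return (flux / flux.sum(axis=0)).T

    K = np.array([[2/3, 1/3, 0.0],
                  [2/3, 0.0, 1/3],
                  [0.0, 2/3, 1/3]])
    pi = np.array([4/7, 2/7, 1/7])        # stationary: solves pi K = pi
    K_rev = reverse_kernel(K, pi)
    print(np.allclose(K_rev, K))          # True: this walk is reversible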
5.2.2 General FMMR
Now that we have a precise notion of how to reverse a Markov chain, the FMMR procedure can be written as follows. Let φ_rev denote the update function for the reverse kernel of the Markov chain. As earlier, let φ_t(x_0, U_1,...,U_t) be the state X_t conditioned on X_0 = x_0.
FMMR
Input: x ∈ Ω, t
Output: X_0 ∼ π
1)  Repeat
2)    X_t ← x
3)    For i from t − 1 to 0
4)      Draw U ← Unif([0,1])
5)      X_i ← φ_rev(X_{i+1}, U)
6)      Draw U_{i+1} uniformly from [0,1] conditioned on φ(X_i, U_{i+1}) = X_{i+1}
7)  Until φ_t(Ω,U) = {x}
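Here is a minimal runnable version of FMMR for the walk on {0,1,2}, again using the assumed update function from before; since that chain is its own reverse, φ_rev = φ:

    import random

    def phi(x, u):
        # Assumed update: u <= 2/3 steps down, u > 2/3 steps up (clamped to {0,1,2}).
        return max(x - 1, 0) if u <= 2/3 else min(x + 1, 2)

    def impute_u(x_prev, x_next):
        # Line 6: draw U uniformly conditioned on phi(x_prev, U) = x_next.
        if phi(x_prev, 0.0) == x_next:
            return random.uniform(0, 2/3)
        return random.uniform(2/3, 1)

    def fmmr(x, t, states=(0, 1, 2)):
        while True:                                # 1) Repeat
            us = [0.0] * t
            X = x                                  # 2) X_t <- x
            for i in range(t - 1, -1, -1):         # 3) for i from t-1 to 0
                X_prev = phi(X, random.random())   # 4-5) X_i <- phi_rev(X_{i+1}, U)
                us[i] = impute_u(X_prev, X)        # 6) impute U_{i+1}
                X = X_prev
            finals = set(states)                   # 7) until phi_t(Omega, U) = {x}
            for u in us:
                finals = {phi(s, u) for s in finals}
            if finals == {x}:
                return X                           # the accepted draw of X_0

Repeated calls give independent draws; for this assumed kernel they should match its stationary distribution π = (4/7, 2/7, 1/7).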
Lemma 5.2. Suppose that for x ∈ Ω and X_0 ∼ π, P(φ_t(Ω,U) = {x}) > 0. Then the output of FMMR comes from π.
Proof. Fix a state x in Ω. Suppose X_0 ∼ π. Generate the forward uniforms U = (U_1,...,U_t), and let {U ∈ A} denote the event {φ_t(Ω,U) = {x}}. Then X_0 and {U ∈ A} are independent of each other, so [X_0 | U ∈ A] ∼ π.
From AR theory, another way to generate [X_0 | U ∈ A] is to draw X_0 and generate U until {U ∈ A} occurs. When this happens, the sample path X_t = φ(X_{t−1}, U_t) will end at X_t = x. Therefore, by only working with the draws of U where the sample paths end at X_t = x, the last accepted draw of U remains from the correct distribution.
The memory requirement here is two states of the configuration and recording all of the U_1,...,U_t for use in the until condition. Therefore, FMMR is a read twice, interruptible perfect simulation algorithm.
5.2.3 Example: FMMR for the Ising model
To see this approach on a nontrivial problem, consider perfect simulation from the Ising model of Section 1.6.1 using the Ising_Gibbs algorithm from Section 1.8.1. Since this is a Gibbs sampler style chain, the same kernel works for both the forward process and the reverse process.
Let X_{t+1} be φ(X_t, V_{t+1}, U_{t+1}), whose value is the output of the function Ising_Gibbs_update_function from Section 5.1.1. Suppose the states x_t and x_{t+1} are known; how can the values of V_{t+1} and U_{t+1} for the forward step be simulated?
First consider the case that X_t ≠ X_{t+1}. Since φ can only change the value at a single node, there exists a unique v such that X_t(v) ≠ X_{t+1}(v). So V_{t+1} = v.
As in Ising_Gibbs_update_function, let n_1 denote the number of neighbors of v labeled 1 in X_t, and n_{−1} the number of neighbors labeled −1. Set r = exp(βn_1)/[exp(βn_1) + exp(βn_{−1})]. Then if X_{t+1}(v) = 1, U_{t+1} is uniform over [0,r], and if X_{t+1}(v) = −1, U_{t+1} is uniform over (r,1].
Now suppose that X_i = X_{i+1}. Then the goal is to choose (V_{i+1}, U_{i+1}) uniformly from the set of (v,U) such that φ(X_i, v, U) = X_{i+1}. But that was already done! The (v,U) that resulted in moving from X_{i+1} to X_i = X_{i+1} in the reverse step was exactly such a draw. Hence no additional work is needed.
This gives the following method.
FMMR_Ising_Gibbs
Input: x, t
Output: X ∼ π
1)  X ← x
2)  Repeat
3)    For i from t − 1 to 0
4)      Draw (V_{i+1}, U_{i+1}) ← Unif(V × [0,1])
5)      c ← Ising_Gibbs_update_function(X, V_{i+1}, U_{i+1})
6)      For d ∈ {−1,1}, let n_d be the # of neighbors of V_{i+1} labeled d in X
7)      r ← exp(βn_1)/[exp(βn_1) + exp(βn_{−1})]
8)      If X(V_{i+1}) = 1 and c = −1, then U_{i+1} ← Unif([0,r])
9)        X(V_{i+1}) ← c
10)     Else if X(V_{i+1}) = −1 and c = 1, then U_{i+1} ← Unif((r,1])
11)       X(V_{i+1}) ← c
12)   x_min ← (−1,...,−1), x_max ← (1,...,1)
13)   For i from 0 to t − 1
14)     x_min(V_{i+1}) ← Ising_Gibbs_update_function(x_min, V_{i+1}, U_{i+1})
15)     x_max(V_{i+1}) ← Ising_Gibbs_update_function(x_max, V_{i+1}, U_{i+1})
16) Until x_max = x_min
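The following Python sketch shows the whole procedure under stated assumptions: spins ±1 on a cycle of ten nodes, a function gibbs_update standing in for Ising_Gibbs_update_function, and (unlike line 1 above) each attempt restarting from x, as in the generic FMMR:

    import math, random

    m, beta = 10, 0.3
    V = list(range(m))
    nbrs = {v: [(v - 1) % m, (v + 1) % m] for v in V}   # cycle graph

    def gibbs_update(x, v, u):
        # Returns the new +1/-1 label of node v given the uniform u.
        n1 = sum(1 for w in nbrs[v] if x[w] == 1)
        r = math.exp(beta * n1) / (math.exp(beta * n1)
            + math.exp(beta * (len(nbrs[v]) - n1)))
        return 1 if u <= r else -1

    def fmmr_ising(x, t):
        while True:
            X = list(x)                                  # restart attempt from x
            vs, us = [0] * (t + 1), [0.0] * (t + 1)
            for i in range(t - 1, -1, -1):               # backward pass (lines 3-11)
                v, u = random.choice(V), random.random()
                vs[i + 1], us[i + 1] = v, u
                c = gibbs_update(X, v, u)
                if X[v] != c:                            # state changed: re-impute U_{i+1}
                    n1 = sum(1 for w in nbrs[v] if X[w] == 1)
                    r = math.exp(beta * n1) / (math.exp(beta * n1)
                        + math.exp(beta * (len(nbrs[v]) - n1)))
                    us[i + 1] = random.uniform(0, r) if X[v] == 1 else random.uniform(r, 1)
                    X[v] = c                             # X now holds X_i
            x_min, x_max = [-1] * m, [1] * m             # forward pass (lines 12-15)
            for i in range(t):
                x_min[vs[i + 1]] = gibbs_update(x_min, vs[i + 1], us[i + 1])
                x_max[vs[i + 1]] = gibbs_update(x_max, vs[i + 1], us[i + 1])
            if x_min == x_max:                           # line 16: coalescence detected
                return X                                 # X = X_0 ~ pi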
5.2.4 Doubling time
Note that in the proof of Lemma 5.2, the size of t is not important: the distribution of X_0 conditioned on coalescence detected in the block is π.
That means that it is perfectly legal to double the value of t used at each step of the process, in the same way as in the original CFTP. In this way it is not necessary to worry in advance about the value of t at which backwards coalescence is likely to occur.
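In code, doubling is a thin wrapper around any single-attempt version of FMMR. The helper fmmr_attempt below is a hypothetical interface (a one-pass variant of the routines sketched earlier) that returns the draw of X_0 on coalescence and None otherwise:

    def fmmr_doubling(fmmr_attempt, x, t=1):
        # Double t until some backward block coalesces; each attempt uses fresh
        # randomness, so by Lemma 5.2 the accepted X_0 still comes from pi.
        while True:
            x0 = fmmr_attempt(x, t)
            if x0 is not None:
                return x0
            t *= 2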
5.3 Variable chains
In a bounding chain, a subset of labellings was updated at each step. A more general approach is to keep track of the state as a function of the initial values of nodes. This type of chain is called a variable chain, and can be used to understand how changes in the initial state propagate through the system.
In a variable chain, each node is labeled not with a set of labellings, but rather with a function of the variables x_1,...,x_{#V}. Initially, node v is given the label x_v. As the chain evolves, each node will always have a label that is a function of (x_1,...,x_{#V}). The variables that appear in a node's label indicate where changes in the initial state would have changed the current value of that node.
5.3.1 Antivoter model
Variable chains were introduced in [61] in order to generate samples from the antivoter model stationary distribution. The antivoter model is a sequence of states that gives voter preferences that change through time based on the preferences of the local neighbors.
The state space for a graph G = (V,E) is Ω = {0,1}^V, so that each node is labeled either 0 or 1. The Markov chain then chooses a node v uniformly at random, and then a neighbor w of v uniformly from the neighbors of v. Finally, x(v) is set to 1 − x(w).
In the voter model, after v and then w are chosen, x(v) is set equal to x(w). On a finite graph, the voter model eventually reaches the all 0's state or the all 1's state and stays there.
For the antivoter model, the behavior is more interesting. When the graph G is regular with degree r ≥ 3 and is not bipartite, Donnelly and Welsh [30] showed that this Markov chain has a unique stationary distribution that is uniform over every state except the all 0's and all 1's states.
In the variable chain, each node v begins with the label x_VC(v) = x_v. When a step in the chain chooses v and then w to modify, the label on v becomes 1 − x_VC(w). Suppose that edge (3,4) in the chain was chosen, with v = 4 and w = 3. Then x_VC(4) is set to 1 − x_3. Note that at this point, no node has an x_4 in its label. Since the only way a node can get a variable in its label is if it is next to a label with that variable, there is no longer any way for a node to be labeled x_4. That variable is gone forever from the state of the variable chain.
The update then looks as follows.
Antivoter model variable chain
Input: old state x_VC
Output: new state x_VC
1)  Draw v uniformly from V
2)  Draw w uniformly from the neighbors of v
3)  Set x_VC(v) to be 1 − x_VC(w)
It is easy to show that eventually, only one variable remains.
Fact 5.1. Consider a connected graph G that is regular of degree r ≥ 3, and has a minimum cut of size c ≥ 1. Then starting from the state with x_VC(v) = x_v for all v ∈ V, after t steps the chance that more than one variable remains in the node labels is at most

exp(−ct[(2/π²)(n−1)^{−2} + (1/(6π))(n−1)^{−4}](#E)^{−1}),

where n = #V.
Proof. Let n(z) = #{v : x_VC(v) ∈ {x_z, 1 − x_z}} be the number of nodes that are labeled either x_z or 1 − x_z. (At time 0, n(z) = 1 for all z ∈ V.) Let W_t = max_z n(z) after t steps, v_max = arg max_z n(z), and V_t = {v : x_VC(v) ∈ {x_{v_max}, 1 − x_{v_max}}}. Suppose there are k edges connecting nodes in V_t to nodes in V \ V_t. Then the chance of choosing one of these edges at the first step is 2k(1/#V)(1/r). If v ∉ V_t and w ∈ V_t, then W_t increases by 1. If v ∈ V_t and w ∉ V_t, then W_t decreases by at most 1. (It could be less: for instance, if W_t = 4 using x_7 or x_6, and an edge removed an x_7 instance, then x_6 takes over as the most common label and W_{t+1} is still 4.)
Let W̄_t = n − W_t. Then W̄_t = 0 when W_t = n, that is, when only one variable remains. Use the same potential function as in the proof of Lemma 4.2: φ(i) = sin(Ci)/sin(C), with C = π/(2(n−1)). Then as in (4.10),
E[φ(W̄_t) | W̄_{t−1}] ≤ [k/(2#E)]φ(W̄_{t−1} − 1) + [1 − k/#E]φ(W̄_{t−1}) + [k/(2#E)]φ(W̄_{t−1} + 1)
  = φ(W̄_{t−1}) + k(2#E)^{−1}Δ²φ(W̄_{t−1} − 1)
  = φ(W̄_{t−1}) + k(2#E)^{−1}φ(W̄_{t−1})(2cos(C) − 2)
  ≤ φ(W̄_{t−1})γ,

where γ = 1 − c(1 − cos(C))(#E)^{−1}. (Since c is the size of the minimum cut, k ≥ c.)
Now a simple induction gives E[φ(W̄_t)] ≤ φ(W̄_0)γ^t ≤ W̄_0γ^t (using sin(Ci) ≤ i·sin(C)), and W̄_0 = n − 1.
When W̄_t > 0 it holds that φ(W̄_t) ≥ 1, so by Markov's inequality P(W̄_t > 0) = P(φ(W̄_t) ≥ 1) ≤ E[φ(W̄_t)]. Recall that 1 − x ≤ exp(−x) for all real x. Hence γ^t ≤ exp(−ct(1 − cos(π/(2(n−1))))(#E)^{−1}). Using 1 − cos(x) ≥ x²/2 − x⁴/24 completes the proof.
Once the variable chain has been reduced to a configuration where every node is labeled x_z or 1 − x_z, there are only two possible states: the one where x_z = 0 and the one where x_z = 1. Since the stationary distribution assigns equal weight to all configurations, simply decide uniformly which of these two states to pick to complete the perfect simulation algorithm.
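Putting the pieces together gives a short program. This is a minimal sketch on the complete graph K_5, which is regular of degree 4 and not bipartite, so the Donnelly and Welsh result applies; storing a label x_z or 1 − x_z as a pair (z, s) is an implementation choice, not something from the original:

    import random

    n = 5
    V = range(n)
    nbrs = {v: [w for w in V if w != v] for v in V}    # complete graph K_5

    def antivoter_perfect_sample():
        # Variable chain: node v carries label x_z (when s = 0) or 1 - x_z (when s = 1).
        label = {v: (v, 0) for v in V}                 # initially node v has label x_v
        while len({z for (z, s) in label.values()}) > 1:
            v = random.choice(V)                       # one variable chain step
            w = random.choice(nbrs[v])
            z, s = label[w]
            label[v] = (z, 1 - s)                      # x_VC(v) <- 1 - x_VC(w)
        b = random.randint(0, 1)                       # the two states are equally likely
        return {v: b ^ s for v, (z, s) in label.items()}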
With the antivoter model, the problem was that the basic bounding chain did not have any traction to get started. In order to learn about the value of a node, at least one node already needed to be known. The variable chain sidestepped that problem.