Chapter 4
Bounding Chains
It is often safer to be in chains than to be free.
Franz Kafka
Monotonicity is a valuable property to have, but unfortunately, not every Markov chain update function has it. From among our examples, the Ising model when β < 0, the Potts model with k ≥ 4, the hard-core gas model on nonbipartite graphs, weighted permutations, sink-free orientations, the antivoter model, and self-organizing lists all have nonmonotonic Gibbs samplers. (While the slice sampler is monotonic, for these examples it cannot be employed efficiently in its pure form.) While Gibbs samplers are often monotonic, Metropolis-Hastings chains, chains on permutations, and auxiliary variable chains (like Swendsen-Wang) are often not.
The first move away from monotonic Markov chains was the notion of antimonotonicity, introduced in [44, 96, 46]. Bounding chains, introduced independently in [62] and [44], and further refined in [63, 64, 65, 66, 51], are a generalization of antimonotonicity that applies to a wide variety of Markov chains.
4.1 What is a bounding chain?
Suppose each state of an underlying chain consists of a set of nodes, each of which has a label. Then a bounding chain keeps track of a set of possible labels at each node. It bounds the underlying chain if it is possible to ensure that the actual label of each node always falls into the set of labels kept at that node by the bounding chain.
Let Ω = C^V be the state space of the underlying chain. Call C the set of colors, values, or labels, and V the set of dimensions, positions, coordinates, nodes, or vertices. For instance, in the Ising model C = {−1, 1} and V is the vertex set of the graph. In the autonormal model C = [0, 1]. For Brownian motion both C and V are ℝ. For permutations on n items, both C and V equal {1, 2, ..., n}.
A bounding chain wraps around the underlying chain on C^V. At each node of V, the bounding chain keeps a subset of C that represents the possible colors permitted for the underlying chain. So a bounding chain for the Ising model could have either {−1}, {1}, or {−1, 1} as the label of a vertex. The bounding part comes when steps in the Markov chain are made. Let X_t be the underlying chain state and Y_t the bounding chain state. If X_t(v) ∈ Y_t(v) for every v, then it should also be true that X_{t+1}(v) ∈ Y_{t+1}(v). Let 2^C denote the set of all subsets of the set C.
Definition 4.1. Say that y ∈ (2^C)^V bounds x ∈ C^V if for all v ∈ V, x(v) ∈ y(v).
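In code, Definition 4.1 is a one-line check. Here is a minimal sketch in Python (the language used for all code sketches below); representing x as a dictionary from vertices to colors and y as a dictionary from vertices to sets is an illustrative choice, not notation from the text.

def bounds(y, x):
    # y bounds x when x(v) is in y(v) for every vertex v (Definition 4.1).
    return all(x[v] in y[v] for v in x)

# Ising example: y allows both spins at vertex "b", so it bounds x.
x = {"a": -1, "b": 1}
y = {"a": {-1}, "b": {-1, 1}}
assert bounds(y, x)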
Definition 4.2. Say that the Markov process {Y_t} over (2^C)^V is a bounding chain for the Markov chain {X_t} over C^V if there exists a coupling between {Y_t} and {X_t} such that if Y_t bounds X_t, then Y_{t+1} bounds X_{t+1}.
The simplest way to create such a coupling is by using an update function for the
Markov chain.
Theorem 4.1 (Building a bounding chain). Suppose X_{t+1} = φ(X_t, U_t) for all t, where U_0, U_1, U_2, ... are iid Unif([0,1]). Create y′ = φ^{BC}(y, u) as follows. For all v ∈ V, set

y′(v) = {c : ∃x bounded by y such that φ(x, u) has color c at vertex v}.

Then φ^{BC} gives a bounding chain for {X_t}.
In other words, consider all the possible states x that are bounded by the bounding
state y. Take a step in each of those chains, and let the resulting set of labels for v
comprise the new bounding state at v.
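For a small state space this construction can be carried out literally, by enumerating every x bounded by y. The following Python sketch does exactly that; phi is a stand-in for the update function φ of the underlying chain, and the brute-force enumeration is only meant to make Theorem 4.1 concrete.

from itertools import product

def bounding_chain_update_bruteforce(phi, y, u):
    # Theorem 4.1: run phi on every x bounded by y and collect,
    # for each vertex, the set of colors that can appear there.
    verts = list(y)
    new_y = {v: set() for v in verts}
    for labels in product(*(y[v] for v in verts)):
        x = dict(zip(verts, labels))
        x_next = phi(x, u)
        for v in verts:
            new_y[v].add(x_next[v])
    return new_y

This enumeration takes time exponential in #V; the point of the examples in this chapter is that for specific chains the new label sets can be computed directly, without listing the bounded states.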
Once a bounding chain update has been created, CFTP can be run as follows.
Bounding_chain_cftp   Input: t   Output: X
1) Draw U_1, U_2, ..., U_t
2) For all v ∈ V, set Y(v) ← C
3) For i from 1 to t
4)    Y ← Bounding_chain_update(Y, U_i)
5) If #Y(v) > 1 for some v
6)    X ← Bounding_chain_cftp(2t)
7)    For all v, Y(v) ← {X(v)}
8)    For i from 1 to t
9)       Y ← Bounding_chain_update(Y, U_i)
10) For all v, let X(v) be the single element of Y(v)
As usual, when writing something like Bounding_chain_update(Y, U_i), the U_i stands for all the random choices that need to be made inside the bounding chain step. So it can be multiple random variables, or even an iid sequence if necessary.
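A direct transcription into Python, as a sketch under the assumption that update(y, u) implements one bounding chain step and draw_u() supplies whatever randomness a single step consumes (a vertex choice together with a uniform, say); both names are placeholders, not from the text.

def bounding_chain_cftp(t, update, draw_u, V, C):
    us = [draw_u() for _ in range(t)]            # line 1: fix the randomness
    y = {v: set(C) for v in V}                   # line 2: every label possible
    for u in us:                                 # lines 3-4
        y = update(y, u)
    if any(len(y[v]) > 1 for v in V):            # line 5: still uncertain?
        x = bounding_chain_cftp(2 * t, update, draw_u, V, C)   # line 6
        y = {v: {x[v]} for v in V}               # line 7
        for u in us:                             # lines 8-9: reuse the same us
            y = update(y, u)
    return {v: next(iter(y[v])) for v in V}      # line 10

Note that lines 8-9 rerun the chain with the same U_1, ..., U_t drawn in line 1; reusing the randomness is what makes this CFTP rather than a forward coupling.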
With the general framework in place, now consider some specific examples.
4.2 The hard-core gas model
Consider again the hard-core gas model (Definition 1.20) where each node is labeled either 0 or 1, two adjacent nodes cannot both be labeled 1, and the weight of a configuration is proportional to λ raised to the number of nodes labeled 1. When the graph was bipartite, it was possible to cast the Gibbs chain as a monotone chain with the proper partial order. Now consider a graph (such as the triangular lattice) that is not bipartite.
In the Gibbs update step from Section 1.8.5, a vertex v is chosen uniformly at random along with U ∼ Unif([0,1]). If U < λ/(1 + λ) and no neighbors of v are labeled 1, then the node is relabeled 1. Otherwise, it is relabeled 0.
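One step of this update is short in code. A sketch, with the graph stored as a dictionary of neighbor lists (an illustrative representation):

import random

def hcgm_gibbs_step(x, nbrs, lam, rng):
    # Gibbs update for the hard-core gas model: pick a vertex uniformly,
    # then place a 1 there with probability lam/(1+lam) if no neighbor blocks it.
    v = rng.choice(list(nbrs))
    if rng.random() < lam / (1 + lam) and all(x[w] == 0 for w in nbrs[v]):
        x[v] = 1
    else:
        x[v] = 0
    return x

# Example: the triangle, the smallest nonbipartite graph.
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
x = {v: 0 for v in nbrs}
rng = random.Random(1)
for _ in range(100):
    x = hcgm_gibbs_step(x, nbrs, 0.5, rng)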
Now consider the bounding chain. Nodes are now labeled with nonempty subsets of {0,1}, so each node is labeled either {0}, {1}, or {0,1}. If a node v has a neighbor w that is labeled {1}, then any x bounded by y has x(w) = 1, so the node v must be given the label {0}. If all neighbors of v are labeled {0}, then any x bounded by y has all of those neighbors labeled 0, and so U determines whether y(v) is {0} or {1}.
But if no neighbor of v is labeled {1}, some neighbors are labeled {0,1}, and U < λ/(1 + λ), then it is uncertain how to proceed. There is a state x bounded by y with x(w) = 1 for a neighbor w labeled {0,1}, in which case x(v) = 0, but another state x′ could have all neighbors labeled 0, giving x′(v) = 1. Hence y(v) = {0,1} at the next state. The following pseudocode formalizes this idea.
HCGM_bounding_chain_Gibbs   Input: y, v ∈ V, U ∈ [0,1]   Output: y
1) Let N_{1} be the number of neighbors of v labeled {1} in y
2) Let N_{0,1} be the number of neighbors of v labeled {0,1} in y
3) If U > λ/(1 + λ) or N_{1} > 0
4)    y(v) ← {0}
5) Else if N_{0,1} = 0
6)    y(v) ← {1}
7) Else if N_{1} = 0 and N_{0,1} > 0
8)    y(v) ← {0,1}
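In Python, with the bounding labels stored as sets as before, the update reads:

def hcgm_bounding_update(y, v, u, nbrs, lam):
    # HCGM_bounding_chain_Gibbs: update the bounding label at v.
    n1 = sum(1 for w in nbrs[v] if y[w] == {1})       # N_{1}
    n01 = sum(1 for w in nbrs[v] if y[w] == {0, 1})   # N_{0,1}
    if u > lam / (1 + lam) or n1 > 0:
        y[v] = {0}        # no attempt at a 1, or a definite blocker
    elif n01 == 0:
        y[v] = {1}        # all neighbors definitely 0
    else:
        y[v] = {0, 1}     # a possible blocker: the label stays uncertain
    return y

Packing the vertex choice and the uniform into one u, this plugs straight into the bounding_chain_cftp sketch above (nbrs and rng as in the previous sketch; λ = 0.5 is an arbitrary choice):

lam = 0.5
verts = list(nbrs)
draw_u = lambda: (rng.choice(verts), rng.random())
update = lambda y, u: hcgm_bounding_update(y, u[0], u[1], nbrs, lam)
sample = bounding_chain_cftp(1, update, draw_u, verts, {0, 1})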
This bounding chain can be used with CFTP to obtain perfect samples. It can
also be a useful aid in analyzing the mixing time of the Markov chain. To understand
how long CFTP takes, it is necessary to bound the expected time needed for all of
the {0,1} labels to disappear.
Lemma 4.1. Let φ^{BC}_t denote t steps in HCGM_bounding_chain_Gibbs. For λ < 1/(Δ − 1), the chance that the bounding chain bounds more than one state is at most

#V exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).

In particular, setting

t = #V ln(2#V) · (1 + λ)/(1 − λ(Δ − 1))

gives an expected running time of at most 4t to generate one sample with CFTP.
Proof. Let W_t = #{v : Y_t(v) = {0,1}} denote the number of positions where the underlying state x is not known completely. When a node v labeled {0,1} is selected, its label is wiped away, reducing W_t by 1. Since there are W_t nodes labeled {0,1}, the chance that the chosen node has this label is W_t/#V.
After v is chosen, consider the value of U. If U < λ/(1 + λ), then the node might end the step with the label {0,1}. In order for this to happen, it must be true that v is adjacent to a node already labeled {0,1}. There are at most W_t Δ nodes adjacent to a node labeled {0,1} in the graph, so the chance that the chosen v is adjacent to a node labeled {0,1} is at most W_t Δ/#V. Hence

E[W_{t+1} | W_t] ≤ W_t − W_t/#V + (W_t Δ/#V) · λ/(1 + λ) = W_t (1 − (#V)^{−1}(1 − λΔ/(1 + λ))).
Let γ = 1 − (#V)^{−1}(1 − λΔ/(1 + λ)). Note E[W_t] = E[E[W_t | W_{t−1}]] ≤ E[γ W_{t−1}] = γ E[W_{t−1}], so a simple induction gives E[W_t] ≤ γ^t E[W_0]. For α > 0, 1 − α ≤ exp(−α), so γ^t ≤ exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).
Since W_t is nonnegative and integral, and W_0 = #V, Markov's inequality gives

P(W_t > 0) ≤ E[W_t] ≤ #V exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).
The bound on the expected running time for CFTP then follows from Lemma 3.1.
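To get a feel for the numbers (an illustration, not from the text): on a graph of maximum degree Δ = 6, such as the triangular lattice, the lemma requires λ < 1/5.

import math

def lemma_4_1_t(n, lam, delta):
    # t = #V ln(2 #V) (1+λ)/(1 − λ(Δ−1)), valid for λ < 1/(Δ−1).
    assert lam < 1 / (delta - 1)
    return n * math.log(2 * n) * (1 + lam) / (1 - lam * (delta - 1))

print(lemma_4_1_t(n=100, lam=0.1, delta=6))   # about 1166 steps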
4.2.1 The Dyer-Greenhill shifting chain
Now consider the more effective chain for the hard-core gas model of Dyer and Greenhill from Section 1.8.5, called HCGM_shift. Recall that in this chain, when an attempt is made to label node v with a 1 and exactly one neighbor w is preventing that from happening, there is a Bernoulli S ∼ Bern(p_swap). When S = 1, the value x(v) = 1 and x(w) = 0. So when S = 1, the value of x(w) gets shifted over to x(v).
For the bounding chain, this raises many cases to deal with. For instance, suppose that U < λ/(1 + λ), S = 1, all but one neighbor of v has label {0}, and exactly one neighbor w has label {0,1}. In the original chain, this meant that y(v) = {0,1}.
But now consider how the shifting chain works. If that {0,1} neighbor w is hiding a 0 in the underlying state, then y(v) should be {1}. If the {0,1} label at w is hiding a 1 in the underlying state, then y(v) should still be {1}, since the neighboring 1 would get shifted to v anyway! Moreover, the value of y(w) should be changed to {0}, since if there was a 1 there, it has been shifted away!
In other words, the underlying state x might have x(w) = 0 or x(w) = 1. If U < λ/(1 + λ) and S = 1, then when x(w) = 0 the new value of x(v) is 1, and when x(w) = 1 the new value of x(v) is still 1 while the new value of x(w) is 0. So for all underlying states, y(v) = {1} and y(w) = {0} in the bounding chain.
That is a good move: the {0,1} label at the node w has been removed! Now suppose that all but two neighbors of v have label {0}, one neighbor w_1 has label {0,1}, and one neighbor w_2 has label {1}. Then for x(w_1) = x(w_2) = 1, whether S = 1 or S = 0, no swap occurs, and x(v) = 0, x(w_1) = x(w_2) = 1.
However, if x(w_1) = 0 and x(w_2) = 1, then when S = 1 a swap would occur, so x(v) = 1, x(w_1) = 0, x(w_2) = 0. So in the bounding chain, y(v) = y(w_1) = y(w_2) = {0,1}. This is a bad move for the bounding chain, as two new {0,1} labels have been added!
To break down the bounding chain into cases, let N_ℓ denote the number of neighbors of the chosen node v given label ℓ. When N_{0,1} = 1, let w_{0,1} denote the neighbor of v with label {0,1}, and when N_{1} = 1, let w_{1} be the neighbor of v with label {1}. The complete bounding chain update is given in Table 4.1. In this table, deg(v) denotes the degree of v, and ∗ is a wildcard character that indicates any value.
Table 4.1 Cases for updating the bounding chain for the swap chain for the hard-core gas model. A ∗ indicates that there is no restriction on that component of the vector. When N_q = 1, let w_q be the neighbor of v with y(w_q) = q.

Case   (N_{0}, N_{1}, N_{0,1})   Is U < λ/(1 + λ)?   S        New bounding state
1      (∗, ∗, ∗)                 No                  0 or 1   y(v) = {0}
2      (∗, 0, 0)                 Yes                 0 or 1   y(v) = {1}
3      (∗, 0, 1)                 Yes                 0        y(v) = {0,1}
4      (∗, 0, 1)                 Yes                 1        y(v) = {1}, y(w_{0,1}) = {0}
5      (∗, 1, ∗)                 Yes                 0        y(v) = {0}
6      (∗, 1, 0)                 Yes                 1        y(v) = {1}, y(w_{1}) = {0}
7      (∗, 1, 1)                 Yes                 1        y(v) = {0,1}, y(w_{1}) = {0,1}
8      (∗, ≥ 2, ∗)               Yes                 0 or 1   y(v) = {0}

So why go to the extra trouble of executing an update that is far more complex than the one for the simple Gibbs chain? Because it is possible to prove that this method runs in polynomial time over a wider range of λ.
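Table 4.1 translates into code case by case. In the sketch below (same set representation as before), note that the eight rows above leave the patterns with N_{0,1} ≥ 2 implicit, so the sketch routes them through a conservative fallback that enlarges labels; enlarging a bounding label never breaks Definition 4.1, so the fallback is sound, though it may be looser than the chain analyzed below.

def hcgm_shift_bounding_update(y, v, u, s, nbrs, lam):
    # Bounding chain update for the shifting chain, following Table 4.1.
    n1 = sum(1 for w in nbrs[v] if y[w] == {1})       # N_{1}
    n01 = sum(1 for w in nbrs[v] if y[w] == {0, 1})   # N_{0,1}
    if u >= lam / (1 + lam):                          # case 1
        y[v] = {0}
    elif n1 >= 2:                                     # case 8
        y[v] = {0}
    elif n1 == 1 and s == 0:                          # case 5
        y[v] = {0}
    elif n1 == 0 and n01 == 0:                        # case 2
        y[v] = {1}
    elif n1 == 0 and n01 == 1:                        # cases 3 and 4
        w01 = next(w for w in nbrs[v] if y[w] == {0, 1})
        if s == 1:
            y[v], y[w01] = {1}, {0}                   # case 4: shift removes doubt
        else:
            y[v] = {0, 1}                             # case 3
    elif n1 == 1 and n01 == 0:                        # case 6 (s == 1 here)
        w1 = next(w for w in nbrs[v] if y[w] == {1})
        y[v], y[w1] = {1}, {0}
    elif n1 == 1 and n01 == 1:                        # case 7 (s == 1 here)
        w1 = next(w for w in nbrs[v] if y[w] == {1})
        y[v], y[w1] = {0, 1}, {0, 1}
    else:                                             # N_{0,1} >= 2: conservative
        y[v] = {0, 1}
        if s == 1:
            for w in nbrs[v]:
                if y[w] == {1}:
                    y[w] = {0, 1}
    return y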
Lemma 4.2. Let φ^{BC}_t denote t steps in the bounding chain for the swap chain with p_swap = 1/4. If λ < 2/(Δ − 2), then the chance that the bounding chain bounds more than one state is at most

#V exp(−t(#V)^{−1}(1 − λΔ/[2(1 + λ)])).

In particular, setting

t = #V ln(2#V) · 2(1 + λ)/(2 − λ(Δ − 2))

gives an expected running time of at most 4t to generate one sample with CFTP.
Proof. As in the proof of the previous lemma, let W_t = #{v : Y_t(v) = {0,1}} denote the number of positions labeled {0,1} in the bounding chain.
Use the case numbers in Table 4.1. When node v is first selected, its label is removed. If its label had been {0,1}, then W_t is reduced by 1. Then, depending on the case, W_t can change again: case 3, which occurs with probability (1 − p_swap)λ/(1 + λ), increases W_t by 1; case 4 actually decreases W_t by 1; and case 7 increases W_t by 2. Let n_{34} denote the number of nodes that activate either case 3 or 4 (so (N_{0}, N_{1}, N_{0,1}) = (∗, 0, 1)), and n_7 the number of nodes where (N_{0}, N_{1}, N_{0,1}) = (∗, 1, 1).
Then

E[W_{t+1} − W_t | W_t, n_{34}, n_7] = (#V)^{−1}(−W_t + (λ/(1 + λ))[n_{34}(1 − p_swap) − n_{34} p_swap + 2 n_7 p_swap]).
Let p_swap = 1/4, so n_{34}(1 − p_swap) − n_{34} p_swap = n_{34}/2 and 2 n_7 p_swap = n_7/2. Then
E[W_{t+1} − W_t | W_t, n_{34}, n_7] = (#V)^{−1}(−W_t + (1/2) · (λ/(1 + λ))(n_{34} + n_7)).
Every node counted by n_{34} or n_7 has exactly one neighbor labeled {0,1}, and each of the W_t nodes labeled {0,1} has at most Δ neighbors, so n_{34} + n_7 ≤ W_t Δ. Hence

E[W_{t+1} | W_t] ≤ W_t (1 − (#V)^{−1}(1 − λΔ/[2(1 + λ)])),

and the proof finishes exactly as for Lemma 4.1: induction, the bound 1 − α ≤ exp(−α), and Markov's inequality give P(W_t > 0) ≤ #V exp(−t(#V)^{−1}(1 − λΔ/[2(1 + λ)])), and the running time bound follows from Lemma 3.1.
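Comparing the two lemmas quantifies the payoff promised above: the swap chain's bounding chain provably contracts over a wider range of λ. A quick check (illustrative, not from the text):

def gibbs_threshold(delta):   # Lemma 4.1 needs λ < 1/(Δ−1)
    return 1 / (delta - 1)

def swap_threshold(delta):    # Lemma 4.2 needs λ < 2/(Δ−2)
    return 2 / (delta - 2)

for delta in (3, 4, 6):
    print(delta, gibbs_threshold(delta), swap_threshold(delta))
# Δ = 6: 0.2 versus 0.5, so the swap chain tolerates much larger λ.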