Chapter 4
Bounding Chains
It is often safer to be in chains than to be free.
Franz Kafka
Monotonicity is a valuable property to have, but unfortunately, not every Markov chain update function has it. From among our examples, the Ising model when β < 0, the Potts model with k ≥ 4, the hard-core gas model on nonbipartite graphs, weighted permutations, sink-free orientations, the antivoter model, and self-organizing lists all have nonmonotonic Gibbs samplers. (While the slice sampler is monotonic, for these examples it cannot be employed efficiently in its pure form.) While Gibbs samplers are often monotonic, Metropolis-Hastings chains, chains on permutations, and auxiliary variable chains (like Swendsen-Wang) are often not.
The first move away from monotonic Markov chains was the notion of antimonotonicity, introduced in [44, 96, 46]. Bounding chains, introduced independently in [62] and [44], and further refined in [63, 64, 65, 66, 51], are a generalization of antimonotonicity that applies to a wide variety of Markov chains.
4.1 What is a bounding chain?
Suppose each state of an underlying chain consists of a set of nodes, each of which has a label. Then a bounding chain keeps track of a set of possible labels at each node. It bounds the underlying chain if it is possible to ensure that the actual label of each node always falls into the set of labels kept at that node by the bounding chain.
Let Ω = C^V be the state space of the underlying chain. Call C the set of colors, values, or labels, and V the set of dimensions, positions, coordinates, nodes, or vertices. For instance, in the Ising model C = {−1, 1} and V is the vertex set of the graph. In the autonormal model C = [0, 1]. For Brownian motion both C and V are ℝ. For permutations on n items, both C and V equal {1, 2, ..., n}.
A bounding chain wraps around the underlying chain on C^V. At each node of V, the bounding chain keeps a subset of C that represents the possible colors permitted for the underlying chain. So a bounding chain for the Ising model could have either {−1}, {1}, or {−1, 1} as the label of a vertex. The bounding part comes when steps in the Markov chain are made. Let X_t be the underlying chain state and Y_t the bounding chain state. If X_t(v) ∈ Y_t(v) for every v, then it should also be true that X_{t+1}(v) ∈ Y_{t+1}(v). Let 2^C denote the set of all subsets of the set C.
Definition 4.1. Say that y ∈ (2^C)^V bounds x ∈ C^V if for all v ∈ V, x(v) ∈ y(v).
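In code, Definition 4.1 is a one-line check. Here is a minimal sketch in Python (the language used for all code sketches below); representing x as a dictionary from vertices to colors and y as a dictionary from vertices to sets is an illustrative choice, not notation from the text.

def bounds(y, x):
    # y bounds x when x(v) is in y(v) for every vertex v (Definition 4.1).
    return all(x[v] in y[v] for v in x)

# Ising example: y allows both spins at vertex "b", so it bounds x.
x = {"a": -1, "b": 1}
y = {"a": {-1}, "b": {-1, 1}}
assert bounds(y, x)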
Definition 4.2. Say that the Markov process {Y_t} over (2^C)^V is a bounding chain for the Markov chain {X_t} over C^V if there exists a coupling between {Y_t} and {X_t} such that if Y_t bounds X_t, then Y_{t+1} bounds X_{t+1}.
The simplest way to create such a coupling is by using an update function for the
Markov chain.
Theorem 4.1 (Building a bounding chain). Suppose X_{t+1} = φ(X_t, U_t) for all t, where U_0, U_1, U_2, ... are iid Unif([0,1]). Create y′ = φ^{BC}(y, u) as follows. For all v ∈ V, set

y′(v) = {c : ∃x bounded by y such that φ(x, u) has color c at vertex v}.

Then φ^{BC} gives a bounding chain for {X_t}.
In other words, consider all the possible states x that are bounded by the bounding
state y. Take a step in each of those chains, and let the resulting set of labels for v
comprise the new bounding state at v.
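For a small state space this construction can be carried out literally, by enumerating every x bounded by y. The following Python sketch does exactly that; phi is a stand-in for the update function φ of the underlying chain, and the brute-force enumeration is only meant to make Theorem 4.1 concrete.

from itertools import product

def bounding_chain_update_bruteforce(phi, y, u):
    # Theorem 4.1: run phi on every x bounded by y and collect,
    # for each vertex, the set of colors that can appear there.
    verts = list(y)
    new_y = {v: set() for v in verts}
    for labels in product(*(y[v] for v in verts)):
        x = dict(zip(verts, labels))
        x_next = phi(x, u)
        for v in verts:
            new_y[v].add(x_next[v])
    return new_y

This enumeration takes time exponential in #V; the point of the examples in this chapter is that for specific chains the new label sets can be computed directly, without listing the bounded states.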
Once a bounding chain update has been created, CFTP can be run as follows.
Bounding_chain_cftp   Input: t   Output: X
1) Draw U_1, U_2, ..., U_t
2) For all v ∈ V, set Y(v) ← C
3) For i from 1 to t
4)    Y ← Bounding_chain_update(Y, U_i)
5) If #Y(v) > 1 for some v
6)    X ← Bounding_chain_cftp(2t)
7)    For all v, Y(v) ← {X(v)}
8)    For i from 1 to t
9)       Y ← Bounding_chain_update(Y, U_i)
10) For all v, let X(v) be the single element of Y(v)
As usual, when writing something like Bounding_chain_update(Y, U_i), the U_i stands for all the random choices that need to be made inside the bounding chain step. So it can be multiple random variables, or even an iid sequence if necessary.
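A direct transcription into Python, as a sketch under the assumption that update(y, u) implements one bounding chain step and draw_u() supplies whatever randomness a single step consumes (a vertex choice together with a uniform, say); both names are placeholders, not from the text.

def bounding_chain_cftp(t, update, draw_u, V, C):
    us = [draw_u() for _ in range(t)]            # line 1: fix the randomness
    y = {v: set(C) for v in V}                   # line 2: every label possible
    for u in us:                                 # lines 3-4
        y = update(y, u)
    if any(len(y[v]) > 1 for v in V):            # line 5: still uncertain?
        x = bounding_chain_cftp(2 * t, update, draw_u, V, C)   # line 6
        y = {v: {x[v]} for v in V}               # line 7
        for u in us:                             # lines 8-9: reuse the same us
            y = update(y, u)
    return {v: next(iter(y[v])) for v in V}      # line 10

Note that lines 8-9 rerun the chain with the same U_1, ..., U_t drawn in line 1; reusing the randomness is what makes this CFTP rather than a forward coupling.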
With the general framework in place, now consider some specific examples.
4.2 The hard-core gas model
Consider again the hard-core gas model (Definition 1.20) where each node is labeled either 0 or 1, two adjacent nodes cannot both be labeled 1, and the weight of a configuration is proportional to λ raised to the number of nodes labeled 1. When the graph was bipartite, it was possible to cast the Gibbs chain as a monotone chain with the proper partial order. Now consider a graph (such as the triangular lattice) that is not bipartite.
In the Gibbs update step from Section 1.8.5, a vertex v is chosen uniformly at random along with U ∼ Unif([0,1]). If U < λ/(1 + λ) and no neighbors of v are labeled 1, then the node is relabeled 1. Otherwise, it is relabeled 0.
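One step of this update is short in code. A sketch, with the graph stored as a dictionary of neighbor lists (an illustrative representation):

import random

def hcgm_gibbs_step(x, nbrs, lam, rng):
    # Gibbs update for the hard-core gas model: pick a vertex uniformly,
    # then place a 1 there with probability lam/(1+lam) if no neighbor blocks it.
    v = rng.choice(list(nbrs))
    if rng.random() < lam / (1 + lam) and all(x[w] == 0 for w in nbrs[v]):
        x[v] = 1
    else:
        x[v] = 0
    return x

# Example: the triangle, the smallest nonbipartite graph.
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
x = {v: 0 for v in nbrs}
rng = random.Random(1)
for _ in range(100):
    x = hcgm_gibbs_step(x, nbrs, 0.5, rng)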
Now consider the bounding chain. Nodes are now labeled with nonempty subsets of {0,1}, so each node is labeled either {0}, {1}, or {0,1}. If a node v has a neighbor w that is labeled {1}, then any x bounded by y has x(w) = 1, so the node v must be given the label {0}. If all neighbors of v are labeled {0}, then any x bounded by y has all of those neighbors labeled 0, and so U determines whether y(v) is {0} or {1}.
But if no neighbor of v is labeled {1}, some neighbors are labeled {0,1}, and U < λ/(1 + λ), then it is uncertain how to proceed. There is a state x bounded by y with x(w) = 1 for a neighbor w labeled {0,1}, in which case x(v) = 0, but another state x′ could have all neighbors labeled 0, giving x′(v) = 1. Hence y(v) = {0,1} at the next state. The following pseudocode formalizes this idea.
HCGM_bounding_chain_Gibbs   Input: y, v ∈ V, U ∈ [0,1]   Output: y
1) Let N_{1} be the number of neighbors of v labeled {1} in y
2) Let N_{0,1} be the number of neighbors of v labeled {0,1} in y
3) If U > λ/(1 + λ) or N_{1} > 0
4)    y(v) ← {0}
5) Else if N_{0,1} = 0
6)    y(v) ← {1}
7) Else if N_{1} = 0 and N_{0,1} > 0
8)    y(v) ← {0,1}
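In Python, with the bounding labels stored as sets as before, the update reads:

def hcgm_bounding_update(y, v, u, nbrs, lam):
    # HCGM_bounding_chain_Gibbs: update the bounding label at v.
    n1 = sum(1 for w in nbrs[v] if y[w] == {1})       # N_{1}
    n01 = sum(1 for w in nbrs[v] if y[w] == {0, 1})   # N_{0,1}
    if u > lam / (1 + lam) or n1 > 0:
        y[v] = {0}        # no attempt at a 1, or a definite blocker
    elif n01 == 0:
        y[v] = {1}        # all neighbors definitely 0
    else:
        y[v] = {0, 1}     # a possible blocker: the label stays uncertain
    return y

Packing the vertex choice and the uniform into one u, this plugs straight into the bounding_chain_cftp sketch above (nbrs and rng as in the previous sketch; λ = 0.5 is an arbitrary choice):

lam = 0.5
verts = list(nbrs)
draw_u = lambda: (rng.choice(verts), rng.random())
update = lambda y, u: hcgm_bounding_update(y, u[0], u[1], nbrs, lam)
sample = bounding_chain_cftp(1, update, draw_u, verts, {0, 1})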
This bounding chain can be used with CFTP to obtain perfect samples. It can
also be a useful aid in analyzing the mixing time of the Markov chain. To understand
how long CFTP takes, it is necessary to bound the expected time needed for all of
the {0,1} labels to disappear.
Lemma 4.1. Let φ^{BC}_t denote t steps in HCGM_bounding_chain_Gibbs. For λ < 1/(Δ − 1), the chance that the bounding chain bounds more than one state is at most

#V exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).

In particular, setting

t = #V ln(2#V) · (1 + λ)/(1 − λ(Δ − 1))

gives an expected running time of at most 4t to generate one sample with CFTP.
Proof. Let W_t = #{v : Y_t(v) = {0,1}} denote the number of positions where the underlying state x is not known completely. When a node v labeled {0,1} is selected, its label is wiped away, reducing W_t by 1. Since there are W_t nodes labeled {0,1}, the chance that the chosen node has this label is W_t/#V.
After v is chosen, consider the value of U. If U < λ/(1 + λ), then the node might end the step with the label {0,1}. In order for this to happen, it must be true that v is adjacent to a node already labeled {0,1}. There are at most W_t Δ nodes adjacent to a node labeled {0,1} in the graph, so the chance that the chosen v is adjacent to a node labeled {0,1} is at most W_t Δ/#V. Hence

E[W_{t+1} | W_t] ≤ W_t − W_t/#V + (W_t Δ/#V) · λ/(1 + λ) = W_t (1 − (#V)^{−1}(1 − λΔ/(1 + λ))).
Let γ = 1 − (#V)^{−1}(1 − λΔ/(1 + λ)). Note E[W_t] = E[E[W_t | W_{t−1}]] ≤ E[γ W_{t−1}] = γ E[W_{t−1}], so a simple induction gives E[W_t] ≤ γ^t E[W_0]. For α > 0, 1 − α ≤ exp(−α), so γ^t ≤ exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).
Since W_t is nonnegative and integral, and W_0 = #V, Markov's inequality gives

P(W_t > 0) ≤ E[W_t] ≤ #V exp(−t(#V)^{−1}(1 − λΔ/(1 + λ))).
The bound on the expected running time for CFTP then follows from Lemma 3.1.
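To get a feel for the numbers (an illustration, not from the text): on a graph of maximum degree Δ = 6, such as the triangular lattice, the lemma requires λ < 1/5.

import math

def lemma_4_1_t(n, lam, delta):
    # t = #V ln(2 #V) (1+λ)/(1 − λ(Δ−1)), valid for λ < 1/(Δ−1).
    assert lam < 1 / (delta - 1)
    return n * math.log(2 * n) * (1 + lam) / (1 - lam * (delta - 1))

print(lemma_4_1_t(n=100, lam=0.1, delta=6))   # about 1166 steps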
4.2.1 The Dyer-Greenhill shifting chain
Now consider the more effective chain for the hard-core gas model of Dyer and Greenhill from Section 1.8.5, called HCGM_shift. Recall that in this chain, when an attempt is made to label node v with a 1 and exactly one neighbor w is preventing that from happening, there is a Bernoulli S ∼ Bern(p_swap). When S = 1, the value x(v) = 1 and x(w) = 0. So when S = 1, the value of x(w) gets shifted over to x(v).
For the bounding chain, this raises many cases to deal with. For instance, suppose that U < λ/(1 + λ), S = 1, all but one neighbor of v has label {0}, and exactly one neighbor w has label {0,1}. In the original chain, this meant that y(v) = {0,1}.
But now consider how the shifting chain works. If that {0,1} neighbor w is hiding a 0 in the underlying state, then y(v) should be {1}. If the {0,1} label at w is hiding a 1 in the underlying state, then y(v) should still be {1}, since the neighboring 1 would get shifted to v anyway! Moreover, the value of y(w) should be changed to {0}, since if there was a 1 there, it has been shifted away!
In other words, the underlying state x might have x(w) = 0 or x(w) = 1. If U < λ/(1 + λ) and S = 1, then when x(w) = 0 the new value of x(v) is 1, and when x(w) = 1 the new value of x(v) is still 1 while the new value of x(w) is 0. So for all underlying states, y(v) = {1} and y(w) = {0} in the bounding chain.
That is a good move: the {0,1} label at the node w has been removed! Now suppose that all but two neighbors of v have label {0}, one neighbor w_1 has label {0,1}, and one neighbor w_2 has label {1}. Then for x(w_1) = x(w_2) = 1, whether S = 1 or S = 0, no swap occurs, and x(v) = 0, x(w_1) = x(w_2) = 1.
However, if x(w_1) = 0 and x(w_2) = 1, then when S = 1 a swap would occur, so x(v) = 1, x(w_1) = 0, x(w_2) = 0. So in the bounding chain, y(v) = y(w_1) = y(w_2) = {0,1}. This is a bad move for the bounding chain, as two new {0,1} labels have been added!
To break down the bounding chain into cases, let N_ℓ denote the number of neighbors of the chosen node v given label ℓ. When N_{0,1} = 1, let w_{0,1} denote the neighbor of v with label {0,1}, and when N_{1} = 1, let w_{1} be the neighbor of v with label {1}. The complete bounding chain update is given in Table 4.1. In this table, deg(v) denotes the degree of v, and ∗ is a wildcard character that indicates any value.
Table 4.1 Cases for updating the bounding chain for the swap chain for the hard-core gas model. A ∗ indicates that there is no restriction on that component of the vector. When N_q = 1, let w_q be the neighbor of v with y(w_q) = q.

Case   (N_{0}, N_{1}, N_{0,1})   Is U < λ/(1 + λ)?   S        New bounding state
1      (∗, ∗, ∗)                 No                  0 or 1   y(v) = {0}
2      (∗, 0, 0)                 Yes                 0 or 1   y(v) = {1}
3      (∗, 0, 1)                 Yes                 0        y(v) = {0,1}
4      (∗, 0, 1)                 Yes                 1        y(v) = {1}, y(w_{0,1}) = {0}
5      (∗, 1, ∗)                 Yes                 0        y(v) = {0}
6      (∗, 1, 0)                 Yes                 1        y(v) = {1}, y(w_{1}) = {0}
7      (∗, 1, 1)                 Yes                 1        y(v) = {0,1}, y(w_{1}) = {0,1}
8      (∗, ≥ 2, ∗)               Yes                 0 or 1   y(v) = {0}

So why go to the extra trouble of executing an update that is far more complex than the one for the simple Gibbs chain? Because it is possible to prove that this method runs in polynomial time over a wider range of λ.
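Table 4.1 translates into code case by case. In the sketch below (same set representation as before), note that the eight rows above leave the patterns with N_{0,1} ≥ 2 implicit, so the sketch routes them through a conservative fallback that enlarges labels; enlarging a bounding label never breaks Definition 4.1, so the fallback is sound, though it may be looser than the chain analyzed below.

def hcgm_shift_bounding_update(y, v, u, s, nbrs, lam):
    # Bounding chain update for the shifting chain, following Table 4.1.
    n1 = sum(1 for w in nbrs[v] if y[w] == {1})       # N_{1}
    n01 = sum(1 for w in nbrs[v] if y[w] == {0, 1})   # N_{0,1}
    if u >= lam / (1 + lam):                          # case 1
        y[v] = {0}
    elif n1 >= 2:                                     # case 8
        y[v] = {0}
    elif n1 == 1 and s == 0:                          # case 5
        y[v] = {0}
    elif n1 == 0 and n01 == 0:                        # case 2
        y[v] = {1}
    elif n1 == 0 and n01 == 1:                        # cases 3 and 4
        w01 = next(w for w in nbrs[v] if y[w] == {0, 1})
        if s == 1:
            y[v], y[w01] = {1}, {0}                   # case 4: shift removes doubt
        else:
            y[v] = {0, 1}                             # case 3
    elif n1 == 1 and n01 == 0:                        # case 6 (s == 1 here)
        w1 = next(w for w in nbrs[v] if y[w] == {1})
        y[v], y[w1] = {1}, {0}
    elif n1 == 1 and n01 == 1:                        # case 7 (s == 1 here)
        w1 = next(w for w in nbrs[v] if y[w] == {1})
        y[v], y[w1] = {0, 1}, {0, 1}
    else:                                             # N_{0,1} >= 2: conservative
        y[v] = {0, 1}
        if s == 1:
            for w in nbrs[v]:
                if y[w] == {1}:
                    y[w] = {0, 1}
    return y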
Lemma 4.2. Let φ^{BC}_t denote t steps in the bounding chain for the swap chain with p_swap = 1/4. If λ < 2/(Δ − 2), then the chance that the bounding chain bounds more than one state is at most

#V exp(−t(#V)^{−1}(1 − λΔ/[2(1 + λ)])).

In particular, setting

t = #V ln(2#V) · 2(1 + λ)/(2 − λ(Δ − 2))

gives an expected running time of at most 4t to generate one sample with CFTP.
Proof. As in the proof of the previous lemma, let W_t = #{v : Y_t(v) = {0,1}} denote the number of positions labeled {0,1} in the bounding chain.
Use the case numbers in Table 4.1. When node v is first selected, its label is removed. If its label had been {0,1}, then W_t is reduced by 1. Then, depending on the case, W_t can change again: case 3, which occurs with probability (1 − p_swap)λ/(1 + λ), increases W_t by 1; case 4 actually decreases W_t by 1; and case 7 increases W_t by 2. Let n_{34} denote the number of nodes that activate either case 3 or 4 (so (N_{0}, N_{1}, N_{0,1}) = (∗, 0, 1)), and n_7 the number of nodes where (N_{0}, N_{1}, N_{0,1}) = (∗, 1, 1).
Then

E[W_{t+1} − W_t | W_t, n_{34}, n_7] = (#V)^{−1}(−W_t + (λ/(1 + λ))[n_{34}(1 − p_swap) − n_{34} p_swap + 2 n_7 p_swap]).
Let p_swap = 1/4, so n_{34}(1 − p_swap) − n_{34} p_swap = n_{34}/2 and 2 n_7 p_swap = n_7/2. Then
E[W_{t+1} − W_t | W_t, n_{34}, n_7] = (#V)^{−1}(−W_t + (1/2) · (λ/(1 + λ))(n_{34} + n_7)).
Every node counted by n_{34} or n_7 has exactly one neighbor labeled {0,1}, and each of the W_t nodes labeled {0,1} has at most Δ neighbors, so n_{34} + n_7 ≤ W_t Δ. Hence

E[W_{t+1} | W_t] ≤ W_t (1 − (#V)^{−1}(1 − λΔ/[2(1 + λ)])),

and the proof finishes exactly as for Lemma 4.1: induction, the bound 1 − α ≤ exp(−α), and Markov's inequality give P(W_t > 0) ≤ #V exp(−t(#V)^{−1}(1 − λΔ/[2(1 + λ)])), and the running time bound follows from Lemma 3.1.
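Comparing the two lemmas quantifies the payoff promised above: the swap chain's bounding chain provably contracts over a wider range of λ. A quick check (illustrative, not from the text):

def gibbs_threshold(delta):   # Lemma 4.1 needs λ < 1/(Δ−1)
    return 1 / (delta - 1)

def swap_threshold(delta):    # Lemma 4.2 needs λ < 2/(Δ−2)
    return 2 / (delta - 2)

for delta in (3, 4, 6):
    print(delta, gibbs_threshold(delta), swap_threshold(delta))
# Δ = 6: 0.2 versus 0.5, so the swap chain tolerates much larger λ.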