by keeping track of how the values of states percolate through the chain. (Lubetzky would later refer to this as information percolation, which he used to study cutoff phenomena in the mixing time for the Ising model [87].)
5.4 Dealing with infinite graphs
So far, the algorithms used to generate from Markov random fields have only applied when the graph in question V is finite. In this section it is shown how to draw from a projection of the infinite graph onto a finite subset.
Let V denote a possibly infinite set of vertices, and C be the finite set of labellings for a node.
Definition 5.2. Let ∂S = {w : ∃s ∈ S, {s,w} ∈ E} be the neighbors of the set S. Then a random configuration X with distribution ν is a Markov random field if ν has conditional probabilities such that for all finite S ⊆ V, all x ∈ C^S and y ∈ C^(V∖S),

P_ν(X(S) = x | X(V∖S) = y) = P_ν(X(S) = x | X(∂S) = y(∂S)).   (5.1)
(So the distribution of X over S only depends on the values on the boundary of S, and not on more distant nodes.) Call the set of conditional distributions a specification.
In the case that V is finite, a specification is enough to uniquely determine the distribution ν. When V is infinite, there will be at least one ν with a given specification, but there also might be more than one such distribution.
Definition 5.3. Suppose that a node has a fixed distribution given the values on the neighboring nodes, and that there exists a unique measure on the state space satisfying these conditional distributions. Then call the unique measure a Gibbs measure.
The question of when the Gibbs measure is unique is a central issue in Markov random field theory. Nonuniqueness occurs at values of the parameter known as a phase transition.
Definition 5.4. The Ising model Gibbs measure has a specification given by

P(X(S) = x | X(∂S) = y) ∝ exp(βH(x(S ∪ ∂S))),   (5.2)

where

H(x) = Σ_{{i,j}∈E} 1(x(i) = x(j)).   (5.3)
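To make this specification concrete, here is a minimal Python sketch (the function and parameter names are illustrative, not taken from the text) of the single-site conditional distribution: a node's conditional probability of a label is proportional to the exponential of β times the number of agreeing neighbors.

import math

def ising_site_conditional(neighbor_labels, beta, labels=(-1, 1)):
    # Conditional distribution of one node given its neighbors' labels, under a
    # specification with weight exp(beta * number of agreeing neighbors).
    weights = {c: math.exp(beta * sum(1 for y in neighbor_labels if y == c))
               for c in labels}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Example: a node with three +1 neighbors and one -1 neighbor at beta = 0.4.
print(ising_site_conditional([1, 1, 1, -1], beta=0.4))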
When the distribution of a single node is “sufficiently” independent of the values of its neighbors, say that the specification satisfies a high-noise requirement. Typically the higher the noise, the easier it becomes to show there is a unique Gibbs measure, and to generate samples from the measure.
There have been several different high-noise requirements, such as Dobrushin uniqueness [28] or a condition of van den Berg and Maes [126]. The one that will be of use here is due to Häggström and Steif [45], and has the nice feature that not only does it guarantee the existence of a unique Gibbs measure, it also guarantees the existence of a linear time method for drawing a sample perfectly from the measure.
Definition 5.5. For c ∈ C, define a parameter of independence of the distribution of a node from its neighbors as follows:

γ(c) = min_{v∈V} min_{y∈C^(∂{v})} P(X(v) = c | X(∂{v}) = y),   (5.4)

γ = Σ_{c∈C} γ(c).   (5.5)

Say that a specification is HS-high noise if, for Δ the maximum degree in the graph,

γ > (Δ − 1)/Δ.   (5.6)
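For a specification like the Ising one above, γ(c) can be computed by taking the worst case over all neighbor configurations of a maximum-degree node. The following sketch (function names are illustrative, not from [45]) computes γ and checks the HS-high noise condition.

import math
from itertools import product

def ising_gamma(beta, delta, labels=(-1, 1)):
    # gamma(c) = min over neighbor configurations y of P(X(v) = c | neighbors = y);
    # gamma is the sum of gamma(c) over the labels; delta is the maximum degree.
    gamma_c = {c: 1.0 for c in labels}
    for y in product(labels, repeat=delta):
        weights = {c: math.exp(beta * sum(1 for s in y if s == c)) for c in labels}
        total = sum(weights.values())
        for c in labels:
            gamma_c[c] = min(gamma_c[c], weights[c] / total)
    return sum(gamma_c.values())

def is_hs_high_noise(beta, delta):
    # The Haggstrom-Steif condition: gamma > (delta - 1) / delta.
    return ising_gamma(beta, delta) > (delta - 1) / delta

print(is_hs_high_noise(beta=0.1, delta=4), is_hs_high_noise(beta=1.0, delta=4))

For small β the condition holds and sampling is easy; for large β it fails, consistent with the phase transition discussed above.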
At each step of the process, a subset of nodes must be chosen so that no two nodes in the subset are connected by an edge (this is known as an independent set of nodes). The labels on these nodes will be updated simultaneously; because none of them are connected, they can all be updated at once.
One way to do this, suggested in [45], is to independently flip a p-coin for every node in V. If a node comes up heads and all of its neighbors are tails, then the node becomes part of the updating set. The probability that a particular node becomes part of the set is then at least p(1 − p)^Δ, which is maximized by taking p = (Δ + 1)^(−1).
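A short sketch of this selection step, with the graph stored as an adjacency dictionary (the names are illustrative):

import random

def draw_update_set(adjacency, p):
    # Flip an independent p-coin for each node; keep a node only if it is heads
    # and all of its neighbors are tails, so the kept nodes form an independent set.
    heads = {v: random.random() < p for v in adjacency}
    return {v for v in adjacency
            if heads[v] and not any(heads[w] for w in adjacency[v])}

# Example on a 4-cycle, where the maximum degree is 2, so p = 1/(Delta + 1) = 1/3.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(draw_update_set(cycle, p=1/3))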
For every node in the updating set, choose a new label for the node conditioned on the values of the neighbors. If the values of one or more neighbors are unknown, then draw U uniformly from [0,1], and set aside a subinterval of length γ(c) for each c ∈ C. If U falls in the subinterval associated with color c, then label the node c. Otherwise label the node “unknown.”
To be precise, label a node v with Z_i(v) = 1 if it is unknown and Z_i(v) = 0 if it is known after i steps in this process.
Lemma 5.3 (Proposition 2.1 of [45]). Suppose there is a subset of nodes S with Z_0(v) = 1 for all v ∈ S. Let p* = p(1 − p)^Δ. Then using the updating scheme given in the previous paragraph, for a node v that is at least k distance in the graph from S, and γ as in (5.5),

P(Z_k(v) = 1) ≤ (1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^k.   (5.7)
Proof. The proof is by induction on k. The base case of k = 0 is trivially true. Assume it holds for k. Let x be any node at least k + 1 distance from the boundary of S. Let

b(i) = (1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^i.

Then any node adjacent to x is at least k distance from the boundary of S. Let Y_k denote the set of nodes chosen to update at each step. As before, let p(x) = P(x ∈ Y_k). Then if x is not updated, use the induction hypothesis to bound the chance that Z_{k+1}(x) = 1. If it is updated, then the chance that Z_{k+1}(x) = 1 is at most 1 − γ times the chance that some neighbor w of x has Z_k(w) = 1, and by the union bound the latter chance is at most Δb(k). Hence

P(Z_{k+1}(x) = 1) = P(x ∈ Y_k)P(Z_{k+1}(x) = 1 | x ∈ Y_k) + P(x ∉ Y_k)P(Z_{k+1}(x) = 1 | x ∉ Y_k)
                 ≤ p(x)(1 − γ)Δ·b(k) + (1 − p(x))b(k)
                 = b(k)[(1 − γ)Δ·p(x) + (1 − p(x))].
For (1 − γ)Δ < 1, the right-hand side is maximized when p(x) is minimized, and p(x) ≥ p* = p(1 − p)^Δ = Δ^Δ/(Δ + 1)^(Δ+1) when p = (Δ + 1)^(−1). Simplifying this expression then completes the induction.
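As an illustrative check of the bound (the numbers are not from the text): with Δ = 4 and γ = 0.9, the factor inside the parentheses in (5.7) is 1 − 4^4(1 − 4(0.1))/5^5 = 1 − 153.6/3125 ≈ 0.9509, so P(Z_k(v) = 1) falls below 0.01 once k ≥ 92.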
Let N(v) denote the set of nodes w such that {v,w} ∈ E. This then gives the following method:
Finitary coding    Input: W
1)  k ← 0
2)  Repeat
3)      k ← k + 1
4)      Let W_k be all points within distance k of W
5)      For all v ∈ W_k, let Z_{−k}(v) ← 1
6)      For t from −k to −1
7)          Draw Y_t ⊆ W_k to update
8)          For all v ∈ Y_t
9)              Draw U ← Unif([0,1])
10)             If U ≤ γ
11)                 Z_{t+1}(v) ← 0, set X_{t+1}(v) by which subinterval for c that U is in
12)             Else
13)                 If there is a neighbor w of v with Z_t(w) = 1
14)                     Z_{t+1}(v) ← 1
15)                 Else
16)                     Z_{t+1}(v) ← 0, set X_{t+1}(v) given X_t(N(v)) and U > γ
17) Until Z_0(v) = 0 for all v ∈ W
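Below is a rough Python transcription of the Finitary coding under several simplifying assumptions: the ambient graph is stored as a finite adjacency dictionary (on a truly infinite graph, W_k and the neighbor lookups would be built lazily), the specification is the Ising-type one of Definition 5.4, and all helper names are illustrative rather than from [45]. The point the sketch tries to capture is that the per-(time, node) coins and uniforms are memoized, so that restarting from further back in the past reuses the same randomness, as in CFTP.

import math
import random
from itertools import product

LABELS = (-1, 1)   # illustrative two-label (Ising-type) state space
BETA = 0.1         # assumed inverse temperature, small enough to be HS-high noise

def site_conditional(neighbor_labels):
    # P(X(v) = c | known labels of the neighbors) for the Ising-type specification.
    w = {c: math.exp(BETA * sum(1 for y in neighbor_labels if y == c)) for c in LABELS}
    z = sum(w.values())
    return {c: w[c] / z for c in LABELS}

def gamma_table(max_degree):
    # gamma(c): worst case over neighbor configurations of the conditional probability of c.
    g = {c: 1.0 for c in LABELS}
    for y in product(LABELS, repeat=max_degree):
        cond = site_conditional(y)
        for c in LABELS:
            g[c] = min(g[c], cond[c])
    return g

def ball(adjacency, window, k):
    # All nodes within graph distance k of the window.
    frontier, seen = set(window), set(window)
    for _ in range(k):
        frontier = {w for v in frontier for w in adjacency[v]} - seen
        seen |= frontier
    return seen

def finitary_coding(adjacency, window, seed=0):
    # Randomness is memoized per (time, node) so that pushing the start time further
    # into the past reuses the same coins and uniforms.
    max_degree = max(len(nb) for nb in adjacency.values())
    p = 1.0 / (max_degree + 1)
    g = gamma_table(max_degree)
    gamma = sum(g.values())
    rng = random.Random(seed)
    coins, uniforms = {}, {}

    def coin(t, v):
        if (t, v) not in coins:
            coins[(t, v)] = rng.random() < p
        return coins[(t, v)]

    def unif(t, v):
        if (t, v) not in uniforms:
            uniforms[(t, v)] = rng.random()
        return uniforms[(t, v)]

    def label_from_interval(u, lengths):
        # Pick the label whose subinterval (laid out left to right) contains u.
        total = 0.0
        for c in LABELS:
            total += lengths[c]
            if u <= total:
                return c
        return LABELS[-1]

    k = 0
    while True:
        k += 1
        W_k = ball(adjacency, window, k)
        Z = {v: 1 for v in W_k}        # 1 = value still unknown
        X = {v: None for v in W_k}
        for t in range(-k, 0):
            Y_t = {v for v in W_k
                   if coin(t, v) and not any(coin(t, w) for w in adjacency[v])}
            newZ, newX = dict(Z), dict(X)
            for v in Y_t:
                u = unif(t, v)
                if u <= gamma:
                    newZ[v], newX[v] = 0, label_from_interval(u, g)
                elif any(Z.get(w, 1) == 1 for w in adjacency[v]):
                    newZ[v], newX[v] = 1, None
                else:
                    cond = site_conditional([X[w] for w in adjacency[v]])
                    rest = {c: cond[c] - g[c] for c in LABELS}   # mass beyond gamma(c)
                    newZ[v], newX[v] = 0, label_from_interval(u - gamma, rest)
            Z, X = newZ, newX
        if all(Z[v] == 0 for v in window):
            return {v: X[v] for v in window}

# Example: the labels of two adjacent nodes on an 8-cycle.
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(finitary_coding(cycle, window=[0, 1]))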
Theorem 5.3 (Theorem 2.2 of [45]). For any finite W, the Finitary coding terminates with probability 1, and outputs an unbiased sample X from the distribution over the observation window W.
Proof. From Lemma 5.3, the chance a node v ∈ W has Z_0(v) = 1 is at most

(1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^k.   (5.8)

So the chance that any of the elements in W has Z_0(v) = 1 is at most #W times this bound. This tends to 0 as k → ∞, so the algorithm terminates with probability 1.
The proof that the output comes from the correct distribution is similar to earlier versions of CFTP. Let T denote the minimum value of k needed for the algorithm to terminate. Then as shown above, T is finite with probability 1.
For each k, let R_k be a draw from the distribution over the observation window W_k. Then running this forward k steps gives a result R^0_k, which is still stationary. Thus the variables R^0_1, R^0_2, R^0_3, ... all have the correct distribution. But R^0_k converges to X, the output of the algorithm, with probability 1. The only way this can happen is if X has the stationary distribution.
5.5 Clan of ancestors
So far the running time for generating samples has been nearly linear in the number of nodes in the sample, O(#V ln(#V)). In this section it is shown that by carefully restricting the nodes that are considered in an update, this can be reduced to time linear in the number of nodes.
The idea is as follows. Consider the finitary coding of Häggström and Steif [45], and think about what the value of a specific node v at time 0 is. To learn the value of the node, it is necessary to go back in time until the last time that the node was updated.
At this time in the past, there is a γ chance that the value of v does not depend in any way on the values of the neighbors. On the other hand, if a U is drawn for v at this point that indicates that the values of the neighbors are important, then the neighbors of v need to be added to the clan of ancestors of v.
Each of these neighbors then needs its value to be determined, which can be done recursively. Some of these neighbors might have had U ≤ γ when they were updated; otherwise their neighbors in turn go on the list to be determined.
The result is similar to a branching process, and certainly has a tree structure. It is not exactly a branching process because of the lack of independence between “children.” However, ideas from branching processes can be used to bound the overall size of the cluster. Because the nodes of the cluster are present for the birth of later nodes, call these nodes the clan of ancestors of the original node.
Lemma 5.4. Suppose γ > (Δ − 1)/Δ. Then the expected size of the clan of ancestors of v (including v) is at most (1 − Δ(1 − γ))^(−1).
Proof. Let C_0 = {v}, let C_1 denote the first generation of ancestors, C_2 the second, and so on. Then the expected number of nodes in generation i + 1 is at most Δ(1 − γ) times the expected number of nodes in generation i, since each node in generation i has only a 1 − γ chance of adding at most Δ nodes to the next generation.
Hence E[#C_{i+1} | #C_i] ≤ Δ(1 − γ)#C_i, and taking the expected value of both sides gives E[#C_{i+1}] ≤ Δ(1 − γ)E[#C_i]. An easy induction then gives E[#C_i] ≤ [Δ(1 − γ)]^i, and so the expected size of the clan of ancestors is at most

E[#C_0 + #C_1 + ···] ≤ Σ_{i=0}^∞ [Δ(1 − γ)]^i = 1/(1 − Δ(1 − γ)).
Fact 5.2. The expected time to use the clan of ancestors method to find the values on a finite window W is at most #W(1 − Δ(1 − γ))^(−1).
Proof. The resolution of each clan of ancestors gives the value of exactly one node. Since each clan of ancestors has a constant (in expectation) number of nodes, the time for the overall algorithm to find the values of a finite window W is just #W times the bound from Lemma 5.4 on the expected size of a clan of ancestors.
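The growth of the clan can be sketched directly: each member of the clan, independently with probability 1 − γ, forces its neighbors into the clan (in the actual algorithm these are separate, earlier update events, so the same node may appear more than once). The Monte Carlo estimate below, on an illustrative graph with assumed parameter values, can be compared with the Lemma 5.4 bound.

import random

def clan_of_ancestors_size(adjacency, v, gamma, rng):
    # Grow the clan of ancestors of v: each member, independently with probability
    # 1 - gamma, needs its neighbors' values, so those neighbors join the clan
    # (possibly again, as separate earlier update events).
    stack, size = [v], 0
    while stack:
        w = stack.pop()
        size += 1
        if rng.random() > gamma:
            stack.extend(adjacency[w])
    return size

# Estimate the mean clan size on a small 3-regular graph and compare with the
# Lemma 5.4 bound 1 / (1 - Delta * (1 - gamma)).
rng = random.Random(1)
graph = {i: [(i - 1) % 12, (i + 1) % 12, (i + 6) % 12] for i in range(12)}
delta, gamma = 3, 0.8   # gamma > (delta - 1) / delta = 2/3
sizes = [clan_of_ancestors_size(graph, 0, gamma, rng) for _ in range(10000)]
print(sum(sizes) / len(sizes), "bound:", 1 / (1 - delta * (1 - gamma)))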
5.6 Deciding the length of time to run
While the doubling time method works for both CFTP and FMMR, when many samples are being taken it does involve some wasted effort.
A better method in this case is to use the doubling method to initially find a value of t such that the probability of coalescence is at least 1/2, and then use that value of t throughout. In this case the number of steps taken by ROCFTP will be at most 2t on average, and for FMMR 4t (since it is read twice).
One approach is to run the chain forward starting from the initial bounding state k times, and let T_1,...,T_k be the times needed for coalescence to occur. Then the median value of {T_1,...,T_k} will be a good estimate of the time needed for a block to coalesce roughly half of the time. Let this median be t_med. The average running time per sample using ROCFTP will then approach 2t_med as the number of samples gets large.
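A sketch of this estimation step; here run_until_coalescence stands in for one forward run of the bounding chain from the initial bounding state, and is replaced by a dummy waiting time since the real chain depends on the model.

import random
from statistics import median

def estimate_block_length(run_until_coalescence, k=100, seed=0):
    # Run the bounding chain forward k times from the initial bounding state,
    # record the coalescence times, and return their median as t_med.
    rng = random.Random(seed)
    times = [run_until_coalescence(rng) for _ in range(k)]
    return median(times), times

# Illustrative stand-in for a forward run of the bounding chain: a random waiting time.
fake_run = lambda rng: 1 + int(rng.expovariate(1 / 50))
t_med, times = estimate_block_length(fake_run, k=200)
print("estimated block length t_med:", t_med)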
Of course, at this point it might be possible to do much better. For instance, if there is a value of t that is only 10% greater than the median, but for which coalescence occurs 80% of the time, then using this value for the size of a block in ROCFTP will approach an average time per sample of (1.1/0.8)t_med = 1.375 t_med, rather than 2t_med.
Because coalescence often exhibits the kind of cutoff phenomenon where coalescence is unlikely for t below a critical threshold but becomes very likely past that threshold, there is usually a point somewhere past the median that will minimize the running time of the algorithm.
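One simple way to exploit this is to estimate the coalescence probability from the same trial runs T_1,...,T_k and choose the block length minimizing t divided by the empirical coalescence probability within t steps, an approximation to the expected cost per sample. A sketch follows; the times list is assumed to come from trial runs such as those in the previous sketch.

def best_block_length(times):
    # Choose the block length t minimizing t / p_hat(t), where p_hat(t) is the
    # empirical fraction of trial runs that coalesced within t steps.
    n = len(times)
    def cost(t):
        p_hat = sum(1 for T in times if T <= t) / n
        return t / p_hat
    return min(sorted(set(times)), key=cost)

# Example with a made-up spread of coalescence times showing a cutoff near 60.
sample_times = [55, 57, 58, 60, 61, 61, 62, 63, 65, 90, 120, 200]
print(best_block_length(sample_times))

In this made-up example the minimizer lies a little past the median, matching the discussion above.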