by keeping track of how the values of states percolate through the chain. (Lubetzky would later refer to this as information percolation, which he used to study cutoff phenomena in the mixing time for the Ising model [87].)
5.4 Dealing with infinite graphs
So far, the algorithms used to generate from Markov random fields have only applied when the graph in question V is finite. In this section it is shown how to draw from a projection of the infinite graph onto a finite subset.
Let V denote a possibly infinite set of vertices, and C be the finite set of labellings for a node.
Definition 5.2. Let ∂S = {w : ∃s ∈ S, {s,w} ∈ E} be the neighbors of the set S. Then a random configuration X with distribution ν is a Markov random field if ν has conditional probabilities such that for all finite S ⊆ V, all x ∈ C^S and y ∈ C^(V∖S),

P_ν(X(S) = x | X(V∖S) = y) = P_ν(X(S) = x | X(∂S) = y(∂S)).   (5.1)
(So the distribution of X over S only depends on the values on the boundary of S, and not on more distant nodes.) Call the set of conditional distributions a specification.
In the case that V is finite, a specification is enough to uniquely determine the distribution ν. When V is infinite, there will be at least one ν with a given specification, but there also might be more than one such distribution.
Definition 5.3. Suppose that a node has a fixed distribution given the values on the neighboring nodes, and that there exists a unique measure on the state space satisfying these conditional distributions. Then call the unique measure a Gibbs measure.
The question of when the Gibbs measure is unique is a central issue in Markov random field theory. Nonuniqueness occurs at values of the parameter known as a phase transition.
Definition 5.4. The Ising model Gibbs measure has a specification given by

P(X(S) = x | X(∂S) = y) ∝ exp(βH(x(S ∪ ∂S))),   (5.2)

where

H(x) = Σ_{{i,j}∈E} 1(x(i) = x(j)).   (5.3)
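To make this specification concrete, here is a minimal Python sketch (the function and parameter names are illustrative, not taken from the text) of the single-site conditional distribution: a node's conditional probability of a label is proportional to the exponential of β times the number of agreeing neighbors.

import math

def ising_site_conditional(neighbor_labels, beta, labels=(-1, 1)):
    # Conditional distribution of one node given its neighbors' labels, under a
    # specification with weight exp(beta * number of agreeing neighbors).
    weights = {c: math.exp(beta * sum(1 for y in neighbor_labels if y == c))
               for c in labels}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Example: a node with three +1 neighbors and one -1 neighbor at beta = 0.4.
print(ising_site_conditional([1, 1, 1, -1], beta=0.4))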
When the distribution of a single node is “sufficiently” independent of the values of its neighbors, say that the specification satisfies a high-noise requirement. Typically the higher the noise, the easier it becomes to show there is a unique Gibbs measure, and to generate samples from the measure.
There have been several different high-noise requirements, such as Dobrushin uniqueness [28] or a condition of van den Berg and Maes [126]. The one that will be of use here is due to Häggström and Steif [45], and has the nice feature that not only does it guarantee the existence of a unique Gibbs measure, it also guarantees the existence of a linear time method for drawing a sample perfectly from the measure.
Definition 5.5. For c ∈ C, define a parameter of independence of the distribution of a node from its neighbors as follows:

γ(c) = min_{v∈V} min_{y∈C^(∂{v})} P(X(v) = c | X(∂{v}) = y),   (5.4)

γ = Σ_{c∈C} γ(c).   (5.5)

Say that a specification is HS-high noise if, for Δ the maximum degree in the graph,

γ > (Δ − 1)/Δ.   (5.6)
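For a specification like the Ising one above, γ(c) can be computed by taking the worst case over all neighbor configurations of a maximum-degree node. The following sketch (function names are illustrative, not from [45]) computes γ and checks the HS-high noise condition.

import math
from itertools import product

def ising_gamma(beta, delta, labels=(-1, 1)):
    # gamma(c) = min over neighbor configurations y of P(X(v) = c | neighbors = y);
    # gamma is the sum of gamma(c) over the labels; delta is the maximum degree.
    gamma_c = {c: 1.0 for c in labels}
    for y in product(labels, repeat=delta):
        weights = {c: math.exp(beta * sum(1 for s in y if s == c)) for c in labels}
        total = sum(weights.values())
        for c in labels:
            gamma_c[c] = min(gamma_c[c], weights[c] / total)
    return sum(gamma_c.values())

def is_hs_high_noise(beta, delta):
    # The Haggstrom-Steif condition: gamma > (delta - 1) / delta.
    return ising_gamma(beta, delta) > (delta - 1) / delta

print(is_hs_high_noise(beta=0.1, delta=4), is_hs_high_noise(beta=1.0, delta=4))

For small β the condition holds and sampling is easy; for large β it fails, consistent with the phase transition discussed above.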
At each step of the process, a subset of nodes must be chosen so that no two nodes in the subset are connected by an edge (this is known as an independent set of nodes). The labels on these nodes will be updated simultaneously; because none of them are connected, they can all be updated at once.
One way to do this, suggested in [45], is to independently flip a p-coin for every node in V. If a node comes up heads and all of its neighbors are tails, then the node becomes part of the updating set. The probability that a particular node becomes part of the set is then at least p(1 − p)^Δ, which is maximized by taking p = (Δ + 1)^(−1).
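A short sketch of this selection step, with the graph stored as an adjacency dictionary (the names are illustrative):

import random

def draw_update_set(adjacency, p):
    # Flip an independent p-coin for each node; keep a node only if it is heads
    # and all of its neighbors are tails, so the kept nodes form an independent set.
    heads = {v: random.random() < p for v in adjacency}
    return {v for v in adjacency
            if heads[v] and not any(heads[w] for w in adjacency[v])}

# Example on a 4-cycle, where the maximum degree is 2, so p = 1/(Delta + 1) = 1/3.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(draw_update_set(cycle, p=1/3))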
For every node in the updating set, choose a new label for the node conditioned on the values of the neighbors. If the values of one or more neighbors are unknown, then draw U uniformly from [0,1], and set aside a subinterval of length γ(c) for each c ∈ C. If U falls in the subinterval associated with color c, then label the node c. Otherwise label the node “unknown.”
To be precise, label a node v with Z_i(v) = 1 if it is unknown and Z_i(v) = 0 if it is known after i steps in this process.
Lemma 5.3 (Proposition 2.1 of [45]). Suppose there is a subset of nodes S with Z_0(v) = 1 for all v ∈ S. Let p* = p(1 − p)^Δ. Then using the updating scheme given in the previous paragraph, for a node v that is at least k distance in the graph from S, and γ as in (5.5),

P(Z_k(v) = 1) ≤ (1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^k.   (5.7)
Proof. The proof is by induction on k. The base case of k = 0 is trivially true. Assume it holds for k. Let x be any node at least k + 1 distance from the boundary of S. Let

b(i) = (1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^i.

Then any node adjacent to x is at least k distance from the boundary of S. Let Y_k denote the set of nodes chosen to update at each step. As before, let p(x) = P(x ∈ Y_k). Then if x is not updated, use the induction hypothesis to bound the chance that Z_{k+1}(x) = 1. If it is updated, then the chance that Z_{k+1}(x) = 1 is at most 1 − γ times the chance that some neighbor w of x has Z_k(w) = 1, and by the union bound the latter chance is at most Δb(k). Hence

P(Z_{k+1}(x) = 1) = P(x ∈ Y_k)P(Z_{k+1}(x) = 1 | x ∈ Y_k) + P(x ∉ Y_k)P(Z_{k+1}(x) = 1 | x ∉ Y_k)
                 ≤ p(x)(1 − γ)Δ·b(k) + (1 − p(x))b(k)
                 = b(k)[(1 − γ)Δ·p(x) + (1 − p(x))].
For (1 − γ)Δ < 1, the right-hand side is maximized when p(x) is minimized, and p(x) ≥ p* = p(1 − p)^Δ = Δ^Δ/(Δ + 1)^(Δ+1) when p = (Δ + 1)^(−1). Simplifying this expression then completes the induction.
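As an illustrative check of the bound (the numbers are not from the text): with Δ = 4 and γ = 0.9, the factor inside the parentheses in (5.7) is 1 − 4^4(1 − 4(0.1))/5^5 = 1 − 153.6/3125 ≈ 0.9509, so P(Z_k(v) = 1) falls below 0.01 once k ≥ 92.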
Let N(v) denote the set of nodes w such that {v,w} ∈ E. This then gives the following method:
Finitary coding    Input: W
1)  k ← 0
2)  Repeat
3)      k ← k + 1
4)      Let W_k be all points within distance k of W
5)      For all v ∈ W_k, let Z_{−k}(v) ← 1
6)      For t from −k to −1
7)          Draw Y_t ⊆ W_k to update
8)          For all v ∈ Y_t
9)              Draw U ← Unif([0,1])
10)             If U ≤ γ
11)                 Z_{t+1}(v) ← 0, set X_{t+1}(v) by which subinterval for c that U is in
12)             Else
13)                 If there is a neighbor w of v with Z_t(w) = 1
14)                     Z_{t+1}(v) ← 1
15)                 Else
16)                     Z_{t+1}(v) ← 0, set X_{t+1}(v) given X_t(N(v)) and U > γ
17) Until Z_0(v) = 0 for all v ∈ W
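Below is a rough Python transcription of the Finitary coding under several simplifying assumptions: the ambient graph is stored as a finite adjacency dictionary (on a truly infinite graph, W_k and the neighbor lookups would be built lazily), the specification is the Ising-type one of Definition 5.4, and all helper names are illustrative rather than from [45]. The point the sketch tries to capture is that the per-(time, node) coins and uniforms are memoized, so that restarting from further back in the past reuses the same randomness, as in CFTP.

import math
import random
from itertools import product

LABELS = (-1, 1)   # illustrative two-label (Ising-type) state space
BETA = 0.1         # assumed inverse temperature, small enough to be HS-high noise

def site_conditional(neighbor_labels):
    # P(X(v) = c | known labels of the neighbors) for the Ising-type specification.
    w = {c: math.exp(BETA * sum(1 for y in neighbor_labels if y == c)) for c in LABELS}
    z = sum(w.values())
    return {c: w[c] / z for c in LABELS}

def gamma_table(max_degree):
    # gamma(c): worst case over neighbor configurations of the conditional probability of c.
    g = {c: 1.0 for c in LABELS}
    for y in product(LABELS, repeat=max_degree):
        cond = site_conditional(y)
        for c in LABELS:
            g[c] = min(g[c], cond[c])
    return g

def ball(adjacency, window, k):
    # All nodes within graph distance k of the window.
    frontier, seen = set(window), set(window)
    for _ in range(k):
        frontier = {w for v in frontier for w in adjacency[v]} - seen
        seen |= frontier
    return seen

def finitary_coding(adjacency, window, seed=0):
    # Randomness is memoized per (time, node) so that pushing the start time further
    # into the past reuses the same coins and uniforms.
    max_degree = max(len(nb) for nb in adjacency.values())
    p = 1.0 / (max_degree + 1)
    g = gamma_table(max_degree)
    gamma = sum(g.values())
    rng = random.Random(seed)
    coins, uniforms = {}, {}

    def coin(t, v):
        if (t, v) not in coins:
            coins[(t, v)] = rng.random() < p
        return coins[(t, v)]

    def unif(t, v):
        if (t, v) not in uniforms:
            uniforms[(t, v)] = rng.random()
        return uniforms[(t, v)]

    def label_from_interval(u, lengths):
        # Pick the label whose subinterval (laid out left to right) contains u.
        total = 0.0
        for c in LABELS:
            total += lengths[c]
            if u <= total:
                return c
        return LABELS[-1]

    k = 0
    while True:
        k += 1
        W_k = ball(adjacency, window, k)
        Z = {v: 1 for v in W_k}        # 1 = value still unknown
        X = {v: None for v in W_k}
        for t in range(-k, 0):
            Y_t = {v for v in W_k
                   if coin(t, v) and not any(coin(t, w) for w in adjacency[v])}
            newZ, newX = dict(Z), dict(X)
            for v in Y_t:
                u = unif(t, v)
                if u <= gamma:
                    newZ[v], newX[v] = 0, label_from_interval(u, g)
                elif any(Z.get(w, 1) == 1 for w in adjacency[v]):
                    newZ[v], newX[v] = 1, None
                else:
                    cond = site_conditional([X[w] for w in adjacency[v]])
                    rest = {c: cond[c] - g[c] for c in LABELS}   # mass beyond gamma(c)
                    newZ[v], newX[v] = 0, label_from_interval(u - gamma, rest)
            Z, X = newZ, newX
        if all(Z[v] == 0 for v in window):
            return {v: X[v] for v in window}

# Example: the labels of two adjacent nodes on an 8-cycle.
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(finitary_coding(cycle, window=[0, 1]))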
Theorem 5.3 (Theorem 2.2 of [45]). For any finite W, the Finitary coding terminates with probability 1, and outputs an unbiased sample X from the distribution over the observation window W.
Proof. From Lemma 5.3, the chance a node v ∈ W has Z_0(v) = 1 is at most

(1 − Δ^Δ(1 − Δ(1 − γ)) / (Δ + 1)^(Δ+1))^k.   (5.8)

So the chance that any of the elements in W has Z_0(v) = 1 is at most #W times this bound. This tends to 0 as k → ∞, so the algorithm terminates with probability 1.
The proof that the output comes from the correct distribution is similar to earlier versions of CFTP. Let T denote the minimum value of k needed for the algorithm to terminate. Then as shown above, T is finite with probability 1.
For each k, let R_k be a draw from the distribution over the observation window W_k. Then running this forward k steps gives a result R^0_k, which is still stationary. Thus the variables R^0_1, R^0_2, R^0_3, ... all have the correct distribution. But R^0_k converges to X, the output of the algorithm, with probability 1. The only way this can happen is if X has the stationary distribution.
5.5 Clan of ancestors
So far the running time for generating samples has been nearly linear in the number of nodes in the sample, O(#V ln(#V)). In this section it is shown that by carefully restricting the nodes that are considered in an update, this can be reduced to time linear in the number of nodes.
The idea is as follows. Consider the finitary coding of Häggström and Steif [45], and think about what the value of a specific node v at time 0 is. To learn the value of the node, it is necessary to go back in time until the last time that the node was updated.
At this time in the past, there is a γ chance that the value of v does not depend in any way on the values of the neighbors. On the other hand, if a U is drawn for v at this point that indicates that the values of the neighbors are important, then the neighbors of v need to be added to the clan of ancestors of v.
Each of these neighbors then needs its value to be determined, which can be done recursively. Some of these neighbors might have had U ≤ γ when they were updated; otherwise their neighbors in turn go on the list to be determined.
The result is similar to a branching process, and certainly has a tree structure. It is not exactly a branching process because of the lack of independence between “children.” However, ideas from branching processes can be used to bound the overall size of the cluster. Because the nodes of the cluster are present for the birth of later nodes, call these nodes the clan of ancestors of the original node.
Lemma 5.4. Suppose γ > (Δ − 1)/Δ. Then the expected size of the clan of ancestors of v (including v) is at most (1 − Δ(1 − γ))^(−1).
Proof. Let C_0 = {v}, let C_1 denote the first generation of ancestors, C_2 the second, and so on. Then the expected number of nodes in generation i + 1 is at most Δ(1 − γ) times the expected number of nodes in generation i, since each node in generation i has only a 1 − γ chance of adding at most Δ nodes to the next generation.
Hence E[#C_{i+1} | #C_i] ≤ Δ(1 − γ)#C_i, and taking the expected value of both sides gives E[#C_{i+1}] ≤ Δ(1 − γ)E[#C_i]. An easy induction then gives E[#C_i] ≤ [Δ(1 − γ)]^i, and so the expected size of the clan of ancestors is at most

E[#C_0 + #C_1 + ···] ≤ Σ_{i=0}^∞ [Δ(1 − γ)]^i = 1/(1 − Δ(1 − γ)).
Fact 5.2. The expected time to use the clan of ancestors method to find the values on a finite window W is at most #W(1 − Δ(1 − γ))^(−1).
Proof. The resolution of each clan of ancestors gives the value of exactly one node. Since each clan of ancestors has a constant (in expectation) number of nodes, the time for the overall algorithm to find the values of a finite window W is just #W times the bound from Lemma 5.4 on the expected size of a clan of ancestors.
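The growth of the clan can be sketched directly: each member of the clan, independently with probability 1 − γ, forces its neighbors into the clan (in the actual algorithm these are separate, earlier update events, so the same node may appear more than once). The Monte Carlo estimate below, on an illustrative graph with assumed parameter values, can be compared with the Lemma 5.4 bound.

import random

def clan_of_ancestors_size(adjacency, v, gamma, rng):
    # Grow the clan of ancestors of v: each member, independently with probability
    # 1 - gamma, needs its neighbors' values, so those neighbors join the clan
    # (possibly again, as separate earlier update events).
    stack, size = [v], 0
    while stack:
        w = stack.pop()
        size += 1
        if rng.random() > gamma:
            stack.extend(adjacency[w])
    return size

# Estimate the mean clan size on a small 3-regular graph and compare with the
# Lemma 5.4 bound 1 / (1 - Delta * (1 - gamma)).
rng = random.Random(1)
graph = {i: [(i - 1) % 12, (i + 1) % 12, (i + 6) % 12] for i in range(12)}
delta, gamma = 3, 0.8   # gamma > (delta - 1) / delta = 2/3
sizes = [clan_of_ancestors_size(graph, 0, gamma, rng) for _ in range(10000)]
print(sum(sizes) / len(sizes), "bound:", 1 / (1 - delta * (1 - gamma)))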
5.6 Deciding the length of time to run
While the doubling time method works for both CFTP and FMMR, when many samples are being taken it does involve some wasted effort.
A better method in this case is to use the doubling method to initially find a value of t such that the probability of coalescence is at least 1/2, and then use that value of t throughout. In this case the number of steps taken by ROCFTP will be at most 2t on average, and for FMMR 4t (since it is read twice).
One approach is to run the chain forward starting from the initial bounding state k times, and let T_1,...,T_k be the times needed for coalescence to occur. Then the median value of {T_1,...,T_k} will be a good estimate of the time needed for a block to coalesce roughly half of the time. Let this median be t_med. The average running time per sample using ROCFTP will then approach 2t_med as the number of samples gets large.
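A sketch of this estimation step; here run_until_coalescence stands in for one forward run of the bounding chain from the initial bounding state, and is replaced by a dummy waiting time since the real chain depends on the model.

import random
from statistics import median

def estimate_block_length(run_until_coalescence, k=100, seed=0):
    # Run the bounding chain forward k times from the initial bounding state,
    # record the coalescence times, and return their median as t_med.
    rng = random.Random(seed)
    times = [run_until_coalescence(rng) for _ in range(k)]
    return median(times), times

# Illustrative stand-in for a forward run of the bounding chain: a random waiting time.
fake_run = lambda rng: 1 + int(rng.expovariate(1 / 50))
t_med, times = estimate_block_length(fake_run, k=200)
print("estimated block length t_med:", t_med)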
Of course, at this point it might be possible to do much better. For instance, if there is a value of t that is only 10% greater than the median, but for which coalescence occurs 80% of the time, then using this value for the size of a block in ROCFTP will approach an average time per sample of (1.1/0.8)t_med = 1.375 t_med, rather than 2t_med.
Because coalescence often exhibits the kind of cutoff phenomenon where coalescence is unlikely for t below a critical threshold but becomes very likely past that threshold, there is usually a point somewhere past the median that will minimize the running time of the algorithm.
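One simple way to exploit this is to estimate the coalescence probability from the same trial runs T_1,...,T_k and choose the block length minimizing t divided by the empirical coalescence probability within t steps, an approximation to the expected cost per sample. A sketch follows; the times list is assumed to come from trial runs such as those in the previous sketch.

def best_block_length(times):
    # Choose the block length t minimizing t / p_hat(t), where p_hat(t) is the
    # empirical fraction of trial runs that coalesced within t steps.
    n = len(times)
    def cost(t):
        p_hat = sum(1 for T in times if T <= t) / n
        return t / p_hat
    return min(sorted(set(times)), key=cost)

# Example with a made-up spread of coalescence times showing a cutoff near 60.
sample_times = [55, 57, 58, 60, 61, 61, 62, 63, 65, 90, 120, 200]
print(best_block_length(sample_times))

In this made-up example the minimizer lies a little past the median, matching the discussion above.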