π(dx)P(X_{t+1} ∈ dy | X_t = x) = π(dy)P(X_{t+1} ∈ dx | X_t = y), which is why in Definition 1.34 such chains were labeled reversible.
Okay, so now we know how to run the chain backwards in time. Suppose in our simulation, X_5 = 0, X_4 = 1, X_3 = 2, X_2 = 1, X_1 = 0, and X_0 = 0. Now look at how the chain can be run forward in time conditioned on the path (0,0,1,2,1,0) using the updates X_t = φ(X_{t−1}, U_t).
In the first step, X_1 = 0 given X_0 = 0. That means that U_1 ≤ 2/3. So generate U_1 uniformly from [0, 2/3]. Similarly, since X_1 = 0 and X_2 = 1, U_2 > 2/3. The rest of the U_i can be bounded similarly.
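In code, imputing a U_t amounts to checking which branch of the update function could have produced the observed transition, then drawing uniformly from that branch. Here is a minimal Python sketch; the update function phi below is an assumption, written only to be consistent with the bounds just derived (the actual φ accompanied the random walk example earlier):

    import random

    # Assumed update function for the walk on {0,1,2}: u <= 2/3 attempts a step
    # down, u > 2/3 a step up (consistent with the bounds in the text).
    def phi(x, u):
        return max(x - 1, 0) if u <= 2/3 else min(x + 1, 2)

    def impute_u(x_prev, x_next):
        # Draw U_t uniformly conditioned on phi(x_prev, U_t) = x_next.
        if phi(x_prev, 0.0) == x_next:      # a "down" draw produces x_next
            return random.uniform(0, 2/3)
        if phi(x_prev, 1.0) == x_next:      # an "up" draw produces x_next
            return random.uniform(2/3, 1)
        raise ValueError("impossible transition")

    # The path (X_0,...,X_5) = (0,0,1,2,1,0) from the text:
    path = [0, 0, 1, 2, 1, 0]
    us = [impute_u(a, b) for a, b in zip(path, path[1:])]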
Note that regardless of the value of x_0, setting x_t = φ(x_{t−1}, U_t) using U_1 ≤ 2/3, U_2 > 2/3, U_3 > 2/3, U_4 ≤ 2/3, and U_5 ≤ 2/3 gives x_5 = 0 = X_5. Hence this gives an S_x block.
On the other hand, suppose (X_0, X_1, X_2, X_3, X_4, X_5) = (1, 0, 1, 0, 1, 0). Then generating the U_i gives a final state of 1 if x_0 = 1 and 0 if x_0 = 0. So that is an example of an F block.
In practice, it is not necessary to retain the entire path (X_0,...,X_t). As each pair (X_{t−1}, X_t) is generated, the value of U_t can be imputed and stored. So only the (U_1,...,U_t) need be retained.
5.2.1 Reversing a Markov chain
In the random walk on {0,1,2} example, it was easy to determine how the reversed process behaved. For instance, if X_{t+1} = X_t + 1, then in the reverse process X_t = X_{t+1} − 1, and so on. Now the general method for reversing a Markov chain is discussed.
Recall that Definition 1.27 of a Markov kernel K requires that for all x ∈ Ω, K(x,·) is a probability distribution, and that for all measurable A and a ∈ [0,1], the set of states x such that K(x,A) ≤ a is also measurable.
In other words, what the kernel K gives us is a family of probability distributions. Given the current state x, K(x,·) is the probability distribution for the next state y. (The second part of the definition is necessary for technical reasons.)
Now the notion of a reverse kernel with respect to a distribution π can be created.
Definition 5.1. Let K be a kernel from (Ω,F) to itself, and π a probability measure on (Ω,F). Then K_rev is a reverse kernel associated with π if for all bounded measurable functions f defined on Ω × Ω,

∫_{(x,y)∈Ω×Ω} f(x,y) π(dx) K(x,dy) = ∫_{(x,y)∈Ω×Ω} f(x,y) π(dy) K_rev(y,dx).
Note that in general the reverse kernel might not even exist, and when it does exist, it is not always uniquely defined!
While the definition above is presented in an abstract fashion, the notion that it is capturing is more straightforward. Suppose X_t ∼ π, and then X_{t+1} comes from taking one step in the Markov chain. Now suppose the value of X_t is lost. What is the new distribution of X_t given X_{t+1}? This is what the reverse kernel gives us.
As was seen in the previous section, for the biased simple random walk on {0,1,2}, the kernel is also the reverse kernel. Fortunately, the basic toolkit for building Markov chains with a target stationary distribution often gives us the reversed kernel alongside the original one!
Theorem 5.1. Both Gibbs samplers and Metropolis-Hastings chains are reversible.
For more complicated chains, a reverse kernel can be constructed as long as the distribution of the next state can be written using a density with respect to π.
Theorem 5.2. Suppose that for all x ∈ Ω, there exists a density p(x,·) with respect to π such that K(x,B) = ∫_B p(x,y) π(dy). Then a reverse kernel exists where

K_rev(y,dx) = p(x,y) π(dx) / ∫_{z∈Ω} p(z,y) π(dz).
The general proof of this theorem requires the use of a tool from measure theory called Fubini's Theorem, but when the density is with respect to the counting measure things are simpler. In this case π(dy) = π(dz) = 1 for all y and z, and ∫_{z∈Ω} p(z,y) π(dz) = Σ_{z∈Ω} P(X_1 = y, X_0 = z), so this is saying

P(X_0 = x | X_1 = y) = P(X_0 = x, X_1 = y) / Σ_z P(X_0 = z, X_1 = y),

just as in our earlier discrete example.
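On a finite state space, the reverse kernel of Theorem 5.2 is just a reweighting of the transposed transition matrix. The sketch below uses my reconstruction of the biased walk on {0,1,2} (step down with probability 2/3, up with probability 1/3, moves off the ends stay put), so the matrix and stationary vector should be read as assumptions rather than quotes from the earlier example:

    import numpy as np

    def reverse_kernel(K, pi):
        # K_rev[y, x] = pi[x] K[x, y] / sum_z pi[z] K[z, y]
        flux = pi[:, None] * K            # flux[x, y] = pi(x) K(x, y)
        return (flux / flux.sum(axis=0)).T

    K = np.array([[2/3, 1/3, 0.0],
                  [2/3, 0.0, 1/3],
                  [0.0, 2/3, 1/3]])
    pi = np.array([4/7, 2/7, 1/7])        # stationary: solves pi K = pi
    K_rev = reverse_kernel(K, pi)
    print(np.allclose(K_rev, K))          # True: this walk is reversible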
5.2.2 General FMMR
Now that we have a precise notion of how to reverse a Markov chain, the FMMR procedure can be written as follows. Let φ_rev denote the update function for the reverse kernel of the Markov chain. As earlier, let φ_t(x_0, U_1,...,U_t) be the state X_t conditioned on X_0 = x_0.
FMMR
Input: x ∈ Ω, t
Output: X_0 ∼ π
1)  Repeat
2)    X_t ← x
3)    For i from t − 1 to 0
4)      Draw U ← Unif([0,1])
5)      X_i ← φ_rev(X_{i+1}, U)
6)      Draw U_{i+1} uniformly from [0,1] conditioned on φ(X_i, U_{i+1}) = X_{i+1}
7)  Until φ_t(Ω,U) = {x}
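Here is a minimal runnable version of FMMR for the walk on {0,1,2}, again using the assumed update function from before; since that chain is its own reverse, φ_rev = φ:

    import random

    def phi(x, u):
        # Assumed update: u <= 2/3 steps down, u > 2/3 steps up (clamped to {0,1,2}).
        return max(x - 1, 0) if u <= 2/3 else min(x + 1, 2)

    def impute_u(x_prev, x_next):
        # Line 6: draw U uniformly conditioned on phi(x_prev, U) = x_next.
        if phi(x_prev, 0.0) == x_next:
            return random.uniform(0, 2/3)
        return random.uniform(2/3, 1)

    def fmmr(x, t, states=(0, 1, 2)):
        while True:                                # 1) Repeat
            us = [0.0] * t
            X = x                                  # 2) X_t <- x
            for i in range(t - 1, -1, -1):         # 3) for i from t-1 to 0
                X_prev = phi(X, random.random())   # 4-5) X_i <- phi_rev(X_{i+1}, U)
                us[i] = impute_u(X_prev, X)        # 6) impute U_{i+1}
                X = X_prev
            finals = set(states)                   # 7) until phi_t(Omega, U) = {x}
            for u in us:
                finals = {phi(s, u) for s in finals}
            if finals == {x}:
                return X                           # the accepted draw of X_0

Repeated calls give independent draws; for this assumed kernel they should match its stationary distribution π = (4/7, 2/7, 1/7).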
Lemma 5.2. Suppose that for x ∈ Ω and X_0 ∼ π, P(φ_t(Ω,U) = {x}) > 0. Then the output of FMMR comes from π.
Proof. Fix a state x in Ω. Suppose X_0 ∼ π. Generate the forward uniforms U = (U_1,...,U_t), and let {U ∈ A} denote the event {φ_t(Ω,U) = {x}}. Then X_0 and {U ∈ A} are independent of each other, so [X_0 | U ∈ A] ∼ π.
From AR theory, another way to generate [X_0 | U ∈ A] is to draw X_0 and generate U until {U ∈ A} occurs. When this happens, the sample path X_t = φ(X_{t−1}, U_t) will end at X_t = x. Therefore, by only working with the draws of U where the sample paths end at X_t = x, the last accepted draw of U remains from the correct distribution.
The memory requirement here is two states of the configuration and recording all of the U_1,...,U_t for use in the until condition. Therefore, FMMR is a read twice, interruptible perfect simulation algorithm.
5.2.3 Example: FMMR for the Ising model
To see this approach on a nontrivial problem, consider perfect simulation from the Ising model of Section 1.6.1 using the Ising_Gibbs algorithm from Section 1.8.1. Since this is a Gibbs sampler style chain, the same kernel works for both the forward process and the reverse process.
Let X_{t+1} be φ(X_t, V_{t+1}, U_{t+1}), whose value is the output of the function Ising_Gibbs_update_function from Section 5.1.1. Suppose the states x_t and x_{t+1} are known; how can the values of V_{t+1} and U_{t+1} for the forward step be simulated?
First consider the case that X_t ≠ X_{t+1}. Since φ can only change the value at a single node, there exists a unique v such that X_t(v) ≠ X_{t+1}(v). So V_{t+1} = v.
As in Ising_Gibbs_update_function, let n_1 denote the number of neighbors of v labeled 1 in X_t, and n_{−1} the number of neighbors labeled −1. Set r = exp(βn_1)/[exp(βn_1) + exp(βn_{−1})]. Then if X_{t+1}(v) = 1, U_{t+1} is uniform over [0,r], and if X_{t+1}(v) = −1, U_{t+1} is uniform over (r,1].
Now suppose that X_i = X_{i+1}. Then the goal is to choose (V_{i+1}, U_{i+1}) uniformly from the set of (v,U) such that φ(X_i, v, U) = X_{i+1}. But that was already done! The (v,U) that resulted in moving from X_{i+1} to X_i = X_{i+1} in the reverse step was exactly such a draw. Hence no additional work is needed.
This gives the following method.
FMMR_Ising_Gibbs
Input: x, t
Output: X ∼ π
1)  X ← x
2)  Repeat
3)    For i from t − 1 to 0
4)      Draw (V_{i+1}, U_{i+1}) ← Unif(V × [0,1])
5)      c ← Ising_Gibbs_update_function(X, V_{i+1}, U_{i+1})
6)      For d ∈ {−1,1}, let n_d be the # of neighbors of V_{i+1} labeled d in X
7)      r ← exp(βn_1)/[exp(βn_1) + exp(βn_{−1})]
8)      If X(V_{i+1}) = 1 and c = −1, then U_{i+1} ← Unif([0,r])
9)        X(V_{i+1}) ← c
10)     Else if X(V_{i+1}) = −1 and c = 1, then U_{i+1} ← Unif((r,1])
11)       X(V_{i+1}) ← c
12)   x_min ← (−1,...,−1), x_max ← (1,...,1)
13)   For i from 0 to t − 1
14)     x_min(V_{i+1}) ← Ising_Gibbs_update_function(x_min, V_{i+1}, U_{i+1})
15)     x_max(V_{i+1}) ← Ising_Gibbs_update_function(x_max, V_{i+1}, U_{i+1})
16) Until x_max = x_min
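The following Python sketch shows the whole procedure under stated assumptions: spins ±1 on a cycle of ten nodes, a function gibbs_update standing in for Ising_Gibbs_update_function, and (unlike line 1 above) each attempt restarting from x, as in the generic FMMR:

    import math, random

    m, beta = 10, 0.3
    V = list(range(m))
    nbrs = {v: [(v - 1) % m, (v + 1) % m] for v in V}   # cycle graph

    def gibbs_update(x, v, u):
        # Returns the new +1/-1 label of node v given the uniform u.
        n1 = sum(1 for w in nbrs[v] if x[w] == 1)
        r = math.exp(beta * n1) / (math.exp(beta * n1)
            + math.exp(beta * (len(nbrs[v]) - n1)))
        return 1 if u <= r else -1

    def fmmr_ising(x, t):
        while True:
            X = list(x)                                  # restart attempt from x
            vs, us = [0] * (t + 1), [0.0] * (t + 1)
            for i in range(t - 1, -1, -1):               # backward pass (lines 3-11)
                v, u = random.choice(V), random.random()
                vs[i + 1], us[i + 1] = v, u
                c = gibbs_update(X, v, u)
                if X[v] != c:                            # state changed: re-impute U_{i+1}
                    n1 = sum(1 for w in nbrs[v] if X[w] == 1)
                    r = math.exp(beta * n1) / (math.exp(beta * n1)
                        + math.exp(beta * (len(nbrs[v]) - n1)))
                    us[i + 1] = random.uniform(0, r) if X[v] == 1 else random.uniform(r, 1)
                    X[v] = c                             # X now holds X_i
            x_min, x_max = [-1] * m, [1] * m             # forward pass (lines 12-15)
            for i in range(t):
                x_min[vs[i + 1]] = gibbs_update(x_min, vs[i + 1], us[i + 1])
                x_max[vs[i + 1]] = gibbs_update(x_max, vs[i + 1], us[i + 1])
            if x_min == x_max:                           # line 16: coalescence detected
                return X                                 # X = X_0 ~ pi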
5.2.4 Doubling time
Note that in the proof of Lemma 5.2, the size of t is not important: the distribution of X_0 conditioned on coalescence detected in the block is π.
That means that it is perfectly legal to double the value of t used at each step of the process, in the same way as in the original CFTP. In this way it is not necessary to worry in advance about the value of t at which backwards coalescence is likely to occur.
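In code, doubling is a thin wrapper around any single-attempt version of FMMR. The helper fmmr_attempt below is a hypothetical interface (a one-pass variant of the routines sketched earlier) that returns the draw of X_0 on coalescence and None otherwise:

    def fmmr_doubling(fmmr_attempt, x, t=1):
        # Double t until some backward block coalesces; each attempt uses fresh
        # randomness, so by Lemma 5.2 the accepted X_0 still comes from pi.
        while True:
            x0 = fmmr_attempt(x, t)
            if x0 is not None:
                return x0
            t *= 2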
5.3 Variable chains
In a bounding chain, a subset of labellings was updated at each step. A more general approach is to keep track of the state as a function of the initial values of nodes. This type of chain is called a variable chain, and can be used to understand how changes in the initial state propagate through the system.
In a variable chain, each node is labeled not with a set of labellings, but rather with a function of the variables x_1,...,x_{#V}. Initially, node v is given the label x_v. As the chain evolves, each node will always have a label that is a function of (x_1,...,x_{#V}). The variables that appear in a node's label indicate where changes in the initial state would have changed the current value of that node.
5.3.1 Antivoter model
Variable chains were introduced in [61] in order to generate samples from the antivoter model stationary distribution. The antivoter model is a sequence of states that gives voter preferences that change through time based on the preferences of the local neighbors.
The state space for a graph G = (V,E) is Ω = {0,1}^V, so that each node is labeled either 0 or 1. The Markov chain then chooses a node v uniformly at random, and then a neighbor w of v uniformly from the neighbors of v. Finally, x(v) is set to 1 − x(w).
In the voter model, after v and then w are chosen, x(v) is set equal to x(w). On a finite graph, the voter model eventually reaches the all 0's state or the all 1's state and stays there.
For the antivoter model, the behavior is more interesting. When the graph G is regular with degree r ≥ 3 and is not bipartite, Donnelly and Welsh [30] showed that this Markov chain has a unique stationary distribution that is uniform over every state except the all 0's and all 1's states.
In the variable chain, each node v begins with the label x_VC(v) = x_v. When a step in the chain chooses v and then w to modify, the label on v becomes 1 − x_VC(w). Suppose that edge (3,4) in the chain was chosen, with v = 4 and w = 3. Then x_VC(4) is set to 1 − x_3. Note that at this point, no node has an x_4 in its label. Since the only way a node can get a variable in its label is if it is next to a label with that variable, there is no longer any way for a node to be labeled x_4. That variable is gone forever from the state of the variable chain.
The update then looks as follows.
Antivoter model variable chain
Input: old state x_VC
Output: new state x_VC
1)  Draw v uniformly from V
2)  Draw w uniformly from the neighbors of v
3)  Set x_VC(v) to be 1 − x_VC(w)
It is easy to show that eventually, only one variable remains.
Fact 5.1. Consider a connected graph G that is regular of degree r ≥ 3, and has a minimum cut of size c ≥ 1. Then starting from the state with x_VC(v) = x_v for all v ∈ V, after t steps the chance that more than one variable remains in the node labels is at most

exp(−ct[(2/π²)(n−1)^{−2} + (1/(6π))(n−1)^{−4}](#E)^{−1}),

where n = #V.
Proof. Let n(z) = #{v : x_VC(v) ∈ {x_z, 1 − x_z}} be the number of nodes that are labeled either x_z or 1 − x_z. (At time 0, n(z) = 1 for all z ∈ V.) Let W_t = max_z n(z) after t steps, v_max = arg max_z n(z), and V_t = {v : x_VC(v) ∈ {x_{v_max}, 1 − x_{v_max}}}. Suppose there are k edges connecting nodes in V_t to nodes in V \ V_t. Then the chance of choosing one of these edges at the first step is 2k(1/#V)(1/r). If v ∉ V_t and w ∈ V_t, then W_t increases by 1. If v ∈ V_t and w ∉ V_t, then W_t decreases by at most 1. (It could be less: for instance, if W_t = 4 using x_7 or x_6, and an edge removed an x_7 instance, then x_6 takes over as the most common label and W_{t+1} is still 4.)
Let W̄_t = n − W_t. Then W̄_t = 0 when W_t = n, that is, when only one variable remains. Use the same potential function as in the proof of Lemma 4.2: φ(i) = sin(Ci)/sin(C), with C = π/(2(n−1)). Then as in (4.10),
E[φ(W̄_t) | W̄_{t−1}] ≤ [k/(2#E)]φ(W̄_{t−1} − 1) + [1 − k/#E]φ(W̄_{t−1}) + [k/(2#E)]φ(W̄_{t−1} + 1)
  = φ(W̄_{t−1}) + k(2#E)^{−1}Δ²φ(W̄_{t−1} − 1)
  = φ(W̄_{t−1}) + k(2#E)^{−1}φ(W̄_{t−1})(2cos(C) − 2)
  ≤ φ(W̄_{t−1})γ,

where γ = 1 − c(1 − cos(C))(#E)^{−1}. (Since c is the size of the minimum cut, k ≥ c.)
Now a simple induction gives E[φ(W̄_t)] ≤ φ(W̄_0)γ^t ≤ W̄_0γ^t (using sin(Ci) ≤ i·sin(C)), and W̄_0 = n − 1.
When W̄_t > 0 it holds that φ(W̄_t) ≥ 1, so by Markov's inequality P(W̄_t > 0) = P(φ(W̄_t) ≥ 1) ≤ E[φ(W̄_t)]. Recall that 1 − x ≤ exp(−x) for all real x. Hence γ^t ≤ exp(−ct(1 − cos(π/(2(n−1))))(#E)^{−1}). Using 1 − cos(x) ≥ x²/2 − x⁴/24 completes the proof.
Once the variable chain has been reduced to a configuration where every node is labeled x_z or 1 − x_z, there are only two possible states: the one where x_z = 0 and the one where x_z = 1. Since the stationary distribution assigns equal weight to all configurations, simply decide uniformly which of these two states to pick to complete the perfect simulation algorithm.
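Putting the pieces together gives a short program. This is a minimal sketch on the complete graph K_5, which is regular of degree 4 and not bipartite, so the Donnelly and Welsh result applies; storing a label x_z or 1 − x_z as a pair (z, s) is an implementation choice, not something from the original:

    import random

    n = 5
    V = range(n)
    nbrs = {v: [w for w in V if w != v] for v in V}    # complete graph K_5

    def antivoter_perfect_sample():
        # Variable chain: node v carries label x_z (when s = 0) or 1 - x_z (when s = 1).
        label = {v: (v, 0) for v in V}                 # initially node v has label x_v
        while len({z for (z, s) in label.values()}) > 1:
            v = random.choice(V)                       # one variable chain step
            w = random.choice(nbrs[v])
            z, s = label[w]
            label[v] = (z, 1 - s)                      # x_VC(v) <- 1 - x_VC(w)
        b = random.randint(0, 1)                       # the two states are equally likely
        return {v: b ^ s for v, (z, s) in label.items()}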
With the antivoter model, the problem was that the basic bounding chain did not have any traction to get started. In order to learn about the value of a node, at least one node already needed to be known. The variable chain sidestepped that problem.