The normalizing constant to make these into probabilities is 1 + exp(2β). Therefore,

P(X(a) = 1 | X(b) = X(c) = X(d) = 1) = exp(2β) / [1 + exp(2β)],
P(X(a) = −1 | X(b) = X(c) = X(d) = 1) = 1 / [1 + exp(2β)].
Where things get interesting is the next step. What should the probability be that X(b) = 1, or X(b) = −1? Well, the probability of unfreezing node b to label 1 or −1 should depend on what happened at the choice of a. That is, this can be thought of as a two-stage experiment: first unfreeze a, and then try to unfreeze b. First note that conditioned on X(c) = X(d) = 1, the probabilities for, say, (X(a), X(b)) = (1,1) only depend on the edges leaving the unfrozen nodes. Let A be the event that X(c) = X(d) = 1 and that the label for a was chosen using X(b) = 1. Then the goal is to have

P((X(a), X(b)) = (1,1) | A) ∝ exp(3β),    P((X(a), X(b)) = (1,−1) | A) ∝ exp(β),
P((X(a), X(b)) = (−1,1) | A) ∝ exp(β),    P((X(a), X(b)) = (−1,−1) | A) ∝ exp(β),

because this would be a draw from π(·|A).
Break down the first expression:

P((X(a), X(b)) = (1,1) | A) = P(X(a) = 1 | A, X(b) = 1) · P(X(b) = 1 | A, X(a) = 1)
                            = [exp(2β)/Z_1] · P(X(b) = 1 | A, X(a) = 1).

Here Z_1 is the normalizing constant for the choice of X(a) (note Z_1 = exp(2β) + 1 in this example). For the right-hand side to be proportional to exp(3β), it must be true that P(X(b) = 1 | A, X(a) = 1) ∝ exp(β).
Repeating this calculation for the other three cases gives the following four expressions.

P(X(b) = 1 | A, X(a) = 1) ∝ exp(β)
P(X(b) = −1 | A, X(a) = 1) ∝ exp(−β)
P(X(b) = 1 | A, X(a) = −1) ∝ exp(β)
P(X(b) = −1 | A, X(a) = −1) ∝ exp(β)
Here is the subtle point: to ensure that (X(a), X(b)) ∼ π(·|A), it is necessary that the same normalizing constant be applied to P(X(b) = i | X(a) = j) for all i and j. In this case, set Z_2 = exp(β) + exp(β) = 2 exp(β), so, for instance, P(X(b) = 1 | A, X(a) = 1) = exp(β)/[2 exp(β)] and

P((X(a), X(b)) = (1,1) | A) = [exp(2β)/Z_1] · [exp(β)/Z_2] = exp(3β)/(Z_1 Z_2),

and the same overall normalizing constant of Z_1 Z_2 holds for the other three cases.
Now, using Z_2 when conditioning on both X(a) = 1 and X(a) = −1 means that P(X(b) = 1 | A, X(a) = −1) + P(X(b) = −1 | A, X(a) = −1) = 1, but P(X(b) = 1 | A, X(a) = 1) + P(X(b) = −1 | A, X(a) = 1) < 1 for β > 0. What should be done with the rest of the probability?
In this case, there is no valid unfrozen choice of X(b) to make, so the node b
should remain frozen at 1. But this comes at a cost: the only way that b could have
remained frozen is if X(a)=1. So by making this choice of keeping b frozen, a must
be refrozen at the value of 1.
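It is worth checking numerically that this two-stage scheme, together with the refreezing rule, really does leave the accepted pairs distributed as π(·|X(c) = X(d) = 1). The short Python sketch below is not from the text; it assumes (consistent with the weights above) that a and b are adjacent and that each of them has exactly one other neighbor, frozen at label 1, and it compares the accepted-pair probabilities against the target weights exp(3β), exp(β), exp(β), exp(β).

```python
from math import exp
from itertools import product

beta = 0.7                      # any beta > 0 illustrates the point

Z1 = exp(2 * beta) + 1          # normalizer for unfreezing a (b, c, d all frozen at 1)
Z2 = 2 * exp(beta)              # common normalizer used when unfreezing b

def stage_two(a_label):
    """Return (P(b=1), P(b=-1)) given the label already chosen for a."""
    n1 = 1 if a_label == 1 else 0          # unfrozen neighbor a of b
    n_minus = 1 - n1
    p_plus = exp(beta) / Z2                # proportional to exp(beta); f_1 = 1 for b
    p_minus = exp((n_minus - n1) * beta) / Z2
    return p_plus, p_minus                 # leftover 1 - p_plus - p_minus is rejection

# Probability of ending with both a and b unfrozen at (a_label, b_label).
joint = {}
for a_label, pa in ((1, exp(2 * beta) / Z1), (-1, 1 / Z1)):
    p_plus, p_minus = stage_two(a_label)
    joint[(a_label, 1)] = pa * p_plus
    joint[(a_label, -1)] = pa * p_minus

# Target pi(a, b | c = d = 1): weight exp(beta * #agreeing edges) over the three
# edges leaving the unfrozen nodes: {a,b}, a's frozen 1-neighbor, b's frozen 1-neighbor.
def weight(a_label, b_label):
    agree = (a_label == b_label) + (a_label == 1) + (b_label == 1)
    return exp(beta * agree)

total = sum(weight(a, b) for a, b in product((1, -1), repeat=2))
accepted = sum(joint.values())
for a, b in product((1, -1), repeat=2):
    print((a, b), joint[(a, b)] / accepted, weight(a, b) / total)
```

The two printed columns agree: conditioned on not refreezing, the unfrozen pair has exactly the target conditional distribution, and the leftover probability mass is precisely the refreezing event.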
So just as in the SSST for the random walk from the previous section, the RR al-
gorithm for the Ising model attempts to advance by unfreezing a node, but sometimes
must retreat by refreezing neighboring nodes.
This refreezing of neighboring nodes is bad, but it is not as bad as the general
acceptance/rejection approach, which throws away entirely the old sample whenever
rejection occurs. Here, some remnant of the sample can be kept, as only the neighbors
of the node are frozen. This process of keeping some of the sample is called recycling,
and this is the key to making this approach linear time for nontrivial values of β.
This approach can be broken down into the following seven pieces.
1. There is the set of configurations from which we are attempting to sample.
2. There is a set of frozen nodes whose label is fixed.
3. Conditioned on the labels of the frozen nodes, there exists a distribution on the
unfrozen nodes.
4. There is a target distribution over the set of configurations.
5. When all the nodes are unfrozen, the distribution on unfrozen nodes matches the
target distribution.
6. When all the nodes are frozen, it is easy to sample.
7. There is a method of moving from the current state and current set of frozen nodes
to a new state and set of frozen nodes such that the correct distribution on unfrozen
nodes is maintained.
Let x′ denote the set of frozen nodes in a graph, and consider the general problem of how a frozen node v ∈ x′ should be attempted to be made unfrozen. For c ∈ {−1,1}, let f_c denote the number of frozen neighbors of v that have color c, and n_c the number of unfrozen neighbors of v that have color c.
Suppose for concreteness that v is frozen at label 1. Then the factor for the weight from edges adjacent to v whose other endpoint is not frozen is exp(n_1 β). When v is unfrozen, if v is unfrozen at 1, then the weight should gain an extra factor of exp(f_1 β), since each of those f_1 edges would now run from an unfrozen node to a frozen node with the same label, and so count towards the weight.

If v is unfrozen at −1, then the weight should gain an extra factor of exp([f_{−1} + n_{−1}]β), but also a factor of exp(−n_1 β), since those n_1 edges no longer contribute to the weight. Hence
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) ∝ exp(f_1 β)
P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) ∝ exp((f_{−1} + n_{−1} − n_1)β).
To find the constant of proportionality, it is necessary to maximize exp(f_1 β) + exp((f_{−1} + n_{−1} − n_1)β) over choices of n_1 and n_{−1}. For β > 0, this is maximized when n_1 = 0, so that n_{−1} = deg(v) − f_1 − f_{−1} and the second term becomes exp((deg(v) − f_1)β). Therefore, for node v having deg(v) neighbors, the proposal kernel is
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp(f_1 β) / [exp(f_1 β) + exp((deg(v) − f_1)β)]
P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp((f_{−1} + n_{−1} − n_1)β) / [exp(f_1 β) + exp((deg(v) − f_1)β)].
If the node v is being unfrozen from −1, a similar calculation gives a proposal kernel of

P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp(f_{−1} β) / [exp(f_{−1} β) + exp((deg(v) − f_{−1})β)]
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp((f_1 + n_1 − n_{−1})β) / [exp(f_{−1} β) + exp((deg(v) − f_{−1})β)].
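Both displays can be summarized by a single rule in terms of the frozen label a of v: keep label a with probability proportional to exp(f_a β), flip to −a with probability proportional to exp((f_{−a} + n_{−a} − n_a)β), normalize by exp(f_a β) + exp((deg(v) − f_a)β), and treat the leftover mass as recycling. A minimal Python helper along these lines is sketched below; the function name and signature are illustrative, not from the text.

```python
from math import exp

def proposal(a, f, n, deg, beta):
    """Proposal kernel for a node v currently frozen at label a.

    f[c]: number of frozen neighbors of v with label c
    n[c]: number of unfrozen neighbors of v with label c
    deg:  degree of v
    Returns (P(keep a), P(flip to -a), P(recycle)).
    """
    Z = exp(f[a] * beta) + exp((deg - f[a]) * beta)
    p_keep = exp(f[a] * beta) / Z
    p_flip = exp((f[-a] + n[-a] - n[a]) * beta) / Z
    return p_keep, p_flip, 1.0 - p_keep - p_flip

# Example: v frozen at 1, two frozen +1 neighbors, one unfrozen +1 neighbor.
print(proposal(1, {1: 2, -1: 0}, {1: 1, -1: 0}, deg=3, beta=0.5))
```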
This gives the following overall algorithm.
RR for Ising
1)   x ← (1,1,...,1), x′ ← V
2)   Repeat
3)       Let v be any node in x′, a ← x(v)
4)       For c ∈ {−1,1}
5)           f_c ← #{w ∈ x′ : {v,w} ∈ E, x(w) = c}
6)           n_c ← #{w ∉ x′ : {v,w} ∈ E, x(w) = c}
7)       Z ← exp(f_a β) + exp((deg(v) − f_a)β)
8)       Draw C from {−1, 1, r} where P(C = a) = exp(f_a β)Z^{−1},
             P(C = −a) = exp((f_{−a} + n_{−a} − n_a)β)Z^{−1}, P(C = r) = 1 − P(C ∈ {−1,1})
9)       If C ∈ {−1,1}
10)          x(v) ← C, x′ ← x′ \ {v}
11)      Else
12)          x′ ← x′ ∪ {w : {v,w} ∈ E}
13)  Until x′ = ∅
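For concreteness, here is one possible Python transcription of this pseudocode. It is a sketch rather than a reference implementation: the graph is assumed to be given as an adjacency list (a dict from node to list of neighbors), the weights follow the agreement form used above (β ≥ 0), and the frozen node v of line 3) is just an arbitrary element of the frozen set.

```python
import random
from math import exp

def rr_ising(adj, beta, rng=random):
    """Randomness recycler for the Ising model on the graph given by adj."""
    x = {v: 1 for v in adj}                 # 1) all labels start at 1 ...
    frozen = set(adj)                       #    ... with every node frozen
    while frozen:                           # 2) / 13) repeat until nothing is frozen
        v = next(iter(frozen))              # 3) any frozen node
        a = x[v]
        f = {1: 0, -1: 0}                   # 4)-6) frozen / unfrozen neighbor counts
        n = {1: 0, -1: 0}
        for w in adj[v]:
            (f if w in frozen else n)[x[w]] += 1
        deg = len(adj[v])
        Z = exp(f[a] * beta) + exp((deg - f[a]) * beta)       # 7)
        p_keep = exp(f[a] * beta) / Z                          # 8) P(C = a)
        p_flip = exp((f[-a] + n[-a] - n[a]) * beta) / Z        #    P(C = -a)
        u = rng.random()
        if u < p_keep + p_flip:             # 9)-10) unfreeze v at the chosen label
            x[v] = a if u < p_keep else -a
            frozen.discard(v)
        else:                               # 11)-12) recycle: refreeze v's neighbors
            frozen.update(adj[v])
    return x

# Example: the four-cycle a-b-c-d-a.
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
print(rr_ising(adj, beta=0.3))
```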
The proof of the algorithm’s correctness is deferred to the next section when the
RR approach will be shown to be generally valid.
Note that when β is small, P(C = 1) and P(C = −1) will both be close to 1/2, and so the probability of recycling (P(C = r)) will be very small. That makes it likely that a node will be unfrozen, and that the algorithm will progress quickly towards the final state.
With this in hand, it is possible for small β to show that the running time of RR for Ising is linear in the size of the graph.
Lemma 8.4. Let

δ = 1 − (Δ + 1) · [exp(βΔ) − exp(−βΔ)] / [exp(βΔ) + 1].

If δ > 0, then the expected number of steps used by RR for Ising is bounded above by #V/δ.
Proof. Let Y_t denote the number of frozen nodes at step t. When Y_t = 0, the algorithm terminates. Each step of RR can be viewed as beginning by reducing the number of frozen nodes by 1 by unfreezing node v.

Consider

p = P(C = r) = [exp((f_{−a} + n_{−a} + n_a)β) − exp((f_{−a} + n_{−a} − n_a)β)] / [exp(f_a β) + exp((f_{−a} + n_{−a} + n_a)β)].   (8.4)
Fix the value of f_a and f_{−a}, and let k = n_a + n_{−a} be fixed as well. Then

p = [exp((f_{−a} + k)β) − exp((f_{−a} + k − 2n_a)β)] / [exp(f_a β) + exp((f_{−a} + k)β)],   (8.5)

which is increasing in n_a. So to make p as large as possible, set n_a = k and n_{−a} = 0.
Hence

p ≤ [exp((f_{−a} + k)β) − exp((f_{−a} − k)β)] / [exp(f_a β) + exp((f_{−a} + k)β)].   (8.6)
Next, fix ℓ = f_{−a} + k. Then p ≤ [exp(ℓβ) − exp((ℓ − 2k)β)] / [exp(f_a β) + exp(ℓβ)]. This expression is increasing in k, and so is maximized when k = ℓ and f_{−a} = 0. Together with f_a + ℓ = deg(v), this gives
p ≤ [exp(ℓβ) − exp(−ℓβ)] / [exp((deg(v) − ℓ)β) + exp(ℓβ)].   (8.7)
Differentiating the right-hand side with respect to ℓ and simplifying gives

β[2 exp(deg(v)β) + 2] / [exp((deg(v) − ℓ)β) + exp(ℓβ)]^2,   (8.8)

which is always positive, hence the bound is maximized when ℓ is as large as possible, namely, ℓ = deg(v). Therefore
p ≤ [exp(deg(v)β) − exp(−deg(v)β)] / [1 + exp(deg(v)β)].   (8.9)
Differentiating again shows this right-hand side is increasing in deg(v). If Δ is the maximum degree in the graph, then p ≤ [exp(Δβ) − exp(−Δβ)] / [1 + exp(Δβ)]. A rejected step refreezes at most Δ + 1 nodes (the node v together with its neighbors), so the expected decrease in the number of frozen nodes at each step is at least 1 − (Δ + 1)p ≥ 1 − (Δ + 1)[exp(βΔ) − exp(−βΔ)] / [1 + exp(βΔ)] = δ. Lemma 8.2 then finishes the proof.
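To get a feel for which β the lemma covers, δ can be evaluated numerically. The sketch below (not from the text) tabulates δ for a few values of β with maximum degree Δ = 4, as on the two-dimensional square lattice; δ changes sign as β grows, and the lemma only gives a bound while δ remains positive.

```python
from math import exp

def delta(beta, max_deg):
    """The quantity delta from Lemma 8.4."""
    ratio = (exp(beta * max_deg) - exp(-beta * max_deg)) / (exp(beta * max_deg) + 1)
    return 1 - (max_deg + 1) * ratio

for beta in (0.02, 0.04, 0.06):
    print(beta, delta(beta, max_deg=4))   # expected steps <= #V / delta when positive
```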
8.3 The general randomness recycler
The general framework for the randomness recycler employs the following ingredi-
ents. These are listed in the same order as the pieces of RR for the Ising model from
the previous section.
1. A measurable space (Ω, F) for configurations, called the (primary) state space.
2. A measurable space (Ω′, F′) that plays an indexing role, called the dual state space.
3. An index kernel K_I from (Ω′, F′) to (Ω, F).
4. A probability measure π on (Ω, F), which is the target distribution.
5. A special state in the dual space x′_π such that K_I(x′_π, A) = π(A) for all measurable A.
6. A starting index state x′_0 such that it is possible to draw variates from K_I(x′_0, ·).
7. A bivariate kernel K from Ω′ × Ω to itself and a kernel K′ from Ω′ to itself that, together with K_I, satisfy the design property, defined as follows.
Definition 8.2. Say that K, K′, and K_I have the design property if

∫_{x∈Ω} K_I(x′, dx) K((x′, x), (dy′, dy)) = K′(x′, dy′) K_I(y′, dy).   (8.10)
In other words, given the index X′_t and [X_t | X′_t] ∼ K_I(X′_t, ·), after one step to (X′_{t+1}, X_{t+1}), it should be true that [X_{t+1} | X′_{t+1}] ∼ K_I(X′_{t+1}, ·). The design property is somewhat analogous to the use of reversibility in designing Markov chains. Once the design property holds, the basic RR algorithm falls into place.
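In outline, that basic algorithm is: draw the initial state from K_I(x′_0, ·), repeatedly apply the bivariate kernel K, and stop the first time the dual state equals x′_π. The generic driver below is only a sketch (the function and argument names are hypothetical, not from the text); all of the problem-specific work is hidden in draw_from_index, a draw from K_I(x′, ·), and step, a draw from K((x′, x), ·).

```python
def randomness_recycler(x0_dual, draw_from_index, step, is_target_index, max_steps=None):
    """Generic RR driver: iterate the bivariate kernel until the dual state hits x'_pi.

    If the kernels satisfy the design property, the returned primary state is an
    exact draw from the target distribution pi (see the RR Theorem below).
    """
    xd = x0_dual
    x = draw_from_index(xd)       # sample from K_I(x'_0, .)
    t = 0
    while not is_target_index(xd):
        xd, x = step(xd, x)       # one application of the bivariate kernel K
        t += 1
        if max_steps is not None and t >= max_steps:
            raise RuntimeError("gave up after max_steps iterations")
    return x, t
```

For RR for Ising, the dual state records which nodes are frozen (along with their labels), step is one pass through lines 3) to 12) of the pseudocode, and x′_π is the state in which no nodes remain frozen.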
Theorem 8.1 (RR Theorem). Suppose RR is run using kernels that satisfy the design property. Let T be the first time that X′_t = x′_π. Then T and X_T are independent, and

[X_T | T < ∞] ∼ π.   (8.11)
The following lemma will be useful in proving the theorem.
Lemma 8.5. After t steps, given the index state x′_t, the distribution of the current state comes from K_I(x′_t, ·), even when conditioned on the history of the index process. That is to say,

[X_t | X′_0 = x′_0, ..., X′_t = x′_t] ∼ K_I(x′_t, ·).   (8.12)
Proof. By induction: the base case is just the definition of K_I. Suppose (8.12) holds after t steps, and consider step t + 1.

Let A be any measurable set, and let H_t be the event {X′_0 = x′_0, ..., X′_t = x′_t}. Then

P(X_{t+1} ∈ A | H_t, X′_{t+1} = x′_{t+1})
  = P(X_{t+1} ∈ A, X′_{t+1} ∈ dx′_{t+1} | H_t) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = ∫_{x∈Ω} P(X_t ∈ dx, X_{t+1} ∈ A, X′_{t+1} ∈ dx′_{t+1} | H_t) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = ∫_{x∈Ω} K_I(x′_t, dx) K((x′_t, x), (dx′_{t+1}, A)) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = K′(x′_t, dx′_{t+1}) K_I(x′_{t+1}, A) / K′(x′_t, dx′_{t+1})
  = K_I(x′_{t+1}, A).

The design property was used to rewrite the numerator in the fourth line, and (applied with A = Ω) to identify the denominator P(X′_{t+1} ∈ dx′_{t+1} | H_t) as K′(x′_t, dx′_{t+1}).