The normalizing constant to make these into probabilities is 1 + exp(2β). Therefore,

P(X(a) = 1 | X(b) = X(c) = X(d) = 1) = exp(2β) / [1 + exp(2β)],
P(X(a) = −1 | X(b) = X(c) = X(d) = 1) = 1 / [1 + exp(2β)].
Where things get interesting is the next step. What should the probability be that X(b) = 1, or X(b) = −1? Well, the probability of unfreezing node b to label 1 or −1 should depend on what happened at the choice of a. That is, this can be thought of as a two-stage experiment: first unfreeze a, and then try to unfreeze b. First note that conditioned on X(c) = X(d) = 1, the probabilities for, say, (X(a), X(b)) = (1,1) only depend on the edges leaving the unfrozen nodes. Let A be the event that X(c) = X(d) = 1 and that the label for a was chosen using X(b) = 1. Then the goal is to have

P((X(a), X(b)) = (1,1) | A) ∝ exp(3β),    P((X(a), X(b)) = (1,−1) | A) ∝ exp(β),
P((X(a), X(b)) = (−1,1) | A) ∝ exp(β),    P((X(a), X(b)) = (−1,−1) | A) ∝ exp(β),

because this would be a draw from π(·|A).
Break down the first expression:

P((X(a), X(b)) = (1,1) | A) = P(X(a) = 1 | A, X(b) = 1) · P(X(b) = 1 | A, X(a) = 1)
                            = [exp(2β)/Z_1] · P(X(b) = 1 | A, X(a) = 1).

Here Z_1 is the normalizing constant for the choice of X(a) (note Z_1 = exp(2β) + 1 in this example). For the right-hand side to be proportional to exp(3β), it must be true that P(X(b) = 1 | A, X(a) = 1) ∝ exp(β).
Repeating this calculation for the other three cases gives the following four expressions.

P(X(b) = 1 | A, X(a) = 1) ∝ exp(β)
P(X(b) = −1 | A, X(a) = 1) ∝ exp(−β)
P(X(b) = 1 | A, X(a) = −1) ∝ exp(β)
P(X(b) = −1 | A, X(a) = −1) ∝ exp(β)
Here is the subtle point: to ensure that (X(a), X(b)) ∼ π(·|A), it is necessary that the same normalizing constant be applied to P(X(b) = i | X(a) = j) for all i and j. In this case, set Z_2 = exp(β) + exp(β) = 2 exp(β), so, for instance, P(X(b) = 1 | A, X(a) = 1) = exp(β)/[2 exp(β)] and

P((X(a), X(b)) = (1,1) | A) = [exp(2β)/Z_1] · [exp(β)/Z_2] = exp(3β)/(Z_1 Z_2),

and the same overall normalizing constant of Z_1 Z_2 holds for the other three cases.
Now, using Z_2 when conditioning on both X(a) = 1 and X(a) = −1 means that P(X(b) = 1 | A, X(a) = −1) + P(X(b) = −1 | A, X(a) = −1) = 1, but P(X(b) = 1 | A, X(a) = 1) + P(X(b) = −1 | A, X(a) = 1) < 1 for β > 0. What should be done with the rest of the probability?
In this case, there is no valid unfrozen choice of X(b) to make, so the node b
should remain frozen at 1. But this comes at a cost: the only way that b could have
remained frozen is if X(a)=1. So by making this choice of keeping b frozen, a must
be refrozen at the value of 1.
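It is worth checking numerically that this two-stage scheme, together with the refreezing rule, really does leave the accepted pairs distributed as π(·|X(c) = X(d) = 1). The short Python sketch below is not from the text; it assumes (consistent with the weights above) that a and b are adjacent and that each of them has exactly one other neighbor, frozen at label 1, and it compares the accepted-pair probabilities against the target weights exp(3β), exp(β), exp(β), exp(β).

```python
from math import exp
from itertools import product

beta = 0.7                      # any beta > 0 illustrates the point

Z1 = exp(2 * beta) + 1          # normalizer for unfreezing a (b, c, d all frozen at 1)
Z2 = 2 * exp(beta)              # common normalizer used when unfreezing b

def stage_two(a_label):
    """Return (P(b=1), P(b=-1)) given the label already chosen for a."""
    n1 = 1 if a_label == 1 else 0          # unfrozen neighbor a of b
    n_minus = 1 - n1
    p_plus = exp(beta) / Z2                # proportional to exp(beta); f_1 = 1 for b
    p_minus = exp((n_minus - n1) * beta) / Z2
    return p_plus, p_minus                 # leftover 1 - p_plus - p_minus is rejection

# Probability of ending with both a and b unfrozen at (a_label, b_label).
joint = {}
for a_label, pa in ((1, exp(2 * beta) / Z1), (-1, 1 / Z1)):
    p_plus, p_minus = stage_two(a_label)
    joint[(a_label, 1)] = pa * p_plus
    joint[(a_label, -1)] = pa * p_minus

# Target pi(a, b | c = d = 1): weight exp(beta * #agreeing edges) over the three
# edges leaving the unfrozen nodes: {a,b}, a's frozen 1-neighbor, b's frozen 1-neighbor.
def weight(a_label, b_label):
    agree = (a_label == b_label) + (a_label == 1) + (b_label == 1)
    return exp(beta * agree)

total = sum(weight(a, b) for a, b in product((1, -1), repeat=2))
accepted = sum(joint.values())
for a, b in product((1, -1), repeat=2):
    print((a, b), joint[(a, b)] / accepted, weight(a, b) / total)
```

The two printed columns agree: conditioned on not refreezing, the unfrozen pair has exactly the target conditional distribution, and the leftover probability mass is precisely the refreezing event.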
So just as in the SSST for the random walk from the previous section, the RR al-
gorithm for the Ising model attempts to advance by unfreezing a node, but sometimes
must retreat by refreezing neighboring nodes.
This refreezing of neighboring nodes is bad, but it is not as bad as the general
acceptance/rejection approach, which throws away entirely the old sample whenever
rejection occurs. Here, some remnant of the sample can be kept, as only the neighbors
of the node are frozen. This process of keeping some of the sample is called recycling,
and this is the key to making this approach linear time for nontrivial values of β.
This approach can be broken down into the following seven pieces.
1. There is the set of configurations from which we are attempting to sample.
2. There is a set of frozen nodes whose label is fixed.
3. Conditioned on the labels of the frozen nodes, there exists a distribution on the
unfrozen nodes.
4. There is a target distribution over the set of configurations.
5. When all the nodes are unfrozen, the distribution on unfrozen nodes matches the
target distribution.
6. When all the nodes are frozen, it is easy to sample.
7. There is a method of moving from the current state and current set of frozen nodes
to a new state and set of frozen nodes such that the correct distribution on unfrozen
nodes is maintained.
Let x′ denote the set of frozen nodes in a graph, and consider the general problem of how a frozen node v ∈ x′ should be attempted to be made unfrozen. For c ∈ {−1,1}, let f_c denote the number of frozen neighbors of v that have color c, and n_c the number of unfrozen neighbors of v that have color c.
Suppose for concreteness that v is frozen at label 1. Then the factor for the weight from edges adjacent to v whose other endpoint is not frozen is exp(n_1 β). When v is unfrozen, if v is unfrozen at 1, then the weight should gain an extra factor of exp(f_1 β), since each of those f_1 edges would now run from an unfrozen node to a frozen node with the same label, and so count towards the weight.

If v is unfrozen at −1, then the weight should gain an extra factor of exp([f_{−1} + n_{−1}]β), but also a factor of exp(−n_1 β), since those n_1 edges no longer contribute to the weight. Hence
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) ∝ exp(f_1 β)
P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) ∝ exp((f_{−1} + n_{−1} − n_1)β).
To find the constant of proportionality, it is necessary to maximize exp(f_1 β) + exp((f_{−1} + n_{−1} − n_1)β) over choices of n_1 and n_{−1}. For β > 0, this is maximized when n_1 = 0, so that n_{−1} = deg(v) − f_1 − f_{−1} and the second term becomes exp((deg(v) − f_1)β). Therefore, for node v having deg(v) neighbors, the proposal kernel is
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp(f_1 β) / [exp(f_1 β) + exp((deg(v) − f_1)β)]
P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp((f_{−1} + n_{−1} − n_1)β) / [exp(f_1 β) + exp((deg(v) − f_1)β)].
If the node v is being unfrozen from −1, a similar calculation gives a proposal kernel of

P(X(v) = −1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp(f_{−1} β) / [exp(f_{−1} β) + exp((deg(v) − f_{−1})β)]
P(X(v) = 1 | X(x′), n_1, n_{−1}, f_1, f_{−1}) = exp((f_1 + n_1 − n_{−1})β) / [exp(f_{−1} β) + exp((deg(v) − f_{−1})β)].
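Both displays can be summarized by a single rule in terms of the frozen label a of v: keep label a with probability proportional to exp(f_a β), flip to −a with probability proportional to exp((f_{−a} + n_{−a} − n_a)β), normalize by exp(f_a β) + exp((deg(v) − f_a)β), and treat the leftover mass as recycling. A minimal Python helper along these lines is sketched below; the function name and signature are illustrative, not from the text.

```python
from math import exp

def proposal(a, f, n, deg, beta):
    """Proposal kernel for a node v currently frozen at label a.

    f[c]: number of frozen neighbors of v with label c
    n[c]: number of unfrozen neighbors of v with label c
    deg:  degree of v
    Returns (P(keep a), P(flip to -a), P(recycle)).
    """
    Z = exp(f[a] * beta) + exp((deg - f[a]) * beta)
    p_keep = exp(f[a] * beta) / Z
    p_flip = exp((f[-a] + n[-a] - n[a]) * beta) / Z
    return p_keep, p_flip, 1.0 - p_keep - p_flip

# Example: v frozen at 1, two frozen +1 neighbors, one unfrozen +1 neighbor.
print(proposal(1, {1: 2, -1: 0}, {1: 1, -1: 0}, deg=3, beta=0.5))
```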
This gives the following overall algorithm.
RR for Ising
1)   x ← (1,1,...,1), x′ ← V
2)   Repeat
3)       Let v be any node in x′, a ← x(v)
4)       For c ∈ {−1,1}
5)           f_c ← #{w ∈ x′ : {v,w} ∈ E, x(w) = c}
6)           n_c ← #{w ∉ x′ : {v,w} ∈ E, x(w) = c}
7)       Z ← exp(f_a β) + exp((deg(v) − f_a)β)
8)       Draw C from {−1, 1, r} where P(C = a) = exp(f_a β)Z^{−1},
             P(C = −a) = exp((f_{−a} + n_{−a} − n_a)β)Z^{−1}, P(C = r) = 1 − P(C ∈ {−1,1})
9)       If C ∈ {−1,1}
10)          x(v) ← C, x′ ← x′ \ {v}
11)      Else
12)          x′ ← x′ ∪ {w : {v,w} ∈ E}
13)  Until x′ = ∅
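For concreteness, here is one possible Python transcription of this pseudocode. It is a sketch rather than a reference implementation: the graph is assumed to be given as an adjacency list (a dict from node to list of neighbors), the weights follow the agreement form used above (β ≥ 0), and the frozen node v of line 3) is just an arbitrary element of the frozen set.

```python
import random
from math import exp

def rr_ising(adj, beta, rng=random):
    """Randomness recycler for the Ising model on the graph given by adj."""
    x = {v: 1 for v in adj}                 # 1) all labels start at 1 ...
    frozen = set(adj)                       #    ... with every node frozen
    while frozen:                           # 2) / 13) repeat until nothing is frozen
        v = next(iter(frozen))              # 3) any frozen node
        a = x[v]
        f = {1: 0, -1: 0}                   # 4)-6) frozen / unfrozen neighbor counts
        n = {1: 0, -1: 0}
        for w in adj[v]:
            (f if w in frozen else n)[x[w]] += 1
        deg = len(adj[v])
        Z = exp(f[a] * beta) + exp((deg - f[a]) * beta)       # 7)
        p_keep = exp(f[a] * beta) / Z                          # 8) P(C = a)
        p_flip = exp((f[-a] + n[-a] - n[a]) * beta) / Z        #    P(C = -a)
        u = rng.random()
        if u < p_keep + p_flip:             # 9)-10) unfreeze v at the chosen label
            x[v] = a if u < p_keep else -a
            frozen.discard(v)
        else:                               # 11)-12) recycle: refreeze v's neighbors
            frozen.update(adj[v])
    return x

# Example: the four-cycle a-b-c-d-a.
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
print(rr_ising(adj, beta=0.3))
```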
The proof of the algorithm’s correctness is deferred to the next section when the
RR approach will be shown to be generally valid.
Note that when β is small, P(C = 1) and P(C = −1) will both be close to 1/2, and so the probability of recycling (P(C = r)) will be very small. That makes it likely that a node will be unfrozen, and that the algorithm will progress quickly towards the final state.
With this in hand, it is possible for small β to show that the running time of RR for Ising is linear in the size of the graph.
Lemma 8.4. Let

δ = 1 − (Δ + 1) · [exp(βΔ) − exp(−βΔ)] / [exp(βΔ) + 1].

If δ > 0, then the expected number of steps used by RR for Ising is bounded above by #V/δ.
Proof. Let Y_t denote the number of frozen nodes at step t. When Y_t = 0, the algorithm terminates. Each step of RR can be viewed as beginning by reducing the number of frozen nodes by 1 by unfreezing node v.

Consider

p = P(C = r) = [exp((f_{−a} + n_{−a} + n_a)β) − exp((f_{−a} + n_{−a} − n_a)β)] / [exp(f_a β) + exp((f_{−a} + n_{−a} + n_a)β)].   (8.4)
Fix the value of f_a and f_{−a}, and let k = n_a + n_{−a} be fixed as well. Then

p = [exp((f_{−a} + k)β) − exp((f_{−a} + k − 2n_a)β)] / [exp(f_a β) + exp((f_{−a} + k)β)],   (8.5)

which is increasing in n_a. So to make p as large as possible, set n_a = k and n_{−a} = 0.
Hence

p ≤ [exp((f_{−a} + k)β) − exp((f_{−a} − k)β)] / [exp(f_a β) + exp((f_{−a} + k)β)].   (8.6)
Next, fix ℓ = f_{−a} + k. Then p ≤ [exp(ℓβ) − exp((ℓ − 2k)β)] / [exp(f_a β) + exp(ℓβ)]. This expression is increasing in k, and so is maximized when k = ℓ and f_{−a} = 0. Together with f_a + ℓ = deg(v), this gives
p ≤ [exp(ℓβ) − exp(−ℓβ)] / [exp((deg(v) − ℓ)β) + exp(ℓβ)].   (8.7)
Differentiating the right-hand side with respect to ℓ and simplifying gives

β[2 exp(deg(v)β) + 2] / [exp((deg(v) − ℓ)β) + exp(ℓβ)]^2,   (8.8)

which is always positive, hence the bound is maximized when ℓ is as large as possible, namely, ℓ = deg(v). Therefore
p ≤ [exp(deg(v)β) − exp(−deg(v)β)] / [1 + exp(deg(v)β)].   (8.9)
Differentiating again shows this right-hand side is increasing in deg(v). If Δ is the maximum degree in the graph, then p ≤ [exp(Δβ) − exp(−Δβ)] / [1 + exp(Δβ)]. A rejected step refreezes at most Δ + 1 nodes (the node v together with its neighbors), so the expected decrease in the number of frozen nodes at each step is at least 1 − (Δ + 1)p ≥ 1 − (Δ + 1)[exp(βΔ) − exp(−βΔ)] / [1 + exp(βΔ)] = δ. Lemma 8.2 then finishes the proof.
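To get a feel for which β the lemma covers, δ can be evaluated numerically. The sketch below (not from the text) tabulates δ for a few values of β with maximum degree Δ = 4, as on the two-dimensional square lattice; δ changes sign as β grows, and the lemma only gives a bound while δ remains positive.

```python
from math import exp

def delta(beta, max_deg):
    """The quantity delta from Lemma 8.4."""
    ratio = (exp(beta * max_deg) - exp(-beta * max_deg)) / (exp(beta * max_deg) + 1)
    return 1 - (max_deg + 1) * ratio

for beta in (0.02, 0.04, 0.06):
    print(beta, delta(beta, max_deg=4))   # expected steps <= #V / delta when positive
```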
8.3 The general randomness recycler
The general framework for the randomness recycler employs the following ingredi-
ents. These are listed in the same order as the pieces of RR for the Ising model from
the previous section.
1. A measurable space (Ω, F) for configurations, called the (primary) state space.
2. A measurable space (Ω′, F′) that plays an indexing role, called the dual state space.
3. An index kernel K_I from (Ω′, F′) to (Ω, F).
4. A probability measure π on (Ω, F), which is the target distribution.
5. A special state in the dual space x′_π such that K_I(x′_π, A) = π(A) for all measurable A.
6. A starting index state x′_0 such that it is possible to draw variates from K_I(x′_0, ·).
7. A bivariate kernel K from Ω′ × Ω to itself and a kernel K′ from Ω′ to itself that, together with K_I, satisfy the design property, defined as follows.
Definition 8.2. Say that K, K′, and K_I have the design property if

∫_{x∈Ω} K_I(x′, dx) K((x′, x), (dy′, dy)) = K′(x′, dy′) K_I(y′, dy).   (8.10)
In other words, given the index X′_t and [X_t | X′_t] ∼ K_I(X′_t, ·), after one step to (X′_{t+1}, X_{t+1}), it should be true that [X_{t+1} | X′_{t+1}] ∼ K_I(X′_{t+1}, ·). The design property is somewhat analogous to the use of reversibility in designing Markov chains. Once the design property holds, the basic RR algorithm falls into place.
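In outline, that basic algorithm is: draw the initial state from K_I(x′_0, ·), repeatedly apply the bivariate kernel K, and stop the first time the dual state equals x′_π. The generic driver below is only a sketch (the function and argument names are hypothetical, not from the text); all of the problem-specific work is hidden in draw_from_index, a draw from K_I(x′, ·), and step, a draw from K((x′, x), ·).

```python
def randomness_recycler(x0_dual, draw_from_index, step, is_target_index, max_steps=None):
    """Generic RR driver: iterate the bivariate kernel until the dual state hits x'_pi.

    If the kernels satisfy the design property, the returned primary state is an
    exact draw from the target distribution pi (see the RR Theorem below).
    """
    xd = x0_dual
    x = draw_from_index(xd)       # sample from K_I(x'_0, .)
    t = 0
    while not is_target_index(xd):
        xd, x = step(xd, x)       # one application of the bivariate kernel K
        t += 1
        if max_steps is not None and t >= max_steps:
            raise RuntimeError("gave up after max_steps iterations")
    return x, t
```

For RR for Ising, the dual state records which nodes are frozen (along with their labels), step is one pass through lines 3) to 12) of the pseudocode, and x′_π is the state in which no nodes remain frozen.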
Theorem 8.1 (RR Theorem). Suppose RR is run using kernels that satisfy the design property. Let T be the first time that X′_t = x′_π. Then T and X_T are independent, and

[X_T | T < ∞] ∼ π.   (8.11)
The following lemma will be useful in proving the theorem.
Lemma 8.5. After t steps, given the index state x′_t, the distribution of the current state comes from K_I(x′_t, ·), even when conditioned on the history of the index process. That is to say,

[X_t | X′_0 = x′_0, ..., X′_t = x′_t] ∼ K_I(x′_t, ·).   (8.12)
Proof. By induction: the base case is just the definition of K_I. Suppose (8.12) holds after t steps, and consider step t + 1.

Let A be any measurable set, and let H_t be the event {X′_0 = x′_0, ..., X′_t = x′_t}. Then

P(X_{t+1} ∈ A | H_t, X′_{t+1} = x′_{t+1})
  = P(X_{t+1} ∈ A, X′_{t+1} ∈ dx′_{t+1} | H_t) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = ∫_{x∈Ω} P(X_t ∈ dx, X_{t+1} ∈ A, X′_{t+1} ∈ dx′_{t+1} | H_t) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = ∫_{x∈Ω} K_I(x′_t, dx) K((x′_t, x), (dx′_{t+1}, A)) / P(X′_{t+1} ∈ dx′_{t+1} | H_t)
  = K′(x′_t, dx′_{t+1}) K_I(x′_{t+1}, A) / K′(x′_t, dx′_{t+1})
  = K_I(x′_{t+1}, A).

The design property was used to rewrite the numerator in the fourth line, and (applied with A = Ω) to identify the denominator P(X′_{t+1} ∈ dx′_{t+1} | H_t) as K′(x′_t, dx′_{t+1}).