Definition 1.33. If $X_t = Y_t$, then say that the processes have coupled or coalesced.
Coupling is important for assessing convergence because of the following result.
Theorem 1.4 (The Coupling Lemma [29, 1]). Let $Y_0 \sim \pi$ and $X_0 = x_0$ both evolve according to a coupled Markov chain. Then for all measurable $C$,
\[
|\mathbb{P}(X_t \in C \mid X_0 = x_0) - \pi(C)| \le \mathbb{P}(X_t \ne Y_t).
\]
In other words, the distance between the distribution of $X_t$ and the stationary distribution is bounded above by the probability that the process $X_t$ has not coupled at time $t$ with a stationary process $Y_t$.
Even if it is possible to use the coupling lemma to determine how close the law of the current state is to the target distribution, at best this can only give approximately correct samples. Still, it is a start!
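To make the coupling lemma concrete, here is a minimal numerical sketch; the two-state transition matrix and all names are illustrative assumptions, not from the text. The two copies of the chain share their uniform random variables, so once they meet they move together, and the estimated $\mathbb{P}(X_t \ne Y_t)$ bounds the distance to stationarity.

import random

# Transition matrix and stationary distribution of a toy two-state chain
# (illustrative values, not from the text).
P = [[0.7, 0.3], [0.4, 0.6]]
pi = [4/7, 3/7]

def step(state, u):
    # Advance one step using the shared uniform u; sharing u between the
    # two copies is what makes this a coupling.
    return 0 if u < P[state][0] else 1

def estimate_uncoupled(t, trials=100000):
    # Estimate P(X_t != Y_t) with X_0 = 0 and Y_0 drawn from pi.
    count = 0
    for _ in range(trials):
        x = 0
        y = 0 if random.random() < pi[0] else 1
        for _ in range(t):
            u = random.random()  # the same u drives both chains
            x, y = step(x, u), step(y, u)
        count += (x != y)
    return count / trials

# By the coupling lemma, |P(X_t in C | X_0 = 0) - pi(C)| is at most this.
print(estimate_uncoupled(5))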
1.8 Designing Markov chains
In order to use Markov chain Monte Carlo, it is necessary to build a Harris chain for which $\pi$ is a stationary distribution. It is usually better to require more than just stationarity, and to build a chain that is reversible. To describe reversibility, it is helpful to have a differential notation for measures. When $\pi$ has a density $f(x)$, then let $\pi(dx) = f(x)\,dx$. This differential notation means that for all measurable sets $A$,
\[
\pi(A) = \int_{x \in A} \pi(dx) = \int_{x \in A} f(x)\,dx.
\]
Using this differential notation, reversibility works as follows.
Definition 1.34. A distribution $\pi$ is reversible with respect to a Markov chain if
\[
\pi(dx)\,\mathbb{P}(X_{t+1} \in dy \mid X_t = x) = \pi(dy)\,\mathbb{P}(X_{t+1} \in dx \mid X_t = y).
\]
Lemma 1.2. If $\pi$ is reversible, then it is also stationary.
Proof. Let $\Omega$ be the state space of the Markov chain, and $\pi$ be reversible for the chain. Then let $X_t \sim \pi$, and $C$ be any measurable set. Then
\begin{align*}
\mathbb{P}(X_{t+1} \in C) &= \mathbb{E}[\mathbb{1}(X_{t+1} \in C)] \\
&= \mathbb{E}[\mathbb{E}[\mathbb{1}(X_{t+1} \in C) \mid X_t]] \\
&= \int_{x \in \Omega} \mathbb{E}[\mathbb{1}(X_{t+1} \in C) \mid X_t = x]\,\pi(dx) \\
&= \int_{x \in \Omega} \mathbb{P}(X_{t+1} \in C \mid X_t = x)\,\pi(dx) \\
&= \int_{x \in \Omega} \int_{y \in C} \mathbb{P}(X_{t+1} \in dy \mid X_t = x)\,\pi(dx) \\
&= \int_{y \in C} \int_{x \in \Omega} \mathbb{P}(X_{t+1} \in dx \mid X_t = y)\,\pi(dy) \\
&= \int_{y \in C} \mathbb{P}(X_{t+1} \in \Omega \mid X_t = y)\,\pi(dy) \\
&= \int_{y \in C} \pi(dy) \\
&= \pi(C).
\end{align*}
Hence $\pi$ is stationary. The key step uses reversibility to replace $\pi(dx)\,\mathbb{P}(X_{t+1} \in dy \mid X_t = x)$ with $\pi(dy)\,\mathbb{P}(X_{t+1} \in dx \mid X_t = y)$; the swapping of the order of the iterated integrals is then allowed by Tonelli's theorem since the integrand is nonnegative; see for instance [31].
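For a finite state space, Lemma 1.2 can be checked numerically. The following sketch, with an illustrative three-state walk of my own choosing, verifies that detailed balance forces $\pi P = \pi$.

import numpy as np

# A reversible walk on a three-point path (illustrative numbers).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

# Detailed balance: pi(x) P(x,y) = pi(y) P(y,x) entrywise ...
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)
# ... which forces stationarity: pi P = pi.
assert np.allclose(pi @ P, pi)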
The main reversible Markov chain types are Gibbs samplers, Metropolis-Hastings, and shift-move chains. Auxiliary variable chains can be used together with these ideas to expand their range of applicability. Each of these methods is now discussed in turn.
1.8.1 Gibbs sampling
There are many varieties of Gibbs sampler. The simplest operates over a state space of the form $C^V$. Call $v \in V$ a dimension of the problem. Given a current state $X_t = x$, the Gibbs sampler chooses a dimension $v$ uniformly from $V$, then lets $L(x,v)$ be all the states that match the label of $x$ everywhere except possibly at dimension $v$. So $L(x,v) = \{y : (\forall w \in V \setminus \{v\})(y(w) = x(w))\}$. Given $X_t = x$, the next state $X_{t+1}$ is chosen from $\pi$ conditioned to lie in $L(x,v)$.
As an example, consider the Ising model when $\mu = 0$. In this case, a dimension of the problem consists of a single node $v$. So select a node $v$ uniformly at random from $V$, and consider the two states that equal $x$ at every node $w \ne v$. The value at node $v$ can be 1 or $-1$. Call these two states $x_{v \to 1}$ and $x_{v \to -1}$, respectively.

Then make the next state either $x_{v \to 1}$ or $x_{v \to -1}$, where the probability of choosing $x_{v \to 1}$ is proportional to $\pi(\{x_{v \to 1}\})$. So
\[
\mathbb{P}(X_{t+1} = x_{v \to 1}) = \frac{\pi(\{x_{v \to 1}\})}{\pi(\{x_{v \to 1}\}) + \pi(\{x_{v \to -1}\})}. \tag{1.12}
\]
Recall that for the Ising model with $\mu = 0$,
\[
\pi(\{x\}) = \frac{1}{Z} \prod_{\{i,j\} \in E} \exp(\beta \mathbb{1}(x(i) = x(j))). \tag{1.13}
\]
In (1.12), the normalizing constant $Z$ cancels out. So do the $\exp(\beta \mathbb{1}(x(i) = x(j)))$ factors, unless either $i$ or $j$ equals $v$. So the fraction reduces to
\[
\mathbb{P}(X_{t+1} = x_{v \to 1})
= \frac{\prod_{j : \{v,j\} \in E} \exp(\beta \mathbb{1}(x(j) = 1))}{\prod_{j : \{v,j\} \in E} \exp(\beta \mathbb{1}(x(j) = 1)) + \prod_{j : \{v,j\} \in E} \exp(\beta \mathbb{1}(x(j) = -1))}
= \frac{\exp(\beta n_1)}{\exp(\beta n_1) + \exp(\beta n_{-1})},
\]
where $n_c$ is the number of neighbors of $v$ that have label $c$.
For instance, suppose the node $v$ is adjacent to three nodes labeled 1, and one node labeled $-1$. Then the chance that $v$ should be labeled 1 is $\exp(3\beta)/[\exp(\beta) + \exp(3\beta)]$.
This gives the following pseudocode for taking one step in the Markov chain.
Ising Gibbs   Input: old state x   Output: new state x
1) Draw v ← Unif(V), U ← Unif([0,1])
2) Let n_1 be the number of neighbors of v labeled 1 in x
3) Let n_{-1} be the number of neighbors of v labeled -1 in x
4) If U < exp(n_1 β)/[exp(n_1 β) + exp(n_{-1} β)]
5)   x(v) ← 1
6) Else
7)   x(v) ← -1
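The pseudocode translates directly into runnable code. Below is one possible sketch in Python, where the dictionary-based graph representation and the function name are my own assumptions; the step numbers in the comments refer to the pseudocode above.

import math
import random

def ising_gibbs_step(x, neighbors, beta):
    # x: dict mapping node -> label in {-1, 1}
    # neighbors: dict mapping node -> list of adjacent nodes
    v = random.choice(list(x))                            # step 1: v ~ Unif(V)
    n_plus = sum(1 for w in neighbors[v] if x[w] == 1)    # step 2
    n_minus = sum(1 for w in neighbors[v] if x[w] == -1)  # step 3
    p = math.exp(beta * n_plus) / (
        math.exp(beta * n_plus) + math.exp(beta * n_minus))
    x[v] = 1 if random.random() < p else -1               # steps 4-7
    return x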
Note here that the node $v$ is being chosen uniformly at random from among the nodes of the graph. This is known as using random scan. Another popular choice with Gibbs is to run through all the nodes one at a time in either a specified or a random order. This is known as deterministic scan.
More generally, suppose that the current state is a vector $(X(1), X(2), \ldots, X(n))$. Then choose some subset of indices $I$ according to some distribution. Then the next state chooses values for $X(i)$ for all $i \in I$ from $\pi$ conditioned on the values of $X(i')$ for all $i' \notin I$. For instance, in the Ising model, an edge $\{i,j\}$ could be chosen uniformly from the set of edges, and the values of $X(i)$ and $X(j)$ could be chosen from $\pi$ conditioned on all of the remaining states.
In random scan, the set $I$ is chosen uniformly at random. It is easy to verify that as long as the choice of $I$ is independent of the current state, the resulting chain is stationary with respect to $\pi$. It is usually not difficult to show that it is possible to reach any state from any starting state in a finite number of moves.
1.8.2 Metropolis-Hastings
With Metropolis-Hastings, for each state $x$ of the chain, there is a density $q_x$. Choose a proposal state $y$ according to $q_x$. Then the next state of the chain will be either $x$ or $y$. The probability the next state is $y$ is chosen in such a way that reversibility holds with respect to the target distribution. If the next state is $y$, say that the chain accepts the move to $y$. Otherwise, the chain rejects the move, and stays at $x$.
For this to work, it is necessary that if $q_x(y) > 0$, then $q_y(x) > 0$ as well. Then the move to state $y$ occurs with probability
\[
\min\left(1, \frac{f_\pi(y)\,q_y(x)}{f_\pi(x)\,q_x(y)}\right). \tag{1.14}
\]
Note that $f_\pi$ can be either a normalized or unnormalized density for the target distribution.
To see this in action, consider again the Ising model with $\mu = 0$. Then suppose at each step a node $v$ is selected uniformly at random, and then a candidate color for $y(v)$ is chosen uniformly from $\{-1, 1\}$. Then $q_y(x) = q_x(y) = 1/2$, so those factors in (1.14) cancel out. In fact, the original Metropolis et al. method [91] just considered chains where $q_y(x) = q_x(y)$. Only later did Hastings [47] consider $q_x(y) \ne q_y(x)$.
Again we need to know $n_1$ (the number of neighbors of $v$ labeled 1) and $n_{-1}$ (the number of neighbors labeled $-1$) to calculate $f_\pi(y)/f_\pi(x)$, but again the remainder of the factors in $f_\pi$ simply cancel out. The resulting algorithm looks like this.
Ising MH   Input: old state x   Output: new state x
1) Draw v ← Unif(V), U ← Unif([0,1]), c ← Unif({-1,1})
2) Let n_1 be the number of neighbors of v labeled 1 in x
3) Let n_{-1} be the number of neighbors of v labeled -1 in x
4) If U < exp(n_c β)/exp(n_{x(v)} β)
5)   x(v) ← c
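A runnable sketch of this step follows, using the same hypothetical dictionary-based graph representation as the Gibbs sketch above.

import math
import random

def ising_mh_step(x, neighbors, beta):
    # One Metropolis-Hastings update for the Ising model with mu = 0.
    v = random.choice(list(x))     # v ~ Unif(V)
    c = random.choice([-1, 1])     # proposed color; here q_x(y) = q_y(x)
    n = {1: 0, -1: 0}
    for w in neighbors[v]:
        n[x[w]] += 1
    # Accept with probability min(1, exp(n_c beta) / exp(n_{x(v)} beta)).
    if random.random() < math.exp(beta * n[c]) / math.exp(beta * n[x[v]]):
        x[v] = c
    return x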
Now consider trying to build a Markov chain whose stationary distribution has a density with respect to the uniform distribution over permutations. It is impossible to change only one component of a permutation, so at least two values of the permutation vector must be altered.
Given a permutation $\sigma$, let
\[
\sigma_{i \leftrightarrow j}(k) =
\begin{cases}
\sigma(j) & \text{when } k = i, \\
\sigma(i) & \text{when } k = j, \\
\sigma(k) & \text{when } k \notin \{i, j\}
\end{cases}
\]
denote the move that transposes (swaps) the items at positions $i$ and $j$.
Given current state $x$, a common way to construct a chain on permutations is to choose $i$ and $j$ uniformly at random, and let the proposed state be $y = x_{i \leftrightarrow j}$. Then $q_x(y) = q_y(x)$, and the chance of accepting the move is the minimum of 1 and $f(y)/f(x)$. Since only two elements are being considered, this is usually easy to calculate.
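As a sketch, the transposition chain might be implemented as follows; the density f is a placeholder argument supplied by the application, and the chain is assumed to start at a state with f(x) > 0.

import random

def transposition_step(sigma, f):
    # One step of the transposition chain: propose y = sigma with positions
    # i and j swapped, accept with probability min(1, f(y)/f(sigma)).
    n = len(sigma)
    i, j = random.randrange(n), random.randrange(n)
    y = list(sigma)
    y[i], y[j] = y[j], y[i]
    if random.random() < min(1.0, f(y) / f(sigma)):
        return y
    return sigma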
As an example, consider the uniform distribution over linear extensions of a poset. With respect to the uniform distribution on permutations, the density of a permutation where two elements are out of order is 0, while the density of any valid linear extension is 1.
A transposition is called adjacent if $j = i + 1$. By choosing only adjacent transpositions, it is easy to check if the partial order is still satisfied. For $i \in \{1, 2, \ldots, n-1\}$ and $j = i + 1$, let $\sigma(i) = a$ and $\sigma(i+1) = b$. Then do not transpose if $a \prec b$; otherwise, the transposition always occurs.
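For linear extensions this becomes the following sketch, where prec(a, b) is a hypothetical predicate of my own naming that encodes $a \prec b$ in the partial order.

import random

def linear_extension_step(sigma, prec):
    # Propose swapping adjacent positions i and i+1; the swap is skipped
    # exactly when it would put an element after one it must precede.
    i = random.randrange(len(sigma) - 1)
    a, b = sigma[i], sigma[i + 1]
    if not prec(a, b):
        sigma[i], sigma[i + 1] = b, a
    return sigma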
1.8.3 Auxiliary random variables
In most of the applications of MCMC, the target density of $X$ has a multiplicative structure. For these densities, it is possible to add a vector $Y$ of auxiliary (extra) random variables such that $(X, Y)$ has a simpler joint distribution.

As an example, consider once again the Ising model with $\mu = 0$. Then for each edge $\{i,j\}$, create an auxiliary random variable $Y(\{i,j\})$ whose distribution conditioned on $X$ is uniform over $[0,1]$ if $X(i) \ne X(j)$, and uniform over $[0, \exp(\beta)]$ if $X(i) = X(j)$.
The density of a uniform over $[a,b]$ is $f(s) = (b-a)^{-1}\mathbb{1}(s \in [a,b])$, so this choice for $Y$ makes the joint density of $(X,Y)$ uniform over the set of pairs with $x \in \{-1,1\}^V$ and $y$ satisfying $(\forall \{i,j\} \in E)(y(\{i,j\}) \in [0, \exp(\beta \cdot \mathbb{1}(x(i) = x(j)))])$.
The Markov chain then is simple: given the state $x$, choose a new value of $y$ conditioned on $x$, and then choose a new $x$ conditioned on $y$. (This can be viewed as a deterministic scan Gibbs sampler applied to the new joint distribution.)
Auxiliary chain   Input: old state (x,y)   Output: new state (x,y)
1) Draw y conditioned on x
2) Draw x conditioned on y
From the construction, drawing $y$ conditioned on $x$ is straightforward. Drawing $x$ conditioned on $y$ can be more complex. Consider again the Ising model with $\beta > 0$. Then $\exp(\beta) > 1$. When $y(\{i,j\}) \in [0,1]$, then it could be true that $x(i) = x(j)$ or $x(i) \ne x(j)$. But when $y(\{i,j\}) > 1$, it must be true that $x(i) = x(j)$. The edges with $y(\{i,j\}) > 1$ break the graph into clusters of connected components. Each component must have the same $x$ value across the component, but otherwise is uniform.

Therefore, break the graph into connected components using the edges with $y(\{i,j\}) > 1$, then for each connected component, choose a label uniformly from $\{-1,1\}$ and apply that label to every node in the component.
This example was one of the first examples of an auxiliary variable chain, and is due to Swendsen and Wang [121]. For this reason, auxiliary variable chains are sometimes known as Swendsen-Wang chains.
Swendsen Wang chain   Input: old state (x,y)   Output: new state (x,y)
1) For each edge {i,j} ∈ E
2)   Draw y({i,j}) ← Unif([0, exp(β · 1(x(i) = x(j)))])
3) Let A ← {{i,j} : y({i,j}) > 1}
4) For each connected component C in the graph (V,A)
5)   Draw B ← Unif({-1,1})
6)   For each v ∈ C, set x(v) ← B
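Below is one possible runnable version of this step; the union-find search for connected components is an implementation choice of mine, and the step numbers in the comments refer to the pseudocode above.

import math
import random

def swendsen_wang_step(x, edges, beta):
    # x: dict mapping node -> label in {-1, 1}; edges: list of (i, j) pairs.
    # Steps 1-2: draw the auxiliary edge variables; only y > 1 matters,
    # and y > 1 is possible only when x(i) = x(j).
    active = []
    for (i, j) in edges:
        top = math.exp(beta) if x[i] == x[j] else 1.0
        if random.uniform(0.0, top) > 1.0:
            active.append((i, j))
    # Steps 3-4: connected components of (V, active) via union-find.
    parent = {v: v for v in x}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (i, j) in active:
        parent[find(i)] = find(j)
    # Steps 5-6: give every node in a component the same uniform label.
    label = {}
    for v in x:
        r = find(v)
        if r not in label:
            label[r] = random.choice([-1, 1])
        x[v] = label[r]
    return x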
1.8.4 Slice sampler
The slice sampler (see [95] for more details) is a special case of auxiliary variables that uses just a single component $Y$ with conditional distribution $[Y \mid X = x] \sim \text{Unif}([0, f_X(x)])$. Then $[X \mid Y = y] \sim \text{Unif}(\Omega \cap \{x : f_X(x) \ge y\})$.

This last set $\Omega \cap \{x : f_X(x) \ge y\}$ is a slice through the density $f_X$ at height $y$, hence the name slice sampler.
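For a bounded one-dimensional density on an interval, the slice sampler step can be sketched as below; drawing uniformly from the slice by rejection is a simple choice of mine, not necessarily an efficient one.

import random

def slice_sampler_step(x, f, lo, hi):
    # Draw Y | X = x uniformly from [0, f(x)], then draw X | Y = y
    # uniformly from the slice {x in [lo, hi] : f(x) >= y} by rejection.
    y = random.uniform(0.0, f(x))
    while True:
        x_new = random.uniform(lo, hi)
        if f(x_new) >= y:
            return x_new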
1.8.5 Shift chains
Although Gibbs, Metropolis-Hastings, and auxiliary random variables are the most widely used types of Markov chains, they are not the only types of chains.

An important type of chain that is useful for repulsive processes is the shift move. The canonical example is permutations: they are repulsive in the sense that two components of the permutation cannot both be assigned the same value. The solution to