Proof. In the first phase, ε₁ = (1/2)(1 + √(1 + 4ln(r))). This was chosen so that ε₁² = ln(r) + ε₁. So putting that into Lemma 11.3 gives

P(e^{−ε₁} ≤ r̂₁/r ≤ e^{ε₁}) > 1 − 2exp(−2ln(4δ^{−1}) · ε₁²/(2(ln(r) + ε₁))) = 1 − δ/2.  (11.16)
So that means that with probability at least 1 − δ/2, ln(r̂₁) ≥ ln(r) − (1/2) − (1/2)√(1 + 4ln(r)), or equivalently ln(r) ≤ ln(r̂₁) + √(ln(r̂₁) + 1) + 1.
Using Lemma 11.3 once more with k₂ then completes the proof of (11.14).
From Lemma 11.2, each run of TPA uses on average ln(r) + 1 samples. The value of k₁ is fixed; for k₂, note that E[r̂₁] = r, and ln(x) + √(ln(x) + 1) + 1 + ε is a concave function of x, so E[ln(r̂₁) + √(ln(r̂₁) + 1) + 1 + ε] ≤ ln(r) + √(ln(r) + 1) + 1 + ε. Using ⌈x⌉ ≤ x + 1 completes the proof.
11.5 The paired product estimator
In the previous section, the running time of TPA was shown to be quadratic in ln(r).
Since r is usually exponentially large in the problem size, this tends to make TPA
require a number of samples that is quadratic in the problem size.
The examples where TPA was used were all Gibbs distributions. In this case,
TPA coupled with importance sampling can be a way to get more accurate estimates.
First return to TPA. Suppose it is run k times over b ∈ [0, β], and all the b values from the runs are collected. Then if r_i = ln(Z_{b_i}), and r_(1) < r_(2) < ··· < r_(N) are the order statistics of those points, these form a Poisson point process of rate k over [ln(Z_β), ln(Z_0)].
So if every kth point is considered, r_(k), r_(2k), ..., these will (on average) be about distance 1 apart. Let a_i = Z_{b_(ik)}. Then a_{i+1}/a_i will be near exp(−1). Let s(i) = b_(ik) be the parameter value associated with this point, so that a_i = Z_{s(i)}.
In TPA, the estimate used is a_i/a_{i+1} ≈ e. But importance sampling can be used to get a tighter estimate. The idea is as follows. Draw a sample X₁, ..., X_n using parameter value s(i). Then let W_j = exp(H(X_j)(s(i) − s(i+1))). Then

E[W_j] = Σ_x exp(H(x)(s(i) − s(i+1))) · exp(−H(x)s(i))/Z_{s(i)} = Z_{s(i+1)}/Z_{s(i)}.  (11.17)
For each level i then draw W₁, W₂, ..., W_n iid from the distribution using parameter value s(i). Then average them to get

μ̂_i ≈ Z_{s(i+1)}/Z_{s(i)}.  (11.18)
If there are ℓ such parameters, let s(0) = 0 and s(ℓ + 1) = β. This gives

∏_{i=0}^{ℓ} μ̂_i ≈ (Z_{s(1)}/Z_{s(0)}) ··· (Z_{s(ℓ)}/Z_{s(ℓ−1)}) · (Z_{s(ℓ+1)}/Z_{s(ℓ)}) = Z_β/Z_0.  (11.19)
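As a concrete illustration, here is a minimal Python sketch of this product estimator, written under stated assumptions: draw_gibbs(b, n) is a hypothetical routine returning n perfect draws from the Gibbs distribution with weights proportional to exp(−b·H(x)), H is the Hamiltonian, and s is the schedule s(0), ..., s(ℓ+1) built above.

    import numpy as np

    def product_estimator(draw_gibbs, H, s, n):
        # Estimate Z_{s[-1]} / Z_{s[0]} as a product of per-level
        # importance sampling estimates, following (11.17)-(11.19).
        estimate = 1.0
        for i in range(len(s) - 1):
            xs = draw_gibbs(s[i], n)
            # W_j = exp(H(X_j)(s(i) - s(i+1))) has mean Z_{s(i+1)} / Z_{s(i)}
            w = np.array([np.exp(H(x) * (s[i] - s[i + 1])) for x in xs])
            estimate *= w.mean()
        return estimate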
This type of estimate was introduced in [125] and later in [72]. It was called a product estimator by Fishman [38]. To understand the variance of such an estimator, it helps to have the notion of relative variance.
Definition 11.2. A random variable X with finite second moment has relative variance

V_rel(X) = V(X)/E[X]² = E[X²]/E[X]² − 1.  (11.20)
Then the mean and relative variance of a product estimator are as follows.
Lemma 11.6. For P = ∏_i P_i where the P_i are independent random variables with finite second moment,

E[P] = ∏_i E[P_i],   V_rel(P) = −1 + ∏_i (1 + V_rel(P_i)).  (11.21)
Proof. Since the P_i are independent, E[P] = ∏_i E[P_i] and E[P²] = ∏_i E[P_i²]. Hence E[P²]/E[P]² = ∏_i E[P_i²]/E[P_i]², and the rest just follows from the definition of relative variance.
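Lemma 11.6 is also easy to check numerically. The following sketch (an illustration only) compares the empirical relative variance of a product of independent lognormal factors against the prediction of (11.21), using the known closed form V_rel = exp(σ²) − 1 for a lognormal(0, σ) factor.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10**6                    # Monte Carlo replicates
    sigmas = [0.2, 0.3, 0.5]     # one independent lognormal factor per sigma

    # P is the product of independent lognormal(0, sigma) factors
    P = np.prod([rng.lognormal(0.0, sig, m) for sig in sigmas], axis=0)
    empirical = P.var() / P.mean() ** 2

    # Lemma 11.6: V_rel(P) = -1 + prod over i of (1 + V_rel(P_i))
    predicted = np.prod([np.exp(sig ** 2) for sig in sigmas]) - 1.0
    print(empirical, predicted)  # the two values should agree closely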
For the importance sampler given earlier,

E[W_j²] = Σ_x exp(H(x)(s(i) − s(i+1)))² · exp(−H(x)s(i))/Z_{s(i)} = Z_{2s(i+1)−s(i)}/Z_{s(i)},  (11.22)
which gives

V_rel(W_j) = −1 + Z_{2s(i+1)−s(i)} Z_{s(i)}/Z_{s(i+1)}².  (11.23)

Unfortunately, this expression is rather unwieldy to work with, since 2s(i+1) − s(i) ∉ [s(i), s(i+1)], and so involves values of the partition function at parameter values outside of the interval of interest.
This problem can be solved by the paired product estimator. Let X₁, ..., X_n be iid draws from the parameter value s(i), and Y₁, ..., Y_n be iid draws from the parameter value s(i + 1).

To simplify the notation, let m_i = (s(i+1) + s(i))/2 be the midpoint of the parameter values, and h_i = (s(i+1) − s(i))/2 be the half length of an interval. Then set

R_j = exp(−h_i H(X_j)),   S_j = exp(h_i H(Y_j)).  (11.24)
A similar calculation to (11.17) gives

E[R_j] = Z_{m_i}/Z_{s(i)},   E[S_j] = Z_{m_i}/Z_{s(i+1)},  (11.25)
and

V_rel[R_j] = V_rel[S_j] = −1 + Z_{s(i+1)} Z_{s(i)}/Z_{m_i}².  (11.26)
For level i, let μ̂_i = (R₁ + ··· + R_n)/n and κ̂_i = (S₁ + ··· + S_n)/n. Then let R = ∏_i μ̂_i and S = ∏_i κ̂_i. Then E[R]/E[S] = Z_β/Z_0, and if n is large both R and S will be close to their means. Therefore, the final estimate is R/S.
The point of this more complicated procedure is that when the schedule is well-balanced, the relative variances of the estimates R and S will be close to 0.
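Under the same assumptions as the earlier sketch (a hypothetical perfect sampler draw_gibbs(b, n) and Hamiltonian H), the paired product estimator could be written as follows.

    import numpy as np

    def paired_product_estimator(draw_gibbs, H, s, n):
        # Estimate Z_{s[-1]} / Z_{s[0]} as R / S: level i contributes an
        # estimate of Z_{m_i}/Z_{s(i)} to R and of Z_{m_i}/Z_{s(i+1)} to S.
        R, S = 1.0, 1.0
        for i in range(len(s) - 1):
            h_i = (s[i + 1] - s[i]) / 2.0  # half length; midpoint m_i = s[i] + h_i
            xs = draw_gibbs(s[i], n)       # draws at the left endpoint
            ys = draw_gibbs(s[i + 1], n)   # draws at the right endpoint
            R *= np.mean([np.exp(-h_i * H(x)) for x in xs])
            S *= np.mean([np.exp(h_i * H(y)) for y in ys])
        return R / S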
Theorem 11.2. Let X be a draw from the distribution with parameter value β, and η ≤ 4eε²/(2 + ln(2E[H(X)])). Then if for all i, Z_{s(i+1)}/Z_{s(i)} ≥ e^{−η}, then

V_rel(R) = V_rel(S) ≤ ε.  (11.27)
The proof is relatively complex; see [58] for details.
For many problems, it is possible to obtain an upper bound on E[H(X)] based on the problem input, which can then be used to get a usable bound on η. For instance, in the Ising model, E[H(X)] is upper bounded by the number of nodes of the graph.
11.6 Concentration of the running time
Many of the results shown in this text have been of the form of bounds on the expected running time of the procedure. While it is also usually possible to bound the standard deviation of the running time using similar procedures, such arguments tend to be far more complex than the analysis of the expected value.
Therefore, it is useful to understand how well the running time concentrates
around the mean for various forms of perfect simulation. In general, the goal is to
obtain an exponential tail on the running time.
Definition 11.3. Let X be a nonnegative random variable with finite mean where there exist nonnegative constants α₁ and α₂ such that for all sufficiently large k, P(X ≥ kE[X]) ≤ α₁k^{−α₂}. Then say that X has a polynomial tail. Suppose there exist positive constants α₁ and α₂ such that for sufficiently large k, P(X ≥ kE[X]) ≤ α₁ exp(−α₂k). Then say that X has an exponential tail.
In general it is not possible to guarantee an exponential tail on the running time of a particular perfect simulation procedure. That being said, for interruptible algorithms, or for CFTP, it is possible to guarantee a tail that goes down faster than any polynomial in k. In order to describe these algorithms, it is helpful to have the notion of quantile.
Definition 11.4. For a random variable T, an α-quantile of T is any number a such that P(T ≤ a) ≥ α and P(T ≥ a) ≥ 1 − α. Denote the smallest such α-quantile by q_α(T).
In this section, the quantile q_{3/4}(T) will be the one considered, so P(T ≥ q_{3/4}(T)) ≥ 1/4. For random variables that are finite with probability 1, q_{3/4}(T) is finite whether or not T has a finite mean. In the situation that T does have a finite mean E[T], then by Markov's inequality q_{3/4}(T) ≤ 4E[T].
Theorem 11.3. Consider an interruptible perfect simulation algorithm A. Let T_A denote the random number of steps taken by A. Then there exists an interruptible algorithm B that takes T_B steps, where E[T_B] ≤ 6q_{3/4}(T_A), and for k ≥ 1, P(T_B ≥ kE[T_B]) ≤ exp(−k).
To prove the theorem, it is necessary to build an algorithm with the desired concentration properties. Before getting to that algorithm, first a subroutine is needed that takes input t, and then runs algorithm A forward at most t steps. If algorithm A returns an answer X, then Bounded_time_A returns X as well. Otherwise it returns the symbol ⊥, indicating that it failed to complete in time.
Bounded_time_A   Input: t
1) Run A for at most t steps
2) If the algorithm has finished, output the result of the run X
3) Else output ⊥
Note that because A is interruptible, the output of Bounded_time_A conditioned on X ≠ ⊥ (or T_A ≤ t) is the same as the target distribution. The same fact does not hold if the algorithm is noninterruptible!
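As an illustration, Bounded_time_A might be realized in Python as follows. Representing A as a function returning a generator that yields once per step and returns its answer on completion is an assumption made for the sketch; None stands in for the symbol ⊥.

    def bounded_time_A(A, t):
        # Run a fresh copy of algorithm A for at most t steps.
        run = A()
        for _ in range(t):
            try:
                next(run)              # advance the algorithm one step
            except StopIteration as finished:
                return finished.value  # A finished: return its output X
        return None                    # A did not finish within t steps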
Now algorithm Doubled_Concentrated_A is run as follows.
Doubled_Concentrated_A   Output: Y
1) t ← 1
2) Repeat
3)   Let Y be the output of Bounded_time_A(t)
4)   t ← 2t
5) Until Y ≠ ⊥
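In the same hypothetical interface as above, the doubling wrapper is a short loop.

    def doubled_concentrated_A(A):
        # Run bounded_time_A with budgets 1, 2, 4, ... until a run finishes;
        # interruptibility of A keeps the output distribution correct.
        t = 1
        while True:
            Y = bounded_time_A(A, t)
            if Y is not None:   # Y is not "bottom": a run completed
                return Y
            t = 2 * t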
This algorithm has a polynomial tail on the running time.
Lemma 11.7. Doubled_Concentrated_A returns output Y from the target distribution. Let T_A and T_B be the running times of A and Doubled_Concentrated_A. Then E[T_B] ≤ 4q_{3/4}(T_A). Moreover, for k ≥ 2 it holds that P(T_B ≥ kq_{3/4}(T_A)) ≤ 4/k². If E[T_A] < ∞, then P(T_B ≥ kE[T_A]) ≤ (1/2)exp(−[2ln(2)]^{−1} ln(k)² + (3/2)ln(k)) and E[T_B] ≤ 6.52E[T_A].
Proof. Let algorithm B be Doubled_Concentrated_A, and as before, let T_A denote the random number of steps taken by algorithm A. From our AR theory in Chapter 2, the output of B is a draw from the output of Bounded_time_A for some input t, call it Y, which is conditioned on Y ≠ ⊥. No matter what the value of t was, this is a draw from the target distribution conditioned on the number of steps T_A ≤ t. Since the original algorithm A was interruptible, this conditional distribution of X is the same as the unconditional distribution of X no matter what the final t used was.
Consider the ith time through the repeat loop. The number of steps of A taken during this loop is 2^{i−1}. Now consider P(T_B > s). In s steps, there must be a block of size at least s/3. The next smaller block must be of size at least s/4. After that the block size decreases by a factor of 2, giving s/8, s/16, and so on. Now suppose s = kq_{3/4}(T_A).
Then the number of blocks of size at least q_{3/4}(T_A) is at least 2 when k ∈ [4,8), at least 3 when k ∈ [8,16), and so on.

The chance that n of these blocks of size q_{3/4}(T_A) all return ⊥ is at most

(1/4)^n ≤ 4^{−(log₂(k)−1)} = 4/k².
The value of E[T_B] can then be upper bounded using the tail sum formula:

E[T_B] = ∫_{s=0}^{∞} P(T_B > s) ds
       ≤ 2q_{3/4}(T_A) + ∫_{s=2q_{3/4}(T_A)}^{∞} P(T_B > s) ds
       = 2q_{3/4}(T_A) + q_{3/4}(T_A) ∫_{k=2}^{∞} P(T_B > kq_{3/4}(T_A)) dk
       ≤ 2q_{3/4}(T_A) + q_{3/4}(T_A) ∫_{k=2}^{∞} (4/k²) dk
       = 4q_{3/4}(T_A).
When the mean of T_A is known to be finite, a better result is possible using Markov's inequality. Consider once again P(T_B > s). The largest block in these s steps is of length at least s/3, the next largest at least s/4, and then it goes down by a factor of 2.

For a block of length rE[T_A], Markov's inequality says that the chance the output is ⊥ is at most 1/r. Suppose k = 2^d (if k ∈ (2^d, 2^{d+1}), rounding k down to the next power of two can only decrease the chance of failure). The chance of failure for the largest blocks when s = 2^d E[T_A] is at most 3/2^d, followed by 4/2^d, 8/2^d, and so on up to 1. Now

(3/2^d) ∏_{i=2}^{d} (2^i/2^d) = (3/2^d) · 2^{−(1/2)d² + (3/2)d − 1} = (3/k) · k^{−log₂(k)/2} · k^{3/2} · (1/2),

and simplifying gives the result.
A tail sum analysis then gives E[T_B] ≤ 6.52E[T_A].
11.6.1 Subpolynomial tails on the runtime
This accomplishes our goal of a tail that is lighter than any polynomial when E[T_A] is finite, but only gives a polynomial tail when E[T_A] is unknown.
To do better, suppose that the time allotted to run is of the form 3^d. Use this time as follows: in the first 3^{d−1} steps, call the algorithm recursively. In the second 3^{d−1} steps, call the algorithm recursively. In the final 3^{d−1} steps, use Bounded_time_A. This gives the following algorithm.
Tripled_Concentrated_A   Input: d Output: Y
1) If d = 0 then
2)   Y ← Bounded_time_A(1)
3) Else
4)   Y ← Tripled_Concentrated_A(d − 1)
5)   If Y = ⊥ then
6)     Y ← Tripled_Concentrated_A(d − 1)
7)   If Y = ⊥ then
8)     Y ← Bounded_time_A(3^{d−1})