Proof. In the first phase, ε₁ = (1/2)(1 + √(1 + 4ln(r))). This was chosen so that ε₁² = ln(r) + ε₁. So putting that into Lemma 11.3 gives

P(e^{−ε₁} ≤ r̂₁/r ≤ e^{ε₁}) > 1 − 2exp(−2ln(4δ^{−1}) · ε₁²/(2(ln(r) + ε₁))) = 1 − δ/2.  (11.16)
So that means that with probability at least 1 − δ/2, ln(r̂₁) ≥ ln(r) − (1/2) − (1/2)√(1 + 4ln(r)), or equivalently ln(r) ≤ ln(r̂₁) + √(ln(r̂₁) + 1) + 1.
Using Lemma 11.3 once more with k₂ then completes the proof of (11.14).
From Lemma 11.2, each run of TPA uses on average ln(r) + 1 samples. The value of k₁ is fixed; for k₂, note that E[r̂₁] = r, and ln(x) + √(ln(x) + 1) + 1 + ε is a concave function of x, so E[ln(r̂₁) + √(ln(r̂₁) + 1) + 1 + ε] ≤ ln(r) + √(ln(r) + 1) + 1 + ε. Using ⌈x⌉ ≤ x + 1 completes the proof.
11.5 The paired product estimator
In the previous section, the running time of TPA was shown to be quadratic in ln(r).
Since r is usually exponentially large in the problem size, this tends to make TPA
require a number of samples that is quadratic in the problem size.
The examples where TPA was used were all Gibbs distributions. In this case,
TPA coupled with importance sampling can be a way to get more accurate estimates.
First return to TPA. Suppose it is run k times over b ∈ [0, β], and all the b values from the runs are collected. Then if r_i = ln(Z_{b_i}), and r_(1) < r_(2) < ··· < r_(N) are the order statistics of those points, these form a Poisson point process of rate k over [ln(Z_β), ln(Z_0)].
So if every kth point is considered, r_(k), r_(2k), ..., these will (on average) be about distance 1 apart. Let a_i = Z_{b_(ik)}. Then a_{i+1}/a_i will be near exp(−1). Let s(i) = b_(ik) be the parameter value associated with this point, so that a_i = Z_{s(i)}.
In TPA, the estimate used is a_i/a_{i+1} ≈ e. But importance sampling can be used to get a tighter estimate. The idea is as follows. Draw a sample X₁, ..., X_n using parameter value s(i). Then let W_j = exp(H(X_j)(s(i) − s(i+1))). Then

E[W_j] = Σ_x exp(H(x)(s(i) − s(i+1))) · exp(−H(x)s(i))/Z_{s(i)} = Z_{s(i+1)}/Z_{s(i)}.  (11.17)
For each level i then draw W₁, W₂, ..., W_n iid from the distribution using parameter value s(i). Then average them to get

μ̂_i ≈ Z_{s(i+1)}/Z_{s(i)}.  (11.18)
If there are ℓ such parameters, let s(0) = 0 and s(ℓ + 1) = β. This gives

∏_{i=0}^{ℓ} μ̂_i ≈ (Z_{s(1)}/Z_{s(0)}) ··· (Z_{s(ℓ)}/Z_{s(ℓ−1)}) · (Z_{s(ℓ+1)}/Z_{s(ℓ)}) = Z_β/Z_0.  (11.19)
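As a concrete illustration, here is a minimal Python sketch of this product estimator, written under stated assumptions: draw_gibbs(b, n) is a hypothetical routine returning n perfect draws from the Gibbs distribution with weights proportional to exp(−b·H(x)), H is the Hamiltonian, and s is the schedule s(0), ..., s(ℓ+1) built above.

    import numpy as np

    def product_estimator(draw_gibbs, H, s, n):
        # Estimate Z_{s[-1]} / Z_{s[0]} as a product of per-level
        # importance sampling estimates, following (11.17)-(11.19).
        estimate = 1.0
        for i in range(len(s) - 1):
            xs = draw_gibbs(s[i], n)
            # W_j = exp(H(X_j)(s(i) - s(i+1))) has mean Z_{s(i+1)} / Z_{s(i)}
            w = np.array([np.exp(H(x) * (s[i] - s[i + 1])) for x in xs])
            estimate *= w.mean()
        return estimate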
This type of estimate was introduced in [125] and later in [72]. It was called a product estimator by Fishman [38]. To understand the variance of such an estimator, it helps to have the notion of relative variance.
Definition 11.2. A random variable X with finite second moment has relative variance

V_rel(X) = V(X)/E[X]² = E[X²]/E[X]² − 1.  (11.20)
Then the mean and relative variance of a product estimator are as follows.
Lemma 11.6. For P = ∏_i P_i where the P_i are independent random variables with finite second moment,

E[P] = ∏_i E[P_i],   V_rel(P) = −1 + ∏_i (1 + V_rel(P_i)).  (11.21)
Proof. Since the P_i are independent, E[P] = ∏_i E[P_i] and E[P²] = ∏_i E[P_i²]. Hence E[P²]/E[P]² = ∏_i E[P_i²]/E[P_i]², and the rest just follows from the definition of relative variance.
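Lemma 11.6 is also easy to check numerically. The following sketch (an illustration only) compares the empirical relative variance of a product of independent lognormal factors against the prediction of (11.21), using the known closed form V_rel = exp(σ²) − 1 for a lognormal(0, σ) factor.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10**6                    # Monte Carlo replicates
    sigmas = [0.2, 0.3, 0.5]     # one independent lognormal factor per sigma

    # P is the product of independent lognormal(0, sigma) factors
    P = np.prod([rng.lognormal(0.0, sig, m) for sig in sigmas], axis=0)
    empirical = P.var() / P.mean() ** 2

    # Lemma 11.6: V_rel(P) = -1 + prod over i of (1 + V_rel(P_i))
    predicted = np.prod([np.exp(sig ** 2) for sig in sigmas]) - 1.0
    print(empirical, predicted)  # the two values should agree closely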
For the importance sampler given earlier,

E[W_j²] = Σ_x exp(H(x)(s(i) − s(i+1)))² · exp(−H(x)s(i))/Z_{s(i)} = Z_{2s(i+1)−s(i)}/Z_{s(i)},  (11.22)
which gives

V_rel(W_j) = −1 + Z_{2s(i+1)−s(i)} Z_{s(i)}/Z_{s(i+1)}².  (11.23)

Unfortunately, this expression is rather unwieldy to work with, since 2s(i+1) − s(i) ∉ [s(i), s(i+1)], and so involves values of the partition function at parameter values outside of the interval of interest.
This problem can be solved by the paired product estimator. Let X₁, ..., X_n be iid draws from the parameter value s(i), and Y₁, ..., Y_n be iid draws from the parameter value s(i + 1).

To simplify the notation, let m_i = (s(i+1) + s(i))/2 be the midpoint of the parameter values, and h_i = (s(i+1) − s(i))/2 be the half length of an interval. Then set

R_j = exp(−h_i H(X_j)),   S_j = exp(h_i H(Y_j)).  (11.24)
A similar calculation to (11.17) gives

E[R_j] = Z_{m_i}/Z_{s(i)},   E[S_j] = Z_{m_i}/Z_{s(i+1)},  (11.25)
and

V_rel[R_j] = V_rel[S_j] = −1 + Z_{s(i+1)} Z_{s(i)}/Z_{m_i}².  (11.26)
For level i, let μ̂_i = (R₁ + ··· + R_n)/n and κ̂_i = (S₁ + ··· + S_n)/n. Then let R = ∏_i μ̂_i and S = ∏_i κ̂_i. Then E[R]/E[S] = Z_β/Z_0, and if n is large both R and S will be close to their means. Therefore, the final estimate is R/S.
The point of this more complicated procedure is that when the schedule is well-balanced, the relative variances of the estimates R and S will be close to 0.
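Under the same assumptions as the earlier sketch (a hypothetical perfect sampler draw_gibbs(b, n) and Hamiltonian H), the paired product estimator could be written as follows.

    import numpy as np

    def paired_product_estimator(draw_gibbs, H, s, n):
        # Estimate Z_{s[-1]} / Z_{s[0]} as R / S: level i contributes an
        # estimate of Z_{m_i}/Z_{s(i)} to R and of Z_{m_i}/Z_{s(i+1)} to S.
        R, S = 1.0, 1.0
        for i in range(len(s) - 1):
            h_i = (s[i + 1] - s[i]) / 2.0  # half length; midpoint m_i = s[i] + h_i
            xs = draw_gibbs(s[i], n)       # draws at the left endpoint
            ys = draw_gibbs(s[i + 1], n)   # draws at the right endpoint
            R *= np.mean([np.exp(-h_i * H(x)) for x in xs])
            S *= np.mean([np.exp(h_i * H(y)) for y in ys])
        return R / S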
Theorem 11.2. Let X be a draw from the distribution with parameter value β, and η ≤ 4eε²/(2 + ln(2E[H(X)])). Then if for all i, Z_{s(i+1)}/Z_{s(i)} ≥ e^{−η}, then

V_rel(R) = V_rel(S) ≤ ε.  (11.27)
The proof is relatively complex; see [58] for details.
For many problems, it is possible to obtain an upper bound on E[H(X)] based on the problem input, which can then be used to get a usable bound on η. For instance, in the Ising model, E[H(X)] is upper bounded by the number of nodes of the graph.
11.6 Concentration of the running time
Many of the results shown in this text have been of the form of bounds on the expected running time of the procedure. While it is also usually possible to bound the standard deviation of the running time using similar procedures, such arguments tend to be far more complex than the analysis of the expected value.
Therefore, it is useful to understand how well the running time concentrates
around the mean for various forms of perfect simulation. In general, the goal is to
obtain an exponential tail on the running time.
Definition 11.3. Let X be a nonnegative random variable with finite mean where there exist nonnegative constants α₁ and α₂ such that for all sufficiently large k, P(X ≥ kE[X]) ≤ α₁k^{−α₂}. Then say that X has a polynomial tail. Suppose there exist positive constants α₁ and α₂ such that for sufficiently large k, P(X ≥ kE[X]) ≤ α₁ exp(−α₂k). Then say that X has an exponential tail.
In general it is not possible to guarantee an exponential tail on the running time of a particular perfect simulation procedure. That being said, for interruptible algorithms, or for CFTP, it is possible to guarantee a tail that goes down faster than any polynomial in k. In order to describe these algorithms, it is helpful to have the notion of quantile.
Definition 11.4. For a random variable T, an α-quantile of T is any number a such that P(T ≤ a) ≥ α and P(T ≥ a) ≥ 1 − α. Denote the smallest such α-quantile by q_α(T).
In this section, the quantile q_{3/4}(T) will be the one considered, so P(T ≥ q_{3/4}(T)) ≥ 1/4. For random variables that are finite with probability 1, q_{3/4}(T) is finite whether or not T has a finite mean. In the situation that T does have a finite mean E[T], then by Markov's inequality q_{3/4}(T) ≤ 4E[T].
Theorem 11.3. Consider an interruptible perfect simulation algorithm A. Let T_A denote the random number of steps taken by A. Then there exists an interruptible algorithm B that takes T_B steps, where E[T_B] ≤ 6q_{3/4}(T_A), and for k ≥ 1, P(T_B ≥ kE[T_B]) ≤ exp(−k).
To prove the theorem, it is necessary to build an algorithm with the desired concentration properties. Before getting to that algorithm, first a subroutine is needed that takes input t, and then runs algorithm A forward at most t steps. If algorithm A returns an answer X, then Bounded_time_A returns X as well. Otherwise it returns the symbol ⊥, indicating that it failed to complete in time.
Bounded_time_A   Input: t
1) Run A for at most t steps
2) If the algorithm has finished, output the result of the run X
3) Else output ⊥
Note that because A is interruptible, the output of Bounded_time_A conditioned on X ≠ ⊥ (or T_A ≤ t) is the same as the target distribution. The same fact does not hold if the algorithm is noninterruptible!
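As an illustration, Bounded_time_A might be realized in Python as follows. Representing A as a function returning a generator that yields once per step and returns its answer on completion is an assumption made for the sketch; None stands in for the symbol ⊥.

    def bounded_time_A(A, t):
        # Run a fresh copy of algorithm A for at most t steps.
        run = A()
        for _ in range(t):
            try:
                next(run)              # advance the algorithm one step
            except StopIteration as finished:
                return finished.value  # A finished: return its output X
        return None                    # A did not finish within t steps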
Now algorithm Doubled_Concentrated_A is run as follows.
Doubled_Concentrated_A   Output: Y
1) t ← 1
2) Repeat
3)   Let Y be the output of Bounded_time_A(t)
4)   t ← 2t
5) Until Y ≠ ⊥
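In the same hypothetical interface as above, the doubling wrapper is a short loop.

    def doubled_concentrated_A(A):
        # Run bounded_time_A with budgets 1, 2, 4, ... until a run finishes;
        # interruptibility of A keeps the output distribution correct.
        t = 1
        while True:
            Y = bounded_time_A(A, t)
            if Y is not None:   # Y is not "bottom": a run completed
                return Y
            t = 2 * t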
This algorithm has a polynomial tail on the running time.
Lemma 11.7. Doubled_Concentrated_A returns output Y from the target distribution. Let T_A and T_B be the running times of A and Doubled_Concentrated_A. Then E[T_B] ≤ 4q_{3/4}(T_A). Moreover, for k ≥ 2 it holds that P(T_B ≥ kq_{3/4}(T_A)) ≤ 4/k². If E[T_A] < ∞, then P(T_B ≥ kE[T_A]) ≤ (1/2)exp(−[2ln(2)]^{−1} ln(k)² + (3/2)ln(k)) and E[T_B] ≤ 6.52E[T_A].
Proof. Let algorithm B be Doubled_Concentrated_A, and as before, let T_A denote the random number of steps taken by algorithm A. From our AR theory in Chapter 2, the output of B is a draw from the output of Bounded_time_A for some input t, call it Y, which is conditioned on Y ≠ ⊥. No matter what the value of t was, this is a draw from the target distribution conditioned on the number of steps T_A ≤ t. Since the original algorithm A was interruptible, this conditional distribution of X is the same as the unconditional distribution of X no matter what the final t used was.
Consider the ith time through the repeat loop. The number of steps of A taken during this loop is 2^{i−1}. Now consider P(T_B > s). In s steps, there must be a block of size at least s/3. The next smaller block must be of size at least s/4. After that the block size decreases by a factor of 2, giving s/8, s/16, and so on. Now suppose s = kq_{3/4}(T_A).
Then the number of blocks of size at least q_{3/4}(T_A) is at least 2 when k ∈ [4,8), at least 3 when k ∈ [8,16), and so on.

The chance that n of these blocks of size q_{3/4}(T_A) all return ⊥ is at most

(1/4)^n ≤ 4^{−(log₂(k)−1)} = 4/k².
The value of E[T_B] can then be upper bounded using the tail sum formula:

E[T_B] = ∫_{s=0}^{∞} P(T_B > s) ds
       ≤ 2q_{3/4}(T_A) + ∫_{s=2q_{3/4}(T_A)}^{∞} P(T_B > s) ds
       = 2q_{3/4}(T_A) + q_{3/4}(T_A) ∫_{k=2}^{∞} P(T_B > kq_{3/4}(T_A)) dk
       ≤ 2q_{3/4}(T_A) + q_{3/4}(T_A) ∫_{k=2}^{∞} (4/k²) dk
       = 4q_{3/4}(T_A).
When the mean of T_A is known to be finite, a better result is possible using Markov's inequality. Consider once again P(T_B > s). The largest block in these s steps is of length at least s/3, the next largest at least s/4, and then it goes down by a factor of 2.

For a block of length rE[T_A], Markov's inequality says that the chance the output is ⊥ is at most 1/r. Suppose k = 2^d (if k ∈ (2^d, 2^{d+1}), rounding k down to the next power of two can only decrease the chance of failure). The chance of failure for the largest blocks when s = 2^d E[T_A] is at most 3/2^d, followed by 4/2^d, 8/2^d, and so on up to 1. Now

(3/2^d) ∏_{i=2}^{d} (2^i/2^d) = (3/2^d) · 2^{−(1/2)d² + (3/2)d − 1} = (3/k) · k^{−log₂(k)/2} · k^{3/2} · (1/2),

and simplifying gives the result.
A tail sum analysis then gives E[T_B] ≤ 6.52E[T_A].
11.6.1 Subpolynomial tails on the runtime
This accomplishes our goal of a tail that is lighter than any polynomial when E[T_A] is finite, but only gives a polynomial tail when E[T_A] is unknown.
To do better, suppose that the time allotted to run is of the form 3^d. Use this time as follows: in the first 3^{d−1} steps, call the algorithm recursively. In the second 3^{d−1} steps, call the algorithm recursively. In the final 3^{d−1} steps, use Bounded_time_A. This gives the following algorithm.
Tripled_Concentrated_A   Input: d Output: Y
1) If d = 0 then
2)   Y ← Bounded_time_A(1)
3) Else
4)   Y ← Tripled_Concentrated_A(d − 1)
5)   If Y = ⊥ then
6)     Y ← Tripled_Concentrated_A(d − 1)
7)   If Y = ⊥ then
8)     Y ← Bounded_time_A(3^{d−1})