2.7.2 Chernoff bounds as an AR algorithm
Chernoff bounds [23] give upper limits on the probability that a sum of random variables is greater (or smaller) than a set value. Let S_n = X_1 + ··· + X_n, where X_1, ..., X_n are iid; then the analogous sampling goal is to generate a draw from S_n conditioned on S_n ≥ a or S_n ≤ a.
Markov’s inequality can also be applied to the sum of random variables, but the bound only goes down as the inverse of n. Chernoff bounds, on the other hand, tend to decrease exponentially in n, thereby giving much closer matches to the actual behavior of the tails of S_n and a much faster way of generating samples from the tails.
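To make the difference concrete, here is a minimal Python sketch; the Exp(1) summands, the threshold α = 2, and the use of the variance-based (Chebyshev) bound as the inverse-of-n benchmark are choices made here for illustration and are not taken from the text. It prints a bound that decays like 1/n next to the numerically optimized bound of Lemma 2.11 below, which decays geometrically in n.

import math

# Illustrative choices (not from the text): X_i ~ Exp(1), so mu = 1,
# Var(X_i) = 1, and mgf_X(t) = 1/(1 - t) for t < 1.  Bound P(S_n >= alpha*n).
alpha = 2.0

def chebyshev_bound(n):
    # Markov applied to (S_n - n*mu)^2 (Chebyshev): decays only like 1/n.
    return n * 1.0 / (n * (alpha - 1.0)) ** 2

def chernoff_bound(n):
    # Minimize mgf_X(t)*exp(-t*alpha) over a grid of t in [0, 1), then
    # raise to the n-th power (the form of Lemma 2.11 below).
    base = min(math.exp(-t * alpha) / (1.0 - t)
               for t in [i / 1000.0 for i in range(1000)])
    return base ** n

for n in [1, 5, 10, 50]:
    print(n, chebyshev_bound(n), chernoff_bound(n))

For these choices the optimal twist is t = 1/2, so the Chernoff bound is (2/e)^n ≈ 0.736^n.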
Chernoff bounds can be viewed as an application of Markov’s inequality to the
moment-generating function of X.
Lemma 2.9 (Chernoff bounds). Consider a random variable X where E[e^{tX}] is finite for t ∈ [a, b] with a ≤ 0 ≤ b. Then for α ∈ R,

P(X ≥ α) ≤ mgf_X(t)/exp(tα) for all t ∈ [0, b],    (2.8)
P(X ≤ α) ≤ mgf_X(t)/exp(tα) for all t ∈ [a, 0].    (2.9)

To get the best possible bound on the right-hand side, minimize over the allowed values of t.
Proof. First P(X ≥ α) = P(tX ≥ tα) for any positive t. Next P(tX ≥ tα) = P(exp(tX) ≥ exp(tα)). Finally, apply Markov’s inequality to get P(exp(tX) ≥ exp(tα)) ≤ E[exp(tX)]/exp(tα). The result for P(X ≤ α) is shown similarly, using a negative t so that the first inequality reverses.
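As a quick illustration of the lemma (the Exp(1) example is supplied here and is not part of the text), suppose X ∼ Exp(1), so that mgf_X(t) = 1/(1 − t) for t ∈ [0, 1). Minimizing the right-hand side of (2.8) over t gives, for α ≥ 1,

P(X ≥ α) ≤ min over t ∈ [0, 1) of exp(−tα)/(1 − t) = α e^{1−α}, attained at t = 1 − 1/α,

which matches the exponential rate of the exact tail P(X ≥ α) = e^{−α} up to the factor αe.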
Now consider what happens when Chernoff bounds are used on a sum of inde-
pendent random variables. First a fact about moment-generating functions.
Lemma 2.10. For S_n = X_1 + ··· + X_n, where the {X_i} are iid with finite moment-generating functions, E[exp(tS_n)] = [E[exp(tX_i)]]^n.
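The lemma is stated without proof; the standard one-line justification (supplied here, not quoted from the text) uses independence:

E[exp(tS_n)] = E[exp(tX_1) ··· exp(tX_n)] = E[exp(tX_1)] ··· E[exp(tX_n)] = [E[exp(tX_i)]]^n,

where the second equality holds because the X_i are independent and the last because they are identically distributed.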
Applying this to Chernoff bounds immediately yields the following.
Lemma 2.11. For S_n = X_1 + ··· + X_n, where the {X_i} are iid and mgf_{X_i}(t) is finite for t ∈ [a, b],

P(S_n ≥ αn) ≤ [mgf_{X_i}(t)/exp(tα)]^n for all t ∈ [0, b],    (2.10)
P(S_n ≤ αn) ≤ [mgf_{X_i}(t)/exp(tα)]^n for all t ∈ [a, 0].    (2.11)
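As a numerical sanity check on (2.10), here is a small Python sketch; the fair-coin summands, the values α = 0.75 and n = 40, the grid search over t, and the Monte Carlo comparison are all choices made here for illustration and are not taken from the text.

import math
import random

# Illustrative choices (not from the text): X_i ~ Bernoulli(1/2), so
# mgf_X(t) = (1 + e^t)/2, which is finite for all t, and we bound
# P(S_n >= alpha*n) using (2.10).
alpha, n = 0.75, 40

def chernoff_bound():
    # Grid search approximating the minimization of mgf_X(t)*exp(-t*alpha)
    # over t >= 0, then raise to the n-th power.
    base = min((1.0 + math.exp(t)) / 2.0 * math.exp(-t * alpha)
               for t in [i / 100.0 for i in range(1000)])
    return base ** n

def monte_carlo(trials=200000):
    # Direct simulation of the tail probability for comparison.
    hits = sum(sum(random.random() < 0.5 for _ in range(n)) >= alpha * n
               for _ in range(trials))
    return hits / trials

print("Chernoff bound on P(S_n >= alpha n):", chernoff_bound())
print("Monte Carlo estimate:               ", monte_carlo())

The optimized bound comes out near 0.005, while the simulated tail probability is a few times smaller, consistent with (2.10) being an upper bound with the correct exponential rate.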
Say that X_i ∼ f_X. The goal is to use the moment-generating function idea that led to the Chernoff bound to obtain a better AR-type algorithm. To accomplish this goal, it must be possible to draw from the twisted density g_t(x) ∝ e^{tx} f_X(x). When t is large, this density pushes the probability in g_t out to larger values; when t is negative, it pushes the probability in g_t to more negative values.
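For a density where the twisting can be carried out in closed form (the standard normal example below is chosen here for illustration and does not appear in the text), completing the square shows that e^{tx} times the N(0, 1) density is proportional to the N(t, 1) density, so g_t can be sampled directly and the effect of t is simply to shift the mass.

import random

# Illustrative choice (not from the text): when f_X is the standard normal
# density, g_t(x) ∝ e^{tx} f_X(x) is the N(t, 1) density, so it can be
# sampled with a single shifted Gaussian draw.
def draw_twisted_normal(t):
    return random.gauss(t, 1.0)

for t in [-2.0, 0.0, 3.0]:
    xs = [draw_twisted_normal(t) for _ in range(100000)]
    print("t =", t, " empirical mean under g_t:", sum(xs) / len(xs))

Positive t shifts the draws toward larger values and negative t toward more negative values, matching the behavior described above.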
Let t > 0. For x ≥ a, g_t(x) = e^{tx} f_X(x) ≥ e^{ta} f_X(x). Therefore a draw from g_t