Moment-generating functions are unique when they are defined over a nontrivial interval of $t$ values.
Lemma 2.4 (Moment-generating function uniqueness). Suppose $\operatorname{mgf}_X(t) = \operatorname{mgf}_Y(t)$ for all $t \in [a,b]$ where $a < 0 < b$. Then $X \sim Y$.
The important idea needed here is that the sum of a random number of copies of a random variable has moment-generating function equal to a composition of moment-generating functions.
Lemma 2.5. If $Y \in \{0,1,2,\ldots\}$ and $X_1, X_2, \ldots$ are iid, independent of $Y$, and have finite moment-generating functions for $t \in [a,b]$, then $X_1 + \cdots + X_Y$ has moment-generating function
\[
\operatorname{mgf}_{X_1+\cdots+X_Y}(t) = \operatorname{mgf}_Y\!\left(\ln \operatorname{mgf}_{X_i}(t)\right). \tag{2.3}
\]
Now to the proof of our fact.
Proof. A geometric random variable $G$ with mean $1/p$ has $\operatorname{mgf}_G(t) = pe^t/(1 - (1-p)e^t)$, at least when $t < -\ln(1-p)$. The mgf of an exponential random variable with rate $\lambda$ is $\operatorname{mgf}_A(t) = \lambda(\lambda - t)^{-1}$ for $t < \lambda$. So the mgf of $A_1 + \cdots + A_G$ is
\[
\operatorname{mgf}_{A_1+\cdots+A_G}(t)
  = \frac{p\lambda(\lambda - t)^{-1}}{1 - (1-p)\lambda(\lambda - t)^{-1}}
  = \frac{p\lambda}{p\lambda - t}.
\]
But this is just the mgf of an exponential of rate $p\lambda$, which completes the proof.
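As a quick numerical sanity check of this fact (my example, not from the text; the parameter values are arbitrary), the following compares a Geometric($p$) sum of Exp($\lambda$) variables with direct Exp($p\lambda$) draws:

import numpy as np

rng = np.random.default_rng(0)
p, lam = 0.3, 2.0

# G ~ Geometric(p) counts flips up to and including the first heads, E[G] = 1/p.
G = rng.geometric(p, size=50_000)
sums = np.array([rng.exponential(1 / lam, size=g).sum() for g in G])

# Direct draws from Exp(p * lam) for comparison.
direct = rng.exponential(1 / (p * lam), size=50_000)

print(sums.mean(), direct.mean())   # both near 1/(p*lam) = 1.667
print(sums.std(), direct.std())     # also near 1.667 (mean = std for exponentials)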
This procedure of marking points in the Poisson point process with independent coin flips is called thinning. This concept is discussed in greater detail in Chapter 7.
Now consider the GBAS algorithm from earlier. It does this thinning procedure
starting from a rate of 1, and thinning using the p-coin flips. Therefore the new rate of
the thinned process is p. This is done k times, giving k iid Exp(p) random variables
that are then added together.
When you add independent exponentials together, you get a gamma distribution
with shape parameter equal to the number of exponentials added and a rate parameter
equal to the rate of the exponentials. This establishes the following fact.
Fact 2.3. In GBAS, $R \sim \operatorname{Gamma}(k, p)$.
When a gamma distributed random variable is scaled by a constant, the rate changes by the inverse of the constant. So $R/(k-1) \sim \operatorname{Gamma}(k, p(k-1))$. The actual output of GBAS is $(k-1)/R$, which is the multiplicative inverse of a gamma distributed random variable. This gives a distribution called the inverse gamma.
Fact 2.4. The output $\hat{p}$ of GBAS is inverse gamma with shape parameter $k$ and scale parameter $p(k-1)$. The estimate is unbiased: $\mathbb{E}[\hat{p}] = p$. Also, $\hat{p}/p \sim \operatorname{InvGamma}(k, k-1)$, so the relative error $\hat{p}/p - 1$ has a shifted inverse gamma distribution.
Proof. The mean of $X \sim \operatorname{InvGamma}(\alpha, \beta)$ is $\beta/(\alpha - 1)$, so $\mathbb{E}[\hat{p}] = p(k-1)/(k-1) = p$. Dividing $\hat{p}$ by $p$ is the same as scaling it by $1/p$, so the scale parameter is multiplied by the same amount.
The key point of the fact is that $\hat{p}/p - 1$ has a known distribution: the value of $p$ does not affect it in any way! The $\operatorname{InvGamma}(k, k-1)$ distribution is concentrated around 1, and is more tightly concentrated as $k$ increases. So GBAS can be used to obtain an $(\varepsilon, \delta)$-ras as follows.
GBAS ras   Input: $\varepsilon$, $\delta$
1) Let $k \leftarrow \min\{k : \mathbb{P}(|X - 1| > \varepsilon) < \delta$ for $X \sim \operatorname{InvGamma}(k, k-1)\}$
2) $S \leftarrow 0$, $R \leftarrow 0$
3) Repeat
4)   $X \leftarrow \operatorname{Bern}(p)$, $A \leftarrow \operatorname{Exp}(1)$
5)   $S \leftarrow S + X$, $R \leftarrow R + A$
6) Until $S = k$
7) $\hat{p} \leftarrow (k-1)/R$
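A runnable sketch of this procedure (my own illustration, not code from the text; the names gbas_ras and choose_k are hypothetical) using scipy's invgamma for step 1:

import numpy as np
from scipy.stats import invgamma

def choose_k(eps, delta):
    """Smallest k with P(|X - 1| > eps) < delta for X ~ InvGamma(k, scale=k-1)."""
    k = 3
    while True:
        dist = invgamma(k, scale=k - 1)
        # P(|X - 1| > eps) = P(X < 1 - eps) + P(X > 1 + eps)
        if dist.cdf(1 - eps) + dist.sf(1 + eps) < delta:
            return k
        k += 1

def gbas_ras(coin, eps, delta, rng=None):
    """GBAS: flip the p-coin until k heads, clocking each flip with an Exp(1)."""
    rng = rng or np.random.default_rng()
    k = choose_k(eps, delta)
    s, r = 0, 0.0
    while s < k:
        s += coin(rng)                # X ~ Bern(p)
        r += rng.exponential(1.0)     # A ~ Exp(1)
    return (k - 1) / r

# Example: estimate p = 0.3 with 10% relative error, 95% confidence.
rng = np.random.default_rng(1)
print(gbas_ras(lambda g: g.random() < 0.3, eps=0.1, delta=0.05, rng=rng))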
2.5.1 Application: exact p-values
As an application of this method, consider the problem of estimating exact p-values.
Recall that a statistical model for data is a probability distribution such that the data
is believed to be a draw from the distribution. For instance, suppose an observer
believes that the number of taxicabs in a town is 200, and that the number of a ran-
domly spotted taxi is equally likely to be any integer from 1 to 200. Suppose that the
observer sees one taxi, and it is numbered 14.
The observer wishes a principled method for determining if their statistical model
should be rejected based upon the data. One such method for doing so is called the
p-value approach. Let S be any statistic, that is, a function of the data. Then if D
is the data, and the hypothesis to be tested (often called the null hypothesis) is that $D \sim \pi$, then the p-value is just the probability that $S(X)$ is at least as extreme as $S(D)$ where $X \sim \pi$.
For instance, in the taxicab example, it would be unusual to see a low-numbered
taxi if the number of cabs in the town is very large. Therefore, let $S(x) = x$, and consider $S(X)$ to be at least as extreme as $S(D)$ if $S(X) \le S(D)$.
Then $S(D) = 14$, and so the p-value for the hypothesis that there are 200 cabs and that the likelihood of seeing each cab number is the same is $\mathbb{P}(S(X) \le 14) = 14/200 = 7\%$.
In general, if $X_1, X_2, \ldots \sim \pi$ can be drawn using a perfect simulation algorithm, then the p-value is exactly the chance that $S(X)$ is at least as extreme as $S$ applied to the data, and so constitutes the exact type of experiment where GBAS can be applied.
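For instance (my illustration, reusing the hypothetical gbas_ras sketch from above), the taxicab p-value $\mathbb{P}(S(X) \le 14)$ could be estimated by giving GBAS a coin that draws a fresh $X \sim \pi$ on each flip:

# Coin for the taxicab p-value P(S(X) <= 14) under pi = Unif{1,...,200}.
def taxi_coin(rng):
    x = rng.integers(1, 201)    # a perfect draw X ~ pi
    return x <= 14              # heads iff S(X) is at least as extreme as S(D)

print(gbas_ras(taxi_coin, eps=0.1, delta=0.05))   # close to 0.07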
2.5.1.1 Testing for Hardy-Weinberg equilibrium
An allele is a variant of a gene in an organism. For instance, one allele might cause fur to become brown, while a different allele of a gene causes fur to be black in color.
If a gene is in Hardy-Weinberg equilibrium, the fitness (propensity to reproduce) of the organism is independent of which allele the organism carries. Such alleles will
not be selected either for or against in the population.
A diploid organism carries two copies of each gene, each of which could come
from different alleles. The order of the alleles does not matter. Hence if the two alleles for a gene are denoted A and a, the genotype of the organism is either AA, Aa, or aa. In Hardy-Weinberg equilibrium, the frequency of genotype AA will be the square of the frequency of allele A in the population. The frequency of Aa will be twice the frequency of A times the frequency of a, and that of aa will be the square of the frequency of a in the population.
Consider the problem of testing whether a population is in Hardy-Weinberg equilibrium. For $m$ alleles $A_1, \ldots, A_m$, the data consists of the number of sampled individuals $D_{ij}$ with genotype $A_iA_j$ for $1 \le i \le j \le m$.
Given data table $D$, Guo and Thompson [43] suggested using the statistic $S(D) = \mathbb{P}(X = D)$, where $X$ is a random draw from tables of genetic data from populations in Hardy-Weinberg equilibrium. When there are $n$ individuals, this makes the statistic
\[
S(D) = \frac{n!\,\prod_{i=1}^{m} n_i!}{(2n)!\,\prod_{1 \le i \le j \le m} D_{ij}!}\, 2^{\sum_{i<j} D_{ij}}, \tag{2.4}
\]
where $n_i = 2D_{ii} + \sum_{j \ne i} D_{ij}$ is the number of copies of allele $A_i$ among the $2n$ sampled alleles.
The exact p-value is then $\mathbb{P}(S(Y) \le S(D))$, where $Y$ is a perfect draw from tables under the Hardy-Weinberg equilibrium model. The Guo and Thompson method for directly sampling from such tables was later improved in [59].
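A direct transcription of (2.4) as reconstructed above (my sketch, not code from the text; exact arithmetic via Python's fractions module):

from fractions import Fraction
from math import factorial

def guo_thompson_stat(D, m):
    """S(D) = P(X = D) under Hardy-Weinberg equilibrium.

    D maps genotype pairs (i, j) with i <= j to counts; m is the number of alleles.
    """
    n = sum(D.values())                    # number of individuals
    # n_i = copies of allele i among the 2n sampled alleles
    allele = [0] * m
    for (i, j), c in D.items():
        allele[i] += c
        allele[j] += c                     # homozygotes (i == j) add 2c total
    hetero = sum(c for (i, j), c in D.items() if i != j)
    num = factorial(n) * 2**hetero
    for a in allele:
        num *= factorial(a)
    den = factorial(2 * n)
    for c in D.values():
        den *= factorial(c)
    return Fraction(num, den)

# Two alleles, 10 individuals: 3 AA, 4 Aa, 3 aa.
print(guo_thompson_stat({(0, 0): 3, (0, 1): 4, (1, 1): 3}, m=2))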
2.5.1.2 Testing for differential gene expression
A related task is to test for how gene expression changes under environmental con-
ditions. Each gene encodes proteins that are produced by the cell: the amount of
protein produced can be affected by the environment of the cell. Consider an experi-
ment (such as in [122]) where the expression rate of a gene is measured conditioned
on different environmental factors.
The data is a matrix where $K_{ij}$ is the number of experiments under condition $j$ that resulted in expression of gene $i$. If the gene is equally likely to be expressed under all conditions, then (as in the Hardy-Weinberg example), it is easy to generate data tables in linear time.
The test statistic is once again the probability of the table being generated, and the p-value is the probability that a randomly drawn table has test statistic at most equal to the test statistic applied to the data. Once again, GBAS can be used to estimate this p-value with exact confidence intervals.
2.6 Approximate densities for perfect simulation
The key step in AR for sampling from density $f$ using density $g$ is when, for $X \sim g$, the $C \sim \operatorname{Bern}(f(X)/[cg(X)])$ coin is flipped. However, it is not necessary to know $f(X)/[cg(X)]$ exactly in order to flip the $C$ coin.
To draw $C \sim \operatorname{Bern}(p)$, draw $U \sim \operatorname{Unif}([0,1])$, and let $C = \mathbb{1}(U \le p)$. Now suppose that for all $n$, $a_n \le p \le b_n$. Then it is not necessary to know $p$ exactly in order to determine if $U \le p$.
To be precise, suppose that $a_1 \le a_2 \le a_3 \le \cdots$ and $b_1 \ge b_2 \ge b_3 \ge \cdots$ have $\lim a_i = \lim b_i = p$. Then draw $U \sim \operatorname{Unif}([0,1])$. If $U \le \lim a_i = p$, then $C = 1$; if
$U > \lim b_i = p$, then $C = 0$. Fortunately, it is not necessary to compute the entire infinite sequences $\{a_i\}$ and $\{b_i\}$ in order to determine where $U$ falls with respect to $p$. If $U \le a_n$ for some $n$, then certainly $U \le p$. On the other hand, if $U > b_n$, then definitely $U > p$, and the algorithm can stop.
Bounded mean Bernoulli   Input: $p$, $\{a_n\}$, $\{b_n\}$   Output: $C \sim \operatorname{Bern}(p)$
1) $n \leftarrow 1$, $U \leftarrow \operatorname{Unif}([0,1])$
2) While $U \in (a_n, b_n]$ do
3)   $n \leftarrow n + 1$
4) $C \leftarrow \mathbb{1}(U \le a_n)$
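A direct Python translation of this pseudocode (my sketch; passing the bound sequences as functions of $n$ is a hypothetical interface):

import random

def bounded_mean_bernoulli(a, b, rng=random):
    """Draw C ~ Bern(p), where a(n) <= p <= b(n) for all n and both tend to p.

    Only finitely many bound evaluations are needed with probability 1.
    """
    u = rng.random()          # U ~ Unif([0,1])
    n = 1
    while a(n) < u <= b(n):   # bounds still too loose to classify U
        n += 1
    return int(u <= a(n))     # exited with U <= a_n (C = 1) or U > b_n (C = 0)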
While the algorithm only needs to generate one uniform random variable $U$, it might take quite a few floating point comparisons to find $C$.
Lemma 2.6. The expected value of $n$ is $1 + \sum_{i=1}^{\infty}(b_i - a_i)$.
Proof. The value of $n$ is 1 initially. For fixed $i$, the probability that $n > i$ is just $\mathbb{P}(U \in (a_i, b_i]) = b_i - a_i$. The result then follows from the tail sum formula for expectation of nonnegative integer valued random variables.
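As a quick check of the lemma (my example, using the bounded_mean_bernoulli sketch above): with $p = 1/2$ and bounds $a_n = 1/2 - 2^{-n}$ and $b_n = 1/2 + 2^{-n}$, the lemma gives $\mathbb{E}[n] = 1 + \sum_{i \ge 1} 2^{1-i} = 3$.

p = 0.5
draws = [bounded_mean_bernoulli(lambda n: p - 2.0**-n, lambda n: p + 2.0**-n)
         for _ in range(100_000)]
print(sum(draws) / len(draws))   # close to 0.5; about 3 bound checks per draw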
Therefore, in order for this approach to be efficient, it is necessary to have bounds $\{a_i\}$ and $\{b_i\}$ that approach each other rapidly.
2.6.1 Example: First passage time of Brownian motion
Standard Brownian motion $\{B_t\}_{t \ge 0}$ is a stochastic process that starts at 0 (so $B_0 = 0$), is continuous with probability 1, has independent increments (so for any $a < b < c < d$, $B_d - B_c$ and $B_b - B_a$ are independent), and normal increments (so for any $a < b$, $B_b - B_a \sim \mathsf{N}(0, b-a)$). See Chapter 10 for a formal definition and a more detailed description of the properties of Brownian motion.
Because the map t → B
t
is continuous with probability 1, it makes sense to de-
ne stopping times such as T
1
= inf{t : B
t
∈{1 , 1}}. Stopping times of this form
are known as a rst passage time. Simulation of T
1
arises in mathematical nance
applications when pricing barrier-options.
Lemma 2.7. The density of $T_1$ is given by the following alternating infinite series:
\[
g_{T_1}(t) = \frac{1}{\sqrt{2\pi t^3}}\, e^{-1/(2t)} + \sum_{j=1}^{\infty} \frac{(-1)^j}{\sqrt{2\pi t^3}}\, c_j,
\]
\[
c_j = (2j+1)\exp\!\left(-\frac{(2j+1)^2}{2t}\right) - (2j-1)\exp\!\left(-\frac{(2j-1)^2}{2t}\right).
\]
Burq and Jones [19] showed that
\[
g_{T_1}(t)/[c\, g_{\lambda,b}(t)] \le 1 \tag{2.5}
\]
for a gamma density $g_{\lambda,b}(x) = \lambda^b x^{b-1} e^{-\lambda x}/\Gamma(b)$ with $c = 1.25$, $b = 1.09$, $\lambda = 1.24$.
For $t \in [0,1]$, let $a_1(t) = (2\pi t^3)^{-1/2}\exp(-1/(2t))$. For $i \ge 1$, let
\[
b_i(t) = a_i(t) + (2\pi t^3)^{-1/2}(2i-1)\exp\!\left(-\frac{(2i-1)^2}{2t}\right)
\]
and
\[
a_{i+1}(t) = b_i(t) - (2\pi t^3)^{-1/2}(2i+1)\exp\!\left(-\frac{(2i+1)^2}{2t}\right).
\]
Since $\max_{x \ge 0} x\exp(-x^2/(2t))$ occurs at $x = \sqrt{t}$ and the function is strictly decreasing thereafter, for $2i - 1 > \sqrt{t}$ the sequences $a_i$ and $b_i$ become monotone and oscillate around the target value. Moreover, the difference $b_i - a_i$ goes down faster than exponentially, therefore only a constant number of steps (on average) are necessary to generate one draw from the distribution using Bounded mean Bernoulli.
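Putting the pieces together, here is a sketch of the resulting sampler (my illustration, not the Burq-Jones implementation): draw a candidate $t$ from the gamma proposal, then decide the $\operatorname{Bern}(g_{T_1}(t)/[c\,g_{\lambda,b}(t)])$ coin by refining partial sums of the series. Using consecutive partial sums as brackets for $g_{T_1}(t)$ once the terms are shrinking (in the spirit of the $2i-1 > \sqrt{t}$ condition above) is a simplifying assumption of this sketch, standing in for the $a_i, b_i$ schedule.

import math
import random

C, B, LAM = 1.25, 1.09, 1.24   # c, b, lambda from Burq and Jones [19]

def g_partial(t, J):
    """Partial sum of the series for g_{T_1}(t) through the c_J term."""
    d = (2 * math.pi * t**3) ** -0.5
    s = d * math.exp(-1 / (2 * t))
    for j in range(1, J + 1):
        c_j = ((2*j + 1) * math.exp(-(2*j + 1)**2 / (2*t))
               - (2*j - 1) * math.exp(-(2*j - 1)**2 / (2*t)))
        s += (-1)**j * d * c_j
    return s

def g_proposal(t):
    """Gamma density g_{lambda,b}(t) with shape B and rate LAM."""
    return LAM**B * t**(B - 1) * math.exp(-LAM * t) / math.gamma(B)

def sample_T1(rng=random):
    """AR for the first passage time T_1, refining series bounds as needed."""
    while True:
        t = rng.gammavariate(B, 1 / LAM)       # candidate from the proposal
        u = rng.random() * C * g_proposal(t)   # accept iff u <= g_{T_1}(t)
        j = int(math.sqrt(t)) + 2              # terms shrink from here on
        while True:
            lo, hi = sorted((g_partial(t, j), g_partial(t, j + 1)))
            if u <= lo:
                return t                       # certainly below the density
            if u > hi:
                break                          # certainly above: reject
            j += 1                             # bounds too loose: refine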
2.7 Simulation using Markov and Chernoff inequalities
In full generality, any upper bound on the size or number of a set of objects is equiv-
alent to some AR-style algorithm.
As an example of this, Bucklew [18] showed how Markov's inequality and Chernoff inequalities [23] could actually be viewed as AR algorithms.
2.7.1 Markov's inequality as an AR algorithm
Begin with Markov’s inequality.
Lemma 2.8 (Markov's Inequality). Consider a random variable $X$ that is nonnegative with probability 1 and has finite expectation $\mathbb{E}[X]$. Then for all $a > 0$,
\[
\mathbb{P}(X \ge a) \le \mathbb{E}[X]/a. \tag{2.6}
\]
From a simulation perspective, Markov's inequality allows us to sample from $[X \mid X \ge a]$. In other words, if $X$ has density $f_X(x)$, then the goal is to sample from the unnormalized density $f_X(x)\mathbb{1}(x \ge a)$. To use AR, use the unnormalized density $x f_X(x)$. Then $f_X(x)\mathbb{1}(x \ge a)/[x f_X(x)]$ is 0 when $x < a$, and $1/x$ when $x \ge a$, and so is never greater than $1/a$. This gives rise to the following algorithm.
AR Markovs Inequality   Input: density $f$, $a$   Output: $X \sim f$ given $X \ge a$
1) Repeat
2)   Draw $X$ from the unnormalized density $x f(x)$
3)   Draw $C \sim \operatorname{Bern}(a\mathbb{1}(X \ge a)/X)$
4) Until $C = 1$
Note the normalizing constant for $x f(x)$ is just $\int x f(x)\, \nu(dx) = \mathbb{E}[X]$. So the chance of accepting is just
\[
\int_{x \ge 0} \frac{x f_X(x)}{\mathbb{E}[X]} \cdot \frac{a \mathbb{1}(x \ge a)}{x}\, \nu(dx)
  = \int_{x \ge a} \frac{a f_X(x)}{\mathbb{E}[X]}\, \nu(dx)
  = \frac{a\,\mathbb{P}(X \ge a)}{\mathbb{E}[X]}. \tag{2.7}
\]
Since this chance of accepting must be at most 1, this fraction is at most 1, which proves Markov's inequality.
Using the ideas of Section 2.4, this can also then be used to build an $(\varepsilon, \delta)$-ras for $\mathbb{P}(X \ge a)$ in $O([\mathbb{E}[X]/\mathbb{P}(X \ge a)]\varepsilon^{-2}\ln(\delta^{-1}))$ samples.
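As a concrete instance (my sketch, not from the text): take $f$ to be the Exp(1) density, so $x f(x)$ is the Gamma(2,1) density and $\mathbb{E}[X] = 1$. The sampler below draws from $[X \mid X \ge a]$, and by (2.7) its acceptance chance is $a\mathbb{P}(X \ge a) = a e^{-a}$.

import random

def ar_markov_exponential(a, rng=random):
    """Sample [X | X >= a] for X ~ Exp(1) via the Markov-inequality AR scheme."""
    while True:
        x = rng.gammavariate(2.0, 1.0)       # density x f(x) = x e^{-x} = Gamma(2,1)
        if rng.random() < a * (x >= a) / x:  # C ~ Bern(a 1(x >= a)/x)
            return x

# By memorylessness, E[X | X >= a] = a + 1 for Exp(1); check with a = 2.
rng = random.Random(3)
draws = [ar_markov_exponential(2.0, rng) for _ in range(20_000)]
print(sum(draws) / len(draws))   # close to 3.0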