from different alleles. The order of the alleles does not matter. Hence if the two
alleles for a gene are denoted A and a, the genotype of the organism is either AA, Aa,
or aa. In Hardy-Weinberg equilibrium, the frequency of genotype AA will be the square
of the frequency of allele A in the population. The frequency of Aa will be twice
the frequency of A times the frequency of a, and that of aa will be the square of the
frequency of a in the population.
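The frequency rule above extends to any number of alleles: a homozygous genotype has
the squared allele frequency, and a heterozygous genotype has twice the product of the
two allele frequencies. The following is a minimal sketch of that rule; the function
name and the dictionary layout are assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def hardy_weinberg_genotype_frequencies(allele_freqs):
    """Map each unordered genotype (x, y) to its Hardy-Weinberg frequency.

    allele_freqs: dict from allele label to its population frequency
    (hypothetical input format; the frequencies must sum to 1).
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    # Unordered pairs, since the order of the two alleles does not matter.
    for x, y in combinations_with_replacement(sorted(allele_freqs), 2):
        if x == y:
            # Homozygous genotype: square of the allele frequency.
            freqs[(x, y)] = allele_freqs[x] ** 2
        else:
            # Heterozygous genotype: twice the product of the two frequencies.
            freqs[(x, y)] = 2 * allele_freqs[x] * allele_freqs[y]
    return freqs
```

For instance, calling the function with `{"A": 0.5, "a": 0.5}` gives frequency 0.25
for AA, 0.5 for Aa, and 0.25 for aa.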
Consider the problem of testing whether a population is in Hardy-Weinberg
equilibrium. For m alleles $A_1, \ldots, A_m$, the data consists of the number of
sampled individuals with genotype $A_i A_j$ for $1 \le i < j \le m$.
Given data table D, Guo and Thompson [43] suggested using the statistic S(D) =
P(X = D), where X is a random draw from tables of genetic data from populations
in Hardy-Weinberg equilibrium. When there are n individuals, this gives the statistic
\[
S(D) = \frac{n! \prod_{i=1}^{m} \left( \sum_{j=i+1}^{m} D_{ij} \right)!}
            {(2n)! \prod_{j>i} D_{ij}!} \; 2^{\sum_{j>i} D_{ij}}. \tag{2.4}
\]
The exact p-value is then P(S(Y) ≤ S(D)), where Y is a perfect draw from tables
under the Hardy-Weinberg equilibrium model. The Guo and Thompson method for
directly sampling from such tables was later improved in [59].
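The definition of the p-value, P(S(Y) ≤ S(D)), can be illustrated on a toy model
that is much simpler than genotype tables. In the sketch below, the model is an
assumed Binomial(10, 1/2) count, the statistic is the probability of the observed
value, and the p-value is computed both exactly by enumeration and by plain Monte
Carlo averaging (not GBAS). All function names are hypothetical.

```python
import math
import random


def binomial_pmf(k, n=10, p=0.5):
    """Probability that a Binomial(n, p) count equals k (toy model only)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def sample_binomial(rng, n=10, p=0.5):
    """Draw one Binomial(n, p) count as a sum of n coin flips."""
    return sum(rng.random() < p for _ in range(n))


def exact_p_value(statistic, support, pmf, data):
    """P(S(Y) <= S(data)), summing the model pmf over a finite support."""
    s_data = statistic(data)
    return sum(pmf(y) for y in support if statistic(y) <= s_data)


def monte_carlo_p_value(statistic, sampler, data, num_draws, rng):
    """Fraction of independent model draws Y with S(Y) <= S(data)."""
    s_data = statistic(data)
    hits = sum(statistic(sampler(rng)) <= s_data for _ in range(num_draws))
    return hits / num_draws
```

With observed value 9, the outcomes no more probable than the data are 0, 1, 9, and
10, so `exact_p_value(binomial_pmf, range(11), binomial_pmf, 9)` equals 22/1024. For
genotype tables, enumeration is not feasible, which is why draws from the model are
used instead.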
2.5.1.2 Testing for differential gene expression
A related task is to test for how gene expression changes under environmental con-
ditions. Each gene encodes proteins that are produced by the cell: the amount of
protein produced can be affected by the environment of the cell. Consider an experi-
ment (such as in [122]) where the expression rate of a gene is measured conditioned
on different environmental factors.
The data is a matrix where $K_{ij}$ is the number of experiments under condition j
that resulted in expression of gene i. If the gene is equally likely to be expressed
under all conditions, then (as in the Hardy-Weinberg example) it is easy to generate
data tables in linear time.
The test statistic is once again the probability of the table being generated, and the
p-value is the probability that a randomly drawn table has a test statistic at most equal
to the test statistic applied to the data. Once again, GBAS can be used to estimate this
p-value with exact confidence intervals.
2.6 Approximate densities for perfect simulation
The key step in AR for sampling from density f using density g is when, for X ∼ g,
the C ∼ Bern(f(X)/[cg(X)]) coin is flipped. However, it is not necessary to know
f(X)/[cg(X)] exactly in order to flip the C coin.
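As a reminder of where the coin appears, here is a minimal sketch of basic AR in
which f(X)/[cg(X)] is known exactly. The target density 6x(1 − x) on [0, 1], the
uniform proposal, and the constant c = 1.5 are assumptions chosen for illustration,
as are all function names.

```python
import random


def acceptance_rejection(f, g_density, g_sampler, c, rng):
    """Draw one sample from density f using proposals from density g.

    Requires f(x) <= c * g_density(x) wherever g_density(x) > 0.
    """
    while True:
        x = g_sampler(rng)   # X ~ g
        u = rng.random()     # U ~ Unif([0, 1))
        # C = 1(U <= f(X) / [c g(X)]) is the Bernoulli coin.
        if u <= f(x) / (c * g_density(x)):
            return x


def toy_target(x):
    """Assumed example target: the density 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def draw_toy_target(rng):
    """AR with a Unif([0, 1]) proposal; max of the target is 1.5, so c = 1.5."""
    return acceptance_rejection(
        toy_target, lambda x: 1.0, lambda r: r.random(), 1.5, rng
    )
```

Here each proposal is accepted with probability 1/c = 2/3 on average.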
To draw $C \sim \mathrm{Bern}(p)$, draw $U \sim \mathrm{Unif}([0,1])$, and let
$C = \mathbf{1}(U \le p)$. Now suppose that for all n, $a_n \le p \le b_n$. Then it
is not necessary to know p exactly in order to determine if $U \le p$.
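A minimal sketch of this idea follows, for the assumed example p = exp(−1). The
partial sums of the alternating series for exp(−1) supply lower bounds a_n (sums
ending on a negative term) and upper bounds b_n (sums ending on a positive term).
The function names are hypothetical, and floating point arithmetic is used for
brevity, so the bounds are exact only up to rounding.

```python
import random


def exp_minus_one_bounds():
    """Yield pairs (a_n, b_n) with a_n <= exp(-1) <= b_n, tightening with n.

    Uses partial sums of sum_k (-1)^k / k!, which alternate around the limit.
    """
    k, term, total = 0, 1.0, 1.0
    while True:
        k += 1
        term = -term / k      # odd k: negative term, sum drops below the limit
        total += term
        lower = total
        k += 1
        term = -term / k      # even k: positive term, sum rises above the limit
        total += term
        upper = total
        yield lower, upper


def bernoulli_from_bounds(bounds, u):
    """Return 1(u <= p) using only the bounds a_n <= p <= b_n."""
    for a_n, b_n in bounds:
        if u <= a_n:          # then u <= p for certain
            return 1
        if u > b_n:           # then u > p for certain
            return 0


def draw_bernoulli_exp_minus_one(rng):
    """Flip a coin with success probability exp(-1) without evaluating exp."""
    return bernoulli_from_bounds(exp_minus_one_bounds(), rng.random())
```

For most values of U only the first one or two pairs of bounds are needed; more
terms are computed only when U falls close to p.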
To be precise, suppose that $a_1 \le a_2 \le a_3 \le \cdots$ and
$b_1 \ge b_2 \ge b_3 \ge \cdots$ have $\lim a_i = \lim b_i = p$. Then draw
$U \sim \mathrm{Unif}([0,1])$. If $U \le \lim a_i = p$, then $C = 1$, if