Chapter 9
Advanced Acceptance/Rejection
Happiness can exist only in acceptance.
George Orwell
Chapter 2 presented the acceptance/rejection (AR) protocol, which runs into trouble when naively applied to permutation problems such as estimating the permanent, and to high-dimensional Markov random field problems such as sampling from the Ising model. This is because typically the number of combinatorial objects in a set grows exponentially in the dimension. In this chapter three ideas are introduced to deal with this problem: sequential acceptance/rejection (SAR), partially recursive acceptance/rejection (PRAR), and popping. Together with the ability to preprocess a given problem instance, SAR gave the first perfect simulation algorithms for estimating the permanent that have provably polynomial running time over a nontrivial subclass of nonnegative matrices. PRAR allows the acceptance/rejection approach to be applied to Markov random fields such as the Ising or autonormal model without involving Markov chains in any way. Popping allows the pieces of the sample that are blocking acceptance to be pruned away, giving a perfect simulation algorithm for sampling from the rooted trees of a graph.
9.1 Sequential acceptance/rejection
Theoretically, AR can be used on any finite set whose size has a known upper bound. For instance, if $A$ is a finite set with $\#A \leq M$, simply draw $U \sim \text{Unif}([0,M])$ until $U \in (0, \#A]$. If the elements of $A$ are numbered, then return element $\lceil U \rceil$.
Unfortunately, numbering the elements of $A$ is not always possible in practice, especially when the upper bound comes from a complex argument rather than a simple assignment of numbers to objects.
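As a quick illustration, here is a minimal Python sketch of this scheme (the element list and the bound M below are invented for the example; a discrete index draw replaces the continuous uniform, which changes nothing):

import random

def ar_uniform(elements, M):
    # Draw uniformly from a numbered finite set with #A <= M by rejection:
    # propose an index in {1, ..., M} until it lands in {1, ..., #A}.
    n = len(elements)
    while True:
        u = random.randint(1, M)    # proposal from the superset of size M
        if u <= n:                  # accept: u falls in (0, #A]
            return elements[u - 1]

# Example: three labels, with only the crude bound M = 10 known.
print(ar_uniform(["a", "b", "c"], 10))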
Sequential acceptance/rejection (SAR) can be employed when an upper bound on the partition function can be shown using either induction or strong induction.
For instance, suppose that the goal is to sample from an unnormalized density on the set of permutations of $n$ objects, so $P(X = \sigma) = w(\sigma) / \sum_{\sigma' \in S_n} w(\sigma')$, where the normalizing constant $\sum_{\sigma \in S_n} w(\sigma)$ is difficult to compute.
When $X(1)$ is assigned, the remaining components $X(2), \ldots, X(n)$ form a permutation of the elements $\{1, 2, \ldots, n\} \setminus \{X(1)\}$. A problem where assigning one component leaves a subset of objects of the same type is called self-reducible, and this
is a key property for using random samples from an unweighted density in order to
estimate the partition function (see [72] for details).
For convenience, SAR will be described for discrete sets, but this method applies equally well to continuous problems. For a nonnegative function $w$, let $Z(A) = \sum_{x \in A} w(x)$, where each $x \in C^n$ is a vector in an $n$-dimensional space whose components are drawn from $C$.
Let $i \in \{1, \ldots, n\}$. If for all $x \in A$ with $w(x) > 0$, $x(i) = a_i$, say that the $i$th component of $x$ is fixed; otherwise it is free.
Lemma 9.1. Let $M(A)$ be a function with the following properties.
• If $A = \{x\}$, then $M(A) \geq w(x)$.
• For any $A$ with free dimension $i$, and $c \in C$, let $A_c = \{x : x(i) = c\}$. Then
$$M(A) \geq \sum_{c \in C} M(A \cap A_c). \qquad (9.1)$$
Then $M(A) \geq Z(A) = \sum_{x \in A} w(x)$.
Proof. Use induction on the number of free dimensions in $A$. If $A$ has no free dimensions, then $A = \{x\}$, and $M(A) \geq w(x) = Z(A)$, so the base case is done.
The induction hypothesis is that the result applies when $A$ has $k$ or fewer free dimensions; now consider $A$ with $k + 1$ free dimensions. Choose any one of the free dimensions. The key observation is that $Z(A) = \sum_{c \in C} Z(A \cap A_c)$. Hence
$$M(A) \geq \sum_{c \in C} M(A \cap A_c) \geq \sum_{c \in C} Z(A \cap A_c) = Z(A). \qquad (9.2)$$
[Note it is necessary to use a strong induction hypothesis here and suppose the technique works for $k$ or fewer free dimensions. This is because for certain $w$ functions, fixing one dimension might also effectively fix other dimensions as well!]
SAR gives samples from the unnormalized density $w(x)$ given the ability to compute an $M(A)$ with the properties in Lemma 9.1. As before, let $C = \{c_1, \ldots, c_k\}$ denote the set of possible labels for any $x(i)$.
SAR   Input: A,   Output: X

1)  A_0 ← A
2)  While #A > 1
3)      Let i be any free dimension of A
4)      For all j ∈ {1,...,k}, A_j ← {x ∈ A : x(i) = c_j}
5)      Draw I so P(I = j) = M(A_j)/M(A), P(I = 0) = 1 − ∑_j M(A_j)/M(A)
6)      If I = 0 then X ← SAR(A_0) and return X
7)      Else A ← A_I
8)  Draw B ← Bern(w(x)/M({x})), where x is the unique element of A
9)  If B = 0 then X ← SAR(A_0), return X
10) Else let X be the only element of A
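To make the control flow concrete, here is a minimal Python sketch of SAR. It is an assumption-laden illustration: the set A is stored as an explicit list of tuples, and the demo's w, M, and label set are invented for the example; in real applications the sets A_j are never enumerated and M(A_j) is computed in closed form. The recursive restarts are written as a loop, which keeps Python's recursion limit out of play.

import random
from itertools import product

def attempt(A0, w, M, values):
    # One pass of lines 2 through 9: returns an accepted x, or None on rejection.
    A = A0
    while len(A) > 1:
        # line 3: any free dimension (a coordinate on which A disagrees)
        i = next(d for d in range(len(A[0])) if len({x[d] for x in A}) > 1)
        u = random.uniform(0, M(A))      # line 5: one uniform draw decides I
        cum, chosen = 0.0, None
        for c in values:
            S = [x for x in A if x[i] == c]   # line 4: the subset A_j
            cum += M(S)
            if u <= cum and S:
                chosen = S
                break
        if chosen is None:               # I = 0, probability 1 - sum_j M(A_j)/M(A)
            return None
        A = chosen                       # line 7
    x = A[0]
    # line 8: accept the lone element with probability w(x)/M({x})
    return x if random.random() <= w(x) / M([x]) else None

def sar(A0, w, M, values):
    # The recursive calls of lines 6 and 9, written as restarts.
    while True:
        x = attempt(A0, w, M, values)
        if x is not None:
            return x

# Demo with the trivial bound M(A) = #A, valid whenever w maps into [0, 1];
# here w is an arbitrary density on binary vectors of length 3.
A0 = list(product([0, 1], repeat=3))
w = lambda x: 0.5 ** sum(x)
M = lambda A: float(len(A))
print(sar(A0, w, M, values=[0, 1]))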
Before showing that the overall algorithm works, it helps to have a preliminary
understanding of what happens when lines 2 through 9 complete without recursively
calling SAR.
Lemma 9.2. Let $E_x$ denote the event that SAR reaches line 10 with $A = \{x\}$. Then $P(E_x) = w(x)/M(A_0)$.
Proof. The proof is by induction on the size of $A$ at line 2. When $A$ has one element, then lines 8 and 9 make the lemma true. Suppose that the lemma holds for $\#A \leq n$; now suppose $\#A = n + 1$.
Let $x \in A$. Then $x \in A_j$ for some $j \in \{1, \ldots, k\}$. So $P(E_x)$ is the probability that set $A_j$ is chosen times the probability that $x$ is chosen uniquely from $A_j$. This second factor is $w(x)/M(A_j)$ by the strong induction hypothesis. Hence $P(E_x) = [M(A_j)/M(A)][w(x)/M(A_j)] = w(x)/M(A)$, and the induction is complete.
Lemma 9.3. Suppose there exists at least one $x$ with $w(x) > 0$. The output $X$ is a draw from the unnormalized density $w$. The expected running time is $M(A)/Z(A)$ times the number of free dimensions in $A$.
Proof. Conditioned on acceptance occurring (so no recursive calls to SAR are made), $P(X = x \mid \text{accept}) \propto w(x)$. Our AR theory then says that $P(X = x) \propto w(x)$. The chance of accepting is $\sum_x w(x)/M(A) = Z(A)/M(A)$. The running time result then follows from the basic AR theory.
As with all AR-type algorithms, if $n$ is the number of attempts needed to obtain a single sample, then $M(A)/n$ provides an immediate estimate of $Z(A)$; alternatively, the algorithm can be used as part of a GBAS approximation (see Section 2.5).
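For instance, a minimal sketch of this estimator, reusing the attempt routine from the SAR sketch above (the wrapper itself is our own illustration, not part of the text's algorithm):

def estimate_Z(A0, w, M, values, trials=10_000):
    # Each attempt accepts with probability Z(A0)/M(A0), so M(A0) times
    # the observed acceptance rate is an unbiased estimate of Z(A0).
    accepts = sum(attempt(A0, w, M, values) is not None for _ in range(trials))
    return M(A0) * accepts / trials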
9.2 Application: approximating permanents
In Section 1.6.2 the permanent of a matrix $A = (A(i,j))$ was given as
$$\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} A(i, \sigma(i)). \qquad (9.3)$$
This should not be confused with the determinant of a matrix, defined as
$$\det(A) = \sum_{\sigma \in S_n} (-1)^{\mathrm{sign}(\sigma)} \prod_{i=1}^{n} A(i, \sigma(i)). \qquad (9.4)$$
While the problems appear similar on the surface, the determinant of a matrix can be computed using (to the same order) the time needed to perform matrix multiplication, whereas finding the permanent of a matrix is a #P-complete problem [124].
When $A$ is a nonnegative matrix, the permanent is the normalizing constant of the unnormalized density on permutations
$$w(\sigma) = \prod_{i=1}^{n} A(i, \sigma(i)). \qquad (9.5)$$
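To make (9.3) and (9.5) concrete, here is a brute-force evaluation in Python (usable only for tiny n, since it enumerates all n! permutations; the function names are ours):

from itertools import permutations

def weight(A, sigma):
    # w(sigma) = prod_i A(i, sigma(i)), with rows and columns indexed from 0
    p = 1.0
    for i, j in enumerate(sigma):
        p *= A[i][j]
    return p

def permanent(A):
    # per(A) = sum of w(sigma) over all permutations: O(n * n!) time
    n = len(A)
    return sum(weight(A, sigma) for sigma in permutations(range(n)))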
This problem has been studied extensively. A Markov chain for this distribution
was shown to have polynomial running time in 2001 ([71, 73]) but the large constants
involved in this result (and subsequent improvements [12]) make it impractical for
applications.
The techniques of this section not only generate samples from the distribution
given by w, they also immediately give an approximation algorithm for per(A) with-
out any extra work. These methods are provably polynomial when the matrix is
dense, with small constants.
9.2.1 Basic AR
Suppose that $A'$ is formed from $A$ by multiplying a row of $A$ by a constant $k$. Since each term in $\mathrm{per}(A)$ will then be multiplied by the same $k$, $\mathrm{per}(A') = k\,\mathrm{per}(A)$. So if $B$ is formed by dividing each row of $A$ by its largest element,
$$\mathrm{per}(B) = \left[ \prod_{i=1}^{n} \max_j A(i,j) \right]^{-1} \mathrm{per}(A), \qquad (9.6)$$
and the weights for $B$ give the same distribution as the weights for $A$. The difference is that the elements of $B$ now fall into $[0,1]$.
To use the basic AR approach, draw $\sigma$ uniformly from $S_n$, the set of all permutations. Then accept $\sigma$ as a draw from the weights associated with $A$ with probability $\prod_{i=1}^{n} B(i, \sigma(i))$.
The expected number of samples required is
$$n! / \mathrm{per}(B). \qquad (9.7)$$
Unfortunately, this can easily be exponential in $n$. For instance, if $A$ is the identity matrix, so is $B$, and $\mathrm{per}(B) = 1$, so on average $n!$ samples are required to generate a single sample using basic AR.
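A minimal sketch of this basic AR sampler (the function name is ours, and it assumes every row of A has a positive entry, so the row maxima are nonzero):

import random

def basic_ar(A):
    # Sample sigma with probability proportional to w(sigma) = prod_i A(i, sigma(i)).
    n = len(A)
    B = [[A[i][j] / max(A[i]) for j in range(n)] for i in range(n)]  # entries in [0,1]
    while True:
        sigma = list(range(n))
        random.shuffle(sigma)            # uniform proposal from S_n
        p = 1.0
        for i in range(n):
            p *= B[i][sigma[i]]          # acceptance probability prod_i B(i, sigma(i))
        if random.random() <= p:
            return tuple(sigma)

As noted above, the expected number of proposals is n!/per(B), so this sketch is practical only when B has many entries close to 1.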
9.2.2 Sequential AR
Sequential acceptance/rejection can be used with better bounds on the permanent to
greatly improve upon basic AR for estimating the permanent.
First consider the case when the entries of the matrix $(a_{ij})$ are either 0 or 1. In this case, $w(\sigma) = 1$ if and only if $a_{i,\sigma(i)} = 1$ for all $i$. Therefore for each $i$, there is a set $D_i$ into which $\sigma(i)$ must fall to obtain positive weight. Then the goal is to draw uniformly from the set of permutations such that $\sigma(i) \in D_i$ for all $i$.
For example, with the matrix
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad (9.8)$$
$D_1 = \{1,2\}$, $D_2 = \{2,3\}$, and $D_3 = \{1,3\}$. The permanent of this matrix is 2, since $(1,2,3)$ and $(2,3,1)$ are the only two permutations that have $\sigma(i) \in D_i$ for all $i$.
In the first row, it is possible to have $\sigma(1) = 1$ or $\sigma(1) = 2$. For a matrix $A$, let $A_{i,j}$ be the matrix where $\sigma(i) = j$ is forced, so the entry $A(i,j)$ is unchanged, and the other entries in row $i$ are changed to 0. Then each of these choices gives rise to a subproblem:
$$A_{1,1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \quad A_{1,2} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \quad A_{1,3} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}. \qquad (9.9)$$
The set of permutations allowed by $A$ consists of the union of the set of allowed permutations of $A_{1,1}$, the set of allowed permutations of $A_{1,2}$, and the set of allowed permutations of $A_{1,3}$. These last three sets are disjoint. So $\mathrm{per}(A) = \mathrm{per}(A_{1,1}) + \mathrm{per}(A_{1,2}) + \mathrm{per}(A_{1,3})$.
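Continuing the small example, the decomposition can be checked numerically (reusing A and permanent from the snippets above; the subproblem matrices are those of (9.9)):

A11 = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
A12 = [[0, 1, 0], [0, 1, 1], [1, 0, 1]]
A13 = [[0, 0, 0], [0, 1, 1], [1, 0, 1]]
# per(A_{1,1}) + per(A_{1,2}) + per(A_{1,3}) = 1 + 1 + 0 = 2 = per(A)
assert permanent(A11) + permanent(A12) + permanent(A13) == permanent(A)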
In order to use SAR on this problem, it is necessary to have a bound $\mathrm{per}(B) \leq M(B)$ that can be proved inductively, so $M(B) \geq M(B_1) + \cdots + M(B_n)$. The most basic approach uses a very simple bound on the permanent.
Lemma 9.4. Let $r_i = \#D_i$ be the sum of the elements of row $i$ of the matrix $A$. Define
$$M_1(A) = \prod_{i=1}^{n} r_i. \qquad (9.10)$$
Then for any row $i$,
$$M_1(A_{i,1}) + M_1(A_{i,2}) + \cdots + M_1(A_{i,n}) \leq M_1(A), \qquad (9.11)$$
and $\mathrm{per}(A) \leq M_1(A)$.
Proof. Prove (9.11) first. Note that if $A(i,j) = 0$, then $M_1(A_{i,j}) = 0$ as well, so (9.11) is really just
$$M_1(A) \geq \sum_{j: A(i,j)=1} M_1(A_{i,j}). \qquad (9.12)$$
But for $j$ such that $A(i,j) = 1$, $M_1(A_{i,j}) = M_1(A)/r_i$. Hence
$$\sum_{j: A(i,j)=1} M_1(A_{i,j}) = \sum_{j: A(i,j)=1} \frac{M_1(A)}{r_i} = M_1(A), \qquad (9.13)$$
which completes the proof of (9.11).
Now show $\mathrm{per}(A) \leq M_1(A)$ by induction on the number of rows $i$ with $r_i > 1$. If $r_i = 0$ or $r_i = 1$ for all $i$ then the result is true, so the base case holds.
Let $i$ be any row with $r_i > 1$. Then $Z(A) = \sum_{j: A(i,j)=1} Z(A_{i,j})$, and by induction $Z(A_{i,j}) \leq M_1(A_{i,j})$. So
$$Z(A) \leq \sum_{j: A(i,j)=1} M_1(A_{i,j}) = M_1(A), \qquad (9.14)$$
which finishes the proof.
Therefore the conditions to apply SAR are met. Since the $M_1(A_{i,j})$ are equal and sum to $M_1(A)$ over all $j$ such that $A(i,j) = 1$, at each step choose uniformly from this set of $j$ to be the value for $\sigma(i)$.
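A sketch of the resulting algorithm for 0-1 matrices under the $M_1$ bound (our own rendering, not verbatim from the text): each row $i$ picks a column uniformly from $D_i$, since $M_1(A_{i,j})/M_1(A) = 1/r_i$ for each allowed $j$; if the choices collide, the final assignment is not a permutation and has weight 0, so the concluding Bernoulli step rejects and the algorithm restarts. By Lemma 9.3 the acceptance probability is $\mathrm{per}(A)/M_1(A)$, which for the matrix of (9.8) is $2/(2 \cdot 2 \cdot 2) = 1/4$.

import random

def sar_permanent_01(D):
    # D[i] is the set of allowed columns for row i (1-based, as in the text).
    n = len(D)
    while True:
        sigma = [random.choice(sorted(D[i])) for i in range(n)]  # uniform over D_i
        if len(set(sigma)) == n:    # sigma is a permutation, so w(sigma) = 1: accept
            return tuple(sigma)
        # otherwise some column was used twice: w = 0, reject and restart

# The example of (9.8): returns (1,2,3) or (2,3,1) with equal probability.
print(sar_permanent_01([{1, 2}, {2, 3}, {1, 3}]))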