Chapter 9: Advanced Acceptance/Rejection (3/5)


APPLICATION: APPROXIMATING PERMANENTS 171

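Restricted permutations generalized Bregman bound

Input: $A$    Output: $\sigma$, $n$

1) $n \leftarrow 0$
2) Repeat
3)   $n \leftarrow n + 1$, $\sigma_{\mathrm{inv}} \leftarrow (0, 0, \ldots, 0)$, $j \leftarrow 0$
4)   Repeat
5)     $j \leftarrow j + 1$, $C_j \leftarrow \{i : A(i,j) > 0\}$
6)     If $C_j = \{i\}$, then $\sigma_{\mathrm{inv}}(j) \leftarrow i$
7)     Else if $\exists\, i \in C_j$ where $A(i,j) = r_i$, then $\sigma_{\mathrm{inv}}(j) \leftarrow i$
8)     Else
9)       Calculate $M(A)$
10)      For all $i \in C_j$, calculate $M(f(A,i,j))$
11)      Draw $U$ uniformly from $[0, M(A)]$
12)      If $U > \sum_{i' \in C_j} M(f(A,i',j))$, then $\sigma_{\mathrm{inv}} \leftarrow$ reject
13)      Else let $\sigma_{\mathrm{inv}}(j) \leftarrow \min\{i : \sum_{i' \in C_j,\, i' \le i} M(f(A,i',j)) > U\}$
14)  Until $\sigma_{\mathrm{inv}} =$ reject or $j$ equals the number of columns of $A$
15) Until $\sigma_{\mathrm{inv}} \ne$ reject
16) Return $\sigma_{\mathrm{inv}}^{-1}$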
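Steps 11 through 13 of the procedure amount to choosing a row with probability proportional to its weight, with any leftover mass sending the whole trial to rejection. The following is a minimal Python sketch of that single step only. The function name `choose_row`, the input format, and the option of passing the uniform draw `u` explicitly are illustrative assumptions; the weights stand in for the bounds computed in step 10 and are not computed here.

```python
import random


def choose_row(candidates, weights, bound, u=None):
    """Pick a row index for the current column, or return None to reject.

    candidates: row indices in increasing order (hypothetical input format)
    weights:    weights[k] is the bound computed for candidates[k]
    bound:      bound for the whole matrix, assumed to be >= sum(weights)
    u:          optional uniform draw on [0, bound], handy for testing
    """
    if u is None:
        u = random.uniform(0.0, bound)
    if u > sum(weights):
        return None  # leftover mass: the whole trial is rejected
    running = 0.0
    for i, w in zip(candidates, weights):
        running += w
        if running > u:
            return i  # smallest i whose cumulative weight exceeds u
    return None  # only reached when u equals the total weight exactly


print(choose_row([2, 5, 7], [1.0, 2.0, 3.0], 10.0, u=0.5))
print(choose_row([2, 5, 7], [1.0, 2.0, 3.0], 10.0, u=6.5))
```

Row `i` is chosen with probability `weights[k] / bound`, and the trial rejects with probability `1 - sum(weights) / bound`, which is why the bound must dominate the sum of the weights.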

9.2.5 Lower bounds on permanents

To give an upper bound on the running time of the algorithm, it is necessary to show

that M(A)/per(A) is small. This requires a lower bound on the permanent.

In the 1920s, Van der Waerden conjectured the following lower bound. This conjecture was independently shown to be true by Falikman [34] and Egorychev [33].

Theorem 9.2 (Falikman-Egorychev Theorem). Let A be an n-by-n nonnegative matrix whose row and column sums all equal 1. Then $\mathrm{per}(A) \ge n!/n^n$.

A matrix whose row and column sums are 1 is called doubly stochastic. The theorem states that the doubly stochastic matrix with the smallest value of the permanent is the one where every entry in the matrix is 1/n. That is, the mass of the entries is spread out as far as possible.
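For small matrices the bound can be checked by brute force. The sketch below sums over all permutations, so it is only practical for small n; the helper name `permanent` is an illustrative choice. It compares the matrix with every entry 1/n, which attains the bound n!/n^n, against the identity matrix, which is also doubly stochastic but has permanent 1.

```python
from itertools import permutations
from math import factorial


def permanent(a):
    """Permanent by direct summation over all n! permutations (small n only)."""
    n = len(a)
    total = 0.0
    for perm in permutations(range(n)):
        term = 1.0
        for i, j in enumerate(perm):
            term *= a[i][j]
        total += term
    return total


n = 4
uniform = [[1.0 / n] * n for _ in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
lower_bound = factorial(n) / n ** n

print(permanent(uniform), lower_bound)  # the uniform matrix attains the bound
print(permanent(identity))              # the identity is far above it
```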

In [52], it was considered how to use this theorem to bound the running time of SAR using the generalized Bregman bound.

Deﬁnition 9.2. Say that a matrix is regular if all the row and column sums are the

same.

For an n-by-n matrix A that is regular with common row sum r, dividing each row by r gives a doubly stochastic matrix. Hence $\mathrm{per}(A) \ge r^n\, n!/n^n$. This gave rise to the following result.

Lemma 9.8. Suppose that each row and column sum of a {0,1} matrix A equals $\gamma n$ for $\gamma \in [1/n, 1]$. Then Restricted permutations generalized Bregman bound(A) takes on average at most $O(n^{1.5+0.5/\gamma})$ steps.

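Proof. From our earlier AR theory, with $r = \gamma n$ the common row sum, the number of trials will be bounded above by
\[ \prod_{i=1}^{n} h(r) \Big/ \left[r^n\, n!/n^n\right] = (n^n/n!)\,(h(r)/r)^n. \]
By Stirling's inequality, $n! \ge (n/e)^n \sqrt{2\pi n}$. Hence $n^n/n! \le e^n (2\pi n)^{-1/2}$. Since $r \ge 1$, $h(r) = e^{-1}[r + (1/2)\ln(r) + \exp(1) - 1]$, so
\[ \frac{h(r)}{r} = e^{-1}\left[1 + \frac{(1/2)\ln(r) + e - 1}{r}\right] \le e^{-1}\exp\left([(1/2)\ln(r) + e - 1]/r\right). \]
Therefore, using $n/r = 1/\gamma$,
\[ \frac{n^n}{n!}\left(\frac{h(r)}{r}\right)^n \le \frac{e^n}{\sqrt{2\pi n}} \cdot e^{-n}\exp\left(((1/2)\ln(r) + e - 1)/\gamma\right) = (2\pi n)^{-1/2}(\gamma n)^{0.5/\gamma} e^{(e-1)/\gamma} = O(n^{-0.5+0.5/\gamma}). \]
Each step requires at most $n^2$ operations to compute the bounds, making the running time (on average) at most $O(n^{1.5+0.5/\gamma})$ steps.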
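The chain of inequalities in the proof can be sanity-checked numerically. The sketch below assumes the row factor h(r) = e^{-1}[r + (1/2)ln(r) + e - 1] for r >= 1 and writes gamma for the fraction r/n; the function names are illustrative. It evaluates the trial bound (n^n/n!)(h(r)/r)^n directly and compares it with the closed form (2 pi n)^{-1/2} (gamma n)^{0.5/gamma} e^{(e-1)/gamma}.

```python
from math import e, exp, factorial, log, pi


def h(r):
    """Row factor of the bound for r >= 1, as used in the proof."""
    return (r + 0.5 * log(r) + e - 1.0) / e


def exact_ratio(n, gamma):
    """(n^n / n!) * (h(r) / r)^n with r = gamma * n."""
    r = gamma * n
    return (n ** n / factorial(n)) * (h(r) / r) ** n


def closed_form(n, gamma):
    """(2 pi n)^(-1/2) * (gamma n)^(0.5 / gamma) * e^((e - 1) / gamma)."""
    return (2 * pi * n) ** -0.5 * (gamma * n) ** (0.5 / gamma) * exp((e - 1.0) / gamma)


for n in (10, 20, 40):
    for gamma in (0.5, 1.0):
        print(n, gamma, exact_ratio(n, gamma), closed_form(n, gamma))
```

In every printed row the first value should not exceed the second, since the closed form is obtained from it using only Stirling's inequality and 1 + x <= e^x.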

9.2.6 Sinkhorn balancing

So how does the lower bound help when the matrix is not regular? Recall that if

a row (or column) of the matrix is multiplied by a constant, then the permanent is

multiplied by the same constant.

Therefore, by dividing each row by its sum, it is possible to make a matrix whose row sums are all 1, and whose permanent is related in a known way to that of the original matrix.

However, the column sums might not be 1. So divide the columns by their sums. This might have messed up the row sums, however. So next divide the rows by their sums. Continue moving back and forth between rows and columns until the matrix is almost regular. This procedure is known as Sinkhorn balancing [116].

More precisely, if X and Y are diagonal matrices, then XAY is a scaling of A

where per(A)=per(XAY)/[per(X)per(Y )]. From the deﬁnition, the permanent of a

diagonal matrix is just the product of its entries. By keeping track of the scaling done

to rows and columns at each step, when Sinkhorn balancing is ended the matrices X

and Y will be known.
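The scaling identity is easy to confirm on a small example. In the sketch below the matrix and the diagonal entries are arbitrary illustrative values, and `permanent` is a brute-force helper suitable only for small n. The permanent of a diagonal matrix is the product of its diagonal, so per(X) and per(Y) are computed with `math.prod`.

```python
from itertools import permutations
from math import prod


def permanent(a):
    """Permanent by direct summation over all permutations (small n only)."""
    n = len(a)
    return sum(
        prod(a[i][perm[i]] for i in range(n)) for perm in permutations(range(n))
    )


a = [[1.0, 2.0, 0.0],
     [0.5, 1.0, 3.0],
     [2.0, 0.0, 1.0]]
x = [2.0, 0.5, 4.0]   # diagonal of X: row scalings (arbitrary values)
y = [0.25, 3.0, 1.0]  # diagonal of Y: column scalings (arbitrary values)

# Entry (i, j) of XAY is x(i) * A(i, j) * y(j).
scaled = [[x[i] * a[i][j] * y[j] for j in range(3)] for i in range(3)]

recovered = permanent(scaled) / (prod(x) * prod(y))
print(recovered, permanent(a))
```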

The question then becomes: how many steps must be taken before the resulting

matrix is close to balanced? To answer this question, it is necessary to have a notion

of how close the matrix is to being doubly stochastic.

Definition 9.3. Say that diagonal matrices X and Y scale A to accuracy $\alpha$ if both the row and column sums of XAY all fall in $[1 - \alpha, 1 + \alpha]$.

In [75], it was shown that the ellipsoid method could be used to obtain such a scaling in polynomial time. While a useful theoretical result, the ellipsoid method is very complex to code and can suffer from numerical instability.

In practice, simply employing the method of rescaling the rows, then the columns, then the rows again, and so on until accuracy $\alpha$ is reached, is much faster. It can be shown to require only $O((\alpha^{-1} + \ln(n))\sqrt{n}\,\log(1/\alpha))$ steps, and each step takes $\Theta(n^2)$ time. This method is called Sinkhorn balancing.

Sinkhorn balancing

Input: $A$, $\alpha$    Output: $A$, $s$

1) $s \leftarrow 1$
2) Repeat
3)   Let $r_i$ be the sum of row $i$ of $A$
4)   Divide row $i$ of $A$ by $r_i$, multiply $s$ by $r_i$
5)   Let $c_i$ be the sum of column $i$ of $A$
6)   Divide column $i$ of $A$ by $c_i$, multiply $s$ by $c_i$
7) Until $\max_i \{|r_i - 1|, |c_i - 1|\} \le \alpha$
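The balancing loop translates almost line for line into code. The following pure-Python sketch assumes a square matrix given as a list of lists with strictly positive row and column sums; the function name `sinkhorn_balance` and the `max_iters` safety cap are additions for illustration and are not part of the procedure above. The returned scalar `s` is the product of all divisors, so the permanent of the input equals `s` times the permanent of the balanced matrix.

```python
def sinkhorn_balance(a, alpha, max_iters=100000):
    """Alternate row and column normalization until all sums are within alpha of 1."""
    b = [row[:] for row in a]  # work on a copy
    n = len(b)
    s = 1.0
    for _ in range(max_iters):
        worst = 0.0
        for i in range(n):
            r = sum(b[i])                       # row sum before rescaling
            worst = max(worst, abs(r - 1.0))
            b[i] = [v / r for v in b[i]]
            s *= r
        for j in range(n):
            c = sum(b[i][j] for i in range(n))  # column sum before rescaling
            worst = max(worst, abs(c - 1.0))
            for i in range(n):
                b[i][j] /= c
            s *= c
        if worst <= alpha:
            return b, s
    raise RuntimeError("did not reach the requested accuracy")


balanced, s = sinkhorn_balance([[1.0, 2.0], [3.0, 4.0]], 1e-6)
print(balanced, s)
```

For the 2-by-2 input the permanent is 1*4 + 2*3 = 10, so `s` times the permanent of `balanced` should reproduce 10 up to rounding.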

Now for matrices B with row and column sums exactly 1, $\mathrm{per}(B) \ge n!/n^n$. When they are only 1 within additive accuracy $\alpha$, the following holds.

Lemma 9.9. Let $B = XAY$ be A scaled to accuracy $\alpha \le 0.385/n^2$. Then
\[ \mathrm{per}(B) \ge \frac{n!}{n^n}\exp(-4n^2\alpha). \tag{9.27} \]

See [86] for the proof. So for $\alpha = 0.25/n^2$, only a factor of e is lost in the running time of the acceptance/rejection result. Sinkhorn balancing takes $O(n^{4.5}\ln(n))$ steps to accomplish this level of accuracy.
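The two numerical claims can be verified by substitution. The short derivation below is a sketch that takes the accuracy to be $\alpha = 0.25/n^2$, the number of balancing passes to be $O((\alpha^{-1} + \ln n)\sqrt{n}\log(1/\alpha))$, and the cost of one pass to be $\Theta(n^2)$.

```latex
\begin{align*}
\exp(-4n^2\alpha)\Big|_{\alpha = 0.25/n^2} &= \exp(-1) = e^{-1}, \\
(\alpha^{-1} + \ln n)\sqrt{n}\,\log(1/\alpha)\Big|_{\alpha = 0.25/n^2}
  &= (4n^2 + \ln n)\sqrt{n}\,\log(4n^2) = O(n^{2.5}\ln n), \\
O(n^{2.5}\ln n)\cdot\Theta(n^2) &= O(n^{4.5}\ln n).
\end{align*}
```

The first line is the factor of e lost in the lower bound on the permanent, and the last line is the total work for balancing.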

Now that the matrix has been made regular, what is the best way to generate samples? Well, Minc's inequality works best when the row sums are as large as possible. To make them larger, simply multiply each row by the inverse of its largest element. That makes the row sums as large as possible while keeping all of the elements of the matrix in the set [0,1].
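This rescaling is a one-pass operation. A minimal sketch follows, assuming a list-of-lists matrix whose rows each contain a positive entry; the function name `rescale_rows_by_max` is illustrative. It carries along a running scale factor `s` so that the permanent of the input equals `s` times the permanent of the output.

```python
def rescale_rows_by_max(a, s=1.0):
    """Divide each row by its largest entry and accumulate the divisors in s."""
    out = []
    for row in a:
        c = max(row)                       # largest element of the row
        out.append([v / c for v in row])   # row now has maximum entry 1
        s *= c
    return out, s


scaled, s = rescale_rows_by_max([[0.2, 0.1], [0.25, 0.5]])
print(scaled, s)
```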

Next, the SAR method using the generalized Bregman bound can be applied to the regularized matrix to obtain an estimate of per(A).

Estimate permanent

Input: $A$, $k$    Output: $\hat{a}$

1) Let $n$ be the number of rows and columns of $A$
2) $(A, s) \leftarrow$ Sinkhorn balancing$(A, 0.25/n^2)$
3) For every row $i$ of $A$
4)   $c \leftarrow \max_j A(i,j)$
5)   Divide row $i$ of $A$ by $c$, multiply $s$ by $c$
6) $R \leftarrow 0$
7) Repeat $k$ times
8)   $(\sigma, n) \leftarrow$ Restricted permutations generalized Bregman bound$(A)$
9)   Draw $R_1, \ldots, R_n$ iid Exp(1)
10)  $R \leftarrow R + R_1 + \cdots + R_n$
11) $\hat{a} \leftarrow M(A)\, s\, (k - 1)/R$

Recall from Section 2.5, the reason for drawing the exponentials at each step is so that $\hat{a}/\mathrm{per}(A)$ is $k - 1$ times an inverse gamma random variable with parameters k and 1. Therefore $E[\hat{a}] = \mathrm{per}(A)$ and the relative error of the estimate does not depend on per(A) at all.
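The role of the exponential draws can be seen in isolation from the permanent problem. The sketch below replaces the acceptance/rejection sampler with a generic Bernoulli trial of success probability p, an assumption made purely for illustration: one Exp(1) draw is added per trial until k successes have occurred, and (k - 1)/R estimates p. In the procedure above the same quantity is multiplied by M(A) and s to estimate per(A). The names `gbas_estimate` and `trial` are hypothetical.

```python
import random


def gbas_estimate(trial, k, rng):
    """Estimate the success probability of `trial` from k successes."""
    r = 0.0
    successes = 0
    while successes < k:
        r += rng.expovariate(1.0)  # one Exp(1) draw per trial
        if trial(rng):
            successes += 1
    return (k - 1) / r


rng = random.Random(2024)  # fixed seed so the run is reproducible
p = 0.3
p_hat = gbas_estimate(lambda g: g.random() < p, k=2000, rng=rng)
print(p_hat)
```

Because R is a gamma random variable with shape k and rate p, the ratio `p_hat / p` has a distribution that depends on k alone, which is why the relative error can be fixed in advance by the choice of k.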

The balancing at the beginning only needs to be done once, so the running time of the procedure will be the time to do the balancing plus the time to do the acceptance/rejection part. The first step in bounding this running time is to show that the maximum element of each row in the scaled matrix is small when the original matrix A is sufficiently dense.

Lemma 9.10. Let $\gamma \in (1/2, 1]$, and A be an n-by-n matrix with entries in [0,1] and at least $\gamma n$ entries equal to 1 in every row and column. Let $B = XAY$ with X and Y diagonal matrices where B is doubly stochastic with accuracy $\alpha$. Then
\[ \max_{i,j} B(i,j) \le \frac{(1 + \alpha)^2}{(2\gamma - 1)n - 3n\alpha - 2}. \tag{9.28} \]

Proof. Fix $i$ and $j$ in $\{1, \ldots, n\}$. Introduce $S_r = \{j' \ne j : A(i,j') = 1\}$ and $S_c = \{i' \ne i : A(i',j) = 1\}$. Given the density of A, $\#S_r$ and $\#S_c$ are at least $\gamma n - 1$. Break A into four submatrices based on $S_r$ and $S_c$. For subsets $S_1$ and $S_2$ of $\{1, \ldots, n\}$, let $A(S_1, S_2)$ be the submatrix of A that contains elements that are in rows of $S_1$ and columns of $S_2$. Then
\[ A_1 = A(S_c, S_r), \quad A_2 = A(S_c, S_r^C), \quad A_3 = A(S_c^C, S_r), \quad A_4 = A(S_c^C, S_r^C), \]
and $B_1, B_2, B_3, B_4$ are the corresponding submatrices of B.

Let s(C) denote the sum of the entries of a matrix C. Then since B is doubly stochastic to accuracy $\alpha$, each row and column adds up to at least $1 - \alpha$. Hence
\[ s(B_1) + s(B_3) \ge \#S_r(1 - \alpha), \]
\[ s(B_1) + s(B_2) \ge \#S_c(1 - \alpha). \]
And since each row adds up to at most $1 + \alpha$,
\[ s(B_1) + s(B_2) + s(B_3) + s(B_4) \le n(1 + \alpha), \]
\[ \#S_r(1 - \alpha) + \#S_c(1 - \alpha) - s(B_1) + s(B_4) \le n(1 + \alpha). \]
Since the submatrix $B_4$ includes $B(i,j)$, $B(i,j) \le s(B_4)$ and
\[ B(i,j) \le n(1 + \alpha) + s(B_1) - (\#S_r + \#S_c)(1 - \alpha). \tag{9.29} \]

From the density assumption, $\#S_r + \#S_c \ge 2\gamma n - 2$. To upper bound $s(B_1)$, first define $x(i) = X(i,i)$ and $y(i) = Y(i,i)$ so that $B(i',j') = x(i')A(i',j')y(j')$. Then
\[ s(B_1) \le \sum_{i' \in S_c} \sum_{j' \in S_r} x(i')A(i',j')y(j') \le \left[\sum_{i' \in S_c} x(i')\right]\left[\sum_{j' \in S_r} y(j')\right]. \tag{9.30} \]

Given B is doubly stochastic to within accuracy $\alpha$,
\[ \sum_{i'} x(i')A(i',j)y(j) \le 1 + \alpha \quad \text{and} \quad \sum_{j'} x(i)A(i,j')y(j') \le 1 + \alpha. \]
This gives
\[ \sum_{i' \in S_c} x(i') \le -B(i,j) + \sum_{i'} x(i')A(i',j) \le -B(i,j) + y(j)^{-1}(1 + \alpha), \]
\[ \sum_{j' \in S_r} y(j') \le -B(i,j) + \sum_{j'} A(i,j')y(j') \le -B(i,j) + x(i)^{-1}(1 + \alpha). \]
So $s(B_1) \le x(i)^{-1}y(j)^{-1}(1 + \alpha)^2$. Now when $A(i,j) = 0$, $B(i,j) = x(i)A(i,j)y(j) = 0$ as well. When $A(i,j) = 1$, $B(i,j) = x(i)y(j)$. So $s(B_1) \le B(i,j)^{-1}(1 + \alpha)^2$.

At this point we have
\[ 0 \le B(i,j) \le n(1 + \alpha) + B(i,j)^{-1}(1 + \alpha)^2 - (2\gamma n - 2)(1 - \alpha). \tag{9.31} \]
Solving gives
\[ B(i,j) \le (1 + \alpha)^2 / [(2\gamma - 1 - 2\gamma\alpha - \alpha)n - 2]. \tag{9.32} \]
Using $2\gamma + 1 \le 3$ finishes the proof.
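The step from (9.31) to (9.32) is a short rearrangement. The derivation below is a sketch that writes $\alpha$ for the accuracy and $\gamma$ for the density parameter, and uses only the outer inequality of (9.31), namely that its right-hand side is nonnegative.

```latex
\begin{align*}
0 &\le n(1+\alpha) + B(i,j)^{-1}(1+\alpha)^2 - (2\gamma n - 2)(1-\alpha), \\
B(i,j)^{-1}(1+\alpha)^2 &\ge (2\gamma n - 2)(1-\alpha) - n(1+\alpha) \\
  &= (2\gamma - 1 - 2\gamma\alpha - \alpha)\,n - 2 + 2\alpha \\
  &\ge (2\gamma - 1 - 2\gamma\alpha - \alpha)\,n - 2 .
\end{align*}
```

When the last expression is positive, inverting both sides gives (9.32). Since $2\gamma\alpha + \alpha = (2\gamma + 1)\alpha \le 3\alpha$, the denominator is at least $(2\gamma - 1)n - 3n\alpha - 2$, which is the bound (9.28).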

Lemma 9.11. Let B be an n-by-n nonnegative matrix with $\max_{i,j} B(i,j) \le [(2\gamma - 1)n]^{-1} + o(n^{-1})$ that is doubly stochastic to accuracy $0.385/n^2$. Then for C the matrix formed by dividing each row of B by its maximum value,
\[ \frac{M(C)}{\mathrm{per}(C)} \le 5(2\pi n)^{-1/2}\,(n(2\gamma - 1) + o(n))^{(2\gamma - 1)^{-1}/2}\, e^{(2\gamma - 1)^{-1}(e - 1)}. \tag{9.33} \]

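Proof. By Lemma 9.9, $\mathrm{per}(B) \ge n!\,n^{-n}\exp(-4n^2(0.385/n^2)) \ge 0.2\,n!\,n^{-n}$. Let $m_i$ denote the maximum element of row i of B. Then $\mathrm{per}(C) = \mathrm{per}(B)\prod_i m_i^{-1}$. So
\[ \mathrm{per}(C) \ge n!\,5^{-1}n^{-n}\prod_i m_i^{-1}. \]

For the upper bound, the rows of C add to at least $[1 - 0.385/n^2]/m_i$. Since for $r \ge 1$, $h(r) = e^{-1}[r + (1/2)\ln(r) + e - 1]$, that gives an upper bound
\[ \mathrm{per}(C) \le M(C) \le e^{-n}\prod_i \left(m_i^{-1} + (1/2)\ln(m_i^{-1}) + e - 1\right). \]

On the other hand $0.2\,n!\,n^{-n} \ge 0.2\,e^{-n}\sqrt{2\pi n}$. Hence
\begin{align*}
\frac{M(C)}{\mathrm{per}(C)}
 &\le \frac{e^{-n}\prod_i \left(m_i^{-1} + (1/2)\ln(m_i^{-1}) + e - 1\right)}{0.2\sqrt{2\pi n}\,e^{-n}\prod_i m_i^{-1}} \\
 &= 5(2\pi n)^{-1/2}\prod_i \left(1 + \frac{(1/2)\ln(m_i^{-1}) + e - 1}{m_i^{-1}}\right) \\
 &\le 5(2\pi n)^{-1/2}\exp\left(\sum_i \frac{(1/2)\ln(m_i^{-1}) + e - 1}{m_i^{-1}}\right) \\
 &\le 5(2\pi n)^{-1/2}\exp\left(\sum_i \frac{(1/2)\ln(n(2\gamma - 1) + o(n)) + e - 1}{n(2\gamma - 1) + o(n)}\right) \\
 &\le 5(2\pi n)^{-1/2}\exp\left((2\gamma - 1)^{-1}\left[(1/2)\ln(n(2\gamma - 1) + o(n)) + e - 1\right]\right) \\
 &= 5(2\pi n)^{-1/2}\,(n(2\gamma - 1) + o(n))^{(2\gamma - 1)^{-1}/2}\, e^{(2\gamma - 1)^{-1}(e - 1)}.
\end{align*}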
