6.2.1. SPA Algorithms
Definition 6.1
Assume that in the SA search the SPRT is used as the statistical inference method S, and that for judging m subtrees the significance level is chosen. Under the above conditions, the SA search is called SPA1 with the chosen significance level, or simply the SPA1 algorithm.
The variable obeys the corresponding distribution. Construct a simple hypothesis:
Replacing the corresponding quantities in Formulas (6.4) and (6.5), we have the following lemma.
Lemma 6.1
For judging m subtrees, the asymptotic mean complexity of SPA1 is
where
Proof
From Formula (6.5), the mean of the stopping variable R can be represented approximately by the following expression.
Therefore, in order to judge m subtrees at the given significance level, the asymptotic mean complexity is
where the constant is as defined above.
Theorem 6.2
Assume that the relevant parameters are given and that the statistic has a normal distribution. Using the SPA1 algorithm under the given level, the mean complexity of finding a solution path in G is
with probability 1-b, where b is the error probability determined by the Type I error and the Type II error.
Proof
From Lemma 6.1, the mean complexity for judging m i-subtrees is as given above. Thus, using the SPA1 algorithm, the mean complexity of finding a goal is:
From the Stirling formula, we obtain:
For judging the m 1-subtrees, and in general for judging the m i-subtrees, the corresponding error bounds hold, so the total error probability of Type I is:
Similarly for Type II.
It should be noted that we use the minimum of the mean statistics among all i-subtrees to estimate one mean, and the average of the mean statistics of the remaining i-subtrees (except the minimal one) to estimate the other. This will introduce new errors. We will construct new SA algorithms to overcome this defect.
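As a concrete illustration of the SPRT judging step underlying SPA1, the following minimal sketch (names and parameters are our own, assuming a normal statistic with known variance) runs Wald's sequential test on a stream of observations:

```python
import math
import random

def sprt_normal(samples, mu0, mu1, sigma, alpha, beta):
    """Wald's SPRT for H0: mean = mu0 vs H1: mean = mu1, known sigma.

    Returns (decision, number of samples consumed).
    """
    upper = math.log((1 - beta) / alpha)   # crossing -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing -> accept H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # log-likelihood-ratio increment of one normal observation
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n

random.seed(0)
stream = (random.gauss(0.0, 1.0) for _ in range(10_000))
# with a true mean of 0, the test will almost surely accept H0
# after only a handful of samples
decision, used = sprt_normal(stream, 0.0, 1.0, 1.0, 0.05, 0.05)
```

The point of the sketch is the sequential character of the judgment: the sample size is not fixed in advance, and on average only a few observations are needed per decision, which is what keeps the per-subtree cost low.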
Corollary 6.2
Assume that the statistic has a distribution function satisfying the condition below. Let
Then the mean complexity of the SPA1 algorithm is O(N ln N).
The theorem and corollary show that the SPA1 algorithm can overcome the exponential explosion of complexity only in the average sense. This means that in some cases the statistical inference will still incur a huge computational cost. To overcome this shortcoming, we discuss SPA2, a revised version of SPA1.
Definition 6.2
In SA, the SPRT is performed over the m i-subtrees using a given level and a given threshold. If the sample size exceeds the threshold, then the hypothesis is rejected. We define this SA as SPA2 under the given level, denoted SPA2 for short.
It should be noted that the parameters are generally unknown. We may use the following formula to estimate them:
where the terms are as defined above.
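The truncation rule of Definition 6.2 can be sketched as follows; the wrapper, its `llr_step` interface, and the threshold value are illustrative assumptions, not the book's implementation:

```python
import math
from itertools import repeat

def sprt_truncated(sample_iter, llr_step, alpha, beta, max_n):
    """SPRT with the SPA2 rule: if no decision is reached before the
    sample size hits max_n, reject H0 outright (prune the subtree).

    llr_step(x) returns the log-likelihood-ratio increment of one draw.
    """
    upper = math.log((1 - beta) / alpha)   # crossing -> reject H0
    lower = math.log(beta / (1 - alpha))   # crossing -> accept H0
    llr, n = 0.0, 0
    for n, x in enumerate(sample_iter, start=1):
        llr += llr_step(x)
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
        if n >= max_n:                     # SPA2 truncation rule
            return "reject H0 (threshold)", n
    return "undecided", n

# An uninformative stream never crosses Wald's bounds, so the
# threshold fires and the subtree is pruned after max_n samples.
decision, used = sprt_truncated(repeat(0.0), lambda x: 0.0,
                                alpha=0.05, beta=0.05, max_n=50)
```

The truncation is exactly what turns the average-case guarantee of SPA1 into a worst-case per-round bound: no single judgment can consume more than max_n samples.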
Theorem 6.3
Assume that the relevant parameters are given and that the statistic has a normal distribution. Using SPA2 under the given level, the upper bound of the complexity of finding a solution path in G is
with probability 1-b, where b is the error probability determined by the Type I error and the Type II error.
Proof
In the i-subtrees, the threshold is as defined. So the upper bound of the total complexity follows, and thus the upper bound of the complexity of SPA2 is obtained.
Now we consider the error probability.
If in the searching process the sample size never surpasses the threshold, then from Formulas (6.5) and (6.6), for judging the m i-subtrees the Type I error probability is bounded. So the total error probability is:
If at some searching stage the sample size surpasses the threshold and H0 is rejected, the error probability does not change when the deleted subtrees do not contain the goal, and increases when they do. We estimate the incremental error probability as follows.
From Property 6.6 in Section 6.1.2, the distribution of the stopping variable R of the SPRT is
(6.10)
In the i-subtrees, the level and the mean of R are as given. From Formula (6.10), we then have the corresponding bound.
Thus, with the value of c corresponding to the i-subtrees, the probability that the sample size surpasses the threshold is
Namely, when the sample size surpasses the threshold, the rejection of H0 introduces a new error probability. The total incremental error probability of Type I is
Finally, the total error probability of Type I is bounded as claimed; similarly for the Type II error.
Certainly, when the sample size surpasses the threshold, the rejection of H0 does not change the Type II error probability.
Corollary 6.3
Assume that the statistic has a distribution function and satisfies
Then the upper bound of the complexity of SPA2 is as stated.
The SPA algorithms constructed above have the following shortcoming: the distribution function must be known beforehand. Generally this is impractical, so it sometimes has to be assumed to be a normal distribution. Even so, its parameters are still generally unknown. Although their values can be estimated from data, this certainly introduces new errors. We will use the ASM statistical inference method to overcome this shortcoming.
6.2.2. SAA Algorithms
Assume the search tree is given and that the stated conditions hold. For any node, there are m successors, and a subtree is rooted at each of them. For any subtree, a mean statistic is defined. Applying ASM statistical inference to the SA search, we obtain m confidence intervals.
Consider the leftmost interval among the m confidence intervals along the number line. If it is disjoint from the others, the corresponding subtree is accepted; otherwise all subtrees are rejected and the algorithm fails.
In fact, ASM is a sequential testing method. First, letting R=1 and using Formula (6.6) as the hypothesis test, if the formula is satisfied then we have the corresponding interval; otherwise the sampling continues.
Definition 6.3
In the SA search, if ASM is used as the statistical inference method S when testing the i-subtrees, then the SA search is called the SAA algorithm with the given level, or simply the SAA algorithm.
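The SAA judging rule (accept the leftmost confidence interval only when it is disjoint from all the others) might be sketched as below; the z-style interval is our simplification of the ASM interval, and all names are illustrative:

```python
import math
import statistics

def saa_judge(samples_per_subtree, z=1.96):
    """One SAA judgment round (sketch): build a confidence interval
    for each subtree's mean statistic; accept the subtree whose
    interval lies entirely to the left of every other interval.

    samples_per_subtree: list of sample lists, one per subtree.
    Returns the index of the accepted subtree, or None on failure.
    """
    intervals = []
    for xs in samples_per_subtree:
        mean = statistics.fmean(xs)
        half = z * statistics.stdev(xs) / math.sqrt(len(xs))
        intervals.append((mean - half, mean + half))
    # leftmost interval = smallest left endpoint
    best = min(range(len(intervals)), key=lambda i: intervals[i][0])
    _, hi = intervals[best]
    # it must be disjoint from every other interval to be accepted
    if all(hi < intervals[i][0]
           for i in range(len(intervals)) if i != best):
        return best
    return None

# the first subtree's interval lies well left of the other two
winner = saa_judge([[1.0, 1.1, 0.9, 1.05],
                    [5.0, 5.1, 4.9, 5.05],
                    [6.0, 6.2, 5.8, 6.1]])
```

The disjointness requirement is what controls the error probability: overlapping intervals mean the data cannot yet separate the subtrees, so the algorithm refuses to commit.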
Theorem 6.4
Assume that the statistic satisfies Hypothesis I and has a finite fourth moment. Given the level, and letting the parameters be chosen as stated, the SAA algorithm with that level can find the goal with the corresponding probability, and the order of its mean complexity is as stated.
Proof
Since the stated condition holds for the i-subtrees, from Formula (6.8) the order of the mean complexity of ASM for testing the i-subtrees is
Thus, the order of the total mean complexity is
For judging the i-subtrees, the Type I and Type II error probabilities are bounded, so the total error probability for judging the i-subtrees follows.
Thus, the total error probability of the SAA algorithm is
SAA is superior to SPA in that there is no need to know the distribution function in advance, and the calculation of the statistics is quite simple.
In general, the constant is unknown, so it is difficult to choose a proper width of the confidence interval in SAA based on it. In practice, an approximate value can be chosen as a rule of thumb. Sometimes, SAA may work in the following way: given an arbitrary initial constant δ, if the leftmost interval intersects the other intervals, a new, smaller constant is tried, and so on, until a proper value of δ is obtained.
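This rule-of-thumb adjustment might be coded as a simple loop; halving δ after each failed separation is our assumption about how the new constant is chosen, and the floor value is illustrative:

```python
def choose_delta(intervals_overlap, delta=1.0, floor=0.05):
    """Shrink the interval half-width delta until the leftmost
    confidence interval separates from the rest, or the floor is
    reached (sketch).

    intervals_overlap(delta) -> True if the leftmost interval still
    intersects another interval when half-width delta is used.
    """
    while delta > floor and intervals_overlap(delta):
        delta /= 2            # try a smaller width, as in the text
    return max(delta, floor)
```

The floor plays the same role as the lower bound on the widths in the revised SAA below: it stops the width from shrinking forever, so the complexity order is not inflated by the adjustment.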
The revised SAA is as follows.
Let a strictly monotonically decreasing sequence of positive numbers be given. In the i-th round of search, let the width of the confidence interval be the i-th term of the sequence. Since c is a constant, there always exists a round from which the width falls below c. So we can overcome the difficulty brought about by the unknown c; certainly, the computational complexity of SAA will increase by a corresponding factor, i.e., the order of complexity increases accordingly.
If we choose a lower bound for the widths, so that once it is reached the width no longer decreases, then the order of the mean complexity of SAA will not increase.
Under the same significance level, the mean of the stopping variable of the SPRT is minimal, i.e., the mean complexity of the SA constructed from the SPRT is minimal. But the distribution function must be known beforehand in the SPA search, i.e., SPA works under a more rigorous condition.
6.2.3. Different Kinds of SA
In Sections 6.2.1 and 6.2.2 we constructed SA by using the SPRT and ASM as statistical inference methods. Since both methods are sequential and thus fully analogous to search, it is easy to see why introducing them benefits the search. If some other kind of statistical inference method is used, e.g., a non-sequential one, can the same effect be achieved? We discuss this below.
Assume that X and Y are two i.i.d. random variables having finite fourth moments, with their respective distribution functions. Let the simple hypotheses be constructed accordingly. From statistics, it is known that if a statistical inference method S satisfies the following properties, then the statistical decision can be made in polynomial time.
The properties are:
(1) Given the significance level, the mean of the stopping variable R satisfies the stated bound, where E(R) is the mean of R.
(2) S terminates with probability one.
(3) When n>0, the stated inequality holds, where c>0 is a constant.
As we know, both the SPRT and ASM satisfy the above properties, which underlies the complexity reduction of the SA search algorithms that use the SPRT and ASM as statistical inference methods. If the variances of X and Y are known, the u-test and t-test may be adopted. For example, to use the u-test to determine the validity of the hypothesis, that is, whether the mean of random variable X is equal to that of Y, we may use the following composite statistic:
where the numerator terms are the sample means of X and Y, respectively, and l and n are the sample sizes of X and Y, respectively.
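Read this way (sample means, known variances, sample sizes l and n), the composite statistic is the classical two-sample u (z) statistic; a minimal sketch:

```python
import math

def u_statistic(xbar, ybar, var_x, var_y, l, n):
    """Two-sample u (z) statistic for H0: E[X] = E[Y] with known
    variances: (xbar - ybar) scaled by its standard error."""
    return (xbar - ybar) / math.sqrt(var_x / l + var_y / n)

# e.g., unit variances and 50 samples on each side:
# standard error = sqrt(1/50 + 1/50) = 0.2
u = u_statistic(1.0, 0.0, 1.0, 1.0, 50, 50)   # -> 5.0
```

Under H0 the statistic is approximately standard normal, so it is compared against a normal quantile at the chosen significance level.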
When the search reaches node p, it has m sub-nodes, with a sub-tree rooted at each of them. The means of the statistics of these sub-trees are assumed to be given, respectively.
Now, we use the u-test to judge whether the means of two sub-trees are equal. The significance level and the sample sizes are given, where the sample sizes are the numbers of expanded nodes of the two sub-trees, respectively. In the testing process, the sample size is gradually increased.
From the significance level we obtain the critical value, where the standard normal distribution function is used.
If the statistic exceeds the critical value, then the corresponding sub-trees are deleted.
If not, then the total sample size is increased by 2, i.e., both sub-trees are expanded by the search algorithm A, and the u-test continues.
It should be noted that in the u-test, when calculating the composite statistic, the corresponding population quantities are replaced by their sample estimates.
In order to terminate the search in time, we choose a threshold for the i-th level nodes. If the sample size reaches the threshold, then the sub-tree is accepted.
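Putting the procedure together, one sequential u-test round (expand both sub-trees by one sample per iteration; stop on a significant statistic or accept at the threshold) might look like this sketch. The drawing interface, critical value, threshold, and the convention that the smaller mean statistic marks the preferred sub-tree (as with the leftmost interval in SAA) are our assumptions:

```python
import math
import random
import statistics

def u_test_judge(draw1, draw2, z_crit=1.96, threshold=200):
    """Sequential u-test between two sub-trees (sketch): draw one more
    statistic from each sub-tree per round; stop when the composite u
    statistic is significant, or accept when the threshold is reached.
    Population variances are replaced by sample estimates."""
    xs = [draw1(), draw1()]
    ys = [draw2(), draw2()]
    while True:
        se = math.sqrt(statistics.pvariance(xs) / len(xs) +
                       statistics.pvariance(ys) / len(ys))
        u = (statistics.fmean(xs) - statistics.fmean(ys)) / se
        if abs(u) >= z_crit:               # significant difference
            return ("keep subtree 1" if u < 0 else "keep subtree 2",
                    len(xs) + len(ys))
        if len(xs) + len(ys) >= threshold: # accept by threshold
            return "accept by threshold", len(xs) + len(ys)
        xs.append(draw1())                 # expand both sub-trees
        ys.append(draw2())

random.seed(7)
# sub-tree 1 has the smaller mean statistic, so it should be kept
result, n_used = u_test_judge(lambda: random.gauss(0.0, 1.0),
                              lambda: random.gauss(3.0, 1.0))
```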
It can be proved that the above algorithm has the same order of mean complexity as the SPA algorithm.
If the variances of X and Y are finite but unknown, the sequential t-test constructed from the Cox theorem can be used.
By combining different kinds of statistical inference methods and heuristic searches and successively using these searches, a variety of SA algorithms can be obtained.
If the global statistics extracted from each subtree of G satisfy Hypothesis I, then the SA search constructed from S has the following properties, which are our main conclusions about SA.
(1) The mean complexity of the SA is as stated, where N is the depth at which the goal is located.
(2) Given the level, using the SA search for the first time, the goal can be found with probability α.
(3) Based on the property above, given a proper threshold, the upper bound of the complexity of each round of SA search is as stated.
(4) At some SA search stage a wrong search direction might be chosen, but the search can still terminate in polynomial mean time, owing to the polynomial judgment time of the statistical inference. Consequently, by applying the SA search successively, the goal can also be found with polynomial mean complexity.
6.2.4. The Successive Algorithms
In a word, under a given significance level, the SPA (SPA1 or SPA2) search avoids the exponential explosion and yields polynomial complexity. Unfortunately, the solution path may mistakenly be pruned off, or a wrong path may be accepted; i.e., there is a nonzero error probability. In other words, with the SPA search a real goal can only be found with probability (1-b).
The mean of the stopping variable R of the statistical inference is finite. Whether or not the i-subtree containing the goal is found, the search certainly stops with the corresponding mean computation; thus, the search stops with the stated mean complexity in the first round. Now imagine that if the goal node is not found in the first round, the SA search is applied to the remaining part of G once again. Thus, the probability of finding a real goal increases, and the error probability decreases, round by round; the repeated use of SA continues until the goal is found. We call this procedure successive SA, or SA for short. What about its computational complexity? Does it still remain polynomial?
Using the SA search, the probability of finding the goal in the first round is (1-b), and its complexity is as given. Thus, the equivalent complexity is
In general, the probability that the goal is found by the SA search exactly in the i-th round is the corresponding geometric term, and the complexity is that of i rounds. The equivalent complexity is
The total mean complexity of SA is
Since the geometric series converges, we have
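The convergence behind this bound can be checked numerically. If each round independently finds the goal with probability 1-b, the expected number of rounds is the geometric sum Σ i(1-b)b^(i-1) = 1/(1-b), a constant factor; a minimal sketch (names are illustrative):

```python
def expected_rounds(b, max_rounds=10_000):
    """Truncated sum of i * (1-b) * b**(i-1): the expected number of
    successive SA rounds when each round fails with probability b."""
    return sum(i * (1 - b) * b ** (i - 1)
               for i in range(1, max_rounds + 1))

# e.g., b = 0.05: about 1.05 rounds on average, so the successive
# search multiplies the one-round mean complexity by only 1/(1-b)
factor = expected_rounds(0.05)
```

Since 1/(1-b) does not depend on N, repeating the search until success leaves the order of the mean complexity unchanged, which is exactly the content of the next theorem.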
We have the following theorem.
Theorem 6.5
In a uniform m-ary tree, using the successive SA search, a goal can be found with probability one, and the order of its mean complexity remains the same.