Chapter 3. DB2 UDB's statistics, analytic, and OLAP functions
Executing the SQL in Example 3-1 results in a 10% sample (corresponding to
0.1) of all the rows in the CUSTOMERS table.
Example 3-1 RAND function
SELECT * FROM CUSTOMERS
WHERE RAND() < 0.1
Note that this is a "Bernoulli sample" (see footnote 2). In the above SQL, if there were 100,000
rows in the CUSTOMERS table, the actual number of rows in the sample is
random, but is equal on average to (100,000 × 0.1) = 10,000.
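The behavior of the sample size can be illustrated outside the database. The following is a minimal Python sketch, not DB2 code: the function name bernoulli_sample, the customers range standing in for the CUSTOMERS table, and the fixed seed are all assumptions made for illustration. It keeps each row independently with probability 0.1, which is the same test that WHERE RAND() < 0.1 applies to each row.

```python
import random

def bernoulli_sample(rows, q, rng):
    """Keep each row independently with probability q (mirrors WHERE RAND() < q)."""
    return [row for row in rows if rng.random() < q]

# Hypothetical stand-in for the CUSTOMERS table: 100,000 row ids.
customers = range(100_000)
rng = random.Random(42)  # fixed seed only so that a run is repeatable

# Draw five independent samples and record how many rows each one kept.
sizes = [len(bernoulli_sample(customers, 0.1, rng)) for _ in range(5)]
print(sizes)  # each size is random, but close to 10,000
```

Each run of the sampling step produces a slightly different row count, which is why the text describes 10,000 as the average rather than the exact size.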
Since this technique involves a complete scan of the CUSTOMERS table, it is
appropriate in situations where a sample is created once, and then used
repeatedly in multiple queries. In other words, the cost of creating the sample is
amortized over multiple queries.
3.2.9 STDDEV
The STDDEV function returns the population standard deviation (as opposed to
the sample standard deviation) of a set of numbers.
The relationship between population standard deviation (SD_pop) and sample
standard deviation (SD_samp) is as follows:
Where:
n is the population size.
The input must be numeric and the output is double-precision floating point.
The STDDEV function is applied to the set of values derived from the argument
values by the elimination of null values.
If the input data set is empty, the result is null. Otherwise, the result is the
standard deviation of the values in the set.
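The rules described in this section (nulls are eliminated, an empty set yields null, and the result is the population rather than the sample standard deviation) can be sketched in Python. This is an illustration under stated assumptions, not DB2's implementation: the helper name stddev and the salaries values are made up, and None plays the role of SQL null. The sketch also checks that the population value equals the sample value scaled by the square root of (n - 1) / n.

```python
import math
import statistics

def stddev(values):
    """Population standard deviation after dropping nulls; None for an empty set."""
    data = [v for v in values if v is not None]
    if not data:
        return None  # empty input set: the result is null
    return statistics.pstdev(data)

# Made-up column values; None stands in for SQL null.
salaries = [10.0, None, 20.0, 30.0, None, 40.0]
non_null = [v for v in salaries if v is not None]
n = len(non_null)  # number of values that remain after null elimination

sd_pop = stddev(salaries)             # population standard deviation
sd_samp = statistics.stdev(non_null)  # sample standard deviation (divides by n - 1)

# The two differ only by the factor sqrt((n - 1) / n).
print(math.isclose(sd_pop, math.sqrt((n - 1) / n) * sd_samp))
```

For small sets the two values differ noticeably; as n grows, the factor approaches 1 and the distinction matters less.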
Footnote 2: In Bernoulli sampling, each row is selected for inclusion in the sample with probability q = (n/N),
where n is the desired sample size and N is the total number of rows, and is rejected with probability
(1 - q), independently of the other rows. The final sample size is random, but is equal to n on average.
\[
SD_{pop} = \sqrt{\frac{n - 1}{n}} \times SD_{samp}
\]