Chapter 3. DB2 UDBs statistics, analytic, and OLAP functions 107
3.2.12 Regression functions
The regression functions support the fitting of an ordinary-least-squares
regression line of the form:
Where:
Y is the dependent variable.
X is the independent variable.
a is the slope of the regression line.
b is the y-intercept.
Both a and b are called coefficients.
There are nine distinct regression functions. They are:
REGR_SLOPE Calculates the slope of the line (the parameter a in the
above equation).
REGR_INTERCEPT (REGR_ICPT) calculates the y-intercept of the regression
line (b in the above equation).
REGR_COUNT Determines the number of non-null pairs used to
determine the regression.
REGR_R2 Expresses the quality of the best-fit regression.
(R-squared) is referred to as the coefficient of
determination or the goodness-of-fit for the regression.
REGR_AVGX Returns quantities that can be used to compute various
diagnostic statistics needed for the evaluation of the
quality and statistical validity of the regression model
(They are defined further in this section).
REGR_AVGY (Refer to the foregoing description.)
REGR_SXX (Refer to the foregoing description.)
REGR_SYY (Refer to the foregoing description.)
REGR_SXY (Refer to the foregoing description.)
YaXb
+=
108 High-Function Business Intelligence in e-business
The following considerations apply to regression functions:
? The input for all of the regression functions must be numeric.
? The output of REGR_COUNT is integer and all the remaining functions output
in double-precision floating point.
? The regression functions are all computed simultaneously during a single
pass through the data set.
? If the input set is not empty, and after elimination of the null pairs:
VARIANCE(
expression2
) is positive, then REGR_COUNT returns the
number of non-null pairs in the set, and the remaining functions return
results that are defined in Table 3-2.
VARIANCE(expression2) is equal to zero, then the regression line either
has infinite slope or is undefined. In this case, the functions
REGR_SLOPE, REGR_INTERCEPT, and REGR_R2 each return a null
value, and the remaining functions return values defined in Table 3-2.
? If the input set is empty, REGR_COUNT returns zero, and the remaining
functions return a null value.
Important: Each function is applied to the set of values derived from the input
numeric pairs (
expression1
,
expression2
) by the elimination of all pairs for
which either
expression1
or
expression2
is null. In other words, both values
must be non-null to be considered for the function.
Attention:
expression1
corresponds to the Y variable and
expression2
corresponds to the X variable.
Chapter 3. DB2 UDBs statistics, analytic, and OLAP functions 109
? When the result is not null:
REGR_R2 is between 0 and 1.
REGR_SXX and REGR_SYY is non-negative. This non-negative value is
used to describe the spread of the values for either X or Y from the their
average values.
Table 3-2 Function computations
The order in which the values are aggregated is undefined, but every
intermediate result must be within the range of the result data type.
Function Computation
REGR_SLOPE(
expr1,expr2
)
COVAR(
expr1,expr2
)/VAR(
expr2
)
REGR_ICPT(
expr1,expr2)
AVG(
expr1
) - REGR_SLOPE(
expr1
,
expr2
) *
AVG(
expr2
)
REGR_R2(
expr1
,
expr2
)
POWER(CORR(
expr1
,
expr2
), 2) if
VAR(
expr1
)>0
REGR_R2(expr1,expr2) 1 if VAR(expr1) = 0
REGR_AVGX(expr1,expr2) AVG(expr2)
REGR_AVGY(
expr1
,
expr2
)
AVG(
expr1
)
REGR_SXX(
expr1
,
expr2
) REGR_COUNT(
expr1
,
expr2
) * VAR(
expr2
)
REGR_SYY(
expr1
,
expr2
) REGR_COUNT(
expr1
,
expr2
) * VAR(
expr1
)
REGR_SXY(
expr1
,
expr2
) REGR_COUNT(
expr1
,
expr2
) *
COVAR(
expr1
,
expr2
)
Important: The difference between REGR_AVG and AVG is that all nulls are
excluded in the REGR_AVG computations, while they are included in the
AVG(expression) computation.
Tip: In general, it is more efficient even in the absence of null values, to use
the regression functions to compute the statistics needed for a regression
analysis, than to perform the equivalent computations using ordinary column
functions such as AVG, VARIANCE, and COVARIANCE.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset