200 High-Function Business Intelligence in e-business
Example 4-35 Compute
R
2
SELECT
REGR_COUNT(hits,days) AS num_days,
DECIMAL(REGR_SLOPE(log(hits), log(days)),10,4) AS a,
DECIMAL(EXP(REGR_ICPT(log(hits), log(days))),10,4) AS b,
DECIMAL(REGR_R2(log(hits), log(days)),10,4) AS r2
FROM traffic_data
The results of the foregoing query, using the built-in R2 function, are shown as
follows.
num_days a b r2
-------------- ------------- --------------- ----------------
100 1.9874 21.4302 0.9912
However, in order to correctly compute R
2
for the non-linear fit of the original
untransformed data, the SQL shown in Example 4-36 should be used.
Example 4-36 Correct
R
2
computation on original untransformed data
WITH coeffs(a,b) AS
(
SELECT
REGR_SLOPE(LOG(hits),LOG(days)) AS a,
EXP(REGR_ICPT(LOG(hits),LOG(days))) AS b
FROM traffic_data
),
residuals(days,hits,error) AS
(
SELECT
t.days,t.hits,t.hits - c.b*power(t.days,c.a)
FROM traffic_data t, coeffs c
)
SELECT 1e0-(SUM(error*error)/REGR_SYY(hits,days)) AS rr2
FROM residuals
The result of this query is as follows:
-------------------rr2
+9.55408646608249E-001
Note that the correct value R
2
is 0.955 which is lower than 0.991. This is typical.
Computing of R
2
for the transformed data usually results in an overestimate of
the goodness of fit.
The curve fitting data and R
2
value is shown in Figure 4-40 as charted by
Kaleidograph.