Chapter 3. DB2 UDB’s statistics, analytic, and OLAP functions 115
DB2 has an R² function, REGR_R2. The properties of R² are:
- R² is bounded between 0 and 1.
- If R² equals 1, then all the points fit on the regression line exactly.
- If R² equals 0, then there is no linear relationship between the two attributes.
The closer R² is to 1, the better the computed linear regression model. In
general, an R² greater than 0.75 or so is considered a good fit for most
applications. However, it varies by application, and it is ultimately up to the
user to decide what value constitutes a good model.
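For reference, R² can be written out explicitly. The following is the standard textbook definition for a least-squares line fitted with an intercept, not a formula quoted from the DB2 documentation. The symbols are notation introduced here: y_i are the observed values, \hat{y}_i the values predicted by the fitted line, and \bar{y} the mean of the observed values.

```latex
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    \;=\; \frac{\operatorname{Cov}(x, y)^2}{\operatorname{Var}(x)\,\operatorname{Var}(y)}
```

The first form shows why R² equals 1 when every point lies on the line: all residuals y_i - \hat{y}_i are then zero. The second form shows that R² is the square of the correlation coefficient between x and y, which is why it lies between 0 and 1. Both forms assume that x and y each have nonzero variance.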
The DB2 SQL could look as shown in Example 3-9:
Example 3-9 Linear regression example 2
SELECT REGR_R2 (bonus , salary) AS r2
FROM employee
WHERE workdept = 'D11'
The result of this query is 0.54624.
Since R² is not very close to 1, we conclude that the computed linear regression
model does not appear to be a very good fit.
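To put an R² value in context, it can help to retrieve the coefficients of the fitted line in the same query. The following is a sketch only, assuming the same EMPLOYEE table and department as Example 3-9. REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, CORRELATION, and POWER are DB2 functions; the column aliases are illustrative names chosen here.

```sql
-- Sketch: fit BONUS as a linear function of SALARY for department D11
-- and report the goodness of fit in the same pass.
-- As with REGR_R2, the dependent column (bonus) is listed first.
SELECT REGR_SLOPE(bonus, salary)            AS slope,        -- change in bonus per unit of salary
       REGR_INTERCEPT(bonus, salary)        AS intercept,    -- fitted bonus at salary = 0
       REGR_R2(bonus, salary)               AS r2,           -- goodness of fit, as in Example 3-9
       POWER(CORRELATION(bonus, salary), 2) AS corr_squared, -- should agree with r2 up to rounding
       REGR_COUNT(bonus, salary)            AS n             -- rows where both columns are non-null
FROM employee
WHERE workdept = 'D11'
```

For a simple linear regression, R² is the square of the correlation coefficient, so the r2 and corr_squared columns are expected to agree. A small row count in n is a reminder that R² computed from few rows should be read with caution.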
Note: The columns referenced in the regression functions are reversed from
those in the variance and covariance examples. Since we wish to determine
BONUS as a function of SALARY, BONUS is listed first, before SALARY.

Another example of using regression involves the assumption of a linear
relationship between the advertising budget and sales figures of a particular
organization that conforms to the equation:

y = ax + b

Where:
‘y’ is the sales (the dependent variable).
‘x’ is the advertising budget (the independent variable).
‘a’ is the slope.
‘b’ is the y-axis intercept, corresponding to the sales expected even with a zero advertising budget.
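The coefficients of the linear model y = ax + b for the advertising example could be estimated along the following lines. This is a minimal sketch under stated assumptions: the table name ad_campaign and its columns ad_budget and sales are hypothetical names introduced for illustration, with one row per observation.

```sql
-- Hypothetical table: ad_campaign(ad_budget, sales), one row per observation.
-- The dependent variable (sales, y) is listed first and the independent
-- variable (ad_budget, x) second.
SELECT REGR_SLOPE(sales, ad_budget)     AS a,   -- slope of y = ax + b
       REGR_INTERCEPT(sales, ad_budget) AS b,   -- y-axis intercept
       REGR_R2(sales, ad_budget)        AS r2   -- how well the line fits the data
FROM ad_campaign
```

With a and b in hand, the sales predicted for a planned budget x are a * x + b. The r2 column indicates how much weight such a prediction deserves.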