Chapter 16 – Statistical Aggregate Functions

“You can make more friends in two months by becoming interested in other people than you will in two years by trying to get other people interested in you."

- Dale Carnegie

Numeric Manipulation Functions

image

The functions above are often used for algebraic, trigonometric, or geometric calculations.

The Stats Table

image

Above is the Stats_Table data in which we will use in our statistical examples.

The VARIANCE Function

Col1 Numbers

123456789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax for using VARIANCE:

VARIANCE(<column-name>)

SELECT VARIANCE (col1)  AS VSCol1

FROM Stats_Table;

 VSCol1 

74.92

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation.

A VARIANCE Example

image

The CORR Function

Col1 Numbers

123456789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Syntax for using CORR:

CORR(<column-name>, <column-name>)

SELECT CORR(col1, col2) AS CCol1and2
FROM Stats_Table;

CCol1and2

0.99

The correlation coefficient is a number between -1 and 1. It is calculated from a number of pairs of observations or linear points (X,Y) Where:

1 = perfect positive correlation

0 = no correlation

-1 = perfect negative correlation

The CORR function is a binary function, meaning that two variables are used as input to it. It measures the association between 2 random variables. If the variables are such that when one changes the other does so in a related manner, they are correlated. Independent variables are not correlated because the change in one does not necessarily cause the other to change.

A CORR Example

image

Another CORR Example so you can compare

image

The REGR_INTERCEPT Function

image

Syntax for using REGR_INTERCEPT:

REGR_INTERCEPT(dependent-expression, independent-expression)

SELECT REGR_INTERCEPT(col1, col2)  AS RIofCol1_2
FROM Stats_Table;

RIofCol1_2

-1.35

A regression line is a line of best fit, drawn through a set of points on a graph for X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. Two regression lines always meet or intercept at the mean of the data points(x,y), where x=AVG(x) and y=AVG(y). This is usually not one of the original data points.

A REGR_INTERCEPT Example

image

Another REGR_INTERCEPT Example so you can compare

image

The REGR_SLOPE Function

image

Syntax for using REGR_SLOPE:

REGR_SLOPE(dependent-expression, independent-expression)

SELECT REGR_SLOPE(col1, col2)  AS RSCol1_2
FROM Stats_Table;

RSCol1_2

1.94

A regression line is a line of best fit drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. The slope of the line is the angle at which it moves on the X and Y coordinates. The slope is Y on X and the horizontal slope is X on Y.

A REGR_SLOPE Example

image

Another REGR_SLOPE Example so you can compare

image

The REGR_AVGX Function

image

Syntax for using REGR_AVGX:

REGR_AVGX(dependent-expression, independent-expression)

SELECT REGR_AVGX(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

8.67

The REGR_AVGX function is the average of the independent variable (sum(X)/N).

A REGR_AVGX Example

image

Another REGR_AVGX Example so you can compare

image

The REGR_AVGY Function

image

Syntax for using REGR_AVGY:

REGR_AVGY(dependent-expression, independent-expression)

SELECT REGR_AVGY(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

15.5

The REGR_AVGY is the average of the dependent variable (sum(Y)/N).

A REGR_AVGY Example

image

Another REGR_AVGY Example so you can compare

image

The REGR_COUNT Function

image

Syntax for using REGR_COUNT:

REGR_COUNT(dependent-expression, independent-
expression)

SELECT REGR_COUNT(col1, col2)  AS
RSCol1_2

FROM Stats_Table;

RSCol1_2

30

The REGR_COUNT is the number of input rows in which both expressions are non-null.

A REGR_COUNT Example

image

The REGR_R2 Function

image

Syntax for using REGR__R2:

REGR_R2(Y, X)

SELECT REGR_R2(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

0.97

The REGR_R2 is the square of the correlation coefficient.

A REGR_R2 Example

image

The REGR_SXX Function

image

Syntax for using REGR_SXX:

REGR_SXX(Y, X)

SELECT REGR_SXX(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

578.67

The REGR_SXX is the sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable).

A REGR_SXX Example

image

The REGR_SXY Function

image

Syntax for using REGR_SXY:

REGR_SXY(Y, X)

SELECT REGR_SXY(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

1125

The REGR_SXY is the sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable).

A REGR_SXY Example

image

The REGR_SYY Function

image

Syntax for using REGR_SYY:

REGR_SYY(Y, X)

SELECT REGR_SYY(col1, col2)  AS RSCol1_2

FROM Stats_Table;

RSCol1_2

2247.5

The REGR_SYY is the sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable).

A REGR_SYY Example

image

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset