“Don’t count the days, make the days count.”
- Mohammed Ali
This ANSI version of CSUM is SUM() Over. Right now, the syntax wants to see the sum of the Daily_Sales after it is first sorted by Sale_Date. Rows Unbounded Preceding makes this a CSUM. The ANSI Syntax seems difficult, but only at first.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The first thing the above query does before calculating is SORT all the rows by Sale_Date. The Sort is located right after the ORDER BY.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The keywords Rows Unbounded Preceding determines that this is a CSUM. There are only a few different statements and Rows Unbounded Preceding is the main one. It means start calculating at the beginning row, and continue calculating until the last row.
SELECTProduct_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The second “SUMOVER” row is 90739.28. That is derived by the first row’s Daily_Sales (41888.88) added to the SECOND row’s Daily_Sales (48850.40).
SELECTProduct_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The third “SUMOVER” row is 138739.28. That is derived by taking the first row’s Daily_Sales (41888.88) and adding it to the SECOND row’s Daily_Sales (48850.40). Then, you add that total to the THIRD row’s Daily_Sales (48000.00).
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS SumOVER
FROM Sales_Table ;
You can have more than one SORT KEY. In the top query, Product_ID is the MAJOR Sort, and Sale_Date is the MINOR Sort.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING)
AS SumANSI
FROM Sales_Table ;
The PARTITION Statement is how you reset in ANSI. This will cause the SUMANSI to start over (reset) on its calculating for each NEW Product_ID.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS Subtotal,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS GRANDTotal
FROM Sales_Table ;
Above are two OLAP statements. Only one has PARTITION BY, so only it resets.
The SUM () Over allows you to get the moving SUM of a certain column.
The SUM OVER places the sort after the ORDER BY.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS Sum3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row?
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS Sum3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row? It is the sum of 48850.40, 54500.22 and 36000.07. The current row of Daily_Sales plus the previous two rows of Daily_Sales.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS SUM3,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
The ROWS 2 Preceding gives the MSUM for every 3 rows. The ROWS UNBOUNDED Preceding gives the continuous MSUM.
Use a PARTITION BY Statement to Reset the ANSI OLAP. Notice it only resets the OLAP command containing the Partition By statement, but not the other OLAPs.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3
FROM Sales_Table ;
Notice the Moving Window of 3 in the syntax and that it is a 2 in the ANSI version. That is because in ANSI, it is considered the Current Row and 2 preceding.
The AVG () Over allows you to get the moving average of a certain column. The Rows 2 Preceding is a moving window of 3 in ANSI.
Much like the SUM OVER Command, the Average OVER places the sort keys via the ORDER BY keywords.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 46450.23 amount derived in the AVG_3_ANSI column in the third row?
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
AVG of 48850.40, 54500.22, and 36000.07
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 43566.91 amount derived in the AVG_3_ANSI column in the fourth row?
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
AVG of 54500.22, 36000.07, and 40200.43
With a Moving Window of 3, how is the 43566.91 amount derived in the AVG_3_ANSI column in the fourth row?
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG3,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
The ROWS 2 Preceding gives the MAVG for every 3 rows. The ROWS UNBOUNDED Preceding gives the continuous MAVG.
Use a PARTITION BY Statement to Reset the ANSI OLAP. The Partition By statement only resets the column using the statement. Notice that only Continuous resets.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;
This is the RANK() OVER. It provides a rank for your queries. Notice how you do not place anything within the () after the word RANK. Default Sort is ASC.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales DESC)
AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Is the query above in ASC mode or DESC mode for sorting?
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (PARTITION BY Product_ID
ORDER BY Daily_Sales DESC)
AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What does the PARTITION Statement in the RANK() OVER do? It resets the rank.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales DESC)
AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
Limit 6 ;
The Limit statement limits rows once the Rank's been calculated.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
We now have added a Partition statement which resets on Product_ID so this produces 7 rows for each of our Product_IDs.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK()
OVER ( ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Percentage_Rank is just like RANK, however, it gives you the Rank as a percent, but only a percent of all the other rows up to 100%.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER ( ORDER BY Daily_Sales DESC)
AS PercentRank1
FROM Sales_Table ;
Percentage_Rank is just like RANK, however, it gives you the Rank as a percent but only a percent of all the other rows up to 100%.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
What caused the Product_IDs to be sorted?
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
What caused the Product_IDs to be sorted? It was the PARTITION BY statement.
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
This is the COUNT OVER. It will provide a sequential number starting at 1. The Keyword(s) ROWS UNBOUNDED PRECEDING causes Seq_Number to start at the beginning and increase sequentially to the end.
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the Max() Over shows the Max Value up to that point.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MAX(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The largest value is 64300.00 in the column MaxOver. Once it was evaluated, it did not continue until the end because of the PARTITION BY reset.
SELECT Product_ID, Sale_Date ,Daily_Sales
,MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the MIN () Over shows the Max Value up to that point.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The last two answers (MinOver) are blank, so you can fill in the blank.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The last two answers (MinOver) are filled in.
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER
(ORDER BY Product_ID, Sale_Date) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The ROW_NUMBER() Keyword(s) caused Seq_Number to increase sequentially. Notice that this does NOT have a Rows Unbounded Preceding, and it still works!
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
There are actually three different versions of the standard deviation:
STDDEV - returns the standard deviation of the
expression, which is the square root of VARIANCE.
When VARIANCE returns null, this function returns null.
STDDEV_POP - computes the population standard
deviation. This function is the same as the square root of
the VAR_POP function. When VAR_POP returns null,
this function returns null.
STDDEV_SAMP - computes the sample standard
deviation, which is the square root of VAR_ SAMP.
When VAR_SAMP returns null, this function returns null.
The above information is an introduction to Standard Deviation in an OLAP statement.
The following ANSI syntax is used with the
three standard deviation functions:
{ STDDEV | STDDEV_POP | STDDEV_SAMP (<column-name>) OVER
([ PARTITION BY <column-list> ]
ORDER BY <column-list>
[ ROWS [BETWEEN] { UNBOUNDED | <number>} PRECEDING
[ AND UNBOUNDED | <number> } FOLLOWING ] ]
[EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES |
EXCLUDE NO OTHERS] ) ;
The above information is the syntax for standard deviation using the OVER clause. In order to provide the moving functionality, it is necessary to have a method that designates the number of rows to include in the STDDEV. It allows the function to calculate values from columns contained in rows that are before the current row and also rows that are after the current row.
Wow! What an amazing example.
The following ANSI syntax is used with
the three standard deviation functions:
{ VARIANCE | VAR_SAMP (<column-name>) OVER
([ PARTITION BY<column-list> ]
ORDER BY <column-list>
[ ROWS [BETWEEN] { UNBOUNDED | <number>} PRECEDING
[ AND UNBOUNDED | <number> }FOLLOWING ] ]
[EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES |
EXCLUDE NO OTHERS] ) ;
The above information is the syntax for VARIANCE and VAR_SAMP using the OVER clause. In order to provide the moving functionality, it is necessary to have a method that designates the number of rows to include in the VARIANCE. It allows the function to calculate values from columns contained in rows that are before the current row and also rows that are after the current row.
VARIANCE Function
Returns the variance of the expression. If you apply this function to an empty set, it returns null. Matrix calculates the expression as follows:
• 0 if the number of rows in expression = 1
• VAR_SAMP if the number of rows in expression > 1
VARIANCE = (SUM(expr2) - ((SUM(expr))2 / COUNT(expr))) / (COUNT(expr) - 1)
VAR_POP Function
Returns the population variance of a set of numbers after discarding the nulls in this set. If you apply this function to an empty set, it returns null.
VAR_POP = (SUM(expr2) - ((SUM(expr))2 / COUNT(expr))) / COUNT(expr)
VAR_ SAMP Function
Returns the sample variance of a set of numbers after discarding the nulls in this set. If you apply this function to an empty set, it returns null.
The above information is an introduction to the Variance functions used in conjunction with an OVER statement.
SELECTProduct_ID AS "Prod", Sale_Date, Daily_Sales AS "Sales"
,VARIANCE(Daily_Sales) OVER ( ORDER BY sale_date
ROWS unboundedPRECEDING) AS VAR_S
,VARIANCE(Daily_Sales) OVER ( PARTITION BY sale_date
ORDER BY sale_dateROWS unbounded PRECEDING ) VAR_P
,VAR_POP(Daily_Sales) OVER ( ORDER BY sale_date
ROWS unboundedPRECEDING) AS VAR_POP
,VAR_SAMP(Daily_Sales) OVER ( ORDER BY sale_date
ROWS unboundedPRECEDING) AS VAR_SMP
FROMsql_class..sales_table
WHERE EXTRACT(MONTH FROM Sale_Date) = 9 ;
Wow! Another amazing example. The above example uses all three standard deviation functions to produce output sorting on the sales date for the dates in September.
The FIRST_VALUE and LAST_VALUE functions allow you to specify sorted
aggregate groups and return the first or last value of each group. The function
needs to know the length of the data at run time and does not allow a decimal value.
Syntax for FIRST_VALUE and LAST_VALUE:
{FIRST_VALUE | LAST_VALUE} ({ <column reference> | <value expression> | *})
OVER
([PARTITION BY <column reference>[,...]]
ORDER BY {<column reference> [ASC | DESC] } [,...]]
[ ROWS | RANGE {{ CURRENT ROW | UNBOUNDED
| <literal value> PRECEDING} | BETWEEN {CURRENT ROW |
UNBOUNDED PRECEDING | <literal value> PRECEDING}
AND {CURRENT ROW
| UNBOUNDED FOLLOWING | <literal value> FOLLOWING}}]
[ EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES
| EXCLUDE NO OTHERS ] ) ;
The above information provides information and the syntax for FIRST_VALUE and LAST_Value.
SELECTLast_name, first_name, dept_no
‚FIRST_VALUE(first_name)
OVER (ORDER BY dept_no, last_name desc
rows unbounded preceding) AS "First All"
‚FIRST_VALUE(first_name)
OVER (PARTITION BY dept_no
ORDER BY dept_no, last_name desc
rows unbounded preceding) AS "First Partition"
FROMSQL_Class..Employee_Table;
The above example uses FIRST_VALUE to show you the very first first_name returned. It also uses the keyword Partition to show you the very first first_name returned in each department.
SELECTLast_name, first_name, dept_no
‚LAST_VALUE(first_name)
OVER (ORDER BY dept_no, last_name desc
rows unbounded preceding)AS "Last All"
‚LAST_VALUE(first_name)
OVER (PARTITION BY dept_no
ORDER BY dept_no, last_name desc
rows unbounded preceding) AS "Last Partition"
FROMsql_class.Employee_Table;
The FIRST_VALUE and LAST_VALUE are good to use anytime you need to propagate a value from one row to all or multiple rows based on a sorted sequence. However, the output from the LAST_VALUE function appears to be incorrect or incomplete until you understand a few concepts. The SQL request specifies "rows unbounded preceding" and LAST_VALUE looks at the last row. The current row is always the last row, and therefore, it appears in the output.
The LAG and LEAD functions allow you to compare different
rows of a table by specifying an offset from the current row.
You can use these functions to analyze change and variation.
Syntax for LAG and LEAD:
{LAG | LEAD} (<value expression>, [<offset> [, <default>]]) OVER
([PARTITION BY <column reference>[,...]]
ORDER BY <column reference> [ASC | DESC] [,...] );
“Only he who attempts the ridiculous may achieve the impossible.”
– Don Quixote
The above provides information and the syntax for LAG and LEAD.
SELECT last_name, dept_no
,lead(dept_no)
over (order by dept_no, last_name) as "Lead All"
,lead(dept_no) over (Partition by dept_no
order by dept_no, last_name) as "Lead Partition"
FROM employee_table;
As you can see, the first LEAD brings back the value from the next row except for the last which has no row following it. The offset value was not specified in this example, so it defaulted to a value of 1 row.
SELECT last_name, dept_no
,lead(dept_no,2)
over (order by dept_no, last_name) as "Lead All"
,lead(dept_no,2) over (Partition by dept_no
order by dept_no, last_name) as "Lead Partition"
FROM employee_table;
Above, each value in the first LEAD is 2 rows away, and the partitioning only shows when values are contained in each value group with 1 more than offset value.
SELECT last_name, dept_no
,lag(dept_no)
over (order by dept_no, last_name) as "Lag All"
,lag(dept_no)
over (Partition by dept_no
order by dept_no, last_name) as "Lag Partition"
FROM employee_table;
From the example above, you see that LAG uses the value from a previous row and makes it available in the next row. For LAG, the first row(s) will contain a null based on the value in the offset. Above is defaulted to 1. The first null comes from the function, whereas the second row gets the null from the first row.
SELECT last_name, dept_no
,lag(dept_no,2)
over (order by dept_no, last_name) as "Lag All"
,lag(dept_no,2)
over (Partition by dept_no
order by dept_no, last_name) as "Lag Partition"
FROM employee_table;
For this example, the first two rows have a null because there is not a row two rows before these. The number of nulls will always be the same as the offset value. There is a third null because Jones Dept_No is null.