“Don’t count the days, make the days count.”
- Muhammad Ali
SELECT Product_ID
,CAST(Sale_Time as Date) Sale_Time , Daily_Sales,
ROW_NUMBER() OVER
(ORDER BY Product_ID, Sale_Time) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The ROW_NUMBER Keyword caused Seq_Number to increase sequentially. Notice that this does NOT have a Rows Unbounded Preceding statement, and it still works!
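As a quick check outside Teradata, the same ROW_NUMBER() behavior can be reproduced through Python's sqlite3 module (SQLite 3.25 or later supports window functions); the table and rows below are made up for illustration:

```python
import sqlite3

# A tiny stand-in for Sales_Table with made-up rows (not the book's data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Product_ID INT, Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?, ?)",
                [(1000, '2023-01-01', 100.0),
                 (1000, '2023-01-02', 200.0),
                 (2000, '2023-01-01', 300.0)])

# ROW_NUMBER() with no ROWS UNBOUNDED PRECEDING clause still works.
rows = con.execute("""
    SELECT Product_ID, Sale_Time,
           ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Time) AS Seq_Number
    FROM Sales_Table
    WHERE Product_ID IN (1000, 2000)
""").fetchall()
print(rows)   # Seq_Number runs 1, 2, 3
```
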
SELECT Product_ID
,CAST(Sale_Time as Date) Sale_Time , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID
,CAST(Sale_Time as Date) Sale_Time , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
WITH Results AS
( SELECT
ROW_NUMBER()
OVER(ORDER BY Product_ID, Sale_Time) AS RowNumber,
Product_ID, Sale_Time
FROM Sales_Table
) SELECT *
FROM Results
WHERE RowNumber BETWEEN 8 AND 14
In the example above, we are using a derived table called Results and then using a WHERE clause to only take certain RowNumbers.
Above is an example of the Ordered Analytics using the keyword OVER.
SELECT Product_ID, Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales ASC)as Rank,
DENSE_RANK() OVER(Order By Daily_Sales ASC) as DenseRank
FROM Sales_Table
WHERE Product_ID in(1000, 2000)
Above is an example of the RANK and DENSE_RANK commands. Notice the difference in the ties and the next ranking.
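The tie behavior is easy to see with a small made-up data set run through SQLite (3.25+): after a tie, RANK skips to the next positional rank while DENSE_RANK does not.

```python
import sqlite3

# Made-up Daily_Sales values containing a tie, to contrast RANK and
# DENSE_RANK (not the book's Sales_Table data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?)",
                [(10.0,), (20.0,), (20.0,), (30.0,)])

pairs = con.execute("""
    SELECT RANK()       OVER (ORDER BY Daily_Sales ASC),
           DENSE_RANK() OVER (ORDER BY Daily_Sales ASC)
    FROM Sales_Table
    ORDER BY Daily_Sales ASC
""").fetchall()
print(pairs)   # [(1, 1), (2, 2), (2, 2), (4, 3)]
```

After the tie at rank 2, RANK jumps to 4 while DENSE_RANK continues with 3.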
SELECT Product_ID ,CAST(Sale_Time as date) Sale_Time, Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales) AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The RANK OVER command defaults the Sort to ASC.
SELECT Product_ID , CAST(Sale_Time as date) Sale_Time, Daily_Sales
,RANK()OVER(ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Utilize the DESC keyword in the ORDER BY statement to rank in descending order.
SELECT Product_ID ,Sale_Time , Daily_Sales,
RANK()OVER (PARTITION BY Product_ID
ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What does the PARTITION Statement in the RANK OVER do? It resets the rank.
SELECT Product_ID ,Sale_Time , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS P_Rank
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
The second row's 0.17 Percent Rank means that 17 percent of the other rows performed better for Product_ID 1000.
SELECT Product_ID ,Sale_Time , Daily_Sales,
PERCENT_RANK()
OVER ( ORDER BY Daily_Sales DESC) AS P_Rank
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
PERCENT_RANK is just like RANK; however, it gives you the rank as a relative percentage of the other rows, ranging from 0 to 1 (100%). The fifth row's 0.31 percent rank means that 31 percent of the other rows performed better.
SELECT Product_ID ,Sale_Time , Daily_Sales
, PERCENT_RANK() OVER ( ORDER BY Daily_Sales DESC) P_Rank
FROM Sales_Table;
PERCENT_RANK is just like RANK; however, it gives you the rank as a relative percentage of the other rows, ranging from 0 to 1 (100%).
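The underlying formula is (RANK - 1) / (rows - 1). A minimal sketch with four made-up Daily_Sales values, ordered DESC as in the queries above and run through SQLite (3.25+):

```python
import sqlite3

# PERCENT_RANK = (RANK - 1) / (rows in partition - 1); the four
# Daily_Sales values here are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?)",
                [(40.0,), (30.0,), (20.0,), (10.0,)])

p_ranks = [round(r[0], 2) for r in con.execute("""
    SELECT PERCENT_RANK() OVER (ORDER BY Daily_Sales DESC) AS P_Rank
    FROM Sales_Table
    ORDER BY Daily_Sales DESC
""")]
print(p_ranks)   # [0.0, 0.33, 0.67, 1.0]
```

The best row always gets 0 and the worst row always gets 1.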
SELECT Product_ID Prod,
CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Time
ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The keywords ROWS UNBOUNDED PRECEDING determine that this is a cumulative sum (CSUM). There are only a few different windowing statements, and ROWS UNBOUNDED PRECEDING is the main one. It means start accumulating at the first row of the sorted data and keep a running total through the current row.
SELECT Product_ID,
CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Time
ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The second CsumAnsi row is 90739.28. That is derived by adding the first row's Daily_Sales (41888.88) to the SECOND row's Daily_Sales (48850.40).
SELECT Product_ID,
CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Time
ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;
The third CsumAnsi row is 138739.28. That is derived by taking the first row's Daily_Sales (41888.88) and adding it to the SECOND row's Daily_Sales (48850.40). Then, you add that total to the THIRD row's Daily_Sales (48000.00).
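The running-total arithmetic above can be re-checked with SQLite (3.25+); the three Daily_Sales figures are the ones quoted in the text:

```python
import sqlite3

# Rebuilding the cumulative sum from the three figures in the text;
# the Sale_Time dates are placeholders just to fix the sort order.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [('2023-01-01', 41888.88),
                 ('2023-01-02', 48850.40),
                 ('2023-01-03', 48000.00)])

csums = [round(r[0], 2) for r in con.execute("""
    SELECT SUM(Daily_Sales) OVER (ORDER BY Sale_Time
                                  ROWS UNBOUNDED PRECEDING) AS CsumAnsi
    FROM Sales_Table
    ORDER BY Sale_Time
""")]
print(csums)   # [41888.88, 90739.28, 138739.28]
```
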
SELECT Product_ID,
CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS SumOVER
FROM Sales_Table;
You can have more than one SORT KEY. In the query above, Product_ID is the MAJOR sort, and Sale_Time is the MINOR sort. Remember, the data is sorted first and then the cumulative sum is calculated. That is why these are called Ordered Analytics.
SELECT Product_ID Prod,
CAST(Sale_Time as Date) Sale_Time
, Daily_Sales as Sales
,SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) as SUMOVER,
SUM(1) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS Num
FROM Sales_Table WHERE Product_ID = 1000 ;
With "Num", because you placed the literal 1 in the area that calculates the cumulative sum, it continuously adds 1 to the answer for each row, producing a sequence number.
SELECT Product_ID ,
CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS SumANSI
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
The PARTITION Statement is how you reset in ANSI. This will cause the SUMANSI to start over (reset) on all calculations for each NEW Product_ID.
SELECT Product_ID
,CAST(Sale_Time as Date) Sale_Time, Daily_Sales
,SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS Subtotal
,SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS GrandTotal
FROM Sales_Table;
Above are two OLAP statements. Only one has PARTITION BY, so only it resets. The other continuously does a CSUM.
SELECT Product_ID, CAST (Sale_Time as date) Sale_Time
,Daily_Sales
,SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS CumulativeTotal
FROM Sales_Table ORDER BY CumulativeTotal
Above we used the statement "ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" to produce a CSUM, but notice that the report runs in reverse: the frame accumulates from the current row down to the last row, and sorting by CumulativeTotal puts the Product_ID of 3000 and the latest date first.
SELECT Product_ID, CAST(Sale_Time as date) Sale_Time, Daily_Sales
,SUM(Daily_Sales) OVER( PARTITION BY Product_ID ORDER BY
Product_ID, Sale_Time ROWS BETWEEN 1 PRECEDING AND CURRENT
ROW ) as Row_Preceding ,SUM(Daily_Sales) OVER( PARTITION BY
Product_ID ORDER BY Product_Id, Sale_Time ROWS BETWEEN CURRENT
ROW AND 1 FOLLOWING) as Row_Following
FROM Sales_Table
The example above uses ROWS BETWEEN 1 PRECEDING AND CURRENT ROW, and then uses a different example with ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING. Notice how the report came out?
The SUM () Over allows you to get the moving SUM of a certain column. The moving window always includes the current row. A Rows 2 Preceding statement means the current row and two preceding, which is a moving window of 3.
In SUM OVER, the sort key is specified after the ORDER BY keyword.
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row?
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row? It is the sum of 48850.40, 54500.22 and 36000.07. The current row of Daily_Sales plus the previous two rows of Daily_Sales.
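That arithmetic can be verified with SQLite (3.25+), using the three Daily_Sales figures quoted in the answer (the dates are placeholders that fix the sort order):

```python
import sqlite3

# Moving window of 3: the current row plus the two preceding rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [('2023-01-01', 48850.40),
                 ('2023-01-02', 54500.22),
                 ('2023-01-03', 36000.07)])

sum3 = [round(r[0], 2) for r in con.execute("""
    SELECT SUM(Daily_Sales) OVER (ORDER BY Sale_Time
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Sum3_ANSI
    FROM Sales_Table
    ORDER BY Sale_Time
""")]
print(sum3)   # third row: 48850.40 + 54500.22 + 36000.07 = 139350.69
```
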
SELECT Product_ID as Prod ,CAST(Sale_Time as date) Sale_Time,
Daily_Sales, SUM(Daily_Sales) OVER (ORDER BY Product_ID,
Sale_Time ROWS 2 Preceding) AS SUM3,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
The ROWS 2 Preceding gives the moving sum (MSUM) over each window of 3 rows. The ROWS UNBOUNDED Preceding gives the continuous cumulative sum.
Use a PARTITION BY Statement to Reset the ANSI OLAP. Notice it only resets the OLAP command containing the Partition By statement, but not the other OLAP.
The AVG () Over allows you to get the moving average of a certain column. The Rows 2 Preceding is a moving window of 3 in ANSI.
SELECT Product_ID, CAST(Sale_Time as Date) Sale_Time, Daily_Sales
,AVG(Daily_Sales)OVER (ORDER BY Product_ID, Sale_Time
ROWS 2 Preceding) AS AVG_3
FROM Sales_Table ;
Notice the Moving Window of 3 in the syntax is written as ROWS 2 Preceding. That is because in ANSI the window is considered the current row plus the 2 preceding rows.
A CAST can be applied to the average to get rid of the decimals.
SELECT Product_ID, CAST(Sale_Time as Date) Sale_Time, Daily_Sales
, AVG(Daily_Sales)OVER (ORDER BY Product_ID, Sale_Time
ROWS 2 Preceding) AS AVG3
,AVG(Daily_Sales)OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
The ROWS 2 Preceding gives the moving average (MAVG) over each window of 3 rows. The ROWS UNBOUNDED Preceding gives the continuous cumulative average.
Use a PARTITION BY Statement to Reset the ANSI OLAP. The PARTITION BY statement only resets the column that uses it; the column without it continues uninterrupted.
SELECT Product_ID ,CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
This is the COUNT OVER. It will provide a sequential number starting at 1. The Keyword(s) ROWS UNBOUNDED PRECEDING causes Seq_Number to start at the beginning and increase sequentially to the end.
SELECT Product_ID, CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Time) AS No_Seq
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
When you don’t have a ROWS UNBOUNDED PRECEDING this still works just fine.
SELECT Product_ID, CAST(Sale_Time as Date) Sale_Time, Daily_Sales,
COUNT(*)OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID, Cast(Sale_Time as Date) Sale_Time, Daily_Sales,
COUNT(*) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the Max Over shows the Max Value up to that point.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales
, MAX(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The largest value in the MaxOver column is 64300.00. Once reached, it did not carry through to the end, because the PARTITION BY reset the calculation for the next Product_ID.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales
,MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the MIN () Over shows the MIN value up to that point.
SELECT Product_ID, Cast (Sale_Time as date) Sale_Time, Daily_Sales
, MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Time
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The PARTITION BY keyword will reset when the Product_ID changes. This is why the MinOver restarted at 41888.88 for Product_ID 2000, even though 41888.88 is higher than the earlier minimum of 32800.50.
SELECT ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Sale_Time) As Rnbr
,Product_Id as PROD, Cast(Sale_Time as date) Sale_Time
,MIN(Cast(Sale_Time as date)) OVER (PARTITION BY Product_ID
ORDER BY Sale_Time ROWS BETWEEN 1 FOLLOWING AND 1
FOLLOWING)As Next_Start_Dt
,Daily_Sales ,SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Sale_Time ROWS UNBOUNDED PRECEDING)
As To_Date_Revenue FROM Sales_Table
The above example shows the cumulative SUM for the Daily_Sales and the next date on the same line.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales
,NTILE (4) OVER (ORDER BY Daily_Sales , Sale_Time ) AS Quartiles
FROM Sales_Table WHERE Product_ID = 1000;
Assigning a different value to the <partitions> indicator of the NTILE function changes the number of partitions established. Each NTILE partition is assigned a number starting at 1 and increasing. So, with an NTILE of 4 the partitions are 1 through 4. Then, all the rows are distributed as evenly as possible into each partition in the order of the OVER clause's ORDER BY. When the rows cannot be divided evenly, the extra rows go into the lowest-numbered partitions first.
SELECT Last_Name, Grade_Pt,
NTILE(5) OVER (ORDER BY Grade_Pt) as Tile
FROM Student_Table
ORDER BY "Tile" DESC;
The Ntile function organizes rows into n number of groups. These groups are referred to as tiles. The tile number is returned. For example, the example above has 10 rows, so NTILE (5) splits the 10 rows into five equally sized tiles. There are 2 rows in each tile in the order of the OVER clause's ORDER BY.
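The even split can be demonstrated with SQLite (3.25+); the ten student rows below are invented, since only the row count matters here:

```python
import sqlite3

# Ten made-up students split by NTILE(5): each tile gets exactly 2 rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Last_Name TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?)",
                [(f'Student{i}', i * 0.4) for i in range(10)])

tiles = [r[0] for r in con.execute("""
    SELECT NTILE(5) OVER (ORDER BY Grade_Pt) AS Tile
    FROM Student_Table
""")]
counts = {t: tiles.count(t) for t in set(tiles)}
print(counts)   # {1: 2, 2: 2, 3: 2, 4: 2, 5: 2}
```
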
SELECT Dept_No, EmployeeCount,
NTILE(2) OVER (ORDER BY EmployeeCount) Tile
FROM (SELECT Dept_No, COUNT(*) EmployeeCount
FROM Employee_Table
GROUP BY Dept_No
) Q
ORDER BY 3 DESC;
The Ntile function organizes rows into n number of groups. These groups are referred to as tiles. The tile number is returned. For example, the example above has 6 rows, so NTILE (2) splits the 6 rows into 2 equally sized tiles. There are 3 rows in each tile in the order of the OVER clause's ORDER BY.
SELECT Claim_ID, Cast(Claim_Time as date) Claim_Date, ClaimCount,
NTILE(100) OVER (ORDER BY ClaimCount) Percentile
FROM (SELECT Claim_ID, Claim_Time, COUNT(*) ClaimCount
FROM Claims
GROUP BY Claim_ID, Claim_Time) Q
ORDER BY Percentile DESC
The Ntile function organizes rows into n number of groups. These groups are referred to as tiles. The tile number is returned. Above is a way to get the percentile.
SELECT Product_ID,
Cast(Sale_Time as date) Sale_Time, Daily_Sales
,NTILE(100) OVER (ORDER BY Daily_Sales) Quantile
FROM Sales_Table
WHERE Product_ID < 2000 ;
This example determines the percentile for every row in the Sales table based on the daily sales amount. It sorts it into sequence by the value being categorized, which here is daily sales.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales
,NTILE (4) OVER (Order by Daily_Sales,Sale_Time ) AS Quartiles
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
Instead of 100, the example above uses a quartile (QUANTILE based on 4 partitions). The NTILE function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is omitted, the entire input will be sorted using the ORDER BY clause, and then divided into the number of buckets specified.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
NTILE(4) OVER (ORDER BY Daily_Sales) AS Bucket
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The NTILE function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is omitted, the entire input will be sorted using the ORDER BY clause, and then divided into the number of buckets specified.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
NTILE(10) OVER (ORDER BY Daily_Sales DESC)AS Bucket
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The NTILE function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is omitted, the entire input will be sorted using the ORDER BY clause, and then divided into the number of buckets specified.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
NTILE(3) OVER (PARTITION BY Product_ID
ORDER BY Daily_Sales)AS Bucket
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
In this example, because PARTITION BY is listed, the data will first be sorted by Product_ID and then sorted using the ORDER BY clause (within Product_ID), and then divided into the number of buckets specified.
SELECT Last_name, first_name, dept_no
,FIRST_VALUE(first_name) OVER (ORDER BY dept_no, last_name desc
rows unbounded preceding) AS First_All
,FIRST_VALUE (first_name) OVER (PARTITION BY dept_no
ORDER BY dept_no, last_name desc
rows unbounded preceding)AS First_Partition
FROM Employee_Table;
The above example uses FIRST_VALUE to show you the very first first_name returned. It also uses the keyword Partition to show you the very first first_name returned in each department.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - First_Value (Daily_Sales)
OVER (ORDER BY Sale_Time) AS Delta_First
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, after sorting the data by Sale_Time, we computed the difference between the first row's Daily_Sales and the Daily_Sales of each following row. All rows' Daily_Sales are compared with the first row's Daily_Sales, thus the name First_Value.
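A minimal sketch of the delta-from-first pattern, with made-up figures, run through SQLite (3.25+):

```python
import sqlite3

# Each row's Daily_Sales minus the first row's Daily_Sales after sorting
# by Sale_Time; the three rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [('2023-01-01', 100.0),
                 ('2023-01-02', 250.0),
                 ('2023-01-03', 400.0)])

deltas = [r[0] for r in con.execute("""
    SELECT Daily_Sales - FIRST_VALUE(Daily_Sales)
           OVER (ORDER BY Sale_Time) AS Delta_First
    FROM Sales_Table
    ORDER BY Sale_Time
""")]
print(deltas)   # [0.0, 150.0, 300.0]
```

The first row always shows 0.0, since it is compared with itself.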
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - First_Value (Daily_Sales)
OVER (ORDER BY Daily_Sales DESC) AS Delta_First
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, after sorting the data by Daily_Sales DESC, we computed the difference between the first row's Daily_Sales and the Daily_Sales of each following row. All rows' Daily_Sales are compared with the first row's Daily_Sales, thus the name First_Value. This example shows how much less each Daily_Sales is compared to 64,300.00 (our highest sale).
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - First_Value (Daily_Sales)
OVER (PARTITION BY Product_ID
ORDER BY Sale_Time) AS Delta_First
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
We are now comparing the Daily_Sales of the first Sale_Time for each Product_ID with the Daily_Sales of all other rows within the Product_ID partition. Each row is only compared with the first row (First_Value) in its partition.
Sometimes you need to find the first or last occurrence of something. This query finds the First_Value and then combines the Row_Number analytic to retrieve only the first occurrence. This is done using a derived table and then querying it.
SELECT Last_name, first_name, dept_no, LAST_VALUE(first_name)
OVER (ORDER BY dept_no, last_name desc
rows unbounded preceding) AS Last_All
,LAST_VALUE(first_name) OVER (PARTITION BY dept_no
ORDER BY dept_no, last_name desc
rows unbounded preceding) AS Last_Partition
FROM sql_class.Employee_Table;
FIRST_VALUE and LAST_VALUE are good to use anytime you need to propagate a value from one row to all or multiple rows based on a sorted sequence. However, the output from the LAST_VALUE function appears incorrect and is a little misleading until you understand one concept: the SQL request specifies ROWS UNBOUNDED PRECEDING, so each row's frame ends at the current row. The current row is always the last row of its own frame, and therefore its own value appears in the output.
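This frame effect is easy to see with three made-up rows in SQLite (3.25+): with ROWS UNBOUNDED PRECEDING each row is its own last value, while a frame extended to UNBOUNDED FOLLOWING returns the true last value.

```python
import sqlite3

# Contrasting LAST_VALUE under two frame specifications; the table and
# names are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (first_name TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [('Ann',), ('Bob',), ('Cal',)])

narrow = [r[0] for r in con.execute("""
    SELECT LAST_VALUE(first_name)
           OVER (ORDER BY first_name ROWS UNBOUNDED PRECEDING)
    FROM t ORDER BY first_name
""")]
wide = [r[0] for r in con.execute("""
    SELECT LAST_VALUE(first_name)
           OVER (ORDER BY first_name
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM t ORDER BY first_name
""")]
print(narrow)   # ['Ann', 'Bob', 'Cal'] -- each row is its own last value
print(wide)     # ['Cal', 'Cal', 'Cal']
```
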
SELECT Product_ID, Cast (Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - LAST_Value (Daily_Sales)
OVER (ORDER BY Sale_Time) AS Delta_Last
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, after sorting the data by Sale_Time, we computed the difference between each row's Daily_Sales and the last Daily_Sales among the rows with the same Sale_Time (the default frame includes the current row's peers). Since there are two product rows for each day, one row of each pair always shows 0.00.
Compatibility: Hadoop Extension
The LAG and LEAD functions allow you to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation.
Syntax for LAG and LEAD:
{LAG | LEAD} (<value expression>, [<offset> [, <default>]]) OVER
([PARTITION BY <column reference>[,...]]
ORDER BY <column reference> [ASC | DESC] [,...] ) ;
The above provides information and the syntax for LAG and LEAD.
select product_id, Sale_Time, daily_sales
, lead(daily_sales) over(partition by product_id order by
Sale_Time) AS Next_Value
from sales_table
This LEAD example shows the value of Daily_Sales and then the next value on the same line.
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - LEAD(Daily_Sales, 1, 0)
OVER (ORDER BY Product_ID, Sale_Time) AS Lead1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, we computed the difference between a product's Daily_Sales and the next row's Daily_Sales in the sort order. The expression LEAD(Daily_Sales, 1, 0) tells LEAD() to evaluate Daily_Sales on the row positioned one row after the current row. If there is no such row (as is the case on the last row of the partition or relation), the default value of 0 is used.
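The default-value behavior can be sketched with three made-up rows in SQLite (3.25+):

```python
import sqlite3

# LEAD(Daily_Sales, 1, 0): each row minus the next row's value; the last
# row falls back to the default 0. The figures are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [('2023-01-01', 10.0),
                 ('2023-01-02', 25.0),
                 ('2023-01-03', 40.0)])

lead1 = [r[0] for r in con.execute("""
    SELECT Daily_Sales - LEAD(Daily_Sales, 1, 0)
           OVER (ORDER BY Sale_Time) AS Lead1
    FROM Sales_Table
    ORDER BY Sale_Time
""")]
print(lead1)   # [-15.0, -15.0, 40.0]  -- last row: 40.0 - 0
```
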
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
Daily_Sales - LEAD(Daily_Sales, 1, 0)
OVER (PARTITION BY Product_ID ORDER BY Sale_Time) AS Lead1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, we computed the difference between a product's Daily_Sales and the next row's Daily_Sales in the sort order. We also partitioned the data by Product_ID.
WITH Derived_Tbl AS
(select Product_ID as Prod
,Cast(Sale_Time as Date) as Sale_Date
,Daily_Sales
,lead(daily_sales) over(partition by product_id
order by Sale_Time ASC)AS Next_Sales
,Row_Number() over (partition by product_id
order by Sale_Time ASC) AS Row_Num
from sales_table)
Select * from Derived_Tbl Where Row_Num = 1 ;
Use a Derived Table and a Row_Number command and you can find the first occurrence of your data. Above, we partitioned by Product_ID, but we did the order by statement in ASC mode on Sale_Time. We just got the first Sale_Time. We also threw in the next row's Daily_Sales, which happened after the first row's Sale_Time.
WITH Derived_Tbl AS
(select Product_ID as Prod
,Cast(Sale_Time as Date) as Sale_Date
,Daily_Sales
,lead(daily_sales) over(partition by product_id
order by Sale_Time DESC) AS Previous_Sales
,Row_Number() over (partition by product_id
order by Sale_Time DESC) AS Row_Num
from sales_table)
Select * from Derived_Tbl Where Row_Num = 1 ;
Use a Derived Table and a Row_Number command and you can find the last occurrence of your data. Above, we partitioned by Product_ID, but we did the order by statement in DESC mode on Sale_Time. We just got the last Sale_Time. We also threw in the previous row's Daily_Sales, which happened before the last row's Sale_Time.
SELECT
Last_Name, Dept_No
,LEAD(Dept_No)
OVER (ORDER BY Dept_No, Last_Name)as "Lead All"
,LEAD(Dept_No) OVER (PARTITION BY Dept_No
ORDER BY Dept_No, Last_Name) as "Lead Partition"
FROM Employee_Table;
As you can see above, the first LEAD brings back the value from the next row, except for the last row, which has no row following it and therefore shows NULL. The offset value was not specified in this example, so it defaulted to 1 row.
SELECT Last_Name, Dept_No
,LEAD(Dept_No,2)
OVER (ORDER BY Dept_No, Last_Name)as "Lead All"
,LEAD(Dept_No,2)
OVER (PARTITION BY Dept_No
ORDER BY Dept_No, Last_Name) as "Lead Partition"
FROM Employee_Table;
Above, each value in the first LEAD comes from the row 2 rows ahead. In the partitioned LEAD, a value is returned only when the partition contains more rows than the offset; with an offset of 2, a partition needs at least 3 rows before any value appears.
SELECT Last_Name, Dept_No
,LAG(Dept_No)
OVER (ORDER BY Dept_No, Last_Name) as "Lag All"
,LAG(Dept_No)
OVER (PARTITION BY Dept_No
ORDER BY Dept_No, Last_Name) as "Lag Partition"
FROM Employee_Table;
From the example above, you see that LAG takes the value from a previous row and makes it available in the current row. For LAG, the first row(s) will contain a NULL based on the offset value; here it defaulted to 1. The first row's NULL comes from the function itself (there is no preceding row), whereas the second row's NULL is the actual Dept_No value carried over from the first row.
SELECT Last_Name, Dept_No
,LAG(Dept_No,2)
OVER (ORDER BY Dept_No, Last_Name) as "Lag All"
,LAG(Dept_No,2)
OVER (PARTITION BY Dept_No
ORDER BY Dept_No, Last_Name) as "Lag Partition"
FROM Employee_Table;
For this example, the first two rows have a NULL because there is no row two rows before them. The number of NULLs at the start will always equal the offset value. The last NULL appears because Jones' Dept_No is NULL.
SELECT Product_ID,
To_char(Sale_Time, 'mm/dd/yyyy') "Sale_Time", Daily_Sales,
Daily_Sales - LAG(Daily_Sales, 1, 0)
OVER (PARTITION BY Product_ID ORDER BY Sale_Time) AS Lag1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Above, we computed the difference between a product's Daily_Sales and the previous row's Daily_Sales in the sort order. The expression LAG(Daily_Sales, 1, 0) tells LAG() to evaluate Daily_Sales on the row positioned one row before the current row. If there is no such row (as is the case on the first row of the partition or relation), the default value of 0 is used.
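The mirror image of the LEAD sketch, again with made-up figures in SQLite (3.25+):

```python
import sqlite3

# LAG(Daily_Sales, 1, 0): each row minus the previous row's value; the
# first row falls back to the default 0. The figures are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Sale_Time TEXT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [('2023-01-01', 10.0),
                 ('2023-01-02', 25.0),
                 ('2023-01-03', 40.0)])

lag1 = [r[0] for r in con.execute("""
    SELECT Daily_Sales - LAG(Daily_Sales, 1, 0)
           OVER (ORDER BY Sale_Time) AS Lag1
    FROM Sales_Table
    ORDER BY Sale_Time
""")]
print(lag1)   # [10.0, 15.0, 15.0]  -- first row: 10.0 - 0
```
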
SELECT Product_ID, Cast (Sale_Time as date) Sale_Time, Daily_Sales,
CUME_DIST() OVER (ORDER BY Daily_Sales DESC) AS CDist
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula. That formula is (number of rows preceding or peer with current row) / (total rows). We order by Daily_Sales DESC, so that each row is ranked by cumulative distribution. The distribution is represented relatively, by floating point numbers, from 0 to 1. When there is only one row in a partition, it is assigned 1. When there is more than one row, the rows are assigned cumulative distribution rankings ranging from 0 to 1.
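The formula can be checked with four made-up Daily_Sales values, including one tie, in SQLite (3.25+): tied rows are peers, so they share the same cumulative distribution value.

```python
import sqlite3

# CUME_DIST = (rows preceding or tied with the current row) / (total rows);
# the values here are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?)",
                [(30.0,), (20.0,), (20.0,), (10.0,)])

cdist = [r[0] for r in con.execute("""
    SELECT CUME_DIST() OVER (ORDER BY Daily_Sales DESC) AS CDist
    FROM Sales_Table
    ORDER BY Daily_Sales DESC
""")]
print(cdist)   # [0.25, 0.75, 0.75, 1.0]  -- the tied rows share 3/4
```
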
SELECT Product_ID, Cast(Sale_Time as date) Sale_Time, Daily_Sales,
CUME_DIST() OVER (PARTITION by Product_ID
ORDER BY Daily_Sales DESC) AS CDist
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The CUME_DIST is a cumulative distribution function that assigns a relative rank to each row, based on a formula. That formula is (number of rows preceding or peer with current row) / (total rows). We Partition by Product_ID and then Order By Daily_Sales DESC, so that each row is ranked by cumulative distribution within its partition.
SELECT Product_ID , SUM(Daily_Sales) as Summy,
SUM(SUM(Daily_Sales)) OVER (ORDER BY Sum(Daily_Sales) )
AS Prod_Sales_Running_Sum
FROM Sales_Table
GROUP BY Product_ID ;
Window functions can compute aggregates of aggregates as shown in the example above.
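The same "sum of sums" idea can be sketched in SQLite (3.25+); here it is rewritten with a derived table, which is equivalent and also runs on engines that reject aggregates nested inside window functions. The per-product figures are made up.

```python
import sqlite3

# Running total over per-product sales totals: the window function is
# applied to the already-aggregated rows of the derived table Q.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales_Table (Product_ID INT, Daily_Sales REAL)")
con.executemany("INSERT INTO Sales_Table VALUES (?, ?)",
                [(1000, 5.0), (1000, 5.0), (3000, 20.0), (2000, 30.0)])

rows = con.execute("""
    SELECT Product_ID, Summy,
           SUM(Summy) OVER (ORDER BY Summy) AS Prod_Sales_Running_Sum
    FROM (SELECT Product_ID, SUM(Daily_Sales) AS Summy
          FROM Sales_Table GROUP BY Product_ID) Q
    ORDER BY Summy
""").fetchall()
print(rows)   # [(1000, 10.0, 10.0), (3000, 20.0, 30.0), (2000, 30.0, 60.0)]
```
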