Chapter 34. Doing Analytical Calculations

When Prep Builder was originally released, it was missing a number of functions that Desktop users were used to. Therefore, part of a Desktop user’s normal analytical process was not straightforward within Prep Builder. Some of these capabilities, known as analytical calculations, are currently being added to the tool.

Analytical calculations actually refers to functions within a calculation. In this chapter, we’ll look at a type of calculation familiar to Desktop users called Table Calculations, how and why to use them in Prep, and some examples and use cases.

What Is a Table Calculation?

Table Calculations in Tableau Desktop are highly flexible and easy to use with a couple of clicks. Table Calculations allow you to handle secondary aggregations, like rank or running total, that would not otherwise be possible in Tableau.

The downside of Table Calculations is exactly the same as what makes them great: their flexibility. A Table Calculation can be easily reconfigured to work on whatever dimensions are within the view. Therefore, whenever a user wants to use the calculation, they need to understand how to reconfigure it. With complicated views, if the user doesn’t reconfigure the calculation correctly, this can impact both the performance of the view and the accuracy of the data output. Taking this work on during the data preparation stage can help mitigate these risks.

For example, let’s use Tableau’s Superstore data set to cover a use case assessing the ranking of each region based on its sales. In Tableau Desktop, you’d need to include both the Date and Sales numbers in the view as well as breaking down quarterly sales into Region rows. Figure 34-1 shows what that view might look like in Desktop.

Superstore’s quarterly sales in Tableau Desktop
Figure 34-1. Superstore’s quarterly sales in Tableau Desktop

To rank those regional sales in each quarter in Tableau Desktop, we’d use a Table Calculation. Table Calculations are always applied to continuous or measure fields in the view, as they make use of the results shown in the view—in this case, the sum of sales for each region and quarter. To create the rank value, right-click on the sum(Sales) data field, select Quick Table Calculation, and then select Rank in the submenu (Figure 34-2).

Table Calculation menu in Tableau Desktop
Figure 34-2. Table Calculation menu in Tableau Desktop

The rank values returned are not actually what we need, as the calculation is returning the quarterly sales ranked within, not across, each region (Figure 34-3).

The default results from applying a Rank Table Calculation
Figure 34-3. The default results from applying a Rank Table Calculation

We wanted to see how the regions rank against each other per quarter, not how each region ranks against its past performance. To change this, we need to edit the Table Calculation. Right-clicking on sum(Sales), which now has a delta symbol on the right-hand side, opens a menu with the option Edit Tableau Calculation. Selecting this option displays the screen in Figure 34-4.

The Table Calculation Editor
Figure 34-4. The Table Calculation Editor

To reflect how we want to process the Rank calculation, in the Table Calculation Editor, we need to change the Specific Dimensions Tableau Desktop is using for the calculation. The rank of the selected dimensions will be calculated for every combination of the dimensions that are unselected. Figure 34-5 shows the results of the updated calculation.

Updated results for the Rank Table Calculation
Figure 34-5. Updated results for the Rank Table Calculation

Table Calculations are extremely useful because they update as additional dimensions are brought into the view. However, as you’ve seen, creating them in Tableau Desktop is a multistep process and also requires some background knowledge to be able to make them work as required. So, as mentioned earlier, precalculating this metric in Prep Builder can save users from having to do it themselves, although they will lose some of the flexibility of updating the calculation as other dimensions are brought into the view.

Applying Table Calculation Logic in Prep Builder

To save Desktop end users the trouble, you can create the calculation in the data preparation stage, which can also benefit from the addition of analytical calculations. You’ve seen that the analytical process can either begin or be completed in Prep Builder, so having these functions already there can make analysis easier.

The syntax for analytical calculations differs from that of other functions found in Tableau Prep:

{ PARTITION [Dimension] : {ORDERBY [Measure] : ANALYTICAL_CALC( ) }}

Let’s explore each option listed in the Analytic Reference menu within the Calculation Editor (Figure 34-6).

The analytical calculation options
Figure 34-6. The analytical calculation options

Keywords

The Analytic Reference menu includes the following keywords:

PARTITION
Like partitions in Table Calculations, this option allows you to restart an analytical calculation if required (for example, ranking each subcategory’s sales but restarting the rank for each category).
ORDERBY and ASC/DESC
This dictates the order in which the analytical calculation will be processed. The ORDERBY keyword sorts the data, and ASC and DESC specify ascending or descending, respectively.

Analytical Calculations

The analytical calculations are:

RANK()
This function returns the standard competition rank (for each rank, we’ll use the example values 3, 7, 7, 10, so this would return 1, 2, 2, 4).
RANK_DENSE()
This function returns the dense rank (1, 2, 2, 3).
RANK_MODIFIED()
This option returns the modified competition rank (1, 3, 3, 4).
RANK_PERCENTILE()
This option returns the percentile rank (0.25, 0.75, 0.75, 1).
ROW_NUMBER()
This function returns a row number.

Unlike Desktop, RANK() in Prep Builder does take into account nulls. The analytical calculation functions fall into a subset of Tableau calculations that do not require anything within their parentheses, along with NOW(), TODAY(), and PI().

Now, with this introduction to the analytical calculations behind us, let’s look more closely at how they are used in practice.

The ranking functions

Within Desktop, the ranking functions are used for many purposes, such as determining the best and worst performers in a data set, and setting limits on data sets (e.g., the Top 100).

To simplify this example, let’s look at total sales for each subcategory (Figure 34-7).

Calculating total sales in an Aggregate step
Figure 34-7. Calculating total sales in an Aggregate step

After the sales aggregation, let’s rename the Sales field to Sub-Category Sales (Figure 34-8).

Resulting data set from #table_calculation_menu_in_tableau_deskt
Figure 34-8. Resulting data set from Figure 34-2

To form a rank calculation, you must specify how to rank the values. In this case, setting the ranking along Sub-Category Sales using the ORDERBY() keyword will rank the values from 1 upward based on the size of the values from largest to smallest (Figure 34-9).

Creating a rank based on the Sub-Category Sales field
Figure 34-9. Creating a rank based on the Sub-Category Sales field

The result of the calculation is 1 through 17, as there are 17 Sub-Category Sales values (Figure 34-10).

Resulting data set after creating the rank
Figure 34-10. Resulting data set after creating the rank

The default ordering of the calculation is from largest to smallest, or descending. You can set this option by including DESC in the ORDERBY calculation. To reverse the rank, you’d include ASC (ascending) instead (Figure 34-11).

Creating an ascending rank
Figure 34-11. Creating an ascending rank

The ROW_NUMBER() function

The Row_Number() function allows you to assign a value to each row based on a direction specified in the ORDERBY keyword. The default order is descending (largest to smallest, or latest to earliest), so you’d need to add an asc to make the row number increase instead (Figure 34-12).

Setting up a row number to increase over time
Figure 34-12. Setting up a row number to increase over time

This increments each row number as the Order Date values increase (Figure 34-13).

Result of the row number calculation
Figure 34-13. Result of the row number calculation

The ROW_NUMBER() function can help you sort data when doing more complicated cleaning tasks to make it easier to analyze.

Use Cases

In Prep, there are numerous use cases that can benefit from analytical calculations. This section will cover just a few examples.

Filtering for the Top N

In earlier versions of Prep Builder, creating “Top N” filters was quite the challenge. Ranking data involved a complicated self-join with tricky join conditions. Now, the rank functionality allows you to easily compile “Top N” lists by filtering. Let’s return to the Sub-Category Sales rank from earlier and filter it down to just the Top 5 Sub-Categories by Sales (Figure 34-14).

Calculating a Top 5 rank
Figure 34-14. Calculating a Top 5 rank

This returns just the Top 5 values (Figure 34-15).

Resulting data set after applying the Top 5 filter
Figure 34-15. Resulting data set after applying the Top 5 filter

Filtering Out a Percentage of Data

Using a similar technique as the Top 5 ranking, you can set how much of a data set you wish to return (based on a value in the data). This is a useful technique if the volume of data, in this case Sub-Categories, will increase over time, but you still wish to return the same-size sample (Figure 34-16).

Calculating a rank to return a percentile
Figure 34-16. Calculating a rank to return a percentile

With this rank calculation, the values returned are between 0 (largest value—in this case, highest sales) and 1 (lowest). The resulting Rank Percentile field can be used in a filter to return the highest 50% of subcategories (Figure 34-17).

Setting the filter to return the largest half
Figure 34-17. Setting the filter to return the largest half

For the 17 subcategories in the Superstore data set, this filter returns 9 subcategories (Figure 34-18).

Data set returned from the filter in #setting_the_filter_to_return_the_larges
Figure 34-18. Data set returned from the filter in Figure 34-17

The RANK_PERCENTILE() calculation doesn’t have to be used just as a filter. You could also use it to set groupings in mailing lists based on the open rate or frequency of customers ordering.

Summary

Analytical calculations enable not just easier data preparation but also deeper analysis before the data is output to a visualization tool. Not everyone has the luxury of super-fast computing power, so any work you can save the visualization software from having to process will help end users find the answers to their questions more quickly. Analytical calculations are not just for making the visualization work easier, however. Both the ranking and row number functions allow you to build more advanced cleaning logic and, therefore, to tackle harder challenges in Prep without having to call on ad hoc programming scripts.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset