As you’ve seen, joining data sets together is a very common task in data preparation. But there is one use of joins that needs highlighting due to its slightly different use—a technique called appending. Within Tableau’s suite of products, appending is often used in Level of Detail (LOD) calculations, even if you don’t realize you are doing it.
This chapter will use LOD calculations to show the benefits of appending values to your data set for easier analysis. LOD calculations can be tricky to learn, so this chapter will walk you through what the LOD calculation is actually doing to the data set to cement your understanding.
Appending is the addition of a data field, or fields, onto an existing data set. This sounds very similar to joining, but we use the term append when adding a field that contains a constant, or set of constants, rather than the more row-to-row relationship that is found within joins.
As Prep Builder is designed to prepare data for Tableau Desktop and Server, many of you likely will be familiar with Level of Detail calculations from using them within Desktop. If not, let’s have a quick recap.
Tableau sets a measure’s level of aggregation by the granular dimension, or combination of dimensions, used within the view of Tableau Desktop. In the example in Figure 33-1, using data from Tableau’s sample data set Superstore, we’ll use an LOD calculation to set the total sales for Category—no matter the view makeup in Desktop—so the percentage of sales within each category can be calculated for a subcategory.
What is being calculated is the percentage of total sales across all categories. We could determine the subcategory percentage of categorical sales with a Table Calculation as well, but it’s not necessary for the data set we’re trying to return (Figure 33-2).
As of version 2020.1.3, you can create LOD calculations in Prep Builder. As mentioned in Chapter 18, an LOD calculation sets a hierarchical level at which an aggregation will be processed. This is applied to just one measure per calculation, compared to the Aggregate step, where the whole data set’s granularity is changed to the specified hierarchical level.
You have two different options in Prep Builder to create an LOD calculation: Custom Calculation or Fixed LOD. Both are accessed from a data field’s ellipsis menu in the Clean step through the Create Calculated Field submenu (Figure 33-3).
Selecting Fixed LOD opens the Visual Calculation Editor, which was shown in Chapter 18. So, this chapter will focus instead on the other option: creating an LOD calculation in the Custom Calculation Editor. The syntax for LOD calculations is:
{ fixed [optional categorical field(s)] : aggregation ( measure ) }
To build the LOD calculation needed to output the data set in Figure 33-2, first we’ll create the field Categorical Sales (Figure 33-4).
This calculation has set a constant sales total for each category. The Profile pane is a good place to check for the effects of the calculation (Figure 33-5).
By selecting a category in the Profile pane, you can see the values in other columns containing that category. In this case, Furniture has a constant Categorical Sales value, but there are still a range of Sales values within the selected rows. Prep Builder is appending the Categorical Sales value to the existing data set, as you can see on a per-row level in the Data pane (Figure 33-6).
Due to the hierarchy of the Superstore data set, a product can belong to only one category. This data set is ready for use within Tableau Desktop. If you want a quick answer for your analysis, you can select Preview in Desktop by right-clicking on any step, and Tableau Desktop will launch with a Hyper file of that data set already connected (Figure 33-7).
To create the set shown earlier in Figure 33-2, divide the Sales value by Categorical Sales data in Desktop (Figure 33-8).
This calculation determines the percentage contribution by each row of data, which is then summed up by the dimension(s) used in the view in Desktop; in this case, that’s Sub-Category, although other members of the product hierarchy could be used instead (Figure 33-9).
By appending the Categorical Sales value to the data set, you save your end users from having to create complex calculations themselves. However, these calculations (as well as any data fields) must be used correctly to avoid incorrect or misleading analysis.
To help describe what the LOD calculation is doing, let’s walk through appending a value to a data set. To make it easier to see what is happening within Prep Builder, we’ll use the same calculation as before, just with the data set aggregated to the Sub-Category level (Figure 33-10).
Using an Aggregate step, group by Category and sum the Sales values (Figure 33-11). This will result in each category having a single Categorical Sales value. Using an Aggregate step removes the other data in the set.
Completing a self-join—joining an earlier stage of the data preparation flow to the output of the Aggregate step—adds the original data at a Sub-Category level back into the flow (Figure 33-12).
Using an inner join with a join condition of Category = Category then adds Categorical Sales back to the original data set (Figure 33-13). With an inner join, you should delete the matching field, now called Category-1, as Prep doesn’t allow duplicate field names.
Level of Detail (LOD) calculations are very useful when analyzing data but can be difficult for end users of the data set to form themselves. LOD calculations work by aggregating a measure and then appending the result to the data set. The LOD calculations formed in Prep Builder can be used in further calculations or exported for use within Tableau Desktop to simplify the analytical process for users.