An example of leveraging data densification

Beyond simple examples of showing empty rows or missing dates, there are cases where you can use data densification to solve problems or get around limitations of the data that would be very difficult otherwise.

Consider, for example, if you had data that indicated dates when certain generators were turned on or off, such as the following, for example:

Generator

Date

Action

A

1/13/2017

On

B

1/22/2017

On

C

1/25/2017

On

D

1/25/2017

On

B

1/27/2017

Off

E

1/29/2017

On

A

1/30/2017

Off

C

1/30/2017

Off

 

What if you wanted a visualization that showed how many generators were On for any given date? The challenge is that the dataset is sparse. That is, there are only records for dates when an On or Off action occurred. The following shows how easy it is to visualize this in Tableau:

We only have eight marks to work with. But we've already seen that Tableau can fill in missing dates, and additionally, we can further force data densification by using certain Table Calculations to fill in a value for each generator for every date.

We'll start with a calculation that takes the human friendly value of On or Off and change it to a value we can easily add. The calculation is named Action value, and is used value with the following code:

IF [Action] = "On" THEN 1 ELSE 0 END 
 

This will give us a 1 to count the generator when it is On, and a 0 otherwise.

An additional calculation, combined with enabling the Show Missing Values of the Date field, allows us to fill in every date with a value. The new calculated field is called Action Value for Date and has the following code:

IF NOT ISNULL(MIN([Action Value]))
THEN MIN([Action Value]) 
ELSE ZN(PREVIOUS_VALUE(MIN([Action Value]))) 
END 

This implements a Table Calculation that we will set to calculate across the table. If MIN([Action Value]) is not null, then we have arrived at a date where the data gives an actual value and we'll keep that. Otherwise, we'll carry forward the PREVIOUS_VALUE() (a 1 if the generator was turned on or a 0 if it was turned off). The ZN() function will turn any null values into 0 (we'll assume the generator is off until we encounter an On). We'll move across the table, carrying forward values until we come across a value present in the data. Then we'll carry that one forward. The result is a table with all dates filled in with values, like this:

Notice that every generator gets a 1 on the date it was turned on and that 1 is carried across the table until the generator is turned off, at which point we get the 0 and carry it across.

We're close, because all we have to do now is sum up the values for all generators for a given date to get the number of generators that were on. For example, on day 13, there was only one. On day 25, there were four.

We can accomplish this using one more calculation that nests our existing calculation. We'll name this calculated field Generators Operating and use the following code:

WINDOW_SUM([Action Value for Date])

The key here is that we want to sum all the values down the table, but we want those values to be calculated across the table. When you use nested Table Calculations (Table Calculations referenced within the code of other Table Calculations), you can specify the scope/direction or addressing/partitioning for each nested calculation.

Here, for example, we'll add the Generators Operating calculated field to Rows and use the drop-down menu to select the Edit Table Calculation option:

Observe that under the Nested Calculations heading, there is a drop-down list where you can change the Compute Using options for each nested calculation. Here, we'll set Generators Operating to Table (down) and Action Value for Date to Table (across).

Our final view, after a bit of a cleanup, will look something like this:

The final view gives us a clear indication of how many generators were on for any given date, even though many of those dates did not exist in the data and we certainly didn't have a record for every generator for every date.

We've cleaned up the view a bit by doing the following:

  • We've added a First() == 0 filter, which is computed Table (down) because we're getting the total sum of action values for Date for each generator and we only need to show one set of totals.
  • We've hidden the column headers for Generator, because the field needs to be in the view to define the view level of detail, but does not need to be shown.
  • We've changed the Marks to Area.

Would it have been easier or better to densify the data at the source by joining every generator with every date to get a record for each combination? Quite likely. If it is possible, given your data source, to perform such a join, you may end up with a dataset that is far easier to work with in Tableau (without having to use data densification or complex Table Calculations).

You'll have to evaluate the feasibility of filling in missing data at the source based on volumes of data, underlying capabilities of the database, and how quickly such transformations can be accomplished. Having an understanding of data densification gives you some options to consider if you don't have the ability to shape the source.

However, with Tableau Prep, the possibilities for reshaping data easily are often within reach. We'll consider some of these possibilities when we examine Tableau Prep in Chapter 10Introducing Tableau Prep.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset