A practical example – filling out missing/sparse dates

Let's say you have a table in a server database (such as SQL Server or Oracle) that contains one row per hospital patient and includes the Admit Date and Discharge Date as separate columns for each patient:

While this data structure works well for certain kinds of analysis, you would find it difficult to use if you want to visualize the number of patients in the hospital day-by-day for the month of December.

For one, which date field do you use for the axis? Even if you pivoted the table so that you had all of the dates in one field, you would find that you have gaps in the data. Sparse data, that is, data in which records do not exist for certain values, is quite common in certain real-world data sources. Specifically, in this case, you have a single record for each Admit or Discharge date, but no records for days in-between.

Sometimes, it might be an option to restructure the data at the source, but if the database is locked down, you may not have that option. You could also use Tableau's ability to fill in gaps in the data (data densification) to solve the problem. However, that solution could be intricate and potentially brittle or difficult to maintain.

An alternative is to use a cross database join to create the rows for all dates. So, you might quickly create an Excel sheet with a list of dates you want to see, like this:

The Excel file includes a record for each date. Our goal is to cross join (join every row from one table with every row in another) the data between the database table and the Excel table. With this accomplished, you will have a row for every patient for every date.

Joining every record in one dataset with every record in another dataset creates what is called a Cartesian product. The resulting dataset will have N1 * N2 rows (where N1 is the number of rows in the first dataset and N2 is the number of rows in the second). Take care in using this approach. It works well with smaller datasets. As you work with even larger datasets, the Cartesian product may grow so large that it is untenable.

You'll often have specific fields in the various tables that will allow you to join the data together. In this case, however, we don't have any keys that define a join. The dates also do not give us a way to join all the data in a way that gives us the structure we want. To achieve the cross join, we'll use a join calculation. A join calculation allows you to write a special calculated field specifically for use in joins.

In this case, we'll select Create Join Calculation... for both tables and enter the single, hard-coded value, that is, 1, for both the left and right sides:

Since 1 in every row on the left matches 1 in every row on the right, we get every row matching every row—a true cross join.

As an alternative, with many other server-based data sources, you can use Custom SQL as a data source. On the Data Source screen, with the Patient Visits table in the designer, you could use the top menu to select Data | Convert to Custom SQL to edit the SQL script that Tableau uses for the source. Alternatively, you can write your own custom SQL using the New Custom SQL object on the left sidebar. 

The script in this alternative example has been modified to include 1 AS Join to create a field called Join with a value of 1 for every row. Fields defined in Custom SQL can also be used in joins:

Based on the join calculation, our new cross-joined dataset contains a record for every patient for every date and we can now create a quick calculation to see whether a patient should be counted as part of the hospital population on any given date. The calculated field, named Patients in Hospital, has the following code:

IF [Admit Date] <= [Date] AND [Discharge Date] >= [Date] 
THEN 1 
ELSE 0 
END 

This allows us to easily visualize the flow of patients, and even potentially perform advanced analytics based on averages, trends, and even forecasting:

Ultimately, for a long-term solution, you might want to consider developing a server-based data source that gives the structure that's needed for the desired analysis. However, cross database joins allowed us to achieve the analysis without waiting on a long development cycle.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset