As you saw in Chapter 26, Tableau Prep’s Group and Replace functionality is pretty incredible, thanks to the rich, easy-to-use options built into the tool (Figure 28-1).
If you haven’t had a chance to explore these, I recommend both revisiting Chapter 26 and attempting Preppin’ Data 2019: Week 2, where you get to use these techniques on the City field. Do this before continuing to read this chapter if you want to avoid challenge spoilers!
The Tableau developers have added an extra level of validation to data cleaning with data roles.
For string data fields, you can now set a specific data role for Prep to test the data against. You can test geographic values, email addresses, and URLs against Tableau’s own list to see whether they are valid.
When you click on the Data Type icon to change the data type, you will see a drop-down list of options (Figure 28-2). In this example, I’m assigning the City role.
Assigning the City role to the list of city names from the Week 2 exercise shows which values Prep recognizes as true city names and which it doesn’t (Figure 28-3).
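If it helps to think about this in code, a data role check is conceptually a membership test: each value in the field is compared with a reference list and flagged when it isn't found. The Python sketch below illustrates only that idea. It is not how Prep implements data roles, and the `KNOWN_CITIES` set, the sample values, and the `check_against_role` function are all hypothetical stand-ins.

```python
# Hypothetical reference list standing in for a built-in City role.
KNOWN_CITIES = {"London", "Edinburgh", "Cardiff"}


def check_against_role(values, role_members):
    """Return (value, is_valid) pairs, preserving the input order."""
    return [(value, value in role_members) for value in values]


# Made-up sample values; the match in this sketch is exact and case-sensitive.
cities = ["London", "Lodnon", "Edinburgh", "edinburgh"]
for value, is_valid in check_against_role(cities, KNOWN_CITIES):
    print(value, "valid" if is_valid else "NOT VALID")
```

The flagged values are the ones you would then work through with Group and Replace.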
With this highlighting, you can easily work through all of the problems using Prep’s fantastic Group and Replace functionality.
But what if you want to get rid of those errors? Well, by having an active data role, you actually get a few more options to play with (Figure 28-4).
You can select whether you want to see all values, only the valid values, or only the values that are not valid.
You can also group by the data role members, but I haven’t found a use case where this has helped my data preparation yet. Maybe that will be a future Preppin’ Data challenge.
As of version 2019.3.1, not only can you use Tableau’s data roles, but you can also create a custom data role from any string or integer data field. The data role will comprise all of the values in the field. To create a custom data role, open the ellipsis menu in a Clean step for the data field you want to use (Figure 28-5).
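Because a custom data role comprises all of the values in the field, you can picture it as the set of distinct values that field contains. Here is a minimal Python sketch of that idea. The `build_custom_role` function and the sample region values are hypothetical, and dropping nulls is my assumption for the sketch rather than documented Prep behavior.

```python
def build_custom_role(field_values):
    """Collect the distinct values of a field as the members of a role.

    Assumption for this sketch: null (None) values are left out.
    """
    return {value for value in field_values if value is not None}


# Made-up field contents with a repeated value and a null.
regions = ["North", "South", "East", "West", "Central", "North", None]
region_role = build_custom_role(regions)
print(sorted(region_role))
```

The practical consequence is that the field should be clean before you turn it into a data role, since any misspelling it contains would become a valid member.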
Custom data roles are held on Tableau Server or Tableau Online, so you’ll need to publish them. Selecting Publish as Data Role will place an output for the data role into the Flow pane (Figure 28-6).
The configuration of the output for the custom data role is similar to that for a data source being published to a Tableau Server or Online location. You must specify the server, site, and project to publish the data role to. Custom data roles can be published only to a single site on the Tableau Server or Tableau Online instance. The site also needs to be the same as the data set’s output location if you are publishing the data source. In the configuration, you can give the custom data role a name as well as a description to help others understand it (Figure 28-7).
Clicking Run Flow publishes the custom data role. Your custom data role, along with any others that have been published, can be found in the Explore section of the Tableau Server or Tableau Online instance (Figure 28-8).
You can view the data role by clicking on its name. This reveals the values it includes and gives you the opportunity to edit the description (Figure 28-9).
To use the custom data role, you will need to be signed in to the site on the Tableau Server or Tableau Online instance where you published it. Once you are signed in, you can select the custom data role or any of the default data roles available from Prep Builder (Figure 28-10).
If you change Central to North and apply the custom data role Region Data Role, Prep Builder highlights the values that do not match (Figure 28-11).
Custom data roles have the same options as the default data roles: Valid, Not valid, or All values (Figure 28-12).
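The three options behave like a filter over the field. The sketch below shows one way to express that in Python. It is an illustration under my own assumptions, not Prep's implementation: the `filter_by_role` function, its option strings, and the sample values are all hypothetical.

```python
def filter_by_role(values, role_members, show="all"):
    """Keep values according to the chosen option: 'all', 'valid', or 'not valid'."""
    if show == "all":
        return list(values)
    if show == "valid":
        return [v for v in values if v in role_members]
    if show == "not valid":
        return [v for v in values if v not in role_members]
    raise ValueError(f"unknown option: {show!r}")


# Hypothetical role members and a made-up field containing one misspelling.
region_role = {"North", "South", "East", "West", "Central"}
field = ["North", "Nroth", "South"]

print(filter_by_role(field, region_role, show="not valid"))
print(filter_by_role(field, region_role, show="valid"))
```

Listing only the values that are not valid gives you a short worklist to clean or filter out, which is the main benefit of the option.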
Data roles in Prep Builder are a similar concept to Desktop’s geographical roles but go further by giving you insight into whether data needs to be cleaned. By selecting the “Not valid” option, you can choose to filter or clean those values. Selecting “Valid” can reassure you that the remaining data values meet the set criteria. Custom data roles allow an organization to match the values that are important to it, or those that are otherwise difficult to validate.