Chapter 28. Using Data Roles

As you saw in Chapter 26, Tableau Prep’s Group and Replace functionality is pretty incredible with the rich, easy-to-use options that are built into the tool (Figure 28-1).

Figure 28-1. Using Group and Replace in Prep to clean string data
Note

If you haven’t had a chance to explore these, I recommend both revisiting Chapter 26 and attempting Preppin’ Data 2019: Week 2, where you get to use these techniques on the City field. Do this before continuing to read this chapter if you want to avoid challenge spoilers!

The Tableau developers have added an extra level of validation to data cleaning with data roles.

For string fields, you can now assign a data role for Prep to test the values against. The built-in roles cover geographic values (such as city names), email addresses, and URLs, and Prep checks each value against Tableau’s own definitions to see whether it is valid.
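To make the idea of testing a value against a role more concrete, here is a minimal Python sketch. It is not how Prep implements data roles: the email pattern is a deliberately simplified assumption, and the function names looks_like_email and looks_like_url are hypothetical.

```python
import re
from urllib.parse import urlparse

# Simplified shape check: something@something.tld
# (an assumption for illustration, not Prep's actual rule)
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value):
    """Return True if the value has the rough shape of an email address."""
    return bool(EMAIL_PATTERN.match(value.strip()))


def looks_like_url(value):
    """Return True if the value parses with an http(s) scheme and a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A value such as carl.example.com would fail the email check because it has no @ sign, which is the kind of entry a data role flags for cleaning.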

How to Use Data Roles

When you click on the Data Type icon to change the data type, you will see a drop-down list of options (Figure 28-2). In this example, I’m assigning the City role.

Figure 28-2. Setting a data role in Prep, in this case assigning the City role

When you assign the City role to the list of city names from the Week 2 exercise, you can see which values Prep recognizes as real city names and which it doesn’t (Figure 28-3).

With this highlighting, you can easily work through all of the problems using Prep’s fantastic Group and Replace functionality.

Figure 28-3. Results of applying the City data role to the City field

But what if you want to get rid of those errors? Well, by having an active data role, you actually get a few more options to play with (Figure 28-4).

Figure 28-4. Controlling the data returned after applying a data role

You can select whether you want to see:

All
Shows every value, both those that match the data role and those that don’t (the latter are marked with an exclamation point).
Valid
Shows only the values that match the data role.
Not valid
Shows only the values that don’t match the data role, so you can start cleaning them without being distracted by the valid ones.
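If it helps to think of these three options as filters, the following Python sketch mimics them. It is an illustration only: CITY_ROLE is a hypothetical stand-in for Tableau’s reference list, the sample city values are made up, and the matching rule (exact match, ignoring case and surrounding spaces) is my assumption rather than Prep’s documented behavior.

```python
# Hypothetical stand-in for a role's reference list (not Tableau's real list)
CITY_ROLE = {"paris", "berlin"}


def is_valid(value, role):
    """Assumed matching rule: exact match, ignoring case and outer spaces."""
    return value.strip().lower() in role


def filter_by_role(values, role, show="All"):
    """Mimic the All / Valid / Not valid choices for a list of values."""
    if show == "All":
        return list(values)
    if show == "Valid":
        return [v for v in values if is_valid(v, role)]
    if show == "Not valid":
        return [v for v in values if not is_valid(v, role)]
    raise ValueError(f"Unknown option: {show}")


cities = ["Paris", "Pariss", "Berlin", "Berlim"]  # made-up sample values
print(filter_by_role(cities, CITY_ROLE, "Not valid"))  # ['Pariss', 'Berlim']
```

The “Not valid” result is the short list you would then work through with Group and Replace.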

You can also group by the data role members, but I haven’t found a use case where this has helped my data preparation yet. Maybe that will be a future Preppin’ Data challenge.

Custom Data Roles

As of version 2019.3.1, not only can you use Tableau’s data roles, but you can also create a custom data role from any string or integer data field. The data role will comprise all of the values in that field. To create a custom data role, go to a Clean step and open the ellipsis menu of the data field you want to use (Figure 28-5).

Figure 28-5. Creating a custom data role in the ellipsis menu

Custom data roles are held on Tableau Server or Tableau Online, so you’ll need to publish them. Selecting Publish as Data Role will place an output for the data role into the Flow pane (Figure 28-6).

Figure 28-6. Data role output icon

The configuration of the output for the custom data role is similar to that for a data source being published to a Tableau Server or Online location. You must specify the server, site, and project to publish the data role to. A custom data role can be published to only a single site on the Tableau Server or Tableau Online instance. If the flow also publishes a data source, the data role must be published to the same site as that output. In the configuration, you can give the custom data role a name as well as a description to help others understand it (Figure 28-7).

Figure 28-7. Custom data role output configuration pane

Clicking Run Flow publishes the custom data role. You can find your custom data role, along with any others that have been published, in the Explore section of the Tableau Server or Tableau Online instance (Figure 28-8).

Figure 28-8. The custom data role in Tableau Server

You can view the data role by clicking on its name. This reveals the values it includes and gives you the opportunity to edit the description (Figure 28-9).

Figure 28-9. Detailed view of the data role in Tableau Server

To use the custom data role, you need to be signed in to the site on the Tableau Server or Online instance to which you published it. Once you are signed in, you can select the custom data role or any of the default data roles available from Prep Builder (Figure 28-10).

Figure 28-10. Selecting a custom data role for use within Prep Builder

If you change Central to North and apply the custom data role Region Data Role, Prep Builder highlights the values that do not match (Figure 28-11).

Figure 28-11. Values not matching the custom data role are highlighted
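The same behavior can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions, not as Prep’s implementation: build_custom_role is a hypothetical helper, and the East and West values are made up to pad out the Region field.

```python
def build_custom_role(field_values):
    """Collect the distinct values of a field, as a custom data role does."""
    return set(field_values)


# Hypothetical Region field at the time the data role was published
published_region_values = ["Central", "East", "West", "Central", "East"]
region_role = build_custom_role(published_region_values)

# The same field after Central has been changed to North
current_values = [
    "North" if value == "Central" else value
    for value in published_region_values
]

# Values that the role would flag as not matching
not_matching = sorted({v for v in current_values if v not in region_role})
print(not_matching)  # ['North']
```

Because the role is just the set of values that existed when it was published, any value introduced later, such as North here, is flagged until you either clean it or republish the role.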

Custom data roles have the same options as the default data roles: Valid, Not valid, or All values (Figure 28-12).

Figure 28-12. Options for returning data roles’ matching values

Summary

Data roles in Prep Builder are similar in concept to Desktop’s geographic roles but go further by giving you insight into whether data needs to be cleaned. By selecting the “Not valid” option, you can choose to filter or clean those values. Selecting “Valid” gives you reassurance that the remaining values meet the criteria you have set. Custom data roles allow an organization to validate the values that are important to it, or those that are otherwise difficult to validate.
