Chapter 40. Connecting to Programming Scripts

In Prep Builder version 2019.3, the product’s development team added the ability to call programming scripts, so now we can use functionality in Prep Builder that is not natively available. The Script step (Figure 40-1) allows you to enter Python or R scripts written by others or those you create yourself. You can use these scripts to connect to website APIs or conduct data preparation tasks that are not possible within Prep Builder’s built-in functionality.

The Script step icon
Figure 40-1. The Script step icon

This chapter will look at situations where you might want to connect a script into Prep, explain how to set up your computer to use the script functionality, and walk you through a short example.

When to Use the Script Step in Prep

The simple answer to when you should use the Script step is whenever Prep does not have the functionality you need to achieve your data preparation goals. Programming is a very flexible way to instruct computers to complete the tasks you require. Python and R are taught in many universities due to the number of packages available in the two languages. A package utilizes a programming language to complete common tasks, saving the user from having to write all the instructions from scratch.

Tableau Prep is designed to be easy to use, but the introduction of the Script step sits at the other end of the easy-to-use spectrum for many users. So how might nonprogrammers take advantage of the feature? Find a programmer who can help! This person might be a colleague or a helpful member of the Tableau community. Of course, you need to trust the source before using the script in order to ensure that you are not going to cause any harm with malicious code. The script file can be created once and then reused multiple times. The Tableau community loves to share, so Prep users will be able to take advantage of an ever-growing volume of useful scripts.

One constraint of the Script step is that you will need to install a version of TabPy or Rserve on your computer or a server you can connect to. This may not be possible within your organization, as some companies are protective of what programs can be run on their computers. If you don’t have this constraint, you will be able to use scripts that can potentially save you a lot of time and effort within flows that require advanced techniques.

Setting Up Your Computer to Use Scripts in Prep

If you have the permissions on your computer to set up TabPy or Rserve, follow the instructions in this section to get started. Rather than providing step-by-step instructions for both languages, this chapter will focus on Python, which is becoming the most prevalent scripting language for data preparation in the business world.

To use Python scripts in Prep, first you need to download Python to your computer. Anaconda is free and easy to install from anaconda.com (Figure 40-2).

Anaconda home screen
Figure 40-2. Anaconda home screen

Once Anaconda is installed, you will need to download TabPy. TabPy is hosted on GitHub and shown in Figure 40-3.

TabPy on GitHub
Figure 40-3. TabPy on GitHub

To install TabPy, run the install command shown in Figure 40-4 in either Terminal (Mac) or Command Prompt (Windows).

Downloading TabPy in Terminal
Figure 40-4. Downloading TabPy in Terminal

If installation is successful, TabPy will be available on localhost (your local computer’s address) and you should see the message Web service listening on port 9004 in Terminal/Command Prompt (Figure 40-5).

View in Terminal once TabPy is installed
Figure 40-5. View in Terminal once TabPy is installed

When TabPy is running, you will be able to connect to the instance in Prep Builder using the Script step. The Script step cannot be the first step in your flow, as it relies on reading in a data set from an Input or other preparation step. The Script step involves quite a bit of configuration. Again, we’ll use Python for this example (Figure 40-6).

The Script configuration pane
Figure 40-6. The Script configuration pane

When you initially add the Script step, you’ll see an error. This is because you need to point the Script step to the programming server you want to use and configure the tool correctly. To configure the step, first select the connection type you want to use (TabPy, in this example). Once you’ve made the selection, you will need to connect to the server running that language (Figure 40-7).

TabPy server connection
Figure 40-7. TabPy server connection

The example used here shows the settings from a default install. You do not need to enter a username and password if your instance doesn’t require them. A successful connection will show a light gray message reading “Connection to <server name>”—in this case, localhost:9004—under the Server heading (Figure 40-8).

The next step is to link the script you want to run in Prep Builder. To do this, click Browse and navigate to the .py file you want to run. You can write .py files in many text editors. The final step is picking the function you want Prep Builder to return. This is set in the Function Name section.

Connection made to localhost through port 9004
Figure 40-8. Connection made to localhost through port 9004

Using a Script Step

As this chapter is targeted to Prep Builder novices, let’s go over a simple way to create a row ID that can be used as an ID for analysis. In Prep Builder 2020.1, Prep Builder added the ability to use the ROW_NUMBER() function, but before that, the technique we’ll cover here was the easiest way to add a row ID to a data set where it didn’t exist. You’ll see that adding a data field to an input is easier to do with the Python tool than by adding more lines of code.

The data set for this example is Superstore, the demo data set that comes with every install of Tableau Desktop or Prep. The data set is added into the Datasources folder of your My Tableau Prep Repository, a set of files added to your computer’s My Documents or Documents folder. In this example, we’ll add a new data field to Superstore called New Index that could be used as a reference with an ID per product, per order, for the Superstore.

After connecting to Superstore, add a Clean step to create a Calculated Field that will host the values affected by the script. Create a Calculated Field containing just the value 1 and call it “New Index” (Figure 40-9).

Dummy field to be populated by the Python script
Figure 40-9. Dummy field to be populated by the Python script

The index script used in this example requires the field in question to be an integer, so ensure the data type is correct. You can build the script to run in the scripting tool, but it will need to be stored as a .py script somewhere that Prep Builder can access. This particular script creates a function called Set_Index() that alters New Index by creating an index starting at 0, adding 1,000, and then returning the values as a data frame. The Set_Index() function has to be the function called in the Function Name setting at the bottom of the Script configuration pane. Figure 40-10 shows the full script.

Python script to form the New Index field
Figure 40-10. Python script to form the New Index field

Data frames are the values returned from the script into the Prep flow. If you are creating new data fields, you will need to define the data frame being returned. This takes additional coding, which we avoid in this example by creating the dummy New Index field to be overwritten by the Script step. If you do require additional data fields, you will need to add the schema function getOutputSchema() to your script. In this example, the output of the Script step is an updated New Index field that starts from 1,000 (Figure 40-11).

Field updated by the Python script
Figure 40-11. Field updated by the Python script

Summary

Prep Builder opened up its capabilities dramatically with the introduction of the Script step. Despite the feature’s greater programming complexity, it offers significant flexibility in the data preparation process. Still, using programming techniques will not be suitable for every user of Prep Builder. Like any skill, scripting takes time to learn and develop, but it can supercharge your data preparation abilities.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset