Overview
In this chapter, you will learn about visual analytics and why it is important to visualize your data. You will connect to data using Tableau Desktop and familiarize yourself with the Tableau workspace. By the end of this chapter, you will be well acquainted with the Tableau interface and some of the fundamental concepts that will help you get started with Tableau. The topics covered in this chapter mark the start of your Tableau journey.
At a very broad level, the whole data analytics process can be broken down into the following steps: data preparation, data exploration, data analysis, and distribution. This process typically starts with a question or a goal, which is followed by finding and getting the relevant data. Once the relevant data is available, you then need to prepare this data for your exploration and analysis stage. You might have to clean and restructure the data to get it in the right form, maybe combine it with some additional datasets, or enhance the data by creating some calculations. This stage is referred to as the data preparation stage. After this comes the data exploration stage. It is at this stage that you try to see the composition and distribution of your data, compare data, and identify relationships if any exist. This step gives an idea of what kind of analysis can be done with the given dataset.
Typically, people like to explore the data by looking at it in its raw form (that is, at the data preparation stage); however, a quick and easy way to explore the data is to visualize it. Visualizing the data can reveal patterns that were difficult to recognize in the raw data.
The data exploration stage is followed by the data analysis stage, in which you analyze your data and develop insights that can be shared with others. These insights, when visualized, will enable easier interpretation of data, which in turn leads to better decision making. In very simplistic terms, the process of exploring and analyzing the data by visualizing it as charts and graphs is called "visual analytics." As mentioned earlier, the idea behind visualizing your data is to enable faster decision making. Finally, the last step in the data analytics cycle is the distribution stage, wherein you share your work with other stakeholders who can consume this information and act upon it.
In this chapter, we will discuss all these topics in detail, starting with a closer look at the value of visual analytics.
As mentioned earlier, "Visual Analytics" can be defined as the process of exploring and analyzing data by visualizing it as charts and graphs. This enables end users to quickly consume the information and, in turn, empowers them to make quicker and better decisions.
In this section, you will learn why data visualization is a better tool for evaluation than looking at large volumes of data in numeric format.
All of us have at some point heard the expression "A picture is worth a thousand words." Indeed, it has been found that humans are great at identifying and recognizing patterns and trends in data when consumed as visuals as opposed to large volumes of data in tabular or spreadsheet formats.
To understand the importance and the power of data visualization/visual analytics, let's look at one of the classic examples: Anscombe's quartet. Anscombe's quartet comprises four distinct datasets with nearly identical statistical properties, yet completely different distributions and visualizations.
Note
This was developed in 1973 by an English statistician named Francis John (Frank) Anscombe, after whom it was named.
Let's take a deeper look at these datasets.
As you can see in the preceding figure, each dataset consists of 11 pairs of X and Y values. Now, if you were to analyze these datasets using typical descriptive statistics such as the mean, standard deviation, and correlation between X and Y, you would see that the outputs are nearly identical.
Looking at the preceding figure, you can see the following:
So, by looking at the above statistical inferences, you would assume that these datasets are identical until you decide to visualize each of them, the results of which are displayed below.
The images show how these datasets appear when visualized as graphs. Now, let's compare these visualizations side by side so that you can see how different each of these datasets really is.
The preceding example highlights how data visualization can help uncover patterns in data that it was not possible to see by simply looking at the numbers and/or just analyzing the data statistically. This is exactly why Francis Anscombe created his "quartet." He wanted to counter the argument that "numerical calculations are exact, but graphs are rough," which, back then, was a quite common impression among statisticians.
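The statistical side of this comparison can be checked with a few lines of code. The following is a minimal Python sketch, not part of the Tableau workflow, that uses the published values of Anscombe's quartet and computes the mean, sample standard deviation, and Pearson correlation for each dataset. The names `QUARTET`, `pearson`, and `summarize` are illustrative choices made for this sketch.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet (1973). Datasets I to III share the same X values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def summarize(xs, ys):
    """Descriptive statistics rounded to two decimals for easy comparison."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "stdev_x": round(stdev(xs), 2),
        "stdev_y": round(stdev(ys), 2),
        "corr_xy": round(pearson(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, summary in SUMMARIES.items():
    print(name, summary)
```

The four printed summaries come out essentially the same, which is exactly why the numbers alone give no hint of how differently the four datasets behave when plotted.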
Next, take a look at one more example of how visualizing data helps us find quick insights. Refer to the following figure:
In the preceding figure, you can see a grid view of fields such as Product Type, Product, Market, Marketing, and Profit. In the data that you have used, Marketing is the money that is spent on any marketing efforts to promote products, and Profit is the profit generated after those marketing efforts. Further, these values are broken down by dimensions such as Product Type, Product, and Market. The idea is to evaluate how each product is doing in terms of Marketing and Profit across different markets.
Now, displaying this information in a grid format, as shown above, results in 84 numbers being shown in the view, and doing any kind of comparison across these 84 numbers is going to be very difficult. So, imagine you want to find out whether there are any products in any specific markets where losses are made even after spending significant money on the marketing efforts. Then you will end up comparing these numbers horizontally as well as vertically, which, honestly, is a bit tedious. However, let's see whether visualizing this data makes any difference. Refer to the following figure:
In the preceding figure, you can see that the length of the bar is the money spent on Marketing, whereas the color of the bar represents the Profit value. So basically, the longer the bar, the more money was spent on marketing; the darker the shade of blue, the more profitable the product; and the darker the shade of orange, the greater the loss accrued.
Looking back at that figure, note that the longest bar is Caffe Mocha in the East market. This means that Caffe Mocha has the highest marketing spending, but because the color of that bar is orange, you also know that it is accruing a loss.
This is another example that demonstrates the power of data visualization.
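For comparison, the question that the bar chart answers at a glance ("where is money being spent on marketing while a loss is still being made?") can also be expressed as a filter and a sort. The following Python sketch uses a handful of invented rows. Apart from Caffe Mocha in the East market, the product names and all figures are hypothetical and are not taken from the dataset shown in the figures.

```python
# Hypothetical rows for illustration only; the numbers are invented.
ROWS = [
    {"product": "Caffe Mocha", "market": "East", "marketing": 900, "profit": -250},
    {"product": "Product A", "market": "West", "marketing": 400, "profit": 600},
    {"product": "Product B", "market": "South", "marketing": 700, "profit": -50},
    {"product": "Product C", "market": "East", "marketing": 300, "profit": -10},
    {"product": "Product A", "market": "East", "marketing": 850, "profit": 500},
]


def loss_makers_by_spend(records):
    """Return loss-making rows, ordered from highest to lowest marketing spend."""
    losses = [row for row in records if row["profit"] < 0]
    return sorted(losses, key=lambda row: row["marketing"], reverse=True)


LOSS_MAKERS = loss_makers_by_spend(ROWS)

for row in LOSS_MAKERS:
    print(row["product"], row["market"], row["marketing"], row["profit"])
```

The code gives the same answer as the chart, but only for the one question it was written for. The visualization lets you spot this pattern and others without deciding in advance what to look for.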
Now that you have understood what visual analytics is and why it is important, let's look at some data visualization tools in the next section.
There are a lot of tools available on the market offering various features and functionalities that you can use to visualize your data. When it comes to business analytics and data visualization, Tableau is one of the leading tools in this space because of its ease of use and drag and drop functionality, which makes it easier even for a business user to start making sense of their data. Tableau has different tools for different purposes, available in the Tableau product suite, which we'll explore in this section.
The entire suite can be divided into three parts: data preparation, data visualization, and consumption or distribution. Refer to the following figure:
As shown in the preceding figure, you have Tableau Prep in the Data Preparation layer, which is used for cleaning, combining, reshaping, and enhancing your data. This tool helps get your data ready for analysis and visualization.
Now, once your data is ready and is in the right form and structure, you will start analyzing and visualizing it. For this purpose, you will use either Tableau Desktop or Tableau Public.
Tableau Desktop is where you create your visualizations, analytics, and dashboards. This is typically the tool you will spend most of your time on, as most development is done in Tableau Desktop. Tableau Public can also be used for creating your analytics and visualizations. However, the catch is that you cannot save your work locally or offline; it is always saved to the Tableau Public server, where it can be viewed by anybody. Tableau Public is a free product that is similar to Tableau Desktop and is typically used by bloggers, journalists, researchers, and others who deal with public or open data.
Tableau Public is a great tool for anyone wanting to build visualizations for public consumption but is not recommended for anyone working with confidential data. When dealing with confidential data, it is best to use Tableau Desktop.
Once you are done building your visualizations, you can share your work with others using an online methodology with Tableau Server or Tableau Online or share an offline copy of your work, which can then be opened using Tableau Reader.
Tableau Server is an on-premises collaboration platform, accessed through a browser or mobile device, that is used to publish dashboards created in Tableau Desktop and share them with your end users. It allows you to share and, to some extent, edit and publish dashboards, while also managing access rights and making your visualizations accessible securely over the web. It allows you to refresh your dashboards at a scheduled frequency and maintain live data connectivity to the backend data sources, which in turn allows users to consume up-to-date dashboards online from anywhere. Tableau Server also allows you to view your dashboards on a tablet through an app available on both iOS and Android. Tableau Online, on the other hand, is a cloud-hosted version (or SaaS version) of Tableau Server. It brings the capabilities of Tableau Server to the cloud without the infrastructure cost.
However, if you want to consume dashboards offline, you can use Tableau Reader. This is a free desktop application that can be used to open, view, and interact with dashboards and visualizations built in Tableau Desktop. So basically, it allows you to filter, drill down, view the details of data, and interact with dashboards to the full extent of what the author has intended. That said, being a reader, you cannot make any changes or edit the dashboard in any way beyond what has already been built in by the author.
The upcoming section, as well as the following chapters, will focus on Tableau Desktop. You will be familiarizing yourself with the interface of Tableau Desktop, to understand its workspace and see how you can create your visualizations and build your dashboards.
The point to note here is that Tableau Desktop is a licensed product; if you don't have the necessary license, you can use Tableau Public to try out the examples covered in the book. As mentioned earlier, Tableau Desktop and Tableau Public are the two main developer products offered by Tableau, and the only differences between them are the range of data source connectivity offered, the ability to save files locally, and the security of your work. While Tableau Desktop offers all of these, Tableau Public has limitations.
However, the rest of the functionality and the look and feel of both tools are the same. The next section explores how to use Tableau to connect, analyze, and visualize your data.
Please note that we are using a licensed version of Tableau Desktop in this book.
Now that you have chosen Tableau Desktop for the creation of your visuals and dashboards, let's dive deeper into the product, its interface, and its functionality. Once you have downloaded and installed the product, you will be able to use it to connect to your data and start building your visualizations.
The landing page of Tableau Desktop is shown in the following screenshot:
Review the following list for explanations of the highlighted sections in the screenshot:
In this exercise, you will connect to a data source for the first time, which is the very first step when analyzing data in Tableau.
There are many types of data sources that you can connect to, but for the purposes of this exercise, you will work with an Excel file: Sample - Superstore.xls, which ships with Tableau and contains sales and profit data for a company.
Perform the following steps to complete the exercise:
This data is the sample dataset that comes along with the product. Once you have downloaded and installed Tableau Desktop, you will notice the My Tableau Repository folder being created under your Documents folder. This is where you will find this sample dataset.
The Returns table contains the list of all the transactions/orders that were returned. So, again, only two columns: Returned and Order ID. Refer to the following screenshot to take a glance at the Returns table:
The preceding figure shows the view after fetching the Orders worksheet into the Drag sheets here section. Review the highlighted sections in the screenshot and the corresponding notes below to understand more.
Now that you understand the data connection page of Tableau, you can finally start using Tableau to analyze and visualize your data.
The preceding screenshot shows the Tableau workspace. This is the space in which you will create your visualizations going forward. Let's quickly go through the highlighted sections in the screenshot to understand the workspace in more detail.
Now that you are familiar with the workspace of Tableau, you can create your first visualization. To create your views or visualizations, you can either try the manual drag and drop approach or the automated approach of using the Show Me button. Let's explore both of these options.
You will begin with the manual drag and drop approach and then explore the automated approach using the Show Me button in the following exercise.
The aim of this exercise is to create a chart to determine which ship mode is better in terms of Sales by Region using the manual drag and drop method. In this case, you will create a stacked bar chart using the Ship Mode, Region, and Sales fields from the Orders data of Sample - Superstore.xls by manually dragging the fields from the Data pane and dropping them onto the necessary shelves.
Perform the following steps to complete this exercise:
In this exercise, you created a stacked bar chart to show which ship mode is better in terms of Sales across Regions using the manual drag and drop method. As you can see in the preceding screenshot, the Standard Class ship mode seems to be performing best in comparison to the other modes.
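Behind the stacked bar chart, each segment is simply the sum of Sales for one combination of Region and Ship Mode. The following Python sketch reproduces that aggregation on a few invented order rows. The rows, amounts, and function names are hypothetical and only illustrate the grouping; they are not the contents of the Superstore dataset.

```python
from collections import defaultdict

# Hypothetical order rows; values are invented for illustration.
ORDERS = [
    {"region": "East", "ship_mode": "Standard Class", "sales": 120.0},
    {"region": "East", "ship_mode": "First Class", "sales": 40.0},
    {"region": "East", "ship_mode": "Standard Class", "sales": 80.0},
    {"region": "West", "ship_mode": "Standard Class", "sales": 60.0},
    {"region": "West", "ship_mode": "Second Class", "sales": 90.0},
    {"region": "West", "ship_mode": "Same Day", "sales": 30.0},
]


def sales_by_region_and_ship_mode(records):
    """Sum sales per (region, ship mode): one value per bar segment."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in records:
        totals[row["region"]][row["ship_mode"]] += row["sales"]
    return {region: dict(modes) for region, modes in totals.items()}


def sales_by_ship_mode(records):
    """Sum sales per ship mode across all regions."""
    totals = defaultdict(float)
    for row in records:
        totals[row["ship_mode"]] += row["sales"]
    return dict(totals)


SEGMENTS = sales_by_region_and_ship_mode(ORDERS)
OVERALL = sales_by_ship_mode(ORDERS)
BEST_MODE = max(OVERALL, key=OVERALL.get)

print(SEGMENTS)
print(BEST_MODE)
```

In Tableau, the same grouping happens implicitly when the two dimensions and the measure are placed on the shelves, with the measure aggregated as a sum by default.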
In the following exercise, you will create another sales comparison chart—but this time with the Show Me button.
The aim of this exercise is to create a chart to determine which Ship Mode is better in terms of Sales by Region using the automated method via the Show Me button. Just like in the previous exercise, you will create a stacked bar chart using the Ship Mode, Region, and Sales fields from the Orders data of Sample - Superstore.xls, but this time using the Show Me button. You will then compare the resulting chart with the one from the previous exercise and determine which ship mode helps generate the highest sales.
In a new worksheet, perform the following steps to complete the exercise:
Note
You will need to keep the Ctrl key pressed while making multiple selections. If you are on an Apple device, use the Command key instead. Refer to the following link for the list of keyboard shortcuts for both Windows and macOS: https://help.tableau.com/current/pro/desktop/en-us/shortcut.htm.
Once you have clicked on the Show Me button, you will see the list of visualizations that are possible with your current selection of fields, that is, two dimensions (Region and Ship Mode) and one measure (Sales). Further, you will also see that the horizontal bar chart is highlighted. The highlighted chart (this is highlighted by Tableau in version 2020.1 with an orangish-brown rectangular border in the following screenshot) is the result of the in-built recommendation engine that is based on the best practices of data visualization.
You now have two options: you can either go ahead with the chart recommended by Tableau, which will create a horizontal bar chart (which is not the aim here), or select some other chart that is available and enabled in the Show Me button (ideally a stacked bar chart like the one that you created in the previous exercise). So, select the chart right next to the recommended one (the one that is highlighted using a black dotted circular border in the preceding screenshot). This is the stacked bar chart option, which is exactly what you wanted.
However, when you go ahead with this option, you see two things that are different from the output that you created in the previous exercise. Firstly, it is a vertically stacked bar chart and not a horizontal one, and, secondly, you have Region in the Color shelf instead of Ship Mode. Refer to the following screenshot:
Now, neither of these things is technically wrong, but they are not what you wanted in this case, and so you will need to change them.
To do this, press CTRL and select Region from the Color shelf as well as Ship Mode from the Rows shelf. Make sure the pills for these selected fields are now darker in color as the dark color indicates that the selection of these fields is retained.
This produces the following output:
In this exercise, you created a stacked bar chart to show which Ship Mode is better in terms of Sales by Region using the automated method via the Show Me button. As you can see in the preceding screenshot, the Standard Class ship mode seems to generate more sales compared to the other ship modes.
In an earlier section, you familiarized yourself with the workspace of Tableau and learned how to create a visualization using the manual drag and drop method as well as the automated Show Me button. During the course of this book and across various chapters, you will get into more details of this workspace and learn about some more of the options available in the toolbar as well as the other shelves.
Now that you have some fundamental knowledge of how to create a visualization using the aforementioned methods, you will explore some concepts of data visualization and how to use these in Tableau Desktop.
Ideally, when you present your analysis and insights, you want your end user to be able to quickly consume the information that you have presented and make better decisions more quickly. One way to achieve this objective is to present the information in the right format. Each chart, graph, or visualization has a specific purpose, and it is particularly important to choose the appropriate chart for answering a specific goal or a business question.
Now, to be able to choose the appropriate chart, you first need to look at the data and answer the question "What is it that you need to do with your data?".
To help you make your decision, consider the following:
Once you have addressed these points and determined what you wish to do with your data, you will also need to decide on the following:
With the help of this list, you will be able to figure out which chart is the most appropriate one to answer your business questions. To elaborate on this point, begin by first categorizing your charts into four sections—namely, charts that help you either compare, determine the composition, show the distribution of your data, or else the ones that help you find relationships in your data.
Comparison, composition, distribution, and relationships are often referred to as the four pillars of data visualization and are described in greater detail here:
Typically, you will see comparison being done across categorical data, that is, data members of a dimension (for example, comparison across regions wherein Region is a dimension, and East, West, North, and South are the data members of that dimension), but it can also be done across quantitative data, that is, across measures (for example, sales versus profit or actual sales versus budget sales).
Another type of comparison that is very common is a comparison over a period of time (for example, evaluating your monthly sales performance or which months are better for your business and whether there are any seasonal trends that you need to look out for).
So, based on the preceding information, you will further break down comparison as comparison across dimensional items or categorical data (for example, region-wise sales), comparison over time, and comparison across measures or quantifiable data (for example, sales versus quota).
The following list outlines the typical charts that should be used for each type of comparison:
Comparison across dimensional items:
Comparison over time:
Comparison across measures:
Typically, you end up showing a static snapshot of the composition of your data (for example, your market share along with the market share of your competitors at a given point in time), or you may also want to trend this information over a period of time (for example, how is your and your competitor's market share changing over a period of time). Both these perspectives are important and can provide some very valuable insights regarding your performance.
So, based on this information, you will further break down composition as composition (snapshot/static) and composition over time.
The following list outlines the typical charts that should be used for each type of composition:
Composition (snapshot/static):
Composition over time:
So, based on this information, you will further break down distribution as distribution for a single measure, and distribution across two measures.
The following list outlines the typical charts that should be used for each type of distribution:
Distribution for a single measure:
Distribution across two measures:
The following list outlines the typical charts that should be used for each type of relationship:
Now that you understand these concepts of Comparison, Composition, Distribution, and Relationships, and which charts to choose for each of these scenarios, you will also try to see how to create these in Tableau. All these abovementioned scenarios and charts are explained in more detail in the upcoming chapters.
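As a compact recap, the four pillars and the sub-types described above can be written down as a small lookup table. The following Python sketch only records the breakdown given in this section. It deliberately does not map each sub-type to specific charts, and the names `PILLARS` and `subtypes_for` are illustrative choices for this sketch.

```python
# The four pillars and the sub-types described in this section.
PILLARS = {
    "comparison": [
        "across dimensional items",
        "over time",
        "across measures",
    ],
    "composition": ["snapshot/static", "over time"],
    "distribution": ["single measure", "two measures"],
    "relationship": [],  # no further breakdown is given in this section
}


def subtypes_for(goal):
    """Return the sub-types for a pillar, ignoring case and outer spaces."""
    key = goal.strip().lower()
    if key not in PILLARS:
        raise ValueError(f"Unknown goal: {goal!r}; expected one of {sorted(PILLARS)}")
    return PILLARS[key]


print(subtypes_for("Comparison"))
print(subtypes_for("composition"))
```

Starting from the goal ("what do I need to do with my data?") and then narrowing to a sub-type mirrors the order of the questions posed earlier in this section.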
Apart from the aforementioned use cases or scenarios, you may also want to explore the geographic aspect of your data (that is, if you have any geographical information in your data). This could mean having data at a country level, state level, city level, or even postal code level. Creating maps to show this geographic data is another way of exploring and visualizing your data, since visualizing geographic data on a map can help you highlight certain events or occurrences across geographies, unearth hidden spatial patterns, and/or perform proximity analysis.
Note
For more information on choosing the right chart, see the following article: https://www.tableau.com/learn/whitepapers/which-chart-or-graph-is-right-for-you.
Another important point to discuss when working with Tableau is how to save your files and share them with others. As you know, Tableau is an interactive tool that allows users to filter, drill down, and slice and dice data using the features that are provided within the tool. Now, when it comes to saving and sharing your work with others, some people may want their end users to have the flexibility to play with the report and use the interactivity that is provided, while others may simply want end users to have a static snapshot of information that doesn't provide any sort of interactivity. Further, some may want to share the entire dashboard with their end users, while others may only want to share a single visualization.
All these scenarios can be handled in Tableau. The following list goes through these options in detail, breaking them into two parts: static snapshots and interactive versions:
Static snapshots: The following is the list of options to choose from when you want to save and share a static snapshot of your work:
The Copy > Image option allows you to copy the individual view as an image and then paste it into another application if desired, whereas the Export > Image option lets you directly export the view as an image rather than doing a copy and paste operation.
The preceding screenshots show the options of either copying or exporting just a single worksheet (that is, a single visualization). However, if you wish to save the entire dashboard as an image, then you will use the Dashboard > Copy Image or Dashboard > Export Image option in the toolbar. Refer to the following screenshot:
In the previous section, you explored different options for saving a static output of your work. In this exercise, you will export your work as a PowerPoint file. For this, you will continue using the stacked bar chart of Ship Mode, Region, and Sales that was created in the previous exercise. This exercise will help you see how you can save your analyses as static snapshots and share them with others, something you'll need to do fairly often as a Tableau developer.
You will continue working with the Sample Superstore dataset for this exercise.
The steps to accomplish this are as follows:
This will save your output as a .pptx file, which can later be opened in the Microsoft PowerPoint app.
Interactive versions: The following is the list of options to choose from when you want to save and share interactive versions of your work:
.TWB: This is the file extension used to save a file as a Tableau workbook, which is a proprietary file format. .twb is the default file extension when you save a Tableau workbook. These .twb files are essentially work-in-progress files that require constant connectivity to the data. As a result, it is not possible to open such a file unless you have Tableau Desktop and access to the data that was used to create it. So, if you wish to share a .twb file with anyone, you need to make sure they have access to the data; if they do not, the data source file will have to be made available to them. To save the file as .twb, choose the File > Save As option from the toolbar menu.
This will open a new window that allows you to save the file. Make sure to choose the Tableau Workbook (.twb) option. Refer to the following screenshot:
.TWBX: This is the file extension used to save the file as a Tableau packaged workbook, which contains the views as well as a copy of the data used for creating those views. Since the copy of the data is bundled along with the views that have been created, it allows the end user to access and interact with the file even when they don't have direct access to the raw data that is being used for analysis.
Further, since the copy of data is bundled along with the views, the data that is seen in the file is not the actual live data but a static snapshot of that data at a given point in time, which can be refreshed as and when required.
To save the file as .twbx, choose the File > Save As option from the toolbar menu. This will open a new window that allows you to save the file. Make sure to choose the Tableau Packaged Workbook (.twbx) option. Refer to the following screenshot:
To save the file as a Tableau Packaged Workbook (.twbx), you can also choose the File > Export As Packaged Workbook option from the toolbar menu. Refer to the following screenshot:
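To make the idea of "bundling" more concrete, the following Python sketch inspects the contents of a packaged workbook. It relies on the assumption that a .twbx file is an ordinary ZIP archive holding the .twb workbook together with its data files. So that the sketch runs without Tableau installed, it builds a tiny stand-in archive first; the file names `demo.twbx`, `demo.twb`, and `Data/sample.csv` are hypothetical.

```python
import os
import tempfile
import zipfile


def list_packaged_contents(twbx_path):
    """List the files bundled in a packaged workbook (assumed to be a ZIP)."""
    with zipfile.ZipFile(twbx_path) as archive:
        return sorted(archive.namelist())


# Build a small stand-in archive; a real .twbx would come from Tableau.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.twbx")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("demo.twb", "<workbook/>")  # placeholder workbook
        archive.writestr("Data/sample.csv", "Region,Sales\nEast,100\n")
    DEMO_CONTENTS = list_packaged_contents(demo_path)

print(DEMO_CONTENTS)
```

Pointing `list_packaged_contents` at a real .twbx file saved from Tableau Desktop should show the workbook alongside its bundled data, which is what lets a recipient open it without access to the original data source.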
In the following exercise, you will learn how to save your work in a packaged Tableau workbook.
In the previous section, you saw different options when it comes to choosing an interactive version of your work. The aim of this exercise is to export or save your work as a Tableau Packaged Workbook (.twbx). For this, you will continue using the stacked bar chart of Ship Mode, Region, and Sales that was created in the previous exercise.
Complete the following steps:
This will save your output as a .twbx file, which can later be opened in Tableau Reader or Tableau Desktop itself.
In the next section, you will practice your new skills by completing an activity using everything that you have learned in this chapter.
In this activity, you will identify and create the appropriate chart to find outliers in your data. The dataset being used has two measures—namely, Profit and Marketing. Marketing refers to the money being spent on marketing efforts, while Profit is the profit that you are making. You need to compare Marketing and Profit across different products and across different markets (so, two dimensions and two measures).
The outliers to be identified are as follows:
You will use the CoffeeChain Query table from the Sample-Coffee Chain.mdb dataset. The data can be downloaded from the GitHub repository of this book, at https://packt.link/MOpmr.
As the name suggests, the dataset contains information pertaining to a fictional chain of coffee shops.
Perform the following steps to complete this activity:
Note
The solution to this activity can be found here: https://packt.link/CTCxk.
In this chapter, you learned the definition and importance of visual analytics and data visualization. You were presented with several points for evaluation when choosing a data visualization tool and explored Tableau's product suite. Having identified Tableau Desktop as the best choice of platform for analyzing and visualizing your data, you looked at how to utilize it to connect to data and familiarized yourself with the Tableau Desktop workspace. You also considered various scenarios for data visualization and identified which charts to use for the given task and learned how to save and share your work with others.
In the next chapter, you will see how to build the various charts that you identified earlier. You will also learn how to prepare your data for analysis using Tableau Prep as well as Tableau Desktop.