Loading data and creating a data model

In order to create an example application, I've downloaded a dataset from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. They have a dataset repository you can use for training purposes. The datasets are organized by task (clustering, classification, regression, and others), by attribute type, by domain area, and so on. This is a very useful resource to practice your new skills and we'll be using it again in this book.

Note

You can find more information from Bache, K. and Lichman, M. (2013); UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]; Irvine, CA: University of California, School of Information and Computer Science.

In this chapter, we're going to use a dataset called Wholesale customers Data Set. The dataset is originated from a larger database – Abreu, N. (2011); Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional; Mestrado em Marketing, ISCTE-IUL, Lisbon. You can find the dataset on this page:

https://archive.ics.uci.edu/ml/datasets/Wholesale+customers#

The dataset contains 440 customers (observations) of a wholesale distributor. It includes the annual spending in monetary units on diverse product categories. The columns are explained as follows:

  • Fresh: annual spending (per 1,000) on fresh products
  • Milk: annual spending (per 1,000) on milk products
  • Grocery: annual spending (per 1,000) on grocery products
  • Frozen: annual spending (per 1,000) on frozen products
  • Detergents_Paper: annual spending per 1,000) on detergents and paper
  • Delicatessen: annual spending (per 1,000) on delicatessen products
  • Channel: Horeca (value = 1) or Retail (value = 2)

    Note

    In the food industry, Horeca stands for Hotel, Restaurant, or Café, so a business that prepares and serves food.

  • Region: Lisbon (value = 1), Porto (value = 2), or Other (value = 3)

In the following screenshot, you can see what the data looks like:

Loading data and creating a data model

An important difference between Rattle, or R, and Qlik Sense is that in Rattle, generally, our dataset is a simple table. Using Qlik Sense, we can easily work with more complex data models. Working with more complex data models allows us to discover hidden relationships. In this example, we have a table with customer data; if we're able to link the customer information with a salesperson or shipping information, our analysis would be richer.

In Qlik Sense Desktop, we have two ways to load data:

  • We can use the Data load editor option
  • We can use the Quick data load option

With the Quick data load option, you can load data by just dragging and dropping data files, but if you want to transform data into Qlik, you need to do it using the Data load editor option. We'll create our data model using only the Quick data load option, but we'll also see how we can do the same work using the Data load editor option.

Preparing the data

We're going to create a very simple data model.

Our data model has three tables. The main table, Customers, is the dataset we've downloaded. We also have two additional tables, Channel and Region. We're going to use the tables to convert from codes to descriptions; the value 1 in the field Region means Lisbon.

The original dataset contains six product categories – Fresh, Frozen, Milk, Grocery, Delicatessen, and Detergents_Paper. The six product categories can be grouped into two main categories – Food (Fresh, Milk, Grocery, Frozen, and Delicatessen) and Detergents_Paper. We'll create two new columns in the dataset called Food and Total_Spent. The new variable Food will contain the sum of Fresh, Milk, Grocery, Frozen, and Delicatessen. The new variable Total_Spent will contain the total annual expenditure for each customer. We can create the new fields in two ways. We can use a spreadsheet tool to create these new columns and set it as a CSV file to save the resulting data. We can also use the Data load editor to do it. In this example, we'll use the Data load editor to create these two variables and a third variable called Customer_ID. The variable Customer_ID will be a unique identifier for each customer.

In the original dataset, we need six values to represent the annual expenditure of a customer. It's hard to represent six variables in a two-dimensional chart. In the modified dataset, we can use two values—Food and Detergents_Paper—to summarize the six original values. In this way, we can graphically represent the annual expenditure of a customer by a point in a plane. This is a trick to see your customers in an easy way.

You can see our data model in the following diagram:

Preparing the data

In the preceding diagram, we can see that the three tables are associated. To associate two tables, Qlik Sense only needs to find two fields with the same name. If Qlik Sense finds a field called Channel in the Customers table and a field with the same name in the Channel table, its associative engine assumes that the two fields mean the same and associates the tables.

Create two CSV or Excel files containing the following two smaller tables. These tables will have the common columns Channel and Region when compared with the main Customers table:

Preparing the data

Open Qlik Sense Desktop and in the pop-up window, select Create new app, name it, and open the new application.

A Qlik Sense application has a main menu. We'll use this menu to move between the App overview, Data load editor, Data model viewer, and Open hub options, as shown here:

Preparing the data

Drag the customers file you have modified and drop it over your new application. A window showing the data you are going to load will appear; simply click on the Load data button. After these instructions are followed, you will see a screen similar to this:

Preparing the data

A pop-up window will inform you that the data has been loaded. Close the window and load the Channel and Region files (the two smaller CSV files that we created earlier). Every time you try to add a new data file, Qlik Sense will ask you if you want to replace or add data; choose Add data as shown in the following screenshot:

Preparing the data

Now we can review our work. Open the Data model viewer option. Qlik Sense opens the data model we've just built.

Check the data model we've just created. Close to the fields Channel and Region, there is an icon representing a key. I've circled the icon to make it easier to identify. This icon means that Qlik uses this field to associate a table with another one, as shown in the following screenshot:

Preparing the data

To load this data we've used the Quick data load functionality. This functionality is only present in Qlik Sense Desktop, and is not present in the server version of the product. Qlik Sense, as a platform not as a personal tool, focus on data governance. For an analytic tool data governance are mechanisms to ensure the data loaded in the system meets the organization's standards. For this reason this functionality is not present in in Qlik Sense.

Qlik Sense Desktop Quick data load functionality has done a lot of work for us. In order to understand what happened, we'll review the Data load editor and we'll use it to create the three new variables.

Look at the left-hand side vertical bar; Qlik Sense has created a sheet for every file that we've loaded. Look at this sheet and you will find a LOAD sentence like the one shown in the following screenshot:

Preparing the data

This LOAD sentence is very useful; if you want, you can modify data here before loading it. As we've said before, we want to add three new columns to this table – Customer_ID, Food, and Total_Spent. We can do this simply by adding the columns to the spreadsheet or by using the Qlik Data load editor option. In this example, we're going to use the Data load editor option.

Change the code as shown in the following screenshot and click on the Load data button. Qlik Sense will reload all data files. With this code, you've calculated the new field in Qlik Sense instead of doing it in the CSV file or the Excel spreadsheet:

Preparing the data

The first line adds a number and labels it as Customer_ID. The function RowNo() returns the number of the row. In this dataset, each row is a different customer, so with this line of code, we'll add an identifier to each customer. The last two lines add the new variables Food and Total_Spent.

Open the Data model viewer option again; now you can see the three fields that we've just created. This is shown in the following screenshot:

Preparing the data

As you've seen, you can create the same data model using your favorite spreadsheet tool and load it to Qlik Sense or use the Data load editor option. The Data load editor option is a powerful Qlik Sense feature. Like in other self-service visualization tools, Qlik Sense has the option to load data without writing a line of code, but you also have a powerful data loading and transformation tool. Personally, I prefer the Data load editor option because it provides me with precise control over my data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset