Chapter 11
In This Chapter
Reusing query steps
Consolidating data with the Append feature
Understanding join types
Using the Merge feature
Data is frequently analyzed in layers, with each layer of analysis using or building on the previous layer. You may not know it, but you already build layers all the time. For instance, when you build a pivot table using the results of a Power Query output, you’re layering your analysis. When you build a query based on a table created by a SQL Server view, you’re also creating a layered analysis.
Sure, you would probably love to be able to analyze a single data source and call it a day. But that’s not how data analysis works. You often find the need to build queries on top of other queries to get the results you’re looking for. That’s what this chapter is all about. In this chapter, I help you examine a few ways you can advance your data analysis by making your queries work together.
Data analysts commonly rely on the same main data tables for all kinds of analysis. Even the simple table shown in Figure 11-1 can be used to create different views: sales by employee, sales by business segment, or sales by region, for example.
Of course, you can build separate queries, each performing different grouping and aggregation steps, but that would mean repeating all the data clean-up steps you needed before performing any kind of analysis.
To get a better understanding of how query steps can help save time, take a moment to follow these steps:
Place the cursor anywhere inside the table, and then choose Data ⇒ From Table.
Power Query opens the Query Editor.
Select the Last_Name and First_Name fields, and then choose Transform ⇒ Merge Columns.
The Merge Columns dialog box appears.
Click the Group By command on the Transformation tab.
The Group By dialog box opens, as shown in Figure 11-3.
The goal is to Group By the Employee field to get the Sum of Sales Amount, as shown in Figure 11-3. Name the new aggregated column Revenue.
At this point, you’ve successfully created a view that shows total revenue by employee. As you can see in Figure 11-4, the query steps include all the preparation work you did before grouping.
What happens if you want to create another analysis using the same data? For instance, what if you want another view that shows Employee sales by business segment?
You could always start from Step 1 and import another copy of the source data, but you’d have to repeat the preparation steps (the steps for Filtered Rows and Merged Columns, in this case).
A better way is to reuse the steps you’ve already created by extracting them into a new query. The idea is to first decide what steps you want to reuse and then right-click the step immediately below it. In this scenario (refer to Figure 11-4), you keep all query steps until Grouped Rows.
Right-click the Grouped Rows step and select Extract Previous.
The Extract Steps dialog box opens.
After you click OK, Power Query does two things:
That is to say, both queries are sharing the extracted steps. You can see the new SalesByBusiness query in the pane on the left, as shown in Figure 11-6.
You now can click on the SalesByBusiness query and start applying any needed transformations. In Figure 11-6, a Group By step has been added to create a view of sales by employee and by business segment.
This concept of extracting steps can be a bit confusing. The bottom line is that instead of starting from square one with a brand-new query, you’re telling Power Query you want to create a new query that uses the steps you’ve already created.
Power Query’s Append feature allows you to add the rows generated from one query to the results of another query. In other words, you copy records from one query and add them to the end of another.
The Append feature comes in handy when you need to consolidate multiple identical tables into one table. For example, if you have tables from the North, South, Midwest, and West regions, you can consolidate the data from each region into one table using the Append feature.
To help you better understand the Append feature, I’ll walk you through an exercise that consolidates data from four different regions into one table. In this walk-through, I use the region data found on four different tabs in the Appending_Data.xlsx sample file, shown in Figure 11-7.
The Append feature works on only existing queries. That is to say, no matter what kind of data sources you have, you need to import them into Power Query before you can append them together. In this case, it means importing all the region tables into queries.
Follow these steps to import the needed base queries:
Go to the North Data worksheet, place the cursor anywhere inside the table, and then choose Data ⇒ From Table.
The Query Editor activates, showing you the contents of the table you just imported.
To finalize the creation of the query, you need to close and load the query. Now, because you’re creating this query simply for the purpose of appending it to other queries, you don’t need to close and load to the workbook. You can choose instead to close and load the data as connection-only.
After you’ve created queries for each region, open the Workbook Queries pane (choose Data ⇒ Show Queries) to see all queries. As you can see in Figure 11-8, each query is a connection-only query.
Now that your data is in queries, you can start appending.
In a perfect world, this section is where you would read about the nifty button that appends all your queries at one time. Unfortunately, Power Query doesn’t have a nifty button that lets you append many tables all in one shot. You can append only one table at a time.
To append data, follow these steps:
On the Home tab of the Query Editor, click the Append Queries command.
The Append dialog box opens.
After all queries have been appended, click the Close & Load command to save the data and exit the Query Editor.
At this point, the NorthData query contains the data for all regions. To see the full consolidated table, you need to change the load destination of the NorthData query to the workbook instead of the connection only.
In the Workbook Queries pane, right-click the NorthData query and select Load To.
The Load To dialog box opens.
Figure 11-10 illustrates the final output. You’ve successfully created a consolidated table of region data.
In your data adventures, you often find the need to build queries that join the data between two tables. For example, you may want to join an employee table to a transaction table to create a view that contains both transaction details and information on the employees who logged those transactions.
In this section, I describe how you can leverage the Merge feature in Power Query to join data from multiple queries.
Similar to VLOOKUP in Excel, the Merge feature joins the records from one query to the records in another by matching on a unique identifier. An example of a unique identifier is Customer ID or Invoice Number.
You can join two datasets in one of several ways. The kind of join you apply is important because it determines which records are returned from each dataset.
Power Query supports six kinds of joins, as described in the following list and shown in Figure 11-11:
To better understand the Merge feature, I’ll walk you through an exercise that merges interview questions and answers. In this walk-through, I use the predefined queries found in the Merging_Data.xlsx sample file available online at www.dummies.com/go/excelpowerpivotpowerqueryfd
.
As you can see in Figure 11-12, two existing queries are in the Workbook Queries pane: Questions and Answers. These queries represent the questions and answers from the interview. The goal is to merge these two queries to create a new table showing questions and answers side-by-side.
Follow these steps to perform the merge:
Choose Data ⇒ New Query ⇒ Combine Queries ⇒ Merge (see Figure 11-13).
This step opens the Merge dialog box.
In this dialog box, you use the drop-down boxes to select the queries you want to merge and then choose the columns that defined the unique identifier for each record. In this case, the InterviewID and QuestionID/AnswerID fields make up the unique identifier for each record.
Click the OK button to finalize and open the Query Editor.
In Figure 11-14, note the small numbers 1 and 2 in the InterviewID and QuestionID fields. These small numbers are assigned based on the order in which you selected them (refer to Steps 3 and 5).
The order in which you selected the unique identifiers in each query matters. The two columns tagged with the small number 1 will be joined regardless of column labels. The two columns tagged with the small number 2 will also be joined.
At the bottom of the Merge dialog box, Power Query shows you how many records from the lower query match the top query, based on the unique identifiers you selected. In this case, about 17,600 answer records match the 26,910 question records.
You don’t need a 100 percent match for the merge to be valid. There might be a good reason that the records in the two queries don’t all match up. In this case, not all questions were answered in all interviews, so the Answers query has fewer records.
With the new merged query open in the Query Editor, click the Expand icon in the NewColumn field and choose the fields you want included in the final output (as shown in Figure 11-15). In this case, just choose the Answer field.
At this point, you can apply more transformations, if needed.
When you’re happy with the way things look, click the Close & Load command to output the results to the workbook.
Figure 11-16 shows the final merged query.
If you need to adjust or correct a merged query, right-click the query in the Workbook Queries pane and select Edit. In the Query Editor, click the Gear icon next to the Source query step, as shown in Figure 11-17. This action opens the Merge dialog box, where the necessary changes can be applied.