13. Transforming Data

Excel 2016 adds amazing tools to Edit, Transform & Load data into Excel. The tools available in the Power Query add-in are now part of Excel 2016, rebranded as the Get & Transform group on the Data tab. This chapter introduces some of the tools available in Power Query as well as traditional tools such as Sort, Filter, Remove Duplicates, and Flash Fill.

Using Power Query

The Power Query add-in was created to make it easier to clean ugly data. Suppose that your IT department provides a data set every day that has a number of problems. Rather than wait for the IT department to rewrite the query, you can use Power Query to memorize the steps needed to clean the data. When the IT department provides a new file, you simply refresh the query and Excel repeats all the data cleansing steps.

Figure 13.1 shows an ugly data set. Two different fields are in column A, separated by a comma. The customer column is in uppercase. Columns D through O are a repeating group with various month values going across. To pivot this data, you must unpivot D:O, creating an extra date column and then 12 times as many rows.

Image

Figure 13.1 There are a number of problems in this data set created by the IT department.

Specifying the Data Source

You might need to load data from a web page, from SQL Server, or from any number of data sources. Using the New Query icon on the Data tab, you can specify where the data is located. (See Figure 13.2.)

Image

Figure 13.2 Specify the source of the data to be transformed.

Depending on the source, you will need to browse to the file, provide a connect string to the database, or provide a URL. Power Query shows you a preview of the data. If everything is perfect, you can choose to load the data. If you need to clean the data, choose Edit.

Transforming Data in Power Query

The data appears in a new window called the Query Editor. Ribbon tabs appear for Home, Transform, Add Column, and View.

As shown in Figure 13.3, Power Query is treating the header row as a data row. Use Home, Use First Row as Headers to convert that row to a header row. If you ever had the opposite problem, in which Power Query assumed the first data row is headers, you could open the same drop-down menu and choose Use Headers as First Row.

Image

Figure 13.3 The first edit is to identify the first row as headers.

Column A has both Region and Market in a single column, separated by a comma. Select that column by clicking on the REGION, MARKET header. Use Home, Split Column, Split by Delimiter.

Take a look at the Split by Delimiter dialog in Figure 13.4. This feels like the Text to Columns Wizard. Notice the new setting at the bottom called At the Left-Most Delimiter. This is a brilliant addition to Excel. It ensures that people with three names will get split into only two columns.

Image

Figure 13.4 Split by Delimiter is better than Text to Columns.

After you split a column, the new column names will be REGION, MARKET.1 and REGION.MARKET.2, as shown in Figure 13.5.

Image

Figure 13.5 The data in A is split, but with strange names.

Right-click the first column heading. Choose Rename as shown in Figure 13.6. Specify a name such as Region. Repeat for the second column, renaming it to Market.

Image

Figure 13.6 The right-click menu offers many choices.

If you need to convert a column of uppercase words to proper case, select the column and select Transform, Format, Capitalize Each Word. (See Figure 13.7.)

Image

Figure 13.7 The menu options for converting to proper case, uppercase, or lowercase.

Unpivoting Data in Power Query

It is very common to see data with months or years stretching across the columns. Pivot tables made from this structure are very difficult to use. In the past, fixing the data structure involved repeatedly copying and pasting, or using an obscure trick with Multiple Consolidation Ranges. Power Query makes this process amazingly simple.

In Figure 13.8, the first four columns are selected. Use Transform, Unpivot Columns, Unpivot Other Columns.

Image

Figure 13.8 Select the label columns and choose to unpivot the other columns.

In this example, you go from 81 rows of 16 columns to 972 rows of six columns. As you can see in Figure 13.9, the fifth column is called Attribute and the sixth column is called Value. You can use the Rename function to give these meaningful names, such as Month and Value or Month and Revenue.

Image

Figure 13.9 Unpivoting creates a data set that is easy to pivot.

If there are customers who had no revenue in a certain month, you could open the Filter drop-down menu on the Value column and uncheck 0 values to remove those records.

Correcting a Mistake in Power Query

The Month field is currently a text field using Jun-18 to mean June 2018. You might try a few commands on the Power Query tab to convert this to a date. But the obvious commands do not work. If you choose Transform, Date, Parse, the dates are converted to June 18 of the current year. This is clearly wrong. In the right side of the Query Editor, a list of Applied Steps appears. Click the X next to the last step to remove that step from the query and go back to the previous step.

To solve the current problem, you can use Transform, Replace Values. Change every occurrence of -18 to 1-2018, as shown in Figure 13.10. This will make the dates look like Jun-1-2018.

Image

Figure 13.10 Use Replace Values to convert the incoming data.

After adding the 1 as the day of the month, the Transform, Date, Parse correctly replaces the text date with real dates.

Adding Columns in Power Query

You can add new columns to the data set. The Add Column tab offers the capability to add an index column, duplicate a column, or add a variety of number calculations. In Figure 13.11, a Custom Column calculates a bonus as =if [Revenue]>20000 then .02*[Revenue] else 0.

Image

Figure 13.11 Add a new column to the query.

This syntax is incredibly picky. You must make the if, then, and else lowercase. For column names, insert from the Available Columns list. The syntax checker at the bottom attempts to guide you as you build the expression, but it does so in cryptic ways. For example, if you fail to type else 0, the message says “Token expected.”


Image Tip

You are actually writing the formula in a language known as M. The definitive guide to M has been written by Ken Puls and Miguel Escobar: M is for (Data) Monkey: A Guide to the M Power Language in Excel Power Query (Holy Macro! Books, 2015).


Reviewing the Query

As shown in Figure 13.12, the right side of the Query Editor shows all the transformation steps that you’ve taken so far. You can rearrange steps by dragging them to a new sequence. You can delete a step by using the X icon to the left of any step.

Image

Figure 13.12 A list of transformation steps is saved with the query.

In fact, Power Query is writing an entire program in the M language behind the scenes. Go to View, Advanced Editor to see the M that was generated as you performed the data cleansing steps. Figure 13.13 shows an example.

Image

Figure 13.13 This M language query is written while you use the interface.

Loading and Refreshing the Data

After doing all the steps to clean the data, you can choose Home, Close & Load, as shown in Figure 13.14. You can either load to an Excel worksheet or directly to the Data Model. Obviously, if you are loading more than 1,048,576 records, you will want to load the data to the Data Model.

Image

Figure 13.14 Load the query and query definition to Excel or the Data Model.

Here is the beautiful feature: After you load the data to Excel, you can use Data, Refresh All to have the query go back to the data source, load the current data, and perform all the data cleansing steps automatically.

This is one example of the transformations available in Power Query. There is more functionality, and new functions are added monthly. You can use Power Query to consolidate multiple files. You can use Power Query to load a list of folder contents into Excel.

Cleaning Data with Flash Fill

Suppose that you have data with first names in column A and last names in column B. The names are in uppercase. You would like to reshape the data so you have the full names in proper case.

Add a heading in column C. Type the first name in cell C2. As soon as you type the first letter in the second cell, Excel springs into action and offers to fill the rest of the column for you (see Figure 13.15). Provided the preview looks right or even close, press Enter.

Image

Figure 13.15 Type W in C3 and Excel offers to fill in the rest of the column.

In addition to filling the column, Excel provides two pieces of feedback. First, the status bar in the lower-left corner of the screen indicates that Flash Fill changed a certain number of cells.

Second, a tiny on-grid Flash Fill drop-down icon appears next to the first changed cell. The drop-down offers choices such as Undo and Accept. You can also choose to select all changed cells or all unchanged cells.

Coaching Flash Fill with a Second Example

After Flash Fill operates, look for any cells that don’t fit the pattern. You might have a person with two first names (Mary Ellen Walton) or no last name (Pele). Type a new value in column C and Flash Fill looks for other cells that match that pattern, correcting as it goes.

Flash Fill Will Not Automatically Fill In Numbers

With only 10 digits (in contrast to 26 letters), it is too likely that Excel could detect other patterns that are not the pattern you are intending. When Flash Fill sees a potential pattern, it temporarily “grays in” the suggestion but then removes the suggestion. Press Ctrl+E or click the Flash Fill icon on the Data tab to allow Flash Fill to work.

Flash Fill does not understand mathematical transformations. If the original number is 477 and you type 479 (add 2 to each cell) or 500 (round to the nearest hundred), Excel does not know how to Flash Fill the remaining cells.

Using Formatting with Dates

Dates are particularly troublesome. Suppose that you have a date of birth in column E with the format of YYYYMMDD. If you type 3/5/1970 in G2 and then press the Flash Fill icon, Excel does not correctly recognize the pattern. You get 3/5/ and the first four digits from E in each row. This is an interesting result. You can sort of understand how Excel was tricked into seeing the wrong pattern.

You can solve the date problem by formatting the column to show MM/DD/YYYY first.

Troubleshooting Flash Fill

The following are some tips for making Flash Fill work correctly:

Image There can be no blank columns. It is not necessary to be in the column immediately to the right of the data, but you can’t have any completely blank columns between where you want to Flash Fill and the source data.

Image For the automatic Flash Fill to work, you should type the first value and then immediately type the second value. Do not perform any other commands between the first and second values. Don’t type G2, go to Sheet 3, and then come back and type G3. By then, Flash Fill has stopped watching for patterns. The only exception is sorting. You could type G2, sort, type G3, and Flash Fill will work.

Image Type a heading in the column that you are filling to prevent Flash Fill from filling your heading. You could also bold the other headings. Flash Fill follows the same rules that the Sort dialog and the Ctrl+T Table dialog use to detect whether there are headings. If Ctrl+T opens with the My Data Has Headings box checked, then Flash Fill does not overwrite your headings. This matters more than you might think, because the headings don’t usually follow the pattern of the data and they confuse Flash Fill if it is trying to find a pattern.

Image Pressing Esc makes the Flash Fill preview go away. More than once, I’ve pressed Esc by mistake and lost the Flash Fill. Don’t worry. Type the first one or two cells and then use Ctrl+E or click the Flash Fill icon on the Data tab to force Excel to run Flash Fill again.

Image Flash Fill looks only for patterns. Flash Fill does not understand that AZ is the abbreviation for Arizona. It does not understand that Jan 23 is another way to write 1-23. Flash Fill doesn’t have any opinions. Typing Awesome next to Bruce Springsteen does not cue Flash Fill that you are trying to classify musical acts.

Flash Fill provides an easy way to solve many data problems. Even in the cases where an Excel pro knows a formula that can solve the problem, it is still easier to use Flash Fill.

Sorting Data

Sorting in Excel 2016 is handled in the Sort dialog or by using the AZ and ZA buttons on the Data tab. In all, there are five entry points for sorting:

Image Select the Home tab and then select Editing, Sort & Filter, Custom Sort.

Image Right-click any cell and choose Sort.

Image Select Sort from any filter drop-down.

Image Select the Data tab and then select Sort & Filter, AZ or Sort & Filter, ZA.

Image Open the Sort dialog box by selecting Sort & Filter, Sort on the Data tab.

The Sort dialog in Excel 2016 offers up to 64 different sorting levels. If you get into sorting by color, you often have to specify several rules for one column, so the theoretical number of columns you can sort by is probably fewer than 64.

Sorting by Color or Icon

Excel can sort data by fill color, font color, or icon set. This also works with color applied through conditional formatting or color that you applied by using the cell format icons.

Because color is subjective, there is not a default color sequence. If one column contains 17 colors, you need to set up 17 rules in the Sort dialog just to sort by that one column.

To sort by color, follow these steps:

1. Select a cell within your data.

2. Select the Sort icon on the Data tab. The Sort dialog appears.

3. Select the desired field from the Sort By drop-down.

4. Change the Sort On drop-down to Cell Color.

5. In the Order drop-down, choose the color that should appear first.

6. In the final drop-down, select On Top.

7. To specify the next color, click the Copy Level button at the top of the Sort dialog.

8. Choose the next color in the Order drop-down for the copied rule.

9. Repeat steps 7 and 8 for each additional color.

10. If you want to specify that values in another column should be used to break ties in the color column, select the Add Level button and specify the additional columns.

11. Click OK to sort the data.

Factoring Case into a Sort

Typically, an Excel sort ignores the case of the text. Values that are lowercase, uppercase, or any combination of the two are treated equally in a sort.

You can instead use a case-sensitive sort in Excel 2016 to sort lowercase values before uppercase values. For example, abc sorts before ABC. Similarly, ABc sorts before ABC.

If you want Excel to consider case when sorting, follow these steps:

1. Select a cell within your data.

2. Select the Sort icon on the Data tab. The Sort dialog appears.

3. Choose the column from the Sort By drop-down.

4. Click the Options button. The Sort Options dialog appears.

5. Select the Case Sensitive check box.

6. Click OK to close the Sort Options dialog.

7. Click OK to sort.

Reordering Columns with a Left-to-Right Sort

If you receive a data set from a colleague and the columns are in the wrong sequence, you could cut and paste them into the right sequence, or you could fix them all in one pass by using a left-to-right sort. To do this, follow these steps:

1. Insert a new blank row above the headings.

2. In the new row, type numbers corresponding to the correct sequence of the columns.

3. Make sure that one cell in the range is selected.

4. Select the Sort icon on the Data tab. The Sort dialog appears.

5. Click the Options button. The Sort Options dialog appears.

6. Select Sort Left to Right. Click OK to close the Sort Options dialog.

7. The Sort By drop-down now contains a list of row numbers. Choose the first row.


Image Tip

Excel does not change the original column widths. Select all cells with Ctrl+A and then use Home, Format, AutoFit Column Width to resize all the columns.


8. The remaining drop-downs should already include Values and Smallest to Largest.

9. Click OK to perform the sort.

10. Delete your temporary extra row at the top of the data set. The columns are then resequenced into the desired order.

Sorting into a Unique Sequence by Using Custom Lists

Sometimes company tradition dictates that regions or products should be presented in an order that is not alphabetic. For example, the sequence East, Central, West makes more sense geographically than the alphabetic sequence Central, East, West.

It is possible to set up a custom list to tell Excel that the region sequence is East, Central, West. You can then sort your data based on this sequence. You need to set up the custom list only once per computer. Follow these steps to do so:

1. Go to a blank section of any worksheet. Type the correct sequence for the values in a column.

2. Select this range.

3. Select File, Options. The Options dialog appears.

4. Click the Advanced Group. Scroll down to the General section and then select Edit Custom Lists. The Custom Lists dialog appears.

5. In the Custom Lists dialog, the bottom section shows the range of cells you selected in step 2. If it is correct, click the Import button. Your new list, with the correct sequence, is added to the default custom lists.

6. Click OK to close the Custom Lists dialog. Click OK to close the Options dialog.

7. Clear your temporary data range from step 1.

To use the list with custom sorting, follow these steps:

1. Select one cell in your data.

2. Select the Sort icon on the Data tab. The Sort dialog appears.

3. In the Sort By drop-down, choose the region with the custom sort sequence.

4. From the Order drop-down, select Custom List. You should now be back in the Custom Lists dialog.

5. Click your custom list and then click OK. The Sort dialog shows that the order is based on your custom list.

6. Click OK to sort into the custom sequence.

One-Click Sorting

All the examples discussed so far in this chapter have used the Sort dialog, which is required for left-to-right sorting, custom sorting, and case-sensitive sorting. It also makes color sorting easier. You can accomplish all other sorts by using the AZ buttons on the various tabs.

It is important to select a single cell in the column to be sorted. When you select a single cell, Excel extends the selection to encompass the entire current region. If you select two cells or even the whole column, Excel warns you that it is about to sort part of your data and ignore the adjacent data. This is rarely what you want.

You can find the one-click sorting options on the Home and Data tabs. On the Home tab, they are buried in the Sort & Filter drop-down. On the Data tab, they are clearly visible as AZ and ZA buttons.

You can also find sorting options by right-clicking a cell in the column you want to sort and selecting Sort. Options in this menu enable you to sort in ascending or descending order. You can also put the cell color, font color, or icon on top.

Additional quick-sorting options are located in the Filter drop-downs. You can use these options to sort in ascending order, in descending order, and by color.

Fixing Sort Problems

If it appears that a sort did not work correctly, check this list of troubleshooting tips:

Image If the headers were sorted into the data, it usually means that one or more columns had a blank heading. Every column should have a nonblank heading. If you want the heading to appear blank, use an underscore in a white font to fool Excel. If you cannot insert a heading, you will have to use the Sort dialog.

Image Unhide rows and columns before sorting. Hidden rows are not resequenced in a sort.

Image Use only one row for headings. If you need the headings to appear as if they are taking up several rows, put the headings in one row and wrap the text. To have control over where the text wraps, type the first line, press Alt+Enter, and then type the second line.

Image Data in a column should be a similar type. For example, if you have a column of ZIP Codes, you might have numeric cells for ZIP Codes of 10001 through 99999 and text cells for ZIP Codes of 00001 through 09999. This is one common way to keep leading zeroes. Because text cells are sorted sequentially after numeric cells, sorting the ZIP Codes in this case will appear not to work. To fix this problem, convert the entire column to one data type to achieve the expected results.

Image If your data has volatile formulas or formulas that point to cells outside the sort range, Excel calculates the range after sorting. If your sort sequence is based on this column, Excel accurately sorts the data, based on the information before the recalculation. If the values change after calculation, it will appear that the sort did not work.

Image If your data must have blank columns or rows, be sure to select the entire sort range before starting the sort process.

Discovering Interesting Things in Your Data Using the Quick Analysis

You have some data in Excel. Now what? Print it out? Take it to your manager?

When I worked as a data analyst, I loved numbers and I loved Excel. But the people for whom I worked were not numbers people. If I handed them a page full of numbers, I could see their eyes glazing over. It was my goal to find something interesting in that sheet of numbers and call attention to that one bit of information.

Somehow, walking in to the VP of Sales’s corner office with news like, “Wow, Walmart is up 20% over last year,” gave him a talking point. Rather than just filing the report, he had a bit of news that might stick in his head and come up again in a later conversation. Of course, in my head, that later conversation would go something like, “You should see the analysis that our star employee Bill Jelen gave me today; Walmart is up 20%. We should give Bill a big bonus for discovering that!” In reality, the VP of Sales likely took credit for the discovery on the next sales conference call. But that’s OK. He was the guy who kept me employed, and as long as I kept giving him a steady stream of sound bites, he would keep me from getting a pink slip.

Buried in Excel are many ways to find something interesting. It used to be intimidating to figure out where to start. Home, Conditional Formatting? Insert, PivotTable? Insert, Chart? Insert, Sparklines? Formulas, AutoSum? And after you get there, what do you do? I’ve written entire books about charts. I’ve written entire books about pivot tables. Others have written entire books about Excel formulas. The whole prospect is overwhelming.

How do you figure out where to start analyzing your data in Excel? You can use the Quick Analysis tool.

The Quick Analysis tool, introduced in Excel 2013, is an on-ramp that lets you experiment with all those data tools without having to dive in and read an entire book. It is quick and easy to use, and it gives you a view of what you would get without actually doing anything with the data.

Follow these steps:

1. Select one cell in your data.

2. Press Ctrl+* to select all the data.

3. Look for the Quick Analysis icon at the bottom right of your data. If your data fills more than one screen, the icon appears near the last row or column that fits on your screen.

4. Click the icon and then start to click around.


Image Note

Everything you can do in the Quick Analysis can also be done the old way using commands on the ribbon tabs. The actual command in the ribbon often offers even more choices than the thumbnails offered in the Quick Analysis.


The Quick Analysis appears, with five categories across the top: Formatting, Charts, Total, Tables, and Sparklines. When you select a category from the top, you see anywhere from 4 to 10 icons in the Quick Analysis.

You can hover over any icon in the lens, and Live Preview shows you what the results of using that tool would look like. The results appear either in a Live Preview in the data or in a thumbnail that appears above the Quick Analysis.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset