Multilevel slicing

The good part is that, given a dictionary of dataframes and axis=1, pd.concat will create a multilevel column index, with the dictionary keys as the top level. This will come in handy in a bit. It also means, however, that it's no longer enough to pass the column name as a string; we need to use multilevel slicing. Let's use an alias:

idx = pd.IndexSlice
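To see what that looks like, here is a minimal sketch with two tiny, made-up dataframes keyed 'axis' and 'allies' (the numbers are hypothetical, not taken from our dataset). The axis=1 argument is what puts the dictionary keys into the column index rather than the row index.

```python
import pandas as pd

# Hypothetical per-side tables: same row index, same column names
axis = pd.DataFrame({"killed": [100, 250], "wounded": [300, 400]},
                    index=["battle_1", "battle_2"])
allies = pd.DataFrame({"killed": [150, 200], "wounded": [350, 500]},
                      index=["battle_1", "battle_2"])

# Concatenating a dict along axis=1 turns its keys into the top column level
df = pd.concat({"axis": axis, "allies": allies}, axis=1)

idx = pd.IndexSlice  # short alias for multilevel slicing

print(df.columns.nlevels)  # 2
print(df.columns.tolist())
# [('axis', 'killed'), ('axis', 'wounded'), ('allies', 'killed'), ('allies', 'wounded')]
```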

Now, to get a specific column from this dataframe, we have to use .loc and pass this indexing object as the column selector. IndexSlice accepts the same kinds of labels and slices as .loc, one per level of the index. For one column, we'll use it like this:

df.loc[:, idx['old_metrics', 'url']]

Note that because we specified a value for every level, the result will be a pandas Series. We can, however, relax our query by using colons: for example, df.loc[:, idx[:, 'killed']] will return a dataframe with two columns, killed for axis and killed for allies.
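Here is a small sketch of both cases, again on a hypothetical two-level frame whose top level is the side and whose second level is the metric (the values are made up):

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical two-level frame: (side, metric) column pairs
df = pd.concat(
    {
        "axis": pd.DataFrame({"killed": [100, 250], "wounded": [300, 400]}),
        "allies": pd.DataFrame({"killed": [150, 200], "wounded": [350, 500]}),
    },
    axis=1,
)

# A value on every level selects exactly one column -> a Series
one = df.loc[:, idx["allies", "killed"]]
print(type(one).__name__)  # Series

# ':' on the first level -> the 'killed' column of every side, as a DataFrame
killed = df.loc[:, idx[:, "killed"]]
print(killed.shape)  # (2, 2)

# ':' on the second level -> every metric for a single side
axis_only = df.loc[:, idx["axis", :]]
print(axis_only.shape)  # (2, 2)
```

Bare colons work regardless of column order. If you slice a level with a label range instead (say, idx['allies':'axis', :]), pandas expects the column index to be sorted, so calling df.sort_index(axis=1) first avoids an UnsortedIndexError.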

This multilevel index is quite handy when we compare multiple attributes across multiple sources or entities, which is exactly our case. This is the final cleaning operation; our dataset is finally ready to be used and analyzed. Before we move on to analysis, though, it is good (and often critical) practice to check the quality of the result.
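As an illustration of that kind of comparison, here is a sketch on the same sort of hypothetical frame: dividing two fully specified columns aligns them by row, and DataFrame.xs pulls a single metric for every side into one flat table. The ratio is only an example computation, not a step of our cleaning pipeline.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical two-level frame: (side, metric) column pairs
df = pd.concat(
    {
        "axis": pd.DataFrame({"killed": [100, 250], "wounded": [300, 400]}),
        "allies": pd.DataFrame({"killed": [150, 200], "wounded": [350, 500]}),
    },
    axis=1,
)

# Same attribute, two sources: element-wise ratio, aligned on the row index
ratio = df.loc[:, idx["allies", "killed"]] / df.loc[:, idx["axis", "killed"]]
print(ratio.tolist())  # [1.5, 0.8]

# One attribute for all sources side by side; the 'killed' level is dropped
killed = df.xs("killed", axis=1, level=1)
print(killed)
```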
