In all of the previous use cases, we used sum aggregation, calling sum() directly on the groupby object rather than going through the pandas aggregate function. The sum() method that we used is a Cython-optimized implementation. Other Cython-optimized implementations include mean, std, and sem (standard error of the mean). To apply other functions, or a combination of aggregations, the aggregate function comes in handy:
sales_data.groupby("Category").aggregate(np.sum)
All the rules discussed in the sections on handling multiple keys and indices are applicable here as well.
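As an illustration of the combination of aggregations mentioned above, the following is a minimal sketch. It assumes a small stand-in DataFrame, since the real sales_data is not reproduced here; the column names Sales and Quantity and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for sales_data; columns and values are illustrative only.
sales_data = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology"],
    "Sales": [100.0, 300.0, 200.0, 400.0],
    "Quantity": [1, 3, 2, 4],
})

# A list of function names applies every aggregation to the selected column,
# producing one output column per aggregation.
sales_summary = sales_data.groupby("Category")["Sales"].aggregate(["sum", "mean", "max"])

# A dict maps each column to its own aggregation.
per_column = sales_data.groupby("Category").aggregate({"Sales": "sum", "Quantity": "mean"})

print(sales_summary)
print(per_column)
```

Passing function names as strings lets pandas pick its optimized implementations where they exist.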
Note that when multiple keys or a MultiIndex are used, the result has a hierarchical index, which can be flattened with the reset_index method of DataFrames. Grouping by two keys produces such an index:
sales_data.groupby(["ShipMode", "Category"]).aggregate(np.sum)
The index of the output can be reset using the following snippet:
sales_data.groupby(["ShipMode", "Category"]).aggregate(np.sum).reset_index()
To achieve the same results, in place of reset_index, the as_index parameter of groupby can be set to False:
sales_data.groupby(["ShipMode", "Category"], as_index=False).aggregate(np.sum)
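The two ways of flattening the index can be compared side by side. This is a sketch under the assumption of a small hypothetical DataFrame that borrows only the ShipMode and Category column names from the text; the Sales column and all values are made up.

```python
import pandas as pd

# Hypothetical stand-in for sales_data; values are illustrative only.
sales_data = pd.DataFrame({
    "ShipMode": ["First Class", "First Class", "Standard", "Standard"],
    "Category": ["Furniture", "Furniture", "Technology", "Furniture"],
    "Sales": [100.0, 300.0, 200.0, 400.0],
})

keys = ["ShipMode", "Category"]

# Default behaviour: the group keys become a hierarchical (MultiIndex) index.
hierarchical = sales_data.groupby(keys).sum()

# Option 1: move the index levels back into ordinary columns afterwards.
flat_reset = sales_data.groupby(keys).sum().reset_index()

# Option 2: ask groupby not to use the keys as the index in the first place.
flat_as_index = sales_data.groupby(keys, as_index=False).sum()

print(hierarchical)
print(flat_reset)
print(flat_as_index)
```

In this sketch both flattened results have the group keys as regular columns and a default integer index.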
Like sum, the following functions can be applied directly to groupby objects:
| Function | Description |
| --- | --- |
| mean() | Compute mean of groups |
| sum() | Compute sum of group values |
| size() | Compute group sizes |
| count() | Compute count of group |
| std() | Standard deviation of groups |
| var() | Compute variance of groups |
| sem() | Standard error of the mean of groups |
| describe() | Generate descriptive statistics |
| first() | Compute first of group values |
| last() | Compute last of group values |
| nth() | Take nth value, or a subset if n is a list |
| min() | Compute min of group values |
| max() | Compute max of group values |
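A few of the functions listed above can be tried on a toy example. The sketch below assumes a hypothetical DataFrame with one missing Sales value, which makes the difference between size() and count() visible; the data is not taken from the real sales_data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_data with one missing value.
sales_data = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology", "Technology"],
    "Sales": [100.0, np.nan, 200.0, 400.0, 300.0],
})

grouped = sales_data.groupby("Category")["Sales"]

group_sizes = grouped.size()    # number of rows per group, missing values included
group_counts = grouped.count()  # number of non-null values per group
group_means = grouped.mean()    # mean per group, missing values skipped
group_first = grouped.first()   # first non-null value per group
group_max = grouped.max()       # largest value per group

print(group_sizes)
print(group_counts)
print(group_means)
```

Here size() and count() disagree for the Furniture group because one of its two rows has no Sales value.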