Nesting aggregations

Bucket aggregations split the context into one or more buckets. We can restrict the context of the aggregation by specifying the query element, as we saw in the previous section.

When a metric aggregation is nested inside a bucket aggregation, the metric aggregation is computed within each bucket. Let's see how this works by considering the following question:

What is the total bandwidth consumed by each user of a specific customer on a given day?

We have to take the following steps:

  1. First, filter the overall data for the given customer and for the given day. This can be done using a global query element of the bool type.
  2. Once we have the filtered data, we will want to create one bucket per user.
  3. Once we have one bucket for each user, we will want to compute the sum metric aggregation on the total usage field (which includes upload and download).
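
The three steps can be sketched in plain Python to make the semantics concrete. This only illustrates what the nested aggregation computes, not how Elasticsearch executes it. The sample records and the function name total_usage_by_user are made up for this example; the field names are the ones used in the query that follows.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names come from the text.
docs = [
    {"customer": "Linkedin", "time": 1506257900000, "username": "Jay May", "usage": 100},
    {"customer": "Linkedin", "time": 1506258000000, "username": "Jay May", "usage": 250},
    {"customer": "Linkedin", "time": 1506259000000, "username": "Guadalupe Rice", "usage": 300},
    {"customer": "Google", "time": 1506259000000, "username": "Jay May", "usage": 999},
    {"customer": "Linkedin", "time": 1506400000000, "username": "Guadalupe Rice", "usage": 500},
]


def total_usage_by_user(docs, customer, time_gte, time_lte):
    """Filter the documents, bucket them per user, and sum usage per bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "total_usage": 0})
    for doc in docs:
        # Step 1: keep only the given customer and time range (the bool query).
        if doc["customer"] != customer:
            continue
        if not (time_gte <= doc["time"] <= time_lte):
            continue
        # Step 2: one bucket per distinct username (the terms aggregation).
        bucket = buckets[doc["username"]]
        bucket["doc_count"] += 1
        # Step 3: the sum metric is accumulated inside each bucket.
        bucket["total_usage"] += doc["usage"]
    return dict(buckets)


result = total_usage_by_user(docs, "Linkedin", 1506257800000, 1506314200000)
print(result)
```

The document for the other customer and the document outside the time range are dropped before any bucket is created, which is why the filter is listed as the first step.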

The following query performs these three steps. The annotated numbers correspond to the three steps above:

GET /bigginsight/usageReport/_search?size=0
{
  "query": { 1
    "bool": {
      "must": [
        {"term": {"customer": "Linkedin"}},
        {"range": {"time": {"gte": 1506257800000, "lte": 1506314200000}}}
      ]
    }
  },
  "aggs": {
    "by_users": { 2
      "terms": {
        "field": "username"
      },
      "aggs": {
        "total_usage": { 3
          "sum": { "field": "usage" }
        }
      }
    }
  }
}

The thing to notice here is that the top-level by_users aggregation, which is a terms aggregation, contains another aggs element with the total_usage metric aggregation inside it.

The response should look like the following:

{
  ...,
  "aggregations": {
    "by_users": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 453,
      "buckets": [
        {
          "key": "Jay May",
          "doc_count": 2170,
          "total_usage": {
            "value": 6516943
          }
        },
        {
          "key": "Guadalupe Rice",
          "doc_count": 2157,
          "total_usage": {
            "value": 6492653
          }
        },
        ...
}

As you can see, each of the terms aggregation buckets contains a total_usage child, which has the metric aggregation value. The buckets are sorted by the number of documents in each bucket, in descending order. It is possible to change the order of buckets by specifying the order parameter within the bucket aggregation.

Please see the following partial query, which has been modified to sort the buckets in descending order of the total_usage metric:

GET /bigginsight/usageReport/_search
{
  ...,
  "aggs": {
    "by_users": {
      "terms": {
        "field": "username",
        "order": { "total_usage": "desc"}
      },
      "aggs": {
        ...
  ...
}

The order clause sorts the buckets by the value of the nested total_usage aggregation, in descending order.
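
As a rough illustration of what the order clause changes, the following Python sketch sorts response-shaped buckets first by doc_count (the default) and then by the nested total_usage value. The first two buckets are taken from the response shown earlier; the third bucket is invented so that the two orderings differ.

```python
# Buckets shaped like the "buckets" array of a terms aggregation response.
buckets = [
    {"key": "Jay May", "doc_count": 2170, "total_usage": {"value": 6516943}},
    {"key": "Guadalupe Rice", "doc_count": 2157, "total_usage": {"value": 6492653}},
    # Hypothetical bucket: fewer documents but the highest total usage.
    {"key": "Example User", "doc_count": 1200, "total_usage": {"value": 7000000}},
]

# Default ordering: descending document count.
by_doc_count = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)

# Ordering that mirrors "order": {"total_usage": "desc"}.
by_total_usage = sorted(
    buckets, key=lambda b: b["total_usage"]["value"], reverse=True
)

print([b["key"] for b in by_doc_count])
print([b["key"] for b in by_total_usage])
```

Elasticsearch applies the ordering when it selects which buckets to return, so a client-side sort like this one only matches the server-side result when every bucket is present in the response.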

Bucket aggregations can be nested inside other bucket aggregations. Let's consider this by answering the following question:

Who are the top two users in each department, given the total bandwidth consumed by each user?

The following query will help us get that answer:

GET /bigginsight/usageReport/_search?size=0
{
  "query": { 1
    "bool": {
      "must": [
        {"term": {"customer": "Linkedin"}},
        {"range": {"time": {"gte": 1506257800000, "lte": 1506314200000}}}
      ]
    }
  },
  "aggs": {
    "by_departments": { 2
      "terms": { "field": "department" },
      "aggs": {
        "by_users": { 3
          "terms": {
            "field": "username",
            "size": 2,
            "order": { "total_usage": "desc"}
          },
          "aggs": {
            "total_usage": {"sum": { "field": "usage" }} 4
          }
        }
      }
    }
  }
}

Please see the following explanation of the annotated numbers in the query:

  1. The query that filters the data for the specific customer and time range.
  2. The top-level terms aggregation, which creates a bucket for each department.
  3. The second-level terms aggregation, which returns the top two users (note "size": 2) within each department bucket, ordered by total_usage.
  4. The metric aggregation that computes the sum of usage within its parent bucket. Its immediate parent is the by_users aggregation, so the sum of usage is calculated for each user.
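
The two levels of bucketing can also be sketched in plain Python. This is a simplified illustration under stated assumptions: the documents are invented and assumed to be already filtered by the query, and the function name top_users_per_department is hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, assumed to have already passed the query filter.
docs = [
    {"department": "Sales", "username": "ann", "usage": 100},
    {"department": "Sales", "username": "ann", "usage": 50},
    {"department": "Sales", "username": "bob", "usage": 200},
    {"department": "Sales", "username": "cara", "usage": 20},
    {"department": "Engineering", "username": "dev", "usage": 300},
    {"department": "Engineering", "username": "eli", "usage": 10},
    {"department": "Engineering", "username": "eli", "usage": 20},
    {"department": "Engineering", "username": "fay", "usage": 5},
]


def top_users_per_department(docs, size=2):
    """Bucket by department, then by user, and keep the top users by usage."""
    # Outer key: department bucket; inner key: user bucket holding the usage sum.
    usage = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        usage[doc["department"]][doc["username"]] += doc["usage"]

    result = {}
    for department, users in usage.items():
        # Mirrors "order": {"total_usage": "desc"} followed by "size": 2.
        ranked = sorted(users.items(), key=lambda kv: kv[1], reverse=True)
        result[department] = ranked[:size]
    return result


top = top_users_per_department(docs)
print(top)
```

The sum is keyed by both department and user, which corresponds to the total_usage aggregation having by_users as its immediate parent inside each by_departments bucket.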

This is how we can nest bucket and metric aggregations to answer complex questions about large volumes of data stored in Elasticsearch quickly and efficiently.
