Histogram aggregation

The histogram aggregation slices the data into buckets based on a single numeric field. The width of each bucket, called the interval, is specified in the query.

Here, we have some records of network traffic usage data. The usage field holds the number of bytes used for uploading or downloading data. Let's slice all the data based on the usage:

POST /bigginsight/_search?size=0
{
  "aggs": {
    "by_usage": {
      "histogram": {
        "field": "usage",
        "interval": 1000
      }
    }
  }
}

The preceding aggregation query slices the data into buckets that are each 1,000 wide, starting with the following:

  • 0 to 999: All records that have usage >= 0 and < 1,000 will fall into this bucket
  • 1,000 to 1,999: All records that have usage >= 1,000 and < 2,000 will fall into this bucket
  • 2,000 to 2,999: All records that have usage >= 2,000 and < 3,000 will fall into this bucket
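
To see why a record lands in a particular bucket, it can help to work out the bucket key by hand. The following is only a rough sketch of the rounding, assuming the default offset of 0; it is pseudocode for illustration, not something you send to Elasticsearch:

bucket_key = floor(usage / interval) * interval

For example, with an interval of 1000, a hypothetical record with a usage of 1500 gets the key floor(1500 / 1000) * 1000 = 1000, so it falls into the 1,000 to 1,999 bucket.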

The response should look like the following (truncated for brevity):

{
  ...,
  "aggregations": {
    "by_usage": {
      "buckets": [
        {
          "key": 0.0,
          "doc_count": 30060
        },
        {
          "key": 1000.0,
          "doc_count": 42880
        },
        {
          "key": 2000.0,
          "doc_count": 42041
        },
        ...
      ]
    }
  }
}

This is how the histogram aggregation creates buckets of equal width by using the interval specified in the query. By default, it returns every bucket between the lowest and the highest populated bucket, including buckets that contain no documents. To get back only the buckets that contain documents, use the min_doc_count parameter. When it is specified, the histogram aggregation returns only those buckets that contain at least the specified number of documents.
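
As a minimal sketch, the earlier query can be extended with min_doc_count. The value 1 is only an illustrative choice; it asks for buckets that contain at least one document:

POST /bigginsight/_search?size=0
{
  "aggs": {
    "by_usage": {
      "histogram": {
        "field": "usage",
        "interval": 1000,
        "min_doc_count": 1
      }
    }
  }
}

With this setting, empty buckets should be left out of the response, while the keys and counts of the remaining buckets stay the same.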

Let's look at another aggregation, range aggregation, which can be used on numerical data.
