Bucket aggregations

The grouping of documents by a common criterion is called bucketing. Bucketing is very similar to the GROUP BY functionality in SQL. Depending on the aggregation type, each bucket is associated with a criterion that determines whether a document in the current context belongs to it. Each bucket also reports the total number of documents it contains.

Bucket aggregations can answer questions such as the following:

  • Given an employee index containing employee documents, find the number of employees based on their age group or location
  • Given the Apache access logs index, find the number of 404 responses by country
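
As a rough illustration of what bucketing means, the following pure-Python sketch groups a handful of made-up employee documents by location and reports a document count per bucket. The documents, field names, and the bucket_by helper are hypothetical and are not part of Elasticsearch or Kibana; they only mimic the GROUP BY style behavior described above.

```python
from collections import Counter

# Hypothetical employee documents; names and fields are illustrative only.
employees = [
    {"name": "Asha", "age": 29, "location": "Pune"},
    {"name": "Ben", "age": 41, "location": "London"},
    {"name": "Chen", "age": 35, "location": "Pune"},
    {"name": "Dana", "age": 52, "location": "Austin"},
]


def bucket_by(docs, field):
    """Group documents by the value of `field` and count each bucket."""
    return Counter(doc[field] for doc in docs)


# One bucket per distinct location, each carrying its document count.
location_buckets = bucket_by(employees, "location")
print(location_buckets)
```

Each key of the resulting counter plays the role of a bucket, and its value plays the role of the bucket's document count.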

Bucket aggregations support sub-aggregations: given a bucket, all the documents in it can be further bucketed (grouped based on another criterion). For example, the number of 404 responses can be found by country and then, within each country, by state.
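
A nested bucketing such as the 404-by-country-and-state example could be expressed as an Elasticsearch search request body along the following lines. This is only a sketch built as a Python dictionary: the field names response, geo.country, and geo.state are assumptions about how the access log index is mapped, and the bucket names by_country and by_state are arbitrary labels.

```python
import json

# Assumed field names: "response", "geo.country" and "geo.state" are
# hypothetical and depend on how the access log index is mapped.
request_body = {
    "size": 0,  # return only aggregation results, no individual hits
    "query": {"term": {"response": 404}},
    "aggs": {
        "by_country": {
            "terms": {"field": "geo.country", "size": 10},
            # Sub-aggregation: runs once inside every country bucket.
            "aggs": {
                "by_state": {"terms": {"field": "geo.state", "size": 10}}
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The inner aggs block sits alongside the terms definition of the outer bucket, which is what makes it a sub-aggregation rather than a second top-level aggregation.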

Some types of bucket aggregation define a single bucket, some define a fixed number of buckets, and others create buckets dynamically during the aggregation process.

Bucket aggregations can be combined with metric aggregations—for example, finding the average age of employees per age group.
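
The combination of a bucket aggregation with a metric can be sketched in plain Python as follows. The ages, the range boundaries, the bucket labels, and the helper names are all made up for illustration, and treating each age group as a half-open range (lower bound included, upper bound excluded) is an assumption of this sketch.

```python
from statistics import mean

# Hypothetical employee ages.
ages = [22, 25, 31, 34, 35, 47, 50, 63]

# Half-open ranges: the lower bound is inclusive, the upper bound is
# exclusive; None means the range is unbounded on that side.
age_ranges = [(None, 25), (25, 35), (35, 50), (50, None)]


def in_range(value, lower, upper):
    """Return True if lower <= value < upper, ignoring missing bounds."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


def average_per_range(values, ranges):
    """Bucket the values by range, then compute a metric inside each bucket."""
    result = {}
    for lower, upper in ranges:
        members = [v for v in values if in_range(v, lower, upper)]
        label = f"{'*' if lower is None else lower}-{'*' if upper is None else upper}"
        result[label] = {
            "doc_count": len(members),
            "avg_age": mean(members) if members else None,
        }
    return result


age_group_stats = average_per_range(ages, age_ranges)
for label, stats in age_group_stats.items():
    print(label, stats)
```

The bucketing step decides which documents belong together, and the metric is then computed separately over the members of each bucket.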

Kibana supports the following types of bucket aggregations:

  • Histogram: This type of aggregation works only on numeric fields. Given a numeric field and an interval, it distributes the field's values into fixed-size interval buckets. For example, a histogram can be used to find the number of products per price range, with an interval of 100.
  • Date Histogram: This is a type of histogram aggregation that works only on date fields. It distributes documents into fixed-size date interval buckets. For ease of use, Kibana provides various intervals, including auto, millisecond, second, minute, hour, day, week, month, year, and custom. Using the custom option, date/time-oriented intervals such as 2 hours, days, weeks, and so on can be supplied. This histogram is ideal for analyzing time-series data; for example, finding the total number of incoming web requests per week or per day.
  • Range: This is similar to the histogram aggregation; however, rather than fixed intervals, explicit ranges can be specified. It works not only on numeric fields but also on dates and IP addresses. Multiple ranges can be specified using from and to values; for example, finding the number of employees in the age ranges 0-25, 25-35, 35-50, and 50 and above.
This type of aggregation includes the from value and excludes the to value for each range, so an employee aged 25 falls in the 25-35 range, not in 0-25.
  • Terms: This type of aggregation works by grouping documents based on each unique term in the field. This aggregation is ideal for finding the top n values for a field—for example, finding the top five countries based on the number of incoming web requests. 
For string data, this aggregation works on keyword fields rather than on analyzed text fields.
  • Filters: This aggregation creates buckets based on filter conditions, one bucket per filter, which allows specific values to be compared; for example, the average number of web requests in India compared to the US.
  • GeoHash Grid: This aggregation works with fields containing geo_point values. It groups geo_points into buckets so that they can be plotted on a map; for example, visualizing web request traffic over different geographies.
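
The bucket aggregation types listed above correspond roughly to the following Elasticsearch aggregation definitions, written here as Python dictionaries. This is a sketch under stated assumptions, not what Kibana emits verbatim: every field name (price, @timestamp, age, geo.country, geo.location), every bucket label, and the build_request helper are hypothetical. The calendar_interval parameter is the name used by recent Elasticsearch releases; older releases use interval for date histograms instead.

```python
import json

# All field names and bucket labels below are hypothetical; substitute
# the fields of your own index.
bucket_aggs = {
    # Fixed-size numeric buckets: 0-100, 100-200, and so on.
    "price_histogram": {"histogram": {"field": "price", "interval": 100}},
    # Recent Elasticsearch releases use "calendar_interval"; older ones
    # use "interval" for the same purpose.
    "requests_per_week": {
        "date_histogram": {"field": "@timestamp", "calendar_interval": "week"}
    },
    # Explicit ranges: "from" is inclusive, "to" is exclusive.
    "age_ranges": {
        "range": {
            "field": "age",
            "ranges": [
                {"to": 25},
                {"from": 25, "to": 35},
                {"from": 35, "to": 50},
                {"from": 50},
            ],
        }
    },
    # One bucket per unique term, keeping the five largest buckets.
    "top_countries": {"terms": {"field": "geo.country", "size": 5}},
    # One named bucket per filter condition.
    "india_vs_us": {
        "filters": {
            "filters": {
                "india": {"term": {"geo.country": "IN"}},
                "us": {"term": {"geo.country": "US"}},
            }
        }
    },
    # Buckets are geohash cells; precision controls the cell size.
    "traffic_grid": {"geohash_grid": {"field": "geo.location", "precision": 4}},
}


def build_request(name):
    """Wrap one aggregation in a search body that returns only buckets."""
    return {"size": 0, "aggs": {name: bucket_aggs[name]}}


print(json.dumps(build_request("age_ranges"), indent=2))
```

Setting size to 0 in the request suppresses the individual hits, so the response carries only the buckets and their document counts.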