Solr has built-in support for faceting numeric and date fields by range: you supply a start, an end, and a gap, and Solr divides the range into equal intervals. You can think of this as a convenience feature that calculates the ranges for you with succinct input parameters and output, rather than you calculating and submitting a series of facet queries (facet queries are described after this section).
Range faceting is particularly useful for dates. We'll demonstrate an example against MusicBrainz release dates and another against MusicBrainz track durations, and then describe the parameters and their options.
Here's the URL:
http://localhost:8983/solr/mbreleases/mb_releases?indent=on&wt=json&omitHeader=true&rows=0&facet=true&facet.range.other=all&f.r_event_date_earliest.facet.range.start=NOW/YEAR-10YEARS&facet.range=r_event_date_earliest&facet.range.end=NOW/YEAR&facet.range.gap=%2B1YEAR&q=smashing
And here's the response:
{
  "response":{"numFound":248,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "r_event_date_earliest":{
        "counts":[
          "2003-01-01T00:00:00Z",2,
          "2004-01-01T00:00:00Z",1,
          "2005-01-01T00:00:00Z",1,
          "2006-01-01T00:00:00Z",3,
          "2007-01-01T00:00:00Z",11,
          "2008-01-01T00:00:00Z",0,
          "2009-01-01T00:00:00Z",0,
          "2010-01-01T00:00:00Z",0,
          "2011-01-01T00:00:00Z",0,
          "2012-01-01T00:00:00Z",0],
        "gap":"+1YEAR",
        "start":"2003-01-01T00:00:00Z",
        "end":"2013-01-01T00:00:00Z",
        "before":93,
        "after":0,
        "between":18}}}}
This example demonstrates a few things, not only range faceting:
/mb_releases is a request handler using dismax to query appropriate release fields.
q=smashing indicates that we're faceting on a keyword search instead of all the documents. We kept rows at zero, which is unrealistic but not pertinent, as the rows setting does not affect facets.
The "start" and "end" values below the facet counts give the lower bound of the first range and the upper bound of the last one. The "end" value may or may not be the same as facet.range.end (see facet.range.hardend, explained in the next section).
The before, after, and between counts appear because we specified facet.range.other. We'll see shortly what this means.
The results of our facet range query show that there were three releases in 2006 and eleven in 2007. There is no data after that, since the data is out of date at this point.
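One way to avoid hand-encoding mistakes, such as the + in facet.range.gap, is to let a library assemble the query string. The following is a minimal Python sketch and not part of Solr. The function name build_range_facet_url is made up for illustration, and the host, core, and handler are copied from the URL above. It passes the range parameters under their global names rather than with a field-specific prefix, which is enough when only one field is faceted.

```python
from urllib.parse import urlencode

# Host, core, and handler are copied from the example URL in the text.
BASE_URL = "http://localhost:8983/solr/mbreleases/mb_releases"


def build_range_facet_url(base_url, query, field, start, end, gap, other="all"):
    """Return a range-facet request URL with every value percent-encoded.

    Hypothetical helper for illustration; it only builds a string and
    does not contact Solr.
    """
    params = [
        ("wt", "json"),
        ("omitHeader", "true"),
        ("rows", "0"),  # facets are computed regardless of rows
        ("q", query),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("facet.range.other", other),
    ]
    # urlencode percent-encodes reserved characters, so "+" becomes "%2B".
    return base_url + "?" + urlencode(params)


url = build_range_facet_url(
    BASE_URL,
    query="smashing",
    field="r_event_date_earliest",
    start="NOW/YEAR-10YEARS",
    end="NOW/YEAR",
    gap="+1YEAR",
)
print(url)
```

In the printed URL the gap appears as facet.range.gap=%2B1YEAR. The slashes in the date math are also percent-encoded (as %2F), which is harmless because the server decodes them again.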
Here is another example, this time using range faceting on a number: MusicBrainz track durations (in seconds). The URL is:
http://localhost:8983/solr/mbtracks/mb_tracks?wt=json&omitHeader=true&rows=0&facet.range.other=after&facet=true&q=Geek&facet.range.start=0&facet.range=t_duration&facet.range.end=240&facet.range.gap=60
This is the response:
{
  "response":{"numFound":552,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "t_duration":{
        "counts":[
          "0",128,
          "60",64,
          "120",111,
          "180",132],
        "gap":60,
        "start":0,
        "end":240,
        "after":117}}}}
Taking the first facet, we see that there are 128 tracks that are 0–59 seconds long, given the keyword search "Geek".
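The counts array in the response is a flat list that alternates between a range's lower bound and its count, so client code usually has to pair the entries up. The following Python sketch shows one way to do that for the t_duration response above. The helper name range_buckets is invented for this example, and the labels assume whole-second durations and the default boundary behavior, where each range includes its lower bound only.

```python
import json

# The t_duration response from the text, pasted in as a string.
RESPONSE = """
{"response":{"numFound":552,"start":0,"docs":[]},
 "facet_counts":{"facet_queries":{},"facet_fields":{},"facet_dates":{},
  "facet_ranges":{"t_duration":{
    "counts":["0",128,"60",64,"120",111,"180",132],
    "gap":60,"start":0,"end":240,"after":117}}}}
"""


def range_buckets(facet):
    """Turn the flat [label, count, label, count, ...] list into
    (lower, upper, count) tuples, using the gap to derive each upper bound.

    Hypothetical helper; it assumes a numeric field, so labels parse as int.
    """
    flat = facet["counts"]
    gap = facet["gap"]
    buckets = []
    for label, count in zip(flat[0::2], flat[1::2]):
        lower = int(label)
        buckets.append((lower, lower + gap, count))
    return buckets


facet = json.loads(RESPONSE)["facet_counts"]["facet_ranges"]["t_duration"]
buckets = range_buckets(facet)

for lower, upper, count in buckets:
    # upper - 1 is only a display convention for whole-second values.
    print(f"{lower}-{upper - 1} seconds: {count} tracks")
print(f"{facet['end']} seconds or more: {facet['after']} tracks")

total = sum(count for _, _, count in buckets) + facet["after"]
print(total)  # 552
```

Here the four range counts plus the after count add up to 552, the same as numFound, so in this particular result every matching track was counted exactly once.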
All of the range faceting parameters start with facet.range. As with most other faceting parameters, they can be made field specific by adding an f.fieldname. prefix, as in f.r_event_date_earliest.facet.range.start from the first example. The parameters are explained as follows:
facet.range: You must set this parameter to a field's name to range-facet on that field. The trie-based numeric and date field types (those starting with t, as in tlong and tdate) perform best, but others will work. Repeat this parameter for each field to be faceted on.
facet.range.start: This is mandatory. It is a number or date that specifies the start of the range to facet on. For dates, see the Date math section in Chapter 5, Searching. Using NOW with some Solr date math is quite effective, as in this example: NOW/YEAR-5YEARS, interpreted as five years ago, starting at the beginning of the year.
facet.range.end: This is mandatory. It is a number or date that specifies the end of the range. It has the same syntax as facet.range.start. Note that the actual end of the range may be different (see facet.range.hardend).
facet.range.gap: This is also mandatory. It specifies the interval by which to divide the range. For dates, it uses a subset of Solr's Date Math syntax, as it's a time duration and not a particular time. A date gap should always start with a +, for example, +1YEAR or +1MINUTE+30SECONDS. Note that after URL encoding, + becomes %2B.
facet.range.hardend: This parameter instructs Solr on what to do when facet.range.gap does not divide evenly into the facet range (start to end). If this is true, then the last range will be shortened, and the end value in the facet results will be the same as facet.range.end. Otherwise, by default, the end is effectively increased so that the ranges are all equal according to the gap value. The default value is false.
facet.range.other: This parameter adds more faceting counts depending on its value. It can be specified multiple times. See the example using this at the start of this section. It defaults to none. The possible values are:
  before: Count of documents before the faceted range
  after: Count of documents following the faceted range
  between: Count of documents within the faceted range
  none (disabled): The default
  all: Shortcut for all three (before, between, and after)
facet.range.include: This specifies which range boundaries are inclusive. The choices are lower, upper, edge, outer, and all (all being equivalent to all the others). This parameter can be set multiple times to combine choices and defaults to lower. Instead of defining each value, we will describe when a given boundary is inclusive:
  The lower boundary of a gap-based range is included if lower is specified. It is also included if it's the first gap range and edge is specified.
  The upper boundary of a gap-based range is included if upper is specified. It is also included if it's the last gap range and edge is specified.
  The upper boundary of the before range is included if the boundary is not already included by the first gap-based range. It's also included if outer is specified.
  The lower boundary of the after range is included if the boundary is not already included by the last gap-based range. It's also included if outer is specified.

Avoid double counting
The default facet.range.include of lower ensures that an indexed value occurring at a range boundary is counted in exactly one of the adjacent ranges. This is usually desirable, but your requirements may differ. To ensure you don't double count, don't choose both lower and upper together and don't choose outer.
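The hardend and include rules can be easier to follow when worked through on a handful of values. The following Python sketch is only an approximation written from the descriptions in this section. It is not Solr's implementation, the function names range_edges and count_ranges are invented, and it handles plain numbers only (no date math, and no before, after, or between counts).

```python
def range_edges(start, end, gap, hardend=False):
    """Return (lower, upper) pairs covering start..end in steps of gap.

    With hardend=False the last range keeps the full gap width, so it may
    extend past end; with hardend=True the last range is cut off at end.
    """
    edges = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # shorten the last range
        edges.append((lower, upper))
        lower = upper
    return edges


def count_ranges(values, edges, include=("lower",)):
    """Count how many values fall in each gap range for the given
    facet.range.include choices (lower, upper, edge, all).

    The outer choice only affects the before and after ranges, which
    this sketch does not compute.
    """
    inc = set(include)
    if "all" in inc:
        inc = {"lower", "upper", "edge", "outer"}
    last = len(edges) - 1
    counts = []
    for i, (lower, upper) in enumerate(edges):
        inc_lower = "lower" in inc or ("edge" in inc and i == 0)
        inc_upper = "upper" in inc or ("edge" in inc and i == last)
        n = 0
        for v in values:
            above = v > lower or (inc_lower and v == lower)
            below = v < upper or (inc_upper and v == upper)
            if above and below:
                n += 1
        counts.append(n)
    return counts


# A gap of 60 does not divide 0..250 evenly.
print(range_edges(0, 250, 60)[-1])                # (240, 300)
print(range_edges(0, 250, 60, hardend=True)[-1])  # (240, 250)

# Five made-up durations, two of them exactly on the 60 boundary.
values = [0, 60, 60, 119, 240]
edges = range_edges(0, 240, 60)
print(count_ranges(values, edges))                              # [1, 3, 0, 0]
print(count_ranges(values, edges, include=("lower", "upper")))  # [3, 3, 0, 1]
```

With the default of lower, each value is counted at most once, and the value 240 falls outside the gap ranges (it would belong to the after count). With lower and upper combined, the two values at 60 are counted in both adjacent ranges, so the counts add up to 7 for only 5 values.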