Faceting is Solr's killer feature. It's a must-have feature for most search implementations, especially those with structured data like in e-commerce. Yet there are few products that have this capability, especially in open source. Of course, search fundamentals, including highlighting, are critical too, but they tend to be taken for granted. Faceting enhances search results with aggregated information over all documents matching the search query. It can answer questions about the MusicBrainz data such as:
Faceting in the context of the user experience is often referred to as faceted navigation, but also faceted search, faceted browsing, guided navigation, or parametric search. The facets are typically displayed with clickable links that apply Solr filter queries to a subsequent search. Endeca's excellent UX Design Pattern Library contains many screenshots worth viewing. Visit http://www.oracle.com/webfolder/ux/applications/uxd/endeca/content/library/en/home.html and click on Faceted Navigation.
If we revisit the comparison of search technology to databases, then faceting is more or less analogous to SQL's GROUP BY feature on a column with count(*). However, in Solr, facet processing is performed subsequent to an existing search as part of a single request-response, with both the primary search results and the faceting results coming back together. In SQL you would need to perform a series of separate queries to get the same information. Furthermore, faceting works so fast that its search response time overhead is often negligible. For more information on why implementing faceting with relational databases is hard and doesn't scale, visit this old article at http://web.archive.org/web/20090321120327/http://www.kimbly.com/blog/000239.html.
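To make the SQL side of that analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The releases table and its four rows are invented purely for illustration and are not part of the MusicBrainz data. Note that the matching rows and the per-value counts come from two separate queries, whereas Solr returns both in one response.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
rows = [
    ("Release A", "Official"),
    ("Release B", "Official"),
    ("Release C", "Bootleg"),
    ("Release D", None),  # a release with no status value
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, status TEXT)")
conn.executemany("INSERT INTO releases VALUES (?, ?)", rows)

# Query 1: the "main search", returning the first two matching rows.
names = [r[0] for r in conn.execute(
    "SELECT name FROM releases ORDER BY name LIMIT 2")]

# Query 2: a separate GROUP BY query, the rough analogue of
# faceting on a single field.
counts = dict(conn.execute(
    "SELECT status, count(*) FROM releases "
    "WHERE status IS NOT NULL GROUP BY status"))

# Query 3: rows with no status at all, yet another round trip.
missing = conn.execute(
    "SELECT count(*) FROM releases WHERE status IS NULL").fetchone()[0]

print(names, counts, missing)
```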
Observe the following search results. The echoParams parameter is set to explicit (defined in solrconfig.xml) so that the search parameters are seen here. This example is using the default lucene query parser. The dismax query parser is more typical, but it has no bearing on these examples. The query parameter q is *:*, which matches all documents. In this case, the index only has releases, so there is no need to apply filters. Filter queries are used in conjunction with faceting a fair amount, so be sure you are familiar with them; see Chapter 5, Searching. To keep this example brief, we set rows to 2. Sometimes when using faceting, you only want the facet information and not the main search, so you would set rows to 0.
{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "facet": "true",
      "f.r_official.facet.method": "enum",
      "f.r_official.facet.missing": "true",
      "facet.field": "r_official",
      "fq": "type:Release",
      "fl": "r_name",
      "q": "*:*",
      "wt": "json",
      "rows": "2"}},
  "response": {"numFound": 603090, "start": 0, "docs": [
      {"r_name": "Texas International Pop Festival 11-30-69"},
      {"r_name": "40 Jahre"}]},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "r_official": [
        "Official", 519168,
        "Bootleg", 19559,
        "Promotion", 16562,
        "Pseudo-Release", 2819,
        null, 44982]},
    "facet_dates": {},
    "facet_ranges": {}}}
The facet-related search parameters appear in the params section at the top of the response. The facet.missing parameter was set using the field-specific syntax (f.r_official.facet.missing), which will be explained shortly.
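As a rough sketch, the request behind this response could be assembled as follows. The parameter names and values are copied from the echoed params section; the host, port, and core name in BASE_URL are placeholders assumed for illustration, and build_facet_url is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: host, port, and core name are placeholders, not from the text.
BASE_URL = "http://localhost:8983/solr/mbreleases/select"


def build_facet_url(field, rows=2):
    """Build a select URL that facets on one field (hypothetical helper)."""
    params = [
        ("q", "*:*"),                # match all documents
        ("fq", "type:Release"),      # filter query echoed in the response
        ("fl", "r_name"),            # only return the release name
        ("rows", str(rows)),         # use 0 to get facet counts only
        ("wt", "json"),
        ("facet", "true"),           # turn faceting on
        ("facet.field", field),      # the field to facet on
        # Field-specific syntax: f.<field>.<parameter>
        (f"f.{field}.facet.missing", "true"),
        (f"f.{field}.facet.method", "enum"),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_facet_url("r_official")
# Decode the query string again to check what was actually sent.
query = parse_qs(urlsplit(url).query)
print(url)
```

Passing the parameters as a list of pairs rather than a dict keeps the door open for repeating a key, such as a second facet.field, since urlencode emits one key=value pair per tuple.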
Notice that the facet results follow the main search result and are given the name facet_counts. In this example, we only faceted on one field, r_official, but you'll learn in a bit that you can facet on as many fields as you desire. Within "r_official" lie the facet counts for this field, as value and count pairs. The first value in a pair, such as "Official", holds a facet value, which is simply an indexed term, and the integer following it is the number of documents in the search results containing that term: the facet count. The last facet has a count but no corresponding name (its value is null). It is a special facet, returned because facet.missing is set, that indicates how many documents in the results don't have any indexed terms in this field.
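Because the counts arrive as one flat list that alternates values and counts, client code usually has to pair them up first. The following is a minimal sketch of that step, using a trimmed copy of the facet_counts section from the response shown earlier; facet_pairs is a hypothetical helper name.

```python
import json

# Trimmed copy of the facet_counts section from the response shown earlier.
response_text = """
{"facet_counts": {"facet_fields": {"r_official": [
  "Official", 519168, "Bootleg", 19559, "Promotion", 16562,
  "Pseudo-Release", 2819, null, 44982]}}}
"""


def facet_pairs(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


data = json.loads(response_text)
pairs = facet_pairs(data["facet_counts"]["facet_fields"]["r_official"])

# JSON null becomes None: the count of documents with no value in the field.
missing = dict(pairs).get(None, 0)
total = sum(count for _, count in pairs)

for value, count in pairs:
    print(value if value is not None else "(no value)", count)
```

In this particular response the five counts add up to 603090, the same as numFound, which is what you would expect if every release carries at most one r_official value. That is an observation about this data, not a general guarantee: for a multi-valued field the counts can sum to more than the number of matching documents.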