You've deployed Solr, and the world is beating a path to your door. The number of queries being issued has increased sharply, and meanwhile you've indexed ten times the amount of information you originally expected. You discover that Solr is taking longer to respond to queries and to index new content. When this happens, it's time to start looking at what configuration changes you can make to Solr to support more load. We'll look at a series of changes and optimizations that you can make, starting with the simplest changes that give the most bang for your buck and moving on to more complex changes that require thorough analysis of their impact on the system.
In this chapter, we will cover the following topics:
Tuning any complex system, whether it's a database, a message queuing system, or the deep dark internals of an operating system, is something of a black art. Researchers and vendors have spent decades figuring out how to measure the performance of systems and coming up with approaches for maximizing the performance of those systems. For some systems that have been around for decades, such as databases, you can just search online for tuning tips for X database and find explicit rules that suggest what you need to do to gain performance. However, even with those well-researched systems, it still can be a matter of trial and error.
In order to measure the impact of your changes, you should look at a small set of metrics and optimize for them. Among these are the avgTimePerRequest and avgRequestsPerSecond parameters of your request handlers.
In order to get a sense of what the Steady State for your application is, you can gather the statistics by using the SolrMeter load testing tool to put your Solr deployment under load. We'll discuss in the next section how to build a load testing script with SolrMeter that accurately mirrors your real-world interactions with Solr. This effort will give you a tool that can be run repeatedly and allows more of an apples-to-apples comparison of the impact of the changes to your configuration.
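To make the comparison between two load test runs concrete, here is a minimal Python sketch. It assumes the two statistics are derived in the conventional way (total handling time divided by request count, and request count divided by elapsed seconds). The function names and the sample counters are hypothetical and are not taken from Solr or SolrMeter.

```python
def handler_stats(num_requests, total_time_ms, uptime_s):
    """Derive the two throughput figures from raw handler counters.

    Assumes avgTimePerRequest = total time / requests (in ms) and
    avgRequestsPerSecond = requests / elapsed seconds.
    """
    if num_requests == 0 or uptime_s <= 0:
        return {"avgTimePerRequest": 0.0, "avgRequestsPerSecond": 0.0}
    return {
        "avgTimePerRequest": total_time_ms / num_requests,
        "avgRequestsPerSecond": num_requests / uptime_s,
    }


def compare(baseline, candidate):
    """Percentage change of each metric from the baseline run to the candidate run."""
    return {
        name: 100.0 * (candidate[name] - baseline[name]) / baseline[name]
        for name in baseline
        if baseline[name] != 0  # skip metrics with no usable baseline
    }


# Hypothetical counters from two ten-minute load test runs.
baseline = handler_stats(num_requests=12000, total_time_ms=540000, uptime_s=600)
candidate = handler_stats(num_requests=15000, total_time_ms=450000, uptime_s=600)
change = compare(baseline, candidate)

for name, pct in change.items():
    print(f"{name}: {pct:+.1f}%")
```

A negative change in avgTimePerRequest together with a positive change in avgRequestsPerSecond suggests that the configuration change helped, provided both runs used the same load testing script.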
Solr's architecture has benefited from its heritage as the search engine developed in-house from 2004 to 2006 to power CNET.com, a site that, at the time of writing, is ranked 86th for traffic by Alexa.com. Solr, out-of-the-box, is already very performant, with extensive effort spent by the community to ensure that there are minimal bottlenecks. Additional tuning will trade off increases in search performance against disk index size, indexing speed, and/or memory requirements (and vice versa). The approaches are as follows:
If your avgTimePerRequest is acceptable, but you have too many incoming requests, then replicate your complete index across multiple Solr nodes. If your queries take too long to complete due to the complexity or size of the index, then use sharding to share the load of processing a single query across multiple sharded Solr servers.
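The choice between replication and sharding described above can be sketched as a simple decision rule. The following Python sketch is illustrative only: the thresholds (max_time_ms, node_capacity_rps), the function names, and the host names are hypothetical, and real capacity figures should come from your own load tests. The last helper builds a distributed query using Solr's shards request parameter.

```python
import math
from urllib.parse import urlencode


def suggest_scaling(avg_time_ms, incoming_rps, max_time_ms, node_capacity_rps):
    """Pick a scaling approach from the two request handler statistics."""
    if avg_time_ms > max_time_ms:
        # A single query is too slow: split the index so each server
        # searches only part of it.
        return "shard"
    if incoming_rps > node_capacity_rps:
        # Queries are fast enough, but there are too many of them:
        # copy the complete index to more nodes.
        return "replicate"
    return "single"


def replicas_needed(incoming_rps, node_capacity_rps):
    """Number of full index copies needed to absorb the incoming load."""
    return max(1, math.ceil(incoming_rps / node_capacity_rps))


def sharded_query_url(base_url, query, shards):
    """Build a select URL that fans one query out over several shards."""
    params = urlencode({"q": query, "shards": ",".join(shards)})
    return f"{base_url}/select?{params}"


# Hypothetical figures: 40 ms per request is fine, but 450 requests per
# second exceeds what one node (assumed 100 requests per second) can serve.
approach = suggest_scaling(avg_time_ms=40, incoming_rps=450,
                           max_time_ms=200, node_capacity_rps=100)
copies = replicas_needed(incoming_rps=450, node_capacity_rps=100)

# Hypothetical host names for a two-shard deployment.
url = sharded_query_url("http://host1:8983/solr", "solr",
                        ["host1:8983/solr", "host2:8983/solr"])
print(approach, copies)
print(url)
```

When both conditions hold (slow queries and too many of them), the rule above reports sharding first, since replicating an index whose queries are already too slow does not shorten any single query.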