Data sent to Solr is not immediately searchable, nor do deletions take immediate effect. Like a database, changes must be committed. There are two types of commits:
- A hard commit, triggered by the <autoCommit> option in solrconfig.xml or by adding the commit=true request parameter to a Solr update URL.
- A soft commit, triggered by the <autoSoftCommit> option in solrconfig.xml, by using the softCommit=true option along with the commit parameter, or by using the commitWithin parameter.

The request to Solr could be the same request that contains data to be indexed and then committed, or an empty request; it doesn't matter. For example, you can visit this URL to issue a commit on our mbreleases
core: http://localhost:8983/solr/mbreleases/update?commit=true
. You can also commit changes using the XML syntax by simply sending this to Solr:
<commit />
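As a client-side sketch, a commit is just a parameter on the update URL. The helper below only constructs the URL (the host and mbreleases core come from the example above; no request is actually sent):

```python
from urllib.parse import urlencode

# Update endpoint for the mbreleases core used in the examples above.
SOLR_UPDATE = "http://localhost:8983/solr/mbreleases/update"

def commit_url(soft=False):
    """Build a commit URL; adding softCommit=true makes it a soft commit."""
    params = {"commit": "true"}
    if soft:
        params["softCommit"] = "true"
    return SOLR_UPDATE + "?" + urlencode(params)

print(commit_url())           # hard commit
print(commit_url(soft=True))  # soft commit
```

An HTTP GET or POST to either URL would then trigger the corresponding commit.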
There are three important things to know about commits that are unique to Solr, each discussed in turn below.
When you are bulk-loading data, these concerns are not an issue, since you're going to issue a final commit at the end. But if Solr is asynchronously updated by independent clients in response to changed data, commits could come too quickly and might overlap. To address this, Solr has two similar features: autoCommit and commitWithin. The first refers to a snippet of XML configuration that is commented out in solrconfig.xml, in which Solr automatically commits at a document-count threshold or a time-lapse threshold (measured from the oldest uncommitted document). In this case, Solr itself handles committing, so your application needn't send commits. commitWithin is a similar time-lapse option set by the client, either on the <add commitWithin="…"> element or the <commit commitWithin="…"/> element of an XML-formatted update message, or as a request parameter of the same name. It ensures a commit occurs within the specified number of milliseconds. Here's an example of a 30-second commit window:
<commit commitWithin="30000"/>
Since Solr 4.0, commitWithin performs a soft commit, which means the changes are not replicated to the slaves in a master/slave configuration. However, this default behavior can be overridden in solrconfig.xml by enabling the forceHardCommit option, which makes commitWithin perform hard commits.
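To illustrate the client side of this, here is a minimal sketch that builds an <add> message carrying the commitWithin attribute. The document and its id field are purely illustrative, not from the text:

```python
import xml.etree.ElementTree as ET

# Build <add commitWithin="30000"> so Solr commits this update
# within 30 seconds of receiving it; the id field is hypothetical.
add = ET.Element("add", commitWithin="30000")
doc = ET.SubElement(add, "doc")
field = ET.SubElement(doc, "field", name="id")
field.text = "1"

message = ET.tostring(add, encoding="unicode")
print(message)
```

The resulting XML would be POSTed to the core's /update handler like any other update message.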
During indexing, you may start to see this error message:
<h2>HTTP ERROR: 503</h2><pre>Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.</pre>
Every time a commit happens, a new searcher is created, which invokes the searcher warm-up process that populates the caches, and that can take a while. While you can bump up the maxWarmingSearchers parameter in solrconfig.xml, you shouldn't: you could still hit the new limit, and, worse, memory requirements can soar and the system will slow down while multiple searchers are warming. So, you need to ensure that commits don't happen concurrently, or, if you must, that no more than two do. If you see this problem, use autoCommit or the commitWithin parameter when issuing commits. In both cases, choose a time window that is long enough for a commit to finish.
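For reference, the commented-out autoCommit snippet in solrconfig.xml looks roughly like this when enabled; the thresholds shown are illustrative values, not recommendations:

```xml
<autoCommit>
  <!-- commit after 10,000 docs or 60 seconds, whichever comes first -->
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime>
</autoCommit>
```

Either threshold is optional; Solr commits as soon as one of them is crossed.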
Lucene's index is internally composed of one or more segments. When a buffer of indexed documents gets flushed to the disk, it creates a new segment. Deletes get recorded in another file, but they go to disk too. Sometimes, after a new segment is written, Lucene will merge some of them together. When Lucene has just one segment, it is in an
optimized state. The more segments there are, the more query performance will degrade. Of course, optimizing an index comes at a cost; the larger your index is, the longer it will take to optimize. Finally, an optimize
command implies commit semantics. You may specify an optimize
command in all the places you specify a commit. So, to use it in a URL, try this: http://localhost:8983/solr/mbreleases/update?optimize=true
. For the XML format, simply send this:
<optimize />
We recommend explicitly optimizing the index at an opportune time, such as after a bulk load of data, or at a daily interval during off-peak hours if there are low-volume, sporadic updates to the index. Chapter 10, Scaling Solr, has a tip on optimizing to more than one segment if the optimizes are taking too long.
Both commit and optimize commands take two additional Boolean options that default to true:
<optimize waitFlush="true" waitSearcher="true"/>
If you were to set these to false, the commit and optimize commands would return immediately, even though the operation hasn't actually finished yet. So, if you write a script that commits with these options set to false and then immediately queries Solr, you might find that the search does not reflect the changes yet. Waiting for the data to flush to disk (waitFlush) and for a new searcher to be ready to respond to changes (waitSearcher) avoids this. These options are useful when executing an optimize command from a script that simply wants to optimize the index and doesn't otherwise care when newly added data becomes searchable.
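As a sketch, such a fire-and-forget optimize can be expressed as request parameters on the update URL; this assumes the waitSearcher attribute is also honored as a request parameter, just as commit=true is (verify against your Solr version). The code only builds the URL:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mbreleases/update"

# Fire-and-forget optimize: return as soon as the command is accepted
# rather than blocking until the new searcher has warmed.
params = urlencode({"optimize": "true", "waitSearcher": "false"})
url = f"{base}?{params}"
print(url)
```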
There is one final indexing command to discuss: rollback. All uncommitted changes can be canceled by sending Solr the rollback command, either via a URL parameter, as in http://localhost:8983/solr/mbreleases/update?rollback=true, or with the following XML code:
<rollback />
When transaction logs (tlogs) are enabled via the updateLog feature in solrconfig.xml, Solr writes the raw documents into the tlog files for recovery purposes. Transaction logs are used for near real-time (NRT) get, durability, and SolrCloud replication recovery.
To enable tlogs, simply add the following code to your updateHandler
configuration:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
Here, dir represents the target directory for transaction logs; it defaults to the Solr data directory.
If you don't need the NRT get feature and you are not using SolrCloud, you can safely comment out the updateLog section in solrconfig.xml. For more details about NRT get, see https://cwiki.apache.org/confluence/display/solr/RealTime+Get.
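As a sketch of how a client would use RealTime Get, the request goes to the core's /get handler with an id parameter (both standard in Solr; the id value below is illustrative). The code only builds the URL:

```python
from urllib.parse import urlencode

# RealTime Get fetches the latest version of a document, consulting
# the transaction log for updates that have not yet been committed.
base = "http://localhost:8983/solr/mbreleases/get"
url = base + "?" + urlencode({"id": "1"})
print(url)
```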