We will get started by downloading Solr, examining its directory structure, and then finally running it.
This will set you up for the next section, which tours a running Solr 5 server.
Typing java -version at a command line will tell you exactly which version of Java you are using, if any.
Java is available on all major platforms, including Windows, Solaris, Linux, and Mac OS X. Visit http://www.java.com to download the distribution for your platform. Java always comes with the Java Runtime Environment (JRE), and that's all Solr requires. The Java Development Kit (JDK) includes the JRE plus the Java compiler and various diagnostic utility programs. One such useful program is JConsole, which we'll discuss in Chapter 11, Deployment, and Chapter 10, Scaling Solr, so the JDK distribution is recommended.
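As a minimal sketch of the check described above, the following script reports whether Java is on the PATH before printing its version. The helper name have_cmd is made up for illustration; the only assumption about Java itself is that java -version writes its output to standard error, which is why the stream is redirected.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java -version prints to stderr, so merge it into stdout
    # and keep only the first line, which carries the version.
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH"
fi
```

If the script reports that Java is missing, install a JRE or JDK for your platform before continuing.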
When you unzip Solr after downloading it, you should find a relatively straightforward directory structure (differences between Solr 4 and 5 are briefly explained here):
contrib: The Solr contrib modules are extensions to Solr:
    analysis-extras: This directory includes a few text analysis components that have large dependencies. There are some International Components for Unicode (ICU) classes for multilingual support, a Chinese stemmer, and a Polish stemmer. You'll learn more about text analysis in the next chapter.
    clustering: This directory has an engine for clustering search results. There is a one-page overview in Chapter 8, Search Components.
    dataimporthandler: The DataImportHandler (DIH) is a very popular contrib module that imports data into Solr from a database and some other sources. See Chapter 4, Indexing Data.
    extraction: Integration with Apache Tika, a framework for extracting text from common file formats. This module is also called SolrCell, and Tika is also used by the DIH's TikaEntityProcessor. Both are discussed in Chapter 4, Indexing Data.
    langid: This directory contains a contrib module that provides the ability to detect the language of a document before it's indexed. More information can be found on Solr's Language Detection wiki page at http://wiki.apache.org/solr/LanguageDetection.
    map-reduce: This directory has utilities for working with Solr from Hadoop Map-Reduce. This is discussed in Chapter 9, Integrating Solr.
    morphlines-core: This directory contains Kite Morphlines, a document ingestion framework that has support for Solr. The morphlines-cell directory has components related to text extraction. Morphlines is mentioned in Chapter 9, Integrating Solr.
    uima: This directory contains a library for integration with Apache UIMA, a framework for extracting metadata out of text. There are modules that identify proper names in text and identify the language, for example. To learn more, see Solr's UIMA integration wiki at http://wiki.apache.org/solr/SolrUIMA.
    velocity: This directory has a simple search UI framework based on the Velocity templating language. See Chapter 9, Integrating Solr.
dist: In this directory, you will see Solr's core and contrib JAR files. In previous Solr versions, the WAR file was found here as well. The core JAR file is what you would use if you're embedding Solr within an application. The Solr test framework JAR and /test-framework directory contain the libraries needed in testing Solr extensions. The SolrJ JAR and /solrj-lib are what you need to build Java-based clients for Solr.
docs: This directory contains documentation, including related assets for the public Solr website, a quick tutorial, and of course the Javadocs for Solr's API.
example: Before Solr 5, this was the complete Solr server, meant to be an example layout for deployment. It included the Jetty servlet engine (a Java web server), Solr, some sample data, and sample Solr configurations. With the introduction of Solr 5, only the example-DIH and exampledocs directories are kept; the rest was moved to a new server directory.
    example/example-DIH: These are DataImportHandler configuration files for the example Solr setup. If you plan on importing with DIH, some of these files may serve as good starting points.
    example/exampledocs: These are sample documents to be indexed into the default Solr configuration, along with the post.jar program for sending the documents to Solr.
server: The files required to run Solr as a server process are located here. The interesting child directories are as follows:
    server/contexts: This is Jetty's WebApp configuration for the Solr setup.
    server/etc: This is Jetty's configuration. Among other things, here you can change the web port used from the presupplied 8983 to 80 (HTTP default).
    server/logs: Logs are output here by default. Solr 5 introduced the collection of JVM metrics, which are output to solr_gc.log. They are a good source of information when you are trying to size your Solr setup.
    server/resources: The configuration file for Log4j lives here. Edit it to change the behavior of the Solr logging (though you can also change the logging levels at runtime through the Admin console).
    server/solr: The configuration files for running Solr are stored here. The solr.xml file, which provides the overall configuration of Solr, lives here, as well as zoo.cfg, which is required by SolrCloud. The subdirectory /configsets stores example configurations that ship with Solr.
    server/webapps: This is where Jetty expects to deploy Solr from. A copy of Solr's WAR file is here, which contains Solr's compiled code and all the dependent JAR files needed to run it.
    server/solr-webapp: This is where Jetty deploys the unpacked WAR file.

Solr ships with a number of example collection configurations. We're going to run one called techproducts. This example will create a collection and insert some sample data.
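Before running the example, it can be worth confirming that the unpacked distribution has the top-level directories described in the list above, plus the bin directory that holds the startup scripts. The following is a minimal sketch, not part of Solr itself: the function name missing_solr_dirs and the install_dir argument are hypothetical, and the directory names are simply those listed in this section.

```shell
#!/bin/sh
# Hypothetical helper: print each expected top-level directory that is
# missing from an unpacked Solr 5 distribution. Prints nothing and
# returns 0 when every directory is present; returns 1 otherwise.
missing_solr_dirs() {
    install_dir=$1
    status=0
    for dir in bin contrib dist docs example server; do
        if [ ! -d "$install_dir/$dir" ]; then
            echo "$dir"
            status=1
        fi
    done
    return $status
}
```

You would call it with the directory you unzipped Solr into, for example missing_solr_dirs . when run from inside that directory.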
The addition of scripts for running Solr is one of the best enhancements in Solr 5. Previously, to start Solr, you directly invoked Java via java -jar start.jar. Deploying to production meant figuring out how to migrate into an existing Servlet environment, and was the source of much frustration.
First, go to the bin directory, and then run the main Solr command. On Windows, it will be solr.cmd; on *nix systems, it will be just solr. Type the following commands:
>>cd bin
>>./solr start -e techproducts
The >> notation is the command prompt and is not part of the command. You'll see a few lines of output as Solr is started, and then the techproducts collection is created via an API call. Then the sample data is loaded into Solr. When it's done, you'll be directed to the Solr admin at http://localhost:8983/solr.
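One way to confirm that the sample data was loaded is to send a match-all query to the new collection over HTTP. The sketch below only builds the URL; the function name solr_match_all_url is made up for illustration, and it assumes the collection is served under /solr/techproducts with Solr's standard /select request handler, with q=*:* matching every document and wt=json requesting a JSON response.

```shell
#!/bin/sh
# Hypothetical helper: build the URL for a match-all query.
# Arguments: host, port, collection name.
solr_match_all_url() {
    printf 'http://%s:%s/solr/%s/select?q=*:*&wt=json\n' "$1" "$2" "$3"
}

# Print the URL for the techproducts example on the default port.
solr_match_all_url localhost 8983 techproducts
```

With Solr running, and assuming curl is installed, you could fetch the results with curl "$(solr_match_all_url localhost 8983 techproducts)" or simply paste the printed URL into a browser.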
To stop Solr, use the same Solr command script:
>>./solr stop