There has been a lot of churn in the Ruby on Rails world for adding Solr support, with a number of competing libraries attempting to support Solr in the most Rails-native way. Rails brought to the forefront the idea of
Convention over Configuration, the principle that sane defaults and simple rules should suffice in most situations versus complex configuration expressed in long XML files. The various libraries for integrating Solr in Ruby on Rails applications establish conventions in how they interact with Solr. However, there are often a lot of conventions to learn, such as suffixing String object field names with _s
to match up with the dynamic field definition for String in Solr's schema.xml
.
Solr's Ruby response format looks very similar to JSON, with a few tweaks to fit Ruby: nulls become nils, content is quoted with single quotes, and the Ruby => operator separates keys from values in maps. Adding a wt=ruby parameter to a standard search request returns results that can be passed to eval() to produce a Ruby hash structure like this:
{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>1,
    'params'=>{
      'wt'=>'ruby',
      'indent'=>'on',
      'rows'=>'1',
      'start'=>'0',
      'q'=>'Pete Moutso'}},
  'response'=>{'numFound'=>523,'start'=>0,'docs'=>[
    {
      'a_name'=>'Pete Moutso',
      'a_type'=>'1',
      'id'=>'Artist:371203',
      'type'=>'Artist'}]
  }}
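As a minimal sketch of what evaluating such a response looks like, the following takes an abbreviated copy of the response above as a string and turns it into a Ruby hash. The constant RAW_RESPONSE and the helper parse_ruby_response are names made up for this illustration; in a real application the string would come from an HTTP request to Solr.

```ruby
# An abbreviated wt=ruby response body, copied from the example above.
RAW_RESPONSE = <<~RUBY
  {
    'responseHeader'=>{ 'status'=>0, 'QTime'=>1 },
    'response'=>{ 'numFound'=>523, 'start'=>0, 'docs'=>[
      { 'a_name'=>'Pete Moutso', 'a_type'=>'1', 'id'=>'Artist:371203', 'type'=>'Artist' }
    ]}
  }
RUBY

# Evaluate the response text into a Ruby Hash (hypothetical helper).
# eval executes whatever Ruby code the string contains, so only use it
# on responses from a Solr server you trust.
def parse_ruby_response(body)
  eval(body)
end

result = parse_ruby_response(RAW_RESPONSE)
puts result['response']['numFound']              # 523
puts result['response']['docs'].first['a_name']  # Pete Moutso
```

Because eval runs arbitrary code, many applications prefer wt=json with a JSON parser when the Solr server is not fully under their control.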
The sunspot_rails
gem hooks into the lifecycle of the ActiveRecord model objects and transparently indexes them in Solr as they are created, updated, and deleted. This allows you to do queries that are backed by Solr searches, but still work with your normal ActiveRecord objects. Let's go ahead and build a small Rails application that we'll call myFaves
, which allows you to store your favorite MusicBrainz artists in a relational model and also to search for them using Solr.
Sunspot comes bundled with a full install of Solr as part of the gem, which you can easily start by running rake sunspot:solr:start
, running Solr on port 8982. This is great for quickly doing development since you don't need to download and set up your own Solr. Typically, you are starting with a relational database already stuffed with content that you want to make searchable. However, in our case, we already have a fully populated index of artist information, so we are actually going to take the basic artist information out of the mbartists index of Solr and populate our local myfaves
database used by the Rails application. We'll then fire up the version of Solr shipped with Sunspot, and see how sunspot_rails
manages the lifecycle of ActiveRecord objects to keep Solr's indexed content in sync with the content stored in the relational database. Don't worry, we'll take it step by step! The completed application is at /examples/9/myfaves
for your reference.
This example assumes you have Rails 3.x already installed. We'll start with the standard plumbing to get a Rails application set up with our basic data model:
>>rails new myfaves
>>cd myfaves
>>rails generate scaffold artist name:string group_type:string release_date:datetime image_url:string
>>rake db:migrate
This generates a basic application backed by a SQLite database. Now, we need to specify that our application depends on Sunspot. Edit Gemfile
and add the following code:
gem 'sunspot_rails', '~> 1.2.1'
Next, update your dependencies and generate the config/sunspot.yml
configuration file:
>>bundle install
>>rails generate sunspot_rails:install
We'll also be working with roughly 399,000 artists, so we'll need pagination to manage that list; otherwise, pulling up the artists' /index listing page will time out. We'll use the popular will_paginate
gem to manage pagination. Add the will_paginate
gem declaration to your Gemfile
and re-run bundle install
:
gem "will_paginate", "~> 3.0.pre4"
Edit the ./app/controllers/artists_controller.rb
file, and replace the call to @artists = Artist.all
in the index
method with:
@artists = Artist.paginate :page => params[:page], :order => 'created_at DESC'
Also, add a call to the view helper at ./app/views/artists/index.html.erb
to generate the page links:
<%= will_paginate @artists %>
Start the application using ./script/rails server, and visit the page http://localhost:3000/artists/. You should see an empty listing page for all of the artists. Now that we know that the basics are working, let's go ahead and actually leverage Solr.
Step one will be to import data into our relational database from the mbartists
Solr index. Add the following code to ./app/models/artist.rb
:
class Artist < ActiveRecord::Base
  searchable do
    text :name, :default_boost => 2
    string :group_type
    time :release_date
  end
end
The searchable
block maps the attributes of the Artist ActiveRecord
object to the artist fields in Solr's schema.xml
. Since Sunspot is designed to store any kind of data in Solr that is stored in your database, it needs a way of distinguishing among various types of data model objects. For example, if we wanted to store information about our User model object in Solr, in addition to the Artist object, then we would need to provide a field in the schema to distinguish the Solr document for the artist with the primary key of 5
from the Solr document for the user, which also has the primary key of 5
. Fortunately, the mbartists
schema has a field named type
that stores the value Artist
, which maps directly to our ActiveRecord
class name of Artist
.
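To make the collision concrete, here is a small sketch that uses plain Ruby hashes as stand-ins for Solr documents. The sample documents and the filter_by_type helper are invented for illustration; the filtering mimics what a filter query on the type field would do on the Solr side.

```ruby
# Two in-memory stand-ins for Solr documents that share primary key 5.
DOCS = [
  { 'id' => 'Artist:5', 'type' => 'Artist', 'name' => 'Example Artist' },
  { 'id' => 'User:5',   'type' => 'User',   'name' => 'Example User' }
].freeze

# Keep only documents of one model type, much as fq=type:Artist would.
def filter_by_type(docs, type)
  docs.select { |doc| doc['type'] == type }
end

artists = filter_by_type(DOCS, 'Artist')
puts artists.map { |doc| doc['id'] }.inspect  # ["Artist:5"]
```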
There is a simple script called populate.rb
at the root of /examples/9/myfaves
that you can run, which will copy the artist data from the existing Solr mbartists
index into the myFaves database:
>>./populate.rb
The populate.rb script is a great example of the types of scripts you may need to develop to transfer data in and out of Solr. Most such scripts work with batches of records that are pulled from one system and then inserted into another. Larger batches typically make pulling and processing the data more efficient, but they consume more memory and slow down the commit and optimize operations. When you run the populate.rb
script, play with the batch size parameter to get a sense of resource consumption in your environment. Try a batch size of 10
versus 10000
to see the changes. The parameters for populate.rb
are available at the top of the script:
MBARTISTS_SOLR_URL = 'http://localhost:8983/solr/mbartists'
BATCH_SIZE = 1500
MAX_RECORDS = 100000
There are roughly 399,000 artists in the mbartists
index, so if you are impatient, then you can set MAX_RECORDS
to a more reasonable number to complete running the script faster.
The connection to Solr is handled by the RSolr
library. A request to Solr is simply a hash of parameters that is passed as part of the GET request. We use the *:*
query to find all of the artists in the index and then iterate through the results using the start
parameter:
rsolr = RSolr.connect :url => MBARTISTS_SOLR_URL
response = rsolr.select({
  :q => '*:*',
  :rows=> BATCH_SIZE,
  :start => offset,
  :fl => ['*','score']
})
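The batching loop around that request might look roughly like the following sketch. It is not the actual populate.rb code: fake_select is a hypothetical stand-in for rsolr.select that serves pages from a ten-document in-memory array, so the loop can be run without a Solr server.

```ruby
# Hypothetical stand-in for a Solr index holding ten documents.
FAKE_INDEX = (1..10).map { |n| { 'id' => "Artist:#{n}" } }.freeze

# Hypothetical stand-in for rsolr.select: returns one page of FAKE_INDEX
# in the same response shape that Solr uses.
def fake_select(params)
  docs = FAKE_INDEX[params[:start], params[:rows]] || []
  { 'response' => { 'numFound' => FAKE_INDEX.size, 'docs' => docs } }
end

# Walk the index in batches, advancing :start by the number of documents
# received, and stop at max_records or when a page comes back empty.
def each_batch(batch_size, max_records)
  offset = 0
  while offset < max_records
    rows = [batch_size, max_records - offset].min
    response = fake_select(:q => '*:*', :rows => rows, :start => offset)
    docs = response['response']['docs']
    break if docs.empty?
    yield docs
    offset += docs.size
  end
end

each_batch(4, 9) { |docs| puts docs.map { |d| d['id'] }.join(', ') }
```

Swapping fake_select for a real rsolr.select call keeps the loop unchanged, which makes it easy to experiment with different batch sizes.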
In order to create our new Artist model objects, we just iterate through the results of response['response']['docs']
, parsing each document in order to preserve our unique identifiers between Solr and the database and creating new ActiveRecord
objects. In our MusicBrainz Solr schema, the ID field functions as the primary key and looks like Artist:11650
for The Smashing Pumpkins. In the database, in order to sync the two, we need to insert the Artist with the ID of 11650
. We wrap the insert statement a.save!
in a begin/rescue/end
structure so that if we've already inserted an artist with that primary key, the script continues. This allows us to run the populate script multiple times without erroring out:
response['response']['docs'].each do |doc|
  id = doc["id"]
  id = id[7..(id.length)]  # strip the leading "Artist:" prefix
  a = Artist.new(
    :id => id,
    :name => doc["a_name"],
    :group_type => doc["a_type"],
    :release_date => doc["a_release_date_latest"])
  begin
    a.save!
  rescue ActiveRecord::StatementInvalid => err
    raise err unless err.to_s.include?("PRIMARY KEY must be unique") # sink duplicates
  end
end
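The script slices off the first seven characters because every ID starts with "Artist:". If you want something less tied to that prefix length, one alternative is to split on the colon. The parse_solr_id helper below is a hypothetical sketch, not part of populate.rb:

```ruby
# Split a Solr ID such as "Artist:11650" into its type and numeric key.
# Integer() raises ArgumentError if the key part is not a valid number.
def parse_solr_id(solr_id)
  type, key = solr_id.split(':', 2)
  [type, Integer(key)]
end

type, key = parse_solr_id('Artist:11650')
puts type  # Artist
puts key   # 11650
```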
We've successfully migrated the data we need for our myFaves application out of Solr and we're ready to use the version of Solr that's bundled with Sunspot.
Solr configuration information is listed in ./myfaves/config/sunspot.yml
. Sunspot establishes the convention that development is on port 8982, unit tests that use Solr connect on port 8981, and then production connects on the traditional 8983 port:
development:
  solr:
    hostname: localhost
    port: 8982
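To see the port convention side by side, here is a sketch that parses a configuration shaped like sunspot.yml with Ruby's standard YAML library. Only the development entry is taken from the file shown above; the test and production entries are assumed to follow the same layout with the ports described in the text, and solr_port is a helper invented for this example.

```ruby
require 'yaml'

# Development entry as shown above; the test and production entries are
# assumptions that mirror its layout with the ports described in the text.
SUNSPOT_YML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
  test:
    solr:
      hostname: localhost
      port: 8981
  production:
    solr:
      hostname: localhost
      port: 8983
YAML

# Look up the Solr port for a given Rails environment name.
def solr_port(environment)
  YAML.safe_load(SUNSPOT_YML).fetch(environment).fetch('solr').fetch('port')
end

%w[development test production].each do |env|
  puts "#{env}: #{solr_port(env)}"
end
```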
Start the included Solr by running rake sunspot:solr:start
. To shut down Solr, run the corresponding rake
command, sunspot:solr:stop
. On the initial startup, rake
will create a new top level ./solr
directory and populate the conf
directory with default configuration files for Solr (including schema.xml
, stopwords.txt
, and so on) pulled from the Sunspot gem.
Now, we are ready to trigger a full index of the data from the relational database into Solr. sunspot
provides a very convenient rake task for this with a variety of parameters that you can learn about by running rake -D sunspot:reindex
:
>>rake sunspot:solr:start
>>rake sunspot:reindex
Browse to http://localhost:8982/solr/admin/schema.jsp
to see the list of dynamic fields generated by following the
Convention over Configuration pattern of Rails applied to Solr. Some of the conventions that are established by Sunspot and expressed by Solr in ./solr/conf/schema.xml
are as follows:
The primary key field for a model object in Solr is always called id.
The field that stores the class name of the model object, used to tell model types apart, is called type.
Sunspot makes heavy use of Solr's dynamic fields. When sunspot_rails indexes a model object, it sends a document to Solr with the various suffixes to leverage the dynamic column creation. In ./solr/conf/schema.xml, the only fields defined outside of the management fields are dynamic fields:
<dynamicField name="*_text" type="text" indexed="true" stored="false"/>
The default search field is called text. However, you need to define what fields are copied into the text field. Sunspot's DSL is oriented towards naming each model field you'd like to search from Ruby.
The document that gets sent to Solr for our Artist records creates dynamic fields such as name_text, group_type_s, and release_date_d, for a text, string, and date field, respectively. You can see the list of dynamic fields generated through the schema browser at http://localhost:8982/solr/admin/schema.jsp.
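The suffix convention can be sketched as a simple lookup. The table below covers only the three suffixes mentioned above; the constant and helper names are made up for illustration, and Sunspot itself defines suffixes for more types than these.

```ruby
# Suffixes for the three field types used by our Artist model, as
# described in the text. This is an illustrative subset only.
FIELD_SUFFIXES = { :text => '_text', :string => '_s', :time => '_d' }.freeze

# Build the dynamic field name for a model attribute of the given type.
def dynamic_field_name(attribute, type)
  "#{attribute}#{FIELD_SUFFIXES.fetch(type)}"
end

puts dynamic_field_name(:name, :text)          # name_text
puts dynamic_field_name(:group_type, :string)  # group_type_s
puts dynamic_field_name(:release_date, :time)  # release_date_d
```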
Now we are ready to perform some searches. Sunspot adds some new methods to our ActiveRecord
model objects such as search()
that lets us load ActiveRecord
model objects by sending a query to Solr. Here we find the group Smashing Atoms by searching for matches to the word smashing
:
% ./script/rails console
Loading development environment (Rails 3.0.9)
>>search= Artist.search{keywords "smashing"}
=><Sunspot::Search:{:fq=>["type:Artist"], :q=>"smashing", :fl=>"* score", :qf=>"name_text^2", :defType=>"dismax", :start=>0, :rows=>30}>
>>search.results.first
=>[#<Artist id: 93855, name: "Smashing Atoms", group_type: nil, release_date: nil, image_url: nil, created_at: "2011-07-21 05:15:21", updated_at: "2011-07-21 05:15:21">]
The raw results from Solr are available through search.hits, while search.results returns the corresponding ActiveRecord objects loaded from the database.
Let's also verify that Sunspot is managing the full lifecycle of our objects. Assuming Susan Boyle isn't yet entered as an artist, let's go ahead and create her:
>>Artist.search{keywords 'Susan Boyle', :fields => [:name]}.hits
=>[]
>>susan = Artist.create(:name => "Susan Boyle", :group_type =>'1', :release_date => Date.new)
=> #<Artist id: 548200, name: "Susan Boyle", group_type: 1, release_date: "-4712-01-01 05:00:00", created_at: "2011-07-22 21:05:53", updated_at: "2011-07-22 21:05:53">
Check the log output from the Solr instance running on port 8982, and you should see an update request triggered by the insert of the new Susan Boyle record:
INFO: [] webapp=/solr path=/update params={} status=0 QTime=24
Now, delete Susan's record from your database:
>>susan.destroy
=> #<Artist id: 548200, name: "Susan Boyle", group_type: 1, release_date: "-4712-01-01 05:00:00", created_at: "2009-04-21 13:11:09", updated_at: "2009-04-21 13:11:09">
As a result, there should be another corresponding update issued to Solr to remove the document:
INFO: [] webapp=/solr path=/update params={} status=0 QTime=57
You can verify this by doing a search for Susan Boyle directly, which should return no rows at http://localhost:8982/solr/select/?q=Susan+Boyle
.
Now, let's go ahead and put in the rest of the logic for using our Solr-ized model objects to simplify finding our favorite artists. We'll store the list of favorite artists in the browser's session space for convenience. If you are following along with your own generated version of the myFaves application, then the remaining files you'll want to copy over from /examples/9/myfaves
are as follows:
./app/controllers/myfaves_controller.rb: This contains the controller logic for picking your favorite artists.
./app/views/myfaves/: This contains the display files for picking and showing the artists.
./app/views/layouts/myfaves.html.erb: This is the layout of the myFaves views. We use the Autocomplete widget again, so this layout embeds the appropriate JavaScript and CSS files.
./public/stylesheets/jquery.autocomplete.css and ./public/stylesheets/indicator.gif: They are stored locally in order to fix pathing issues with the indicator.gif showing up when the autocompletion search is running.
The only other edits you need to make are:
Edit ./config/routes.rb by adding resources :myfaves and root :to => "myfaves#index".
Delete ./public/index.html to use the new root route you just defined.
Edit ./app/controllers/artists_controller.rb because we want the index method to respond with both HTML and JSON response types.
Run rake db:sessions:create to generate a sessions table, then rake db:migrate to update the database with the new sessions table. Edit ./config/initializers/session_store.rb and change to using :active_record_store for preserving the session state.
You should now be able to run ./script/rails server and browse to http://localhost:3000/. You will be prompted to search by entering an artist's name. If you don't receive any results, then make sure you have started Solr using rake sunspot:solr:start
. Also, if you have only loaded a subset of the full 399,000 artists, then your choices may be limited. You can load all of the artists through the populate.rb
script and then run rake sunspot:reindex
, although it will take a long time to complete. This is something good to do just before you head out for lunch or home for the evening!
If you look at ./app/views/myfaves/index.rhtml
, then you can see that the jQuery autocomplete call is a bit different:
$("#artist_name").autocomplete( '/artists.json?callback=?', {
The URL we are hitting is /artists.json
, with the .json
suffix telling Rails that we want the JSON data back instead of normal HTML. If we ended the URL with .xml
, then we would have received XML-formatted data about the artists. We provide a slightly different parameter to Rails to specify the JSONP callback to use. Unlike the previous example, where we used json.wrf
, which is Solr's parameter name for the callback method to call, we use the more standard parameter name callback
. We changed the ArtistsController index
method to handle the autocomplete
widget's data needs through JSONP. If there is a q
parameter, then we know that the request was from the autocomplete
widget, and we ask Solr for @artists
to respond with. Later on, we render @artists
into JSON objects, returning only the name
and id
attributes to keep the payload small.
We also specify that the JSONP callback method is what was passed when using the callback
parameter:
def index
  if params[:q]
    @artists = Artist.search{ keywords params[:q]}.results
  else
    @artists = Artist.paginate :page => params[:page], :order => 'created_at DESC'
  end
  respond_to do |format|
    format.html # index.html.erb
    format.xml { render :xml => @artists }
    format.json { render :json => @artists.to_json(:only => [:name, :id]), :callback => params[:callback] }
  end
end
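If you are curious what the JSON branch produces, the following plain Ruby sketch approximates it outside of Rails. The jsonp_body helper and the sample record are invented for illustration; Rails performs the equivalent trimming and wrapping when :only and :callback are passed as shown above.

```ruby
require 'json'

# Trim each record to the allowed keys, serialize to JSON, and wrap the
# result in the callback function name when one is given (JSONP).
def jsonp_body(records, only, callback)
  trimmed = records.map { |record| record.select { |key, _| only.include?(key) } }
  json = JSON.generate(trimmed)
  callback ? "#{callback}(#{json})" : json
end

artists = [{ 'id' => 93855, 'name' => 'Smashing Atoms', 'image_url' => nil }]
puts jsonp_body(artists, %w[id name], 'jsonp123')
# jsonp123([{"id":93855,"name":"Smashing Atoms"}])
```

Without a callback parameter the same data comes back as plain JSON, which is what a non-JSONP client would expect.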
At the end of all of this, you should have a nice autocomplete interface for quickly picking artists.
When you are selecting Sunspot as your integration method, you are implicitly agreeing to the various conventions established for indexing data into Solr. If you are used to working with Solr directly, you may find understanding the Sunspot DSL for querying a bit of an obstacle. However, if your background is in Rails, or you are building very complex queries, then learning the DSL will pay off in productivity and the ability to maintain complex expressive queries.
The two most common high-level libraries for interacting with Solr are acts_as_solr
and Sunspot. However, in the last couple of years, Sunspot has become the more popular choice, and comes in a version designed to work explicitly with Rails called sunspot_rails
that allows Rails' ActiveRecord
database objects to be transparently backed by a Solr index for full text search.
For a lower-level client interface to Solr from Ruby environments, there are two libraries duking it out to be the client of choice: solr-ruby
, a client library developed by the Apache Solr project, and rsolr
, which is a reimplementation of a Ruby-centric client library. Both of these solutions are solid and act as great low-level API libraries. However, rsolr
has gained more attention, has better documentation, and some nice features such as a direct embedded Solr connection through JRuby. rsolr
also has support for using curb
(Ruby bindings to curl
, a very fast HTTP library) instead of the standard Net::HTTP
library for the HTTP transport layer.
In order to perform a select using solr-ruby
, you need to issue the following code:
response = solr.query('washington', { :start=>0, :rows=>10 })
In order to perform a select using rsolr
, you need to issue the following code:
response = solr.select({ :q=>'washington', :start=>0, :rows=>10 })
So you can see that doing a basic search is pretty much the same in either library. Differences crop up more as you dig into the details on parsing and indexing records. You can learn more about solr-ruby
on the Solr wiki at http://wiki.apache.org/solr/solr-ruby and learn more about rsolr
at http://github.com/mwmitchell/rsolr/.
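Under the hood, both select calls above boil down to an HTTP GET with the hash encoded as query parameters. As a rough sketch using only Ruby's standard library, and assuming the mbartists core URL used earlier in this chapter, the request URL could be built like this. The select_url helper is hypothetical, and real clients add further parameters, such as the response writer, on top of these:

```ruby
require 'uri'

# Build a Solr select URL from a base URL and a hash of parameters.
def select_url(base_url, params)
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

puts select_url('http://localhost:8983/solr/mbartists',
                :q => 'washington', :start => 0, :rows => 10)
# http://localhost:8983/solr/mbartists/select?q=washington&start=0&rows=10
```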