by Matt Mitchell, Kranti Parisa, Eric Pugh, David Smiley
Apache Solr Enterprise Search Server - Third Edition
Table of Contents
Apache Solr Enterprise Search Server Third Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Quick Starting Solr
An introduction to Solr
Lucene – the underlying engine
Solr – a Lucene-based search server
Comparison to database technology
A few differences between Solr 4 and Solr 5
Getting started
Solr's installation directory structure
Running Solr
A quick tour of Solr
Loading sample data
A simple query
Some statistics
The sample browse interface
Configuration files
What's next?
Schema design and indexing
Text analysis
Searching
Integration
Resources outside this book
Summary
2. Schema Design
Is Solr schemaless?
MusicBrainz.org
One combined index or separate indices
One combined index
Problems with using a single combined index
Separate indices
Schema design
Step 1 – determine which searches are going to be powered by Solr
Step 2 – determine the entities returned from each search
Step 3 – denormalize related data
Denormalizing – one-to-one associated data
Denormalizing – one-to-many associated data
Step 4 – omit the inclusion of fields only used in search results (optional)
The schema.xml file
Field definitions
Dynamic field definitions
Advanced field options for indexed fields
The unique key
The default search field and query operator
Copying fields
Our MusicBrainz field definitions
Defining field types
Built-in field type classes
Numbers and dates
Some other field types
Summary
3. Text Analysis
Configuring field types
Experimenting with text analysis
Character filters
Tokenization
Filtering
Stemming
Correcting and augmenting stemming
Processing synonyms
Synonym expansion at index time versus query time
Working with stop words
Phonetic analysis
Substring indexing and wildcards
ReversedWildcardFilter
N-gram analysis
N-gram costs
Sorting text
Miscellaneous token filters
The multilingual search
The multifield approach
The multicore approach
The single field approach
Summary
4. Indexing Data
Communicating with Solr
Using direct HTTP or a convenient client API
Pushing data to Solr or have Solr pull it
Data formats
Solr's HTTP POST options
Remote streaming
Solr's Update-XML format
Deleting documents
Commit, optimize, and rollback the transaction log
Don't overlap commits
Index optimization
Rolling back an uncommitted change
The transaction log
Atomic updates and optimistic concurrency
Sending CSV-formatted data to Solr
Configuration options
The DataImportHandler framework
Configuring the DataImportHandler framework
The development console
Writing a DIH configuration file
Data sources
Entity processors
Fields and transformers
Example DIH configurations
Importing from databases
Importing XML from a file with XSLT
Importing multiple rich document files – crawling
Importing commands
Delta imports
Indexing documents with Solr Cell
Extracting text and metadata from files
Configuring Solr
Solr Cell parameters
Update request processors
Summary
5. Searching
Your first search – a walk-through
A note on response format types
Solr's generic XML structured data representation
Solr's XML response format
Parsing the URL
Understanding request handlers
Query parameters
Search criteria related parameters
Result pagination related parameters
Output-related parameters
More about the fl parameter
Diagnostic parameters
Query parsers and local-params
Query syntax (the lucene query parser)
Matching all the documents
Mandatory, prohibited, and optional clauses
Boolean operators
Subqueries
Limitations of prohibited clauses in subqueries
Querying specific fields
Phrase queries and term proximity
Wildcard queries
Fuzzy queries
Regular expression queries
Range queries
Date math
Score boosting
Existence and nonexistence queries
Escaping special characters
The DisMax query parser – part 1
Searching multiple fields
Limited query syntax
Min-should-match
Basic rules
Multiple rules
What to choose
A default query
The uf parameter
Filtering
Sorting
Joining
The join query parser
Block-join query parsers
The block-join-children parser
The block-join-parent parser
Spatial search
Spatial in Solr 3 – LatLonType and friends
Configuration
Spatial in Solr 4 – SpatialRecursivePrefixTreeFieldType
Configuration – basic
Indexing points
Filtering by distance or rectangle
Sorting by distance
Returning the distance
Boosting by distance
Memory and performance of distance sorting and boosting
Advanced spatial
Summary
6. Search Relevancy
Scoring
Alternative scoring models
Query-time and index-time boosting
Troubleshooting queries and scoring
Tools – Splainer and Quepid
The DisMax query parser – part 2
Lucene's DisjunctionMaxQuery
Boosting – automatic phrase boosting
Configuring automatic phrase boosting
Phrase slop configuration
Partial phrase boosting
Boosting – boost queries
Boosting – boost functions
Add or multiply boosts
Functions and function queries
Field references
Function references
Mathematical primitives
Other math
Boolean functions
Relevancy statistics functions
Ord and rord
Miscellaneous functions
External field values
Function query boosting
Formula – logarithm
Formula – inverse reciprocal
Formula – reciprocal
Formula – linear
How to boost based on an increasing numeric field
Step by step…
How to boost based on recent dates
Step by step…
Summary
7. Faceting
A quick example – faceting release types
Field requirements
Types of faceting
Faceting field values
Alphabetic range bucketing
Faceting numeric and date ranges
Range facet parameters
Facet queries
Building a filter query from a facet
Field value filter queries
Facet range filter queries
Pivot faceting
Hierarchical faceting
Excluding filters – multiselect faceting
Summary
8. Search Components
About components
The highlight component
A highlighting example
Choose the Standard, FastVector, or Postings highlighter
The Standard (default) highlighter
The FastVector highlighter
The Postings highlighter
Highlighting configuration
The SpellCheck component
The schema configuration
Configuration in solrconfig.xml
Configuring spellcheckers – dictionaries
DirectSolrSpellChecker options
IndexBasedSpellChecker options
FileBasedSpellChecker options
WordBreakSolrSpellChecker options
Processing the q parameter
Processing the spellcheck.q parameter
Building index- and file-based spellcheckers
Issuing spellcheck requests
Example usage for a misspelled query
Query complete/suggest
Instant-search via edge n-grams
Query term completion via facet.prefix
Query term completion via the Suggester
Query term completion via the Terms component
Field-value completion via the Suggester
The QueryElevation component
Configuration
The MoreLikeThis component
Configuration parameters
Parameters specific to the MLT search component
Parameters specific to the MLT request handler
Common MLT parameters
The MLT results example
The Stats component
Configuring the stats component
Statistics on track durations
The Clustering component
Collapsing and expanding
The Collapse query parser
The Expand component
An example
Compared to Result grouping
The TermVector component
Summary
9. Integrating Solr
Working with the included examples
Inventory of examples
Solritas – the integrated search UI
The pros and cons of Solritas
SolrJ – Solr's Java client API
The sample code – BrainzSolrClient
Dependencies and Maven
Declaring logging dependencies
The SolrServer class
Using javabin instead of XML for efficiency
Searching with SolrJ
Indexing with SolrJ
Deleting documents
Annotating your JavaBean – an alternative
Embedding Solr
When should you use embedded Solr? Tests!
Using JavaScript/AJAX with Solr
Wait, what about security?
Building a Solr-powered artists autocomplete widget with jQuery and JSONP
AJAX Solr
Using XSLT to transform XML search results
Accessing Solr from PHP applications
solr-php-client
Drupal options
The Apache Solr Search integration module
Hosted Solr by Acquia
Ruby on Rails integrations
Solr's Ruby response writer
The sunspot_rails gem
Setting up the myFaves project
Populating the myFaves relational database from Solr
Building Solr indexes from a relational database
Completing the myFaves website
Which Rails/Ruby library should I use?
Nutch for crawling web pages
Solr and Hadoop
HDFS
Indexing via MapReduce
Morphlines
Running a Solr build using Hadoop
Looking at the storage
The data ingestion process
ManifoldCF – a connector framework
Connectors
Putting ManifoldCF to use
Document-level security
Summary
10. Scaling Solr
Tuning complex systems is hard
Use SolrMeter to test Solr performance
Optimizing a single Solr server – scale up
Configuring JVM settings to improve memory usage
Using MMapDirectoryFactory to leverage additional virtual memory
Enabling downstream HTTP caching to reduce load
Solr caching
Tuning caches
Indexing performance
Designing the schema
Sending data to Solr in bulk
Disabling unique key checking
Index optimization and mergeFactor settings
Enhancing faceting performance
Using term vectors
Improving phrase search performance
Configuring Solr for near real-time search
Use SolrCloud to go big – scale wide
SolrCloud glossary
Launching Solr in SolrCloud mode
Managing collections and configurations
Stand up SolrCloud for our MusicBrainz artists index
Choosing the replication factor and number of shards
Creating and deleting collections
Replicas and leaders
Document routing
Shard splitting
Dealing with long running collection tasks
Adding nodes
Summary
11. Deployment
Deployment methodology for Solr
Questions to ask
Installing Solr into a Servlet container
Differences between Servlet containers
Defining the solr.home property
Configuring logging
HTTP server request access logs
Solr application logging
Configuring logging output
Jetty startup integration
Managing log levels at runtime
A RequestHandler per search interface
Leveraging Solr cores
Configuring solr.xml
Property substitution
Include fragments of XML with XInclude
Managing cores
Some uses of multiple cores
Setting up ZooKeeper for SolrCloud
Installing ZooKeeper
Administering Data in ZooKeeper
Monitoring Solr performance
Stats Admin interface
Monitoring Solr via JMX
Starting Solr with JMX
Securing Solr from prying eyes
Limiting server access
Put Solr behind a Proxy
Securing public searches
Controlling JMX access
Securing index data
Controlling document access
Other things to look at
Summary
A. Quick Reference
Core search
Diagnostic
The Lucene query parser
The DisMax query parser
The Lucene query syntax
Faceting
Highlighting
Spell checking
Miscellaneous nonsearch
Index