While most relational databases provide some mechanism for full-text search, they are optimized for online transaction processing (OLTP) workloads. Document databases, on the other hand, are designed specifically for full-text search queries and excel at them. In this recipe, I'll show you how to use NHibernate Search and Lucene.Net to provide full-text search capabilities for your entities.
Extract NHibernate.Search.dll and Lucene.Net.dll from the downloaded ZIP file to your solution's Lib folder.
Complete the Eg.Core model and mappings from Chapter 1.
In Eg.Core, add a reference to NHibernate.Search.dll.
On the Entity base class, decorate the Id property with the DocumentId attribute from NHibernate.Search.Attributes.
On the Product class, add the following attributes:

    [Indexed]
    public class Product : Entity
    {
        [Field]
        public virtual string Name { get; set; }

        [Field]
        public virtual string Description { get; set; }

        public virtual Decimal UnitPrice { get; set; }
    }

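On the Book class, add the following attributes:

    [Indexed]
    public class Book : Product
    {
        // The ISBN is indexed as a single, untokenized value.
        [Field(Index = Index.UnTokenized)]
        public virtual string ISBN { get; set; }

        [Field]
        public virtual string Author { get; set; }
    }
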
Create a new console application named Eg.Search.Runner.
Add references to the Eg.Core model, log4net.dll, Lucene.Net.dll, NHibernate.dll, and NHibernate.ByteCode.dll.
Add an App.config file with the standard log4net and hibernate-configuration sections.
Create a new class named SearchConfiguration using the following code:

    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Event;
    using NHibernate.Search.Event;
    using NHibernate.Search.Store;

    public class SearchConfiguration
    {
        public ISessionFactory BuildSessionFactory()
        {
            var cfg = new Configuration().Configure();
            SetSearchProps(cfg);
            AddSearchListeners(cfg);
            var sessionFactory = cfg.BuildSessionFactory();
            return new SessionFactorySearchWrapper(sessionFactory);
        }

        // Store the Lucene index on the file system under ~/Index.
        private void SetSearchProps(Configuration cfg)
        {
            cfg.SetProperty(
                "hibernate.search.default.directory_provider",
                typeof(FSDirectoryProvider).AssemblyQualifiedName);
            cfg.SetProperty(
                "hibernate.search.default.indexBase",
                "~/Index");
        }

        // Keep the index in step with entity and collection changes.
        private void AddSearchListeners(Configuration cfg)
        {
            cfg.SetListener(ListenerType.PostUpdate,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostInsert,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostDelete,
                new FullTextIndexEventListener());
            cfg.SetListener(ListenerType.PostCollectionRecreate,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionRemove,
                new FullTextIndexCollectionEventListener());
            cfg.SetListener(ListenerType.PostCollectionUpdate,
                new FullTextIndexCollectionEventListener());
        }
    }

Create a new class named SessionFactorySearchWrapper using the following code:

    using System.Data;
    using NHibernate;

    public class SessionFactorySearchWrapper : ISessionFactory
    {
        private readonly ISessionFactory _sessionFactory;

        public SessionFactorySearchWrapper(ISessionFactory sessionFactory)
        {
            _sessionFactory = sessionFactory;
        }

        public ISession OpenSession()
        {
            var session = _sessionFactory.OpenSession();
            return WrapSession(session);
        }

        public ISession OpenSession(
            IDbConnection conn,
            IInterceptor sessionLocalInterceptor)
        {
            var session = _sessionFactory
                .OpenSession(conn, sessionLocalInterceptor);
            return WrapSession(session);
        }

        public ISession OpenSession(IInterceptor sessionLocalInterceptor)
        {
            var session = _sessionFactory
                .OpenSession(sessionLocalInterceptor);
            return WrapSession(session);
        }

        public ISession OpenSession(IDbConnection conn)
        {
            var session = _sessionFactory.OpenSession(conn);
            return WrapSession(session);
        }

        // Every session handed out is wrapped in a full-text session.
        private static ISession WrapSession(ISession session)
        {
            return NHibernate.Search
                .Search.CreateFullTextSession(session);
        }
    }

Implement the remaining ISessionFactory methods and properties in SessionFactorySearchWrapper by passing the call to the _sessionFactory field, as shown in the following code:

    public IClassMetadata GetClassMetadata(string entityName)
    {
        return _sessionFactory.GetClassMetadata(entityName);
    }

In Program.cs, use the following code:

    using System.Collections.Generic;
    using System.Linq;
    using Eg.Core;
    using log4net;
    using log4net.Config;
    using NHibernate;
    using NHibernate.Search;

    class Program
    {
        static void Main(string[] args)
        {
            XmlConfigurator.Configure();
            var log = LogManager.GetLogger(typeof(Program));

            var cfg = new SearchConfiguration();
            var sessionFactory = cfg.BuildSessionFactory();

            var theBook = new Book()
            {
                Name = @"Gödel, Escher, Bach: An Eternal Golden Braid",
                Author = "Douglas Hofstadter",
                Description =
                    "This groundbreaking Pulitzer Prize-winning book " +
                    "sets the standard for interdisciplinary writing, " +
                    "exploring the patterns and symbols in the thinking " +
                    "of mathematician Kurt Godel, artist M.C. Escher, " +
                    "and composer Johann Sebastian Bach.",
                ISBN = "978-0465026562",
                UnitPrice = 22.95M
            };

            var theOtherBook = new Book()
            {
                Name = "Technical Writing",
                Author = "Joe Professor",
                Description = "College text",
                ISBN = "123-1231231234",
                UnitPrice = 143.73M
            };

            var thePoster = new Product()
            {
                Name = "Ascending and Descending",
                Description = "Poster of famous Escher print",
                UnitPrice = 7.95M
            };

            // Start from an empty Product table.
            using (var session = sessionFactory.OpenSession())
            {
                using (var tx = session.BeginTransaction())
                {
                    session.Delete("from Product");
                    tx.Commit();
                }
            }

            using (var session = sessionFactory.OpenSession())
            {
                using (var tx = session.BeginTransaction())
                {
                    session.Save(theBook);
                    session.Save(theOtherBook);
                    session.Save(thePoster);
                    tx.Commit();
                }
            }

            var products = GetEscherProducts(sessionFactory);
            OutputProducts(products, log);

            var books = GetEscherBooks(sessionFactory);
            OutputProducts(books.Cast<Product>(), log);
        }

        private static void OutputProducts(
            IEnumerable<Product> products, ILog log)
        {
            foreach (var product in products)
            {
                log.InfoFormat("Found {0} with price {1:C}",
                    product.Name, product.UnitPrice);
            }
        }

        private static IEnumerable<Product> GetEscherProducts(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Product> results;
            using (var session = sessionFactory.OpenSession()
                as IFullTextSession)
            {
                using (var tx = session.BeginTransaction())
                {
                    var queryString = "Description:Escher";
                    var query = session
                        .CreateFullTextQuery<Product>(queryString);
                    results = query.List<Product>();
                    tx.Commit();
                }
            }
            return results;
        }

        private static IEnumerable<Book> GetEscherBooks(
            ISessionFactory sessionFactory)
        {
            IEnumerable<Book> results;
            using (var session = sessionFactory.OpenSession()
                as IFullTextSession)
            {
                using (var tx = session.BeginTransaction())
                {
                    var queryString = "Description:Escher";
                    var query = session
                        .CreateFullTextQuery<Book>(queryString);
                    results = query.List<Book>();
                    tx.Commit();
                }
            }
            return results;
        }
    }

In this recipe, we've offloaded our full-text queries to a Lucene index in the bin/Debug/Index folder.
First, let's quickly discuss some Lucene terminology. The Lucene database is referred to as an Index. Each record in the Index is referred to as a Document. In the case of NHibernate Search, each Document in the Index has a corresponding entity in the relational database. Each Document has Fields, and each Field comprises a name and a value. By default, fields are tokenized, or broken up into terms. A term can best be described as a single, significant, lower-case word from some string of words. For example, with Lucene's standard analyzer, the string "Bag of Cats" would be tokenized into the terms "bag" and "cats"; the common word "of" is dropped, and reducing "cats" to "cat" would require a stemming analyzer. Additionally, for each field, Lucene maintains a map from each term to the documents that contain it, along with the frequency of that term in each document. This makes keyword searches extremely fast.
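To make the term map concrete, here is a minimal sketch of an inverted index in plain C#. It is an illustration only and not how Lucene.Net is implemented: the `ToyInvertedIndex` class, its stop-word list, and its naive tokenizer are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: term -> (document id -> frequency in that document).
// Hypothetical illustration; Lucene.Net's real structures are far richer.
public class ToyInvertedIndex
{
    // Assumed stop-word list, standing in for an analyzer's configuration.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "of", "the" };

    private readonly Dictionary<string, Dictionary<int, int>> _postings =
        new Dictionary<string, Dictionary<int, int>>();

    // Break text into lower-case terms, dropping punctuation and stop words.
    public static IEnumerable<string> Tokenize(string text)
    {
        return text
            .Split(new[] { ' ', ',', '.', ':', ';' },
                StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.ToLowerInvariant())
            .Where(term => !StopWords.Contains(term));
    }

    // Record every term of the text against the given document id.
    public void Add(int documentId, string text)
    {
        foreach (var term in Tokenize(text))
        {
            Dictionary<int, int> documents;
            if (!_postings.TryGetValue(term, out documents))
            {
                documents = new Dictionary<int, int>();
                _postings[term] = documents;
            }
            int count;
            documents.TryGetValue(documentId, out count);
            documents[documentId] = count + 1;
        }
    }

    // A keyword search is one dictionary lookup, not a scan of all documents.
    public List<int> Search(string term)
    {
        Dictionary<int, int> documents;
        if (!_postings.TryGetValue(term.ToLowerInvariant(), out documents))
        {
            return new List<int>();
        }
        return documents.Keys.OrderBy(id => id).ToList();
    }

    // How many times the term occurs in one document (0 if it does not).
    public int Frequency(string term, int documentId)
    {
        Dictionary<int, int> documents;
        if (!_postings.TryGetValue(term.ToLowerInvariant(), out documents))
        {
            return 0;
        }
        int count;
        documents.TryGetValue(documentId, out count);
        return count;
    }
}
```

After adding a few descriptions with `Add`, a call such as `Search("escher")` returns the ids of the documents containing that term without reading any other document, which is the property that makes keyword lookups cheap.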
Entity classes with the Indexed attribute will be included as documents in the Lucene index. The remaining attributes determine which properties from these entities are included in the document, and how that data is stored. NHibernate Search automatically adds a _hibernate_class field that stores the entity type. Each searchable entity must have a field or property decorated with the DocumentId attribute. Its value is stored in the ID field, and is used to maintain the relationship between entities and documents. In our case, the Id property on Entity will be used.
To be useful, we should include additional data in our documents using the Field attribute. For keyword searches, we've included the tokenized name and description of every product, and the author of every book. We've also included the ISBN of every book, but have chosen not to tokenize it because a partial ISBN match is useless.
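The following sketch shows, in plain C#, the general idea of attributes deciding which properties end up in a document and whether each value is tokenized. The `ToyIndexed` and `ToyField` attributes, the `ToyBook` class, and `ToyDocumentBuilder` are hypothetical stand-ins invented for this example; they are not the NHibernate.Search.Attributes types and do not reflect how NHibernate Search builds documents internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the Indexed and Field attributes.
[AttributeUsage(AttributeTargets.Class)]
public class ToyIndexedAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public class ToyFieldAttribute : Attribute
{
    public bool Tokenized { get; set; }
    public ToyFieldAttribute() { Tokenized = true; }
}

[ToyIndexed]
public class ToyBook
{
    [ToyField]
    public string Name { get; set; }

    [ToyField(Tokenized = false)]
    public string ISBN { get; set; }

    // No attribute, so this property never reaches the document.
    public decimal UnitPrice { get; set; }
}

public static class ToyDocumentBuilder
{
    // Returns field name -> terms for each property marked with ToyField.
    public static Dictionary<string, string[]> Build(object entity)
    {
        var type = entity.GetType();
        if (!type.IsDefined(typeof(ToyIndexedAttribute), true))
        {
            throw new ArgumentException(
                type.Name + " is not marked as indexed.");
        }

        var document = new Dictionary<string, string[]>();
        foreach (var property in type.GetProperties())
        {
            var field = (ToyFieldAttribute)property
                .GetCustomAttributes(typeof(ToyFieldAttribute), true)
                .FirstOrDefault();
            if (field == null)
            {
                continue;
            }

            var value =
                Convert.ToString(property.GetValue(entity, null))
                ?? string.Empty;

            // Tokenized fields become lower-case words; untokenized
            // fields are kept as one exact value.
            document[property.Name] = field.Tokenized
                ? value.ToLowerInvariant().Split(
                    new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                : new[] { value };
        }
        return document;
    }
}
```

In this sketch the ISBN survives as a single exact term, so only a complete ISBN can match it, while the name is split into individual lower-case words that can each be matched on their own.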
The SearchConfiguration class is responsible for building an NHibernate configuration, adding the necessary NHibernate Search settings to the configuration, building an NHibernate session factory, and wrapping the session factory in our search wrapper.
The SessionFactorySearchWrapper wraps the standard NHibernate session factory and returns an IFullTextSession from calls to OpenSession. These sessions behave as normal NHibernate sessions, and provide additional methods for creating full-text search queries against the Lucene index. The CreateFullTextQuery method of the session takes a Lucene query in string or query object form and returns a familiar NHibernate IQuery interface, the same interface used for HQL and SQL queries. When we call List or UniqueResult, the query is executed against our Lucene index. For example, the query in our GetEscherProducts method searches Lucene for documents with a Description containing the term escher. This query returns two results: the Gödel, Escher, Bach book and the M. C. Escher poster. The IDs of those search results are gathered up and used to build a SQL database query similar to the following query:

    SELECT this_.Id          as Id0_0_,
           this_.Name        as Name0_0_,
           this_.Description as Descript4_0_0_,
           this_.UnitPrice   as UnitPrice0_0_,
           this_.Director    as Director0_0_,
           this_.Author      as Author0_0_,
           this_.ISBN        as ISBN0_0_,
           this_.ProductType as ProductT2_0_0_
    FROM   Product this_
    WHERE  (this_.Id in ('5933e3ba-3092-4db7-8d19-9daf014b8ce4' /* @p0 */,
                         '05058886-8436-4a1d-8412-9db1010561b5' /* @p1 */))

Because this database query is performed on the primary key, it is very fast. The Lucene query is fast because the index was designed specifically for keyword searches. Together, they have the potential for huge performance and functionality gains over the weak full-text search capabilities in most relational databases.
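The two-step lookup described above can be sketched in plain C#. This is a simplified model under stated assumptions, not NHibernate Search code: the `TwoStepLookup` class is hypothetical, a dictionary from term to ids stands in for the Lucene index, and a dictionary keyed by primary key stands in for the Product table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoStepLookup
{
    // index: term -> ids of matching documents (stand-in for Lucene).
    // table: primary key -> product name (stand-in for the SQL table).
    public static List<string> FindNames(
        IDictionary<string, List<Guid>> index,
        IDictionary<Guid, string> table,
        string term)
    {
        // Step 1: ask the index which ids match the term.
        List<Guid> ids;
        if (!index.TryGetValue(term.ToLowerInvariant(), out ids))
        {
            return new List<string>();
        }

        // Step 2: load only those rows by primary key, which plays the
        // role of the WHERE Id IN (...) clause in the SQL query.
        return ids
            .Where(table.ContainsKey)
            .Select(id => table[id])
            .ToList();
    }
}
```

Neither step scans the full data set: the first is a lookup by term and the second is a lookup by key, which is why the combination stays cheap as the table grows.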
This is just the most basic example of what we can do with NHibernate Search. We can also choose to store the original value of a field in the document. This is useful when we want to display Lucene query results without querying the SQL database. Additionally, Lucene has many more features, like search-term highlighting and spell-checking. Although Lucene is a very capable document database, remember that it is not relational. There is no support for relationships or references between documents stored in a Lucene index.