Xapian/Djapian

The Djapian project is a search layer for Django that specifically supports the Xapian open source search engine. It was created in Russia by Alex Koshelev and remains under development. Xapian is a search engine library rather than a standalone search engine application. Djapian wraps this library and attaches its full-text search functionality to Django models. This follows the pattern we've seen in the previous examples: a Django layer on top of a search engine application or library.

You can get the Djapian library from Google Code at the following URL: http://code.google.com/p/djapian/.

Once installed, we can begin using Djapian in any Django project by adding it to the INSTALLED_APPS setting in our settings.py file and creating a DJAPIAN_DATABASE_PATH setting that points to a directory for our search indexes.
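As a minimal sketch of those two settings, a settings.py excerpt might look like the following. The other application names and the index directory are illustrative assumptions; only the 'djapian' entry and the DJAPIAN_DATABASE_PATH name come from the text above.

```python
# settings.py (excerpt)
import os

INSTALLED_APPS = (
    'django.contrib.contenttypes',
    'coleman.products',   # the app that holds our Product model
    'djapian',            # enables Djapian and its management commands
)

# Directory where Djapian stores its Xapian index files.
# The location below is a hypothetical example; any writable directory works.
DJAPIAN_DATABASE_PATH = os.path.join('/var', 'lib', 'coleman', 'djapian_indexes')
```

The directory should be writable by whichever user runs the indexing command and the web server.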

As before, indexes must be created in Djapian. Because Djapian wraps the Xapian library directly, our indexes are defined in Python, unlike Sphinx, where indexes and sources are defined in external configuration files. The index definition is very Django-like, similar to the indexes we built for Haystack. We start by subclassing a base indexer class and then define the specific fields of the object we're indexing.

The new index is associated with a model class by adding it to the index space. Djapian uses an index space object, which is just a wrapper around the file system storage routines needed to save index information to disk. The location on the file system where the index space lives is defined in the DJAPIAN_DATABASE_PATH setting.

An indexer for our Product model in Djapian would look like this:

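from djapian import space, Indexer
from coleman.products.models import Product

class ProductIndexer(Indexer):
    # Model fields to include in the full-text index
    fields = ['name', 'description']
    # (tag name, model attribute) pairs stored alongside each document
    tags = [('price_in_dollars', 'price_in_dollars')]

# Register the indexer and expose it as Product.indexer
space.add_index(Product, ProductIndexer, attach_as='indexer')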

Once we've defined an index, we can use Djapian's management command to process our Product models and build the indexes. We do this by issuing the index command to django-admin.py or manage.py. To build the initial index, we must pass the --rebuild argument:

$ django-admin.py index --rebuild

Later calls to index will update the indexes and don't require the rebuild flag, unless we want to delete our indexes and build from scratch.

A really convenient feature of Djapian is the ability to test search results in a special index shell. You can access this feature by running the indexshell management command:

$ django-admin.py indexshell
>>> use 0.1.0
>>> query "cranberry sauce" 
[<Hit: model=products.Product pk=200, percent=100, rank=0, weight=0.4444>]

This allows us to test the quality of the search results from our indexes quickly and effectively, without writing any views or other Django code. It is especially useful for evaluating the indexes you've created.

Searching indexes

When we write Indexer objects and add them to our space object, they are also attached as an attribute to our model class. This happened when we included the attach_as='indexer' keyword argument to the add_index method above. This attribute is what we will use to work with the search engine from Django code.

To perform a search query over our Product models, for example, we obtain the indexer for the model and call the search method:

product_indexer = Product.indexer
results = product_indexer.search('cranberry sauce').prefetch()

This searches our product indexes for 'cranberry sauce' and stores the results in results. The results variable will be a Djapian ResultSet object, which is similar to Django's built-in QuerySet objects. It doesn't support the full set of methods that QuerySet does, but you can loop over the items returned, count() its length, slice it, and order the results with the order_by() method, among other operations. It's also compatible with Django's built-in Paginator class.
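To illustrate how these operations combine, here is a small sketch of a paging helper. The function name search_page and its parameters are hypothetical and not part of Djapian; the code assumes only what is described above, namely that the indexer's search() method returns a result set supporting count() and slicing.

```python
def search_page(indexer, query, page_number=1, per_page=10):
    """Return (hits, total) for one page of search results.

    `indexer` is assumed to behave like the Djapian indexer described
    above: search() returns a result set with count() and slicing.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")

    results = indexer.search(query)
    total = results.count()                  # total number of matching hits

    start = (page_number - 1) * per_page     # offset of the first hit on this page
    hits = list(results[start:start + per_page])
    return hits, total
```

In a view, this could be called as search_page(Product.indexer, 'cranberry sauce', page_number=2). Django's Paginator class would do the same job with less code; the manual slicing here only makes the underlying operations visible.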
