MySQL simple index searches

If you are using the MySQL database system and MyISAM tables, a very simple Django search interface exists automatically. This is a feature builtin to the Django ORM layer that allows you to perform boolean full-text searches directly from a filter() method call on any QuerySet.

The only caveat to this is that you must build a full text index on the columns you want to search. This is a one-time step, but it's by far the easiest way to get search up and running on your Django application.

The ORM search syntax looks like this:

>>> results = Product.objects.filter(name__search='+cranberry -sauce')

Note the use of + and - characters, which act as operators to explicitly define the search criteria. This is a boolean search, which is very different from what you might expect based on the use of major search engines such as Google or Bing. Boolean search looks for the presence or absence of the terms provided in the simplest way possible. If the term is prefixed with a + symbol, it is explicitly included, while a - symbol excludes the term outright.

The order of the results of this sort of search is not sorted by relevance as you might expect. In fact, when using this built-in ORM search method, the results do not include any relevance score. This means it's not possible to use order_by() on the results QuerySet to sort by relevance.

The above search query '+cranberry -sauce' will return a Product QuerySet for objects whose name field contains the word cranberry, but does not contain the word sauce. If we were to modify this query to be 'cranberry -sauce' the results would be slightly different. This query means 'cranberry' is optional, so Product objects with this term in their name will also be included in the results. Anything containing the term 'sauce' will be excluded.

Under the hood, this search method is performing a MATCH() AGAINST() SQL function. The above query search will translate to SQL that looks something like this:

SELECT ... WHERE MATCH(name) AGAINST (+cranberry -sauce IN BOOLEAN MODE);

This SQL is currently hard-coded into Django and cannot be changed. However, you are free to write your own MATCH() AGAINST() routines and add them to your Django managers. For example, the following Django ORM method call should produce equivalent SQL as the above:

Product.objects.extra(where=Product.objects.extra(where=
    ['MATCH(name) AGAINST(+cranberry -sauce)IN BOOLEAN MODE'])

This could be extended to a manager method that adds the relevance score, like so:

def search(self, query):
     exp = 'MATCH(name) AGAINST(%%s IN BOOLEAN MODE)'
     return self.extra(where=[exp], params=[query],
                              select={'relevance': exp})

In order to support any of these search methods, we first must create a full-text index on our table's columns. In this case, the name column is the only one we're searching. The index is created by issuing the following SQL statement directly to MySQL:

CREATE FULLTEXT INDEX name_idx ON products_product (name);

We only need to create the index once and MySQL will subsequently index new data as rows are added to the products table.

Even with relevance scores added, boolean searches are very limited in their usefulness. They often give good results when searching a single, short field. But imagine the results of the boolean operation for fields with very large amounts of text, such as a product description or blog post body. Boolean search tends to return many false positives for text-rich data.

MySQL also supports natural language searching on columns with full-text indexes. The syntax is very similar to boolean searching, but is not builtin to Django's ORM syntax. But we can perform a natural language search by slightly modifying our manage method from above:

def search(self, query):
    exp = 'MATCH(name) AGAINST(%%s IN NATURAL LANGUAGE MODE)'
    return self.extra(where=[exp], params=[query],
                             select={'relevance': exp})

We can translate this to a more complete search manager class by adding a constructor method that stores a list of fields to include in our full-text search:

class SearchManager(models.Manager):
    def __init__(self, fields):
        models.Manager.__init__(self)
        self.fields = fields

    def search(self, query):
        meta = self.model._meta
        db_columns = ['%s.%s' % (meta.db_table,
                                 meta.get_field(name).column 
                                 for name in self.fields]
        columns = ','.join(db_columns)
        
        exp = 'MATCH(%s) AGAINST (%%s IN NATURAL LANGUAGE MODE)' %   
                                  columns
        return self.extra(where=[exp], params=[query],
                          select={'relevance': exp})

We can add this search manager to any of our models and specify what model attribute fields we would like to perform the full-text search over. These fields all need to have MySQL FULLTEXT indexes created and should generally store text data.

class Product(models.Model):
    ...

    search = SearchManager(['name', 'description'])

MySQL's built-in natural language search is an improvement over our boolean mode search, but as far as search engine techniques goes, it's still relatively simple. It doesn't support features such as word stemming or phrase searching. And as mentioned earlier, full-text indexes in MySQL are only supported for MyISAM database tables and only work for CHAR, VARCHAR, and TEXT column types.

Another limitation of both the MySQL natural language search and boolean mode search is that it can only search on one model per search. More sophisticated search solutions would allow us to search across many different model classes in one query.

In the next sections we will explore some open source search engine solutions and how to integrate them with Django for more advanced functionality.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset