Case-insensitive

Case sensitivity is a common problem with indexing. We may store our data in mixed case and need our index to ignore case when looking up the stored data. Before version 3.4, this was dealt with at the application level by duplicating each field in all-lowercase form and indexing the lowercase copy to simulate a case-insensitive index.
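The application-level workaround described above can be sketched in plain JavaScript. This is only an illustration: the helper names and the `name_lower` field are hypothetical, not something the source or MongoDB prescribes.

```javascript
// Hypothetical helper: stores a lowercase copy of `name` next to the original,
// so that the copy (not the original) can be indexed and searched.
function withLowercaseName(doc) {
  return { ...doc, name_lower: doc.name.toLowerCase() };
}

// Hypothetical helper: normalizes a search term the same way before querying.
function lowercaseQuery(term) {
  return { name_lower: term.toLowerCase() };
}

const stored = withLowercaseName({ name: "Mastering MongoDB" });
const query = lowercaseQuery("MASTERING mongodb");

// The lowercase copies compare equal even though the original casing differs.
const matches = stored.name_lower === query.name_lower;
console.log(stored, query, matches);
```

The cost of this approach is that every write has to keep the duplicate field in sync, which is the overhead that collation removes.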

Using the collation parameter, we can create case-insensitive indexes, and even collections that behave as case-insensitive.

In general, collation allows users to specify language-specific rules for string comparisons. A possible (but not the only) usage is for case-insensitive indexes and queries.

Using our familiar books collection, we can create a case-insensitive index on the name field, as follows:

> db.books.createIndex( { "name" : 1 },
                          { collation: {
                              locale : 'en',
                              strength : 1
                            }
                          } )

The strength parameter is the collation parameter that determines case sensitivity in comparisons. Strength levels follow the International Components for Unicode (ICU) comparison levels. The values that it accepts are as follows:

Strength Value	Description
`1`	Primary level of comparison. Comparison based on base characters only, ignoring any other differences, such as case and diacritics.
`2`	Secondary level of comparison. Compares at the primary level and, if the strings are equal there, compares diacritics (that is, accents).
`3` (default)	Tertiary level of comparison. Same as level 2, adding case and variants.
`4`	Quaternary level. Limited to specific use cases: considering punctuation when levels 1-3 ignore it, or processing Japanese text.
`5`	Identical level. Limited to specific use cases: a tie breaker.
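To get a feel for the first three levels without a database, the following sketch uses JavaScript's built-in Intl.Collator. It assumes that the `sensitivity` values "base", "accent", and "variant" behave roughly like strengths 1, 2, and 3; that mapping is an approximation made for illustration, not MongoDB's own implementation.

```javascript
// Assumption: Intl.Collator sensitivity is used as a stand-in for ICU strength.
// "base" ~ strength 1, "accent" ~ strength 2, "variant" ~ strength 3.
const primary = new Intl.Collator("en", { sensitivity: "base" });
const secondary = new Intl.Collator("en", { sensitivity: "accent" });
const tertiary = new Intl.Collator("en", { sensitivity: "variant" });

// Two strings are "the same" for a collator when compare() returns 0.
const sameAt = (collator, a, b) => collator.compare(a, b) === 0;

const results = {
  // Level 1 ignores both case and accents.
  caseAtPrimary: sameAt(primary, "mongodb", "MongoDB"),
  accentAtPrimary: sameAt(primary, "resume", "résumé"),
  // Level 2 still ignores case, but distinguishes accents.
  caseAtSecondary: sameAt(secondary, "mongodb", "MongoDB"),
  accentAtSecondary: sameAt(secondary, "resume", "résumé"),
  // Level 3 distinguishes case as well.
  caseAtTertiary: sameAt(tertiary, "mongodb", "MongoDB"),
};
console.log(results);
```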

Creating the index with collation is not enough to get back case-insensitive results. We need to specify collation in our query, as well:

> db.books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 1 } )

If we specify the same level of collation in our query as our index, then the index will be used. We could specify a different level of collation, as follows:

> db.books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 2 } )

Here, we cannot use the index, as our index has collation level 1, and our query looks for collation level 2.
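The matching rule in the two queries above can be modelled as a small predicate. This is a simplified sketch: the helper `collationsMatch` is hypothetical and compares only locale and strength, whereas the server compares the full collation specification.

```javascript
const DEFAULT_STRENGTH = 3; // strength assumed when a collation omits it

// Hypothetical helper: a simplified stand-in for the server's check.
// Only locale and strength are compared here.
function collationsMatch(indexCollation, queryCollation) {
  if (!queryCollation) return false; // the query specified no collation
  const strengthOf = (c) => c.strength ?? DEFAULT_STRENGTH;
  return (
    indexCollation.locale === queryCollation.locale &&
    strengthOf(indexCollation) === strengthOf(queryCollation)
  );
}

const indexCollation = { locale: "en", strength: 1 };

const sameStrength = collationsMatch(indexCollation, { locale: "en", strength: 1 });
const otherStrength = collationsMatch(indexCollation, { locale: "en", strength: 2 });
const noCollation = collationsMatch(indexCollation, undefined);
console.log({ sameStrength, otherStrength, noCollation });
```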

If we don't use any collation in our queries, and the collection has no default collation, we will get case-sensitive results, comparable to level 3.

Indexes on a collection that was created with a non-default collation automatically inherit that collation.

Suppose that we create a collection with collation level 1, as follows:

> db.createCollection("case_sensitive_books", { collation: { locale: 'en_US', strength: 1 } } )

The following index will also have collation strength: 1:

> db.case_sensitive_books.createIndex( { name: 1 } )

Queries to this collection will default to collation strength: 1, that is, case-insensitive. If we want to override this in our queries, we need to specify a different level of collation, or omit the strength part altogether. The following two queries will return case-sensitive, default collation level results in our case_sensitive_books collection:

> db.case_sensitive_books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en', strength: 3 } ) // default collation strength value
> db.case_sensitive_books.find( { name: "Mastering MongoDB" } ).collation( { locale: 'en' } ) // no value for strength, so it resets to the global default (3) instead of the default for the case_sensitive_books collection (1)
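The precedence described here (a collation given on the query wins over the collection default, and an omitted strength falls back to 3) can be sketched as follows. The helper `effectiveCollation` is hypothetical and models only locale and strength; it is not a MongoDB API.

```javascript
const DEFAULT_STRENGTH = 3; // global default strength

// Hypothetical helper: models which collation a query ends up using.
function effectiveCollation(collectionDefault, queryCollation) {
  // A collation given on the query replaces the collection default entirely.
  const chosen = queryCollation ?? collectionDefault;
  if (!chosen) return null; // no collation anywhere: simple binary comparison
  return {
    locale: chosen.locale,
    strength: chosen.strength ?? DEFAULT_STRENGTH,
  };
}

// Default collation of the case_sensitive_books collection from the text.
const collectionDefault = { locale: "en_US", strength: 1 };

const inherited = effectiveCollation(collectionDefault, undefined);
const explicitTertiary = effectiveCollation(collectionDefault, { locale: "en", strength: 3 });
const strengthOmitted = effectiveCollation(collectionDefault, { locale: "en" });
console.log({ inherited, explicitTertiary, strengthOmitted });
```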

Collation is a powerful and relatively new concept in MongoDB, so we will keep exploring it throughout the different chapters.
