Multikey indexes

Indexing scalar (single) values was explained in the preceding sections. However, one of the advantages that we get from using MongoDB is the ability to easily store vector values in the form of arrays.

In the relational world, storing arrays is generally frowned upon, as it violates the normal forms. In a document-oriented database such as MongoDB, it is frequently a part of our design, as we can store and query easily on complex structures of data.

Indexing arrays of documents is achieved by using the multikey index. A multikey index can store both arrays of scalar values and arrays of nested documents.

Creating a multikey index is the same as creating a regular index:

> db.books.createIndex({"tags":1})

Assume that we have created a document in our books collection, using the following command:

> db.books.insert({"name": "MongoDB Multikeys Cheatsheet", "isbn": "1002", "available": 1, "meta_data": {"page_count":128, "average_customer_review":3.9}, "tags": ["mongodb", "index","cheatsheet","new"] })

Our new index will be a multikey index, allowing us to find documents with any of the tags stored in our array:

> db.books.find({tags:"new"})
{
"_id" : ObjectId("5969f4bc14ae9238fe76d7f2"),
"name" : "MongoDB Multikeys Cheatsheet",
"isbn" : "1002",
"available" : 1,
"meta_data" : {
"page_count" : 128,
"average_customer_review" : 3.9
},
"tags" : [
"mongodb",
"index",
"cheatsheet",
"new"
]
}
>

We can also create compound indexes with a multikey index, but we can have, at the most, one array in each and every index document. Given that in MongoDB we don't specify the type of each field, this means that creating an index with two or more fields with an array value will fail at creation time, and trying to insert a document with two or more fields as arrays will fail at insertion time.

For example, a compound index on tags, analytics_data will fail to be created if we have the following document in our database:

{
"_id" : ObjectId("5969f71314ae9238fe76d7f3"),
"name": "Mastering parallel arrays indexing",
"tags" : [
"A",
"B"
],
"analytics_data" : [
"1001",
"1002"
]
}

> db.books.createIndex({tags:1, analytics_data:1})
{
"ok" : 0,
"errmsg" : "cannot index parallel arrays [analytics_data] [tags]",
"code" : 171,
"codeName" : "CannotIndexParallelArrays"
}

Consequently, if we first create the index on an empty collection and try to insert this document, the insert will fail, with the following error:

> db.books.find({isbn:"1001"}).hint("international_standard_book_number_index").explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mongo_book.books",
"indexFilterSet" : false,
"parsedQuery" : {
"isbn" : {
"$eq" : "1001"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"isbn" : 1
},
"indexName" : "international_standard_book_numbe
r_index",
"isMultiKey" : false,
"multiKeyPaths" : {
"isbn" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"isbn" : [
"["1001", "1001"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "PPMUMCPU0142",
"port" : 27017,
"version" : "3.4.7",
"gitVersion" : "cf38c1b8a0a8dca4a11737581beafef4fe120bcd"
},
"ok" : 1
 Hashed indexes cannot be multikey indexes.

Another limitation that we will likely run into when trying to fine-tune our database is that multikey indexes cannot entirely cover a query. Covering a query with the index means that we can get our result data entirely from the index, without accessing the data in our database at all. This can result in dramatically increased performance, as indexes are most likely to be stored in RAM.

Querying for multiple values in multikey indexes will result in a two-step process, from the index's perspective.

In the first step, index will be used to retrieve the first value of the array, and then a sequential scan will run through the rest of the elements in the array; an example is as follows:

> db.books.find({tags: [ "mongodb", "index", "cheatsheet", "new" ] })

This will first search for all entries in multikey index tags that have a mongodb value, and will then sequentially scan through them to find the ones that also have the index, cheatsheet, and new tags.

A multikey index cannot be used as a shard key. However, if the shard key is a prefix index of a multikey index, it can be used. We will cover more on this in Chapter 13, Sharding.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset