Sharding limitations

Sharding comes with great flexibility. Unfortunately, there are a few limitations in the way that we perform some of the operations.

We will highlight the most important ones in the following list:

  • The group() database command does not work. The group() command should not be used anyway; use aggregate() and the aggregation framework instead, or mapreduce().
  • The db.eval() command does not work and should be disabled in most cases for security reasons.
  • The $isolated option for updates does not work. This is a functionality that is missing in sharded environments. The $isolated option for update() provides the guarantee that, if we update multiple documents at once, other readers and writers will not see some of the documents updated with the new value, and the others will still have the old value. The way this is implemented in unsharded environments is by holding a global write lock and/or serializing operations to a single thread to make sure that every request for the documents affected by update() will not be accessed by other threads/operations. This implementation means that it is not performant and does not support any concurrency, which makes it prohibitive to allow the $isolated operator in a sharded environment.
  • The $snapshot operator for queries is not supported. The $snapshot operator in the find() cursor prevents documents from appearing more than once in the results, as a result of being moved to a different location on the disk after an update. The $snapshot operator is operationally expensive and often not a hard requirement. The way to substitute it is by using an index for our queries on a field whose keys will not change for the duration of the query.

  • The indexes cannot cover our queries if our queries do not contain the shard key. Results in sharded environments will come from the disk and not exclusively from the index. The only exception is if we query only on the built-in _id field and return only the _id field, in which case, MongoDB can still cover the query using built-in indexes.
  • The update() and remove() operations work differently. All update() and remove() operations in a sharded environment must include either the _id of the documents that are to be affected or the shard key; otherwise, the mongos router will have to do a full table scan across all collections, databases, and shards, which would be operationally very expensive.
  • Unique indexes across shards need to contain the shard key as a prefix of the index. In other words, to achieve uniqueness of documents across shards, we need to follow the data distribution that MongoDB follows for the shards.
  • The shard key has to be up to 512 bytes in size. The shard key index has to be in ascending order on the key field that gets sharded and optionally other fields as well, or a hashed index on it.

The shard key value in a document is also immutable. If our shard key for our User collection is email, then we cannot update the email value for any user after we set it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset