Changing the shard key

There is no command or simple procedure to change the shard key in MongoDB. The only way to change the shard key involves backing up and restoring all of our data, something that may range from being extremely difficult to impossible in high-load production environments.

The following are the steps that we need to go through in order to change the shard key:

  1. Export all data from MongoDB
  2. Drop the original sharded collection
  3. Configure sharding with the new key
  4. Presplit the new shard key range
  5. Restore our data back into MongoDB

Of these steps, step 4 is the one that needs further explanation.

MongoDB uses chunks to split data in a sharded collection. If we bootstrap a MongoDB sharded cluster from scratch, chunks will be calculated automatically by MongoDB. MongoDB will then distribute the chunks across different shards to ensure that there are an equal number of chunks in each shard.

The only time when we cannot really do this is when we want to load data into a newly sharded collection.

The reasons for this are threefold:

  • MongoDB creates splits only after an insert operation.
  • Chunk migration will copy all of the data in that chunk from one shard to another.
  • The floor(n/2) chunk migrations can happen at any given time, where n is the number of shards we have. Even with three shards, this is only a floor(1.5)=1 chunk migration at a time.

These three limitations mean that letting MongoDB figure it out on its own will definitely take much longer, and may result in an eventual failure. This is why we want to presplit our data and give MongoDB some guidance on where our chunks should go.

In our example of the mongo_books database and the books collection, this would be as follows:

> db.runCommand( { split : "mongo_books.books", middle : { id : 50 } } )

The middle command parameter will split our key space in documents that have id less than or equal to 50 and documents that have id greater than 50. There is no need for a document to exist in our collection with id that is equal to 50 as this will only serve as the guidance value for our partitions.

In this example, we chose 50, as we assume that our keys follow a uniform distribution (that is, there is the same count of keys for each value) in the range of values from 0 to 100.

We should aim to create at least 20-30 chunks to grant MongoDB flexibility in potential migrations. We can also use bounds and find instead of middle if we want to manually define the partition key, but both parameters need data to exist in our collection before applying them.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset