Hash-based sharding

If we don't have a shard key (or can't create one) that achieves the three goals mentioned previously, we can use hash-based sharding as an alternative strategy. In this case, we are trading query isolation for better data distribution.

Hash-based sharding hashes the values of our shard key in a way that guarantees a close-to-uniform distribution, so we can be sure that our data will be spread evenly across the shards. The downside is that only exact-match (equality) queries get routed to the single shard that holds the value. Any range query has to be broadcast to all the shards and fetch data from each of them.
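To illustrate the trade-off, here is a minimal Node.js sketch that models a hashed shard key on a hypothetical three-shard cluster. It is a simplified model, not MongoDB's implementation: the MD5-based `hashKey` is a stand-in hash, and splitting the hash space into `NUM_SHARDS` equal ranges is an assumption made for illustration. `shardFor`, `distribution`, `shardsForEquality`, and `shardsHoldingRange` are hypothetical helpers.

```javascript
const crypto = require("crypto");

// Hypothetical cluster size; real clusters assign hashed ranges as chunks.
const NUM_SHARDS = 3;

// Stand-in hash (assumption): first 4 bytes of an MD5 digest as an
// unsigned 32-bit integer. MongoDB's actual hashed-index function differs.
function hashKey(value) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0);
}

// Map a hash to a shard by splitting the 32-bit hash space into equal ranges.
function shardFor(value) {
  const rangeSize = 2 ** 32 / NUM_SHARDS;
  return Math.floor(hashKey(value) / rangeSize);
}

// Insert ids 1..n and count how many documents land on each shard.
function distribution(n) {
  const counts = new Array(NUM_SHARDS).fill(0);
  for (let id = 1; id <= n; id++) counts[shardFor(id)]++;
  return counts;
}

// Equality query: hash the value once and target exactly one shard.
function shardsForEquality(value) {
  return [shardFor(value)];
}

// Range query: consecutive ids hash to unrelated positions, so the ids in
// a range end up scattered across shards and every shard must be queried.
function shardsHoldingRange(lo, hi) {
  const hit = new Set();
  for (let id = lo; id <= hi; id++) hit.add(shardFor(id));
  return [...hit].sort((a, b) => a - b);
}

console.log(distribution(9000));         // roughly equal counts per shard
console.log(shardsForEquality(42));      // a single shard index
console.log(shardsHoldingRange(1, 100)); // in practice, every shard
```

Even a narrow range such as ids 1 to 100 touches every shard, which is exactly why range queries on a hashed shard key become scatter-gather operations.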

For our example database and collection (mongo_books and books, respectively), we shard the collection as follows:

> sh.shardCollection("mongo_books.books", { id: "hashed" })

Similar to the preceding example, we are now using the id field as our hashed shard key.

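Suppose we use a field with floating-point values as a hashed shard key. MongoDB truncates floating-point values to 64-bit integers before hashing them, so values such as 2.2, 2.3, and 2.9 all produce the same hash and collide. In addition, hashed indexes do not support floating-point values larger than 2^53. Such fields should be avoided where possible.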
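The following sketch makes the floating-point caveat concrete. MongoDB's documentation states that hashed indexes truncate floating-point values to 64-bit integers before hashing; here `Math.trunc` is a stand-in for that internal conversion, and `truncateBeforeHash` is a hypothetical helper. The second part shows why 2^53 matters: beyond it, a double can no longer represent every integer exactly.

```javascript
// Stand-in (assumption) for MongoDB's float-to-integer conversion
// that happens before a hashed index computes its hash.
function truncateBeforeHash(value) {
  return Math.trunc(value);
}

// 2.2, 2.3 and 2.9 all truncate to 2, so they would hash identically.
const truncated = [2.2, 2.3, 2.9].map(truncateBeforeHash);
console.log(truncated); // [ 2, 2, 2 ]

// At 2^53 and above, adjacent integers collapse to the same double,
// so distinct key values are already indistinguishable before hashing.
const limit = 2 ** 53;
console.log(limit === limit + 1);       // true
console.log(Number.isSafeInteger(limit)); // false
```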
