Building indexes on replica sets

In a replica set, when we issue a createIndex() command, the secondaries begin building the index only after the primary has finished building it. Similarly, in a sharded environment, the primary of each shard builds the index first, and that shard's secondaries start once their primary has finished.

The recommended approach to building indexes in a replica set is to take one secondary at a time through the following steps:

  • Stop one secondary in the replica set
  • Restart it as a standalone server on a different port
  • Build the index from the shell while it runs as a standalone server
  • Restart the server as a member of the replica set
  • Allow the secondary to catch up with the primary
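The last step, waiting for the secondary to catch up, can be checked from the shell. The following is a minimal sketch, assuming a status document shaped like the output of rs.status() (a members array whose entries carry name, stateStr, and optimeDate). The function name replicationLagSeconds and the host names in the sample document are hypothetical.

```javascript
// Estimate how far a member is behind the primary, in seconds,
// from a document shaped like the output of rs.status().
function replicationLagSeconds(status, memberName) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const member = status.members.find((m) => m.name === memberName);
  if (!primary || !member) {
    throw new Error("primary or member not found in status document");
  }
  // optimeDate is the time of the last oplog entry the member has applied.
  const lagMs = primary.optimeDate.getTime() - member.optimeDate.getTime();
  return Math.max(0, lagMs / 1000);
}

// Hand-written sample document with hypothetical hosts and times.
const exampleStatus = {
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T10:00:30Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T10:00:00Z") },
  ],
};
const exampleLag = replicationLagSeconds(exampleStatus, "db2.example.net:27017");
console.log(exampleLag);
```

In a live shell, the same function could be called as replicationLagSeconds(rs.status(), "host:port"). A lag close to zero, with the member reporting the SECONDARY state, suggests that it has caught up.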

We need a large enough oplog on the primary to make sure that the secondary can catch up once it reconnects. The oplog size is defined in MB in the configuration, and it determines how many operations are retained in the primary's log. For illustration, if the oplog could only hold the last 100 operations, and 101 or more operations happened while the secondary was offline, the secondary would not be able to sync with the primary, because the oldest entries it needs would already have been overwritten. This is a consequence of the oplog being a fixed-size (capped) collection: the primary cannot keep its full history of operations for the secondary to replay. Building indexes in replica sets is a manual process, involving several steps for each primary and secondary server.
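Before taking a secondary offline, it may help to compare the time span currently covered by the oplog with the expected downtime. This is a minimal sketch, assuming an input shaped like the result of db.getReplicationInfo(), whose timeDiff field is the number of seconds between the first and last oplog entries. The function name, the safetyFactor parameter, and the sample numbers are assumptions made for illustration.

```javascript
// Decide whether the oplog window is long enough for a planned downtime.
// replInfo is shaped like the result of db.getReplicationInfo();
// safetyFactor is an assumed margin, not a MongoDB setting.
function oplogCoversDowntime(replInfo, downtimeSeconds, safetyFactor = 2) {
  return replInfo.timeDiff >= downtimeSeconds * safetyFactor;
}

// Made-up numbers: a 1 GB oplog currently spanning two hours of operations.
const sampleInfo = { logSizeMB: 1024, usedMB: 900, timeDiff: 7200 };
console.log(oplogCoversDowntime(sampleInfo, 1800)); // true: 7200 >= 3600
console.log(oplogCoversDowntime(sampleInfo, 5400)); // false: 7200 < 10800
```

In a live shell, the call would be oplogCoversDowntime(db.getReplicationInfo(), expectedSeconds). Because the window depends on the write rate, it is worth measuring during the busiest period.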

This approach can be repeated for each secondary server in the replica set. Then, for the primary server, we can do either of these things:

  • Build the index in the background
  • Step down the primary by using rs.stepDown(), and repeat the preceding process with the server as a secondary
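The order described above (every secondary first, the primary last) can be derived from the replica set status. This is a minimal sketch, assuming a document shaped like the output of rs.status(); the function name rollingBuildOrder and the host names are hypothetical.

```javascript
// Order the members for a rolling index build: secondaries first, primary last.
function rollingBuildOrder(status) {
  const namesInState = (state) =>
    status.members.filter((m) => m.stateStr === state).map((m) => m.name);
  return namesInState("SECONDARY").concat(namesInState("PRIMARY"));
}

// Hand-written sample document with hypothetical hosts.
const sampleStatus = {
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY" },
    { name: "db2.example.net:27017", stateStr: "SECONDARY" },
    { name: "db3.example.net:27017", stateStr: "SECONDARY" },
  ],
};
console.log(rollingBuildOrder(sampleStatus));
```

When the primary's turn comes, running rs.stepDown(60) on it asks it to step down and stay ineligible for re-election for 60 seconds, after which it can be treated like any other secondary.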

With the second option, when the primary steps down, there will be a brief period during which our cluster won't accept any writes. Our application shouldn't time out during this period, which usually lasts less than 30-60 seconds.

Building an index in the background on the primary will build it in the background on the secondaries too. This may impact writes on our servers during index creation but, on the plus side, it involves no manual steps. It is always a good idea to have a staging environment that mirrors production, and to dry-run operations that affect the live cluster in staging first, in order to avoid surprises.
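A background build on the primary could look like the following minimal sketch. It assumes a database handle such as the shell's db object; the helper name buildIndexInBackground and the collection and field names in the usage comment are hypothetical. The background option applies to the server versions this text describes; newer MongoDB releases changed how index builds work and ignore it.

```javascript
// Build an index in the background through a database handle
// (in the mongo shell, the global db object).
function buildIndexInBackground(database, collectionName, keys) {
  return database
    .getCollection(collectionName)
    .createIndex(keys, { background: true });
}

// Usage in the shell, connected to the primary (hypothetical names):
// buildIndexInBackground(db, "orders", { customerId: 1 });
```

Running it against staging first gives a rough idea of how long the build takes and how much it slows writes.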
