Re-syncing a member of a replica set

Secondaries sync up with the primary by replaying the contents of the oplog. If our oplog is not large enough, or if we encounter network issues (partitioning, an underperforming network, or just an outage of the secondary server) for a period of time larger than the oplog, then MongoDB cannot use the oplog to catch up to the primary anymore.

At this point, we have two options:

  • The more straightforward option is to delete our dbpath directory and restart the mongod process. In this case, MongoDB will start an initial sync from scratch. This option has the downside of putting a strain on our replica set, and our network, as well.
  • The more complicated (from an operational standpoint) option is to copy data files from another well-behaving member of the replica set. This goes back to the contents of Chapter 8, Monitoring, Backup, and Security. The important thing to keep in mind is that a simple file copy will probably not suffice, as data files will have changed from the time that we started copying to the time that the copying ended.

Thus, we need to be able to take a snapshot copy of the filesystem under our data directory.

Another point of consideration is that by the time we start our secondary server with the newly copied files, our MongoDB secondary server will try to sync up to the primary using the oplog again. So, if our oplog has fallen so far behind the primary that it can't find the entry on our primary server, this method will fail, too.

Keep a sufficiently sized oplog. Don't let data grow out of hand in any replica set member. Design, test, and deploy sharding early on.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset