Monitoring replication

Replica sets use the operations log (oplog) to keep their members in sync. Every write operation is first applied on the primary server and then written to the primary's oplog, which is a capped collection. Secondaries read this oplog asynchronously and apply the same operations in order.

If the primary server becomes overloaded, the secondaries may not be able to read and apply the operations fast enough, which generates replication lag. Replication lag is measured as the time difference between the last operation applied on the primary and the last operation applied on the secondary, based on the timestamps stored in the oplog capped collection.

For example, if the time is 4:30:00 PM and the secondary has just applied an operation that was applied on our primary server at 4:25:00 PM, the secondary is lagging five minutes behind our primary server.
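The lag calculation can be sketched in Python as follows. This is a minimal sketch, assuming a status document shaped like the output of MongoDB's replSetGetStatus command (a "members" list whose entries carry "name", "stateStr" and "optimeDate"); with PyMongo such a document could be obtained with client.admin.command("replSetGetStatus"). The function name replication_lag and the host names in the sample are hypothetical, and the sample times simply mirror the 4:30/4:25 example.

```python
from datetime import datetime, timedelta


def replication_lag(status: dict) -> dict[str, timedelta]:
    """Return how far each SECONDARY lags behind the PRIMARY.

    Assumes `status` has the shape of replSetGetStatus output: a "members"
    list whose entries have "name", "stateStr" and "optimeDate" (a datetime).
    """
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no PRIMARY found in replica set status")
    primary_optime = primaries[0]["optimeDate"]

    # Lag = time of the primary's last applied operation minus the time of the
    # secondary's last applied operation. Arbiters and other states are skipped.
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status: primary's last operation at 4:30:00 PM,
# secondary's last applied operation at 4:25:00 PM.
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 16, 30, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 16, 25, 0)},
    ]
}

lags = replication_lag(sample_status)
print(lags["db2:27017"].total_seconds())  # prints 300.0
```

Comparing optimeDate values rather than the current wall-clock time keeps the measurement tied to the oplog timestamps, which matches the definition of lag given above.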

In our production cluster, the replication lag should be close to (or equal to) zero.
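Because lag in production should stay near zero, a monitoring check can flag any member that drifts above a small tolerance. The sketch below assumes lag values have already been computed in seconds per member; the function name lagging_members and the 10-second default threshold are hypothetical choices, not MongoDB settings.

```python
def lagging_members(lag_seconds: dict[str, float],
                    max_lag_seconds: float = 10.0) -> list[str]:
    """Return members whose lag exceeds max_lag_seconds, worst first.

    `lag_seconds` maps a member name to its replication lag in seconds.
    The default threshold is an illustrative assumption.
    """
    over = [(name, lag) for name, lag in lag_seconds.items()
            if lag > max_lag_seconds]
    # Sort so the most-lagged member is reported first.
    over.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in over]


print(lagging_members({"db2:27017": 0.0, "db3:27017": 300.0}))  # prints ['db3:27017']
```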
