Backups

A well-known maxim goes as follows:

"Hope for the best, plan for the worst."
    – John Jay (1813)

This should be our approach when designing a backup strategy for MongoDB, because several distinct kinds of failure can occur.

Backups should be the cornerstone of our disaster recovery strategy. Some developers rely on replication alone for disaster recovery, as having three copies of our data seems more than enough: if one of the copies is lost, we can always rebuild the cluster from the other two.

This works when a disk fails. Disk failure is one of the most common failures in a production cluster, and will statistically happen once the disks start approaching their mean time between failures (MTBF).

However, disk failure is not the only failure event that can happen. Security incidents and plain human error are just as likely, and should be part of our plan as well. Catastrophic failures that take out all replica set members at once, whether from a fire, a flood, an earthquake, or a disgruntled employee, are events that should not lead to production data loss.

A useful interim option, in the middle ground between replication and proper backups, is to set up a delayed replica set member. This member can lag several hours or days behind the primary server, so that malicious changes on the primary do not reach it until the delay has elapsed. The important detail to take into account is that the oplog must be sized to hold at least as many hours of operations as the configured delay. This solution is only an interim one: it doesn't cover the full range of reasons why we need disaster recovery, but it can definitely help with a subset of them.
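As a rough illustration of the delayed member described above, the following Python sketch builds a replica set configuration document with one extra hidden, delayed member, and checks that the oplog window is long enough for the chosen delay. The helper names (add_delayed_member, oplog_covers_delay), the host names, and the safety factor are assumptions made for this example. The field secondaryDelaySecs applies to MongoDB 5.0 and later; earlier releases use slaveDelay instead.

```python
import copy


def add_delayed_member(config, host, delay_secs):
    """Return a copy of a replica set config with a delayed member appended.

    `config` is a plain dict shaped like the document returned by the
    replSetGetConfig command. The input dict is not modified.
    """
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": 0,    # a delayed member must never be elected primary
        "hidden": True,   # keep application reads away from stale data
        # MongoDB 5.0+ field name; older releases use "slaveDelay".
        "secondaryDelaySecs": delay_secs,
    })
    # A reconfiguration must carry a higher version than the current one.
    new_config["version"] = new_config.get("version", 1) + 1
    return new_config


def oplog_covers_delay(oplog_window_hours, delay_secs, safety_factor=2.0):
    """Check that the oplog window exceeds the delay by a safety margin.

    The safety factor is an assumption for this sketch, not a MongoDB rule.
    """
    return oplog_window_hours * 3600 >= delay_secs * safety_factor


if __name__ == "__main__":
    # Hypothetical three-member replica set.
    current = {
        "_id": "rs0",
        "version": 3,
        "members": [
            {"_id": 0, "host": "db0.example.net:27017"},
            {"_id": 1, "host": "db1.example.net:27017"},
            {"_id": 2, "host": "db2.example.net:27017"},
        ],
    }
    eight_hours = 8 * 3600
    updated = add_delayed_member(current, "db3.example.net:27017", eight_hours)
    print(updated["members"][-1])
    print(oplog_covers_delay(oplog_window_hours=24, delay_secs=eight_hours))
```

With PyMongo, the current configuration could be read with client.admin.command("replSetGetConfig")["config"] and the updated document applied with client.admin.command("replSetReconfig", updated); both are left out of the sketch so that it runs without a live cluster.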

Preparing for catastrophic failures, such as the loss of an entire replica set, is called disaster recovery. Disaster recovery requires backups to be taken not only regularly, but also by using a process that isolates them (both geographically and in terms of access rules) from our production data.
