Checklists

Operations involve many tasks and a great deal of complexity. A good practice is to keep a set of checklists with all of the tasks that need to be performed, in order of priority. This helps ensure that nothing slips through. A deployment and security checklist, for example, could be as follows:

Hardware:
- Storage: How much disk space is needed per node? What is the growth rate?
- Storage technology: Do we need SSDs, or are HDDs sufficient? What is the throughput of our storage?
- RAM: What is the expected working set? Can it fit in RAM? If not, can we live with serving part of it from SSDs instead of HDDs? What is the growth rate?
- CPU: This usually isn't a concern for MongoDB, but it could be if we plan to run CPU-intensive jobs in our cluster (for example, aggregation or MapReduce).
- Network: What are the network links between servers? This is usually trivial if we are using a single data center, but it can get complicated if we have multiple data centers and/or offsite servers for disaster recovery.
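The storage and RAM questions above come down to simple arithmetic. The following is a minimal sketch, assuming compound monthly growth and the default WiredTiger cache size used by MongoDB 3.4 and later (the larger of 50% of RAM minus 1 GB, or 256 MB). The function names are hypothetical helpers for illustration, and treating the cache as the limit is deliberately conservative, since the filesystem cache also holds data.

```python
def projected_size_gb(current_gb: float, monthly_growth: float, months: int) -> float:
    """Project data size after `months`, assuming compound monthly growth (0.05 = 5%)."""
    return current_gb * (1 + monthly_growth) ** months


def months_until_exceeds(current_gb: float, monthly_growth: float, limit_gb: float) -> int:
    """Count whole months until the projected size first exceeds `limit_gb`."""
    if current_gb <= limit_gb and monthly_growth <= 0:
        raise ValueError("without positive growth the limit is never exceeded")
    months, size = 0, current_gb
    while size <= limit_gb:
        size *= 1 + monthly_growth
        months += 1
    return months


def wiredtiger_default_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache (3.4+): larger of 50% of (RAM - 1 GB) and 256 MB."""
    return max(0.5 * (ram_gb - 1), 0.25)


def working_set_fits(working_set_gb: float, ram_gb: float) -> bool:
    """Conservative check: does the working set fit in the default WiredTiger cache?"""
    return working_set_gb <= wiredtiger_default_cache_gb(ram_gb)
```

For example, 500 GB growing at 5% per month reaches roughly 898 GB after a year, and `months_until_exceeds(500, 0.05, 1000)` returns 15, so a 1 TB disk per node buys a little over a year. On a 64 GB node the default cache is 31.5 GB, so a 40 GB working set would not fit.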
Security:
- Enable authentication.
- Enable SSL.
- Disable REST/HTTP interfaces.
- Isolate our servers (for example, VPC).
- Enable authorization. With great power comes great responsibility. Make sure that the most powerful users are the ones we trust, and don't give potentially destructive privileges to inexperienced users.
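Most of the security items map to options in the mongod configuration file, so they can be audited automatically. The following is a minimal sketch, assuming the YAML configuration has already been parsed into a dictionary (for example, with PyYAML's `yaml.safe_load`). `get_option` and `audit_security` are hypothetical helpers; the option names are MongoDB's own, with `net.tls.mode` applying to 4.2+ and `net.ssl.mode` and `net.http` to older releases. This checks only the config file and does not replace reviewing users and roles.

```python
from typing import Any


def get_option(config: dict, dotted: str, default: Any = None) -> Any:
    """Look up a nested option such as 'security.authorization' in a parsed config."""
    node: Any = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def audit_security(config: dict) -> list:
    """Return human-readable findings; an empty list means every check passed."""
    findings = []
    # Access control: clients must authenticate, and roles then limit what they can do.
    if get_option(config, "security.authorization") != "enabled":
        findings.append("access control is off (security.authorization)")
    # Encryption in transit: net.tls.mode (4.2+) or the older net.ssl.mode.
    if (get_option(config, "net.tls.mode") != "requireTLS"
            and get_option(config, "net.ssl.mode") != "requireSSL"):
        findings.append("TLS/SSL is not required (net.tls.mode / net.ssl.mode)")
    # Isolation: mongod should not listen on every network interface.
    bind_ips = [ip.strip() for ip in str(get_option(config, "net.bindIp", "")).split(",")]
    if get_option(config, "net.bindIpAll") is True or "0.0.0.0" in bind_ips:
        findings.append("mongod listens on all interfaces (net.bindIp / net.bindIpAll)")
    # Legacy HTTP/REST interface (only present in releases before 3.6).
    if (get_option(config, "net.http.enabled") is True
            or get_option(config, "net.http.RESTInterfaceEnabled") is True):
        findings.append("legacy HTTP/REST interface is enabled (net.http)")
    return findings
```

A configuration with `security.authorization: enabled`, `net.tls.mode: requireTLS`, and `net.bindIp: 127.0.0.1,10.0.0.5` produces no findings; running this in CI against every deployed config file catches regressions before they reach production.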

A monitoring and operations checklist could be as follows:

Monitoring:
- Usage of hardware (CPU, memory, storage, and network).
- Health checks, using Pingdom or an equivalent service to make sure that we get a notification when one of our servers fails.
- Client performance monitoring: Run periodic mystery shopper tests, using the service the way a customer would, manually or automatically, to verify end to end that it behaves as expected. We don't want to learn about application performance issues from our customers.
- Use MongoDB Cloud Manager monitoring; it has a free tier, it provides useful metrics, and it is the tool that MongoDB engineers can look at if we run into issues and need their help, especially as part of a support contract.
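To illustrate what a basic health check does, here is a minimal sketch of a TCP reachability probe, the kind of check a service such as Pingdom runs on a schedule. It assumes mongod's default port 27017; `tcp_health_check` and `check_all` are hypothetical helpers. An open port does not prove that mongod is healthy, so a fuller check would also run the `ping` command through a driver (for example, PyMongo's `client.admin.command("ping")`).

```python
import socket
import time


def tcp_health_check(host: str, port: int = 27017, timeout: float = 2.0) -> tuple:
    """Return (reachable, seconds_taken) for a TCP connection attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:  # refused, timed out, unreachable network, DNS failure...
        return False, time.monotonic() - start


def check_all(nodes: list, alert=print) -> list:
    """Probe every (host, port) pair, call `alert` for each failure, return failures."""
    down = []
    for host, port in nodes:
        ok, elapsed = tcp_health_check(host, port)
        if not ok:
            alert(f"ALERT: {host}:{port} unreachable after {elapsed:.2f}s")
            down.append((host, port))
    return down
```

In practice, `alert` would page someone or post to a chat channel instead of printing, and the probe would run from outside our own network so that it also catches connectivity failures.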
Disaster recovery:
- Evaluate the risk: What is the risk, from a business perspective, of losing MongoDB data? Can we recreate this dataset? If yes, how costly is it in terms of time and effort?
- Devise a plan: Have a plan for each failure scenario, with the exact steps that we need to take in case something happens.
- Test the plan: A dry run of every recovery strategy is as important as having one. Many things can go wrong in disaster recovery, and having an incomplete plan (or one that fails at its purpose) is something that we shouldn't allow to happen under any circumstances.
- Have an alternative to the plan: No matter how well we devise a plan and test it, anything can go wrong during planning, testing, or execution. We need to have a backup plan for our plan, in case we can't recover our data using plan A. This is also called plan B, or the last resort plan. It doesn't have to be efficient, but it should alleviate any business reputation risks.
- Load test: We should make sure that we load test our application end to end before deployment, with a realistic workload. This is the only way to ensure that our application will behave as expected.
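A load test boils down to issuing many concurrent requests and examining the latency distribution rather than the average. The following is a minimal sketch of such a harness, assuming we supply `request`, a hypothetical callable that performs one realistic end-to-end operation against our application. It reports nearest-rank percentiles; a production tool would also count errors and ramp load up gradually.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def load_test(request: Callable[[], None], total: int, concurrency: int) -> dict:
    """Call `request` `total` times from `concurrency` threads and summarize latency."""
    def timed_call(_: int) -> float:
        start = time.perf_counter()
        request()  # one end-to-end operation; exceptions propagate to the caller
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total)))

    return {
        "count": len(latencies),
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "max": max(latencies),
    }
```

For example, `load_test(my_checkout_flow, total=1000, concurrency=50)` (where `my_checkout_flow` is our own hypothetical function) returns a summary whose `p95` and `p99` we can compare against the latency targets we promised our users.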
