MON settings

In Ceph releases beginning with Jewel, the MONs do a pretty good job of managing their databases. This was not necessarily the case in older releases. MON DB size is a function of both the number of OSDs and PGs in your cluster and of how much topology churn is going on. During periods of heavy recovery or node maintenance, the /var/lib/ceph/mon DB can grow to tens of GBs. This can become problematic on systems that provision meager filesystem space, and in some cases it can impact MON responsiveness and thus overall cluster snappiness.
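
A quick way to keep an eye on this is to check the size of each MON's store on the MON hosts. The path below is the default data directory mentioned above; the exact subdirectory name will depend on your MON names:

# Run on each MON host
du -sh /var/lib/ceph/mon/*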

Even on Jewel and later releases, this setting is recommended; it directs MON daemons to sweep stale entries from their DBs at startup time:

[mon]
mon_compact_on_start = true
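
If a MON's DB has already ballooned and you do not want to wait for a restart, you can also ask a running MON to compact its store on demand. The <id> below is a placeholder for one of your MON names:

ceph tell mon.<id> compact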

Another valuable setting is the mouthful mon_osd_down_out_subtree_limit. This affects how Ceph behaves when components go down:

[mon]
mon_osd_down_out_subtree_limit = host

This behavior, like the name, is tricky. The setting is defined as the smallest CRUSH bucket type that will not automatically be marked out: if everything underneath a CRUSH bucket of the specified type fails at once, those items will not have the out state applied to them. With the default value of rack, host and OSD buckets will still be marked out if they enter the down state due to failure, and the cluster will begin recovery to restore the replication policy.

If we change the value to host, then if an entire OSD node suddenly bites the dust, Ceph will not mark its OSDs out. If we have a replicated cluster with a failure domain of rack, the loss of an entire host at once will no longer trigger recovery. The idea is that most of the time a host can be brought back up quickly, say with a hard reset because it wedged during reboot, or perhaps we installed a bad kernel that we need to remove before a second reboot.
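
If you want to confirm what a running MON is actually using, or try the value out before committing it to ceph.conf, a sketch like the following should work. The <id> is a placeholder for one of your MON names, and note that injectargs changes do not survive a daemon restart:

# On a MON host: show the value the daemon is currently running with
ceph daemon mon.<id> config get mon_osd_down_out_subtree_limit

# Apply at runtime to all MONs; not persistent across restarts
ceph tell mon.* injectargs '--mon_osd_down_out_subtree_limit=host'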

If the host is legitimately dead, then we can run backfill/recovery on our own terms and at our own rate, say with the ceph-gentle-reweight script we used in Chapter 19, Operations and Maintenance.
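
As a rough sketch of what running recovery on our own terms might look like without that script, we could mark the dead host's OSDs out one at a time and let the cluster settle in between. The OSD IDs here are hypothetical, and the health check is deliberately crude:

# 10, 11, and 12 are placeholders for the dead host's OSD IDs
for id in 10 11 12; do
    ceph osd out $id
    # wait for recovery to quiesce before draining the next OSD
    until ceph health | grep -q HEALTH_OK; do sleep 60; done
done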

This option is subtle and can be difficult to reason about. If you aren't sure it's right for you, stick with the default.

Many other settings are possible and may benefit your deployment. We do not want to overwhelm you with a raft of settings that you aren't comfortable with and which may not be right for your installation's unique mix of hardware, versions, and use-cases. Once you are conversant with Ceph's components and the dynamics of your clusters, we suggest perusing the larger set of settings detailed at http://docs.ceph.com and the ceph-users mailing list archives.
