The concept of quorum is fundamental to all consensus algorithms that are designed to access information in a fault-tolerant distributed system. The minimum number of votes necessary to achieve consensus among a set of nodes is called a quorum. In Ceph, MONs persist the operations that change the cluster state. They must agree on the global order of these operations and record them synchronously, so an active quorum of MON nodes is required in order to make progress. To keep a cluster operational, a quorum (that is, a strict majority) of MON nodes needs to be available at all times. Mathematically, we need floor(n/2)+1 MON nodes available, where n is the total number of MONs provisioned and the division is rounded down. For example, if we have 5 MONs, we need at least floor(5/2)+1 = 2+1 = 3 MONs available at all times. Ceph considers all MONs equal when testing for a majority, so any three working MONs in this example are enough to form the quorum.
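The majority rule above can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic, not part of Ceph itself; the function names quorum_size and tolerated_failures are hypothetical and chosen for this example.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest strict majority of total_mons, i.e. floor(n/2) + 1."""
    if total_mons < 1:
        raise ValueError("at least one MON is required")
    return total_mons // 2 + 1


def tolerated_failures(total_mons: int) -> int:
    """How many MONs may be down while a majority is still available."""
    return total_mons - quorum_size(total_mons)


if __name__ == "__main__":
    # Print the majority size and failure tolerance for small clusters.
    for n in range(1, 8):
        print(f"MONs={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this rule, 4 MONs tolerate only one failure, the same as 3 MONs, which is why an odd number of MONs is the usual choice.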
We can obtain the cluster quorum status by using the quorum_status subcommand of our trusty, Swiss Army Knife-like ceph utility.
root@ceph-client0:~# ceph quorum_status
{
    "election_epoch": 6,
    "quorum": [
        0
    ],
    "quorum_names": [
        "ceph-mon0"
    ],
    "quorum_leader_name": "ceph-mon0",
    "monmap": {
        "epoch": 1,
        "fsid": "e6d4e4ab-f59f-470d-bb76-511deebc8de3",
        "modified": "2017-09-10 20:20:16.458985",
        "created": "2017-09-10 20:20:16.458985",
        "mons": [
            {
                "rank": 0,
                "name": "ceph-mon0",
                "addr": "192.168.42.10:6789/0"
            }
        ]
    }
}
The output is displayed in JSON format. Let's discuss the fields below:
- election_epoch: A counter that indicates the number of elections that have been proposed and completed to date.
- quorum: A list of the ranks of the MON nodes currently in the quorum. Each active MON is associated with a unique rank value within the cluster. The value is an integer and starts at zero.
- quorum_names: The unique identifier of each MON process in the quorum.
- quorum_leader_name: The identifier of a MON that is elected to be the leader of the ensemble. When a client first talks to the cluster, it acquires all the necessary information and cluster maps from the current or acting leader node.
- monmap: The dump of the cluster's MON map.
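Because the output is JSON, these fields are easy to inspect from a script. The following is a minimal sketch that assumes the output has the same shape as the example above. The helper name summarize_quorum and the SAMPLE string are hypothetical and exist only for illustration. On a live cluster, the text would come from the ceph quorum_status command instead.

```python
import json

# Trimmed copy of the example output shown above (illustrative only).
SAMPLE = """
{
  "election_epoch": 6,
  "quorum": [0],
  "quorum_names": ["ceph-mon0"],
  "quorum_leader_name": "ceph-mon0",
  "monmap": {
    "epoch": 1,
    "mons": [
      {"rank": 0, "name": "ceph-mon0", "addr": "192.168.42.10:6789/0"}
    ]
  }
}
"""


def summarize_quorum(status_json: str) -> dict:
    """Summarize quorum membership from quorum_status-style JSON text."""
    status = json.loads(status_json)
    all_mons = [mon["name"] for mon in status["monmap"]["mons"]]
    members = status["quorum_names"]
    needed = len(all_mons) // 2 + 1  # strict majority of provisioned MONs
    return {
        "leader": status["quorum_leader_name"],
        "members": members,
        # MONs listed in the MON map but absent from the quorum
        "missing": [name for name in all_mons if name not in members],
        "needed": needed,
        "has_majority": len(members) >= needed,
    }


if __name__ == "__main__":
    print(summarize_quorum(SAMPLE))
```

For the single-MON example, the summary reports ceph-mon0 as leader, no missing MONs, and a majority of one out of one.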
Ceph's Luminous release adds a new configuration setting called mon priority that lets us adjust the priorities of MONs regardless of the values of their IP addresses, which otherwise determine their ordering. This lets us define a custom order in which MONs take over as leader when the previous leader dies. It also allows us to change the existing leader dynamically and in a controlled manner. We might, for example, switch the leader a day before performing firmware upgrades or a disk or chassis replacement, so that the former lead server going down does not trigger an election at a time when we need to concentrate on other priorities.