Removing the Ceph OSD

Before scaling down the cluster or removing an OSD node, make sure that the cluster has enough free space to accommodate all the data present on the node you are planning to remove. The cluster should not reach its full ratio, which is the percentage of used disk space at which Ceph stops accepting writes to OSDs. As a best practice, do not remove an OSD or OSD node without first considering the impact on the full ratio. At the time of writing, ceph-ansible does not support scaling down OSD nodes in a cluster, so this must be done manually.
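Before you start, it is worth checking how much capacity the remaining OSDs will have to absorb the data and where the cluster stands relative to its full and near-full ratios. A minimal check, assuming the Ceph CLI is available on a monitor node (on older releases the ratios are reported by ceph pg dump rather than the OSD map dump):

# ceph df
# ceph osd df tree
# ceph osd dump | grep -i ratio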

  1. As we need to scale down the cluster, we will remove ceph-node4 and all of its associated OSDs from the cluster. The OSDs should first be marked out so that Ceph can start data recovery. From any of the Ceph nodes, mark the OSDs out of the cluster (a quick way to confirm which OSDs belong to ceph-node4 is sketched after these commands):
# ceph osd out osd.9
# ceph osd out osd.10
# ceph osd out osd.11
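If you are unsure which OSD IDs live on ceph-node4, you can check the CRUSH tree first. A minimal sketch, assuming three OSDs per node as in this example cluster:

# ceph osd tree | grep -A 3 ceph-node4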
  2. As soon as you mark the OSDs out of the cluster, Ceph will start rebalancing by migrating PGs from the OSDs that were marked out to other OSDs inside the cluster. Your cluster state will become unhealthy for some time, but it will keep serving data to clients. Depending on the number of OSDs removed, there might be some drop in cluster performance until recovery is complete. You can throttle the backfill and recovery as covered in the throttle backfill and recovery section of this chapter; a quick example is also sketched after this step.
    Once the cluster is healthy again, it should perform as usual:
# ceph -s

Here, you can see that the cluster is in recovery mode but is still serving data to clients at the same time. You can observe the recovery process using the following:

        # ceph -w
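If recovery traffic noticeably affects client I/O, the backfill and recovery settings can be lowered at runtime. A minimal sketch, assuming the default option names osd_max_backfills and osd_recovery_max_active; the values shown are conservative examples and should be tuned for your hardware:

# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

The values can be raised again once the rebalance has completed.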
  3. Although we have marked osd.9, osd.10, and osd.11 out of the cluster and they no longer participate in storing data, their services are still running. Let's stop these OSDs (an alternative, per-daemon method is sketched after this step):
root@ceph-node1 # systemctl -H ceph-node4 stop ceph-osd.target

Once the OSDs are down, check the OSD tree; you will observe that the OSDs are down and out:

# ceph osd tree
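If the -H (remote host) option is not available in your environment, or you prefer to stop the daemons individually, the same can be done from ceph-node4 itself. A sketch, assuming the standard systemd unit naming for Ceph OSD daemons:

root@ceph-node4 # systemctl stop ceph-osd@9
root@ceph-node4 # systemctl stop ceph-osd@10
root@ceph-node4 # systemctl stop ceph-osd@11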
 
  4. Now that the OSDs are no longer part of the Ceph cluster, let's remove them from the CRUSH map (a verification sketch follows these commands):
# ceph osd crush remove osd.9
# ceph osd crush remove osd.10
# ceph osd crush remove osd.11
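To confirm that the entries are gone from the CRUSH hierarchy, list the tree again; the removed OSDs should no longer appear under ceph-node4, although the host bucket itself remains until it is removed later in this procedure:

# ceph osd tree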
  5. As soon as the OSDs are removed from the CRUSH map, the Ceph cluster becomes healthy. You should also observe the OSD map; since we have not yet removed the OSDs from it, it will still show 12 OSDs, 9 UP, and 9 IN:
# ceph -s
  6. Remove the OSD authentication keys (a quick verification is sketched after these commands):
# ceph auth del osd.9
# ceph auth del osd.10
# ceph auth del osd.11
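To confirm that the keys are gone, you can list the cluster's authentication entries; a minimal sketch, assuming the client.admin keyring is available on the node:

# ceph auth list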
  7. Finally, remove the OSDs and check your cluster status; you should observe 9 OSDs, 9 UP, and 9 IN, and the cluster health should be OK (a quick check is sketched after these commands):
# ceph osd rm osd.9
# ceph osd rm osd.10
# ceph osd rm osd.11
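A quick way to confirm the OSD counts mentioned above (a sketch using the standard CLI):

# ceph osd stat
# ceph -s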
  8. To keep your cluster clean, perform some housekeeping; as we have removed all the OSDs from the CRUSH map, ceph-node4 no longer holds any items. Remove ceph-node4 from the CRUSH map; this removes all traces of this node from the Ceph cluster:
# ceph osd crush remove ceph-node4
  9. Once the OSD node has been removed from the cluster and the CRUSH map, run a final validation of the Ceph status to verify HEALTH_OK:
# ceph -s
  10. To complete the removal of ceph-node4 from the cluster, update the /etc/ansible/hosts file on ceph-node1 and remove ceph-node4 from the [osds] section so that the next time the playbook is run it will not redeploy ceph-node4 as an OSD node. A sketch of the resulting section is shown below.
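After the edit, the [osds] section should list only the remaining OSD nodes. A minimal sketch, assuming the remaining OSD nodes in this example cluster are ceph-node1, ceph-node2, and ceph-node3 (adjust to match your own inventory):

# /etc/ansible/hosts (excerpt)
[osds]
ceph-node1
ceph-node2
ceph-node3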