Understanding the Ceph Filesystem and MDS

The Ceph Filesystem offers a POSIX-compliant distributed filesystem of any size that uses Ceph RADOS to store its data. To implement the Ceph Filesystem, you need a running Ceph storage cluster and at least one Ceph Metadata Server (MDS) to manage the filesystem metadata and keep it separate from the data, which reduces complexity and improves reliability. The following diagram depicts the architectural view of Ceph FS and its interfaces:
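
As a quick illustration of what this involves in practice, the following commands sketch how an MDS might be deployed and a filesystem created; the node name ceph-node1, the pool names cephfs_data and cephfs_metadata, and the placement group count are placeholders for your own environment:

    # Deploy a metadata server daemon on a node that is already part of the cluster
    ceph-deploy mds create ceph-node1

    # Create the data and metadata pools, then create the filesystem on top of them
    ceph osd pool create cephfs_data 64
    ceph osd pool create cephfs_metadata 64
    ceph fs new cephfs cephfs_metadata cephfs_data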

The libcephfs libraries play an important role in supporting the multiple client implementations. Ceph FS has native Linux kernel driver support, so clients can mount the filesystem natively, for example, using the mount command. It has tight integration with SAMBA and support for CIFS and SMB. Ceph FS extends its support to Filesystem in Userspace (FUSE) through the ceph-fuse module. It also allows direct application interaction with the RADOS cluster using the libcephfs libraries. Ceph FS is gaining popularity as a replacement for Hadoop HDFS. Previous versions of HDFS supported only a single NameNode, which limited scalability and created a single point of failure; this has changed in current versions of HDFS. Unlike HDFS, Ceph FS can be implemented over multiple MDS nodes in an active-active state, making it highly scalable and high performing, with no single point of failure.
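
To give a flavour of the two most common client paths before the detailed recipes later on, the following is a minimal sketch of a kernel mount and a FUSE mount; the monitor address 192.168.1.101, the mount point /mnt/cephfs, and the secret file path are assumptions for illustration and would come from your own cluster configuration:

    # Kernel client: mount natively via the Linux kernel driver
    mount -t ceph 192.168.1.101:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # FUSE client: mount through the ceph-fuse userspace module
    ceph-fuse -m 192.168.1.101:6789 /mnt/cephfs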

Ceph MDS is required only for Ceph FS; the other storage methods, block and object storage, do not require MDS services. Ceph MDS operates as a daemon, which allows clients to mount a POSIX filesystem of any size. MDS does not serve any data directly to clients; data is served only by the OSDs. MDS provides a shared coherent filesystem with a smart caching layer, which drastically reduces reads and writes. Its benefits extend to dynamic subtree partitioning, where a single MDS is authoritative for any given piece of metadata. It is dynamic in nature; daemons can join and leave, and taking over for failed nodes is quick.
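
Once an MDS daemon is running, you can check how many daemons are active and standby, and inspect the metadata cluster map; the output will naturally depend on your own cluster:

    # Show a one-line summary of the MDS map (active and standby daemons)
    ceph mds stat

    # Dump the full MDS map for details on ranks and states
    ceph mds dump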

MDS does not store any local data, which is quite useful in some scenarios: if an MDS daemon dies, we can start it up again on any system that has access to the cluster. The Metadata Server daemons are configured as active or standby. The primary MDS node becomes active, and the rest go into standby. In the event of a primary MDS failure, a standby node takes charge and is promoted to active. For even faster recovery, you can specify that a standby node should follow one of your active nodes, which keeps the same metadata in memory to pre-populate the cache.
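
As a rough sketch of how such a follow relationship can be expressed, the ceph.conf fragment below marks one MDS as a standby-replay daemon for rank 0; the daemon name mds.ceph-node2 is a placeholder, and the exact option names should be verified against your Ceph version:

    [mds.ceph-node2]
        # Continuously replay the active MDS journal to keep a warm cache
        mds standby replay = true
        # Follow the MDS currently holding rank 0
        mds standby for rank = 0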

Jewel (v10.2.0) is the first Ceph release to include stable Ceph FS code and fsck/repair tools, although running multiple active MDS daemons and using snapshots are still experimental. Ceph FS development continues at a very fast pace, and we can expect it to be fully production-ready in the Luminous release. For your non-critical workloads, you can consider using Ceph FS with a single MDS and no snapshots.

In the coming sections, we will cover recipes for configuring both kernel and FUSE clients. Which client you choose depends on your use case: the FUSE client is the easiest way to get up-to-date code, while the kernel client often gives better performance. The clients also do not always provide equivalent functionality; for example, the FUSE client supports client-enforced quotas while the kernel client does not.
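
For instance, a client-enforced quota is set as an extended attribute on a directory; a minimal sketch (the path /mnt/cephfs/data and the 10 GB limit are arbitrary examples) looks like this:

    # Limit the directory to roughly 10 GB; enforced by the FUSE client only
    setfattr -n ceph.quota.max_bytes -v 10000000000 /mnt/cephfs/data

    # Read the quota back to confirm it was applied
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/data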
