GlusterFS and Ceph are two distributed persistent storage systems. GlusterFS is, at its core, a network filesystem; Ceph is, at its core, an object store. Both expose block, object, and filesystem interfaces, and both use the XFS filesystem under the covers to store the data and metadata as xattr attributes. There are several reasons why you may want to use GlusterFS or Ceph as persistent volumes in your Kubernetes cluster.
GlusterFS is intentionally simple, exposing the underlying directories as they are and leaving it to clients (or middleware) to handle high availability, replication, and distribution. Gluster organizes the data into logical volumes, which encompass multiple nodes (machines) that contain bricks, which in turn store files. Files are allocated to bricks according to a DHT (distributed hash table). If files are renamed, or if the GlusterFS cluster is expanded or rebalanced, files may be moved between bricks. The following diagram shows the GlusterFS building blocks:
To use a GlusterFS cluster as persistent storage for Kubernetes (assuming you have an up-and-running GlusterFS cluster), you need to follow several steps. In particular, the GlusterFS nodes are managed by the plugin as a Kubernetes service (although as an application developer this doesn't concern you).
Here is an example of an endpoints resource that you can create as a normal Kubernetes resource using kubectl create:
{ "kind": "Endpoints", "apiVersion": "v1", "metadata": { "name": "glusterfs-cluster" }, "subsets": [ { "addresses": [ { "ip": "10.240.106.152" } ], "ports": [ { "port": 1 } ] }, { "addresses": [ { "ip": "10.240.79.157" } ], "ports": [ { "port": 1 } ] } ] }
To make the endpoints persistent, you use a Kubernetes service with no selector to indicate the endpoints are managed manually:
{ "kind": "Service", "apiVersion": "v1", "metadata": { "name": "glusterfs-cluster" }, "spec": { "ports": [ {"port": 1} ] } }
Finally, in the pod spec's volumes section, provide the following information:
"volumes": [ { "name": "glusterfsvol", "glusterfs": { "endpoints": "glusterfs-cluster", "path": "kube_vol", "readOnly": true } } ]
The containers can then mount glusterfsvol by name.
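For example, a container in the same pod spec could reference the volume through its volumeMounts (shown here in YAML form as a minimal sketch; the container name, image, and mount path are assumptions for illustration):

containers:
- name: gluster-client          # hypothetical container name
  image: nginx                  # any image; used only for illustration
  volumeMounts:
  - name: glusterfsvol          # must match the volume name defined in the volumes section
    mountPath: /mnt/glusterfs   # hypothetical path where the Gluster volume appears inside the container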
The endpoints tell the GlusterFS volume plugin how to find the storage nodes of the GlusterFS cluster.
Ceph's object store can be accessed using multiple interfaces. Kubernetes supports the RBD (block) and CephFS (filesystem) interfaces. The following diagram shows how RADOS – the underlying object store – can be accessed in multiple ways. Unlike GlusterFS, Ceph does a lot of work automatically: it handles distribution, replication, and self-healing all on its own:
Kubernetes supports Ceph via the RADOS Block Device (RBD) interface. You must install ceph-common on each node in the Kubernetes cluster. Once you have your Ceph cluster up and running, you need to provide some information required by the Ceph RBD volume plugin in the pod configuration file:
- monitors: Ceph monitors.
- pool: The name of the RADOS pool. If not provided, the default RBD pool is used.
- image: The image name that RBD has created.
- user: The RADOS user name. If not provided, the default admin is used.
- keyring: The path to the keyring file. If not provided, the default /etc/ceph/keyring is used.
- secretName: The name of the authentication secret. If provided, secretName overrides keyring. Note: see the following paragraph about how to create a secret.
- fsType: The filesystem type (ext4, xfs, and so on) that is formatted on the device.
- readOnly: Whether the filesystem is used as readOnly.

If the Ceph authentication secret is used, you need to create a secret object:
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"
data:
  key: QVFCMTZWMVZvRjVtRXhBQTVrQ1FzN2JCajhWVUxSdzI2Qzg0SEE9PQ==
The pod spec's volumes section looks like this:
"volumes": [ { "name": "rbdpd", "rbd": { "monitors": [ "10.16.154.78:6789", "10.16.154.82:6789", "10.16.154.83:6789" ], "pool": "kube", "image": "foo", "user": "admin", "secretRef": { "name": "ceph-secret" }, "fsType": "ext4", "readOnly": true } } ]
Ceph RBD supports the ReadWriteOnce and ReadOnlyMany access modes.
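Access modes are declared on PersistentVolume and PersistentVolumeClaim objects. As a rough sketch, the same RBD image could also be exposed as a PersistentVolume with an access mode set in accessModes; the PV name and capacity below are assumptions for illustration, and the rbd fields simply reuse the values from the pod example above:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-rbd-pv             # hypothetical PV name
spec:
  capacity:
    storage: 10Gi               # assumed size for illustration
  accessModes:
  - ReadOnlyMany                # one of the modes supported by Ceph RBD
  rbd:                          # same fields as the pod-level rbd volume shown earlier
    monitors:
    - 10.16.154.78:6789
    pool: kube
    image: foo
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: true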
If your Ceph cluster is already configured with CephFS, then you can assign it very easily to pods. Also, CephFS supports the ReadWriteMany access mode.
The configuration is similar to Ceph RBD, except you don't have a pool, image, or filesystem type. The secret can be a reference to a Kubernetes secret object (preferred) or a secret file:
apiVersion: v1
kind: Pod
metadata:
  name: cephfs
spec:
  containers:
  - name: cephfs-rw
    image: kubernetes/pause
    volumeMounts:
    - mountPath: "/mnt/cephfs"
      name: cephfs
  volumes:
  - name: cephfs
    cephfs:
      monitors:
      - 10.16.154.78:6789
      - 10.16.154.82:6789
      - 10.16.154.83:6789
      user: admin
      secretFile: "/etc/ceph/admin.secret"
      readOnly: true
You can also provide a path as a parameter in the cephfs volume. The default is /.
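For example, to mount only a subdirectory of the CephFS filesystem, you could set path in the cephfs volume; this is a minimal sketch, and the /kube subdirectory used here is just an assumption for illustration:

volumes:
- name: cephfs
  cephfs:
    monitors:
    - 10.16.154.78:6789
    path: /kube                        # hypothetical subdirectory; only this subtree is mounted
    user: admin
    secretFile: "/etc/ceph/admin.secret"
    readOnly: true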