Chapter 7. Handling Kubernetes Storage

In this chapter, we'll look at how Kubernetes manages storage. Storage is very different from compute, but at a high level they are both resources. Kubernetes, as a generic platform, takes the approach of abstracting storage behind a programming model and a set of plugins for storage providers. First, we'll go into detail about the storage conceptual model and how storage is made available to containers in the cluster. Then, we'll cover the common cloud platform storage providers, such as AWS, GCE, and Azure. Next, we'll look at a prominent open source storage provider (GlusterFS from Red Hat), which provides a distributed filesystem, as well as an alternative solution, Flocker, that manages your data in containers as part of the Kubernetes cluster. Finally, we'll see how Kubernetes supports integration of existing enterprise storage solutions.

At the end of this chapter, you'll have a solid understanding of how storage is represented in Kubernetes, the various storage options in each deployment environment (local testing, public cloud, enterprise), and how to choose the best option for your use case.

Persistent volumes walkthrough

In this section, we will understand the Kubernetes storage conceptual model and see how to map persistent storage into containers so they can read and write. Let's start by understanding the problem of storage. Containers and pods are ephemeral. Anything a container writes to its own filesystem gets wiped out when the container dies. Containers can also mount directories from their host node and read or write. That will survive container restarts, but the nodes themselves are not immortal.

There are other problems, such as ownership of mounted host directories when the container dies. Just imagine a bunch of containers writing important data to various data directories on their host and then going away, leaving all that data scattered across the nodes with no direct way to tell which container wrote what data. You can try to record this information, but where would you record it? It's pretty clear that for a large-scale system, you need persistent storage accessible from any node in order to reliably manage the data.

Volumes

The basic Kubernetes storage abstraction is the volume. Containers mount volumes that bind to their pod and they access the storage wherever it may be as if it's in their local filesystem. This is nothing new, and it is great because, as a developer who writes applications that need access to data, you don't have to worry about where and how the data is stored.

Using emptyDir for intra-pod communication

It is very simple to share data between containers in the same pod using a shared volume. Container 1 and container 2 simply mount the same volume and can communicate by reading and writing to this shared space. The most basic volume is the emptyDir. An emptyDir volume is an empty directory on the host. Note that it is not persistent, because when the pod is removed from the node, the contents are erased. If a container just crashes, the pod sticks around and you can still access the volume's contents later. Another very interesting option is to use a RAM disk by specifying the medium as Memory. Now, your containers communicate through shared memory, which is much faster but more volatile, of course. If the node is restarted, the emptyDir volume's contents are lost.

Here is a pod configuration file that has two containers that mount the same volume called shared-volume. The containers mount it in different paths, but when the hue-global-listener container is writing a file to /notifications, the hue-job-scheduler will see that file under /incoming:

apiVersion: v1
kind: Pod
metadata:
  name: hue-scheduler
spec:
  containers:
  - image: the_g1g1/hue-global-listener
    name: hue-global-listener
    volumeMounts:
    - mountPath: /notifications
      name: shared-volume
  - image: the_g1g1/hue-job-scheduler
    name: hue-job-scheduler
    volumeMounts:
    - mountPath: /incoming
      name: shared-volume
  volumes:
  - name: shared-volume
    emptyDir: {}

To use the shared memory option, we just need to add medium: Memory to the emptyDir section:

  volumes:
  - name: shared-volume
    emptyDir:
      medium: Memory

Using HostPath for intra-node communication

Sometimes you want your pods to get access to some host information (for example, the Docker daemon) or you want pods on the same node to communicate with each other. This can only work if the pods know they are on the same host. Since Kubernetes schedules pods based on available resources, pods usually don't know what other pods they share the node with. There are two cases where a pod can rely on other pods being scheduled with it on the same node:

  • In a single-node cluster all pods obviously share the same node
  • DaemonSet pods always share a node with any other pod that matches their selector

For example, in Chapter 6, Using Critical Kubernetes Resources, we discussed a DaemonSet pod that serves as an aggregating proxy to other pods. Another way to implement this behavior is for the pods to simply write their data to a mounted volume that is bound to a host directory, and have the DaemonSet pod read it directly and act on it.

Before you decide to use a HostPath volume, make sure you understand the limitations:

  • The behavior of pods with the same configuration might be different if they are data-driven and the files on their host are different.
  • It can violate resource-based scheduling (coming soon to Kubernetes) because Kubernetes can't monitor HostPath resources.
  • The containers that access host directories must have a security context with privileged set to true or, on the host side, you need to change the permissions to allow writing.

Here is a configuration file that mounts the /coupons directory into the hue-coupon-hunter container, which is mapped to the host's /etc/hue/data/coupons directory:

apiVersion: v1
kind: Pod
metadata:
  name: hue-coupon-hunter
spec:
  containers:
  - image: the_g1g1/hue-coupon-hunter
    name: hue-coupon-hunter
    volumeMounts:
    - mountPath: /coupons
      name: coupons-volume
  volumes:
  - name: coupons-volume
    hostPath:
      path: /etc/hue/data/coupons

Since the pod doesn't have a privileged security context, it will not be able to write to the host directory. Let's change the container spec to enable it by adding a security context:

  - image: the_g1g1/hue-coupon-hunter
    name: hue-coupon-hunter
    volumeMounts:
    - mountPath: /coupons
      name: coupons-volume
    securityContext:
      privileged: true

Each container has its own local storage area that is inaccessible to other containers or pods, while the host's /data directory is mounted as a HostPath volume into both container 1 and container 2.


Provisioning persistent volumes

While emptyDir volumes can be mounted and used by containers, they are not persistent and don't require any special provisioning because they use existing storage on the node. HostPath volumes persist on the original node, but if a pod is restarted on a different node, it can't access the HostPath volume from its previous node. Real persistent volumes use storage provisioned ahead of time by administrators. In cloud environments, the provisioning may be very streamlined but it is still required, and as a Kubernetes cluster administrator you have to at least make sure your storage quota is adequate and monitor usage versus quota diligently.

Remember that persistent volumes are cluster-level resources, similar to nodes: the PersistentVolume object itself is managed through the Kubernetes API, but the underlying storage it represents is provisioned and managed outside of Kubernetes.

You can provision resources statically or dynamically.

Provisioning persistent volumes statically

Static provisioning is straightforward. The cluster administrator creates persistent volumes backed by some storage medium ahead of time, and these persistent volumes can be claimed by containers.

Provisioning persistent volumes dynamically

Dynamic provisioning may happen when a persistent volume claim doesn't match any of the statically provisioned persistent volumes. If the claim specified a storage class and the administrator configured that class for dynamic provisioning, then a persistent volume may be provisioned on the fly. We will see examples later when we discuss persistent volume claims and storage classes.

Creating persistent volumes

Here is the configuration file for an NFS persistent volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-1
  annotations:
    volume.beta.kubernetes.io/storage-class: "normal"
  labels:
    release: stable
    capacity: 100Gi
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /tmp
    server: 172.17.0.8

A persistent volume has a spec and metadata that includes the name and possibly an annotation of a storage class. The storage class annotation will become an attribute when storage classes get out of beta. Note that persistent volumes are at v1, but storage classes are still in beta. More on storage classes later. Let's focus on the spec here. There are four sections: capacity, access mode, reclaim policy, and the volume type (nfs in the example).

Capacity

Each volume has a designated amount of storage. Storage claims may be satisfied by persistent volumes that have at least that amount of storage. In the example, the persistent volume has a capacity of 100 GiB (a gibibyte is 2^30 bytes). It is important when allocating static persistent volumes to understand the storage request patterns. For example, if you provision 20 persistent volumes with 100 GiB capacity each and a container claims a persistent volume with 150 GiB, then this claim will not be satisfied even though there is enough capacity overall:

capacity:
    storage: 100Gi

Access modes

There are three access modes:

  • ReadOnlyMany: Can be mounted read-only by many nodes
  • ReadWriteOnce: Can be mounted as read-write by a single node
  • ReadWriteMany: Can be mounted as read-write by many nodes

The storage is mounted to nodes, so even with ReadWriteOnce multiple containers on the same node can mount the volume and write to it. If that causes a problem, you need to handle it through some other mechanism (for example, claim the volume only in DaemonSet pods that you know will have just one pod per node).

Different storage providers support some subset of these modes. When you provision a persistent volume, you can specify which modes it will support. For example, NFS supports all of the modes, but in our example, only these two were enabled:

accessModes:
    - ReadWriteOnce
    - ReadOnlyMany

Reclaim policy

The reclaim policy determines what happens when a persistent volume claim is deleted. There are three different policies:

  • Retain – the volume will need to be reclaimed manually
  • Delete – the associated storage asset such as AWS EBS, GCE PD, Azure disk, or OpenStack Cinder volume is deleted
  • Recycle – delete content only (rm -rf /volume/*)

The Retain and Delete policies mean the persistent volume is not available anymore for future claims. The Recycle policy allows the volume to be claimed again.

Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure disk, and Cinder volumes support deletion. Dynamically provisioned volumes are always deleted.
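
If you need to change the reclaim policy of an existing persistent volume (for example, to keep data that would otherwise be recycled), you can patch the volume in place. Here is a minimal sketch, assuming the pv-1 volume from the earlier example:

> kubectl patch pv pv-1 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

After the patch, kubectl get pv will show Retain in the RECLAIMPOLICY column for pv-1.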

Volume type

The volume type is specified by name in the spec. There is no volumeType section. In the preceding example, nfs is the volume type:

nfs:
    path: /tmp
    server: 172.17.0.8

Each volume type may have its own set of parameters. In this case, it's a path and server.

We will go over various volume types later.
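
To give a taste of how the parameters differ between volume types, here is a sketch of a persistent volume backed by an AWS Elastic Block Store volume instead of NFS. The volume ID is hypothetical; it must refer to an EBS volume you have already created in the same availability zone as your nodes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ebs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    # hypothetical volume ID of a pre-created EBS volume
    volumeID: vol-0123456789abcdef0
    fsType: ext4

Note that an EBS volume can be attached to only one node at a time, so ReadWriteOnce is the only access mode that makes sense here.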

Making persistent volume claims

When containers want access to some persistent storage they make a claim (or rather, the developer and cluster administrator coordinate on necessary storage resources to claim). Here is a sample claim that matches the persistent volume from the previous section:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: storage-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "normal"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 80Gi
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: capacity, operator: In, values: [80Gi, 100Gi]}

In the metadata, you can see the storage class annotation. The name storage-claim will be important later when mounting the claim into a container.

The access mode in the spec is ReadWriteOnce, which means if the claim is satisfied no other claim with the ReadWriteOnce access mode can be satisfied, but claims for ReadOnlyMany can still be satisfied.

The resources section requests 80 GiB. This can be satisfied by our persistent volume, which has a capacity of 100 GiB, but it is a little wasteful because 20 GiB will go unused by definition.

The selector section allows you to filter available volumes further. For example, here the volume must match the label release: stable and also have a label with either capacity: 80Gi or capacity: 100Gi. Imagine that we have several other volumes provisioned with capacities of 200Gi and 500Gi. We don't want to claim a 500Gi volume when we only need 80Gi.

Kubernetes always tries to match the smallest volume that can satisfy a claim, but if there are no 80Gi or 100Gi volumes, the labels will prevent a 200Gi or 500Gi volume from being assigned, and dynamic provisioning will be used instead.

It's important to realize that claims don't mention volumes by name. The matching is done by Kubernetes based on storage class, capacity, and labels.

Finally, persistent volume claims belong to a namespace, and binding a persistent volume to a claim is exclusive. That means a persistent volume is effectively bound to a single namespace. Even if the access mode is ReadOnlyMany or ReadWriteMany, all the pods that mount the persistent volume claim must come from that claim's namespace.
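
For example, assuming the claim above is saved in a file called storage-claim.yaml (a hypothetical file name) and we want it in a dedicated hue namespace, we would create it like this, and only pods in the hue namespace would then be able to mount it:

> kubectl create namespace hue
> kubectl create -f storage-claim.yaml --namespace=hue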

Mounting claims as volumes

OK. We have provisioned a volume and claimed it. It's time to use the claimed storage in a container. This turns out to be pretty simple. First, the persistent volume claim must be used as a volume in the pod and then the containers in the pod can mount it, just like any other volume. Here is a pod configuration file that specifies the persistent volume claim we created earlier (bound to the NFS persistent volume we provisioned):

kind: Pod
apiVersion: v1
metadata:
  name: the-pod
spec:
  containers:
    - name: the-container
      image: some-image
      volumeMounts:
      - mountPath: "/mnt/data"
        name: persistent-volume
  volumes:
    - name: persistent-volume
      persistentVolumeClaim:
        claimName: storage-claim

The key is the persistentVolumeClaim section under volumes. The claim name (storage-claim here) uniquely identifies the specific claim within the current namespace and makes it available as a volume named persistent-volume. Then, the container can refer to it by that name and mount it at /mnt/data.

Storage classes

Storage classes let an administrator configure your cluster with custom persistent storage (as long as there is a proper plugin to support it). A storage class has a name in the metadata (which a claim must specify in its storage class annotation in order to use the class), a provisioner, and parameters.

Storage classes are still in beta as of Kubernetes 1.5. Here is a sample storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2

You may create multiple storage classes for the same provisioner with different parameters. Each provisioner has its own parameters.
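
To have a claim provisioned dynamically from this storage class, reference the class by name in the claim's storage class annotation (while storage classes are still in beta). Here is a minimal sketch of such a claim; if no matching static volume exists, the aws-ebs provisioner will create a gp2 EBS volume on the fly:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dynamic-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "standard"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi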

The currently supported volume types are as follows:

  • emptyDir
  • hostPath
  • gcePersistentDisk
  • awsElasticBlockStore
  • nfs
  • iscsi
  • flocker
  • glusterfs
  • rbd
  • cephfs
  • gitRepo
  • secret
  • persistentVolumeClaim
  • downwardAPI
  • azureFileVolume
  • azureDisk
  • vsphereVolume
  • quobyte

This list contains both persistent volumes and other volume types, such as gitRepo or secret, that are not backed by your typical network storage. This area of Kubernetes is still in flux and, in the future, it will be decoupled further and the design will be cleaner, where the plugins are not part of Kubernetes itself. Utilizing volume types intelligently is a major part of architecting and managing your cluster.

Default storage class

The cluster administrator can also assign a default storage class. When a default storage class is assigned and the DefaultStorageClass admission plugin is turned on, then claims with no storage class will be dynamically provisioned using the default storage class. If the default storage class is not defined or the admission plugin is not turned on, then claims with no storage class can only match volumes with no storage class.
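
Marking an existing storage class as the default is done with an annotation; while storage classes are in beta, the annotation is storageclass.beta.kubernetes.io/is-default-class. Here is a sketch that assumes the standard class from the previous example and a cluster with the DefaultStorageClass admission plugin enabled:

> kubectl patch storageclass standard -p '{"metadata": {"annotations": {"storageclass.beta.kubernetes.io/is-default-class": "true"}}}'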

Demonstrating persistent volume storage end to end

To illustrate all the concepts, let's do a mini demonstration where we create a HostPath volume, claim it, mount it, and have containers write to it.

Let's start by creating a hostPath volume. Save the following in persistent-volume.yaml:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: persistent-volume-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/tmp/data"

> kubectl create -f persistent-volume.yaml
persistentvolume "persistent-volume-1" created

To check out the available volumes, you can use the resource type persistentvolumes or pv for short:

> kubectl get pv
NAME                  CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM     REASON    AGE
persistent-volume-1   1Gi        RWX           Retain          Available                       6m

The capacity is 1 GiB as requested. The reclaim policy is Retain because we didn't specify one and Retain is the default. The status is Available because the volume has not been claimed yet. The access mode is specified as RWX, which means ReadWriteMany. All access modes have a shorthand version:

  • RWO: ReadWriteOnce
  • ROX: ReadOnlyMany
  • RWX: ReadWriteMany

We have a persistent volume. Let's create a claim. Save the following to persistent-volume-claim.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Then, run the following command:

> kubectl create -f persistent-volume-claim.yaml
persistentvolumeclaim "persistent-volume-claim" created

Let's check the claim and the volume:

> kubectl get pvc
NAME                      STATUS    VOLUME                CAPACITY   ACCESSMODES   AGE
persistent-volume-claim   Bound     persistent-volume-1   1Gi        RWX           27s

> kubectl get pv
NAME                  CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                             REASON    AGE
persistent-volume-1   1Gi        RWX           Retain          Bound     default/persistent-volume-claim             40m

As you can see, the claim and the volume are bound to each other. The final step is to create a pod and assign the claim as a volume. Save the following to shell-pod.yaml:

kind: Pod
apiVersion: v1
metadata:
  name: just-a-shell
  labels:
    name: just-a-shell
spec:
  containers:
    - name: a-shell
      image: ubuntu
      command: ["/bin/bash", "-c", "while true ; do sleep 10 ; done"]
      volumeMounts:
      - mountPath: "/data"
        name: pv
    - name: another-shell
      image: ubuntu
      command: ["/bin/bash", "-c", "while true ; do sleep 10 ; done"]
      volumeMounts:
      - mountPath: "/data"
        name: pv
  volumes:
    - name: pv
      persistentVolumeClaim:
        claimName: persistent-volume-claim

This pod has two containers that use the Ubuntu image and both run a shell command that just sleeps in an infinite loop. The idea is that the containers will keep running, so we can connect to them later and check their filesystems. The pod mounts our persistent volume claim with a volume name of pv. Both containers mount it into their /data directory.

Let's create the pod and verify that both containers are running:

> kubectl create -f shell-pod.yaml
pod "just-a-shell" created

> kubectl get pods
NAME           READY     STATUS    RESTARTS   AGE
just-a-shell   2/2       Running   0          1h

Then, ssh into the node. This is the host whose /tmp/data directory backs the pod's volume, which is mounted as /data into each of the running containers:

> minikube ssh
Boot2Docker version 1.11.1, build master : 901340f - Fri Jul  1 22:52:19 UTC 2016
Docker version 1.11.1, build 5604cbe
docker@minikube:~$

Inside the node, we can communicate with the containers using Docker commands. Let's look at the last two running containers:

docker@minikube:~$ docker ps -n=2
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
3c91a46b834a        ubuntu              "/bin/bash -c 'while "   About an hour ago   Up About an hour                        k8s_another-shell.b64b3aab_just-a-shell_default_ebf12a22-cee9-11e6-a2ae-4ae3ce72fe94_8c7a8408
f1f9de10fdfd        ubuntu              "/bin/bash -c 'while "   About an hour ago   Up About an hour                        k8s_a-shell.1a38381b_just-a-shell_default_ebf12a22-cee9-11e6-a2ae-4ae3ce72fe94_451fa9ec

Then, let's create a file in the /tmp/data directory on the host. It should be visible to both containers via the mounted volume:

docker@minikube:~$ sudo touch /tmp/data/1.txt

Let's execute a shell on one of the containers, verify that the file 1.txt is indeed visible, and create another file, 2.txt:

docker@minikube:~$ docker exec -it 3c91a46b834a /bin/bash
root@just-a-shell:/# ls /data
1.txt
root@just-a-shell:/# touch /data/2.txt
root@just-a-shell:/# exit

Finally, we can run a shell on the other container and verify that both 1.txt and 2.txt are visible:

docker@minikube:~$ docker exec -it f1f9de10fdfd /bin/bash
root@just-a-shell:/# ls /data
1.txt  2.txt