So far, we've covered Kubernetes' key concepts, and this chapter is going to be the last one dedicated to its core concepts. You've seen that Kubernetes is all about creating objects in the etcd datastore that are then converted into actual computing resources on the Nodes that are part of your cluster.
This chapter will focus on a concept called PersistentVolume. This is going to be another object that you will need to master in order to get persistent storage on your cluster. Persistent storage is achieved in Kubernetes by using the PersistentVolume resource kind, which has its own mechanics. Honestly, these can be relatively difficult to approach at first, but we are going to discover all of that!
In this chapter, we're going to cover the following main topics:
If you do not meet these technical requirements, you can follow Chapter 2, Kubernetes Architecture – From Docker Images to Running Pods, and Chapter 3, Installing Your Kubernetes Cluster, to get these two prerequisites.
When you're creating your Pods, you have the opportunity to create volumes in order to share files between the containers created by them. However, these volumes can represent a massive problem: they are bound to the life cycle of the Pod that created them.
That is why Kubernetes offers another object called PersistentVolume, which is a way to create storage in Kubernetes that will not be bound to the life cycle of a Pod.
Just like the Pod or ConfigMap, PersistentVolume is a resource kind that is exposed through kube-apiserver: you can create, update, and delete persistent volumes using YAML and kubectl just like any other Kubernetes object.
The following command will demonstrate how to list the PersistentVolume resource kind currently provisioned within your Kubernetes cluster:
$ kubectl get persistentvolume
No resources found
The persistentvolume object is also accessible with the plural form of persistentvolumes along with the alias of pv. The following three commands are essentially the same:
$ kubectl get persistentvolume
No resources found
$ kubectl get persistentvolumes
No resources found
$ kubectl get pv
No resources found
You'll find that the pv alias is very commonly used in the Kubernetes world, and a lot of people refer to persistent volumes simply as pv, so be aware of that. As of now, no PersistentVolume object has been created within my Kubernetes cluster, which is why no resources are listed in the output of the preceding commands.
PersistentVolume is the object that, essentially, represents a piece of storage you can attach to your Pods. That piece of storage is referred to as persistent because it is not supposed to be tied to the life cycle of a Pod.
Indeed, as mentioned in Chapter 5, Using Multi-Container Pods and Design Patterns, Kubernetes Pods use the notion of volumes. There, we discovered the emptyDir and hostPath volume types: the former initializes an empty directory that the containers of your Pod can share, while the latter exposes a path from the worker Node's filesystem to your Pods. Both of these volumes are tied to the life cycle of the Pod: once the Pod is destroyed, the data stored within them is destroyed as well.
However, sometimes, you don't want the volume to be destroyed. You want the volume to have its own life cycle so that both the volume and its data stay alive even if the Pod fails. That's where PersistentVolume objects come into play: essentially, they are volumes that are not tied to the life cycle of a Pod. Since they are a resource kind just like Pods themselves, they can live on their own!
Important Note
Bear in mind that PersistentVolume objects are just entries within the etcd datastore; they are not actual disks on their own.
A PersistentVolume is just a kind of pointer within Kubernetes to an external piece of storage, such as an NFS share, a disk, an Amazon EBS volume, and more. This is so that you can access these technologies from within Kubernetes, in a Kubernetes way.
Simply put, a PersistentVolume is essentially made up of two different things:

- A PersistentVolume type, which is the backend technology the volume points to
- Access modes, which tell Kubernetes how the volume can be mounted by Pods
You need to master both concepts in order to understand how to use PersistentVolumes. Let's begin by explaining what PersistentVolume types are.
Kubernetes is supposed to be able to run on as much infrastructure as possible, and even though it started as a Google project, it can be used on many platforms, whether they are public clouds or private solutions.
As you already know, the simplest Kubernetes setup consists of a simple minikube installation, whereas the most complex Kubernetes setup can be made of dozens of servers on massively scalable infrastructure. All of these different setups will necessarily have different ways to manage persistent storage. For example, the three biggest public cloud providers each offer their own solutions. Let's name a few, as follows:

- Amazon Elastic Block Store (EBS) on AWS
- Persistent Disk (PD) on Google Cloud
- Azure Disk on Microsoft Azure
These solutions each have their own design, principles, logic, and mechanics. Kubernetes was built on the principle that all of these technologies should be abstracted behind a single object, and that single object is the PersistentVolume resource kind. The PersistentVolume resource is what gets attached to a running Pod. Indeed, a Pod is a Kubernetes resource and does not know what an EBS volume or a PD is; Kubernetes Pods only play well with PersistentVolumes, which are also Kubernetes resources.
Whether your Kubernetes cluster is running on Google GKE or Amazon EKS, or whether it is a single minikube cluster on your local machine, makes no difference. When you wish to manage persistent storage, you are going to create, use, and deploy PersistentVolume objects, and then bind them to your Pods!
Here are some of the backend technologies supported by Kubernetes out of the box:

- hostPath and local volumes on the worker Nodes
- nfs (Network File System) shares
- awsElasticBlockStore (Amazon EBS volumes)
- gcePersistentDisk (Google Cloud Persistent Disks)
- azureDisk (Microsoft Azure disks)
- csi, the Container Storage Interface, which opens the door to many third-party storage drivers
The preceding list is not exhaustive: Kubernetes is extremely versatile and can be used with many storage solutions that can be abstracted as PersistentVolume objects in your cluster.
When you create a PersistentVolume object, essentially, you are writing a YAML file. However, this YAML file will have a different key/value configuration depending on the backend technology used by the PersistentVolume object.
There are three major benefits of PersistentVolume:

- Their life cycle is not tied to the life cycle of a Pod
- They can be made available to any Node of the cluster
- They can be accessed by several Pods at the same time
Bear in mind that these three statements are not always 100% valid. Indeed, sometimes, a PersistentVolume object can be affected by its underlying technology.
To demonstrate this, let's consider a PersistentVolume object that is, for example, a pointer to an Amazon EBS volume created in your AWS account. In this case, the worker Nodes will be Amazon EC2 instances. In such a setup, the PersistentVolume won't be available to every Node.
The reason is that AWS has some limitations around EBS volumes: an EBS volume can only be attached to one instance at a time, and that instance must be provisioned in the same availability zone as the EBS volume. From a Kubernetes perspective, this makes the PersistentVolume (the EBS volume) accessible only from EC2 instances (that is, worker Nodes) in the same AWS availability zone, and several Pods running on different Nodes (EC2 instances) won't be able to access the PersistentVolume object at the same time.
However, if you take another example, such as an NFS setup, it wouldn't be the same. Indeed, you can access an NFS share from multiple machines at once; therefore, a PersistentVolume object that is backed by an NFS share would be accessible from several different Pods running on different Nodes without much problem. To understand how to make a PersistentVolume object available on several different Nodes at a time, we need to consider the concept of access modes.
As the name suggests, access modes are an option you can set when you create a PersistentVolume object; they tell Kubernetes how the volume can be mounted.
PersistentVolumes support three access modes, as follows:

- ReadWriteOnce (RWO): The volume can be mounted as read-write by a single Node
- ReadOnlyMany (ROX): The volume can be mounted as read-only by many Nodes
- ReadWriteMany (RWX): The volume can be mounted as read-write by many Nodes
It is necessary to set at least one access mode to a PersistentVolume type, even if said volume supports multiple access modes. Indeed, not all PersistentVolume types will support all access modes.
As mentioned earlier, a PersistentVolume object is only a pointer to an external piece of storage, and that piece of storage is constrained by the backend technology providing it.
As mentioned earlier, one good example that we can use to explain this is the Amazon EBS volume technology available within the AWS cloud. When you create a PersistentVolume in Kubernetes that is a pointer to an Amazon EBS volume, that PersistentVolume will only support the ReadWriteOnce access mode, whereas an NFS-backed one can support all three. This is because of the hard limitation mentioned earlier: an EBS volume can only be attached to one Amazon EC2 instance at a time, and that is a hard limit set by AWS. So, in the Kubernetes world, it can only be represented by a PersistentVolume object with the access mode set to ReadWriteOnce.
Simply put, these PersistentVolume types, and the concepts surrounding them, are simply Kubernetes concepts that are only valid within the Kubernetes scope and have absolutely no meaning outside of Kubernetes.
Some PersistentVolume objects will be permissive, while others will have a lot of constraints. And all of this is determined by the underlying technology they are pointing to. No matter what you do with PersistentVolume, you'll have to deal with the restrictions set by your cloud provider or underlying infrastructure.
Now, let's create our first PersistentVolume object.
So, let's create a PersistentVolume on your Kubernetes cluster using the declarative way. Since this is a relatively complex resource, I strongly recommend that you do not use the imperative way to create such resources:
# ~/pv-hostpath.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-hostpath
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/tmp/pv-hostpath" # any existing path on the worker Node
This is the simplest form of a PersistentVolume. Essentially, this YAML file creates a PersistentVolume entry within the Kubernetes cluster, and this PersistentVolume is of the hostPath type.
It could be a more complex volume, such as a cloud-based disk or an NFS share, but in its simplest form, a PersistentVolume can simply be a hostPath directory on the worker Node running your Pod.
A bare PersistentVolume entry in our cluster can do nothing on its own and must be seen as a layer of abstraction on the Kubernetes level: outside Kubernetes, the PersistentVolume resource kind has no meaning.
That being said, the PersistentVolume resource kind is a pointer to something else, and that something else can be, for example, a disk, an NFS drive, a Google Cloud PD, or an Amazon EBS volume. All of these different technologies are managed differently. However, fortunately for us, in Kubernetes, they are all represented by the PersistentVolume object.
Simply put, the YAML file to build a PersistentVolume will be a little bit different depending on the backend technology the PersistentVolume is backed by. For example, if you want your PersistentVolume to be a pointer to an Amazon EBS volume, you have to meet the following two conditions:

- The Amazon EBS volume must already exist on your AWS account before you create the PersistentVolume object pointing to it
- Your worker Nodes must be Amazon EC2 instances capable of attaching that volume (that is, running in the same availability zone)
And the same logic goes for everything else. For a PersistentVolume to work properly, it needs to be able to make the link between Kubernetes and the actual storage. So, you need to create or provision a piece of storage outside of Kubernetes and then create the PersistentVolume entry by including the unique ID of the disk or volume provided by that external storage technology. Next, let's take a closer look at some examples of PersistentVolume YAML files.
This example displays a PersistentVolume object that is pointing to an Amazon EBS volume on AWS:
# ~/persistent-volume-ebs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-ebs
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-xxxx
    fsType: ext4
As you can see, in this YAML file, awsElasticBlockStore indicates that this PersistentVolume object points to a volume in my AWS account. The exact Amazon EBS volume is identified by the volumeID key. And that's pretty much it. With this YAML file, Kubernetes is capable of finding the proper EBS volume and maintaining a pointer to it through this PersistentVolume entry.
Of course, since EBS volumes are pure AWS, they can only be mounted on EC2 instances, which means this volume will never work if you attempt to attach it to something else. Now, let's examine a very similar YAML file; however, this time, it's going to point to a GCE PD.
Here is the YAML file that is creating a PersistentVolume object that is pointing to an existing GCE PD:
# ~/persistent-volume-pd.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-pd
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: xxxx
    fsType: ext4
Once again, please note that it is the same kind: PersistentVolume object as the one used by the Amazon EBS PersistentVolume object. In fact, it is the same object and the same interface from the Kubernetes side. The only difference is the configuration under gcePersistentDisk, which, this time, points to a PD created on Google Cloud. Kubernetes is so versatile that it can fetch and use different cloud storage solutions just like that.
Next, let's explore one last example in YAML, this time using NFS.
Here is an example YAML file that can create a PersistentVolume object that is backed by an NFS drive:
# ~/persistent-volume-nfs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-nfs
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /opt/nfs
    server: nfsxxxx
Again, note that this time, we're still using the kind: PersistentVolume entry. Additionally, we are now specifying an nfs configuration with the export path as well as the server address. (Note that the nfs volume source does not take an fsType key; it only needs the server and path.) Now, let's discuss a little bit about the provisioning of the storage resources.
The fact that you need to create the actual storage resource separately and only then create a PersistentVolume in Kubernetes might seem tedious.
Fortunately for us, Kubernetes is also capable of communicating with your cloud provider's APIs in order to create the volumes or disks on the fly. This is called dynamic provisioning, and you can use it when managing PersistentVolume objects. It makes things a lot simpler when dealing with PersistentVolume provisioning, but it only works on supported cloud providers.
However, this is an advanced topic, so we will discuss it, in more detail, later in this chapter.
Now that we know how to provision PersistentVolume objects inside our cluster, we can try to mount them. Indeed, in Kubernetes, once you create a PersistentVolume, you need to mount it to a Pod before it can be used. Things will get slightly more advanced and conceptual here, as Kubernetes uses an intermediate object to mount a PersistentVolume to Pods. This intermediate object is called PersistentVolumeClaim. Let's focus on it next.
So far, we've learned that Kubernetes makes use of two objects to deal with persistent storage technologies. The first one is PersistentVolumes, which represents a piece of storage, and we quoted Google Cloud PD and Amazon EBS volumes as possible backends for PersistentVolume. Additionally, we discovered that depending on the technology that PersistentVolume is relying on, it is going to be exposed to one or more Pods using access modes.
That being said, we can now try to mount a PersistentVolume object to a Pod. To do that, we will need to use another object, which is the second object we need to explore in this chapter, called PersistentVolumeClaim.
Just like PersistentVolume and ConfigMap, PersistentVolumeClaim is another independent resource kind living within your Kubernetes cluster and is the second resource kind that we're going to examine in this chapter.
This object can appear to be a little bit more complex to understand compared to the others. First, bear in mind that even if both names are almost the same, PersistentVolume and PersistentVolumeClaim are two distinct resources that represent two different things.
You can list the PersistentVolumeClaim resource kind created within your cluster using kubectl, as follows:
$ kubectl get persistentvolumeclaims
No resources found in default namespace.
The preceding output tells me that I don't have any PersistentVolumeClaim resources created within my cluster. Please note that the pvc alias works, too:
$ kubectl get pvc
No resources found in default namespace.
You'll quickly find that a lot of people working with Kubernetes refer to the PersistentVolumeClaim resources simply with pvc. So, don't be surprised if you see the term pvc here and there while working with Kubernetes. That being said, let's explain what PersistentVolumeClaim resources are in Kubernetes.
The key to understanding the difference between PersistentVolume and PersistentVolumeClaim is that one represents the storage itself, whereas the other represents the request for storage that a Pod makes in order to get the actual storage.
The reason is that Kubernetes is supposed to be used by two types of people:

- Cluster administrators, who set up the cluster and manage its resources, including storage
- Application developers, who deploy their applications onto the cluster
In fact, there is no problem if you handle both roles in your organization; however, this distinction is crucial to understanding the workflow used to mount a PersistentVolume to Pods.
Kubernetes was built with the idea that a PersistentVolume object should belong to the cluster administrator scope, whereas PersistentVolumeClaims objects belong to the application developer scope. It is up to the cluster administrator to add PersistentVolumes since they might be hardware resources, whereas developers have a better understanding of what amount of storage and what kind of storage is needed, and that's why the PersistentVolumeClaim object was built.
Essentially, a Pod cannot mount a PersistentVolume object directly. It needs to explicitly ask for it. And that asking action is achieved by creating a PersistentVolumeClaim object and attaching it to the Pod that needs a PersistentVolume object.
This is the only reason why this additional layer of abstraction exists.
Once the developer has built the application, it is their responsibility to ask for a PersistentVolume object if needed. To do that, the developer will write two YAML manifest files:

- One for the PersistentVolumeClaim object, describing the storage the application needs
- One for the Pod itself, mounting that PersistentVolumeClaim as a volume
The Pod manifest must be written so that the PersistentVolumeClaim object is mounted through the volumes and volumeMounts configuration keys in the YAML file. Please note that for it to work, the PersistentVolumeClaim object needs to be in the same namespace as the application Pod that is mounting it. The PersistentVolume object is never mounted directly to the Pod.
When both YAML files are applied and both resources are created in the cluster, the PersistentVolumeClaim object will look for a PersistentVolume object that matches the criteria required in the claim. Supposing that a PersistentVolume object capable of fulfilling the claim is created and ready in the Kubernetes cluster, the PersistentVolume object will be attached to the PersistentVolumeClaim object.
If everything is okay, the claim is considered fulfilled, and the volume is correctly mounted to the Pod: if you understand this workflow, essentially, you understand everything related to PersistentVolume usage.
Let's summarize this as follows:

- The cluster administrator provisions the actual storage and creates a PersistentVolume object pointing to it
- The application developer creates a PersistentVolumeClaim object describing the storage the application needs, in the same namespace as the Pod
- Kubernetes binds the claim to a PersistentVolume that matches its criteria
- The Pod mounts the PersistentVolumeClaim as a volume and, through it, gets access to the storage
This setup might seem complex to understand at first, but you will quickly become used to it.
In this section, I will create a Pod that mounts a PersistentVolume within a minikube cluster. This will again be a hostPath volume, but this time, it will not be bound to the life cycle of the Pod. Indeed, since it will be managed as a real PersistentVolume object, the hostPath volume will get a life cycle independent of the Pod.
The very first thing to do is to create the PersistentVolume object, which will be of the hostPath type. Here is the YAML file to do that. Please note that I created this PersistentVolume object with some arbitrary labels in the metadata section so that it will be easier to fetch from the PersistentVolumeClaim object later:
# ~/pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-hostpath-pv
  labels:
    type: hostpath
    env: prod
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/Users/me/test"
We can now create it and list the PersistentVolume entries available in our cluster, and we should observe that this one exists. Please note that the pv alias works, too:
$ kubectl create -f pv.yaml
persistentvolume/my-hostpath-pv created
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
my-hostpath-pv 1Gi RWO Retain Available 49s
We can see that the PersistentVolume was successfully created.
Now, we need to create two things in order to mount the PersistentVolume object:

- A PersistentVolumeClaim object that will claim the volume
- A Pod that will mount that claim as a volume
Let's proceed, in order, with the creation of the PersistentVolumeClaim object:
# ~/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-hostpath-pvc
spec:
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      type: hostpath
      env: prod
The important aspect of this PersistentVolumeClaim object is that it fetches the proper volume by its labels, using the selector key. Let's create it and check that it was successfully created in the cluster. Please note that the pvc alias also works here:
$ kubectl create -f pvc.yaml
persistentvolumeclaim/my-hostpath-pvc created
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-hostpath-pvc Pending standard 53s
Now that the PersistentVolume object and the PersistentVolumeClaim object exist, I can create a Pod that will mount the PV through the PVC.
Let's create an NGINX Pod that will do the job:
# ~/Pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypersistentvolume
  volumes:
    - name: mypersistentvolume
      persistentVolumeClaim:
        claimName: my-hostpath-pvc
As you can see, in the volumes section, the PersistentVolumeClaim object is referenced by its name as a volume, which is then mounted into the container through the volumeMounts section. Note that the PVC must live in the same namespace as the Pod that mounts it; this is because PVCs are namespace-scoped resources, whereas PVs are not. There are no labels and selectors here: to bind a PVC to a Pod, you simply use the PVC's name.
That way, the Pod will become attached to the PersistentVolumeClaim object, which will find the corresponding PersistentVolume object. This, in the end, will make the host path available and mounted on my NGINX Pod.
Now, we can create the three objects in the following order:

1. The PersistentVolume object (pv.yaml)
2. The PersistentVolumeClaim object (pvc.yaml)
3. The Pod (Pod.yaml)
Note that before you go any further, you need to make sure that the /Users/me/test directory exists on your host machine or worker Node, since this is the path specified in the PV definition.
You can achieve that using the following commands if you have not already created these resources in your cluster:
$ kubectl create -f pvc.yaml
persistentvolumeclaim/my-hostpath-pvc created
$ kubectl create -f Pod.yaml
pod/nginx created
Now, let's check that everything is okay by looking at the status of our PersistentVolumeClaim object with the kubectl get pvc command; once the claim is matched with our PV, its STATUS column should read Bound.
Everything seems to be okay! We have just demonstrated a typical workflow. No matter what kind of storage you need, it's always going to be the same:

- Provision the actual storage (or rely on an existing piece of storage)
- Create a PersistentVolume object pointing to it
- Create a PersistentVolumeClaim object that matches the PersistentVolume
- Mount the PersistentVolumeClaim in your Pod as a volume
And that's it!
So far, we have learned what PersistentVolume and PersistentVolumeClaim objects are and how to use them to mount persistent storage on your Pods.
Next, we must continue our exploration of the PersistentVolume and PersistentVolumeClaim mechanics by explaining the life cycle of these two objects. Because they are independent of the Pods, their life cycles have some dedicated behaviors that you need to be aware of.
PersistentVolume objects are good if you want to keep the state of your app without being constrained by the life cycle of the Pods or containers that are running them.
However, since PersistentVolume objects get their very own life cycle, they have some very specific mechanics that you need to be aware of when you're using them. We'll take a closer look at them next.
The first thing to be aware of when you're using PersistentVolume objects is that they are not namespaced resources, whereas PersistentVolumeClaim objects are.
That's something very important to know. When a Pod uses a PersistentVolume object, it is only exposed to the PersistentVolumeClaim object, and the one requirement of that claim is that it is created in the same namespace as the Pod that uses it.
That being said, PersistentVolume objects are not constrained by namespaces, unlike PersistentVolumeClaim objects. Indeed, they are created cluster-wide. So, do bear that in mind: PersistentVolumeClaim objects need to be created in the same namespace as the Pods using them, but they are able to fetch PersistentVolume resources that do not live in any namespace at all.
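To make this scoping concrete, here is a minimal sketch (the my-app namespace, the resource names, and the hostPath path are all hypothetical): the PVC declares a namespace in its metadata, while the PV, being cluster-scoped, does not use one.

```yaml
# A PVC is namespaced: it declares a namespace in its metadata.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-in-namespace
  namespace: my-app          # hypothetical namespace; must match the Pod's
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# A PV is cluster-scoped: no namespace field is used here.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cluster-wide-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/tmp/data"        # hypothetical path on the worker Node
```

A Pod in the my-app namespace can reference claim-in-namespace, and through it reach a PV created anywhere in the cluster.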
To see this for yourself, I invite you to create the following PersistentVolume object using the following YAML file, which will create a PersistentVolume called new-pv-hostpath:
# ~/new-pv-hostpath.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: new-pv-hostpath
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/home/user/mydirectory"
Once the file has been created, we can apply it against our cluster using the kubectl create -f new-pv-hostpath.yaml command:

$ kubectl create -f new-pv-hostpath.yaml
persistentvolume/new-pv-hostpath created

Then, we can run the kubectl get pv/new-pv-hostpath -o yaml | grep -i "namespace" command, which will output nothing: the PersistentVolume object carries no namespace. We can confirm this by listing the resource kinds that are not namespaced:

$ kubectl api-resources --namespaced=false

As you can see, the persistentvolumes resource kind appears in the output of this command, which means it does not live in any namespace!
Now, let's examine another important aspect of PersistentVolume known as the reclaim policy. This is something that will matter when you want to unmount a PVC from a running Pod.
When it comes to PersistentVolume, there is a very important option that you need to understand, which is the reclaim policy. But what does this option do?
This option tells Kubernetes what to do with your PersistentVolume object when you delete the corresponding PersistentVolumeClaim object that was attaching it to the Pods.
Indeed, deleting a PersistentVolumeClaim object deletes the link between the Pod(s) and your PersistentVolume object: it's like unmounting the volume, after which the volume can become available again for another application to use. However, in some cases, you don't want that behavior; instead, you want your PersistentVolume object to be automatically removed when its corresponding PersistentVolumeClaim object is deleted. That's what the reclaim policy option is for, and that's why you should configure it.
The reclaim policy can be set to one of three values, as follows:

- Retain
- Delete
- Recycle
Let's explain these three reclaim policies.
The Delete policy is the simplest of the three. When you set your reclaim policy to Delete, the backing storage will be wiped out and the PersistentVolume entry will be removed from the Kubernetes cluster when the corresponding PersistentVolumeClaim object is deleted. This is the behavior you want for sensitive data: use it when you want your data to be deleted and not reused by any other application. Bear in mind that this is permanent, so you might want to build a backup strategy with your underlying storage provider if you need to recover anything.
The Retain policy is the second policy and is the opposite of the Delete policy. If you set this reclaim policy, the PersistentVolume object won't be deleted when you delete its corresponding PersistentVolumeClaim object. Instead, the PersistentVolume object enters the Released status, which means it is still present in the cluster, and all of its data can be manually retrieved by the cluster administrator.
The third policy is the Recycle reclaim policy, which is a kind of combination of the previous two. First, the volume is wiped of all its data, equivalent to a basic rm -rf on the volume's contents. However, the volume itself remains available in the cluster, so it can be mounted again by another application. Note that the Recycle policy is deprecated in recent Kubernetes versions in favor of dynamic provisioning.
The reclaim policy can be set in your cluster directly in the YAML definition file at the PersistentVolume level.
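As a sketch of what that looks like in a manifest, the persistentVolumeReclaimPolicy key sits directly under spec (the name and path used here are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-with-policy                    # hypothetical name
spec:
  persistentVolumeReclaimPolicy: Retain   # or Delete / Recycle
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/tmp/data"                     # hypothetical path
```

If the key is omitted, statically created PVs default to the Retain policy.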
The good news about the reclaim policy is that you can change it after the PersistentVolume object has been created; it is a mutable setting. To do that, you can simply list the PVs in your cluster and then issue a kubectl patch command to update the PV of your choice:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
my-hostpath-pv 1Gi RWO Retain Available 24m
As you can see, this PV has a Retain reclaim policy. I'll now update it to Delete using the kubectl patch command against the my-hostpath-pv PersistentVolume:
$ kubectl patch pv/my-hostpath-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
persistentvolume/my-hostpath-pv patched
$ kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
my-hostpath-pv   1Gi        RWO            Delete           Available                                   3h13m
We can observe that the reclaim policy was updated from Retain to Delete!
Now, let's discuss the different statuses that PVs and PVCs can have.
Just like Pods can be in different states, such as Pending, ContainerCreating, Running, and more, PersistentVolume and PersistentVolumeClaim objects can also hold different states. You can identify their state by issuing the kubectl get pv and kubectl get pvc commands.
PersistentVolume has the following different states that you need to be aware of:

- Available
- Bound
- Released
For their part, PersistentVolumeClaim objects can hold one additional status: Terminating.
Let's explain these different states in more detail.
The Available status indicates that the PersistentVolume object is created and ready to be mounted by a PersistentVolumeClaim object. There's nothing wrong with it, and the PV is just ready to be used.
The Bound status indicates that the PersistentVolume object is bound to a PersistentVolumeClaim object and currently mounted by one or several Pods; essentially, the volume is in use. When this status applies to a PersistentVolumeClaim object, it indicates that the PVC is currently in use: that is, a Pod is using it and has access to a PV through it.
The Terminating status applies to a PersistentVolumeClaim object. This is the status the PVC enters after you issue a kubectl delete pvc command. It is after this phase that the bound PV is reclaimed: it is kept if its reclaim policy is set to Retain and wiped out if it is set to Delete.
We now have all the basics relating to PersistentVolume and PersistentVolumeClaim that should be enough to start using persistent storage in Kubernetes. However, there's still something important to know about this topic, and it is called dynamic provisioning. This is a very impressive aspect of Kubernetes that makes it able to communicate with cloud provider APIs to create persistent storage on the cloud. Additionally, it can make this storage available on the cluster by dynamically creating PV objects. In the next section, we will compare static and dynamic provisioning.
So far, we've only provisioned PersistentVolume objects through static provisioning. Now, we're going to discover dynamic PersistentVolume provisioning, which enables storage to be provisioned directly from the Kubernetes cluster.
So far, when using static provisioning, you have learned that you have to follow this workflow:

1. Provision the actual piece of storage outside of Kubernetes
2. Create a PersistentVolume object pointing to it
3. Create a PersistentVolumeClaim object matching it
4. Mount the claim in your Pod as a volume
That is called static provisioning: it is static because you have to create the piece of storage before creating the PV and the PVC in Kubernetes. It works well; however, at scale, it can become more and more difficult to manage, especially if you are managing dozens of PVs and PVCs. Let's say you are creating an Amazon EBS volume to mount as a PersistentVolume object; with static provisioning, you would have to create the EBS volume on AWS first, retrieve its volume ID, and only then create the corresponding PV and PVC in Kubernetes.
Again, it should work, but it would become extremely time-consuming to do at scale, with possibly dozens and dozens of PVs and PVCs.
That's why Kubernetes developers decided that it would be better if Kubernetes was capable of provisioning the piece of actual storage on your behalf along with the PersistentVolume object to serve as a pointer to it. This is known as dynamic provisioning.
When using dynamic provisioning, you configure your Kubernetes cluster so that it can authenticate against your AWS account on your behalf. Then, you simply create a PersistentVolumeClaim object: Kubernetes provisions the EBS volume and a matching PersistentVolume object automatically, ready to be bound and mounted by a Pod.
That way, you can save a huge amount of time by getting things automated. Dynamic provisioning is so useful because Kubernetes supports a wide range of storage technologies. We already introduced a few of them earlier in this chapter, when we mentioned NFS, Google PD, Amazon EBS volumes, and more.
But how does Kubernetes achieve this versatility? Well, the answer is that it makes use of a third resource kind, which we're going to discover now: the StorageClass object.
StorageClass is another resource kind exposed by kube-apiserver. This resource kind is the one that grants Kubernetes the ability to deal with several underlying technologies transparently.
You can access and list the storageclass resources created within your Kubernetes cluster by using kubectl. Here is the command to list the storage classes:
$ kubectl get storageclass
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  24d
Additionally, you can use the plural form of storageclasses along with the sc alias. The following three commands are essentially the same:
$ kubectl get storageclass
$ kubectl get storageclasses
$ kubectl get sc
Note that I haven't included the output of the command for simplicity, but it is essentially the same for the three commands. There are two fields within the command output that are especially important to us.
Important Note
Note that you can create multiple StorageClass objects that use the same provisioner.
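As a sketch, here are two hypothetical StorageClass objects sharing the in-tree kubernetes.io/aws-ebs provisioner but requesting different EBS volume types; the class names and parameters are illustrative assumptions:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-standard # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2 # general-purpose SSD
reclaimPolicy: Delete
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-fast # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1 # provisioned-IOPS SSD
  iopsPerGB: "10"
reclaimPolicy: Retain
```

A PVC would then pick one class or the other through its storageClassName field, even though both classes end up calling the same provisioner.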
As I'm currently using a minikube cluster, I have a storageclass resource called standard that is using the k8s.io/minikube-hostpath provisioner.
This provisioner uses my host's filesystem to automatically provision hostPath volumes for my Pods, but the mechanism is the same for Amazon EBS volumes or Google PDs.
Here is the same command run against a Kubernetes cluster based on Google GKE:
$ kubectl get sc
And here is the same command run against a Kubernetes cluster based on Amazon EKS:
$ kubectl get sc
As you might have gathered, by default, we get different storage classes because all of these clusters need to access different kinds of storage. In GKE, Google built a storage class with a provisioner that is capable of interacting with the Google PD APIs, which is a pure Google Cloud feature. In contrast, in AWS, we have a storageclass object with a provisioner that is capable of dealing with the EBS volume APIs. These provisioners are just libraries that interact with the APIs of these different cloud providers.
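For illustration, the default class on a GKE cluster might resemble the following sketch; the exact provisioner and parameters vary by cluster version, so treat these values as assumptions:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # this annotation is what marks a class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd # in-tree Google PD provisioner
parameters:
  type: pd-standard
```

Swap the provisioner and parameters, and you get the equivalent default class on EKS or minikube.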
The storageclass objects are the reason why Kubernetes is capable of dealing with so many different storage technologies. From a Pod perspective, no matter if it is an EBS volume, NFS drive, or GKE volume, the Pod will only see a PersistentVolume object. All the underlying logic dealing with the actual storage technology is implemented by the provisioner the storageclass object uses.
The good news is that you can add as many storageclass objects, each with its own provisioner, as you want to your Kubernetes cluster, in a plugin-like fashion. As of writing, Kubernetes supports a long list of PersistentVolume types, ranging from cloud disks to network filesystems.
By the way, nothing is preventing you from expanding your cluster by adding more storageclass objects to it. You'll simply add the ability to deal with additional storage technologies from your cluster. For example, I could add an Amazon EBS storageclass object to my minikube cluster. But while it is possible, it would be completely useless: since my minikube setup is running on my local machine and not on an EC2 instance, I wouldn't be able to attach an EBS volume.
When using dynamic storage provisioning, the PersistentVolumeClaim object gets an entirely new role. Since you no longer create the PersistentVolume object yourself in this use case, the only object left for you to manage is the PersistentVolumeClaim.
Let's demonstrate this by creating an NGINX Pod that will mount a dynamically provisioned hostPath-backed volume. In this example, the administrator won't have to provision a PersistentVolume object at all: the PersistentVolumeClaim object and the StorageClass object will create and provision it together.
Let's start by creating a new namespace, called dynamicstorage, where we will run our examples:
$ kubectl create ns dynamicstorage
namespace/dynamicstorage created
Now, let's run a kubectl get sc command to check that a storage class capable of dealing with hostPath is provisioned in our cluster.
For this specific storageclass object in this specific Kubernetes setup (minikube), we don't have to do anything: the storageclass object is created by default at cluster installation. However, this might not be the case depending on your Kubernetes distribution.
Bear that in mind because it is very important: clusters that have been set up on GKE might have default storage classes that are capable of dealing with Google's storage offerings, whereas an AWS-based cluster might have storageclass objects that communicate with Amazon's storage offerings, and so on. With minikube, we have at least one default storageclass object that is capable of dealing with a hostPath-based PersistentVolume object. If you understand that, you should understand that the output of the kubectl get sc command will differ depending on where your cluster has been set up:
$ kubectl get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  11h
As you can see, we do have a storage class called standard on our cluster that is capable of dealing with hostPath.
Important Note
Some complex clusters spanning multiple clouds and/or on-premises environments might be provisioned with a lot of different storageclass objects in order to communicate with a lot of different storage technologies. Bear in mind that Kubernetes is not tied to any cloud provider and, therefore, does not force or limit you in your usage of backing storage solutions.
Now, we will create a PersistentVolumeClaim object that will dynamically provision a hostPath volume for us. Here is the YAML file to create the PVC. Please note that storageClassName is set to standard, and that the claim carries no label selector: a PVC with a non-empty selector cannot have a PV dynamically provisioned for it:
# ~/pvc-dynamic.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-dynamic-hostpath-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard # VERY IMPORTANT!
  resources:
    requests:
      storage: 1Gi
Following this, we can create it in the proper namespace:
$ kubectl create -f pvc-dynamic.yaml -n dynamicstorage
persistentvolumeclaim/my-dynamic-hostpath-pvc created
Now that this PVC has been created, we can add a new Pod that will mount it. With the Immediate volume binding mode we saw in the storage class output, the claim triggers the provisioner to create a PersistentVolume object and binds to it as soon as the claim is created.
That's how dynamic provisioning works, and the behavior is the same no matter whether it is on-premises or in the cloud. Here is a YAML definition file for a Pod that will mount the PersistentVolumeClaim object created earlier:
# ~/Pod-dynamic.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-dynamic-storage
spec:
  containers:
    - image: nginx
      name: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypersistentvolume
  volumes:
    - name: mypersistentvolume
      persistentVolumeClaim:
        claimName: my-dynamic-hostpath-pvc
Now let's create it in the correct namespace:
$ kubectl create -f Pod-dynamic.yaml -n dynamicstorage
pod/nginx-dynamic-storage created
Next, let's list the PersistentVolume object. If everything worked, we should get a brand new PersistentVolume object that has been dynamically created and is in the bound state:
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-56b79a65-86f6-4db5-b800-2ec415156097   1Gi        RWO            Delete           Bound    dynamicstorage/my-dynamic-hostpath-pvc   standard                7m19s
Everything is OK! We're finally done with dynamic provisioning! Please note that the standard storage class sets the reclaim policy to Delete by default, so the PV is removed when the PVC that created it is removed, too. Don't hesitate to change the reclaim policy if you need to retain sensitive data.
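For example, you can flip the reclaim policy of the dynamically created PV with kubectl patch pv pvc-56b79a65-86f6-4db5-b800-2ec415156097 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'; the patch itself is just this small spec fragment:

```yaml
# Minimal patch fragment: with Retain, the PV (and the data behind it)
# survives the deletion of the PVC that created it
spec:
  persistentVolumeReclaimPolicy: Retain
```

After patching, a kubectl get pv should show Retain in the RECLAIM POLICY column for that volume.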
We have arrived at the end of this chapter, which taught you how to manage persistent storage on Kubernetes. You discovered that PersistentVolume is a resource kind that acts as a pointer to an underlying storage technology, such as hostPath and NFS, along with cloud-based solutions such as Amazon EBS and Google PDs.
Additionally, you discovered that you cannot use your PersistentVolume object without PersistentVolumeClaim, and that PersistentVolumeClaim acts as an object to fetch and mount PersistentVolume to your Pods. You learned that PersistentVolume can hold different reclaim policies, which makes it possible to remove, recycle, or retain them when their corresponding PersistentVolumeClaim object gets removed.
Finally, we discovered what dynamic provisioning is and how it can help us. Bear in mind that you need to be aware of this feature because if you create and retain too many volumes, it can have a negative impact on your cloud bill at the end of the month.
We're now done with the basics of Kubernetes, and this chapter is also the end of this section. In the next section, you're going to discover Kubernetes controllers, which are objects designed to automate certain tasks in Kubernetes, such as maintaining a number of replicas of your Pods, either using the Deployment resource kind or the StatefulSet resource kind. There are still a lot of things to learn!