In this chapter, we’ll explore container runtimes, container networking and interfaces, and service meshes. We will see which runtime implementations exist and how they differ, learn how containers communicate with each other over the network, find out which container interfaces exist in Kubernetes, and get to know what a service mesh is and what it is used for. We will also do a few more exercises using the Docker tooling we previously installed to support our journey.
The contents of this chapter will cover topics from the Container Orchestration domain of the KCNA certification, which is the second biggest part of the exam, so make sure to answer all questions at the end of the chapter.
Here are the topics we’re going to cover:
Let’s get started!
As you know from the previous chapters, containers can run on virtual machines, in the cloud, on-premises, on bare-metal servers, or simply on your laptop. The software responsible for basic operations such as downloading images from the registry and creating, starting, stopping, or deleting containers is called the container runtime. We’ve already learned about Docker tooling and runtime, but there are more runtimes that exist, including the following:
Before going into runtime specifics, we need to understand what a Container Runtime Interface (CRI) is.
CRI
The CRI is a plugin interface that allows Kubernetes to use different container runtimes. In the first releases of Kubernetes before the CRI was introduced, it was only possible to use Docker as a runtime.
As you might remember, Kubernetes does not have its own runtime to do basic container operations, so it needs a runtime to manage containers and this runtime has to be CRI compatible. For example, Docker Engine does not support the CRI, but most of the other runtimes, including containerd or CRI-O, do. Essentially, the CRI defines the protocol for communication between Kubernetes and the runtime of choice using gRPC (the high-performance Remote Procedure Call framework), as shown in Figure 4.1:
Figure 4.1 – Container runtime integration with CRI
Initially, there was no CRI implementation in Kubernetes, but as new container runtimes were developed, it became increasingly hard to incorporate all of them into Kubernetes, so the solution was to define a standard interface that would allow compatibility with any runtime. The introduction of the CRI in Kubernetes version 1.5 allowed the use of multiple container runtimes within a single K8s cluster and also made it easier to develop compatible runtimes. Today, containerd is the most used runtime with newer versions of Kubernetes.
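In practice, wiring a runtime to Kubernetes boils down to pointing the kubelet at the runtime’s CRI socket. Here is a sketch for containerd; the flag is a real kubelet option, but the exact socket path can differ between distributions:

```shell
# Point the kubelet at containerd's CRI gRPC socket
# (the socket path may differ on your distribution)
kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock
```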
But why would you need to run a mix of different runtimes in the same cluster? This is a rather advanced scenario and the main reason behind it is that some runtimes can provide better security for more sensitive container workloads. Therefore, when we talk about containers and their runtimes, we need to distinguish three main types:
While this might sound very complicated, for the scope of the KCNA exam, you don’t really need to know all the details about container runtimes. This knowledge will be needed if you ever go for a CKS exam or have a special use case for using sandboxed or virtualized containers. For now, make sure to remember which container runtimes exist and the fact that in most scenarios, namespaced containers are used. Also, don’t confuse CRI with OCI, which we covered in Chapter 2, Overview of CNCF and Kubernetes Certifications.
Important note
The Open Container Initiative (OCI) provides the industry specifications for containers (image, runtime, and distribution specs) while CRI is a part of Kubernetes that makes it possible to use different runtimes with K8s in a pluggable way.
In practice, you do not usually interact with container runtimes directly but through orchestration systems such as Kubernetes or Docker Swarm. You can also talk to a container runtime via a CLI, as we did with the Docker CLI, or with the ctr or nerdctl CLIs when using the containerd runtime.
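For instance, on a host with containerd, the nerdctl CLI offers a Docker-compatible command set. A quick sketch, assuming nerdctl is installed alongside containerd:

```shell
# nerdctl closely mirrors the Docker CLI syntax
nerdctl pull ubuntu:22.04
nerdctl run -it ubuntu:22.04 bash
nerdctl ps
```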
Moving on, in the following section, we are going to learn more about container networking.
We have only created individual containers so far; in the real world, however, we often need to deal with tens or even hundreds of containers. As microservice architectures gained wider adoption, applications were split into multiple smaller parts that communicate with each other over the network. One application could consist of a frontend part, several backend services, and a database layer, where end-user requests hitting the frontend trigger communication with the backend, and the backend talks to the database. When each component runs in its own container across multiple servers, it is important to understand how they can all talk to each other. Networking is a large part of containers and Kubernetes, and it can be really challenging to understand how things work. For the moment, we will only touch the surface of container-to-container communication and continue with more details, such as exposing containers and K8s specifics, in later chapters.
Let’s get back to the Docker tooling we installed in the previous chapter and try starting another Ubuntu container.
Important note
Make sure that Docker Desktop is running before attempting to spawn containers. If you have not enabled auto-start previously, you might need to start it manually. On Linux with Docker Engine, you might need to execute $ sudo systemctl start docker.
Open the terminal and run the following:
$ docker run -it ubuntu:22.04 bash
Because the image is stripped down to a minimum to save space, basic packages such as net-tools are not preinstalled. Let’s install them inside our container by calling apt update and apt install:
root@5919bb5d37e3:/# apt update; apt -y install net-tools
… SOME OUTPUT OMITTED …
Reading state information... Done
The following NEW packages will be installed:
  net-tools
… SOME OUTPUT OMITTED …
Unpacking net-tools (1.60+git20181103.0eebece-1ubuntu5) ...
Setting up net-tools (1.60+git20181103.0eebece-1ubuntu5) ...
root@5919bb5d37e3:/#
Now that we have net-tools installed, we can use the ifconfig tool inside the container. The output you’ll see should be similar to this:
root@5919bb5d37e3:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.2  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:ac:11:00:02  txqueuelen 0  (Ethernet)
        RX packets 14602  bytes 21879526 (21.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3127  bytes 174099 (174.0 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 5  bytes 448 (448.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 448 (448.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
We can also see the container’s routing table by calling the route tool inside the container. The output will be similar to the following:
root@9fd192b5897d:/# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
As we can see, our container has an eth0 interface with the 172.17.0.2 IP address. In your case, the address might be different, but the important part is that our containers, by default, will have their own isolated networking stack with their own (virtual) interfaces, routing table, default gateway, and so on.
If we now open another terminal window and execute docker network ls, we will see which network types are supported using which drivers. The output will be similar to the following:
$ docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
c82a29c5280e   bridge    bridge    local
83de399192b0   host      host      local
d4c7b1acbc0d   none      null      local
There are three basic network types:
Note that the networks in the output of docker network ls have the local scope, meaning they can only be used on the individual host where we spawn containers with Docker. They won’t allow containers created on one server to communicate directly with containers created on another server (unless host networking is used, which is similar to running applications directly on the host with no containers involved).
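To see the difference yourself, you can start one container on the default bridge network and another with host networking. This is just a sketch; on a Linux host, running ifconfig inside the second container will show the host’s own interfaces instead of an isolated eth0 (on Docker Desktop, the “host” is the underlying VM):

```shell
# Default: the container gets its own network namespace on the bridge
docker run --rm -it ubuntu:22.04 bash

# Host networking: no network isolation, the host's interfaces are shared
docker run --rm -it --network host ubuntu:22.04 bash
```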
In order to establish networking between multiple hosts where we spawn containers communicating with each other, we need a so-called overlay network. Overlay networks connect multiple servers together, allowing communication between containers located on different hosts.
Overlay network
An overlay network is a virtual network running on top of another network, typically using packet encapsulation – an overlay network packet resides inside another packet that is forwarded to a particular host.
Whether you are running Kubernetes, Docker Swarm, or another solution to orchestrate containers, in the real world, you’ll always run multiple hosts for your workloads, and containers running on those hosts need to talk with each other using overlay networks.
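With Docker, for example, the overlay driver becomes available once Swarm mode is initialized. The following sketch assumes a Swarm with at least one node; the network name is an arbitrary example:

```shell
# Initialize Swarm mode (enables the overlay driver)
docker swarm init

# Create an overlay network; --attachable allows standalone containers to join
docker network create -d overlay --attachable myoverlay

# Containers started with this network on any Swarm node can reach each other
docker run -it --network myoverlay ubuntu:22.04 bash
```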
When it comes to Kubernetes, similar to the CRI, it implements a Container Network Interface (CNI) that allows the usage of different overlay networks in a pluggable manner.
CNI
A CNI is an interface that allows Kubernetes to use different overlay networking plugins for containers.
The introduction of the CNI has allowed third parties to develop their own solutions that are compatible with Kubernetes and offer their own unique features, such as traffic encryption or network policies (firewall rules) in container networks.
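Under the hood, each node holds a small JSON configuration for its CNI plugin, typically under /etc/cni/net.d/. A minimal sketch for the reference bridge plugin might look like the following; the network name and subnet are arbitrary examples:

```json
{
  "cniVersion": "0.4.0",
  "name": "examplenet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
```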
Some of the CNI network plugins used with Kubernetes today are flannel, Cilium, Calico, and Weave, just to name a few. Kubernetes also makes it possible to use multiple plugins at the same time with Multus (a Multi-Network Plugin); however, this is an advanced topic that is out of scope for the KCNA exam. In Part 3, Learn Kubernetes Fundamentals, of the book, we will have another closer look at networking in Kubernetes, but now it is time to look further into container storage.
Containers are lightweight by design and, as we saw earlier, often even the basic tools such as ifconfig and ping might not be included in container images. That is because containers represent a minimal version of the OS environment where we only install an application we are going to containerize with its dependencies. You don’t usually need many packages or tools pre-installed inside container images except for those required for your application to run.
Containers also don’t keep state by default, meaning that if you placed some files inside the container’s filesystem while it was running and later deleted the container, all those files would be completely gone. Therefore, it is common to call containers stateless and the on-disk files in containers ephemeral.
That does not mean we cannot use containers for important data that we need to persist in case a container fails or an application exits.
Note
In case the application running inside the container fails, crashes, or simply terminates, the container also stops by default.
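You can quickly observe this with Docker by starting a container whose main process exits immediately:

```shell
# The main process exits right away, so the container stops
docker run --name shortlived ubuntu:22.04 bash -c "exit 1"

# The container now shows up in the Exited state
docker ps -a --filter name=shortlived
```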
It is possible to keep the important data from the container by using external storage systems.
External storage can be a block volume attached to the container with a protocol such as iSCSI or it could be a Network File System (NFS) mount, for example. Or, external could also simply be a local directory on your container host. There are many options out there, but we commonly refer to external container storage as volumes.
One container can have multiple volumes attached and those volumes can be backed by different technologies, protocols, and hardware. Volumes can also be shared between containers or detached from one container and attached to another container. Volume content exists outside of the container life cycle, allowing us to decouple container and application data. Volumes allow us to run stateful applications in containers that need to write to disk, whether it is a database, application, or any other files.
Let’s get back to our computer with Docker tooling and try to run the following in the terminal:
$ docker run -it --name mycontainer --mount source=myvolume,target=/app ubuntu:22.04 bash
As we run it and attach a tty to the container, we should be able to see our new myvolume volume mounted inside the container at /app:
root@e642a068d4f4:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         126G  7.9G  112G   7% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.0G     0  3.0G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
/dev/vda1       126G  7.9G  112G   7% /app
tmpfs           3.0G     0  3.0G   0% /proc/acpi
tmpfs           3.0G     0  3.0G   0% /sys/firmware
root@e642a068d4f4:/# cd /app/
root@e642a068d4f4:/app#
What happened is that Docker automatically created a local volume and attached it to our container at startup. Local means the volume is backed by a directory on the host where the container was started.
Important note
Local storage can be used for testing or some development, but by no means is it suitable for production workloads and business-critical data!
If we now write any files to /app, they will persist:
root@e642a068d4f4:/app# echo test > hello_world
root@e642a068d4f4:/app# cat hello_world
test
root@e642a068d4f4:/app# exit
exit
Even if we remove the container by calling docker rm:
$ docker rm mycontainer
mycontainer
By calling docker volume ls, we are able to see which volumes currently exist on our host:
$ docker volume ls
DRIVER    VOLUME NAME
local     myvolume
To find more details about the volume, we can use the docker volume inspect command:
$ docker volume inspect myvolume
[
    {
        "CreatedAt": "2022-05-15T18:00:06Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/myvolume/_data",
        "Name": "myvolume",
        "Options": null,
        "Scope": "local"
    }
]
Feel free to experiment more with volumes yourself at this point. For example, you could create a new container and attach the existing volume to make sure the data is still there:
$ docker run -it --name mycontainer2 --mount source=myvolume,target=/newapp ubuntu:22.04 bash
root@fc1075366787:/# ls /newapp/
hello_world
root@fc1075366787:/# cat /newapp/hello_world
test
Now, when it comes to Kubernetes, you’ve probably already guessed it – similar to the CRI and the CNI, K8s implements the Container Storage Interface (CSI).
CSI
The CSI allows using pluggable storage layers. External storage systems can be integrated for use in Kubernetes in a standardized way with the CSI.
The CSI allows vendors and cloud providers to implement support for their storage services or hardware appliances. For example, there is an Amazon Elastic Block Store (EBS) CSI driver that allows you to fully manage the life cycle of EBS volumes in the AWS cloud via Kubernetes. There is a NetApp Trident CSI project, which supports a variety of NetApp storage filers that can be used by containers in Kubernetes. And plenty of other CSI-compatible storage solutions available today.
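As a small preview of what we will see in Chapter 6, once the AWS EBS CSI driver is installed in a cluster, a StorageClass referencing it could look like the following sketch (the provisioner name comes from the driver’s documentation; the class name and parameters are example values):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```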
Kubernetes is very powerful when it comes to managing storage; it can automatically provision, attach, and re-attach volumes between hosts and containers in the cluster. We will learn in more detail about Kubernetes features for stateful applications in Chapter 6, Deploying and Scaling Applications with Kubernetes, and now let’s move on to learn about container security.
Container security is an advanced and complex topic, and yet even for an entry-level KCNA certification, you are expected to know a few basics. As we’ve learned, namespaced containers are the most commonly used, and they share the kernel of the underlying OS. That means a process running in one container cannot see processes running in other containers or on the host. However, all processes on the host still use the same kernel, so if one of the containers gets compromised, there is a chance of the host and all other containers being compromised as well.
Let’s get back to our Docker setup for a quick demonstration. Start an Ubuntu container as we did before and run the uname -r command to see which kernel version is used:
$ docker run -it ubuntu:22.04 bash
root@4a3db7a03ccf:/# uname -r
5.10.47-linuxkit
The output you’ll see depends on your host OS and kernel version, so don’t be surprised if you see a different version. For example, you might see this:
5.13.0-39-generic
Now exit the container and start another one with an older Ubuntu 16.04 image:
$ docker run -it ubuntu:16.04 bash
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
58690f9b18fc: Pull complete
b51569e7c507: Pull complete
da8ef40b9eca: Pull complete
fb15d46c38dc: Pull complete
Digest: sha256:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61
Status: Downloaded newer image for ubuntu:16.04
root@049e8a43181f:/# uname -r
5.10.47-linuxkit
root@049e8a43181f:/#
See? We took an ubuntu:16.04 image that is more than 5 years old by now, yet the kernel version used is exactly the same as in the first container. Even if you take a different flavor of Linux, the kernel version of your host OS will be used.
So, how can we protect the kernel of our host where we run namespaced containers? Perhaps the two most well-known technologies are AppArmor for Ubuntu and Security-Enhanced Linux (SELinux) for the Red Hat and CentOS Linux family. Essentially, these projects allow you to enforce access control policies for all user applications and system services. Access to specific files or network resources can also be restricted. There is also a special tool for SELinux that helps generate security profiles specifically for applications running in containers (https://github.com/containers/udica). Kubernetes integrates with both AppArmor and SELinux, which allows you to apply profiles and policies to containers managed with K8s.
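With Docker, such profiles can be applied per container via the --security-opt flag. A sketch, assuming an AppArmor profile named my-profile has already been loaded on the host:

```shell
# Apply a preloaded AppArmor profile to the container
docker run --rm -it --security-opt apparmor=my-profile ubuntu:22.04 bash

# On an SELinux-enabled host, label options can be passed instead
docker run --rm -it --security-opt label=type:container_t ubuntu:22.04 bash
```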
Moving on, it is considered a bad practice and a security risk to run containers as a root user. In Linux, a root user is a user with an ID of 0 and a group ID of 0 (UID=0, GID=0). In all our hands-on exercises, we’ve used a root user inside containers:
root@4a3db7a03ccf:/# id -u
0
In a real production environment, you should consider running applications as a non-root user because root is essentially a super-admin that can do anything in the system. Now comes the interesting part – a root user inside a container can also be a root user on the host where the container is running (very bad practice!). Or, thanks to the Namespace functionality of the Linux kernel, the root user inside the container can be mapped to a different user ID on the host OS (such as UID=1001, for example). This is still not perfect, but in case a container is compromised, root inside the container won’t automatically gain root privileges on the host OS.
Note
It is possible to specify which user and group to use for the application packaged in the container during the image build process. You can simply add the USER mynewuser instruction to a Dockerfile to define which user to use. You might need to first create this user by adding one more instruction above it. For example: RUN useradd -r -u 1001 mynewuser
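Putting the note together, a minimal Dockerfile sketch that creates and switches to a non-root user could look like this (the user name and UID are arbitrary examples):

```dockerfile
FROM ubuntu:22.04

# Create a system user with a fixed UID for the application
RUN useradd -r -u 1001 mynewuser

# All following instructions and the container process run as this user
USER mynewuser
```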
Last but not least, keep in mind which container images you are using in your environments. If you go to Docker Hub (https://hub.docker.com/) or any other online container registry, you’ll find lots and lots of third-party images that anybody can download and run. You might encounter an image that does exactly what you need. For example, an image might package a tool or an application you wanted to try (e.g., to monitor the database you are running). But it may well package malicious code inside. Therefore, make sure to run trusted code in your containers.
It is also better to build the image yourself and store it in your own repository because third-party public image repositories are completely out of your control. Their owner might simply delete or replace the image at any given point in time or make the repository private. You won’t notice that immediately and this might cause an incident when the image isn’t available for download. Finally, there are a number of tools available today that perform container image scanning for security vulnerabilities (Clair, Dagda, and Anchore, to name a few). Those tools can be integrated into the image build process to reduce the risks of using outdated packages or installing software with known security exposures.
Now that we know more about container security and networking, we will look into service meshes – a rather new technology for managing traffic and securing cloud-native applications.
Before jumping into the definition of the service mesh, let’s reiterate quickly what we’ve learned previously about the architecture of cloud-native applications.
Modern cloud-native applications rely on microservices that work together as a part of bigger applications and communicate with each other over a network. Those microservices are packaged as container images and run with the help of an orchestration system such as Kubernetes. The nature of cloud-native applications is highly dynamic, and the number of running containers varies a lot depending on the current load and infrastructure events or outages.
Consider a situation where you are responsible for running an application your company has developed that consists of 20 different microservices. You have implemented autoscaling for all services and in the peak load times, the number of running containers goes well over a hundred (e.g., several container replicas for each service spread across multiple cloud instances). Even if using Kubernetes to effectively orchestrate that fleet, you still want to make sure your application runs reliably, infrastructure is secure, and if any problem occurs, you’re able to detect it and act fast. This is where a service mesh comes into play.
Service mesh
A service mesh is a dedicated infrastructure layer for making communication between services safe, observable, and reliable.
A service mesh is a special layer for handling service-to-service communication. The service here is typically a microservice running in a container orchestrated by Kubernetes. Technically, a service mesh can be used without Kubernetes and even containers, but in practice, most of the time, a service mesh is used together with containers orchestrated by Kubernetes. Examples of service meshes include the following:
The first three in the list are in fact open source CNCF projects, although of different maturity levels.
Now, what does safe communication mean in the context of a service mesh?
In the preceding section, we covered the basics of container security, but we have not yet looked into securing network communication between containers. Securing network communication is often part of the so-called Zero Trust security approach.
Zero Trust
Zero Trust is an approach where no one is trusted by default from within the network or outside of the network. Verification is required to gain access to services connected to the network.
The traditional network security approach is based on securing the perimeter of the infrastructure, that is, it is hard to obtain access to the network from the outside, but inside the network everyone is trusted by default. Obviously, if an attacker can breach perimeter security and access internal networks, they are very likely to gain access everywhere else, including confidential data. This is the reason why more and more enterprises are implementing the Zero Trust approach, and this is where a service mesh is very helpful.
One of the major advantages of a service mesh is that you do not need any changes in the application code to use a service mesh and its features. A service mesh is implemented on the platform layer, meaning that, once installed on the platform, all the applications (e.g., microservices in containers) can benefit from its features. With a service mesh, all traffic between containers can be automatically encrypted and decrypted and the applications running inside won’t require a single line of code change.
The traditional approach to accomplishing this without a service mesh would require managing TLS certificates, requesting and renewing them on expiration, and potentially making further changes at the application or infrastructure level.
In fact, all of the service meshes in the preceding list offer mutually authenticated Transport Layer Security (mTLS) for all TCP traffic between containers connected to the mesh. It is similar to regular TLS, where the server presents its identity with a certificate, with the difference that in the case of mTLS, both sides have to identify themselves before communication starts. That means the client also needs to present a certificate that the server will verify. In our example, the client and server are two services in containers connected to the service mesh. And again, mTLS can be enabled completely automatically, with no extra work required on the application side.
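As an illustration of this zero-code-change approach, with Linkerd, joining a workload to the mesh is a matter of a single annotation on the Pod template, after which the sidecar proxy is injected automatically. The Deployment below is just a placeholder sketch; the annotation itself comes from the Linkerd documentation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-backend
spec:
  selector:
    matchLabels:
      app: example-backend
  template:
    metadata:
      labels:
        app: example-backend
      annotations:
        # Tells Linkerd to inject its sidecar proxy into this Pod
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: backend
        image: example/backend:1.0
```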
Before exploring other features, let’s first understand better how a service mesh works. The service mesh layer is interfaced with microservices through an array of lightweight network proxies and all traffic between microservices is routed via those proxies in their own infrastructure layer. Typically, proxies run alongside each service in so-called sidecar containers, and altogether, those sidecar proxies form a service mesh network, as depicted in Figure 4.2.
Figure 4.2 – Service mesh overview
The service mesh is usually made up of two parts:
For a service mesh to work with Kubernetes, it has to be compatible with the K8s Service Mesh Interface (SMI).
SMI
This is a specification defining a standard, common, and portable set of APIs for smooth service mesh integration in a vendor-agnostic way. SMI serves the same purpose as CRI, CNI, and CSI, but for service meshes.
When it comes to observability, a service mesh offers detailed telemetry for all communication happening within the mesh. Automatically collected metrics from all proxies allow operators and engineers to troubleshoot, maintain, and optimize their applications. With a service mesh, we can trace calls and service dependencies as well as inspect traffic flows and individual requests. This information is extremely helpful for auditing service behavior and response times, and for detecting abnormalities in complex distributed systems.
Finally, a service mesh offers traffic management and reliability features. The exact functionality might vary from project to project, therefore some features provided by one service mesh might not be offered by another one. For the sake of example, let’s see what a Linkerd mesh has to offer:
All in all, a service mesh is a complex and advanced topic and we have only scratched the surface to learn the minimum required for passing the KCNA exam. If you are interested in knowing more, it is recommended to check the Further reading section.
One question you might be asking yourself at this point is what’s the difference between overlay networks and service meshes and why do we need both?
The short answer is that most overlay networks operate on a lower layer of the Open Systems Interconnection (OSI) model (layer 3, the network layer), whereas a service mesh operates on layer 7 of the OSI model, focusing on services and high-level application protocols (if you’re not familiar with the OSI model, check the Further reading section). The functionality of one is not a replacement for the other. Service meshes are also still gaining momentum, meaning that not every microservice-based or containerized application running on Kubernetes will use a service mesh. Technically, we are also not obligated to always use overlay networks with containers, as we saw in our exercises with Docker, but in the upcoming chapters, we’ll see why it is favorable.
In this chapter, we’ve learned a lot about container runtimes, container interfaces, and service meshes. A container runtime is low-level software that manages basic container operations such as image downloading and the start or deletion of containers. Kubernetes does not have its own runtime, but it provides interfaces that allow you to use different runtimes, different network plugins, different storage solutions, and different service meshes. Those interfaces are called CRI, CNI, CSI, and SMI respectively and their introduction allowed a lot of flexibility when using K8s.
We’ve also learned about container runtime types and their differences. Namespaced containers are the most popular and lightweight, however, they are not as secure as other types. Virtualized containers are the slowest, but they provide maximum security as each container uses an individual Linux kernel. Sandboxed containers fill the gap between the other two – they are more secure than namespaced ones and faster than virtualized ones.
When it comes to container networking, there are many options. For container-to-container communication in a cluster, we would typically use an overlay network. Kubernetes supports third-party network plugins through CNI, and those plugins provide a different set of features and capabilities. It is also possible to run containers in a non-isolated network environment, for example, directly in the network namespace of the host where the container is started.
Containers are stateless by design, meaning that they don’t preserve the data on the disk by default. To run a stateful application in a container, we need to attach external storage volumes that can be anything ranging from an iSCSI block device to a specific vendor or cloud provider solution or even a simple local disk. Kubernetes with a pluggable CSI allows a lot of flexibility when it comes to integrating external storage to containers orchestrated by K8s.
We additionally touched on the basics of container security. Namespaced containers share the same kernel, which is why it is important to make sure that no container gets compromised. There are security extensions such as AppArmor and SELinux that add an extra kernel protection layer with configurable profiles and there are best practices that help to minimize the risks.
One of the practices is to use regular (non-root) user accounts in containers and another one is to ensure that you execute trusted code in a container. It is recommended to build your own images and keep them in your own registries, rather than using images from unknown third-party repositories. Additionally, you could implement automatic vulnerability scanning as a part of the image build process.
Finally, we learned about the service mesh – a special infrastructure layer that allows securing network communication between services without any changes to the application code. A service mesh also provides a rich set of features for observability and traffic management and even allows you to automatically retry requests and split traffic.
In the upcoming chapter, we will get to a major part of the KCNA exam and this book – namely, Kubernetes for container orchestration. Now make sure to answer all of the following recap questions to test your knowledge.
As we conclude, here is a list of questions for you to test your knowledge regarding this chapter’s material. You will find the answers in the Assessments section of the Appendix:
To learn more about the topics that were covered in this chapter, take a look at the following resources: