In this chapter, we’ll take a closer look at containers, dive deeper into container technology and the container ecosystem, and discover tooling that is commonly used.
An old Chinese proverb states, “What I hear, I forget. What I see, I remember. What I do, I understand.”
Starting with this chapter, we will get our hands dirty, building images and running containers to gain a deeper understanding and first-hand practical experience. Even though KCNA is a multiple-choice exam, it is very important to do things first-hand, as this experience will help you in the future. Don’t just read the code snippets – make sure you execute them, especially if you have no previous experience with containers. You’ll need a computer running a recent version of Linux, Windows, or macOS and a working internet connection.
In this chapter, we will cover the following topics:
All the example files and code snippets used in this chapter have been uploaded to this book’s GitHub repository at https://github.com/PacktPublishing/Becoming-KCNA-Certified.
Docker has been around for quite a few years, so you may have heard about it before. For many people, the name Docker itself is synonymous with container. However, there are so many things called Docker that it is easy to get confused:
Let’s clarify each one.
For starters, Docker Inc. (the company) did not invent the technology behind containers, but it created the easy-to-use tools listed above, which helped kickstart broader container adoption. The company was founded in 2008 and was initially called dotCloud.
Docker Engine is an open source software bundle for building and containerizing applications. It is a piece of client-server software that consists of a daemon service known as dockerd (Docker daemon) that provides a REST API (for other programs to talk to it) and a command-line interface (CLI) that is simply called docker.
Containerization
Containerization is the process of packaging software application code with dependencies (libraries, frameworks, and more) in a container. Containers can be moved between environments independently of the infrastructure’s operating system.
When you install Docker Engine, you essentially install two things – the dockerd service and the CLI. dockerd runs constantly and listens for commands to perform operations with containers, such as starting new containers, stopping or restarting existing ones, and so on. Those commands might be issued using the docker CLI or a common tool such as curl. We will be using the docker CLI in this chapter’s examples.
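To see the client-server split for yourself, you can talk to dockerd's REST API directly with curl. The following sketch assumes the default Unix socket path (/var/run/docker.sock) and is guarded so that it degrades gracefully on hosts where no daemon is running:

```shell
# Talk to the Docker daemon's REST API directly with curl over its Unix socket.
# Assumes the default socket path; guarded for hosts without a running daemon.
SOCK=/var/run/docker.sock
if [ -S "$SOCK" ]; then
  # The /version endpoint returns the daemon's version info as JSON
  curl --silent --unix-socket "$SOCK" http://localhost/version | tee /tmp/dockerd-version.out
else
  echo "No Docker daemon socket found at $SOCK" | tee /tmp/dockerd-version.out
fi
```

On a host with Docker running (you may need sudo to read the socket), this prints the same version information that docker version shows – the CLI is just a convenient wrapper around these API calls.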
Next on our list is Docker Hub (https://hub.docker.com/), a public container image registry. As you may recall, a container image is a predefined static template that we use as a base for starting new containers. Now, where do we get the images from? Docker Hub can be one such place. It is an online repository service offered by Docker Inc. where thousands of container images with different environments (Ubuntu, Centos, Fedora, and Alpine Linux) as well as popular software such as Nginx, Postgres, MySQL, Redis, and Elasticsearch are hosted. Docker Hub allows you to find, share, and store container images that can be easily pulled (downloaded) over the internet to the host where you need to create a new container. It’s worth mentioning that Docker Hub is not the only such service – others include Quay (https://quay.io/), Google Container Registry (https://cloud.google.com/container-registry), and Amazon Elastic Container Registry (https://aws.amazon.com/ecr/).
Let’s move on to Docker Registry, which is today maintained at the CNCF as a project named Distribution. It is an open source server-side application that can be used for storing and distributing Docker images. The main difference compared to Docker Hub is that Docker Registry is a piece of software that you can simply take, install, and run within your organization at no cost, whereas Docker Hub is a registry as a service with some additional paid features. Docker Registry can be used to store and serve container images with the software that your dev teams are developing.
Docker Swarm is next on our list and its purpose is cluster management and container orchestration. Swarm is similar to Kubernetes; however, it is only compatible with Docker Engine (meaning no other container runtimes are supported) and has significantly fewer features and limited customizations compared to Kubernetes. That is the reason it did not receive such wide adoption as Kubernetes did.
Docker Compose is another Docker tool that allows you to define and share multi-container application specifications. With Compose, you define multiple containers that need to communicate with each other as parts of one application in a single YAML-formatted file. For example, you can bootstrap a Django web application with a database running in two containers, define that the database has to start first, and expose certain container ports. Compose can be handy for local development with Docker, but it is not compatible with Kubernetes, so we are not going to cover it in any more detail.
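As an illustration only (the service names and the application image are hypothetical), a minimal Compose specification for the Django-plus-database example might look like this, written out here via a heredoc:

```shell
# A minimal, hypothetical Compose specification: a web service that depends on
# a database service and publishes a port. Written to a file via a heredoc.
cat > /tmp/docker-compose.demo.yml <<'EOF'
services:
  db:
    image: postgres:15
  web:
    image: my-django-app:latest    # hypothetical application image
    ports:
      - "8000:8000"                # host port 8000 -> container port 8000
    depends_on:
      - db                         # start the database service first
EOF
cat /tmp/docker-compose.demo.yml
```

With Docker Compose installed, docker compose -f /tmp/docker-compose.demo.yml up would start both containers in dependency order.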
Docker Desktop is a combination of Docker Engine, the docker CLI, Docker Compose, Kubernetes, and some other tools for Windows/macOS that comes with its own graphical user interface (GUI). That’s right – Docker Desktop even packages Kubernetes and K8s clients for local development. Docker Desktop is free for non-commercial use but paid if used in organizations. There is also a beta version available for Ubuntu and Debian Linux.
Dockershim is a software compatibility layer that was created to allow Kubernetes (its kubelet component, to be precise) to communicate with dockerd (Docker daemon). As you might remember from the previous chapters, Kubernetes does not have its own container runtime (software for performing basic operations with containers such as starting, stopping, and deleting). In the early versions, Kubernetes only supported Docker to operate containers. As the container ecosystem evolved with Open Container Initiative (OCI), support for new runtimes was added through structured and standardized interfaces. Since dockerd did not have an OCI standardized interface, a translation layer between Kubernetes and dockerd called Dockershim was created. Dockershim has been deprecated since Kubernetes version 1.20 and with the 1.24 release, it has been completely removed from K8s.
Finally, we’ve reached the end of our list. Despite the number of alternatives that have appeared over the years, Docker Engine and the Docker tooling are still actively used by thousands of development teams and organizations across the globe. The following diagram demonstrates how, using the Docker CLI, we can communicate with the Docker daemon, which fetches the images from Docker Registry to create containers locally:
Figure 3.1 – Docker architecture
In the upcoming sections, we will install some of the Docker tools to see them in action and finally get our hands on containers.
Before we move on to the practical part, we still need to figure out the technology behind containers and who created it. The technology behind Linux containers was developed quite a long time ago and is based on two fundamental kernel features:
cgroups
cgroups is a mechanism that allows processes to be organized into hierarchical groups. How resources (CPU, memory, disk I/O throughput, and so on) are used by those groups can be limited, monitored, and controlled.
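You can observe this mechanism on any modern Linux host: every process records its cgroup membership in /proc. The exact output varies by system; on a cgroups v2 host, you will typically see a single unified hierarchy:

```shell
# Every Linux process records its cgroup membership in /proc; inspect the
# current shell's (on cgroups v2 this is a single line like "0::/...").
cat /proc/self/cgroup
# The resource controllers themselves are exposed under /sys/fs/cgroup:
ls /sys/fs/cgroup 2>/dev/null | head -5 || true
```

No container tooling is needed for this – cgroups are a plain kernel feature that containers merely build upon.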
cgroups were initially developed by engineers at Google and first released in 2007. In early 2008, cgroups functionality was merged into the Linux kernel, and it has been present ever since. In 2016, a revised implementation was released, now known as cgroups version 2.
Even before cgroups, in 2002, the Linux namespaces feature was developed.
Linux kernel namespaces
This Linux feature allows you to partition kernel resources in such a way that one set of processes sees one set of resources while another set of processes sees a different set of resources. Linux namespaces are used to isolate processes from each other.
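This is also directly observable: the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns, one per namespace type:

```shell
# List the namespaces of the current process, one symlink per namespace type
# (mnt, uts, ipc, pid, net, user, and so on).
ls -l /proc/self/ns
```

Two processes in the same namespace will show the same inode number in the symlink target; containerized processes show different ones, which is exactly how their isolation is implemented.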
There are different types of namespaces, each with its own properties:
This may sound complicated, but don’t worry – namespaces and cgroups are not part of the KCNA exam, so you don’t need to know about every namespace and what they do. However, since those are at the core of container technology, it is helpful to have an idea, plus you are always given bonus points if you can explain how containers work under the hood.
To summarize, cgroups and namespaces are the building blocks of containers. cgroups allow you to monitor and control computational resources for a process (or a set of processes), whereas namespaces isolate the processes at different system levels. Both functionalities can also be used without containers, and there’s plenty of software that makes use of this functionality.
Enough theory – let’s get some practice! In the next section, we will install Docker tooling and start our first container.
If you are running Windows or macOS, you can download and install Docker Desktop from https://docs.docker.com/desktop/. If you are running a recent version of Ubuntu Linux, there is a version of Docker Desktop for you too. If you are running another Linux distribution, you’ll have to install Docker Engine. You can find detailed instructions for your distribution at https://docs.docker.com/engine/install/. Please pick a stable release for installation.
If you restart your computer, make sure that Docker Desktop is running. On Linux, you might have to execute the following code in your Terminal:
$ sudo systemctl start docker
If you want it to start automatically in case of a system restart, you can run the following command:
$ sudo systemctl enable docker
Regardless of the OS or tooling that you’ve installed (Desktop or Engine), it will come with the Docker CLI that we will be using, which is simply called docker.
First, let’s make sure that Docker was installed correctly and running by checking the version. Open the terminal and type the following:
$ docker --version
Docker version 20.10.10, build b485636
Important note
If you are on Linux and you have not added your user to the docker group after the installation, you’ll have to call the Docker CLI with superuser privileges, so all docker commands should be prefixed with sudo. For the preceding example, the command will be sudo docker --version.
Your output might look slightly different – perhaps you’ll have a newer version installed. If the preceding command did not work, but Docker is installed, make sure that Docker Desktop (if you’re on macOS or Windows) or the Docker daemon (if you’re on Linux) is running.
Now, let’s start our first container with Ubuntu 22.04:
$ docker run -it ubuntu:22.04 bash
The output that you’ll see should be similar to the following:
Unable to find image 'ubuntu:22.04' locally
22.04: Pulling from library/ubuntu
125a6e411906: Pull complete
Digest: sha256:26c68657ccce2cb0a31b330cb0be2b5e108d467f641c62e13ab40cbec258c68d
Status: Downloaded newer image for ubuntu:22.04
root@d752b475a54e:/#
Wow! We are now running bash inside an Ubuntu container. It might take a few seconds for the image to be downloaded, but as soon as it is ready, you’ll see the command-line prompt running as a root user inside a newly spawned container.
So, what exactly happened when we called docker run?
docker run executes a command inside a new container; it requires the name of the container image where the command will be executed (ubuntu in the preceding example), optionally the tag of the image (22.04 here), and the command to be executed (simply bash here).
The -i argument is the same as --interactive, and it means we’d like to be running our command in interactive mode. -t, which is the same as --tty, will allocate a pseudo-TTY (emulated terminal).
As you may remember, images are templates for container environments. We have asked for an ubuntu environment tagged with version 22.04. In the first few lines of output, we saw that the image was not found locally:
Unable to find image 'ubuntu:22.04' locally
22.04: Pulling from library/ubuntu
125a6e411906: Pull complete
If the requested image with a particular tag was not downloaded previously, it will be automatically downloaded (pulled) from the Docker Hub library and you should be able to see the download progress while it is happening.
Now, let’s exit the container and try running it again. Simply type exit in the terminal:
root@d752b475a54e:/# exit
exit
Now, execute the same command we did previously:
$ docker run -it ubuntu:22.04 bash
root@e5d98a473adf:/#
Was it faster this time? Yes, because we already have the ubuntu:22.04 image cached locally, so we don’t need to download it again. Therefore, the container was started immediately.
Did you notice that the hostname after root@ is different this time – e5d98a473adf versus d752b475a54e? (Note: you will see your own unique container IDs here.) This is because we have started a new container based on the same ubuntu image. When we start a new container, we don’t modify the read-only source image; instead, we create a new writable filesystem layer on top of the image. The following diagram shows this layered approach:
Figure 3.2 – Container layers
When we start a container, we add a new layer, which allows modifications to be made to the container image copy. This way, we can create any number of containers from the same base image without modifying the initial read-only image layer. The major benefit of this approach is that in container layers, we only store the difference with the image layer, which means significant disk space savings when used at scale.
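You can inspect the writable container layer directly with docker diff, which lists files added (A), changed (C), or deleted (D) relative to the image. Here is a sketch, guarded so it is skipped cleanly where no Docker daemon is available:

```shell
# Show only the container-layer differences against the read-only image layer.
# Guarded: skips gracefully on hosts without a running Docker daemon.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # Start a container that writes a single file into its writable layer
  ID=$(docker run -d ubuntu:22.04 bash -c 'echo hello > /root/test')
  docker wait "$ID" >/dev/null
  # Typically prints something like: C /root  and  A /root/test
  docker diff "$ID" | tee /tmp/layer-diff.txt
else
  echo "Docker daemon not available; skipping demo" | tee /tmp/layer-diff.txt
fi
```

The short output illustrates the point: only the delta is stored per container, not a full copy of the image.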
Images themselves can also consist of multiple layers, with one layer building on top of another. In the following section, we will learn how to build new images that include the software we like.
Feel free to explore our container environment and exit it when you’re done:
$ docker run -it ubuntu:22.04 bash
root@e5d98a473adf:/# echo "Hello World" > /root/test
root@e5d98a473adf:/# hostname
e5d98a473adf
root@e5d98a473adf:/# date
Sun May 1 13:40:01 UTC 2022
root@e5d98a473adf:/# exit
exit
When we called exit in the first container, it exited; later, when we called docker run again, a new container was created. Now that both containers have exited, we have an image layer stored on the disk, as well as two different container layers based on the ubuntu:22.04 base.
Since the container layers only keep track of differences from the base image, we won’t be able to remove the base image until all the container layers have been deleted. Let’s get the list of images we have locally by running the following code:
$ docker images
REPOSITORY   TAG     IMAGE ID       CREATED        SIZE
ubuntu       22.04   d2e4e1f51132   39 hours ago   77.8MB
If we attempt to delete the ubuntu:22.04 image with the docker rmi command, we’ll get an error:
$ docker rmi ubuntu:22.04
Error response from daemon: conflict: unable to remove repository reference "ubuntu:22.04" (must force) - container e5d98a473adf is using its referenced image d2e4e1f51132
We can also execute the docker ps command to see all running containers:
$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
An empty table means no containers are currently running.
Finally, we can execute docker ps --all to see all the containers on the local system, including those that have exited:
$ docker ps --all
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS     NAMES
e5d98a473adf   ubuntu:22.04   "bash"    8 minutes ago    Exited (0) 2 minutes ago              vibrant_jenn
d752b475a54e   ubuntu:22.04   "bash"    18 minutes ago   Exited (0) 12 minutes ago             cool_perl
Try removing those exited containers with docker rm CONTAINER_ID:
$ docker rm d752b475a54e
d752b475a54e
$ docker rm e5d98a473adf
e5d98a473adf
Now, the image should be deleted too:
$ docker rmi ubuntu:22.04
Untagged: ubuntu:22.04
Untagged: ubuntu@sha256:26c68657ccce2cb0a31b330cb0be2b5e108d467f641c62e13ab40cbec258c68d
Deleted: sha256:d2e4e1f511320dfb2d0baff2468fcf0526998b73fe10c8890b4684bb7ef8290f
Deleted: sha256:e59fc94956120a6c7629f085027578e6357b48061d45714107e79f04a81a6f0c
The sha256 strings are the digests of the image layers; they are unique and immutable identifiers. If we assign a different tag to our ubuntu image instead of 22.04 and then try to pull (download) the same image from Docker Hub again, Docker will recognize that we already have an image with this digest and will do nothing except tag it again.
Let’s try one more thing – pulling another Docker image without any tags. If you simply pull the image, no container is going to be started, but this will save download time the next time a new container is started from that image:
$ docker pull centos
Using default tag: latest
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
docker.io/library/centos:latest
As you can see, if we don’t specify the tag explicitly, latest will be taken by default.
In the upcoming section, we will learn more about the meaning of the latest tag, tagging in general, and building images with Docker.
Now that we know how to start containers and pull images, let’s learn how to create new container images. Since image layers are immutable, you create a new image with the software of your choice by adding new layers on top of existing ones. There are two ways this can be done with Docker:
The interactive way is to create an image from an existing container. Let’s say you start a container with the Ubuntu 22.04 environment, install additional packages, and expose port 80. To create a new image, we can use the docker commit command:
$ docker commit CONTAINER_ID [REPOSITORY[:TAG]]
The image name will be in the REPOSITORY:TAG format. If no tag is given, latest will be added automatically. If no repository is specified, the image will have no name and can only be referenced by its unique ID. The tag, as well as the name (which is the same as the image repository’s name), can be changed or applied after the build.
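The flow can be sketched as follows (the repository name myrepo/ubuntu-curl is hypothetical, and the snippet is guarded so it is skipped where no Docker daemon is available):

```shell
# Interactive image build: run a container, change it, then snapshot it with
# docker commit. Hypothetical repository name; guarded for hosts without Docker.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # Run a container that installs an extra package into its writable layer
  ID=$(docker run -d ubuntu:22.04 bash -c 'apt-get update && apt-get install -y curl')
  docker wait "$ID" >/dev/null
  # Snapshot the container's current state as a new image
  docker commit "$ID" myrepo/ubuntu-curl:v1 | tee /tmp/commit.out
else
  echo "Docker daemon not available; skipping demo" | tee /tmp/commit.out
fi
```

After the commit, docker images would list myrepo/ubuntu-curl:v1 alongside the base ubuntu image.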
While the interactive method is quick and easy, it should not be used under normal circumstances because it is a manual, error-prone process and the resulting images might be larger with many unnecessary layers.
The second, better option for building images is using Dockerfiles.
Dockerfile
A Dockerfile is a text file containing instructions for building an image. It supports running shell scripts, installing additional packages, adding and copying files, defining commands executed by default, exposing ports, and more.
Let’s have a look at a simplistic Dockerfile:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl vim
LABEL description="My first Docker image"
As you’ve probably already guessed, the FROM instruction defines the base image, with a tag, for the image we are going to build. The base image can be one of our previously built local images or an image from a registry. RUN instructs Docker to execute apt-get update and then install the curl and vim packages. LABEL is simply any metadata you’d like to add to the image. If you copy the preceding contents to a file called Dockerfile, you’ll be able to build a new image by calling docker build in the same folder:
$ docker build . -t myubuntuimage
[+] Building 11.2s (6/6) FINISHED
 => [internal] load build definition from Dockerfile             0.0s
 => => transferring dockerfile: 153B                             0.0s
 => [internal] load .dockerignore                                0.0s
 => => transferring context: 2B                                  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04  0.0s
 => [1/2] FROM docker.io/library/ubuntu:22.04                    0.0s
 => [2/2] RUN apt-get update && apt-get install -y curl vim      9.9s
 => exporting to image                                           1.1s
 => => exporting layers                                          1.1s
 => => writing image sha256:ed53dcc2cb9fcf7394f8b03818c02e0ec45da57e89b550b68fe93c5fa9a74b53  0.0s
 => => naming to docker.io/library/myubuntuimage                 0.0s
With -t myubuntuimage, we have specified the name of the image without the actual tag. This means that the latest tag will be applied to the image by default:
$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED         SIZE
myubuntuimage   latest   ed53dcc2cb9f   6 minutes ago   176MB
centos          latest   5d0da3dc9764   7 months ago    231MB
There are a few things we need to clarify about the latest tag, as it can be misleading:
Therefore, the best practice is to tag images with something more descriptive rather than relying on latest. For instance, an incrementing version of the packaged application (v0.32, v1.7.1, and so on) can be used as a tag, or even the build timestamp. A timestamp lets us determine when an image was built without needing to inspect its metadata.
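For example, a build script might derive the tag from the application version plus a UTC build timestamp (the image name myapp and the version v1.7.1 here are hypothetical):

```shell
# Derive a descriptive image tag from an app version plus a UTC build timestamp.
APP_VERSION="v1.7.1"                   # hypothetical application version
BUILD_TS=$(date -u +%Y%m%d-%H%M%S)     # e.g. 20220501-134001
TAG="${APP_VERSION}-${BUILD_TS}"
echo "docker build . -t myapp:${TAG}"  # the build command we would run
```

A tag such as myapp:v1.7.1-20220501-134001 tells you at a glance both what was packaged and when, which latest never can.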
Let’s quickly go back to the instructions supported in Dockerfiles. We’ve already learned about FROM, RUN, and LABEL, but there are more:
A quick note about CMD and ENTRYPOINT: they are similar, yet not the same. We can specify either CMD, ENTRYPOINT, or both in our Dockerfile. If we specify both, CMD acts as a parameter for ENTRYPOINT. Since CMD is easier to override at runtime, ENTRYPOINT is typically the executable and CMD is its argument in such scenarios. For example, we could set ENTRYPOINT to /bin/cat and use CMD to give the path to a file we want to print (/etc/hosts, /etc/group, and so on). For many public images from Docker Hub, ENTRYPOINT is set to /bin/sh -c by default.
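To make the /bin/cat example concrete, here is a sketch of such a Dockerfile, written out via a heredoc (the filename /tmp/Dockerfile.demo is arbitrary):

```shell
# ENTRYPOINT fixes the executable; CMD supplies a default, overridable argument.
cat > /tmp/Dockerfile.demo <<'EOF'
FROM ubuntu:22.04
ENTRYPOINT ["/bin/cat"]
CMD ["/etc/hosts"]
EOF
cat /tmp/Dockerfile.demo
```

Built with docker build -f /tmp/Dockerfile.demo -t cat-demo ., running docker run cat-demo prints /etc/hosts, while docker run cat-demo /etc/group overrides CMD and prints /etc/group instead – ENTRYPOINT stays fixed in both cases.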
This list is not meant to be a complete reference of instructions supported by Dockerfiles, but it mentions the most used instructions that cover 99% of scenarios. In addition, you don’t often build containers on your laptop or local workstation; instead, you use a modern CI/CD system or an automated build from Docker Hub as an alternative.
Now, let’s understand what a development workflow might look like when containers are in use:
Important note
There is no need to learn a new programming language – any software that runs in a Linux environment will run inside a container too.
As you may remember, one of the main features of containers is portability – a container running on one host will also run on another host. This means that you can take a container image with Alpine Linux and run it both on your laptop with Fedora Linux and on an Ubuntu-based Kubernetes cluster.
But wait – can we run Linux containers on Windows or vice versa? Not really. First, we need to distinguish between Linux containers and Windows containers.
Important note
Everything in this book and the KCNA exam itself is only about Linux containers.
Even if you are running Docker Desktop on Windows, it is using a minimal LinuxKit virtual machine in the background. Windows containers are different and run in one of two distinct isolation modes (process isolation or Hyper-V isolation) available today in the Microsoft operating system. Docker Desktop allows you to switch between Windows containers and Linux containers if you are running on Windows. However, keep in mind that more than 90% of the servers in the world run Linux, so unless you are going to run Windows-only applications in containers, you are fine to learn about and use only Linux containers.
In this chapter, we gained experience with (Linux) containers and learned that the technology behind containers has existed for many years and is based on cgroups and kernel namespaces.
Docker introduced tooling aimed at developers and engineers looking for a universal and simple way to package and share applications. Before containers, it was often the case that an application would work in the development environment but fail in production because of unmet dependencies or incorrect versions being installed. Containers fix this problem by bundling the application with all its dependencies and system packages in a template known as a container image.
Container images can be stored in registries that support private and public repositories and allow you to share them with different teams. Docker Hub, Quay, and Google Container Registry (GCR) are some of the most well-known container image registries today that can be reached over the internet. An image that’s pushed (uploaded) to the registry can then be pulled (downloaded) by a container orchestrator such as Kubernetes or simply by a server with a container runtime over the internet.
Images are used to create containers, so a container is a running instance of an image. When a container is started with a writable filesystem, a layer is created on top of the immutable image layer. Containers and images can have multiple layers and we can start as many containers as we want from a single image. Containers are more lightweight compared to VMs and are very fast to start.
We also learned that to build a container image with Docker, we can leverage an interactive or Dockerfile method. With Dockerfile, we define a set of instructions that will be executed to build an image with our containerized application.
In the next chapter, we will continue exploring containers by learning about the runtimes and pluggable interfaces provided by Kubernetes.
As we conclude, here is a list of questions for you to test your knowledge regarding this chapter’s material. You will find the answers in the Assessments section of the Appendix:
While this chapter provided an insight into the container ecosystem and the knowledge needed to pass the KCNA exam, it does not cover all the features of Docker tooling, nor does it describe cgroups and namespaces in detail. If you’d like to go the extra mile, you are encouraged to check out the following resources: