Chapter 11: Securing Containers

Security is one of the hottest topics in IT today. Enterprises all over the world are investing heavily in security practices and tools to help protect their systems from internal and external attacks.

As we saw in Chapter 1, Introduction to Container Technology, containers and their host systems can be considered a medium to execute and keep a target application running. Security should be applied to all levels of the service architecture, from the base infrastructure to the target application code, all while passing through the virtualization or containerization layer.

In this chapter, we will look at the best practices and tools that could help improve the overall security of our containerization layer. In particular, we're going to cover the following main topics:

  • Running rootless containers with Podman
  • Do not run containers with UID 0
  • Signing our container images
  • Customizing Linux kernel capabilities
  • SELinux interaction with containers

Technical requirements

To complete this chapter's examples, you will need a machine with a working Podman installation. As we mentioned in Chapter 3, Running the First Container, all the examples in this book have been executed on a Fedora 34 system or later, but they can be reproduced on your OS of choice.

Having a good understanding of the topics that were covered in Chapter 4, Managing Running Containers, Chapter 5, Implementing Storage for the Container's Data, and Chapter 9, Pushing Images to a Container Registry, will help you understand the container security topics we'll be discussing here.

Running rootless containers with Podman

As we briefly saw in Chapter 4, Managing Running Containers, Podman lets standard users without administrative privileges run containers on a Linux host. These containers are often referred to as "rootless containers."

Rootless containers have many advantages, including the following:

  • They create an additional security layer that could block attackers trying to get root privileges on the host, even if the container engine, runtime, or orchestrator has been compromised.
  • They can allow many unprivileged users to run containers on the same host, making the most of high-performance computing environments.

Let's think about how a Linux system handles traditional process services. Package maintainers usually create a dedicated user for scheduling and running the target process. If we install an Apache web server on our favorite Linux distribution through the default package manager, we will find that the installed service runs as a dedicated user named "apache."
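On a Fedora-like system with the httpd package installed, we can quickly verify this by looking at the dedicated account in /etc/passwd (the UID may differ on your distribution):

$ grep apache /etc/passwd

apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin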

This approach has been a best practice for years: from a security perspective, the fewer privileges a process holds, the smaller the damage it can do if compromised.

Using the same approach with a rootless container allows us to run the container process without any additional privilege escalation. Additionally, Podman is daemonless, so the container is simply spawned as a child process of Podman itself.

Running rootless containers in Podman is pretty straightforward and, as we saw in the previous chapters, many of the examples in this book can be run as standard unprivileged users. Now, let's learn what's behind the execution of a rootless container.

The Podman Swiss Army knife – subuid and subgid

Modern Linux distributions use a version of the shadow-utils package that leverages two files: /etc/subuid and /etc/subgid. These files are used to determine which UIDs and GIDs can be used to map a user namespace.

The default allocation for every user is 65536 UIDs and 65536 GIDs.
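Before starting any container, we can inspect these allocations directly. The following quick check assumes a user named alex, as in this chapter's examples; the starting offset may differ on your system:

$ grep alex /etc/subuid /etc/subgid

/etc/subuid:alex:100000:65536

/etc/subgid:alex:100000:65536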

We can run the following simple commands to check how the subuid and subgid allocation works in rootless containers:

$ id

uid=1000(alex) gid=1000(alex) groups=1000(alex),10(wheel)

$ podman run alpine cat /proc/self/uid_map /proc/self/gid_map

Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)

Trying to pull docker.io/library/alpine:latest...

Getting image source signatures

Copying blob 59bf1c3509f3 done  

Copying config c059bfaa84 done  

Writing manifest to image destination

Storing signatures

         0       1000          1

         1     100000      65536

         0       1000          1

         1     100000      65536

As we can see, both mappings start by assigning UID and GID 0 inside the container to the UID/GID of the user who ran the container; that is, 1000. After that, container UIDs and GIDs from 1 onward are mapped to the host range that starts at 100000 and covers the default allocation of 65536 IDs, ending at 165535 (the starting point, 100000, plus the range size, 65536, minus one).
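We can also inspect the same mapping without starting a container: the podman unshare command joins the user namespace that Podman creates for our user and runs an arbitrary command inside it:

$ podman unshare cat /proc/self/uid_map

         0       1000          1

         1     100000      65536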

Using rootless containers is not the only best practice we can implement for our container environments. In the next section, we'll learn why we shouldn't run a container with UID 0.

Do not run containers with UID 0

Container runtimes can be instructed to run the processes inside a container with a user ID that's different from the one that initially created the container, similar to what we saw for rootless containers. Running the container's processes as a non-root user helps security: an unprivileged user inside a container limits the attack surface both inside and outside that container.

By default, if a Dockerfile or Containerfile does not state otherwise, the container runs as root (that is, UID=0). To avoid this, we can leverage the USER instruction in those build files – for example, USER 1001 – to instruct Buildah or other container build tools to build and run the container image with that particular user (UID 1001).

If we want to force a specific UID, we need to adjust the permissions of any file, folder, or mount we plan to use with our running containers.
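For example, to run a rootless container with UID 1001 and a writable bind mount, we can pre-create the directory and chown it inside Podman's user namespace so that the ownership maps to the right subordinate UID on the host. This is a minimal sketch; ~/webdata and registry.example.com/myapp are placeholder names:

$ mkdir ~/webdata

$ podman unshare chown -R 1001:1001 ~/webdata

$ podman run -d --user 1001 -v ~/webdata:/data:Z registry.example.com/myapp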

Now, let's learn how to adapt an existing image so that it can be run with a standard user.

We can leverage some prebuilt images on DockerHub or pick one of the official Nginx container images. First, we need to create a basic nginx configuration file:

$ cat hello-podman.conf

server {

    listen 80;

    location / {

        default_type text/plain;

        expires -1;

        return 200 'Hello Podman user!\nServer address: $server_addr:$server_port\n';

    }

}

The nginx configuration file is really simple: we define the listening port (80) and the content message to return once a request arrives on the server.

Then, we can create a simple Dockerfile to leverage one of the official Nginx container images:

$ cat Dockerfile

FROM docker.io/library/nginx:mainline-alpine

RUN rm /etc/nginx/conf.d/*

ADD hello-podman.conf /etc/nginx/conf.d/

The Dockerfile contains three instructions:

  • FROM: For selecting the official Nginx image
  • RUN: For cleaning the configuration directory from any default config example
  • ADD: For copying the configuration file we just created

Now, let's build the container image with Buildah:

$ buildah bud -t nginx-root:latest .

STEP 1/3: FROM docker.io/library/nginx:mainline-alpine

STEP 2/3: RUN rm /etc/nginx/conf.d/*

STEP 3/3: ADD hello-podman.conf /etc/nginx/conf.d/

COMMIT nginx-root:latest

Getting image source signatures

Copying blob 8d3ac3489996 done

...

Copying config 21c5f7d8d7 done  

Writing manifest to image destination

Storing signatures

--> 21c5f7d8d70

Successfully tagged localhost/nginx-root:latest

21c5f7d8d709e7cfdf764a14fd6e95fb4611b2cde52b57aa46d43262a6489f41

With that, we've built the image and named it nginx-root. Now, we are ready to run our container:

$ podman run --name myrootnginx -p 127.0.0.1::80 -d nginx-root

364ec7f5979a5059ba841715484b7238db3313c78c5c577629364aa46b6d9bdc

Here, we used the -p option to publish the port and make it reachable from the host. Let's find out which local port was randomly chosen on the host system:

$ podman port myrootnginx 80

127.0.0.1:38029

Finally, let's call our containerized web server:

$ curl localhost:38029

Hello Podman user!

Server address: 10.0.2.100:80

The container is finally running, but what user is using our container? Let's find out:

$ podman ps | grep root

364ec7f5979a  localhost/nginx-root:latest  nginx -g daemon o...  55 minutes ago  Up 55 minutes ago  0.0.0.0:38029->80/tcp      myrootnginx

$ podman exec 364ec7f5979a id

uid=0(root) gid=0(root)

As expected, the container is running as root!

Now, let's make a few edits to change the user. First, we need to change the listening port in the Nginx server configuration:

$ cat hello-podman.conf

server {

    listen 8080;

    location / {

        default_type text/plain;

        expires -1;

        return 200 'Hello Podman user!\nServer address: $server_addr:$server_port\n';

    }

}

Here, we replaced the listening port (80) with 8080; unprivileged users cannot bind ports below 1024.
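As an aside, the 1024 threshold is controlled by the kernel's net.ipv4.ip_unprivileged_port_start parameter, which is evaluated per network namespace. On recent kernels and Podman versions, the --sysctl option can lower it for a single container, letting a non-root process bind port 80 without rebuilding the image. This is a sketch; my-nginx-user is a hypothetical image still configured to listen on port 80:

$ sysctl net.ipv4.ip_unprivileged_port_start

net.ipv4.ip_unprivileged_port_start = 1024

$ podman run -d --sysctl net.ipv4.ip_unprivileged_port_start=80 my-nginx-user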

Then, we need to edit our Dockerfile:

$ cat Dockerfile

FROM docker.io/library/nginx:mainline-alpine

RUN rm /etc/nginx/conf.d/*

ADD hello-podman.conf /etc/nginx/conf.d/

RUN chmod -R a+w /var/cache/nginx/ \

        && touch /var/run/nginx.pid \

        && chmod a+w /var/run/nginx.pid

EXPOSE 8080

USER nginx

As you can see, we fixed the permissions on the files and folders that the Nginx server needs to write to, exposed the new 8080 port, and set the default user to the nginx one.

Now, we are ready to build a brand-new container image. Let's call it nginx-user:

$ buildah bud -t nginx-user:latest .

STEP 1/6: FROM docker.io/library/nginx:mainline-alpine

STEP 2/6: RUN rm /etc/nginx/conf.d/*

STEP 3/6: ADD hello-podman.conf /etc/nginx/conf.d/

STEP 4/6: RUN chmod -R a+w /var/cache/nginx/         && touch /var/run/nginx.pid         && chmod a+w /var/run/nginx.pid

STEP 5/6: EXPOSE 8080

STEP 6/6: USER nginx

COMMIT nginx-user:latest

Getting image source signatures

Copying blob 8d3ac3489996 done  

...

Copying config 7628852470 done  

Writing manifest to image destination

Storing signatures

--> 76288524704

Successfully tagged localhost/nginx-user:latest

762885247041fd233c7b66029020c4da8e1e254288e1443b356cbee4d73adf3e

Now, we can run the container:

$ podman run --name myusernginx -p 127.0.0.1::8080 -d nginx-user

299e0fb727f339d87dd7ea67eac419905b10e36181dc1ca7e35dc7d0a9316243

Find the associated random host port and check whether the web server is working:

$ podman port myusernginx 8080

127.0.0.1:42209

$ curl 127.0.0.1:42209

Hello Podman user!

Server address: 10.0.2.100:8080

Finally, let's see whether we changed the user that's running the target process in our container:

$ podman ps | grep user

299e0fb727f3  localhost/nginx-user:latest  nginx -g daemon o...  38 minutes ago  Up 38 minutes ago  127.0.0.1:42209->8080/tcp  myusernginx

$ podman exec 299e0fb727f3 id

uid=101(nginx) gid=101(nginx) groups=101(nginx)

As you can see, our container is running as an unprivileged user, which is what we wanted.

If you want to look at a ready-to-use example of this, please go to this book's GitHub repository: https://github.com/PacktPublishing/Podman-for-DevOps.

Unfortunately, security is not all about permissions and users – we also need to take care of the base image and its source and check container image signatures. We'll learn about this in the next section.

Signing our container images

When we're dealing with images that have been pulled from external registries, we have security concerns related to known attack tactics against containers (see [1] in the Further reading section), especially masquerading techniques, in which an attacker manipulates image components to make them appear legitimate. This could also happen as a result of a man-in-the-middle (MITM) attack conducted by an attacker over the wire.

To prevent certain kinds of attacks while you're managing containers, the best solution is to use a detached image signature to trust the image provider and guarantee its reliability.

GNU Privacy Guard (GPG) is a free implementation of the OpenPGP standard and can be used, together with Podman, to sign images and check their valid signatures once they've been pulled.

When an image is pulled, Podman can verify the validity of the signatures and reject images without valid signatures.

Now, let's learn how to implement a basic image signature workflow.

Signing images with GPG and Podman

In this section, we will create a basic GPG key pair and configure Podman to push and sign the image while storing the signature in a staging store. For the sake of clarity, we will run a registry using the basic Docker Registry V2 container image without any customization.

Before testing the image pull and signature validation workflow, we will expose a basic web server to publish the detached signature.

To create image signatures with GPG, we need to create a valid GPG key pair or use an existing one. For this reason, we will provide a short recap on GPG key pairs to help you understand how image signatures work.

A key pair is composed of a private key and a public key. The private key is kept secret and never shared with anybody, while the public key can be distributed universally. The sender of a file or message signs it with their private key; anyone who holds the corresponding public key can then verify that the signature is genuine and that the content has not been tampered with.

We can easily translate this concept to container images: the image owner who pushes an image to a remote registry can sign it using their key pair and store the detached signature in a signature store (from now on, sigstore) that is publicly accessible by users. Here, the signature is kept separate from the image itself – the registry stores the image blobs, while the sigstore holds and exposes the image signatures.

Users who are pulling the image will be able to validate the image signature using the previously shared public key.

Now, let's go back to creating the GPG key pair. We are going to create a simple one with the following command:

$ gpg --full-gen-key

The preceding command will ask you a series of questions and prompt you for a passphrase to help you generate the key pair. By default, the key pair will be stored in the $HOME/.gnupg folder.
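For unattended environments, GnuPG also supports non-interactive generation. The following is a minimal sketch that assumes the Foo Bar identity used below and an empty passphrase; depending on your GnuPG version, --pinentry-mode loopback may also be required:

$ gpg --batch --passphrase '' --quick-generate-key "Foo Bar <foo.bar@example.com>" rsa3072 default 1y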

The key pair's output should be similar to the following:

$ gpg --list-keys

/home/vagrant/.gnupg/pubring.kbx

pub   rsa3072 2022-01-05 [SC]

      2EA4850C32D29DA22B7659FEC38D92C0F18764AC

uid           [ultimate] Foo Bar <foo.bar@example.com>

sub   rsa3072 2022-01-05 [E]

It is also possible to export generated key pairs. The following command will export the public key to a file:

$ gpg --armor --export foo.bar@example.com > pubkey.pem

This command will be useful later when we define the image signature's verification.

The following command can be used to export the private key:

$ gpg --armor \

  --export-secret-keys foo.bar@example.com > privkey.pem

In both examples, the --armor option has been used to export the keys in Privacy Enhanced Mail (PEM) format.

Once the key pair has been generated, we can create a basic registry that will host our container images. To do so, we will reuse the basic example from Chapter 9, Pushing Images to a Container Registry, and run the following command as root:

# mkdir /var/lib/registry

# podman run -d \

   --name local_registry \

   -p 5000:5000 \

   -v /var/lib/registry:/var/lib/registry:z \

   --restart=always registry:2

We now have a local registry without authentication that can be used to push the test images. As we mentioned previously, the registry is unaware of the image's detached signature.

Podman must be able to write signatures on a staging sigstore. There is already a default configuration in the /etc/containers/registries.d/default.yaml file, which looks as follows:

default-docker:

#  sigstore: file:///var/lib/containers/sigstore

  sigstore-staging: file:///var/lib/containers/sigstore

The sigstore-staging path is where Podman writes image signatures; it must write them to a writable folder. It is possible to customize this path or keep the default configuration as-is.

If we want to create per-user sigstores, each user can create a $HOME/.config/containers/registries.d/default.yaml file and define a custom sigstore-staging path in their home directory, following the same syntax that was shown in the previous example. This allows users to run Podman in rootless mode and successfully write to their own sigstore.
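A minimal sketch of such a per-user configuration, assuming a user named alex and an arbitrary sigstore path inside their home directory:

$ cat $HOME/.config/containers/registries.d/default.yaml

default-docker:

  sigstore-staging: file:///home/alex/.local/share/containers/sigstore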

Important

It is not a good idea to share the default sigstore across all users by allowing general write permissions. This is because every user in the host would have write access to the existing signatures.

Since we want to use the default sigstore while still using the default GPG key pair under the user's home directory, we will run Podman by elevating privileges with sudo, an exception to the approach that this book follows.

The following example shows the Dockerfile of a custom httpd image that's been built using UBI 8:

Chapter11/image_signature/Dockerfile

FROM registry.access.redhat.com/ubi8

# Update image and install httpd

RUN yum install -y httpd && yum clean all -y

# Expose the default httpd port 80

EXPOSE 80

# Run the httpd

CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

To build the image, we can run the following command:

$ cd Chapter11/image_signature

$ sudo podman build -t custom_httpd .

Now, we can tag the image with the local registry name:

$ sudo podman tag custom_httpd localhost:5000/custom_httpd

Finally, it's time to push the image on the temporary registry and sign it using the generated key pair. The --sign-by option allows users to pass a valid key pair that's been identified by the user's email:

$ sudo GNUPGHOME=$HOME/.gnupg podman \

   push --tls-verify=false \

   --sign-by foo.bar@example.com \

   localhost:5000/custom_httpd

Getting image source signatures

Copying blob 3ba8c926eef9 done  

Copying blob a59107c02e1f done  

Copying blob 352ba846236b done  

Copying config 569b015109 done  

Writing manifest to image destination

Signing manifest

Storing signatures

The preceding code successfully pushed the image blobs to the registry and stored the image signature. Notice the GNUPGHOME variable, which was passed at the beginning of the command to define the GPG keystore path that's accessed by Podman.

Warning

The --sign-by option is not supported on the remote Podman client.

To verify that the image has been signed correctly and that its signature is being saved in the sigstore, we can check the content of /var/lib/containers/sigstore:

$ ls -al /var/lib/containers/sigstore/

drwxr-xr-x. 6 root    root    4096 Jan  5 18:58  .

drwxr-xr-x. 5 root    root    4096 Jan  5 13:29  ..

drwxr-xr-x. 2 root    root    4096 Jan  5 18:58 'custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639'

As you will see, the new directory contains the image signature file:

$ ls -al /var/lib/containers/sigstore/'custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639'

total 12

drwxr-xr-x. 2 root root 4096 Jan  5 18:58 .

drwxr-xr-x. 6 root root 4096 Jan  5 18:58 ..

-rw-r--r--. 1 root root  730 Jan  5 18:58 signature-1

With that, we have successfully pushed and signed the image, making it more secure for future use. Now, let's learn how to configure Podman to retrieve signed images.

Configuring Podman to pull signed images

To successfully pull a signed image, Podman must be able to retrieve the signature from a sigstore and have access to a public key to verify the signature.

Here, we are dealing with detached signatures, and we have already learned that the registry doesn't hold any information about image signatures. For this reason, we need to make them available to users with a publicly accessible sigstore: a web server (Nginx, Apache httpd, and so on) will be a good fit.

Since the signing host will be the same as the one used to test image pulls, we will run an Apache httpd server that exposes the sigstore staging folder as the server document root. In a real-life scenario, we would move the signatures to a dedicated web server.
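In that scenario, publishing the signatures can be as simple as copying the sigstore tree to the document root of the dedicated server. A sketch, with sigstore.example.com as a hypothetical host:

$ rsync -av /var/lib/containers/sigstore/ sigstore.example.com:/var/www/sigstore/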

For this example, we will use the standard docker.io/library/httpd image and run the container with root privileges to grant access to the sigstore folder:

# podman run -d -p 8080:80 \

  --name sigstore_server \

  -v /var/lib/containers/sigstore:/usr/local/apache2/htdocs:z \

  docker.io/library/httpd

The web server is now available at http://localhost:8080 and can be used by Podman to retrieve image signatures.

Now, let's configure Podman for image pulling. First, we must configure the default image sigstore. We have already defined the staging sigstore that's used by Podman to write a signature, so now, we need to define the sigstore that's used to read image signatures.

To do so, we must edit the /etc/containers/registries.d/default.yaml file one more time and add a reference to the default sigstore web server that's running on http://localhost:8080:

default-docker:

  sigstore: http://localhost:8080

  sigstore-staging: file:///var/lib/containers/sigstore

The preceding code configures the sigstore that's used by Podman for all images. However, it is possible to add more sigstores for specific registries by populating the docker field of the file. The following code configures the sigstore for the public Red Hat registry:

docker:

  registry.access.redhat.com:

    sigstore: https://access.redhat.com/webassets/docker/content/sigstore

Before we test the image pulls, we must implement the public key that's used by Podman to verify the signatures. This public key must be stored in the host that pulls the image and belongs to the key pair that's used to sign the image.

The configuration file that's used to define the public key's path is /etc/containers/policy.json.

The following code shows the /etc/containers/policy.json file with a custom configuration for the registry's localhost:5000:

{

    "default": [

        {

            "type": "insecureAcceptAnything"

        }

    ],

    "transports": {

        "docker": {

            "localhost:5000": [

                {

                    "type": "signedBy",

                    "keyType": "GPGKeys",

                    "keyPath": "/tmp/pubkey.gpg"

                }

            ]

        },

        "docker-daemon": {

            "": [

                {

                    "type": "insecureAcceptAnything"

                }

            ]

        }

    }

}

To verify the signatures of images that have been pulled from localhost:5000, we can use a public key that's stored in the path defined by the keyPath field. The public key must exist in the defined path and be readable by Podman.

If we need to extract the public key from the example key pair that was generated at the beginning of this section, we can use the following GPG command:

$ gpg --armor --export foo.bar@example.com > /tmp/pubkey.gpg

Now, we are ready to test the image pull and verify its signature:

$ podman pull --tls-verify=false localhost:5000/custom_httpd

Getting image source signatures

Checking if image destination supports signatures

Copying blob 23fdb56daf15 skipped: already exists  

Copying blob d4f13fad8263 skipped: already exists  

Copying blob 96b0fdd0552f done  

Copying config 569b015109 done  

Writing manifest to image destination

Storing signatures

569b015109d457ae5fabb969fd0dc3cce10a3e6683ab60dc10505fc2d68e769f

The image was successfully pulled into the local store after signature verification using the public key provided.

Now, let's see how Podman behaves when it is unable to correctly verify the signature.

Testing signature verification failures

What if we make the sigstore unavailable? Will Podman still succeed in pulling the image if it's unable to verify the signature? Let's try to stop the local httpd server that exposes the sigstore:

# podman stop sigstore_server

Before pulling it again, let's remove the previously cached image to avoid false positives:

$ podman rmi localhost:5000/custom_httpd

Now, we can try to pull the image again:

$ podman pull --tls-verify=false localhost:5000/custom_httpd

Trying to pull localhost:5000/custom_httpd:latest...

WARN[0000] failed, retrying in 1s ... (1/3). Error: Source image rejected: Get "http://localhost:8080/custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639/signature-1": dial tcp [::1]:8080: connect: connection refused

WARN[0001] failed, retrying in 1s ... (2/3). Error: Source image rejected: Get "http://localhost:8080/custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639/signature-1": dial tcp [::1]:8080: connect: connection refused

WARN[0002] failed, retrying in 1s ... (3/3). Error: Source image rejected: Get "http://localhost:8080/custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639/signature-1": dial tcp [::1]:8080: connect: connection refused

Error: Source image rejected: Get "http://localhost:8080/custom_httpd@sha256=573c1eb93857c0169a606f1820271b143ac5073456f844255c3c7a9e308bf639/signature-1": dial tcp [::1]:8080: connect: connection refused

The preceding error demonstrates that Podman is trying to connect to the web server that exposes the sigstore and failed. This error blocked the whole image pull process.

A different error occurs when the public key we use to verify the signature is not valid or not part of the key pair that was used to sign the image. To test this, let's replace the public key with another one from a different key pair – in this example, the public Fedora 34 RPM-GPG key, which has been taken from the /etc/pki/rpm-gpg directory (any other public key can be used):

$ mv /tmp/pubkey.gpg /tmp/pubkey.gpg.bak

$ cp /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-34-x86_64 \

     /tmp/pubkey.gpg

The previously stopped httpd server must be restarted; we want to make the signatures available and focus on the wrong public key error:

# podman start sigstore_server

Now, we can pull the image again and inspect the generated errors:

$ podman pull --tls-verify=false localhost:5000/custom_httpd

Trying to pull localhost:5000/custom_httpd:latest...

Error: Source image rejected: Invalid GPG signature: gpgme.Signature{Summary:128, Fingerprint:"2EA4850C32D29DA22B7659FEC38D92C0F18764AC", Status:gpgme.Error{err:0x9}, Timestamp:time.Time{wall:0x0, ext:63777026489, loc:(*time.Location)(0x560e17e5d680)}, ExpTimestamp:time.Time{wall:0x0, ext:62135596800, loc:(*time.Location)(0x560e17e5d680)}, WrongKeyUsage:false, PKATrust:0x0, ChainModel:false, Validity:0, ValidityReason:error(nil), PubkeyAlgo:1, HashAlgo:8}

Here, Podman generates an error that's caused by an invalid GPG signature, which is correct since the public key that's being used does not belong to the correct key pair.

Important

Do not forget to restore the valid public key before proceeding with the following examples.

Podman can manage multiple registries and sigstores, and also offers dedicated commands to help you customize security policies, as we'll see in the next subsection.

Managing keys with Podman image trust commands

It is possible to edit the /etc/containers/policy.json file and modify its JSON objects to add or remove configurations for dedicated registries. However, manual editing can be prone to errors and hard to automate.

Alternatively, we can use the podman image trust command to dump or modify the current configuration.

The following code shows how to print the current configuration with the podman image trust show command:

$ podman image trust show

default         accept                                        

localhost:5000  signedBy                foo.bar@example.com  http://localhost:8080

                insecureAcceptAnything                         http://localhost:8080

It is also possible to configure new trusts. For example, we can add the Red Hat public GPG key to check the signature of UBI images.

First, we need to download the Red Hat public key:

$ sudo wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat \

  https://www.redhat.com/security/data/fd431d51.txt

Note

Red Hat's product signing keys, including the one that was used in this example, can be found at https://access.redhat.com/security/team/key.

After downloading the key, we must configure the image trust for UBI 8 images that have been pulled from registry.access.redhat.com using the podman image trust set command:

$ sudo podman image trust set -f /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat registry.access.redhat.com/ubi8

After running the preceding command, the /etc/containers/policy.json file will change, as follows:

{

    "default": [

        {

            "type": "insecureAcceptAnything"

        }

    ],

    "transports": {

        "docker": {

            "localhost:5000": [

                {

                    "type": "signedBy",

                    "keyType": "GPGKeys",

                    "keyPath": "/tmp/pubkey.gpg"

                }

            ],

            "registry.access.redhat.com/ubi8": [

                {

                    "type": "signedBy",

                    "keyType": "GPGKeys",

                    "keyPath": "/etc/pki/rpm-gpg/RPM-GPG-KEY-redhat"

                }

            ]

        },

        "docker-daemon": {

            "": [

                {

                    "type": "insecureAcceptAnything"

                }

            ]

        }

    }

}

Note that the entry that's related to registry.access.redhat.com/ubi8 and the public key that was used to verify the image signatures have been added to the file.

To complete the configuration, we need to add the Red Hat sigstore configuration to the /etc/containers/registries.d/default.yaml configuration file:

docker:

  registry.access.redhat.com:

    sigstore: https://access.redhat.com/webassets/docker/content/sigstore

Tip

It is possible to create custom registry configuration files for different providers in the /etc/containers/registries.d folder. For example, the preceding example could be defined in a dedicated /etc/containers/registries.d/redhat.yaml file. This allows you to easily maintain and version registry sigstore configurations.

From now on, every time a UBI8 image is pulled from registry.access.redhat.com, its signature will be pulled from the Red Hat sigstore and validated using the provided public key.

So far, we have looked at examples of managing keys concerning Podman, but it is also possible to manage signature verification with Skopeo. In the next subsection, we are going to look at some basic examples.

Managing signatures with Skopeo

We can verify an image signature using Skopeo when we're pulling an image from a valid transport.

The following example uses the skopeo copy command to pull the image from our registry to the local store. This command has the same effects as using a podman pull command but allows more control over the source and destination transports:

$ skopeo copy --src-tls-verify=false \

  docker://localhost:5000/custom_httpd \

  containers-storage:localhost:5000/custom_httpd

Skopeo does not need any further configuration since the previously modified configuration files already define the sigstore and public key path.

We can also use Skopeo to sign an image before copying it to a transport:

$ sudo GNUPGHOME=$HOME/.gnupg skopeo copy \

   --dest-tls-verify=false \

   --sign-by foo.bar@example.com \

   containers-storage:localhost:5000/custom_httpd \

   docker://localhost:5000/custom_httpd

Once again, the configuration files that are used by Podman are still valid for Skopeo, which uses the same sigstore to write the signatures and the same GPG store to retrieve the key that's used to generate the signature.

In this section, we learned how to verify image signatures and avoid potential MITM attacks. In the next section, we'll shift focus and learn how to execute the container runtime by customizing Linux kernel capabilities.

Customizing Linux kernel capabilities

Capabilities are features that were introduced in Linux kernel 2.2 with the purpose of splitting elevated privileges into single units that can be arbitrarily assigned to a process or thread.

Instead of running a process as a fully privileged instance with effective UID 0, we can assign a limited subset of specific capabilities to an unprivileged process. By providing more granular control over the security context of the process's execution, this approach helps mitigate potential attack tactics.

Before we discuss the capabilities of containers, let's recap on how they work in a Linux system so that we understand their inner logic.

Capabilities quickstart guide

Capabilities are associated with executable files using extended attributes (see man xattr) and are automatically inherited by the process that's created with an execve() system call.

The list of available capabilities is quite large and still growing; it includes very specific actions that can be performed by a thread. Some basic examples are as follows:

  • CAP_CHOWN: This capability allows a thread to modify a file's UID and GID.
  • CAP_KILL: This capability allows you to bypass the permission checks to send a signal to a process.
  • CAP_MKNOD: This capability allows you to create a special file with the mknod() syscall.
  • CAP_NET_ADMIN: This capability allows you to operate various privileged actions on the system's network configuration, including changing the interface configuration, enabling/disabling promiscuous mode for an interface, editing routing tables, and enabling/disabling multicasting.
  • CAP_NET_RAW: This capability allows a thread to use RAW and PACKET sockets. This capability can be used by programs such as ping to send ICMP packets without the need for elevated privileges.
  • CAP_SYS_CHROOT: This capability allows you to use the chroot() syscall and change mount namespaces with the setns() syscall.
  • CAP_DAC_OVERRIDE: This capability allows you to bypass discretionary access control (DAC) checks for file read, write, and execution.

For more details and an extensive list of available capabilities, see the relevant man page (man capabilities).

To assign a capability to an executable, we can use the setcap command, as shown in the following example, where CAP_NET_ADMIN and CAP_NET_RAW are being permitted in the /usr/bin/ping executable:

$ sudo setcap 'cap_net_admin,cap_net_raw+p' /usr/bin/ping

The '+p' flag in the preceding command indicates that the capabilities have been set to Permitted.

To inspect the capabilities of a file, we can use the getcap command:

$ getcap /usr/bin/ping

/usr/bin/ping cap_net_admin,cap_net_raw=p

See man getcap and man setcap for more details about these utilities.

We can inspect the active capabilities of a running process by looking at the /proc/<PID>/status file. In the following code, we are launching a ping command after setting the CAP_NET_ADMIN and CAP_NET_RAW capabilities. We want to launch the process in the background and check its current capabilities:

$ ping example.com > /dev/null 2>&1 &

$ grep 'Cap.*' /proc/$(pgrep ping)/status

CapInh: 0000000000000000

CapPrm: 0000000000003000

CapEff: 0000000000000000

CapBnd: 000000ffffffffff

CapAmb: 0000000000000000

Here, we are interested in evaluating the bitmap in the CapPrm field, which represents the permitted capabilities. To get a user-friendly value, we can use the capsh command to decode the bitmap hex value:

$ capsh --decode=0000000000003000

0x0000000000003000=cap_net_admin,cap_net_raw

The result matches the output of the getcap command on the /usr/bin/ping file, demonstrating that executing the command propagated the file's permitted capabilities to its process instance.

For a full list of the constants that were used to set the bitmaps, as well as their capabilities, see the following kernel header file: https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h.

Tip

Distributions such as RHEL and CentOS use the preceding configuration to allow all users to run ping and send ICMP packets without installing it as a setuid 0 (root) executable – an insecure approach in which an attacker could leverage a vulnerability or bug in the executable to escalate privileges and gain control of the system.

Fedora introduced a new and more secure approach in version 31 that's based on using the net.ipv4.ping_group_range Linux kernel parameter. By setting an extensive range that covers all system groups, this parameter allows users to send ICMP packets without the need to enable the CAP_NET_ADMIN and CAP_NET_RAW capabilities.
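On a recent Fedora host, we can check the parameter with sysctl; the range shown below is the Fedora default, while distributions that haven't adopted this change may still show the restrictive default of 1 0:

$ sysctl net.ipv4.ping_group_range

net.ipv4.ping_group_range = 0	2147483647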

For more details, see the following wiki page from the Fedora Project: https://fedoraproject.org/wiki/Changes/EnableSysctlPingGroupRange.

Now that we've provided a high-level description of the Linux kernel's capabilities, let's learn how they are applied to containers.

Capabilities in containers

Capabilities can be applied inside containers to allow targeted actions to take place. By default, Podman runs containers using a set of Linux kernel capabilities that are defined in the /usr/share/containers/containers.conf file. At the time of writing, the following capabilities are enabled inside this file:

default_capabilities = [

    "CHOWN",

    "DAC_OVERRIDE",

    "FOWNER",

    "FSETID",

    "KILL",

    "NET_BIND_SERVICE",

    "SETFCAP",

    "SETGID",

    "SETPCAP",

    "SETUID",

    "SYS_CHROOT"

]

We can run a simple test to verify that those capabilities have been effectively applied to a process running inside a container. For this test, we will use the official Nginx image:

$ podman run -d --name cap_test docker.io/library/nginx

$ podman exec -it cap_test sh -c 'grep Cap /proc/1/status'

CapInh: 00000000800405fb

CapPrm: 00000000800405fb

CapEff: 00000000800405fb

CapBnd: 00000000800405fb

CapAmb: 0000000000000000

Here, we have extracted the current capabilities from the parent Nginx process (running with PID 1 inside the container). Now, we can check the bitmap with the capsh utility:

$ capsh --decode=00000000800405fb

0x00000000800405fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_setfcap

The preceding list of capabilities is the same as the list that was defined in the default Podman configuration. Note that the capabilities are applied in both rootless and rootful mode.

Note

If you're curious, the capabilities for the containerized process(es) are set up by the container runtime, which is either runc or crun, based on the distribution.

Now that we know how capabilities are configured and applied inside containers, let's learn how to customize a container's capabilities.

Customizing a container's capabilities

We can add or drop capabilities either at runtime or statically.

To statically change the default capabilities, we can simply edit the default_capabilities field in the /usr/share/containers/containers.conf file and add or remove them according to our desired results.
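Rootless users can also override the defaults without touching the system-wide file, since Podman reads a per-user configuration from $HOME/.config/containers/containers.conf. A minimal sketch that keeps only a reduced set of default capabilities:

$ mkdir -p ~/.config/containers

$ cat ~/.config/containers/containers.conf

[containers]

default_capabilities = [

    "CHOWN",

    "NET_BIND_SERVICE",

    "SETGID",

    "SETUID",

]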

To modify capabilities at runtime, we can use the --cap-add and --cap-drop options, both of which are provided by the podman run command.

The following code removes the CAP_DAC_OVERRIDE capability from a container:

$ podman run -d --name cap_test2 --cap-drop=DAC_OVERRIDE docker.io/library/nginx

If we look at the capability bitmaps again, we will see that they were updated accordingly:

$ podman exec cap_test2 sh -c 'grep Cap /proc/1/status'

CapInh: 00000000800405f9

CapPrm: 00000000800405f9

CapEff: 00000000800405f9

CapBnd: 00000000800405f9

CapAmb: 0000000000000000

$ capsh --decode=00000000800405f9

0x00000000800405f9=cap_chown,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_setfcap
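Instead of decoding bitmaps by hand, we can also ask Podman directly for the effective capabilities of a container; note that the exact field layout may vary slightly across Podman versions:

$ podman inspect cap_test2 --format '{{.EffectiveCaps}}'

[CAP_CHOWN CAP_FOWNER CAP_FSETID CAP_KILL CAP_NET_BIND_SERVICE CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT]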

It is possible to pass the --cap-add and --cap-drop options multiple times:

$ podman run -d --name cap_test3 \

   --cap-drop=KILL \

   --cap-drop=DAC_OVERRIDE \

   --cap-add=NET_RAW \

   --cap-add=NET_ADMIN \

   docker.io/library/nginx

When we're dealing with capabilities, we must be careful while dropping a default capability. The following code shows an error in the Nginx container when dropping the CAP_CHOWN capability:

$ podman run --name cap_test4 \

  --cap-drop=CHOWN \

  docker.io/library/nginx

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration

/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/

/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh

10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf

10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf

/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh

/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh

/docker-entrypoint.sh: Configuration complete; ready for start up

2022/01/06 23:19:39 [emerg] 1#1: chown("/var/cache/nginx/client_temp", 101) failed (1: Operation not permitted)

nginx: [emerg] chown("/var/cache/nginx/client_temp", 101) failed (1: Operation not permitted)

Here, the container fails. From the output, we can see that the Nginx process was unable to chown the /var/cache/nginx/client_temp directory. This is a direct consequence of removing the CAP_CHOWN capability.

Not all capabilities can be applied to rootless containers. For example, even if we add the CAP_MKNOD capability to a rootless container, the kernel will not allow any attempt to create a special file inside it:

$ podman run -it --cap-add=MKNOD \

  docker.io/library/busybox /bin/sh

/ # mkdir -p /test/dev

/ # mknod -m 666 /test/dev/urandom c 1 8

mknod: /test/dev/urandom: Operation not permitted

Instead, if we run the container with elevated root privileges, the capability can be assigned successfully:

# podman run -it --cap-add=MKNOD \

  docker.io/library/busybox /bin/sh

/ # mkdir -p /test/dev

/ # mknod -m 666 /test/dev/urandom c 1 8

/ # stat /test/dev/urandom

File: /test/dev/urandom

  Size: 0          Blocks: 0          IO Block: 4096   character special file

Device: 31h/49d Inode: 530019      Links: 1     Device type: 1,8

Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)

Access: 2022-01-06 23:50:06.056650747 +0000

Modify: 2022-01-06 23:50:06.056650747 +0000

Change: 2022-01-06 23:50:06.056650747 +0000

Note

Generally, adding capabilities to containers implies enlarging the potential attack surface that a malicious attacker could use. If it's not necessary, it is a good practice to keep the default capabilities and drop the unwanted ones once the potential side effects have been analyzed.

In this section, we learned how to manage capabilities inside containers. However, capabilities are not the only security aspect to consider when you're securing containers. SELinux, as we will learn in the next section, has a crucial role in guaranteeing container isolation.

SELinux interaction with containers

In this section, we will discuss SELinux policies and introduce Udica, a tool that's used to generate SELinux profiles for containers.

SELinux works directly in kernel space and manages object isolation following a least-privilege model. To identify objects, SELinux uses labels that define types. By default, SELinux works in Enforcing mode, denying access to resources unless an exception is defined by a policy. To disable Enforcing mode, SELinux can be put in Permissive mode, where violations are only audited, not blocked.

Security Alert

As we mentioned previously, switching SELinux to Permissive mode or completely disabling it is not a good practice as it opens you up to potential security threats. Instead of doing that, users should create custom policies to manage the necessary exceptions.

By default, SELinux uses a targeted policy type, which tries to target and confine specific object types (processes, files, devices, and so on) using a set of predefined policies.

SELinux allows different kinds of access control. They can be summarized as follows:

  • Type Enforcement (TE): This controls access to resources according to process and file types. This is the main use case of SELinux access control.
  • Role-Based Access Control (RBAC): This controls access to resources using SELinux users (which can be mapped to real system users) and their associated SELinux roles.
  • Multi-Level Security (MLS): This grants all processes with the same sensitivity level read/write access to the resources.
  • Multi-Category Security (MCS): This controls access using categories, which are plain text labels that are applied to resources. Categories are used to create compartments of objects, along with the other SELinux labels. Only processes that belong to the same category can access a given resource. In Chapter 5, Implementing Storage for the Container's Data, we discussed MCS and how we can map categories to resources that have been accessed by containers.

With Type Enforcement, the system files receive labels called types, while processes receive labels called domains. A process that belongs to a domain can be allowed to access a file that belongs to a given type, and this access can be audited by SELinux.

For example, according to SELinux, the Apache httpd process, which is labeled with the httpd_t domain, can access files or directories with httpd_sys_content_t labels.

An SELinux-type policy is based on the following pattern:

POLICY DOMAIN TYPE:CLASS OPERATION;

Here, POLICY is the kind of policy (allow, allowxperm, auditallow, neverallow, dontaudit, and so on), DOMAIN is the process domain, TYPE is the resource type context, CLASS is the object category (for example, file, dir, lnk_file, chr_file, blk_file, sock_file, or fifo_file), and OPERATION is a list of actions that are handled by the policy (for example, open, read, use, lock, getattr, or revc).

The following example shows a basic allow rule:

allow myapp_t myapp_log_t:file { read_file_perms append_file_perms };

In this example, the process that's running in the myapp_t domain is allowed to access files of the myapp_log_t type and perform the read_file_perms and append_file_perms actions.

SELinux manages policies in a modular fashion, allowing you to dynamically load and unload policy modules without recompiling the whole policy set every time. Policies can be loaded and unloaded using the semodule utility; for example, the following command loads a custom policy:

# semodule -i custompolicy.pp

The semodule utility can also be used to view all the loaded policies:

# semodule -l

On Fedora, CentOS, RHEL, and derivative distributions, the current binary policy is installed under the /etc/selinux/targeted/policy directory in a file named policy.XX, with XX representing the policy version.

On the same distributions, container policies are defined inside the container-selinux package, which contains the already compiled SELinux module. The source code of the package is available on GitHub if you wish to look at it in more detail: https://github.com/containers/container-selinux.

By looking at the repository's content, we will find the three most important policy source files for developing any module:

  • container.fc: This file defines the files and directories that are bound to the types defined in the module.
  • container.te: This file defines the policy rules, attributes, and aliases.
  • container.if: This file defines the module interface. It contains a set of public macro functions that are exposed by the module.

A process that's running inside a container is labeled with the container_t domain. It has read/write access to resources labeled with the container_file_t type context and read/execute access to resources labeled with the container_share_t type context.
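We can quickly confirm this from the host by checking the domain of any containerized process; your PID and MCS categories will differ:

$ podman run -d --name label_check docker.io/library/nginx

$ ps -eZ | grep container_t

system_u:system_r:container_t:s0:c402,c766 27015 ?  00:00:00 nginx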

When a container is executed, the podman process, as well as the container runtime and the conmon process, run with the container_runtime_t domain type and are allowed to execute processes that transition only to specific types. Those types are grouped in the container_domain attribute and can be inspected with the seinfo utility (installed with the setools-console package on Fedora), as shown in the following code:

$ seinfo -a container_domain -x

Type Attributes: 1

   attribute container_domain;

container_engine_t

container_init_t

container_kvm_t

container_logreader_t

container_t

container_userns_t

spc_t

The container_domain attribute is declared in the container.te source file of the container-selinux repository using the attribute keyword:

attribute container_domain;

attribute container_user_domain;

attribute container_net_domain;

The preceding attributes are mapped to the container_t type using a typeattribute declaration:

typeattribute container_t container_domain, container_net_domain, container_user_domain;

Using this approach, SELinux guarantees process isolations across containers and between a container and its host. In this way, a process escaping the container (maybe exploiting a vulnerability) cannot access resources on the host or inside other containers.

When a container is created, the image's read-only layers, which form the OverlayFS set of LowerDirs, are labeled with the container_ro_file_t type, which prevents the container from writing inside those directories. At the same time, MergedDir, which is the sum of LowerDirs and UpperDir, is writable and labeled as container_file_t.

To prove this, let's run a rootful container with the c1 and c2 MCS categories:

# podman run -d --name selinux_test1 --security-opt label=level:s0:c1,c2 nginx

Now, we can find all the files labeled as container_file_t:s0:c1,c2 under the host filesystem:

# find /var/lib/containers/storage/overlay -type f -context '*container_file_t:s0:c1,c2*' -printf '%-50Z%p\n'

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/x86_64-linux-gnu/libreadline.so.8.1

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/x86_64-linux-gnu/libhistory.so.8.1

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/x86_64-linux-gnu/libexpat.so.1.6.12

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/udev/rules.d/96-e2scrub.rules

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/terminfo/r/rxvt-unicode-256color

system_u:object_r:container_file_t:s0:c1,c2       /var/lib/containers/storage/overlay/4b147975bb5c336b10e71d21c49fe88ddb00d0569b77ddab1d7737f80056677b/merged/lib/terminfo/r/rxvt-unicode

[...output omitted...]

As expected, the container_file_t label, associated with the c1 and c2 categories, is applied to all the files under the container's MergedDir.

At the same time, we can demonstrate that the container's LowerDirs are labeled as container_ro_file_t. First, we need to extract the container's LowerDirs list:

# podman inspect selinux_test1 \

  --format '{{.GraphDriver.Data.LowerDir}}'

/var/lib/containers/storage/overlay/9566cbcf1773eac59951c14c52156a6164db1b0d8026d015e193774029db18a5/diff:/var/lib/containers/storage/overlay/24de59cced7931bbcc0c4a34d4369c15119a0b8b180f98a0434fa76a6dfcd490/diff:/var/lib/containers/storage/overlay/1bb84245b98b7e861c91ed4319972ed3287bdd2ef02a8657c696a76621854f3b/diff:/var/lib/containers/storage/overlay/97f26271fef21bda129ac431b5f0faa03ae0b2b50bda6af969315308fc16735b/diff:/var/lib/containers/storage/overlay/768ef71c8c91e4df0aa1caf96764ceec999d7eb0aa584e241246815c1fa85435/diff:/var/lib/containers/storage/overlay/2edcec3590a4ec7f40cf0743c15d78fb39d8326bc029073b41ef9727da6c851f/diff

The rightmost directory represents the container's lowest layer and is usually the base filesystem tree of the image. Let's inspect the type context of this directory:

# ls -alZ /var/lib/containers/storage/overlay/2edcec3590a4ec7f40cf0743c15d78fb39d8326bc029073b41ef9727da6c851f/diff

total 84

dr-xr-xr-x. 21 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Jan  5 23:16 .

drwx------.  6 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Jan  5 23:16 ..

drwxr-xr-x.  2 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 20 00:00 bin

drwxr-xr-x.  2 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 11 17:25 boot

drwxr-xr-x.  2 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 20 00:00 dev

drwxr-xr-x. 30 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 20 00:00 etc

drwxr-xr-x.  2 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 11 17:25 home

drwxr-xr-x.  8 root root unconfined_u:object_r:container_ro_file_t:s0 4096 Dec 20 00:00 lib

[...omitted output...]

The preceding output also shows another interesting aspect: since the LowerDir layers are shared across multiple containers that use the same image, we won't find any MCS categories that have been applied here.

Containers do not have read/write access to files or directories that are not labeled as container_file_t. Previously, we saw that it is possible to relabel those files by applying the :z suffix to mounted volumes or by manually relabeling them in advance before running the containers.

However, relabeling crucial directories such as /home or /var/log is a very bad idea since many other non-containerized processes would no longer be able to access them.

The only solution is to manually create custom policies that override the default behavior. However, this is too complex to manage in everyday use and production environments.

Luckily, we can solve this limitation with a tool that generates custom SELinux security profiles for our containers: Udica.

Introducing Udica

Udica is an open source project (https://github.com/containers/udica) that was created by Lukas Vrabec, SELinux evangelist and team leader of the SELinux and Security Special Projects engineering teams at Red Hat.

Udica aims to overcome the rigid policy limitations that were described previously by generating SELinux profiles for containers and allowing them to access resources that would normally be prevented with the common container_t domain.

To install Udica on Fedora, simply run the following command:

$ sudo dnf install -y udica setools-console container-selinux

On other distributions, Udica can be installed from its source by running the following commands:

$ sudo dnf install -y setools-console git container-selinux

$ git clone https://github.com/containers/udica

$ cd udica && sudo python3 ./setup.py install

To demonstrate how Udica works, we are going to create a container that writes to the /var/log directory of the host, which is bind-mounted when the container is created. By default, a process with the container_t domain is not able to write to a directory labeled with the var_log_t type.

The following script, which has been executed inside the container, is an endless loop that writes a log line composed of the current date and a counter:

Chapter11/custom_logger/logger.sh

#!/bin/bash

set -euo pipefail

trap "echo Exited; exit;" SIGINT SIGTERM

# Run an endless loop writing a simple log entry with date

count=1

while true; do

echo "$(date +%y/%m/%d_%H:%M:%S) - Line #$count" | tee -a /var/log/custom.log

  count=$((count+1))

  sleep 2

done

The preceding script uses the set -euo pipefail option, to exit immediately in case an error occurs, and the tee utility, to write both to standard output and the /var/log/custom.log file in append mode. The count variable increments on each loop cycle.

The Dockerfile for this container is kept minimal – it just copies the logger script and executes it at container startup:

Chapter11/custom_logger/Dockerfile

FROM docker.io/library/fedora

# Copy the logger.sh script

COPY logger.sh /

# Exec the logger.sh script

CMD ["/logger.sh"]

Important

The logger.sh script must be made executable (for example, with chmod +x logger.sh) before the build so that it can be launched correctly at container startup.

The container image is built with the name custom_logger:

# cd Chapter11/custom_logger

# buildah build -t custom_logger .

Now, it's time to test the container and see how it behaves. The host's /var/log directory is bind-mounted with rw permissions to the container's /var/log, without altering its type context. We keep the execution in the foreground to see the output immediately:

# podman run -v /var/log:/var/log:rw \

  --name custom_logger1 custom_logger

tee: /var/log/custom.log: Permission denied

22/01/08_09:09:33 - Line #1

As expected, the script failed to write to the target file. We could fix this by changing the directory type context to container_file_t but, as we learned previously, this is a poor idea since it would prevent other processes from writing their logs.

Instead, we can use Udica to generate a custom SELinux security profile for the container. In the following code, the container specs are exported to a container.json file and then parsed by Udica to generate a custom profile called custom_logger:

# podman inspect custom_logger1 > container.json

# udica -j container.json custom_logger

Policy custom_logger created!

Please load these modules using:

# semodule -i custom_logger.cil /usr/share/udica/templates/{base_container.cil,log_container.cil}

Restart the container with: "--security-opt label=type:custom_logger.process" parameter

Once the profile has been generated, Udica outputs the instructions to configure the container. First, we need to load the new custom policy using the semodule utility. The generated file is in Common Intermediate Language (CIL) format, an intermediate policy language for SELinux. Along with the generated CIL file, the example loads some Udica templates, /usr/share/udica/templates/base_container.cil and /usr/share/udica/templates/log_container.cil, whose rules are inherited in the custom container policy file.

Let's load the modules using the suggested command:

# semodule -i custom_logger.cil /usr/share/udica/templates/{base_container.cil,log_container.cil}

After loading the modules in SELinux, we are ready to run the container with the custom custom_logger.process label, passing it as an argument to Podman's --security-opt option. The other container options were kept identical, except for the name, which was updated to custom_logger2 to differentiate it from the previous instance:

# podman run -v /var/log:/var/log:rw \

  --name custom_logger2 \

  --security-opt label=type:custom_logger.process \

  custom_logger

22/01/08_09:05:19 - Line #1

22/01/08_09:05:21 - Line #2

22/01/08_09:05:23 - Line #3

22/01/08_09:05:25 - Line #4

[...Omitted output...]

This time, the script successfully wrote to the /var/log/custom.log file thanks to the custom profile that was generated with Udica.

Note that the container processes are no longer running with the container_t domain, but with the new custom_logger.process domain, a superset that includes additional rules on top of the default ones.

We can confirm this by running the following command on the host:

# ps auxZ | grep 'custom_logger.process'

unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 26546 0.1  0.6 1365088 53768 pts/0 Sl+ 09:16   0:00 podman run -v /var/log:/var/log:rw --security-opt label=type:custom_logger.process custom_logger

system_u:system_r:custom_logger.process:s0:c159,c258 root 26633 0.0  0.0 4180 3136 ? Ss 09:16   0:00 /bin/bash /logger.sh

system_u:system_r:custom_logger.process:s0:c159,c258 root 26881 0.0  0.0 2640 1104 ? S 09:18   0:00 sleep 2

Udica creates the custom policy by parsing the JSON spec file and looking for the container mount points, ports, and capabilities. Let's look at the content of the generated custom_logger.cil file from our example:

(block custom_logger

    (blockinherit container)

    (allow process process ( capability ( chown dac_override fowner fsetid kill net_bind_service setfcap setgid setpcap setuid sys_chroot )))

    (blockinherit log_rw_container)

)

The CIL language syntax is beyond the scope of this book, but we still can notice some interesting things:

  • The custom_logger profile is defined by a block statement.
  • The allow rule enables the default capabilities for the container.
  • The policy inherits the container and log_rw_container blocks with the blockinherit statements.

The generated CIL file inherits the blocks that have been defined in the available Udica templates, each one focused on specific actions. On Fedora, the templates are installed via the container-selinux package and are available in the /usr/share/udica/templates/ folder:

# ls -1 /usr/share/udica/templates/

base_container.cil

config_container.cil

home_container.cil

log_container.cil

net_container.cil

tmp_container.cil

tty_container.cil

virt_container.cil

x_container.cil

The available templates are implemented for common scenarios, such as accessing log directories or user homes, or even for opening network ports. Among them, the base_container.cil template is always included by all the Udica-generated policies as the base building block that's used to generate the custom policies.

According to the behavior of the container that's derived from the spec file, other templates are included. For example, the policy inherited the log_rw_container block from the log_container.cil template to let the custom logger container access the /var/log directory.

Udica is a great tool for addressing container isolation issues and helps administrators address SELinux confinement use cases by overcoming the complexity of writing rules manually.

Generated security profiles can also be versioned inside a GitHub repository and reused for similar containers on different hosts.

Summary

In this chapter, we learned how to develop and apply techniques to improve the overall security of our container-based service architecture. We learned how leveraging rootless containers and avoiding UID 0 can reduce the attack surface of our services. Then, we learned how to sign and trust container images to avoid MITM attacks. Finally, we went under the hood of our container tools and looked at the Linux kernel's capabilities and the SELinux subsystem, which can help us fine-tune various security aspects of our running containers.

Now that we've done a deep dive into security, we are ready to move on to the next chapter, where we will take an advanced look at networking for containers.

Further reading

For more information about the topics that were covered in this chapter, take a look at the following resources:
