SIGTERM isn't sent to the application process

In Chapter 2, DevOps with Containers, we learned that there are two forms for invoking our program when writing a Dockerfile: the shell form and the exec form. On Linux containers, the shell that runs shell form commands defaults to /bin/sh -c. Hence, there are a few questions that determine whether SIGTERM can be received by our application:

  • How is our application invoked?
  • What shell implementation is used in the image?
  • How does the shell implementation deal with the -c parameter?

Let's approach these questions one by one. The Dockerfile used in the following example can be found here: https://github.com/PacktPublishing/DevOps-with-Kubernetes-Second-Edition/tree/master/chapter9/9-3_on_pods/graceful_docker.

Say we're using the shell form command, CMD python3 -u app.py, in our Dockerfile to execute our application. The starting command of the container would then be /bin/sh -c "python3 -u app.py". When the container starts, the process structure inside it is as follows:

# the image is from "graceful_docker/Dockerfile.shell-sh"
$ kubectl run --generator=run-pod/v1 \
  --image=devopswithkubernetes/ch93:shell-sh my-app

pod/my-app created
$ kubectl exec my-app ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /bin/sh -c python3 -u app.py
6 ? S 0:00 python3 -u app.py
7 ? Rs 0:00 ps ax

We can see that the PID 1 process isn't our application with handlers; it's the shell instead. When we try to kill the pod, SIGTERM will be sent to the shell rather than to our application, and the pod will be terminated after the grace period expires. We can check the log in our application when deleting it to see whether it received SIGTERM:
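For reference, a graceful stop handler in a Python application usually amounts to registering a SIGTERM callback. The following is a minimal sketch of the pattern; the actual app.py in the example repository may differ:

```python
import os
import signal
import time

running = True

def stop_handler(signum, frame):
    # Graceful shutdown: stop accepting work, flush state, then let
    # the main loop fall through and exit cleanly.
    global running
    print("[app] stopping server.")
    running = False

signal.signal(signal.SIGTERM, stop_handler)

print("[app] starting server.")
# Simulate receiving the signal by sending SIGTERM to ourselves;
# in a real pod, the kubelet sends it when the pod is deleted.
os.kill(os.getpid(), signal.SIGTERM)
while running:
    time.sleep(0.1)
print("[app] exited cleanly")
```

If the process doesn't hold PID 1, the handler above is simply never invoked, which is exactly the failure mode shown in the preceding output.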

$ kubectl delete pod my-app &
pod "my-app" deleted
$ kubectl logs -f my-app
1544368565.736720800 - [app] starting server.
rpc error: code = Unknown desc = Error: No such container: 2f007593553cfb700b0aece1f8b6045b4096b2f50f97a42e684a98e502af29ed

Our application exited without entering the stop handler in our code. There are a couple of ways to properly promote our application to PID 1. For example, we can explicitly call exec in the shell form, such as CMD exec python3 -u app.py, so that our program replaces the shell and inherits PID 1. Alternatively, we can choose the exec form, CMD [ "python3", "-u", "app.py" ], to execute our program directly:

## shell form with exec
$ kubectl run --generator=run-pod/v1 \
  --image=devopswithkubernetes/ch93:shell-exec my-app-shell-exec
pod/my-app-shell-exec created
$ kubectl exec my-app-shell-exec ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 python3 -u app.py
5 ? Rs 0:00 ps ax
## delete the pod in another terminal
$ kubectl logs -f my-app-shell-exec
1544368913.313778162 - [app] starting server.
1544369448.991261721 - [app] stopping server.
rpc error: code = Unknown desc =...

## exec form
$ kubectl run --generator=run-pod/v1 \
  --image=devopswithkubernetes/ch93:exec-sh my-app-exec
pod/my-app-exec created
$ kubectl exec my-app-exec ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 python3 -u app.py
5 ? Rs 0:00 ps ax
$ kubectl logs -f my-app-exec
1544368942.935727358 - [app] starting server.
1544369503.846865654 - [app] stopping server.
rpc error: code = Unknown desc =...

The program, executed in either way, can now receive SIGTERM properly. Additionally, if we need to set up the environment with a shell script before starting our program, we should either trap signals in the script and propagate them to our program, or use the exec call to invoke our program so that the handler in our application works as desired.
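An entrypoint script following this advice might look like the sketch below: do the setup work, then hand PID 1 over to the real command with exec, with the trap-and-forward variant shown in comments for cases where the script must keep running (the script name and environment variable are illustrative):

```shell
#!/bin/sh
# entrypoint.sh (illustrative): prepare the environment, then replace
# this shell with the application so it runs as PID 1 and receives
# SIGTERM directly.
set -e
export APP_ENV=production     # example setup work

# If exec isn't an option (e.g. cleanup must run here afterwards),
# trap SIGTERM and forward it to the child instead:
#   python3 -u app.py & child=$!
#   trap 'kill -TERM "$child"' TERM
#   wait "$child"

exec "$@"   # e.g. invoked as: entrypoint.sh python3 -u app.py
```

With exec, the shell process is replaced rather than forked, so no intermediate process sits between the kubelet's signal and our application.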

The second and third questions are about the shell implementation: how could it affect our graceful handler? Again, the default command of a Docker container on Linux is /bin/sh -c. Since sh differs among popular Docker images, the way it handles -c can also affect the signals if we're using the shell form. For example, Alpine Linux links /bin/sh to ash, while the Debian family of distributions uses dash. Before Alpine 3.8 (BusyBox 1.28.0), ash forks a new process to run an sh -c command, while in 3.8 it calls exec instead. We can observe the difference with ps: the command gets PID 6 in 3.7, while it has PID 1 in 3.8:

$ docker run alpine:3.7 /bin/sh -c "ps ax"
PID USER TIME COMMAND
1 root 0:00 /bin/sh -c ps ax
6 root 0:00 ps ax
$ docker run alpine:3.8 /bin/sh -c "ps ax"
PID USER TIME COMMAND
1 root 0:00 ps ax

How do dash and bash handle these cases? Let's take a look:

## there is no ps inside the official debian image, so here we reuse
## the image from above, which is also based on debian:
$ docker run devopswithkubernetes/ch93:exec-sh /bin/sh -c "ps ax"
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /bin/sh -c ps ax
6 ? R 0:00 ps ax
$ docker run devopswithkubernetes/ch93:exec-sh /bin/bash -c "ps ax"
PID TTY STAT TIME COMMAND
1 ? Rs 0:00 ps ax

As we can see, their behaviors differ as well: dash forks a child process, while bash calls exec. Our application, running as PID 1, can now respond to the termination event appropriately. There is one more thing, however, that could potentially harm our system if our application runs as PID 1 and uses more than one process inside the container.

On Linux, a child process becomes a zombie when it terminates but its parent doesn't wait() for it. If a parent dies before its child process ends, the init process is expected to adopt the orphaned process and to reap it once it exits. System init programs know how to deal with orphaned processes, so zombie processes are not a problem most of the time. In a containerized context, however, the process that holds PID 1 is our application, and the operating system expects it to reap zombie processes. Because our application isn't designed to act as a proper init process, though, handling the state of child processes is unrealistic. If we just ignore the problem, at worst the process table of the node will fill up with zombie processes, and we won't be able to launch new programs on that node anymore. In Kubernetes, if a pod with zombie processes is deleted, all the zombie processes inside it are cleaned up as well. Another possible scenario is our application frequently performing tasks through scripts in the background, which could fork lots of processes. Let's consider the following simple example:
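The zombie mechanics themselves are easy to observe. The following sketch forks a child that exits immediately; until the parent calls waitpid, the child lingers in the Z (zombie) state, which we read from /proc (Linux-only, and independent of Kubernetes):

```python
import os
import time

pid = os.fork()
if pid == 0:
    os._exit(0)            # child terminates immediately

time.sleep(0.5)            # parent hasn't wait()ed yet
with open("/proc/%d/stat" % pid) as f:
    # the field after the "(comm)" field is the process state
    state = f.read().rsplit(")", 1)[1].split()[0]
print("child state before reaping:", state)   # Z means zombie

os.waitpid(pid, 0)         # reaping removes the zombie entry
```

An init process does essentially this waitpid call in a loop for every child it adopts, which is the job our PID 1 application would otherwise have to take on.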

$ kubectl run --generator=run-pod/v1 \
  --image=devopswithkubernetes/ch93:exec-sh my-app-exec
pod/my-app-exec created
## let's enter our app pod and run sleep in background inside it
$ kubectl exec -it my-app-exec /bin/sh
# ps axf
PID TTY STAT TIME COMMAND
5 pts/0 Ss 0:00 /bin/sh
10 pts/0 R+ 0:00 \_ ps axf
1 ? Ss 0:00 python3 -u app.py
# sleep 30 &
# ps axf
PID TTY STAT TIME COMMAND
5 pts/0 Ss 0:00 /bin/sh
11 pts/0 S 0:00 \_ sleep 30
12 pts/0 R+ 0:00 \_ ps axf
1 ? Ss 0:00 python3 -u app.py

## now quit kubectl exec, wait 30 seconds, and check the pod again
$ kubectl exec my-app-exec ps axf
PID TTY STAT TIME COMMAND
23 ? Rs 0:00 ps axf
1 ? Ss 0:00 python3 -u app.py
11 ? Z 0:00 [sleep] <defunct>

sleep 30 is now a zombie in our pod. In Chapter 2, DevOps with Containers, we mentioned that the docker run --init flag can set a simple init process for our container. In Kubernetes, we can bring the pause container, a special container that silently deals with these chores for us, into our pod's PID namespace by setting .spec.shareProcessNamespace to true in the pod specification:
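The manifest could look like the following sketch; the actual sharepidns.yml in the example repository may differ in details:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-pause
spec:
  # all containers in the pod share one PID namespace,
  # with the pause container acting as PID 1
  shareProcessNamespace: true
  containers:
  - name: app
    image: devopswithkubernetes/ch93:exec-sh
```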

$ kubectl apply -f chapter9/9-3_on_pods/sharepidns.yml
pod/my-app-with-pause created
$ kubectl exec my-app-with-pause ps ax
1 ? Ss 0:00 /pause
6 ? Ss 0:00 python3 -u app.py
10 ? Rs 0:00 ps ax

The pause process ensures that zombies are reaped and that SIGTERM is delivered to our application process. Notice that with process namespace sharing enabled, aside from our application no longer having PID 1, there are two other key differences:

  • All containers in the same pod share process information with each other, which means a container can send signals to another container
  • The filesystem of each container can be accessed by the other containers in the pod via the /proc/$PID/root path

If the behaviors described above aren't acceptable for your application but an init process is still needed, you can opt for Tini (https://github.com/krallin/tini) or dumb-init (https://github.com/Yelp/dumb-init), or even write a wrapper script to solve the zombie reaping problem.
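For instance, a Dockerfile using Tini as the entrypoint might look like the following sketch (it assumes an Alpine base image, where the tini package installs to /sbin/tini):

```dockerfile
FROM alpine:3.8
RUN apk add --no-cache tini python3
COPY app.py /app.py
# tini runs as PID 1, reaps zombies, and forwards signals such as
# SIGTERM to the child command
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["python3", "-u", "/app.py"]
```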
