Containerized applications managed by cloud-native platforms have no control over their lifecycle, and to be good cloud-native citizens, they have to listen to the events emitted by the managing platform and adapt their lifecycles accordingly. The Managed Lifecycle pattern describes how applications can and should react to these lifecycle events.
In Chapter 4, Health Probe, we explained why containers have to provide APIs for the different health checks. Health-check APIs are read-only endpoints that the platform continually probes to gain insight into the application. They are a mechanism for the platform to extract information from the application.
In addition to monitoring the state of a container, the platform may sometimes issue commands and expect the application to react to them. Driven by policies and external factors, a cloud-native platform may decide to start or stop the applications it is managing at any moment. It is up to the containerized application to determine which events are important to react to and how to react. In effect, though, this is an API that the platform uses to communicate with and send commands to the application. Applications are also free either to benefit from lifecycle management or to ignore it if they don’t need this service.
We saw that checking only the process status is not a good enough indication of the health of an application. That is why there are different APIs for monitoring the health of a container. Similarly, using only the process model to run and stop a process is not good enough. Real-world applications require more fine-grained interactions and lifecycle management capabilities. Some applications need help to warm up, and some applications need a gentle and clean shutdown procedure. For this and other use cases, the platform emits events, as shown in Figure 5-1, that the container can listen to and react to if desired.
The deployment unit of an application is a Pod. As you already know, a Pod is composed of one or more containers. At the Pod level, there are other constructs that can help manage the container lifecycle, such as init containers, which we cover in Chapter 14, Init Container, and defer-containers, which are still at the proposal stage as of this writing. The events and hooks we describe in this chapter are all applied at the individual container level rather than at the Pod level.
Whenever Kubernetes decides to shut down a container, whether because the Pod it belongs to is shutting down or because a failed liveness probe causes the container to be restarted, the container receives a SIGTERM signal. SIGTERM is a gentle poke for the container to shut down cleanly before Kubernetes sends a more abrupt SIGKILL signal. Once a SIGTERM signal has been received, the application should shut down as quickly as possible. For some applications, this might be a quick termination, while other applications may have to complete their in-flight requests, release open connections, and clean up temp files, which can take slightly longer. In all cases, receiving SIGTERM is the right moment to shut down a container in a clean way.
If a container process has not shut down after a SIGTERM signal, it is shut down forcefully by the following SIGKILL signal. Kubernetes does not send the SIGKILL signal immediately but waits for a grace period of 30 seconds by default after it has issued a SIGTERM signal. This grace period can be defined per Pod using the .spec.terminationGracePeriodSeconds field, but cannot be guaranteed as it can be overridden while issuing commands to Kubernetes. The aim here should be to design and implement containerized applications to be ephemeral with quick startup and shutdown processes.
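As a minimal sketch of how the grace period is declared, the following manifest sets .spec.terminationGracePeriodSeconds on a Pod. The Pod name and the value of 60 seconds are assumptions chosen for illustration; only the image is taken from the examples in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical name, chosen for this sketch
  name: graceful-shutdown
spec:
  # Seconds Kubernetes waits between SIGTERM and SIGKILL (default: 30).
  # The value 60 is an arbitrary illustration.
  terminationGracePeriodSeconds: 60
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
```

A deletion such as kubectl delete pod graceful-shutdown --grace-period=10 would override the declared value for that one request, which is why the application should not rely on receiving the full period.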
Using only process signals for managing lifecycles is somewhat limited. That is why there are additional lifecycle hooks such as postStart and preStop provided by Kubernetes. A Pod manifest containing a postStart hook looks like the one in Example 5-1.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: post-start-hook
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - sleep 30 && echo "Wake up!" > /tmp/postStart_done
```
The postStart command is executed after a container is created, asynchronously with the primary container’s process. Even if much of the application initialization and warm-up logic can be implemented as part of the container startup steps, postStart still covers some use cases. Although the hook runs in parallel with the main process, the postStart action is a blocking call: the container status remains Waiting until the postStart handler completes, which in turn keeps the Pod status in the Pending state. This behavior of postStart can be used to delay the startup state of the container while giving the main container process time to initialize.
Another use of postStart is to prevent a container from starting when the Pod does not fulfill certain preconditions. For example, when the postStart hook indicates an error by returning a nonzero exit code, the main container process gets killed by Kubernetes.
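A precondition check of this kind might look like the following sketch. The Pod name and the file path /etc/app/config.properties are hypothetical and not part of the example image; the hook only tests whether the file exists and returns a nonzero exit code when it does not.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical name, chosen for this sketch
  name: post-start-precondition
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          # "test -f" exits nonzero when the (hypothetical) file is
          # missing, which signals a failed hook to Kubernetes
          - test -f /etc/app/config.properties
```

Because the check has no side effects, running it more than once is harmless.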
postStart and preStop hook invocation mechanisms are similar to the Health Probes described in Chapter 4 and support these handler types:
exec: Runs a command directly in the container
httpGet: Executes an HTTP GET request against a port opened by one Pod container
You have to be very careful about what critical logic you execute in the postStart hook, as there are no guarantees for its execution. Since the hook runs in parallel with the container process, it is possible that the hook is executed before the container has started. Also, the hook is intended to have at-least-once semantics, so the implementation has to take care of duplicate executions. Another aspect to keep in mind is that the platform does not perform any retry attempts on failed HTTP requests that didn’t reach the handler.
The preStop hook is a blocking call sent to a container before it is terminated. It has the same semantics as the SIGTERM signal and should be used to initiate a graceful shutdown of the container when reacting to SIGTERM is not possible. The preStop action in Example 5-2 must complete before the call to delete the container is sent to the container runtime, which triggers the SIGTERM notification.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pre-stop-hook
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      preStop:
        httpGet:
          port: 8080
          path: /shutdown
```
Even though preStop is blocking, stalling in it or returning an unsuccessful result does not prevent the container from being deleted and the process killed. preStop is only a convenient alternative to a SIGTERM signal for graceful application shutdown and nothing more. It also offers the same handler types and guarantees as the postStart hook we covered previously.
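Since preStop supports the same handler types as postStart, the shutdown can also be triggered with an exec handler instead of an HTTP call. The following is a sketch under assumptions: the Pod name, the script path /opt/app/shutdown.sh, and the grace period of 45 seconds are hypothetical and would have to match what the container actually provides.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical name, chosen for this sketch
  name: pre-stop-exec-hook
spec:
  # Illustrative value; the time spent in preStop counts against it
  terminationGracePeriodSeconds: 45
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      preStop:
        exec:
          command:
          - sh
          - -c
          # Hypothetical script assumed to exist in the image
          - /opt/app/shutdown.sh
```

Because the hook runs within the termination grace period, a long-running script leaves less time for the process to react to the SIGTERM that follows.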
So far in this chapter, we have focused on the hooks that allow executing commands when a container lifecycle event occurs. Another mechanism, which operates at the Pod level rather than the container level, also allows executing initialization instructions.
We describe init containers in depth in Chapter 14, Init Container, but here we describe them briefly to compare them with lifecycle hooks. Unlike regular application containers, init containers run sequentially, run until completion, and run before any of the application containers in a Pod start up. These guarantees allow using init containers for Pod-level initialization tasks. Lifecycle hooks and init containers operate at different granularities (at the container level and the Pod level, respectively) and can be used interchangeably in some instances or complement each other in other cases. Table 5-1 summarizes the main differences between the two.
Aspect | Lifecycle hooks | Init Containers |
---|---|---|
Activates on | Container lifecycle phases | Pod lifecycle phases |
Startup phase action | A postStart command | A list of initContainers to execute |
Shutdown phase action | A preStop command | No equivalent feature exists yet |
Timing guarantees | A postStart command runs asynchronously with the container’s main process | All init containers must be completed successfully before any application container can start |
Use cases | Perform noncritical startup/shutdown cleanups specific to a container | Perform workflow-like sequential operations using containers; reuse containers for task executions |
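
To make the comparison concrete, the following sketch combines both mechanisms in one Pod. The Pod name, the init container name, the busybox image, the shared volume, and the file paths are all assumptions made for illustration; only the application image comes from the examples in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical name, chosen for this sketch
  name: init-and-hook
spec:
  initContainers:
  # Pod level: runs to completion before any application container starts
  - name: prepare
    image: busybox
    command: ['sh', '-c', 'echo "prepared" > /work/ready']
    volumeMounts:
    - name: workdir
      mountPath: /work
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    volumeMounts:
    - name: workdir
      mountPath: /work
    lifecycle:
      # Container level: runs alongside the main process, no ordering guarantee
      postStart:
        exec:
          command: ['sh', '-c', 'cat /work/ready > /tmp/postStart_done']
  volumes:
  # Scratch volume shared between the init container and the application
  - name: workdir
    emptyDir: {}
```

The init container is guaranteed to have finished before the application container is created, so the file it writes is always present when the postStart hook reads it; the hook itself has no such ordering guarantee relative to the main process.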
There are no strict rules about which mechanism to use except when you require a specific timing guarantee. We could skip lifecycle hooks and init containers entirely and use a bash script to perform specific actions as part of a container’s startup or shutdown commands. That is possible, but it would tightly couple the container with the script and turn it into a maintenance nightmare.
We could also use Kubernetes lifecycle hooks to perform some actions as described in this chapter. Alternatively, we could go even further and run containers that perform individual actions using init containers. In this sequence, the options require increasingly more effort, but at the same time offer stronger guarantees and enable reuse.
Understanding the stages and available hooks of containers and Pod lifecycles is crucial for creating applications that benefit from being managed by Kubernetes.
One of the main benefits the cloud-native platform provides is the ability to run and scale applications reliably and predictably on top of potentially unreliable cloud infrastructure. These platforms provide a set of constraints and contracts for an application running on them. It is in the interest of the application to honor these contracts to benefit from all of the capabilities offered by the cloud-native platform. Handling and reacting to these events ensures your application can gracefully start up and shut down with minimal impact on the consuming services. At the moment, in its basic form, that means the containers should behave as any well-designed POSIX process. In the future, there might be even more events giving hints to the application when it is about to be scaled up, or asked to release resources to prevent being shut down. It is essential to get into the mindset where the application lifecycle is no longer in the control of a person but fully automated by the platform.
Besides managing the application lifecycle, the other big duty of orchestration platforms like Kubernetes is to distribute containers over a fleet of nodes. The next pattern, Automated Placement, explains the options to influence the scheduling decisions from the outside.