Liveness and readiness probes

A probe is an indicator of a container's health. It judges health by periodically performing diagnostic actions against a container via the kubelet. There are two kinds of probes for determining the state of a container:

  • Liveness probe: This indicates whether a container is alive. If a container fails this probe, the kubelet kills it and may restart it, depending on the pod's restartPolicy.
  • Readiness probe: This indicates whether a container is ready for incoming traffic. If a pod behind a service isn't ready, its endpoint won't be created until it is.
The restartPolicy tells Kubernetes how to treat a pod on failure or termination. It has three modes: Always, OnFailure, and Never. The default is Always.
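As a quick illustration, the following minimal pod spec (the name, image, and command are hypothetical) shows where restartPolicy sits: at the pod level, not under an individual container:

```yaml
# A sketch of a pod spec illustrating restartPolicy placement.
# OnFailure restarts the container only when it exits with a non-zero code.
apiVersion: v1
kind: Pod
metadata:
  name: one-off-job        # hypothetical name
spec:
  restartPolicy: OnFailure # Always (default) | OnFailure | Never
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "echo done"]
```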

Three kinds of action handlers can be configured to diagnose a container:

  • exec: This executes a defined command inside the container. It's considered to be successful if the exit code is 0.
  • tcpSocket: This tests a given port via TCP and is successful if the port is opened.
  • httpGet: This performs an HTTP GET request against the IP address of the target container. Headers in the request to be sent are customizable. This check is considered healthy if the status code satisfies 200 <= CODE < 400.
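A probe uses exactly one of these handlers. As a sketch (the port numbers, paths, and command below are illustrative assumptions, not taken from the book's example), each handler is written like this:

```yaml
# Hypothetical fragments showing the three handler types; pick one per probe.
livenessProbe:
  exec:                    # success when the command exits with code 0
    command: ["cat", "/tmp/healthy"]
---
livenessProbe:
  tcpSocket:               # success when a TCP connection to the port succeeds
    port: 5000
---
livenessProbe:
  httpGet:                 # success when 200 <= status code < 400
    path: /healthz
    port: 5000
    httpHeaders:           # request headers are customizable
    - name: X-Custom-Header
      value: probe
```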

Additionally, there are five parameters that define a probe's behavior:

  • initialDelaySeconds: How long the kubelet should wait before the first probe
  • successThreshold: The number of consecutive probe successes required before a container is considered healthy
  • failureThreshold: The same as the previous one, but for the negative side: the number of consecutive failures required before a container is considered unhealthy
  • timeoutSeconds: The time limit for a single probe action
  • periodSeconds: The interval between probe actions

The following code snippet demonstrates the use of a readiness probe. The full template can be found at https://github.com/PacktPublishing/DevOps-with-Kubernetes-Second-Edition/blob/master/chapter9/9-3_on_pods/probe.yml:

...
containers:
- name: main
  image: devopswithkubernetes/okeydokey:v0.0.4
  readinessProbe:
    httpGet:
      path: /
      port: 5000
    periodSeconds: 5
    initialDelaySeconds: 10
    successThreshold: 2
    failureThreshold: 3
    timeoutSeconds: 1
  command:
...

In this example, we played some tricks with our main application: it takes around six seconds to start, and after 20 seconds it is replaced with another application that returns HTTP 500. The application's interaction with the readiness probe is illustrated in the following diagram:

The upper timeline is the pod's real readiness, and the lower one is its readiness from Kubernetes' perspective. The first probe executes 10 seconds after the pod is created, and the pod is regarded as ready after two consecutive probe successes. A few seconds later, the pod goes out of service due to the termination of our application, and it becomes unready after the next three failures. Try to deploy the preceding example and observe its output:

$ kubectl logs -f my-app-6759578c94-kkc5k
1544137145.530593922 - [sys] pod is created.
1544137151.658855438 - [app] starting server.
1544137155.164726019 - [app] GET / HTTP/1.1
1544137160.165020704 - [app] GET / HTTP/1.1
1544137161.129309654 - [app] GET /from-tester
1544137165.141985178 - [app] GET /from-tester
1544137165.165597677 - [app] GET / HTTP/1.1
1544137168.533407211 - [app] stopping server.
1544137170.169371453 - [500] readiness test fail#1
1544137175.180640604 - [500] readiness test fail#2
1544137180.171766986 - [500] readiness test fail#3

...

In our example file, there is another pod, tester, which constantly makes requests to our service and logs the responses. The /from-tester entries in our service's log represent the requests from the tester. From the tester's logs, we can observe that its traffic stops after our service becomes unready (notice the activities of the two pods around time 1544137180):

$ kubectl logs tester
1544137141.107777059 - timed out
1544137147.116839441 - timed out
1544137154.078540367 - timed out
1544137160.094933434 - OK
1544137165.136757412 - OK
1544137169.155453804 -
1544137173.161426446 - HTTP/1.1 500
1544137177.167556193 - HTTP/1.1 500
1544137181.173484008 - timed out
1544137187.189133495 - timed out
1544137193.198797682 - timed out
...

Since we didn't configure the liveness probe in our service, the unhealthy container won't be restarted unless we kill it manually. In general, we would use both probes together to automate the healing process.
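To sketch what that combination could look like, a liveness probe might be placed alongside the readiness probe from the earlier example. This is an illustrative assumption, not the book's template: it reuses the same endpoint and made-up timing values, whereas in practice a dedicated health path and tuned thresholds are common:

```yaml
# Sketch: readiness gates traffic, liveness triggers restarts (values are assumptions).
containers:
- name: main
  image: devopswithkubernetes/okeydokey:v0.0.4
  readinessProbe:          # removes the pod from service endpoints when failing
    httpGet:
      path: /
      port: 5000
    periodSeconds: 5
    initialDelaySeconds: 10
    successThreshold: 2
    failureThreshold: 3
  livenessProbe:           # kills the container after 3 consecutive failures
    httpGet:
      path: /
      port: 5000
    periodSeconds: 10
    initialDelaySeconds: 10
    failureThreshold: 3
```

With both probes configured, a hung or crashed server would be restarted by the kubelet while the readiness probe keeps traffic away during startup and shutdown.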
