Node failures

Intentionally (to save costs) or unintentionally, nodes can go down. When that happens, you don't want to get the proverbial 3AM call when Kubernetes can handle it automatically for you instead. In this exercise, we are going to bring a node down in our cluster and see what Kubernetes does in response:

  1. Ensure that your cluster has at least two nodes:
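A quick way to verify is to list the nodes and check their STATUS column (kc is the kubectl alias used in the listings throughout this chapter):
kc get nodes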

  2. Check that your URL is working as shown in the following output, using the external IP to reach the frontend:
kc get svc
NAME       TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
frontend   LoadBalancer   10.0.196.116   <EXTERNAL-IP>   80:30063/TCP   14h
  3. Go to http://<EXTERNAL-IP> in your browser and verify that the guestbook loads.
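If you'd rather check from the terminal first, a HEAD request against the same address should come back with HTTP 200:
curl -I http://<EXTERNAL-IP>/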

  4. Let's see where the pods are currently running using the following command:
kubectl describe nodes

The following output is edited to show only the lines we are interested in:

  1  ab443838-9b3e-4811-b287-74e417a9@Azure:~$ kc describe nodes
  2  Name:       aks-agentpool-18162866-0
  5  Addresses:
  6    InternalIP:  10.240.0.4
  7    Hostname:    aks-agentpool-18162866-0
  16 Non-terminated Pods: (12 in total)
  17   Namespace  Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  18   ---------  ----                           ------------  ----------  ---------------  -------------
  19   default    frontend-56f7975f44-9k7f2     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  20   default    frontend-56f7975f44-rflgz     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  21   default    redis-master-6b464554c8-8nv4s 100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  22   default    redis-slave-b58dc4644-wtkwj   100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  23   default    redis-slave-b58dc4644-xtdkx   100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  39 Name:       aks-agentpool-18162866-1
  42 Addresses:
  43   InternalIP:  10.240.0.5
  44   Hostname:    aks-agentpool-18162866-1
  54   Namespace  Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  55   ---------  ----                           ------------  ----------  ---------------  -------------
  56   default    frontend-56f7975f44-gbsfv     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)

We can see that on agent-0, we have the following:

    • Two frontend servers (out of three)
    • One redis master
    • Two redis slaves

On agent-1, we have the following:

    • One frontend server
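Instead of reading through the full kubectl describe nodes output, you can also get the pod-to-node mapping directly; the -o wide flag adds a NODE column to the pod listing:
kc get pods -o wide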
  5. In this case, we are going for maximum damage, so let's shut down agent-0 (you can choose whichever node you want – for illustration purposes, it doesn't really matter).
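One way to shut the node down is from the Azure CLI. The following is only a sketch: it assumes your node pool is backed by individual VMs (clusters built on VM scale sets would use az vmss commands instead), and the resource group and cluster names are placeholders you need to fill in:
# Find the node resource group that contains the node VMs.
az aks show --resource-group <your-resource-group> --name <your-cluster> \
    --query nodeResourceGroup -o tsv
# Deallocate (shut down) the VM that backs agent-0.
az vm deallocate --resource-group <node-resource-group> --name aks-agentpool-18162866-0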

Let the fun begin.

  6. For maximum fun, you can run the following command to hit the guestbook frontend every 5 seconds and return the HTML (in any Bash terminal):
while true; do curl http://<EXTERNAL-IP>/ ; sleep 5; done
The preceding command will keep printing output until you press Ctrl + C.
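If the raw HTML is too noisy to follow, here is a variant (using standard curl flags) that prints a timestamp and just the HTTP status code for each request, and gives up on any request that hangs for more than two seconds:
while true; do
  echo -n "$(date +%T) "
  curl -s -m 2 -o /dev/null -w "%{http_code}\n" http://<EXTERNAL-IP>/
  sleep 5
done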

Add some Guestbook entries to see what happens to them when the node goes down.

Things will go crazy while agent-0 shuts down. You can see this in the following edited output, captured during the shutdown:

ab443838-9b3e-4811-b287-74e417a9@Azure:~$ kc get events --watch
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f94ac87d7c Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-9k7f2 to aks-agentpool-18162866-0
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f9c9eb2713 Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-0 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f9e3ee2348 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-0 Created container
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5fa0ec58afa Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-0 Started container
52s 52s 1 frontend-56f7975f44-fbksv.1574e88a6e05a7eb Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-fbksv to aks-agentpool-18162866-1
50s 50s 1 frontend-56f7975f44-fbksv.1574e88aec0fb81d Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-0 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
47m 47m 1 frontend-56f7975f44-rflgz.1574e5f9e7166672 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-0 Created container
47m 47m 1 frontend-56f7975f44-rflgz.1574e5fa1524773e Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-0 Started container
52s 52s 1 frontend-56f7975f44-xw7vd.1574e88a716fa558 Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-xw7vd to aks-agentpool-18162866-1
49s 49s 1 frontend-56f7975f44-xw7vd.1574e88b37cf57f1 Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-1 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
48s 48s 1 frontend-56f7975f44-xw7vd.1574e88b4cb8959f Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-1 Created container
47s 47s 1 frontend-56f7975f44-xw7vd.1574e88b8aee5ee6 Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-1 Started container
47m 47m 1 frontend-56f7975f44.1574e5f9483ea97c ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: frontend-56f7975f44-gbsfv
47m 47m 1 frontend-56f7975f44.1574e5f949bd8e43 ReplicaSet Normal SuccessfulCreate replicaset-
8s 52s 8 redis-master-6b464554c8-f5p7f.1574e88a71687da6 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
52s 52s 1 redis-master-6b464554c8.1574e88a716d02d9 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-master-6b464554c8-f5p7f
8s 52s 7 redis-slave-b58dc4644-7w468.1574e88a73b5ecc4 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
8s 52s 8 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
52s 52s 1 redis-slave-b58dc4644.1574e88a73b40e64 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-slave-b58dc4644-7w468
52s 52s 1 redis-slave-b58dc4644.1574e88a78901fd9 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-slave-b58dc4644-lqkdp
0s 54s 8 redis-slave-b58dc4644-7w468.1574e88a73b5ecc4 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
0s 54s 9 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
0s 54s 9 redis-master-6b464554c8-f5p7f.1574e88a71687da6 Pod
0s 1m 13 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
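The FailedScheduling warnings are the most interesting messages here: the displaced redis pods cannot be placed because the one remaining node does not have enough unreserved CPU. You can check how much of a node's capacity is already requested with something like the following (the grep simply trims the describe output down to the relevant section):
kc describe node aks-agentpool-18162866-1 | grep -A 7 "Allocated resources"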

Now take a look at the guestbook application in the browser.

What you can see is that all your precious messages are gone! This shows the importance of having Persistent Volume Claims (PVCs) for any data that you want to survive a node failure.
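A minimal sketch of such a claim is shown below, piped straight into kubectl. The name redis-data is hypothetical, and the redis deployment would additionally need a volume and volumeMount referencing the claim before any data actually survives; on AKS, the cluster's default storage class dynamically provisions an Azure disk to satisfy the claim:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data          # hypothetical name, for illustration only
spec:
  accessModes:
    - ReadWriteOnce         # a single node mounts the disk read-write
  resources:
    requests:
      storage: 1Gi          # plenty for guestbook messages
EOF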

Let's look at some messages from the frontend and understand what they mean:

9m 1h 3 frontend.1574e31070390293 Service Normal UpdatedLoadBalancer service-controller Updated load balancer with new hosts

The preceding message is the first hint that something has gone wrong: the service controller has updated the load balancer's backend hosts now that the node is gone. Your curl command might have hiccupped a little bit, but it has continued. You have to reload the frontend in your browser to see the effect, though, because of how the frontend is constructed: it loads the HTML once and relies on JavaScript to talk to the Redis database from then on. So, hit refresh in your browser:

52s 52s 1 frontend-56f7975f44-fbksv.1574e88a6e05a7eb Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-fbksv to aks-agentpool-18162866-1

You can see that a replacement frontend pod has been scheduled on agent-1:

50s 50s 1 frontend-56f7975f44-fbksv.1574e88aec0fb81d Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-1 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
50s 50s 1 frontend-56f7975f44-fbksv.1574e88b004c01e6 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-1 Created container
49s 49s 1 frontend-56f7975f44-fbksv.1574e88b44244673 Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-1 Started container

Next, Kubernetes checks whether the Docker image is already present on the node and downloads it if required. Then the container is created and started.
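Whether the kubelet reuses the cached image (as the "already present on machine" message shows) or pulls it again is governed by the container's imagePullPolicy. You can inspect the policy on one of the new pods (the pod name here is taken from the output above) like this:
kubectl get pod frontend-56f7975f44-fbksv -o jsonpath='{.spec.containers[0].imagePullPolicy}'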
