Node failures

Intentionally (to save costs) or unintentionally, nodes can go down. When that happens, you don't want to get the proverbial 3AM call when Kubernetes can handle it automatically for you instead. In this exercise, we are going to bring a node down in our cluster and see what Kubernetes does in response:

  1. Ensure that your cluster has at least two nodes:
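A quick way to verify is to list the nodes and check their STATUS column (kc is the kubectl alias used in the listings throughout this chapter):
kc get nodes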

  2. Check that your URL is working as shown in the following output, using the external IP to reach the frontend:
kc get svc
NAME       TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
frontend   LoadBalancer   10.0.196.116   <EXTERNAL-IP>   80:30063/TCP   14h
  3. Go to http://<EXTERNAL-IP> in your browser and verify that the guestbook loads.
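If you'd rather check from the terminal first, a HEAD request against the same address should come back with HTTP 200:
curl -I http://<EXTERNAL-IP>/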

  4. Let's see where the pods are currently running using the following command:
kubectl describe nodes

The following output is edited to show only the lines we are interested in:

  1  ab443838-9b3e-4811-b287-74e417a9@Azure:~$ kc describe nodes
  2  Name:       aks-agentpool-18162866-0
  5  Addresses:
  6    InternalIP:  10.240.0.4
  7    Hostname:    aks-agentpool-18162866-0
  16 Non-terminated Pods: (12 in total)
  17   Namespace  Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  18   ---------  ----                           ------------  ----------  ---------------  -------------
  19   default    frontend-56f7975f44-9k7f2     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  20   default    frontend-56f7975f44-rflgz     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  21   default    redis-master-6b464554c8-8nv4s 100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  22   default    redis-slave-b58dc4644-wtkwj   100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  23   default    redis-slave-b58dc4644-xtdkx   100m (10%)    0 (0%)      100Mi (7%)       0 (0%)
  39 Name:       aks-agentpool-18162866-1
  42 Addresses:
  43   InternalIP:  10.240.0.5
  44   Hostname:    aks-agentpool-18162866-1
  54   Namespace  Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  55   ---------  ----                           ------------  ----------  ---------------  -------------
  56   default    frontend-56f7975f44-gbsfv     100m (10%)    0 (0%)      100Mi (7%)       0 (0%)

We can see that on agent-0, we have the following:

    • Two frontend servers (out of three)
    • One redis master
    • Two redis slaves

On agent-1, we have the following:

    • One frontend server
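Instead of reading through the full kubectl describe nodes output, you can also get the pod-to-node mapping directly; the -o wide flag adds a NODE column to the pod listing:
kc get pods -o wide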
  5. In this case, we are going for maximum damage, so let's shut down agent-0 (you can choose whichever node you want – for illustration purposes, it doesn't really matter).
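One way to shut the node down is from the Azure CLI. The following is only a sketch: it assumes your node pool is backed by individual VMs (clusters built on VM scale sets would use az vmss commands instead), and the resource group and cluster names are placeholders you need to fill in:
# Find the node resource group that contains the node VMs.
az aks show --resource-group <your-resource-group> --name <your-cluster> \
    --query nodeResourceGroup -o tsv
# Deallocate (shut down) the VM that backs agent-0.
az vm deallocate --resource-group <node-resource-group> --name aks-agentpool-18162866-0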

Let the fun begin.

  6. For maximum fun, you can run the following command to hit the guestbook frontend every 5 seconds and return the HTML (in any Bash terminal):
while true; do curl http://<EXTERNAL-IP>/ ; sleep 5; done
The preceding command will keep printing output until you press Ctrl + C.
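If the raw HTML is too noisy to follow, here is a variant (using standard curl flags) that prints a timestamp and just the HTTP status code for each request, and gives up on any request that hangs for more than two seconds:
while true; do
  echo -n "$(date +%T) "
  curl -s -m 2 -o /dev/null -w "%{http_code}\n" http://<EXTERNAL-IP>/
  sleep 5
done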

Add some Guestbook entries to see what happens to them when the node goes down.

Things will go crazy while agent-0 shuts down. You can see this in the following edited output, captured during the shutdown:

ab443838-9b3e-4811-b287-74e417a9@Azure:~$ kc get events --watch
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f94ac87d7c Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-9k7f2 to aks-agentpool-18162866-0
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f9c9eb2713 Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-0 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5f9e3ee2348 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-0 Created container
47m 47m 1 frontend-56f7975f44-9k7f2.1574e5fa0ec58afa Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-0 Started container
52s 52s 1 frontend-56f7975f44-fbksv.1574e88a6e05a7eb Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-fbksv to aks-agentpool-18162866-1
50s 50s 1 frontend-56f7975f44-fbksv.1574e88aec0fb81d Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-0 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
47m 47m 1 frontend-56f7975f44-rflgz.1574e5f9e7166672 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-0 Created container
47m 47m 1 frontend-56f7975f44-rflgz.1574e5fa1524773e Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-0 Started container
52s 52s 1 frontend-56f7975f44-xw7vd.1574e88a716fa558 Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-xw7vd to aks-agentpool-18162866-1
49s 49s 1 frontend-56f7975f44-xw7vd.1574e88b37cf57f1 Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-1 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
48s 48s 1 frontend-56f7975f44-xw7vd.1574e88b4cb8959f Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-1 Created container
47s 47s 1 frontend-56f7975f44-xw7vd.1574e88b8aee5ee6 Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-1 Started container
47m 47m 1 frontend-56f7975f44.1574e5f9483ea97c ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: frontend-56f7975f44-gbsfv
47m 47m 1 frontend-56f7975f44.1574e5f949bd8e43 ReplicaSet Normal SuccessfulCreate replicaset-
8s 52s 8 redis-master-6b464554c8-f5p7f.1574e88a71687da6 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
52s 52s 1 redis-master-6b464554c8.1574e88a716d02d9 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-master-6b464554c8-f5p7f
8s 52s 7 redis-slave-b58dc4644-7w468.1574e88a73b5ecc4 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
8s 52s 8 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
52s 52s 1 redis-slave-b58dc4644.1574e88a73b40e64 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-slave-b58dc4644-7w468
52s 52s 1 redis-slave-b58dc4644.1574e88a78901fd9 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: redis-slave-b58dc4644-lqkdp
0s 54s 8 redis-slave-b58dc4644-7w468.1574e88a73b5ecc4 Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
0s 54s 9 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
0s 54s 9 redis-master-6b464554c8-f5p7f.1574e88a71687da6 Pod
0s 1m 13 redis-slave-b58dc4644-lqkdp.1574e88a78913f1a Pod Warning FailedScheduling default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were not ready, 1 node(s) were out of disk space.
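The FailedScheduling warnings are the most interesting messages here: the displaced redis pods cannot be placed because the one remaining node does not have enough unreserved CPU. You can check how much of a node's capacity is already requested with something like the following (the grep simply trims the describe output down to the relevant section):
kc describe node aks-agentpool-18162866-1 | grep -A 7 "Allocated resources"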

Now take a look at the guestbook application in the browser.

What you can see is that all your precious messages are gone! This shows the importance of having Persistent Volume Claims (PVCs) for any data that you want to survive a node failure.
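A minimal sketch of such a claim is shown below, piped straight into kubectl. The name redis-data is hypothetical, and the redis deployment would additionally need a volume and volumeMount referencing the claim before any data actually survives; on AKS, the cluster's default storage class dynamically provisions an Azure disk to satisfy the claim:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data          # hypothetical name, for illustration only
spec:
  accessModes:
    - ReadWriteOnce         # a single node mounts the disk read-write
  resources:
    requests:
      storage: 1Gi          # plenty for guestbook messages
EOF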

Let's look at some messages from the frontend and understand what they mean:

9m 1h 3 frontend.1574e31070390293 Service Normal UpdatedLoadBalancer service-controller Updated load balancer with new hosts

The preceding message is the first hint that something has gone wrong: the service controller has updated the load balancer's backend hosts now that the node is gone. Your curl command might have hiccupped a little bit, but it has continued. You have to reload the frontend in your browser to see the effect, though, because of how the frontend is constructed: it loads the HTML once and relies on JavaScript to talk to the Redis database from then on. So, hit refresh in your browser:

52s 52s 1 frontend-56f7975f44-fbksv.1574e88a6e05a7eb Pod Normal Scheduled default-scheduler Successfully assigned default/frontend-56f7975f44-fbksv to aks-agentpool-18162866-1

You can see that a replacement frontend pod has been scheduled on agent-1:

50s 50s 1 frontend-56f7975f44-fbksv.1574e88aec0fb81d Pod spec.containers{php-redis} Normal Pulled kubelet, aks-agentpool-18162866-1 Container image "gcr.io/google-samples/gb-frontend:v4" already present on machine
50s 50s 1 frontend-56f7975f44-fbksv.1574e88b004c01e6 Pod spec.containers{php-redis} Normal Created kubelet, aks-agentpool-18162866-1 Created container
49s 49s 1 frontend-56f7975f44-fbksv.1574e88b44244673 Pod spec.containers{php-redis} Normal Started kubelet, aks-agentpool-18162866-1 Started container

Next, Kubernetes checks whether the Docker image is already present on the node and downloads it if required. Then the container is created and started.
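Whether the kubelet reuses the cached image (as the "already present on machine" message shows) or pulls it again is governed by the container's imagePullPolicy. You can inspect the policy on one of the new pods (the pod name here is taken from the output above) like this:
kubectl get pod frontend-56f7975f44-fbksv -o jsonpath='{.spec.containers[0].imagePullPolicy}'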
