There are two core resource types that participate in the scheduling process, namely CPU and memory. To see what a node can provide, we can check its .status.allocatable path:
# Here we only show the allocatable path with a Go template.
$ kubectl get node <node_name> -o go-template \
  --template='{{range $k, $v := .status.allocatable}}{{printf "%s: %s\n" $k $v}}{{end}}'
cpu: 2
ephemeral-storage: 14796951528
hugepages-2Mi: 0
memory: 1936300Ki
pods: 110
As we can see, the scheduler allocates these resources to any pod in need. But how does the scheduler know how many resources a pod will consume? We have to instruct Kubernetes by setting a request and a limit for each container. The syntax is spec.containers[].resources.{limits,requests}.{resource_name} in the pod's manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 100m
        memory: 10Mi
      limits:
        cpu: 0.1
        memory: 100Mi
The unit of CPU resources can be expressed either as a fractional number or in millicpu notation. One CPU core (or hyperthread) equals 1,000 millicores, or simply 1.0 in fractional notation. Note that the fractional notation is an absolute quantity: on an eight-core node, the expression 0.5 means 0.5 cores, not half of the node (four cores). In this sense, the requested CPU in the previous example, 100m, and the CPU limit, 0.1, are equivalent.
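To make the equivalence concrete, here is a small illustrative sketch (not Kubernetes code; the helper name is ours) that normalizes both notations to millicores:

```python
# Illustrative helper (not part of Kubernetes): convert a CPU quantity
# string to millicores so the two notations can be compared directly.
def cpu_to_millicores(quantity: str) -> int:
    if quantity.endswith("m"):
        # Millicpu notation, e.g. "100m" -> 100 millicores.
        return int(quantity[:-1])
    # Fractional notation is an absolute core count, e.g. "0.1" -> 100m.
    return int(float(quantity) * 1000)

# "100m" and "0.1" describe the same amount of CPU on any node.
print(cpu_to_millicores("100m"))  # 100
print(cpu_to_millicores("0.1"))   # 100
print(cpu_to_millicores("1"))     # 1000 (one full core)
```

Whether we write 100m or 0.1, the scheduler sees the same quantity.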
Memory is represented in bytes, and Kubernetes accepts the following suffixes and notations:
- Base 10: E, P, T, G, M, k
- Base 2: Ei, Pi, Ti, Gi, Mi, Ki
- Scientific notation: e
Hence, the following forms are roughly the same: 67108864, 67M, 64Mi, and 67e6.
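The following sketch (an illustrative helper, not the parser Kubernetes actually uses) expands such quantity strings into byte counts, which makes the near-equivalence easy to verify:

```python
# Illustrative helper (not part of Kubernetes): expand a memory quantity
# string into a plain byte count using the suffix tables above.
SUFFIXES = {
    # Base 10
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    # Base 2
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

def memory_to_bytes(quantity: str) -> int:
    # Try two-letter suffixes ("Mi") before one-letter ones ("M").
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    # Plain bytes or scientific notation, e.g. "67108864" or "67e6".
    return int(float(quantity))

for q in ("67108864", "67M", "64Mi", "67e6"):
    print(q, "->", memory_to_bytes(q))
```

Running it shows that 64Mi is exactly 67,108,864 bytes, while 67M and 67e6 are both 67,000,000 bytes, hence "roughly" the same.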
The other resource types shown in the allocatable list, along with device and extended resources, are documented here:
- Ephemeral storage: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage
- Huge pages: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
- Device resources: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
- Extended resources: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#extended-resources
As the name suggests, a request refers to the quantity of resources a pod expects to use, and Kubernetes uses it to pick a node for the pod. For each resource type, the sum of the requests of all containers scheduled on a node never exceeds that node's allocatable amount. In other words, every container that is successfully scheduled is guaranteed the amount of resources it requested.
To maximize overall resource utilization, a pod is allowed to exceed the amount of resources it requested, as long as the node it is on has spare capacity. However, if every pod on a node used more resources than it requested, the node's resources could eventually be exhausted, leaving the node unstable. The concept of limits addresses this situation:
- If a container exceeds its CPU limit, it is throttled (not killed)
- If a container reaches its memory limit, it is killed and then restarted according to the pod's restart policy
These limits are hard constraints, so a limit must always be greater than or equal to the request of the same resource type.
Requests and limits are configured per container. If a pod has more than one container, Kubernetes schedules the pod based on the sum of all of its containers' requests. One thing to note is that if the total requests of a pod exceed the capacity of the largest node in a cluster, the pod will never be scheduled. For example, suppose the largest node in our cluster can provide 4,000m (four cores) of CPU; then neither a single-container pod that requests 4,500m of CPU nor a pod with two containers requesting 2,000m and 2,500m can be assigned, since no node can fulfil their requests.
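The scheduler's request-sum check can be sketched as follows (the numbers come from the example above; the real scheduler evaluates many more predicates than this):

```python
# Sketch of the scheduler's request-based filter. A pod fits a node only
# if the SUM of all its containers' requests is within the allocatable amount.
def fits(node_allocatable_m: int, container_requests_m: list[int]) -> bool:
    return sum(container_requests_m) <= node_allocatable_m

largest_node = 4000  # 4,000m = four cores, the largest node in the cluster

print(fits(largest_node, [4500]))        # False: one container asks for too much
print(fits(largest_node, [2000, 2500]))  # False: the sum, 4,500m, exceeds 4,000m
print(fits(largest_node, [2000, 1500]))  # True: 3,500m fits on this node
```

Both rejected pods stay pending forever, because no node in the cluster can satisfy their summed requests.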
Since Kubernetes schedules pods based on requests, what happens if pods come without any requests or limits? In that case, as the sum of requests is 0, which is always less than a node's capacity, Kubernetes keeps placing pods onto the node until the node's real capacity is exceeded. By default, the only limitation on a node is the number of allocatable pods, configured with the kubelet flag --max-pods. In the previous example, this was 110. Another tool for setting default constraints on resources is LimitRange, which we'll discuss later in this chapter.
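Extending the earlier filter sketch, we can see why only the pod-count cap stops placement when every pod requests zero resources (again a toy model, using the 110-pod default from the node shown earlier):

```python
# Toy model: with zero requests, the request-sum check never rejects a pod,
# so only the kubelet's pod-count cap (--max-pods, 110 here) stops placement.
MAX_PODS = 110
allocatable_cpu_m = 2000  # the two-core node from the allocatable output

scheduled = []
for _ in range(500):  # keep trying to place request-less pods
    request = 0
    if sum(scheduled) + request <= allocatable_cpu_m and len(scheduled) < MAX_PODS:
        scheduled.append(request)

print(len(scheduled))  # 110: capped by the pod count, not by CPU
```

The CPU check passes every time (0 ≤ 2,000m), so the node fills up to 110 pods regardless of how much CPU those pods actually burn at runtime.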