Since monitoring is an important part of operating a service, the existing monitoring system in our infrastructure might already provide solutions for collecting metrics from common sources, such as well-known open source software and the operating system. As for applications running on Kubernetes, let's have a look at what Kubernetes and its ecosystem offer.
To collect metrics from containers managed by Kubernetes, we don't have to install any special controller on the Kubernetes master node, nor any metrics collector inside our containers. The kubelet handles this for us: it gathers various telemetry from a node and exposes it at the following API endpoints (as of Kubernetes 1.13):
- /metrics/cadvisor: This API endpoint exposes cAdvisor container metrics in the Prometheus format
- /spec/: This API endpoint exports machine specifications
- /stats/: This API endpoint also exports cAdvisor container metrics, but in JSON format
- /stats/summary: This endpoint contains various data aggregated by the kubelet. It's also known as the Summary API
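We can poke at these endpoints without touching the node directly by going through the API server's node proxy. The following sketch assumes you replace `<node-name>` with an actual node from your cluster:

```shell
# List the nodes in the cluster, then pick one to query
kubectl get nodes

# Fetch kubelet endpoints through the API server's node proxy
# (replace <node-name> with a real node name from the output above)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | head
kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary | head
```

Routing through `kubectl get --raw` reuses our existing kubeconfig credentials, so we don't need network access to the node itself.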
The Prometheus format (https://prometheus.io/docs/instrumenting/exposition_formats/) is the predecessor of the OpenMetrics format, which is why it has also been referred to as OpenMetrics v0.0.4 since OpenMetrics was published. If our monitoring system supports this format, we can configure it to pull metrics from the kubelet's Prometheus endpoint (/metrics/cadvisor).
To access those endpoints, the kubelet listens on two TCP ports: 10250 and 10255. Port 10250 is the safer one and the one recommended for production use, as it is an HTTPS endpoint protected by Kubernetes' authentication and authorization system. Port 10255 serves plain HTTP and should be used restrictively.
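As a rough sketch of talking to the secure port directly, we can present a service account token as a bearer credential. `NODE_IP` is a placeholder for a real node address, and the service account in question must be authorized for node access (both are assumptions here):

```shell
# Extract a service account token (pre-1.24 style, where token secrets
# are auto-created for service accounts)
SECRET=$(kubectl -n kube-system get sa default \
  -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl -n kube-system get secret "$SECRET" \
  -o jsonpath='{.data.token}' | base64 -d)

# Query the secure kubelet port; -k skips certificate verification,
# which is acceptable only for exploration, not production
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://NODE_IP:10250/metrics/cadvisor" | head
```

Without a valid token, the secure port returns 401/403, whereas port 10255 would answer anyone who can reach it, which is why it should stay firewalled off.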
Another important component in the monitoring pipeline is the metrics server (https://github.com/kubernetes-incubator/metrics-server). It aggregates monitoring statistics from the Summary API served by the kubelet on each node and acts as an abstraction layer between Kubernetes' other components and the real metrics sources. To be more specific, the metrics server implements the resource metrics API under the aggregation layer, so other in-cluster components can get the data from a unified API path (/apis/metrics.k8s.io). For instance, kubectl top and kube-dashboard get their data from the resource metrics API.
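We can see this unified path in action by querying the resource metrics API ourselves and comparing it with what kubectl top reports:

```shell
# Query the resource metrics API directly through the aggregation layer
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods

# kubectl top reads from the same API under the hood
kubectl top nodes
kubectl top pods -n kube-system
```

Both paths return the same underlying data, which demonstrates the abstraction: consumers never talk to individual kubelets.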
The following diagram illustrates how the metrics server interacts with other components in a cluster:
Most installations of Kubernetes deploy the metrics server by default. If we need to do this manually, we can download the manifests of the metrics server and apply them:
$ git clone https://github.com/kubernetes-incubator/metrics-server.git
$ kubectl apply -f metrics-server/deploy/1.8+/
While kubelet metrics are focused on system metrics, we also want to see the logical states of objects displayed on our monitoring dashboard. kube-state-metrics (https://github.com/kubernetes/kube-state-metrics) is the piece that completes our monitoring stack. It watches the Kubernetes masters and transforms the object statuses we see from kubectl get or kubectl describe into metrics in the Prometheus format. We are therefore able to scrape the states into metrics storage and then be alerted on events such as unexplainable restart counts. Download the templates and install them as follows:
$ git clone https://github.com/kubernetes/kube-state-metrics.git
$ kubectl apply -f kube-state-metrics/kubernetes
Afterward, we can view the state metrics from the kube-state-metrics service inside our cluster:
http://kube-state-metrics.kube-system:8080/metrics
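That URL only resolves inside the cluster. From a workstation, a sketch using port-forwarding lets us inspect the same metrics; the restart-count metric name below is one real example of what kube-state-metrics exposes:

```shell
# Forward the in-cluster service to a local port
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &

# Inspect a few state metrics, e.g. container restart counts,
# which we could later alert on
curl -s http://localhost:8080/metrics \
  | grep kube_pod_container_status_restarts_total | head
```

Metrics such as `kube_pod_container_status_restarts_total` are exactly the object states we would otherwise dig out of kubectl describe, now in a scrapeable form.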