Extracting metrics from logs

The monitoring and logging system we built around our application on top of Kubernetes is shown in the following diagram:

The logging part and the monitoring part look like two independent tracks, but logs are worth much more than a collection of short texts. Log entries are semi-structured data and are usually emitted with timestamps; because of this, if we can parse fields out of each entry and project the extracted values onto the time dimension according to those timestamps, the result becomes a time series metric that can be stored in Prometheus.

For example, an access log entry from any of the web servers may look as follows:

10.1.5.7 - - [28/Oct/2018:00:00:01 +0200] "GET /ping HTTP/1.1" 200 68 0.002

This single line carries data such as the request's source IP address, the timestamp, the HTTP method, the handler (request path), the response status, the body size, and the request duration. If we split the log line into fields according to their meanings, those fields can then be regarded as a labeled metric sample, as follows:

{ip:"10.1.5.7",handler:"/ping",method:"GET",status:200,body_size:68,duration:0.002}

After this transformation, tracking the log data over time becomes much more intuitive.
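A minimal sketch of this parsing step in Python, using the standard `re` module; the regular expression and field names below are illustrative assumptions based on the sample line above, not a fixed standard:

```python
import re

# Pattern for the access log line shown above
# (illustrative; real log formats vary between servers).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<handler>\S+) \S+" '
    r'(?P<status>\d+) (?P<body_size>\d+) (?P<duration>[\d.]+)'
)

def parse_access_log(line):
    """Turn one access log line into a labeled sample (dict), or None."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    sample = m.groupdict()
    # Cast numeric fields so they can serve as metric values.
    sample["status"] = int(sample["status"])
    sample["body_size"] = int(sample["body_size"])
    sample["duration"] = float(sample["duration"])
    return sample

line = '10.1.5.7 - - [28/Oct/2018:00:00:01 +0200] "GET /ping HTTP/1.1" 200 68 0.002'
print(parse_access_log(line))
```

Pairing each parsed sample with the timestamp from its `time` field is what turns a stream of such lines into a time series.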

To organize logs into the Prometheus format, tools such as mtail (https://github.com/google/mtail), Grok Exporter (https://github.com/fstab/grok_exporter), and Fluentd (https://github.com/fluent/fluent-plugin-prometheus) are widely used to extract metrics from log entries.
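To illustrate what these exporters do under the hood, here is a hedged sketch that aggregates parsed log lines into a counter and renders it in the Prometheus text exposition format. The metric name `http_requests_total` and the label set are assumptions chosen for this example; the real tools are driven by their own configuration languages:

```python
import re
from collections import Counter

# Illustrative pattern for the access log format shown earlier.
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<handler>\S+) \S+" (?P<status>\d+)'
)

def count_requests(lines):
    """Count requests by (method, handler, status), as an exporter would."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            counts[(m["method"], m["handler"], m["status"])] += 1
    return counts

def render_prometheus(counts):
    """Render the counts in the Prometheus text exposition format."""
    out = ["# TYPE http_requests_total counter"]
    for (method, handler, status), n in sorted(counts.items()):
        out.append(
            f'http_requests_total{{method="{method}",handler="{handler}",'
            f'status="{status}"}} {n}'
        )
    return "\n".join(out)

logs = [
    '10.1.5.7 - - [28/Oct/2018:00:00:01 +0200] "GET /ping HTTP/1.1" 200 68 0.002',
    '10.1.5.8 - - [28/Oct/2018:00:00:02 +0200] "GET /ping HTTP/1.1" 200 68 0.001',
]
print(render_prometheus(count_requests(logs)))
```

An exporter would serve this text over HTTP so that Prometheus can scrape it on each collection interval, accumulating the counter into a time series.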

Admittedly, many applications nowadays support emitting structured metrics directly, and we can always instrument our own applications to expose this kind of information. However, not every component in our tech stack provides a convenient way to inspect its internal state, especially operating system utilities such as ntpd. It is therefore still worth having this kind of log-to-metric tool in our monitoring stack to improve the observability of our infrastructure.
