Chapter 10. Telemetry

Critical to running microservices is the ability to reason over their behavior. This includes not only the triumvirate of logs, metrics, and traces, but also the visualization, troubleshooting, and debugging capabilities that are critically needed alongside them. In Chapter 2, we covered how service meshes in general, and Istio specifically, provide for uniform observability. In this chapter, we survey the specifics of the various signals and tools available to monitor services running on Istio. We discuss troubleshooting and debugging in Chapter 11.

Mixer (described in Chapter 9) plays a key role in collecting and coalescing telemetry generated by service proxies. Service proxies generate telemetry at runtime based on the traffic they process, buffering this telemetry before flushing it to Mixer for further processing. Half of Mixer’s job is amassing, translating, and transmitting these important signals (the other half being authorization). Routing of these various signals depends entirely on which adapters, and of what types, Mixer is running. Let’s dig into Mixer’s adapters.

Note

As mentioned in Chapter 4, Bookinfo is Istio’s canonical sample application, and we use it as the example application throughout this chapter.

Adapter Models

As described in Chapter 9, adapters integrate Mixer with different infrastructure backends that deliver core functionality, such as logging, monitoring, quotas, access control list checking, and more. Operators have a choice of the number and type of adapters deployed, opting for those that integrate with existing backends or those that provide value of their own. Mixer supports having multiple adapters of the same type enabled simultaneously (e.g., two logging adapters to send logs to two different backends). Attribute-producing adapters are a special case in that they always run before telemetry or policy adapters. The kubernetesenv adapter is the most prominent example of this type: it extracts information from a Kubernetes environment and produces attributes that can be used by downstream adapters.

Telemetry adapters execute in parallel. There’s a bit more complexity with regard to batching, but logically, Mixer dispatches adapter calls in parallel and waits for them to complete.
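
To make the attribute-producing case concrete, here is a minimal sketch of a kubernetesenv handler and a kubernetes instance, loosely based on the defaults that Istio ships; the attribute bindings shown are only a subset, and names may differ in your installation:

apiVersion: "config.istio.io/v1alpha2"
kind: handler
metadata:
  name: kubernetesenv
  namespace: istio-system
spec:
  compiledAdapter: kubernetesenv
  params: {}  # uses in-cluster credentials to query the Kubernetes API
---
apiVersion: "config.istio.io/v1alpha2"
kind: instance
metadata:
  name: attributes
  namespace: istio-system
spec:
  compiledTemplate: kubernetes
  params:
    source_uid: source.uid | ""
    destination_uid: destination.uid | ""
  attributeBindings:
    # attributes produced here are available to downstream telemetry adapters
    source.labels: $out.source_labels | emptyStringMap()
    destination.labels: $out.destination_labels | emptyStringMap()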

Reporting Telemetry

As a refresher from Chapter 2, Istio supports three forms of telemetry (metrics, logs, traces), which can transmit a diverse set of insights between them. Telemetry is reported from the data plane to the control plane. Service proxy reports contain attributes (see Chapter 9 for more on attributes). Context attributes provide the ability to distinguish between HTTP and TCP protocols within policies.

As attributes are generated by service proxies, telemetry reports are sent at three different points in time:

  • When the connection is established (initial report)

  • Periodically, while the connection is alive (periodical report)

  • When the connection is closed (final report)

The default interval for periodical reports is 10 seconds. It’s recommended that you not set this interval below one second.

Metrics

Service metrics are collected by the sidecar service proxies that send telemetry reports to the istio-telemetry Mixer service. Mixer has any number and type of adapters loaded. A Mixer adapter based on the metric adapter template can be used to collect and process metrics forwarded to it by Mixer. Let’s walk through how adapters are configured in general and then use the Prometheus Mixer adapter as an example.

Configuring Mixer to Collect Metrics

Telemetry (and policy) is configured using three types of resources:

Handlers

These determine the set of adapters that are being used and how they operate. Providing a logging adapter with the IP address for the remote syslog server is an example of handler configuration.

Instances

These describe how to map request attributes (generated by the service proxy) to adapter inputs (where an adapter will receive the generated telemetry). Instances represent a chunk of data on which one or more adapters will operate. For example, an operator might decide to generate request_bytes metric instances from an attribute such as destination_workload (a minimal instance sketch follows this list).

Rules

These identify when a specific adapter is invoked and which instances it is given (what telemetry is funneled to it). Rules consist of a match expression and actions. The match expression controls when to invoke an adapter, whereas the actions determine the set of instances to give the adapter.
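
To illustrate, here is a trimmed-down sketch of a metric instance that maps request attributes to the dimensions an adapter receives; it is a reduced version of the requestcount instance shown in full in Example 10-5:

apiVersion: "config.istio.io/v1alpha2"
kind: instance
metadata:
  name: requestcount
  namespace: istio-system
spec:
  compiledTemplate: metric
  params:
    value: "1"  # each request increments the counter by one
    dimensions:
      source_workload: source.workload.name | "unknown"
      destination_workload: destination.workload.name | "unknown"
      response_code: response.code | 200
    monitored_resource_type: '"UNSPECIFIED"'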

To use the Prometheus Mixer adapter, we need to have a Prometheus server deployed either on the same Kubernetes cluster or elsewhere that is capable of scraping metrics from the Prometheus Mixer adapter. There are many ways to deploy Prometheus, either on Kubernetes or outside, the details of which are beyond the scope of this book.

Setting Up Metrics Collection and Querying for Metrics

To configure and use the Prometheus Mixer adapter, we need to do the following:

  1. Create a metric instance that configures the metrics that Istio will generate and collect, and configure a Prometheus handler to collect those metrics, assign appropriate labels, and make them available for a Prometheus instance to scrape, as shown in Example 10-1 (an excerpt; see the full configuration in this book’s GitHub repository).

    Example 10-1. Step 1: An excerpt from the Prometheus handler and the requests_total metric it tracks across various labels
    apiVersion: "config.istio.io/v1alpha2"
    kind: handler
    metadata:
      name: prometheus
      namespace: istio-system
    spec:
      compiledAdapter: prometheus
      params:
        metrics:
        - name: requests_total
          instance_name: requestcount.metric.istio-system
          kind: COUNTER
          label_names:
          - reporter
          - source_app
          - source_namespace
          - source_principal
          - source_workload
          - source_workload_namespace
          - source_version
          - destination_app
          - destination_namespace
          - destination_principal
          - destination_workload
          - destination_workload_namespace
          - destination_version
          - destination_service
          - destination_service_name
          - destination_service_namespace
          - request_protocol
          - response_code
          - connection_mtls
  2. Update Prometheus to scrape metrics from the Prometheus Mixer adapter, and create Istio rules that will forward the metrics collected by Istio Mixer to the Prometheus Mixer adapter with the configured labels, as demonstrated in Example 10-2 (a sample Prometheus scrape configuration follows the example).

    Example 10-2. Step 2: A rule to match HTTP traffic and take action to forward metrics to the Prometheus handler
    apiVersion: "config.istio.io/v1alpha2"
    kind: rule
    metadata:
     name: promhttp
     namespace: istio-system
     labels:
       app: mixer
       chart: mixer
       heritage: Tiller
       release: istio
    spec:
     match: (context.protocol == "http" || context.protocol == "grpc") &&
            (match((request.useragent | "-"), "kube-probe*") == false)
     actions:
     - handler: prometheus
       instances:
       - requestcount.metric
       - requestduration.metric
       - requestsize.metric
       - responsesize.metric
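
    To cover the scrape side of step 2, the following is a minimal sketch of a Prometheus scrape configuration; it assumes the Prometheus Mixer adapter exposes its metrics through the istio-telemetry service on port 42422 (the default in Istio’s bundled Prometheus configuration), so adjust the target to match your deployment:

    # prometheus.yml excerpt
    scrape_configs:
    - job_name: istio-mesh
      scrape_interval: 5s
      metrics_path: /metrics
      static_configs:
      - targets:
        - istio-telemetry.istio-system:42422  # Prometheus Mixer adapter endpoint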

Traces

Distributed traces are arguably the most insightful of the telemetry information gleaned from the service mesh, giving you insight into difficult-to-answer questions like “Why is my service slow?” Zipkin and Jaeger are both bundled into Istio releases and available as popular, open source distributed tracing systems for storing, aggregating, and interpreting trace data.

Generating trace spans

The Istio service proxy, Envoy, is responsible for generating the initial trace headers and doing so in an OpenTracing-compatible way (OpenTracing, now part of OpenTelemetry, is a language-neutral specification for distributed tracing). The x-request-id header is generated and used by Envoy to uniquely identify a request as well as to perform stable access logging and tracing. Envoy propagates the x-request-id to all of the services the request interacts with and incorporates the unique request ID into the log messages it generates as well. Thus, if you search for a unique request ID in a system like Kibana, you will see logs from all of the services for that particular request.

Propagating trace headers

This is one area where Istio’s capabilities might be oversold. The service proxy is a sidecar to your application and so has a lot of context about the requests coming in and out of your application. Even so, it cannot completely free the application from the responsibility of instrumentation. Your application requires a thin client library to collect and propagate a small set of HTTP headers, including the following:

  • x-request-id

  • x-b3-traceid

  • x-b3-spanid

  • x-b3-parentspanid

  • x-b3-sampled

  • x-b3-flags

  • x-ot-span-context

Each service in our sample application, Bookinfo, is instrumented to propagate these HTTP trace headers. As a result, we can use Jaeger (for example) as a distributed tracing system to explore latencies between and within the various “hops” in our application requests. Example 10-3 presents a simple Go program with a middleware function that listens for HTTP requests, extracts the trace headers, and copies them onto the response (see the example code in this book’s GitHub repository).

Example 10-3. A simple Go program to print trace headers
package main

import (
    "fmt"
    "log"
    "net/http"
)

// tracingMiddleware copies the tracing headers from the incoming request onto
// the response so that their values are visible to the caller. In a real
// service, you would set these same headers on any outbound requests so that
// the trace context is propagated across service hops.
func tracingMiddleware(next http.HandlerFunc) http.HandlerFunc {
    incomingHeaders := []string{
        "x-request-id",
        "x-b3-traceid",
        "x-b3-spanid",
        "x-b3-parentspanid",
        "x-b3-sampled",
        "x-b3-flags",
        "x-ot-span-context",
    }

    return func(w http.ResponseWriter, r *http.Request) {
        // Propagate each trace header, if present, to the response.
        for _, th := range incomingHeaders {
            w.Header().Set(th, r.Header.Get(th))
        }
        next.ServeHTTP(w, r)
    }
}

func main() {
    http.HandleFunc("/", tracingMiddleware(func(w http.ResponseWriter,
                         r *http.Request) {
        fmt.Fprintf(w, "Hello headers, %v", r.Header)
    }))

    log.Fatal(http.ListenAndServe(":8081", nil))
}

Disabling Tracing

Sampling request traces does incur a cost in terms of performance overhead. As shown by Meshery in Figure 10-1, there is a significant difference between sampling traces at a rate of 1% versus a rate of 100%.

Figure 10-1. The difference in average node CPU use between two performance tests
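
Short of disabling tracing entirely, you can reduce this overhead by lowering the trace sampling rate at install time. For example, assuming a Helm-based install, the pilot.traceSampling value controls the percentage of requests that are sampled:

--set pilot.traceSampling=1.0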

The simplest way to run your Istio mesh without tracing being enabled at all is to not enable it when installing the service mesh. Thus, your Helm install includes the following flag:

--set tracing.enabled=false

The default and minimal Istio configuration profiles don’t enable tracing when installing Istio. If you deployed with tracing on and now wish to disable tracing, assuming that your control plane is installed in the istio-system namespace, run the following:

$ TRACING_POD=$(kubectl get po -n <istio namespace> | grep istio-tracing | awk '{print $1}')
$ kubectl delete pod $TRACING_POD -n <istio namespace>
$ kubectl delete services tracing zipkin -n <istio namespace>

Remove references to the Zipkin URL from the Mixer deployment:

$ kubectl -n istio-system edit deployment istio-telemetry

Now, manually remove instances of trace_zipkin_url from the file and save it.
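
For reference, the flag typically appears among the Mixer container’s arguments in that deployment, along the lines of the following excerpt (the exact container name and URL depend on your installation); delete the line and save:

      containers:
      - name: mixer
        args:
        - --trace_zipkin_url=http://zipkin:9411/api/v1/spans  # remove this line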

Logs

Service access logs are crucial for recording the particulars of service access. Mixer’s native logentry template is foundational here. You can use community-contributed Mixer adapters based on the logentry template to collect and process logs forwarded to them by Istio Mixer. Among those, the Fluentd Mixer adapter is a popular choice.

To use the Fluentd adapter, we need a Fluentd daemon to be running and listening either on the same Kubernetes cluster or elsewhere. (There are many ways to deploy Fluentd either on Kubernetes or outside, the details of which are beyond the scope of this book.) As with other Mixer adapters, to configure and use the Fluentd Mixer adapter, we need to do the following:

  1. Create a logentry instance that configures a log stream that Istio will generate and collect.

  2. Configure a Fluentd handler to pass the collected logs to a listening Fluentd daemon.

  3. Create an Istio rule that forwards the log stream collected by Istio Mixer to the Fluentd Mixer adapter.

The sample configuration in Example 10-4 assumes a fluentd daemon is available on localhost:24224.

Example 10-4. A logging Mixer adapter configuration
apiVersion: "config.istio.io/v1alpha2"
kind: logentry
metadata:
  name: istiolog
  namespace: istio-system
spec:
  severity: '"warning"'
  timestamp: request.time
  variables:
    source: source.labels["app"] | source.service | "unknown"
    user: source.user | "unknown"
    destination: destination.labels["app"] | destination.service | "unknown"
    responseCode: response.code | 0
    responseSize: response.size | 0
    latency: response.duration | "0ms"
  monitored_resource_type: '"UNSPECIFIED"'
---
# Configuration for a fluentd handler
apiVersion: "config.istio.io/v1alpha2"
kind: fluentd
metadata:
  name: handler
  namespace: istio-system
spec:
  address: "localhost:24224"
  integerDuration: false  # report durations as formatted strings rather than integers
---
# Rule to send logentry instances to the fluentd handler
apiVersion: "config.istio.io/v1alpha2"
kind: rule
metadata:
  name: istiologtofluentd
  namespace: istio-system
spec:
  match: "true" # Match for all requests
  actions:
   - handler: handler.fluentd
     instances:
     - istiolog.logentry

The global default logging level is set to “Info,” but you can set the desired severity level in the instance configuration. Beyond severity levels, you can configure match conditions to log only when requests don’t complete successfully. For non-200 responses, you could edit the match condition, as demonstrated in Example 10-5 (a rule sketch using such a match condition follows the example).

Example 10-5. An example of configuration matching
apiVersion: "config.istio.io/v1alpha2"
kind: instance
metadata:
  name: requestcount
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "mixer.name" . }}
    chart: {{ template "mixer.chart" . }}
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
spec:
  compiledTemplate: metric
  params:
    value: "1"
    dimensions:
      reporter: conditional((context.reporter.kind | "inbound") == "outbound",
           "source", "destination")
      source_workload: source.workload.name | "unknown"
      source_workload_namespace: source.workload.namespace | "unknown"
      source_principal: source.principal | "unknown"
      source_app: source.labels["app"] | "unknown"
      source_version: source.labels["version"] | "unknown"
      destination_workload: destination.workload.name | "unknown"
      destination_workload_namespace: destination.workload.namespace | "unknown"
      destination_principal: destination.principal | "unknown"
      destination_app: destination.labels["app"] | "unknown"
      destination_version: destination.labels["version"] | "unknown"
      destination_service: destination.service.host | "unknown"
      destination_service_name: destination.service.name | "unknown"
      destination_service_namespace: destination.service.namespace | "unknown"
      request_protocol: api.protocol | context.protocol | "unknown"
      response_code: response.code | 200
      response_flags: context.proxy_error_code | "-"
      permissive_response_code: rbac.permissive.response_code | "none"
      permissive_response_policyid: rbac.permissive.effective_policy_id | "none"
      connection_security_policy: conditional((context.reporter.kind | "inbound")
           == "outbound", "unknown", conditional(connection.mtls | false,
           "mutual_tls", "none"))
    monitored_resource_type: '"UNSPECIFIED"'
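
Building on Example 10-5, here is a minimal sketch of a rule that forwards log entries to the Fluentd handler from Example 10-4 only for non-200 responses; the rule name is illustrative, and the exact match expression will depend on which attributes your deployment reports:

apiVersion: "config.istio.io/v1alpha2"
kind: rule
metadata:
  name: istiologerrors
  namespace: istio-system
spec:
  # Dispatch to the handler only when the response code is not 200
  match: context.protocol == "http" && response.code != 200
  actions:
   - handler: handler.fluentd
     instances:
     - istiolog.logentry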

Tip

Istio has tentatively started adding support for log sampling (and has experimented with things like errors-only logging), but it hasn’t turned that work into any firm recommendation yet. Expect this to evolve in a future release.

Metrics

Top-line service performance is easily gleaned from metrics visualized through charting and dashboarding tools. Grafana is a popular, open source metrics visualization tool used to query, analyze, and alert on metrics. Grafana is not deployed as a Mixer adapter, but it is included in the default Istio deployment as an add-on and is configured to read metrics from Prometheus. Prometheus is a time-series database and collection toolkit. The Istio deployment of Grafana comes with predefined dashboards, and the metrics shown in them depend on Prometheus running in the environment. Here are some of the views that the packaged dashboards include:

Mesh Summary View

Provides global summary view of the service mesh and shows HTTP/gRPC and TCP workloads.

Individual Services View

Provides metrics about requests and responses for each individual service within the mesh (HTTP/gRPC and TCP). Also, gives metrics about client and service workloads for this service.

Individual Workloads View

Provides metrics about requests and responses for each individual workload within the mesh (HTTP/gRPC and TCP). Also, gives metrics about inbound workloads and outbound services for this workload.
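
To view these dashboards locally, one approach is to port-forward the Grafana service; this sketch assumes the add-on is installed in the istio-system namespace with the default service name grafana:

$ kubectl -n istio-system port-forward svc/grafana 3000:3000
# then browse to http://localhost:3000, where the Istio dashboards are preloaded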

Visualization

As one of the more insightful telemetric capabilities (a little like taking the blinders off), topology visualization is key to understanding your deployment. Formerly, Istio had a rudimentary solution for this need called ServiceGraph, which exposed the following endpoints:

  • /force/forcegraph.html, an interactive D3.js visualization

  • /dotviz, a static Graphviz visualization

  • /dotgraph, a DOT serialization

  • /d3graph, a JSON serialization for D3 visualization

  • /graph, a generic JSON serialization

ServiceGraph has been replaced by Kiali, which is installed as an add-on and provides a web-based GUI for viewing service graphs of the mesh and your Istio configuration objects. Focused on real-time traffic flow, Vistio is another application that helps you visualize the traffic of your cluster from Prometheus data.

Service meshes are uniquely positioned as a foundational component in the buildout of an observable system. Data-plane proxies sit in the request path and can observe important qualities of the system and report on them. There’s a cost to telemetry, so weigh the trade-off. Projects like Kiali and Vistio help visualize configuration and traffic flows through Istio.
