Chapter 11. Debugging Istio

Like any other software, operating Istio can mean troubleshooting and debugging from time to time. Istio and other open source tooling provide logging, introspection, and debugging facilities to assist in component management.

Introspecting Istio Components

Istio components have been designed to incorporate a common introspection package called ControlZ (ctrlz). ControlZ is a flexible introspection framework that makes it easy to inspect and manipulate the internal state of a running Istio component. ControlZ offers an administrative UI: each component opens a port (9876 by default) that can be reached from a web browser, or via REST for access and control from external tools. The simple UI of the ControlZ introspection framework gives an interactive view into the state of the Istio component.

Mixer, Pilot, Citadel, and Galley are built with the ctrlz package included, whereas gateways are not. Gateways do not implement the ControlZ administrative UI, because these are Envoy instances that instead implement Envoy’s administrative console. When the Mixer, Pilot, Citadel, and Galley components start, a message is logged indicating the IP address and port to connect to in order to interact with ControlZ. ControlZ is designed around the idea of “topics.” A topic corresponds to a different section of the UI. There is a set of built-in topics representing the core introspection functionality, and each control-plane component that uses ControlZ can add new topics specialized for its purpose.

By default, ControlZ runs on port 9876. You can override this default port by using the --ctrlz_port and --ctrlz_address command-line options when starting a component to control the specific address and port where ControlZ will be exposed. To access the ControlZ interface of one of the control-plane components, use kubectl to port forward from your localhost to the remote ControlZ port.

$ kubectl port-forward -n istio-system istio-pilot-74cb7cd5f9-lbndc 9876:9876

Open your browser to http://localhost:9876 to remotely access ControlZ, as illustrated in Figure 11-1.

Figure 11-1. ControlZ introspection of Pilot
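
If you need to expose ControlZ on a different address or port, set the --ctrlz_port and --ctrlz_address flags on the component itself. The following is a minimal sketch of doing so by editing Pilot’s deployment; the container name, argument placement, and values shown here are illustrative and vary by Istio version:

$ kubectl -n istio-system edit deployment istio-pilot
# In the discovery container's args, add:
#       - --ctrlz_port=9999
#       - --ctrlz_address=0.0.0.0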

ControlZ implements Istio’s introspection facility. As components integrate with ControlZ, they automatically gain a port that allows operators to visualize and control a number of aspects of each process, including adjusting logging scopes, viewing command-line options, monitoring memory use, and more. Additionally, the port implements a REST API allowing access to and control over the same state.

The ControlZ administrative interface is primarily useful for enabling more detailed logs during troubleshooting. Other tools, like the management-plane Meshery, are commonly used as a more capable point of control for managing Istio’s life cycle and workloads on the mesh. Let’s walk through a management-plane example.

Troubleshooting with a Management Plane

Recall from Chapter 1 that the management plane resides a level above the control plane and operates across multiple homogeneous and heterogeneous service mesh clusters. Among other uses, a management plane can perform workload and mesh configuration validation—whether in preparation for onboarding a workload onto the mesh or in continuously vetting their configuration as you update to new versions of components running your control and data planes or new versions of your applications. Let’s walk through a vetting exercise to run a series of checks against your existing workload configuration (and your service mesh configuration). To do so, install and start Meshery by downloading its CLI management tool, mesheryctl, using the commands shown in Example 11-1.

Example 11-1. Steps to locally installing the Meshery management plane
$ sudo curl -L https://git.io/mesheryctl -o /usr/local/bin/mesheryctl
$ sudo chmod a+x /usr/local/bin/mesheryctl
$ mesheryctl start

For ongoing validation of your configuration, install Meshery in your cluster(s). If the Meshery UI doesn’t load automatically, open it at http://localhost:9081. Depending on your environment, Meshery will automatically connect your cluster(s), analyze your mesh and workload configuration, and highlight divergence from configuration best practices or suggest remediation to problem areas of your deployment, as depicted in Figure 11-2.

Figure 11-2. Meshery identifying orphaned service mesh configurations

Parlaying with kubectl

As noted in the context of troubleshooting Pilot in Chapter 7, Meshery and istioctl are powerful tools for assessing the synchronization of and coordination between Istio’s control-plane components and its data-plane service proxies. In Kubernetes environments, Meshery and istioctl both use kubectl exec under the hood. An invocation of kubectl exec travels over one of two HTTP streaming protocols from your local kubectl CLI to the Kubernetes API server and on to the kubelet local to the node running the service proxy being interrogated by Meshery or istioctl.

The specific mechanics of these two HTTP streaming protocols depend on which version of Kubernetes and which container runtime you’re using; the Kubernetes API server supports the SPDY protocol (now deprecated) and WebSockets. (If WebSockets are unfamiliar, simply think of them as a protocol that upgrades HTTP into a bidirectional byte-streaming protocol.) On top of that stream, the Kubernetes API server introduces an additional multiplexed streaming protocol. The reason for this is that, for many use cases, it is quite useful for the API server to be able to service multiple independent byte streams. Consider, for example, executing a command within a container. In this case, there are actually three streams that need to be maintained: stdin, stderr, and stdout.

When invoked, kubectl exec performs a sequence of actions. Initially, it issues an HTTP POST request to the Kubernetes API server at

/api/v1/namespaces/$NAMESPACE/pods/$NAME/exec

with a query string defining the command(s) to execute in which container and whether to establish multiplexed bidirectional streaming of stdin, stdout, and stderr:

?command=<command-syntax>&container=<name>&stderr=true&stdout=true

These query parameters indicate the command to run, the target container’s name, and which of the standard streams (stdin, stdout, stderr) should be enabled. Using these query string parameters, a WebSocket connection is established, and kube-apiserver starts streaming data between Meshery/istioctl and the respective kubelet, as shown in Figure 11-3.

Figure 11-3. How Meshery and istioctl parlay with kubectl to retrieve mesh configuration from service proxies
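
You can observe this request directly by raising kubectl’s log verbosity; at level 8, the client prints the HTTP requests it issues. This sketch reuses the Pilot pod name from earlier, and the exact output format varies by kubectl version:

$ kubectl -v=8 -n istio-system exec istio-pilot-74cb7cd5f9-lbndc -c discovery \
     -- ls 2>&1 | grep POST
# POST https://<api-server>/api/v1/namespaces/istio-system/pods/
#      istio-pilot-74cb7cd5f9-lbndc/exec?command=ls&container=discovery&stderr=true&stdout=true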

When you write to the WebSocket, the data will be passed to standard input (stdin) and on the receiving end of the WebSocket will be standard output (stdout) and error (stderr). The API defines a simple protocol to multiplex stdout and stderr over a single connection. Every message passed through the WebSocket is prefixed by a single byte that defines which stream the message belongs to (see Table 11-1).

Table 11-1. Basic streaming protocol channels used for kubectl exec/attach/logs/proxy
Channel  Purpose  Description
0        stdin    The stdin stream for writing to the process. Data is not read from this stream.
1        stdout   The stdout output stream for reading stdout from the process. Data should not be written to this stream.
2        stderr   The stderr output stream for reading stderr from the process. Data should not be written to this stream.

The kube-apiserver establishes a connection to the kubelet located on the node where the pod in question resides. From there, the kubelet generates a short-lived token and issues a redirect to the container runtime interface (CRI). The CRI handles the kubectl exec request and issues a docker exec API call. The basic protocol channels of stdin, stdout, and stderr are specified as input, output, and error to the docker exec API call, respectively. kubectl exec/attach/logs/proxy might require long-running connections to the kube-apiserver since any of these commands might need data streamed over time rather than an immediate, one-time response.
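
You can perform the same parlay by hand. As a rough illustration (assuming curl is available in the proxy image), exec into a workload’s sidecar and query Envoy’s local administrative endpoint directly:

$ kubectl exec -it <pod> -c istio-proxy -- curl -s localhost:15000/config_dump | head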

Workload Preparedness

Not only might Istio’s components be experiencing an issue, but so could your freshly meshed application. As your existing services are onboarded to the service mesh, you’ll need to confirm the compatibility of your application and Istio. There are some compatibility concerns to review, as follows.

Application Configuration

Avoid UID 1337. Ensure that your pods do not run applications as a user with the user ID (UID) value of 1337. The Istio service proxy uses 1337 as its UID. You’ll need to avoid this conflict.
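
A quick way to spot a conflict is to inspect the runAsUser setting in your pod specifications. Here’s a hedged example using a hypothetical deployment name; any value of 1337 conflicts with the sidecar’s UID:

$ kubectl get deployment <your-deployment> -o \
     jsonpath='{.spec.template.spec.containers[*].securityContext.runAsUser}'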

Network Traffic and Ports

HTTP/1.1 or HTTP/2.0 is required. Applications must use either the HTTP/1.1 or HTTP/2.0 protocol for all of their HTTP traffic; HTTP/1.0 is not supported.

Service ports must be named. To use Istio’s traffic routing, ensure that each service port is named following this syntax: name: <protocol>[-<suffix>] (see the example Service manifest following this list). The value used for <protocol> should match one of the following types (strings):

  • grpc

  • http

  • http2

  • https

  • mongo

  • redis

  • tcp

  • tls

  • udp

By default, Istio treats traffic as TCP. If a port is unnamed, or its name doesn’t begin with one of these prefixes, traffic on that port is treated as plain TCP (unless the port explicitly uses udp to signify UDP traffic). Examples of valid port names are http2-myservice or http2; however, http2myservice is not valid.
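
Here is a minimal sketch of a Kubernetes service with correctly named ports; the service name, selector, and port numbers are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  selector:
    app: myservice
  ports:
  - name: http-web     # HTTP traffic, routable by Istio
    port: 80
    targetPort: 8080
  - name: grpc-api     # gRPC traffic
    port: 9000
    targetPort: 9000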

Tip

This behavior is akin to that of Kubernetes, in that Kubernetes services use TCP as the default protocol, but you can also use any other supported protocol (TCP, UDP, HTTPS, proxy, SCTP). Because many services need to expose more than one port, Kubernetes supports multiple port definitions on a service object. Each port definition can have the same or a different protocol. However, Kubernetes doesn’t mandate that two different services referring to the same pod port have the same protocol defined.

Pods must include an explicit list of the ports on which each container listens. Use a containerPort configuration in the container specification for each port. Any unlisted ports bypass the Istio proxy.
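
For example, a container specification that explicitly lists its listening ports might look like the following (names and port numbers are illustrative):

  containers:
  - name: myservice
    image: example/myservice:1.0
    ports:
    - containerPort: 8080
    - containerPort: 9000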

Services and Deployments

Associate all pods with at least one service. Whether it exposes a port or not, every pod must belong to at least one Kubernetes service. For pods belonging to multiple Kubernetes services, ensure that each service defines the same protocol when referencing the same port number (you cannot, for instance, expose the same port as HTTP in one service and as TCP in another).

Meaningful telemetry with Kubernetes labels. We recommend adding explicit app and version labels to deployments. Add the labels to the deployment specification of pods deployed using the Kubernetes Deployment. The app and version labels add contextual information to the telemetry Istio collects:

An app label

Each deployment specification should have a distinct app label with a meaningful value.

A version label

Indicates the version of the application corresponding to the particular deployment.

These labels are helpful as context propagated in distributed traces.
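
A sketch of a Deployment’s pod template carrying both labels (the values shown are illustrative):

  template:
    metadata:
      labels:
        app: reviews
        version: v2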

Pods

Pod configuration must allow the NET_ADMIN capability. If your cluster enforces pod security policies, pods must allow the NET_ADMIN capability. As described in Chapter 5, when injecting the service proxy sidecar, Istio uses an init container to manipulate the iptables rules of the pod in order to intercept requests to application containers. Although the service proxy sidecar does not require root to run, the short-lived init container does require cap_net_admin privileges to install iptables rules in each pod just prior to starting the pod’s primary containers, bringing the pod onboard the service mesh.

Manipulating iptables rules is an action that requires elevated access through the NET_ADMIN capability in the Linux kernel. Pods that have this kernel-level capability enabled can manipulate the networking configuration of other pods as well as the networking configuration of their host node. Most Kubernetes operators, or at least those operating shared-tenant clusters, avoid granting tenant pods this capability.

If you use the Istio CNI plug-in, the NET_ADMIN capability requirement does not apply, because the CNI plug-in (a privileged pod) performs administrative functions on behalf of the istio-init container. The following checks whether pod security policies are enabled in your cluster:

$ kubectl get psp
No resources found.

This example shows a cluster that has no pod security policies defined. If your cluster does have pod security policies defined, look for NET_ADMIN or * in the list of capabilities of the allowed policies for your given service account. Unless a Kubernetes service account is specified in your pods’ deployment, the pods run as the default service account in the namespace in which the pods are deployed. To check which capabilities are allowed for the service account of your pods, run the following command:

$ for psp in $(kubectl get psp -o jsonpath="{range .items[*]}{@.metadata.name}{'\n'}{end}"); \
     do if [ $(kubectl auth can-i use psp/$psp --as=system:serviceaccount:<your namespace>:<your service account>) = yes ]; \
     then kubectl get psp/$psp --no-headers -o=custom-columns=NAME:.metadata.name,CAPS:.spec.allowedCapabilities; \
     fi; done

For more on the NET_ADMIN capability, see Istio’s Required Pod Capabilities page.

Istio Installation, Upgrade, and Uninstall

As with any software that offers multiple installation mechanisms and configurable options, there are many choices to be made when considering an Istio installation. More than that, over time you’ll need to upgrade your deployments, and at some point you’ll need to uninstall, given that your service mesh deployment follows the life cycle of the application(s) it serves.

Installation

Simply rerun your deployment script. Initial installation challenges are sometimes overcome by reapplying your installation YAML file to your Kubernetes cluster (rerunning your installation commands). Whether due to network traffic loss or to scheduled resources (e.g., CRDs) not yet having been fully instantiated by kube-apiserver, sometimes you’ll find that not all Istio components instantiated successfully the first time you applied them to the cluster.

In that event, you might see a message like this:

unable to recognize "install/kubernetes/istio-demo-auth.yaml": no matches for kind

You will need to ensure that CRDs have been applied by running the following command:

$ kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
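
After applying the CRDs, you can verify that they registered with the API server; the expected count varies by Istio version and installation profile:

$ kubectl get crds | grep 'istio.io' | wc -l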

Upgrade

Istio upgrades can follow various paths. Let’s look at two of those paths: Helm with Tiller, and without Tiller.

Helm with Tiller

If your installation was performed using helm install (with Tiller), like this:

$ helm install install/kubernetes/helm/istio --name istio
       --namespace istio-system

then you might choose to use helm upgrade to upgrade your Istio deployment, like so:

$ helm upgrade istio install/kubernetes/helm/istio
       --namespace istio-system

Helm without Tiller

If you installed Istio using a Helm template (without Tiller), understand that the helm upgrade command works only when Tiller is installed. From here, you have two choices. You can install Tiller and then use the helm upgrade command, or you can use the same Helm template process you used to install Istio to upgrade it, like so:

$ helm template install/kubernetes/helm/istio --name istio --namespace
      istio-system > istio.yaml
$ kubectl apply -f istio.yaml

The Helm template install (upgrade) process takes advantage of the Kubernetes rolling-update process and will upgrade all deployments and configmaps to the new version. Using this approach, you can roll back if needed by applying the YAML files from the old version.

Uninstallation

Istio might not uninstall cleanly, leaving residual artifacts of its presence. You might find the following artifacts deposited.

Residual CRDs

If you have uninstalled Istio, but its CRDs remain, you can iteratively delete each individual CRD, like so:

$ for i in install/kubernetes/helm/istio-init/files/crd*yaml;
     do kubectl delete -f $i; done

Depending on your installation profile, your deployment will have a different number of Istio CRDs. Here’s how to verify removal of Istio’s CRDs:

$ kubectl get crds | grep istio

When all CRDs have been successfully removed, the result set should be empty.

Troubleshooting Mixer

Following is the command to enable debug logging for Mixer:

$ kubectl edit deployment -n istio-system istio-telemetry   # or istio-policy
# Add to the args list:
        - --log_output_level=debug

You can access Mixer logs via a kubectl logs command, as follows:

For the istio-policy service:

$ kubectl -n istio-system logs $(kubectl -n istio-system get pods -lapp=policy \
     -o jsonpath='{.items[0].metadata.name}') -c mixer

For the istio-telemetry service:

$ kubectl -n istio-system logs $(kubectl -n istio-system get pods \
     -lapp=telemetry -o jsonpath='{.items[0].metadata.name}') -c mixer

Using ControlZ (ctrlz)

Alternatively, you can turn on Mixer debugging using ControlZ on port 9876. To do so, port forward to ControlZ:

$ kubectl --namespace istio-system port-forward istio-[policy/telemetry]-<pod#>
     9876:9876

Open your browser to http://localhost:9876.

Troubleshooting Pilot

You can query Pilot’s registration API to retrieve the overall mesh configuration and endpoint information (a list of hosts and IP addresses). The response is a large JSON document:

$ kubectl run -i --rm --restart=never dummy --image=tutum/curl:alpine \
       -n istio-system --command \
       -- curl -v 'http://istio-pilot.istio-system:8080/v1/registration'

This book’s GitHub repository contains example output from Pilot’s registration API.

To gather Pilot’s logs, run the following:

$ kubectl logs -n istio-system -listio=pilot -c discovery

See this book’s GitHub repository for example logs from Pilot’s discovery container.

Ensure that the pilot-discovery service is running:

$ kubectl -n istio-system exec -it istio-pilot-644ff8f78d-p757j -c discovery -- sh
# ps -ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ssl   72:49 /usr/local/bin/pilot-discovery discovery...
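
Pilot also serves additional debug endpoints on the same port; /debug/syncz, for example, reports each connected proxy’s configuration sync status. The following is a hedged example using the same throwaway curl pod approach (endpoint paths can vary across Istio versions):

$ kubectl run -i --rm --restart=never dummy --image=tutum/curl:alpine \
       -n istio-system --command \
       -- curl -s 'http://istio-pilot.istio-system:8080/debug/syncz'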

Debugging Galley

As of Istio v1.1, Galley’s two core areas of responsibility are the syntactic and semantic validation of user-authored Istio configuration and acting as the primary configuration registry. Galley uses the Mesh Configuration Protocol (MCP) to interact with other components.

As the primary configuration ingestion and distribution mechanism within Istio, Galley needs to validate user-provided configuration and does so using a Kubernetes validating admission webhook, as shown in Figure 11-4.

$ kubectl get validatingwebhookconfigurations
NAME                                  CREATED AT
istio-galley                          2019-06-11T15:33:21Z
Figure 11-4. Galley’s two validating webhooks: pilot.validation.istio.io and mixer.validation.istio.io

The istio-galley service has two webhooks served on port 443:

/admitpilot

Responsible for validating configurations consumed by Pilot (e.g., VirtualService, Authentication)

/admitmixer

Responsible for validating configurations consumed by Mixer

Both webhooks are scoped to all namespaces; as such, the namespaceSelector for each of pilot.validation.istio.io and mixer.validation.istio.io should be an empty set:

$ kubectl get validatingwebhookconfiguration istio-galley -o yaml

See this book’s GitHub repository for example output of a healthy istio-galley validating Webhook configuration with empty namespace set.

No such hosts for service “istio-galley.” If you’re unable to create or update configuration, you might find that Galley is not functioning correctly. Standard troubleshooting procedure applies to Galley; first, verify the status of its pod:

$ kubectl -n istio-system get pod -listio=galley
NAME                            READY   STATUS    RESTARTS   AGE
istio-galley-74c6547b94-4vw58   1/1     Running   0          14h

No endpoints available for service “istio-galley.” Next, perform a similar verification of its endpoints:

$ kubectl -n istio-system get endpoints istio-galley
NAME           ENDPOINTS                                         AGE
istio-galley   10.32.0.17:15014,10.32.0.17:443,10.32.0.17:9901   14h

If the pods or endpoints aren’t ready, check the pod logs and status for any indication about why the webhook pod is failing to start and serve traffic:

$ kubectl logs -n istio-system istio-galley-755f8df6cb-zq4p8

See this book’s GitHub repository for example output of an istio-galley pod.

Debugging Envoy

Networking is difficult. Layers of abstraction and indirection make debugging network issues even more difficult.

Envoy’s Administrative Console

Envoy’s administrative interface runs on port 15000 (see Chapter 4 for an exhaustive list of ports). You can access Envoy’s administrative interface by using kubectl to port-forward from your local machine to any given pod sidecarred with Envoy:

$ kubectl port-forward <pod> 15000:15000 &

To stop a port-forwarding job running in the background, run kill %1 (assuming you have no other background jobs running). For a list of active jobs, run jobs.

To see a YAML-formatted printout of a given Envoy’s configuration, run the following:

$ curl http://localhost:15000/config_dump | yq r -

Or, bring up the full administrative console in your browser at http://localhost:15000. Reference http://localhost:15000/help for a description of administrative actions available. See Envoy’s operations and admin docs for a full description of all its administrative functions.
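
Beyond config_dump, a few other admin endpoints are frequently useful while debugging; for example:

$ curl -s http://localhost:15000/clusters     # upstream clusters and endpoint health
$ curl -s http://localhost:15000/listeners    # active listeners
$ curl -s http://localhost:15000/stats | grep upstream_rq    # per-cluster request counters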

503 or 404 Requests

If you’re seeing either of these two error codes when trying to access your service, know that although there are different causes, the most common culprits are:

  • There’s a disconnect between Envoy and Pilot. Here are the steps to remediate:

    1. Confirm Pilot is running. See “Troubleshooting Pilot”.

    2. Use istioctl proxy-status to confirm communication status between Pilot and Envoy (see the example following this list). During normal operation, each xDS will show a status of SYNCED.

  • There is a missing or incorrect network protocol in your Kubernetes service manifest. To remediate, configure your service manifest with the appropriate name for your service’s exposed ports. For a list of protocol names, see “Workload Preparedness”.

  • The Envoy configuration is capturing the route in the wrong upstream cluster. The VirtualService configuration is incorrect. Here are the steps to remediate:

    1. Identify an edge service (adjacent to an ingress gateway).

    2. Inspect its Envoy logs or use a tool such as Jaeger to confirm where failures are occurring.
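
As a hedged illustration of working through the first two culprits, confirm proxy synchronization and then inspect the routes Envoy holds for the affected workload (the pod name here is hypothetical):

$ istioctl proxy-status
$ istioctl proxy-config routes productpage-v1-6c886ff494-lvcr8 -o json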

Sidecar Injection

If you’re having difficulty with sidecar injection, there may be a few different issues in your environment. Three factors could account for whether a sidecar is injected:

  • The webhook namespaceSelector

  • The default policy

  • The per-pod override annotation

Verify that the following items are true:

Your pods don’t run applications with UID 1337

Application containers should not run with UID 1337, as this is the UID used by the service proxy, Envoy. Currently, the Envoy sidecar proxy must run as the istio-proxy user with UID 1337; this is not a configurable deployment option.

Your admission controller is enabled

If you see an error like the following:

error: unable to recognize "istio.yaml": no matches for
      admissionregistration.k8s.io/, Kind=MutatingWebhookConfiguration

you are likely running Kubernetes 1.9 or earlier, which might not support mutating admission webhooks or might not have them enabled; this is the reason for the error.
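
A quick check of whether your cluster exposes the required API group (the exact versions listed vary by cluster):

$ kubectl api-versions | grep admissionregistration
# Expect to see admissionregistration.k8s.io/v1beta1; if nothing is returned,
# mutating admission webhooks are unavailable.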

The istio-injection label is present

As discussed in Chapter 5, for pods to benefit from automatic sidecar injection, their namespace must bear an istio-injection label. The injection webhook will be invoked only for pods created in namespaces with the istio-injection=enabled label. Verify that the affected pods are within a namespace bearing this label:

$ kubectl get namespace -L istio-injection
NAME              STATUS   AGE   ISTIO-INJECTION
default           Active   39d   enabled
istio-system      Active   13h
kube-node-lease   Active   39d
kube-public       Active   39d
kube-system       Active   39d

The scope of the webhook’s namespaceSelector is correct

The Webhook’s namespaceSelector determines whether the Webhook is scoped to opt-in or opt-out for the target namespace. The namespaceSelector for opt-in looks like this:

$ kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml
       | grep "namespaceSelector:" -A5

  namespaceSelector:
    matchLabels:
      istio-injection: enabled
  rules:
  - apiGroups:
    - ""

The namespaceSelector for opt-out looks like this:

 namespaceSelector:
    matchExpressions:
    - key: istio-injection
      operator: NotIn
      values:
      - disabled
  rules:
  - apiGroups:
    - ""

Debugging Citadel

As is the case when troubleshooting other problems, consulting Citadel’s logs and events can offer insight into any issues it might be having:

$ kubectl logs -l istio=citadel -n istio-system
$ kubectl describe pod -l istio=citadel -n istio-system

It could be the case that the istio-citadel pod isn’t running, so verify its status:

$ kubectl get pod -l istio=citadel -n istio-system
NAME                             READY   STATUS    RESTARTS   AGE
istio-citadel-678b7c5cd4-ndn4n   1/1     Running   0          13h

Redeploy the istio-citadel pod if it isn’t in a Running state.
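
Assuming the default secret-mount-based identity distribution (rather than SDS), Citadel provisions a secret of type istio.io/key-and-cert, named istio.<service-account>, for each service account. A quick, hedged check that issuance is working in a given namespace:

$ kubectl get secrets -n default | grep "istio\.io/key-and-cert"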

Version Compatibility

You can’t mix versions of Istio components, istioctl, and the Bookinfo sample application. For example, don’t simultaneously run Istio v1.1, istioctl v1.0, and Bookinfo v1.2. The same is generally true for patch versions, so, for example, avoid running Istio v1.1.4 while using istioctl v1.1.3. To confirm the version of each Istio control-plane component you’re running, run the following:

$ istioctl version --remote -o yaml

Alternatively, you can take a look at the image tag on one of the control-plane components. Using Pilot as an example, you can execute this:

$ kubectl get deployment istio-pilot -o yaml -n istio-system | grep image: \
      | cut -d ':' -f3 | head -1

These debugging tools and their example use should serve you well; however, this is far from a complete list. Other failure and troubleshooting scenarios exist, and, fortunately, so too do other tools. With the proliferation of service meshes, the landscape has seen a number of helpful utilities and management tools emerge. We anticipate that the growth of management-plane software will continue.
