In this chapter, we’ll take it to the next level and consider options for running Kubernetes and deploying workloads on multiple clouds and multiple clusters. Since a single Kubernetes cluster has limits, once you exceed these limits you must run multiple clusters. A typical Kubernetes cluster is a closely-knit unit where all the components run in relative proximity and are connected by a fast network (typically, a physical data center or cloud provider availability zone). This is great for many use cases, but there are several important use cases where systems need to scale beyond a single cluster or a cluster needs to be stretched across multiple availability zones.
This is a very active area in Kubernetes these days. In the previous edition of the book, this chapter covered Kubernetes Federation and Gardener. Since then, the Kubernetes Federation project was abandoned. There are now many projects that provide different flavors of multi-cluster solutions, such as direct management, Virtual Kubelet solutions, and the gardener.cloud project, which is pretty unique.
The topics we will cover include the following:

- Stretched Kubernetes clusters versus multi-cluster Kubernetes
- The history of Kubernetes Cluster Federation
- The Cluster API
- Karmada, Clusternet, and Clusterpedia
- Open Cluster Management
- Virtual Kubelet-based projects: tensile-kube, Admiralty, and Liqo
- The Gardener project
There are several reasons to run multiple Kubernetes clusters: high availability across geographical locations, exceeding the scale limits of a single cluster, and isolation between tenants or environments. For the first reason, it is possible to use a stretched cluster; for the other reasons, you must run multiple clusters.
A stretched cluster (AKA a wide cluster) is a single Kubernetes cluster where the control plane nodes and the worker nodes are provisioned across multiple geographical availability zones or regions. Cloud providers offer this model for highly available managed Kubernetes clusters.
There are several benefits to the stretched cluster model: you keep managing a single cluster while gaining resilience to zone failures. However, the stretched model has its downsides too: etcd and other control plane components are sensitive to the higher network latency between zones, and cross-zone traffic adds cost.
In short, it’s good to have the option for stretched clusters, but be prepared to switch to the multi-cluster model if some of the downsides are unacceptable.
Multi-cluster Kubernetes means provisioning multiple Kubernetes clusters. Large-scale systems often can’t be deployed on a single cluster for various reasons mentioned earlier. That means you need to provision multiple Kubernetes clusters and then figure out how to deploy your workloads on all these clusters and how to handle various use cases, such as some clusters being unavailable or having degraded performance. There are many more degrees of freedom.
The multi-cluster model has its own benefits: better fault isolation, practically unlimited scale, and the freedom to mix clouds, regions, and environments. However, there are some non-trivial downsides to the multi-cluster model too: you now need to solve cluster provisioning, workload placement, networking, and identity across many clusters.
There are solutions out there for some of these problems, but at this point in time, there is no clear winner you can just adopt and easily configure for your needs. Instead, you will need to adapt existing solutions to the specific issues your organization's multi-cluster structure raises.
In the previous editions of the book, we discussed Kubernetes Cluster Federation as a solution to managing multiple Kubernetes clusters as a single conceptual cluster. Unfortunately, this project has been inactive since 2019, and the Kubernetes multi-cluster Special Interest Group (SIG) is considering archiving it. Before we describe more modern approaches, let’s get some historical context. It’s funny to talk about the history of a project like Kubernetes that didn’t even exist before 2014, but the pace of development and the large number of contributors took Kubernetes through an accelerated evolution. This is especially relevant for Kubernetes Federation.
In March 2015, the first revision of the Kubernetes Cluster Federation proposal was published. It was fondly nicknamed “Ubernetes” back then. The basic idea was to reuse the existing Kubernetes APIs to manage multiple clusters. This proposal, now called Federation V1, went through several rounds of revision and implementation but never reached general availability, and the main repo has been retired: https://github.com/kubernetes-retired/federation.
The SIG multi-cluster workgroup realized that the multi-cluster problem is more complicated than initially perceived. There are many ways to skin this particular cat and there is no one-size-fits-all solution. The new direction for cluster federation was to use dedicated APIs for federation. A new project and a set of tools were created and implemented as Kubernetes Federation V2: https://github.com/kubernetes-sigs/kubefed.
Unfortunately, this didn’t take off either, and the consensus of the multi-cluster SIG is that since the project is not being maintained, it needs to be archived.
See the notes for the meeting from 2022-08-09: https://tinyurl.com/sig-multicluster-notes.
There are a lot of projects out there moving fast to try to solve the multi-cluster problem, and they all operate at different levels. Let’s look at some of the prominent ones. The goal here is just to introduce these projects and what makes them unique. It is beyond the scope of this chapter to fully explore each one. However, we will dive deeper into one of the projects – the Cluster API – in Chapter 17, Running Kubernetes in Production.
The Cluster API (AKA CAPI) is a project from the Cluster Lifecycle SIG. Its goal is to make provisioning, upgrading, and operating multiple Kubernetes clusters easy. It supports both kubeadm-based clusters as well as managed clusters via dedicated providers. It has a cool logo inspired by the famous “It’s turtles all the way down” story. The idea is that the Cluster API uses Kubernetes to manage Kubernetes clusters.
Figure 11.1: The Cluster API logo
The Cluster API has a very clean and extensible architecture. The primary components are:

- The management cluster
- Work clusters
- The bootstrap provider
- The infrastructure provider
- The control plane
- Custom resources (CRDs)
Figure 11.2: Cluster API architecture
Let’s understand the role of each one of these components and how they interact with each other.
The management cluster is a Kubernetes cluster that is responsible for managing other Kubernetes clusters (work clusters). It runs the Cluster API control plane and providers, and it hosts the Cluster API custom resources that represent the other clusters.
The clusterctl command-line tool is used to work with the management cluster. It has a lot of commands and options; if you want to experiment with the Cluster API through its CLI, visit https://cluster-api.sigs.k8s.io/clusterctl/overview.html.
A work cluster is just a regular Kubernetes cluster. These are the clusters that developers use to deploy their workloads. The work clusters don’t need to be aware that they are managed by the Cluster API.
When CAPI creates a new Kubernetes cluster, it needs to generate certificates, initialize the work cluster’s control plane, and, finally, join the worker nodes. This is the job of the bootstrap provider. It ensures all the requirements are met and eventually joins the worker nodes to the control plane.
The infrastructure provider is a pluggable component that allows CAPI to work in different infrastructure environments, such as cloud providers or bare-metal infrastructure providers. The infrastructure provider implements a set of interfaces as defined by CAPI to provide access to compute and network resources.
Check out the current providers’ list here: https://cluster-api.sigs.k8s.io/reference/providers.html.
The control plane of a Kubernetes cluster consists of the API server, the etcd state store, the scheduler, and the controllers that run the control loops to reconcile the resources in the cluster. The control plane of the work clusters can be provisioned in various ways. CAPI supports the following modes:

- Machine-based, where the control plane components run on dedicated machines provisioned by the infrastructure provider
- Pod-based, where the control plane components are deployed in the management cluster as Deployments and StatefulSets, and the API server is exposed as a Service
- External, where the control plane is provisioned and managed by an external provider
The custom resources represent the Kubernetes clusters and machines managed by CAPI as well as additional auxiliary resources. There are a lot of custom resources, and some of them are still considered experimental. The primary CRDs are:

- Cluster
- ControlPlane (represents control plane machines)
- MachineSet (represents worker machines)
- MachineDeployment
- Machine
- MachineHealthCheck
Some of these generic resources have references to corresponding resources offered by the infrastructure provider.
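To make the generic-to-provider reference concrete, here is a hedged sketch of a Cluster resource, loosely based on the CAPI quick start. The names and the Docker provider kinds are illustrative assumptions, and exact API versions vary between CAPI releases:

```yaml
# Illustrative sketch only: resource names and the Docker provider kinds
# are assumptions; check the CAPI release you are running.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:        # generic resource -> control plane provider resource
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
  infrastructureRef:      # generic resource -> infrastructure provider resource
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: my-cluster
```

Note how the generic Cluster resource stays provider-agnostic and delegates everything infrastructure-specific to the referenced resources.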
The following diagram illustrates the relationships between the control plane resources that represent the clusters and machine sets:
Figure 11.3: Cluster API control plane resources
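For example, a MachineDeployment that manages a set of worker machines might look roughly like this. This is a sketch modeled on the CAPI quick start; the template names, labels, and Kubernetes version are assumptions:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
spec:
  clusterName: my-cluster
  replicas: 3             # CAPI creates a MachineSet that maintains 3 Machines
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: my-cluster
    spec:
      clusterName: my-cluster
      version: v1.27.3    # desired Kubernetes version for the workers
      bootstrap:
        configRef:        # bootstrap provider resource
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
      infrastructureRef:  # infrastructure provider resource
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: my-cluster-md-0
```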
CAPI also has an additional set of experimental resources that represent a managed cloud provider environment:

- MachinePool
- ClusterResourceSet
- ClusterClass
See https://github.com/kubernetes-sigs/cluster-api for more details.
Karmada is a CNCF sandbox project that focuses on deploying and running workloads across multiple Kubernetes clusters. Its claim to fame is that you don’t need to make changes to your application configuration. While CAPI was focused on the lifecycle management of clusters, Karmada picks up when you already have a set of Kubernetes clusters and you want to deploy workloads across all of them. Conceptually, Karmada is a modern take on the abandoned Kubernetes Federation project.
It can work with Kubernetes in the cloud, on-prem, and on the edge.
See https://github.com/karmada-io/karmada.
Let’s look at Karmada’s architecture.
Karmada is heavily inspired by Kubernetes. It provides a multi-cluster control plane with components similar to those of the Kubernetes control plane: an API server, a controller manager, a scheduler, and an etcd store. If you understand how Kubernetes works, then it is pretty easy to understand how Karmada extends the same model to multiple clusters.
The following diagram illustrates the Karmada architecture:
Figure 11.4: Karmada architecture
Karmada is centered around several concepts implemented as Kubernetes CRDs. You define and update your applications and services using these concepts and Karmada ensures that your workloads are deployed and run in the right place across your multi-cluster system.
Let’s look at these concepts.
The resource template looks just like a regular Kubernetes resource such as a Deployment or StatefulSet, but it doesn’t actually get deployed to the Karmada control plane. It only serves as a blueprint that will eventually be deployed to member clusters.
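For example, the nginx Deployment referenced by the propagation policy that follows is just a standard Kubernetes Deployment; the labels and image tag here are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25  # nothing Karmada-specific; a plain Deployment
```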
The propagation policy determines where a resource template should be deployed. Here is a simple propagation policy that will place the nginx Deployment into two clusters, called member1 and member2:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: cool-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
Propagation policies operate across multiple clusters, but sometimes, there are exceptions. The override policy lets you apply fine-grained rules to override existing propagation policies. There are several types of rules:
- ImageOverrider: dedicated to overriding images for workloads
- CommandOverrider: dedicated to overriding commands for workloads
- ArgsOverrider: dedicated to overriding args for workloads
- PlaintextOverrider: a general-purpose tool to override any kind of resource

There is much more to Karmada than we can cover here.
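As a hedged sketch (the cluster and registry names are made up; the field names follow the Karmada docs), an override policy that swaps the image registry for a single member cluster could look like this:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-override
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  overrideRules:
    - targetCluster:
        clusterNames:
          - member1             # only applies in this cluster
      overriders:
        imageOverrider:
          - component: Registry # Registry, Repository, or Tag
            operator: replace
            value: registry.example.com
```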
Check the Karmada documentation for more details: https://karmada.io/docs/.
Clusternet is an interesting project. It is centered around the idea of managing multiple Kubernetes clusters as easily as “visiting the internet” (hence the name “Clusternet”). It supports cloud-based, on-prem, edge, and hybrid clusters. Its core features include multi-cluster management and governance, cross-cluster application coordination, a kubectl plugin, and programmatic access via a client-go wrapper.
The Clusternet architecture is similar to Karmada but simpler. There is a parent cluster that runs the Clusternet hub and Clusternet scheduler. On each child cluster, there is a Clusternet agent. The following diagram illustrates the structure and interactions between the components:
Figure 11.5: Clusternet architecture
The hub has multiple roles. It is responsible for approving cluster registration requests and creating namespaces, service accounts, and RBAC resources for all child clusters. It also serves as an aggregated API server that maintains WebSocket connections to the agent on child clusters. The hub also provides a Kubernetes-like API to proxy requests to each child cluster. Last but not least, the hub coordinates the deployment of applications and their dependencies to multiple clusters from a single set of resources.
The Clusternet scheduler is the component that is responsible for ensuring that resources (called feeds in Clusternet terminology) are deployed and balanced across all the child clusters according to policies called SchedulingStrategy.
The Clusternet agent runs on every child cluster and communicates with the hub. The agent on a child cluster is the equivalent of the kubelet on a node. It has several roles. The agent registers its child cluster with the parent cluster. The agent provides a heartbeat to the hub that includes a lot of information, such as the Kubernetes version, running platform, health, readiness, and liveness of workloads. The agent also sets up the WebSocket connection to the hub on the parent cluster to allow full-duplex communication channels over a single TCP connection.
Clusternet models multi-cluster deployment as subscriptions and feeds. It provides a Subscription custom resource that can be used to deploy a set of resources (called feeds) to multiple clusters (called subscribers) based on different criteria. Here is an example of a Subscription that deploys a Namespace, a Service, and a Deployment to all clusters that have a clusters.clusternet.io/cluster-id label:
# examples/dynamic-dividing-scheduling/subscription.yaml
apiVersion: apps.clusternet.io/v1alpha1
kind: Subscription
metadata:
  name: dynamic-dividing-scheduling-demo
  namespace: default
spec:
  subscribers: # filter out a set of desired clusters
    - clusterAffinity:
        matchExpressions:
          - key: clusters.clusternet.io/cluster-id
            operator: Exists
  schedulingStrategy: Dividing
  dividingScheduling:
    type: Dynamic
    dynamicDividing:
      strategy: Spread # currently we only support Spread dividing strategy
  feeds: # defines all the resources to be deployed with
    - apiVersion: v1
      kind: Namespace
      name: qux
    - apiVersion: v1
      kind: Service
      name: my-nginx-svc
      namespace: qux
    - apiVersion: apps/v1 # with a total of 6 replicas
      kind: Deployment
      name: my-nginx
      namespace: qux
See https://clusternet.io for more details.
Clusterpedia is a CNCF sandbox project. Its central metaphor is Wikipedia for Kubernetes clusters. It has a lot of capabilities around multi-cluster search, filtering, field selection, and sorting. This is unusual because it is a read-only project: it doesn’t help with managing clusters or deploying workloads. It is focused on observing your clusters.
The architecture is similar to other multi-cluster projects. There is a control plane element that runs the Clusterpedia API server and ClusterSynchro manager components. For each observed cluster, there is a dedicated ClusterSynchro component that synchronizes the state of that cluster into the storage layer of Clusterpedia. One of the most interesting aspects of the architecture is the Clusterpedia aggregated API server, which makes all your clusters seem like a single huge logical cluster. Note that the Clusterpedia API server and the ClusterSynchro manager are loosely coupled and don’t interact directly with each other. They just read and write from a shared storage layer.
Figure 11.6: Clusterpedia architecture
Let’s look at each of the components and understand what their purpose is.
The Clusterpedia API server is an aggregated API server. That means that it registers itself with the Kubernetes API server and, in practice, extends the standard Kubernetes API server via custom endpoints. When requests come to the Kubernetes API server, it forwards them to the Clusterpedia API server, which accesses the storage layer to satisfy them. The Kubernetes API server serves as a forwarding layer for the requests that Clusterpedia handles.
This is an advanced aspect of Kubernetes. We will discuss API server aggregation in Chapter 15, Extending Kubernetes.
Clusterpedia observes multiple clusters to provide its search, filter, and aggregation features. One way to implement this would be to query all the observed clusters whenever a request comes in, collect the results, and return them. This approach is very problematic, as some clusters might be slow to respond, and repeated requests would fetch the same information again and again, which is wasteful and costly. Instead, the ClusterSynchro manager continuously synchronizes the state of each observed cluster into Clusterpedia storage, where the Clusterpedia API server can respond quickly.
The storage layer is an abstraction layer that stores the state of all observed clusters. It provides a uniform interface that can be implemented by different storage components. The Clusterpedia API server and the ClusterSynchro manager interact with the storage layer interface and never talk to each other directly.
The storage component is an actual data store that implements the storage layer interface and stores the state of observed clusters. Clusterpedia was designed to support different storage components to provide flexibility for their users. Currently, supported storage components include MySQL, Postgres, and Redis.
To onboard clusters into Clusterpedia, you define a PediaCluster custom resource. It is pretty straightforward:
apiVersion: cluster.clusterpedia.io/v1alpha2
kind: PediaCluster
metadata:
  name: cluster-example
spec:
  apiserver: "https://10.30.43.43:6443"
  kubeconfig:
  caData:
  tokenData:
  certData:
  keyData:
  syncResources: []
You need to provide credentials to access the cluster, and then Clusterpedia will take over and sync its state.
This is where Clusterpedia shines. You can access the Clusterpedia cluster via an API or through kubectl. When accessing it through a URL, you hit the aggregated API server endpoint:
kubectl get --raw="/apis/clusterpedia.io/v1beta1/resources/apis/apps/v1/deployments?clusters=cluster-1,cluster-2"
You can specify the target clusters as a query parameter (in this case, cluster-1 and cluster-2).

When accessing through kubectl, you specify the target clusters as a label (in this case, "search.clusterpedia.io/clusters in (cluster-1,cluster-2)"):
kubectl --cluster clusterpedia get deployments -l "search.clusterpedia.io/clusters in (cluster-1,cluster-2)"
Other search labels and queries exist for namespaces and resource names:

- search.clusterpedia.io/namespaces (query parameter is namespaces)
- search.clusterpedia.io/names (query parameter is names)

There is also an experimental fuzzy search label, internalstorage.clusterpedia.io/fuzzy-name, for resource names, but no query parameter. This is useful because resources often have generated names with random suffixes.

You can also search by creation time:

- search.clusterpedia.io/before (query parameter is before)
- search.clusterpedia.io/since (query parameter is since)

Other capabilities include filtering by resource labels or field selectors as well as organizing the results using OrderBy and Paging.
Another important concept is resource collections. The standard Kubernetes API offers a straightforward REST API where you can list or get one kind of resource at a time. However, users often would like to get multiple types of resources at the same time – for example, the Deployment, Service, and HorizontalPodAutoscaler with a specific label. This requires multiple calls via the standard Kubernetes API, even if all these resources are available on one cluster.

Clusterpedia defines a CollectionResource that groups together resources that belong to the following categories:

- any (all resources)
- workloads (Deployments, StatefulSets, and DaemonSets)
- kuberesources (all resources other than workloads)

You can search for any combination of resources in one API call by passing API groups and resource kinds:
kubectl get --raw "/apis/clusterpedia.io/v1beta1/collectionresources/any?onlyMetadata=true&groups=apps&resources=batch/jobs,batch/cronjobs"
See https://github.com/clusterpedia-io/clusterpedia for more details.
Open Cluster Management (OCM) is a CNCF sandbox project for multi-cluster management, as well as multi-cluster scheduling and workload placement. Its claim to fame is closely following many Kubernetes concepts, extensibility via addons, and strong integration with other open source projects, such as Argo CD, KubeVela, and Submariner.
The scope of OCM covers cluster lifecycle, application lifecycle, and governance.
Let’s look at OCM’s architecture.
OCM’s architecture follows the hub and spokes model. It has a hub cluster, which is the OCM control plane that manages multiple other clusters (the spokes).
The control plane’s hub cluster runs two controllers: the registration controller and the placement controller. In addition, the control plane runs multiple management addons, which are the foundation for OCM’s extensibility. On each managed cluster, there is a so-called Klusterlet that has a registration-agent and work-agent that interact with the registration controller and placement controller on the hub cluster. Then, there are also addon agents that interact with the addons on the hub cluster.
The following diagram illustrates how the different components of OCM communicate:
Figure 11.7: OCM architecture
Let’s look at the different aspects of OCM.
Cluster registration is a big part of OCM’s secure multi-cluster story. OCM prides itself on its secure, double opt-in handshake registration. Since the hub cluster and the spoke clusters may have different administrators, this model protects each side from undesired requests. Either side can terminate the relationship at any time.
The following diagram demonstrates the registration process (CSR means certificate signing request):
Figure 11.8: OCM registration process
The OCM application lifecycle supports creating, updating, and deleting resources across multiple clusters.
The primary building block is the ManifestWork custom resource, which can define multiple resources. Here is an example that contains only a single Deployment:
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: <target managed cluster>
  name: awesome-workload
spec:
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: hello
          namespace: default
        spec:
          selector:
            matchLabels:
              app: hello
          template:
            metadata:
              labels:
                app: hello
            spec:
              containers:
                - name: hello
                  image: quay.io/asmacdo/busybox
                  command:
                    ["sh", "-c", 'echo "Hello, Kubernetes!" && sleep 3600']
The ManifestWork is created on the hub cluster and is deployed to the target cluster according to the namespace mapping: each target cluster has a namespace representing it in the hub cluster. A work agent running on the target cluster monitors all ManifestWork resources in its namespace on the hub cluster and syncs the changes.
OCM provides a governance model based on policies, policy templates, and policy controllers. The policies can be bound to a specific set of clusters for fine-grained control.
Here is a sample policy that requires the existence of a namespace called prod:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-namespace
  namespace: policies
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
spec:
  remediationAction: enforce
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: policy-namespace-example
        spec:
          remediationAction: inform
          severity: low
          object-templates:
            - complianceType: MustHave
              objectDefinition:
                kind: Namespace # must have namespace 'prod'
                apiVersion: v1
                metadata:
                  name: prod
See https://open-cluster-management.io/ for more details.
Virtual Kubelet is a fascinating project. It impersonates a kubelet to connect Kubernetes to other APIs such as AWS Fargate or Azure ACI. The Virtual Kubelet looks like just another node to the Kubernetes cluster, but the compute resources backing it are abstracted away:
Figure 11.9: Virtual Kubelet, which looks like a regular node to the Kubernetes cluster
The Virtual Kubelet’s features include a pluggable provider architecture and support for the standard kubelet operations: creating, updating, and deleting pods, as well as retrieving pod status, container logs, exec, and metrics.
See https://github.com/virtual-kubelet/virtual-kubelet for more details.
This concept can be used to connect multiple Kubernetes clusters too, and several projects follow this approach. Let’s look briefly at some projects that use Virtual Kubelet for multi-cluster management such as tensile-kube, Admiralty, and Liqo.
Tensile-kube is a sub-project of the Virtual Kubelet organization on GitHub.
Tensile-kube brings the following to the table: automatic discovery of the resources of lower clusters, syncing of pod dependencies such as ConfigMaps and Secrets, global scheduling with a custom scheduler, and a descheduler for rebalancing pods.
Tensile-kube uses the terminology of the upper cluster for the cluster that contains the Virtual Kubelets, and the lower clusters for the clusters that are exposed as virtual nodes in the upper cluster.
Here is the tensile-kube architecture:
Figure 11.10: Tensile-kube architecture
See https://github.com/virtual-kubelet/tensile-kube for more details.
Admiralty is an open source project backed by a commercial company. Admiralty takes the Virtual Kubelet concept and builds a sophisticated solution for multi-cluster orchestration and scheduling. Target clusters are represented as virtual nodes in the source cluster. It has a pretty complicated architecture that involves three levels of scheduling: proxy pods are created on the source cluster, candidate pods are created on each target cluster, and eventually, one of the candidate pods is selected and becomes a delegate pod, which is a real pod that actually runs its containers. This is all supported by custom multi-cluster schedulers built on top of the Kubernetes scheduling framework. To schedule workloads on Admiralty, you annotate a pod template with multicluster.admiralty.io/elect="" and Admiralty will take it from there.
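For example, a Job whose pods should be scheduled across clusters by Admiralty might carry the annotation on its pod template like this (a sketch; the job name, image, and command are arbitrary):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: global-hello
spec:
  template:
    metadata:
      annotations:
        multicluster.admiralty.io/elect: "" # opt this pod into multi-cluster scheduling
    spec:
      restartPolicy: Never
      containers:
        - name: hello
          image: busybox
          command: ["sh", "-c", "echo Hello from some cluster"]
```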
Here is a diagram that demonstrates the interplay between different components:
Figure 11.11: Admiralty architecture
Admiralty supports advanced multi-cluster use cases such as high availability across clusters and clouds, cloud bursting, and edge computing.
See https://admiralty.io for more details.
Liqo is an open source project based on the liquid computing concept. Let your tasks and data float around and find the best place to run. Its scope is very impressive, as it targets not only the compute aspect of running pods across multiple clusters but also provides network fabric and storage fabric. These aspects of connecting clusters and managing data across clusters are often harder problems to solve than just running workloads.
In Liqo’s terminology, the management cluster is called the home cluster and the target clusters are called foreign clusters. The virtual nodes in the home cluster are called “Big” nodes, and they represent the foreign clusters.
Liqo utilizes IP address mapping to achieve a flat IP address space across all foreign clusters that may have internal IP conflicts.
Liqo filters and batches events from the foreign clusters to reduce pressure on the home cluster.
Here is a diagram of the Liqo architecture:
Figure 11.12: Liqo architecture
See https://liqo.io for more details.
Let’s move on and take an in-depth look at the Gardener project, which takes a different approach.
The Gardener project is an open source project developed by SAP. It lets you manage thousands (yes, thousands!) of Kubernetes clusters efficiently and economically. Gardener solves a very complex problem, and the solution is elegant but not simple. Gardener is the only project that addresses both the cluster lifecycle and application lifecycle.
In this section, we will cover the terminology of Gardener and its conceptual model, dive deep into its architecture, and learn about its extensibility features. The primary theme of Gardener is to use Kubernetes to manage Kubernetes clusters. A good way to think about Gardener is Kubernetes-control-plane-as-a-service.
See https://gardener.cloud for more details.
The Gardener project, as you may have guessed, uses botanical terminology to describe the world. There is a garden, which is a Kubernetes cluster responsible for managing seed clusters. A seed is a Kubernetes cluster responsible for managing a set of shoot clusters. A shoot cluster is a Kubernetes cluster that runs actual workloads.
The cool idea behind Gardener is that the shoot clusters contain only the worker nodes. The control planes of all the shoot clusters run as Kubernetes pods and services in the seed cluster.
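Creating a new shoot cluster boils down to creating a Shoot resource in the garden cluster. As a rough, hedged sketch (the field names follow the Gardener docs, but the exact schema depends on the Gardener version and provider; the names and values here are invented):

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-shoot
  namespace: garden-team-a # shoots live in a project namespace of the garden cluster
spec:
  cloudProfileName: aws
  region: eu-west-1
  provider:
    type: aws
    workers:               # only worker pools; the control plane runs in a seed
      - name: worker-pool-1
        machine:
          type: m5.large
        minimum: 2
        maximum: 5
  kubernetes:
    version: "1.27"
```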
The following diagram describes in detail the structure of Gardener and the relationships between its components:
Figure 11.13: The Gardener project structure
Don’t panic! Underlying all this complexity is a crystal clear conceptual model.
The architecture diagram of Gardener can be overwhelming. Let’s unpack it slowly and surface the underlying principles. Gardener really embraces the spirit of Kubernetes and offloads a lot of the complexity of managing a large set of Kubernetes clusters to Kubernetes itself. At its heart, Gardener is an aggregated API server that manages a set of custom resources using various controllers. It embraces and takes full advantage of Kubernetes’ extensibility. This approach is common in the Kubernetes community. Define a set of custom resources and let Kubernetes manage them for you. The novelty of Gardener is that it takes this approach to the extreme and abstracts away parts of Kubernetes infrastructure itself.
In a “normal” Kubernetes cluster, the control plane runs in the same cluster as the worker nodes. Typically, in large clusters, control plane components like the Kubernetes API server and etcd run on dedicated nodes and don’t mix with the worker nodes. Gardener thinks in terms of many clusters: it takes the control planes of all the shoot clusters and manages them in a seed cluster. So the Kubernetes control plane of each shoot cluster is managed in the seed cluster as regular Kubernetes Deployments, which automatically provides replication, monitoring, self-healing, and rolling updates by Kubernetes.
So, the control plane of a Kubernetes shoot cluster is analogous to a Deployment. The seed cluster, on the other hand, maps to a Kubernetes node: it manages multiple shoot clusters. It is recommended to have a seed cluster per cloud provider. The Gardener developers actually work on a gardenlet controller for seed clusters that is similar to the kubelet on nodes.
If the seed clusters are like Kubernetes nodes, then the Garden cluster that manages those seed clusters is like a Kubernetes cluster that manages its worker nodes.
By pushing the Kubernetes model this far, the Gardener project leverages the strengths of Kubernetes to achieve robustness and performance that would be very difficult to build from scratch.
Let’s dive into the architecture.
Gardener creates a Kubernetes namespace in the seed cluster for each shoot cluster. It manages the certificates of the shoot clusters as Kubernetes secrets in the seed cluster.
The etcd data store for each cluster is deployed as a StatefulSet with one replica. In addition, events are stored in a separate etcd instance. The etcd data is periodically snapshotted and stored in remote storage for backup and restore purposes. This enables very fast recovery of clusters that lost their control plane (e.g., when an entire seed cluster becomes unreachable). Note that when a seed cluster goes down, the shoot cluster continues to run as usual.
As mentioned before, the control plane of a shoot cluster runs in a separate seed cluster, while the worker nodes run in the shoot cluster itself. This means that pods in the shoot cluster can use internal DNS to locate each other, but communication with the Kubernetes API server running in the seed cluster must go through an external DNS. This means the Kubernetes API server runs as a Service of the LoadBalancer type.
When creating a new shoot cluster, it’s important to provide the necessary infrastructure. Gardener uses Terraform for this task. A Terraform script is dynamically generated based on the shoot cluster specification and stored as a ConfigMap within the seed cluster. To facilitate this process, a dedicated component (Terraformer) runs as a job, performs all the provisioning, and then writes the state into a separate ConfigMap.
To provision nodes in a provider-agnostic manner that can work for private clouds too, Gardener defines several custom resources, such as MachineDeployment, MachineClass, MachineSet, and Machine. The Gardener developers work with the Kubernetes Cluster Lifecycle group to unify these abstractions, because there is a lot of overlap. In addition, Gardener takes advantage of the cluster autoscaler to offload the complexity of scaling node pools up and down.
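Note how these resources mirror the familiar Deployment/ReplicaSet/Pod hierarchy. A sketch of a MachineDeployment for the machine-controller-manager might look like the following; the names, labels, and namespace are hypothetical:

```yaml
# Illustrative sketch of a machine-controller-manager MachineDeployment
# (names and values are hypothetical).
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: pool-01
  namespace: shoot--team-a--prod-eu1   # hypothetical shoot namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      pool: pool-01
  template:
    metadata:
      labels:
        pool: pool-01
    spec:
      class:
        kind: AWSMachineClass   # provider-specific class (AMI, machine type, etc.)
        name: pool-01-class
```

Just as a Deployment manages ReplicaSets that manage pods, a MachineDeployment manages MachineSets that manage Machines, each of which maps to an actual VM at the provider.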
The seed cluster and shoot clusters can run on different cloud providers. The worker nodes in the shoot clusters are often deployed in private networks. Since the control plane needs to interact closely with the worker nodes (mostly the kubelet), Gardener creates a VPN for direct communication.
Observability is a big part of operating complex distributed systems. Gardener provides a lot of monitoring out of the box using best-in-class open source projects. A central Prometheus server deployed in the garden cluster collects information about all seed clusters, and each shoot cluster gets its own Prometheus instance in the seed cluster. To collect metrics, Gardener deploys two kube-state-metrics instances per cluster (one for the control plane in the seed and one for the worker nodes in the shoot). The node-exporter is deployed too, to provide additional information about the nodes. The Prometheus Alertmanager is used to notify the operator when something goes wrong, and Grafana is used to display dashboards with relevant data on the state of the system.
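The central Prometheus server can pull from the per-seed instances using Prometheus's standard federation mechanism. The following scrape configuration sketches the general pattern; the job names and targets are hypothetical, and Gardener's actual monitoring setup may differ:

```yaml
# Illustrative sketch only - federating metrics from per-seed Prometheus
# instances into a central Prometheus (targets are hypothetical).
scrape_configs:
- job_name: federate-seeds
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
    - '{job="kube-state-metrics"}'
    - '{job="node-exporter"}'
  static_configs:
  - targets:
    - prometheus.seed-aws.example.com
    - prometheus.seed-gcp.example.com
```

The `match[]` selectors restrict federation to the series that are actually needed centrally, which keeps the central server's load manageable as the number of seeds grows.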
You can manage Gardener using only kubectl, but you will have to switch profiles and contexts a lot as you explore different clusters. Gardener provides the gardenctl command-line tool, which offers higher-level abstractions and can operate on multiple clusters at the same time. Here is an example:
$ gardenctl ls shoots
projects:
- project: team-a
  shoots:
  - dev-eu1
  - prod-eu1

$ gardenctl target shoot prod-eu1
[prod-eu1]

$ gardenctl show prometheus
NAME           READY   STATUS    RESTARTS   AGE    IP              NODE
prometheus-0   3/3     Running   0          106d   10.241.241.42   ip-10-240-7-72.eu-central-1.compute.internal

URL: https://user:[email protected]
One of the most prominent features of Gardener is its extensibility. It has a large surface area and it supports many environments. Let’s see how extensibility is built into its design.
Gardener supports the following environments:
It started, like Kubernetes itself, with a lot of provider-specific support in the primary Gardener repository. Over time, it followed the Kubernetes example of externalizing cloud providers and migrated them to separate Gardener extensions. Providers can be specified using a CloudProfile CRD such as:
apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
metadata:
  name: aws
spec:
  type: aws
  kubernetes:
    versions:
    - version: 1.24.3
    - version: 1.23.8
      expirationDate: "2022-10-31T23:59:59Z"
  machineImages:
  - name: coreos
    versions:
    - version: 2135.6.0
  machineTypes:
  - name: m5.large
    cpu: "2"
    gpu: "0"
    memory: 8Gi
    usable: true
  volumeTypes:
  - name: gp2
    class: standard
    usable: true
  - name: io1
    class: premium
    usable: true
  regions:
  - name: eu-central-1
    zones:
    - name: eu-central-1a
    - name: eu-central-1b
    - name: eu-central-1c
  providerConfig:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: CloudProfileConfig
    machineImages:
    - name: coreos
      versions:
      - version: 2135.6.0
        regions:
        - name: eu-central-1
          ami: ami-034fd8c3f4026eb39
          # architecture: amd64 # optional
Then, a shoot cluster specification references a cloud profile and configures the provider with the necessary information:
apiVersion: gardener.cloud/v1alpha1
kind: Shoot
metadata:
  name: johndoe-aws
  namespace: garden-dev
spec:
  cloudProfileName: aws
  secretBindingName: core-aws
  cloud:
    type: aws
    region: eu-west-1
    providerConfig:
      apiVersion: aws.cloud.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc: # specify either 'id' or 'cidr'
          # id: vpc-123456
          cidr: 10.250.0.0/16
        internal:
        - 10.250.112.0/22
        public:
        - 10.250.96.0/22
        workers:
        - 10.250.0.0/19
      zones:
      - eu-west-1a
    workerPools:
    - name: pool-01
      # Taints, labels, and annotations are not yet implemented. This requires
      # interaction with the machine-controller-manager, see
      # https://github.com/gardener/machine-controller-manager/issues/174.
      # It is only mentioned here as a future proposal.
      # taints:
      # - key: foo
      #   value: bar
      #   effect: PreferNoSchedule
      # labels:
      # - key: bar
      #   value: baz
      # annotations:
      # - key: foo
      #   value: hugo
      machineType: m4.large
      volume: # optional, not needed in every environment, may only be specified
              # if the referenced CloudProfile contains the volumeTypes field
        type: gp2
        size: 20Gi
      providerConfig:
        apiVersion: aws.cloud.gardener.cloud/v1alpha1
        kind: WorkerPoolConfig
        machineImage:
          name: coreos
          ami: ami-d0dcef3
      zones:
      - eu-west-1a
      minimum: 2
      maximum: 2
      maxSurge: 1
      maxUnavailable: 0
  kubernetes:
    version: 1.11.0
  ...
  dns:
    provider: aws-route53
    domain: johndoe-aws.garden-dev.example.com
  maintenance:
    timeWindow:
      begin: 220000+0100
      end: 230000+0100
    autoUpdate:
      kubernetesVersion: true
  backup:
    schedule: "*/5 * * * *"
    maximum: 7
  addons:
    kube2iam:
      enabled: false
    kubernetes-dashboard:
      enabled: true
    cluster-autoscaler:
      enabled: true
    nginx-ingress:
      enabled: true
      loadBalancerSourceRanges: []
    kube-lego:
      enabled: true
      email: [email protected]
But, the extensibility goals of Gardener go far beyond just being provider agnostic. The overall process of standing up a Kubernetes cluster involves many steps. The Gardener project aims to let the operator customize each and every step by defining custom resources and webhooks. Here is the general flow diagram with the CRDs, mutating/validating admission controllers, and webhooks associated with each step:
Figure 11.14: Flow diagram of CRDs, mutating/validating admission controllers, and webhooks
Here are the CRD categories that comprise the extensibility space of Gardener:
We have covered Gardener in depth, which brings us to the end of the chapter.
In this chapter, we’ve covered the exciting area of multi-cluster management. There are many projects that tackle this problem from different angles. The Cluster API project has a lot of momentum for solving the sub-problem of managing the lifecycle of multiple clusters. Many other projects take on resource management and the application lifecycle. These projects can be divided into two categories: projects that explicitly manage multiple clusters using a management cluster and managed clusters, and projects that utilize the Virtual Kubelet, where whole clusters appear as virtual nodes in the main cluster.
The Gardener project has a very interesting approach and architecture. It tackles the problem of multiple clusters from a different perspective and focuses on the large-scale management of clusters. It is the only project that addresses both cluster lifecycle and application lifecycle.
At this point, you should have a clear understanding of the current state of multi-cluster management and what the different projects offer. You may decide that it’s still too early or that you want to take the plunge.
In the next chapter, we will explore the exciting world of serverless computing on Kubernetes. Serverless can mean two different things: you don’t have to manage servers for your long-running workloads, and also, running functions as a service. Both forms of serverless are available for Kubernetes, and both of them are extremely useful.