Chapter 7: Deploying a Hosted Cluster with Rancher

For teams that don't want to manage any servers, Rancher provides the ability to deploy and manage hosted Kubernetes services such as Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (Amazon EKS), or Azure Kubernetes Service (AKS). This chapter will cover the pros and cons of using a hosted cluster versus an RKE cluster. Then, we'll cover the requirements and limitations of this type of cluster. At that point, we'll go through the process of prepping the cloud provider. Then, we'll go over setting up EKS, GKE, and AKS clusters using Rancher. Finally, we'll cover the maintenance tasks needed for ongoing cluster management.

In this chapter, we're going to cover the following main topics:

  • How can Rancher manage a hosted cluster?
  • Requirements and limitations
  • Rules for architecting a solution
  • Prepping the cloud provider
  • Installation steps
  • Ongoing maintenance tasks

How can Rancher manage a hosted cluster?

One of the first questions I get is, what is a hosted cluster? The short answer is that it's a Kubernetes cluster created and managed by a cloud provider such as Google, Amazon, or Azure, but with Rancher managing the configuration of the cluster. Rancher uses each cloud provider's API and SDK to create the cluster the same way you would as an end user through their web console or a command-line tool. As of Rancher v2.6, the current list of supported cloud providers is as follows:

  • GKE
  • Amazon EKS
  • AKS
  • Alibaba Cloud Container Service for Kubernetes (Alibaba ACK)
  • Tencent Kubernetes Engine (Tencent TKE)
  • Huawei Cloud Container Engine (Huawei CCE)

Rancher does this by having a set of controllers in the Rancher leader pod. Each cloud provider has its own controller, and each controller uses a Go library to communicate with that provider. Rancher stores the cluster's configuration as a specification in the cluster object; for example, an EKS cluster's configuration is stored under Spec.EKSConfig. In this section, we will go over the original v1 controllers first and then the new v2 controllers.

With the original v1 controllers, which are found in Rancher v2.0–2.4, the cluster config was stored in this object and was only updated when Rancher or a user changed it through Rancher. If you were to create an EKS cluster in Rancher and then make a change in the AWS console, that change wouldn't be reflected in Rancher, and Rancher would overwrite your change during the next update event. This means the source of truth for these types of clusters is Rancher, and at the time of writing, these clusters cannot be detached from Rancher and managed externally.

The new v2 controllers are only available for EKS and GKE and were added in Rancher v2.5.8. They introduce configuration synchronization, which allows changes made outside Rancher to be synced back to it. This is done by two operators, called eks-operator and gke-operator. The operator stores the configuration reported by the cloud provider under Spec.EKSStatus or Spec.GKEStatus, and these objects are refreshed from the cloud provider every 5 minutes. The local configuration of the cluster is stored under Spec.EKSConfig or Spec.GKEConfig, which represents the desired state of the cluster, with most of the fields in the config section being NULL. Rancher keeps these values NULL until they are set in Rancher. Once a value has been set in Rancher, the operator uses the cloud provider's SDK to update the cluster, and once the cloud side has been updated, the Status spec is refreshed. If you change the cluster outside Rancher, that change will be picked up by Rancher, but if the field is one that Rancher manages, your change will be overwritten.
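To make the config/status split more concrete, here is a rough, abbreviated sketch of what a cluster object might look like; the field layout loosely follows the Spec.EKSConfig and Spec.EKSStatus convention described here, but exact field names vary between Rancher versions, so treat this as illustrative only:

```yaml
# Illustrative, abbreviated Rancher cluster object (not a complete manifest)
apiVersion: management.cattle.io/v3
kind: Cluster
spec:
  eksConfig:
    privateAccess: true        # set in Rancher, so the operator enforces it
    kubernetesVersion: null    # NULL = not managed by Rancher; the cloud value wins
  eksStatus:                   # refreshed from the cloud provider every ~5 minutes
    upstreamSpec:
      kubernetesVersion: "1.21"
      privateAccess: true
```

In this sketch, a version change made in the AWS console would only update the status side, because kubernetesVersion is NULL (unmanaged) in the config; flipping privateAccess outside Rancher would be reverted, because Rancher manages that field.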

One question that always comes up is, what is the difference between building a hosted cluster in Rancher and building it outside Rancher and then importing it? The answer to this question depends on the type of cluster. If it's an EKS or GKE cluster, you'll import the cluster, and Rancher will detect the cluster type. Then, assuming Rancher has the correct permissions, Rancher will convert this cluster into a hosted cluster. At that point, the cluster can be managed in the same manner it would be if Rancher created it. We will be covering more about importing clusters into Rancher in the next chapter.

Requirements and limitations

Now that we understand what a hosted cluster is and how it works in Rancher, we will move on to the requirements and limitations of a hosted cluster in Rancher, along with the design limitations and constraints when choosing a hosted cluster.

Basic requirements

Rancher needs permissions from the cloud provider to be able to create a cluster and its related services. The required permissions vary depending on the cloud provider; see the official Rancher documentation for each cloud provider type for the exact list.

It is recommended that Rancher be configured using a dedicated service account with the least permissions possible.

Rancher will need access to a cloud provider's API endpoint, which means that Rancher will need internet access directly or via an HTTP(S) proxy. If you are using a private API such as AWS's API gateway, that will need to be configured in Rancher.
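If Rancher can only reach the internet through a proxy, the Rancher Helm chart exposes proxy settings for this. The following is a minimal values fragment under that assumption; the proxy hostname is a placeholder, and the option names should be verified against your chart version:

```yaml
# values.yaml fragment for the Rancher Helm chart (illustrative)
proxy: "http://proxy.example.com:3128"
# Keep cluster-internal and private ranges off the proxy
noProxy: "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
```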

Rancher will need access to the cluster's Kubernetes API endpoint from the Rancher servers.

It is recommended that the cloud credentials be configured in Rancher under a dedicated service account, such as the local admin account, and that this account has admin permissions in Rancher.

Design limitations and considerations

Some settings such as the available regions are hardcoded in Rancher, meaning that if a cloud provider adds a new region, it might not be available in the Rancher UI until you upgrade Rancher.

Important Note

For the v2 controllers, you can work around the limitations in the Rancher UI by creating the cluster outside Rancher and then importing it.

The Kubernetes versions that are available in the Rancher UI may not match what the cloud provider allows. For example, if you are running an older version of Rancher, you might have v1.13 available in the drop-down menu, but because Amazon no longer supports this version, you will get an error in Rancher stating that the cluster creation failed.

Most cloud providers will assume that the cluster being built will have public internet access, with public IP addresses assigned to the nodes, load balancers, and the Kube-API endpoint. If you want to set up an air-gapped or private IP-only cluster, you will need to work with the cloud provider to configure the additional firewall rules, routes, and other settings required for this cluster. The following documentation covers using private endpoints in Rancher:

  • For EKS private-only endpoints, Rancher provides documentation for the additional steps needed, which are located at https://rancher.com/docs/rancher/v2.5/en/cluster-admin/editing-clusters/eks-config-reference/#private-only-api-endpoints.
  • For the GKE private endpoint, you can find the documentation at https://rancher.com/docs/rancher/v2.5/en/cluster-admin/editing-clusters/gke-config-reference/#private-cluster.

    Note

    At the time of writing, this type of configuration is not very mature and has several bugs.

Snapshots and backups are generally unavailable. Unlike an RKE/2 cluster, most hosted clusters do not give you access to etcd or provide an etcd backup option. If the cluster is lost or a user makes a mistake (for example, deleting the wrong namespace), your only option is to redeploy. Third-party tools such as Velero can address this shortcoming, and we will cover them in an upcoming chapter.

The permissions Rancher requires can be too broad for some security teams to approve. Rancher does provide a list of the minimum EKS permissions, located at https://rancher.com/docs/rancher/v2.5/en/cluster-provisioning/hosted-kubernetes-clusters/eks/permissions/. It is important to note that some features may not work with a reduced set of permissions, and this may require tuning.
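As an illustration of the shape such a policy takes — this is a small, hypothetical subset, not the actual minimum list, which lives at the link above — an IAM policy granting reduced permissions looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IllustrativeEksSubset",
      "Effect": "Allow",
      "Action": [
        "eks:CreateCluster",
        "eks:DescribeCluster",
        "eks:ListClusters",
        "iam:PassRole",
        "ec2:DescribeSubnets"
      ],
      "Resource": "*"
    }
  ]
}
```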

The cost of load balancers with hosted clusters can be greater than with an RKE/2 cluster. This is because most cloud providers will deploy an external load balancer per exposed service instead of the shared load balancer (the Ingress NGINX Controller) that RKE/2 uses. Note that you can work around this limitation by deploying ingress-nginx in the hosted cluster with a single external load balancer in front of it.
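As a sketch of that workaround, a single ingress controller lets many applications share one cloud load balancer by routing on hostname; the hostnames and service names below are illustrative:

```yaml
# Two apps sharing one ingress controller (and so one cloud load balancer)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: app1.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80
    - host: app2.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app2
                port:
                  number: 80
```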

In this section, we have covered the requirements and limitations. In the next section, we are going to use that knowledge along with additional rules and example designs to help us architect a solution that meets your needs.

Rules for architecting a solution

In this section, we'll cover some of the standard designs and the pros and cons of each. It is important to note that each environment is unique and will require tuning for the best performance and experience. It's also important to note that all CPU, memory, and storage sizes are recommended starting points and may need to be increased or decreased based on your workloads and deployment processes. Also, we'll be covering designs for the major hosted providers (Amazon EKS, GKE, and AKS), but you should be able to translate the core concepts for other infrastructure providers.

Before designing a solution, you should be able to answer the following questions:

  • Will multiple environments be sharing the same cluster?
  • Will production and non-production workloads be on the same cluster?
  • What level of availability does this cluster require?
  • Will this cluster be spanning multiple data centers in a metro cluster environment?
  • How much latency will there be between nodes in the cluster?
  • How many pods will be hosted in the cluster?
  • What are the average and maximum sizes of the pods deployed in the cluster?
  • Will you need GPU support for some of your applications?
  • Will you need to provide storage to your applications?
  • If you need storage, do you need only ReadWriteOnce (RWO) or will you need ReadWriteMany (RWX)?

Let's start with Amazon EKS.

Amazon EKS

EKS is the most mature cloud provider when it comes to Kubernetes as a Service (KaaS). Because of this, EKS is one of the most flexible solutions, but some limitations and rules need to be followed when creating an EKS cluster in Rancher.

The pros of Amazon EKS are as follows:

  • EKS supports enormous clusters, with the current limits being 3,000 nodes per cluster with 737 pods per node (depending on node size).
  • EKS supports third-party Container Network Interface (CNI) providers such as Calico.
  • EKS natively supports Elastic Block Store (EBS) for high-speed ReadWriteOnce storage. The provisioner comes pre-installed. You can find more details about this storage class at https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html.
  • For workloads that require ReadWriteMany, EKS supports Elastic File System (EFS), managed by NFS share from Amazon. You can find more details about this at https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html.
  • Because Amazon controls both the cloud networking and the cluster overlay network, you can assign IP addresses from your Virtual Private Cloud (VPC) directly to pods inside your cluster. This allows other Amazon services to communicate with pods directly. You can find more details about this at https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html.
  • EKS has direct integration between EKS and AWS load balancers. This allows you to deploy both an Application Load Balancer (ALB) as a layer 7/HTTP(S) load balancer and a Network Load Balancer (NLB) as a layer 4/TCP load balancer.
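To make the EBS and EFS options above concrete, here is a hedged sketch of a StorageClass for each CSI driver; the EFS filesystem ID is a placeholder you would replace with your own, and parameters should be checked against the AWS documentation linked above:

```yaml
# ReadWriteOnce block storage via the EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
---
# ReadWriteMany shared storage via the EFS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-rwx
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder EFS filesystem ID
  directoryPerms: "700"
```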

The cons of Amazon EKS are as follows:

Now, let's talk about GKE.

GKE

GKE is the second-most mature cloud provider when it comes to KaaS. This is because Google created Kubernetes and still drives a lot of the integration and development work for core Kubernetes.

The pros of GKE are as follows:

The cons of GKE are as follows:

  • GKE will only provide a 99.95% service-level agreement (SLA) if you use regional clusters, which cost extra. The details about this cost can be found at https://cloud.google.com/kubernetes-engine/pricing#cluster_management_fee_and_free_tier.
  • At the time of writing, GKE does not have a government cloud option. All the currently supported regions can be found at https://cloud.google.com/compute/docs/regions-zones.

Lastly, we'll talk about AKS.

Microsoft Azure Kubernetes Service (AKS)

AKS is the new kid on the block when it comes to KaaS, but Microsoft has been investing a lot in AKS and has closed the feature gap very quickly.

The pros of AKS are as follows:

  • AKS follows Microsoft's standard monthly patch schedule as they do with their OSes. They also publish their releases on their GitHub page, which is located at https://github.com/Azure/AKS/releases.
  • AKS has automatic node repair, wherein Azure uses both node agents and the node status in the cluster to trigger a repair. Azure's restoration process is less advanced than the other cloud providers': it will try rebooting the node, then reimaging it, before giving up, at which point an Azure engineer will investigate the issue. You can read more about this process at https://docs.microsoft.com/en-us/azure/aks/node-auto-repair.
  • AKS fully supports integration with Azure Active Directory (Azure AD). This allows you to assign permissions inside your cluster using Azure AD users and groups. For more details, visit https://docs.microsoft.com/en-us/azure/aks/managed-aad.
  • AKS has Visual Studio Code extensions that allow developers to run and debug their code directly on their laptop as if it were part of the AKS cluster. Bridge to Kubernetes essentially creates a VPN-like connection into your cluster so that processes running on your computer can communicate directly with services and other pods running in the cluster. You can learn more about how this works at https://docs.microsoft.com/en-us/visualstudio/bridge/overview-bridge-to-kubernetes?view=vs-2019.

The cons of AKS are as follows:

Now that we understand the pros and cons of each of the major hosted providers, we are going to dive into getting everything set up in the cloud provider and in Rancher so that we can start creating clusters.

Prepping the cloud provider

Before creating a hosted cluster in Rancher, we need to prepare the cloud provider for Rancher. In this section, we'll cover setting up permissions with the three major hosted Kubernetes providers, which are EKS, GKE, and AKS.

We'll start with Amazon EKS.

Amazon EKS

The prerequisites are as follows:

  • You should already have an AWS subscription created and available to use.
  • You'll need permissions in AWS to be able to create Identity and Access Management (IAM) policies.
  • Your Rancher server(s) should be able to reach AWS API public or private endpoints. You can read more about Amazon's API Gateway private endpoint at https://aws.amazon.com/blogs/compute/introducing-amazon-api-gateway-private-endpoints/.
  • EKS will require a VPC to be created, and you should work with your networking team to make it. Amazon has a tutorial located at https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html that covers creating a VPC.
  • You should have a dedicated service account in AWS for Rancher.
  • You should have a dedicated service account in Rancher, and this account should have admin-level permissions. You can use the local admin account for this role. For this section, we will assume that you will be using the local admin account.

Setup permissions

Here are the steps to set up permissions for Rancher:

  1. If you do not already have a dedicated service account in AWS, you should follow the steps at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html. For this section, we are going to use the name rancher for this service account.
  2. Now that we have the service account, we will assign an IAM policy to that account. This policy gives Rancher the permissions it needs to create an EKS cluster. The minimum required permissions can be found at https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/hosted-kubernetes-clusters/eks/#minimum-eks-permissions, and the steps for creating an IAM policy and attaching it to a service account can be found at https://docs.aws.amazon.com/eks/latest/userguide/EKS_IAM_user_policies.html.
  3. We now need to create an access and secret key pair, and the process for doing this can be found at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey. It is important to note that, as per Amazon's best practices guide for access keys, you should set an expiration time for the access key, though this will require you to rotate the key. The best practices guide can be found at https://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html, and you can find the documentation for rotating access keys at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#rotating_access_keys_console. You should also store this key in a safe place in case it is needed in the future.
  4. At this point, you should log into the Rancher web UI as the local admin or your dedicated service account.
  5. In the web UI, navigate to Cluster Management and then to Cloud Credentials.
  6. Then, click on the Create button and select Amazon from the list.
  7. Fill in the following form. You should give this credential a name that lets you know it's for Amazon and what subscription it's a part of – for example, you might call it AWS-Prod. The Rancher UI will test whether the credentials are correct but will not validate that the account has all the permissions that Rancher will need. Also, the default region doesn't matter and can be changed at any time. It is also important to note that the access key will be visible, but the secret key is encrypted and cannot be quickly recovered from Rancher:
Figure 7.1 – The Cloud Credential setup wizard for Amazon

For more details about the cloud credentials, please go to https://rancher.com/docs/rancher/v2.5/en/user-settings/cloud-credentials/.

Now, let's move on to GKE.

GKE

The prerequisites are as follows:

Setup permissions

Here are the steps to set up permissions for Rancher:

  1. If you do not already have a dedicated service account in GCP, you should follow the steps located at https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances. For this section, we are going to use the name rancher for this service account.
  2. Now that we have the service account, we will assign the following default roles to the rancher service account: compute.viewer, viewer, container.admin, and iam.serviceAccountUser.
  3. Instead of an API key pair, GCP uses a private key for service accounts. You'll need to save the key in JSON format. You can find a detailed set of instructions at https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys. You must keep this key for future use.
  4. At this point, you should log into the Rancher web UI as the local admin or your dedicated service account.
  5. In the web UI, navigate to Cluster Management and then to Cloud Credentials.
  6. Then, click on the Create button and select Google from the list.
  7. Fill in the following form. You should give this credential a name that lets you know it's for Google and what project it's a part of – for example, you might call it GCP-Prod. The Rancher UI will test whether the credentials are correct but will not validate that the account has all the permissions that Rancher will need:
Figure 7.2 – The Cloud Credential setup wizard for Google

Lastly, let's delve into AKS.

AKS

The prerequisites are as follows:

  • You should already have an Azure subscription created and available to use.
  • You'll need permissions in Azure AD to be able to create an app registration.
  • Your Rancher server(s) should be able to reach the Azure API public or private endpoints. You can read more about private access options for services at https://docs.microsoft.com/en-us/azure/api-management/api-management-using-with-internal-vnet?tabs=stv2.
  • Azure doesn't need a dedicated service account, but as with AWS and GCP, Rancher should have one.
  • You should have the Azure command-line tool already installed.
  • You should have a resource group created for the AKS clusters and related services.

Setup permissions

Here are the steps to set up permissions for Rancher:

  1. Run the following command. You'll want to document the output, as we'll need it later:

    az ad sp create-for-rbac --skip-assignment

  2. We now want to assign the Contributor role to the service principal using the following command. Please note that you'll need the appId from the previous command's output, along with your subscription ID and resource group name:

    az role assignment create --assignee <APP-ID> --scope /subscriptions/<SUBSCRIPTION-ID>/resourceGroups/<RESOURCE-GROUP> --role Contributor

  3. At this point, you should log into the Rancher web UI as the local admin or your dedicated service account.
  4. In the web UI, navigate to Cluster Management and then to Cloud Credentials.
  5. Then, click on the Create button and select Azure from the list.
  6. Fill in the following form. You should give this credential a name that lets you know it's for Azure and what subscription it's a part of – for example, you might call it AZ-Prod. The Rancher UI will test that the credentials are correct but will not validate that the account has all the permissions that Rancher will need. For the Environment field, AzurePublicCloud is the most common option unless you are using a government subscription:
Figure 7.3 – The Cloud Credential setup wizard for Azure

For the other cloud providers, you can find the steps at https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/hosted-kubernetes-clusters/. At this point, Rancher should have access to the cloud provider. In the next section, we will go through creating some hosted clusters.

Installation steps

In this section, we're going to create a hosted cluster, mainly using the default settings. For the examples, we will be continuing to use EKS, GKE, and AKS. Most of these settings can be translated for other cloud providers. It is important to note that you must already have the cloud credentials for each provider and environment you want to configure. It is also recommended that you keep Rancher up to date as cloud providers are constantly changing, and you might run into a bug simply because you are on an older version of Rancher. The latest stable versions can be found at https://github.com/rancher/rancher#latest-release.

We'll start with Amazon EKS.

Amazon EKS

The following steps show you how to set up EKS using Rancher:

  1. Log into Rancher using the service account that we used during the cloud credentials creation step.
  2. Browse the Cluster Management page, click on Clusters, and then click the Create button.
  3. Then, from the list, select Amazon EKS, at which point you should be prompted with a cluster setup wizard.
  4. You'll want to give the cluster a name. This name can be changed later, but it is recommended not to change it, as that can lead to a name mismatch, which would then lead to a user deleting the wrong resource. Also, the description field is a freeform field that can provide additional information such as who owns this cluster or who should be contacted about this cluster; some users will use this field to post maintenance messages such as Scheduled maintenance every Friday at 7 PM CDT. This can be changed at any time. The bottom section assigns the cloud credential to this cluster:
Figure 7.4 – The cluster creation wizard for Amazon EKS

  5. The rest of the wizard will be filled in with default values. You can change them as you see fit, but you should know what you are changing.
  6. The final step is to define the node groups. This includes settings such as the size of the nodes, the Amazon Machine Image (AMI), and the pool size. After defining the cluster, you should click the Create button, at which point Rancher will start the cluster creation process.
  7. The details for all the different settings can be found at https://rancher.com/docs/rancher/v2.6/en/cluster-admin/editing-clusters/eks-config-reference/.
  8. The cluster will go into the Updating status; depending on the cluster's size and Amazon's request queue, this process can take anywhere from 2 to 60 minutes. Please note that the wait is primarily dependent on Amazon and how busy they are.

Let's move on to GKE.

GKE

Now, let's look at the installation steps for GKE:

  1. Follow the same steps as you did for EKS, but this time, select Google GKE from the options menu.
  2. The main difference is the Account access section, as it may ask you to re-enter the cloud credentials and Google project ID.
  3. The details for all the different settings can be found at https://rancher.com/docs/rancher/v2.6/en/cluster-admin/editing-clusters/gke-config-reference/.
  4. Again, the final step of clicking the Create button will cause Rancher to start the cluster creation process.
  5. The cluster will go into the Updating status; depending on the cluster's size and Google's request queue, this process usually takes around 15 minutes.

Lastly, let's look into AKS.

AKS

Lastly, the installation procedure for AKS is as follows:

  1. Follow the same steps for EKS and GKE, but this time, select Azure AKS from the options menu.
  2. The details for all the different settings can be found at https://rancher.com/docs/rancher/v2.6/en/cluster-admin/editing-clusters/aks-config-reference/.
  3. It is important to note that the network policy is a setting that can only be enabled when creating clusters. You can find details about the different options at https://docs.microsoft.com/en-us/azure/aks/use-network-policies#differences-between-azure-and-calico-policies-and-their-capabilities.
  4. Again, the final step of clicking the Create button will cause Rancher to start the cluster creation process.
  5. The cluster will go into the Updating status; depending on the cluster's size and Microsoft's request queue, this process usually takes around 60 minutes. From experience, the first cluster in a subscription takes the longest, with additional clusters being faster.

At this point, we should have a Kubernetes cluster from one or more of the cloud providers and be able to easily repeat this process for as many different clusters as we need. This leads us into the final section, on what to do after your cluster is up and running.

Ongoing maintenance tasks

After creating a cluster, a few ongoing maintenance tasks need to be done to keep it running in a healthy state.

The first recommended task is setting up backups. Because these are hosted clusters, we can't take an etcd backup as we would with an RKE1/2 cluster, so we'll need to use a third-party tool such as Velero or Kasten. These tools follow the same basic process: they query the Kube-API endpoint to grab a list of objects in the cluster and then export the different types of Kubernetes objects (that is, Deployments, ConfigMaps, Secrets, and so on) as JSON or YAML files, the idea being that the restore process is running kubectl apply on the backup files. We will be covering these tools in an upcoming chapter.
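As a taste of what such a tool looks like, here is a hypothetical Velero Schedule that backs up a single namespace nightly; the namespace, cron schedule, and retention period are all illustrative values:

```yaml
# Illustrative Velero Schedule: nightly backup of one namespace
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-app-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # every night at 02:00
  template:
    includedNamespaces:
      - my-app                 # placeholder namespace
    ttl: 720h0m0s              # keep backups for 30 days
```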

The second recommended task is testing and documenting how an upgrade impacts your applications. As most cloud providers will do a forced drain of a node during a scheduled upgrade, you'll want to test how your application handles this kind of drain. For example, if you are using a multi-master database such as MariaDB Galera Cluster, do your database pods rebuild faster than the worker nodes are drained? A typical way to test this is by changing the node image, which simulates the effects of a Kubernetes upgrade; because most providers don't allow you to downgrade your cluster, repeating an actual upgrade over and over again is not possible.
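One way to soften the impact of forced drains is a PodDisruptionBudget, which caps how many replicas a drain can evict at once; the label selector and minAvailable value below are illustrative:

```yaml
# Keep at least two Galera replicas running during a node drain
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: galera-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mariadb-galera     # placeholder label for your database pods
```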

Summary

In this chapter, we learned about the different types of hosted clusters that Rancher can deploy, including the requirements and limitations of each. We then covered the rules of architecting each type of cluster, including some of the pros and cons of each solution. We finally went into detail about the steps for creating each type of cluster. We ended the chapter by going over the major ongoing maintenance tasks.

The next chapter will cover importing an externally managed cluster into Rancher.
