In the previous chapter, you learned the process for installing Pachyderm locally to get started quickly and test Pachyderm on your computer.
Production use cases require additional compute resources and scalability, which can be achieved efficiently on the managed Kubernetes services offered by the major cloud vendors. Pachyderm runs on any Kubernetes cluster, whether it is deployed manually on cloud instances or provisioned as a managed service. We will discuss the most popular and easiest-to-configure options on each cloud provider.
This chapter walks you through the cloud-based installation of Pachyderm and explains the software requirements needed to run a Pachyderm cluster in production. This chapter will cover the installation on the following most popular cloud platforms: Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Microsoft Azure Kubernetes Service (AKS).
In this chapter, we're going to cover the following main topics:
If you are on macOS, verify that you have an up-to-date version of macOS. If you are using Linux, you must be on 64-bit versions of recent distributions of CentOS, Fedora, or Ubuntu. If you are on Windows, run all the commands described in this section from Windows Subsystem for Linux (WSL). You should have the following tools installed from the previous chapters:
We will need to install the following tools:
We will go into the specifics regarding the installation and configuration of these tools as we go through this chapter. If you already know how to do this, you can go ahead and set them up now.
In this section, we will cover the installation of the system tools that we will use to prepare our environment before deploying a Kubernetes cluster and installing Pachyderm on cloud platforms.
The AWS Command Line Interface, aws-cli, is required to execute commands in your AWS account. For additional information, you can refer to the AWS Command Line Interface official documentation at https://docs.aws.amazon.com/cli/latest/userguide/. Let's install aws-cli on your computer:
If you are using macOS:
$ curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
$ sudo installer -pkg AWSCLIV2.pkg -target /
If you are on Linux (x86) or WSL on Windows:
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
$ aws --version
The output of the preceding command should look as follows:
aws-cli/2.4.7 Python/3.8.8 Linux/5.11.0-41-generic exe/x86_64.ubuntu.20 prompt/off
$ aws configure
The output of the preceding command should look as follows:
AWS Access Key ID [None]: YOURACCESSKEYHERE
AWS Secret Access Key [None]: YOURSECRETACCESSKEYHERE
Default region name [None]: us-east-1
Default output format [None]: json
$ export AWS_ACCESS_KEY_ID=YOURACCESSKEY2HERE
$ export AWS_SECRET_ACCESS_KEY=YOURSECRETACCESS2KEYHERE
$ export AWS_DEFAULT_REGION=us-west-2
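For reference, `aws configure` simply writes two INI files under `~/.aws`. The sketch below recreates them by hand in a temporary directory, so your real files are untouched; the key values are placeholders, not real credentials:

```shell
# Recreate what `aws configure` writes, without touching ~/.aws.
# AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE redirect the CLI
# to these files instead of the defaults.
AWS_DIR="$(mktemp -d)"
export AWS_SHARED_CREDENTIALS_FILE="$AWS_DIR/credentials"
export AWS_CONFIG_FILE="$AWS_DIR/config"

cat > "$AWS_SHARED_CREDENTIALS_FILE" <<'EOF'
[default]
aws_access_key_id = YOURACCESSKEYHERE
aws_secret_access_key = YOURSECRETACCESSKEYHERE
EOF

cat > "$AWS_CONFIG_FILE" <<'EOF'
[default]
region = us-east-1
output = json
EOF

# Show the region the CLI would pick up
grep '^region' "$AWS_CONFIG_FILE"
```

The environment variables from the previous step take precedence over these files, which is why exporting `AWS_ACCESS_KEY_ID` and friends works for a second set of credentials.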
Now that you have installed the AWS Command Line Interface on your computer, let's install the AWS IAM authenticator for Kubernetes.
Amazon EKS leverages AWS IAM to provide access to the Kubernetes clusters created through EKS. To be able to make the kubectl command work with Amazon EKS IAM roles, the Amazon IAM authenticator for Kubernetes needs to be installed. Let's install the IAM authenticator on your computer:
If you are using macOS:
$ brew install aws-iam-authenticator
If you are using Linux (x86) or WSL on Windows:
$ curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.19.6/2021-01-05/bin/linux/amd64/aws-iam-authenticator
$ chmod +x ./aws-iam-authenticator
$ mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
$ aws-iam-authenticator version
The output of the preceding command should look as follows. To be able to perform the following steps, the aws-iam-authenticator version should be 0.5.0 or later:
{"Version":"v0.5.0","Commit":"1cfe2a90f68381eacd7b6dcfa2bf689e76eb8b4b"}
Now you have installed aws-iam-authenticator on your computer.
Amazon EKS is a managed Kubernetes service on Amazon EC2. To manage Amazon EKS over the terminal and execute commands, the official CLI for Amazon EKS, eksctl, is used. For additional information, you can refer to the AWS eksctl official documentation at https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html. Let's install eksctl on your computer:
If you are using macOS:
$ brew tap weaveworks/tap
$ brew install weaveworks/tap/eksctl
If you are using Linux (x86) or WSL on Windows:
$ curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
$ sudo mv /tmp/eksctl /usr/local/bin
$ eksctl version
The output of the preceding command should look as follows:
0.77.0
To be able to perform the following steps, the eksctl version should be 0.77.0 or later. Now you have installed eksctl to manage Amazon EKS on your computer.
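If you want to script the minimum-version check rather than eyeball it, `sort -V` makes a serviceable comparator. The version strings below are hardcoded for illustration; in practice you would capture the installed version from `eksctl version`:

```shell
# version_ge A B succeeds when version A >= version B.
# sort -V orders version strings numerically, so the smaller of the
# two always sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

installed="0.77.0"   # illustrative; use $(eksctl version) in practice
required="0.77.0"

if version_ge "$installed" "$required"; then
  echo "eksctl $installed is new enough"
else
  echo "eksctl $installed is too old; need >= $required" >&2
fi
```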
The Google Cloud SDK, gcloud, is required to execute commands in your Google Cloud account. The following instructions assume that you have an active GCP account with billing enabled. If you don't have an account already, go to https://console.cloud.google.com and create an account. Let's install gcloud on your computer:
If you are using macOS:
$ brew install --cask google-cloud-sdk
If you are using Linux (x86) or WSL on Windows:
$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud version
The output of the preceding command should look as follows. To be able to perform the following steps, the Google Cloud SDK version should be 339.0.0 or later:
Google Cloud SDK 367.0.0
bq 2.0.72
core 2021.12.10
gsutil 5.5
$ gcloud init
$ gcloud config set compute/zone us-central1-a
Now you have installed gcloud to manage GKE on your computer.
The Azure CLI, az, is required to execute commands in your Microsoft Azure account. The following instructions assume that you have an active Azure account with billing enabled. If you don't have an account already, go to https://portal.azure.com and create an account. Let's install the Azure CLI on your computer:
If you are using macOS:
$ brew update && brew install azure-cli
$ brew install jq
If you are using Linux (x86) or WSL on Windows:
$ curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
$ sudo apt-get install jq
$ az version
The output of the preceding command should look as follows. To be able to perform the following steps, the Azure CLI version should be 2.0.1 or later:
{
"azure-cli": "2.31.0",
"azure-cli-core": "2.31.0",
"azure-cli-telemetry": "1.0.6",
"extensions": {}
}
$ az login
$ az group create --name="pachyderm-group" --location=centralus
Now you have installed the Azure CLI to manage AKS on your computer.
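As a quick check that jq (installed alongside the Azure CLI above) is working, here it is extracting a field from a hardcoded sample of `az version`-style JSON. The sample is illustrative, not live CLI output:

```shell
# Parse an az-version-style JSON document with jq.
# Keys containing a dash must be quoted in the jq filter.
json='{"azure-cli": "2.31.0", "azure-cli-core": "2.31.0", "extensions": {}}'
cli_version=$(echo "$json" | jq -r '."azure-cli"')
echo "Azure CLI version: $cli_version"
```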
Kubernetes is an open source container orchestration platform and, by itself, is a large topic to cover. In this section, we take the topic of containerization from a data scientist's perspective and will only focus on running our workload, Pachyderm, on the most common managed platforms available in the market. There are various ways and tools to provision and manage the life cycle of production-grade Kubernetes clusters on the AWS cloud platform, such as kOps, kubespray, k3s, Terraform, and others. For additional configuration details, you can refer to Kubernetes' official documentation at https://kubernetes.io/docs/setup/production-environment/. Let's learn the simplest way to get the services required by Pachyderm up and running on AWS's managed Kubernetes service, Amazon EKS.
Follow these steps to provision an Amazon EKS cluster using eksctl. Initially developed as a third-party open source tool, eksctl is now the official CLI for creating and managing EKS clusters. You will need to have the AWS CLI and the AWS IAM authenticator for Kubernetes installed and their credentials configured. If you already have a cluster, you can skip these instructions and jump to the Deploying Pachyderm on Amazon EKS section. Also, you can refer to the eksctl official documentation at https://eksctl.io/introduction/:
$ eksctl create cluster
The output of the preceding command should return output similar to this:
...
kubectl command should work with "/home/norton/.kube/config", try 'kubectl get nodes'
[✔] EKS cluster "exciting-badger-1620255089" in "us-east-1" region is ready
Important note
To customize the EKS cluster configuration, you can pass additional parameters to eksctl as follows:
eksctl create cluster --name <name> --version <version> \
--nodegroup-name <name> --node-type <vm-flavor> \
--nodes <number-of-nodes> --nodes-min <min-number-nodes> \
--nodes-max <max-number-nodes> --node-ami auto
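Alternatively, the same options can be kept in a config file and passed with `eksctl create cluster -f`. A sketch of such a ClusterConfig follows; the cluster name, region, and node group values are illustrative:

```shell
# Write an eksctl ClusterConfig file; apply it later with:
#   eksctl create cluster -f /tmp/cluster.yaml
# All values below are examples, not requirements.
cat > /tmp/cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: pachyderm-cluster
  region: us-east-1
nodeGroups:
  - name: pachyderm-nodes
    instanceType: m5.xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 5
EOF
echo "Wrote /tmp/cluster.yaml"
```

Keeping the cluster definition in a file makes it easy to version-control and reproduce the environment.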
$ kubectl cluster-info && kubectl get nodes
The output of the preceding command should look as follows:
Kubernetes control plane is running at https://ABCD.gr7.us-east-1.eks.amazonaws.com
CoreDNS is running at https://ABCD.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
NAME STATUS ROLES AGE VERSION
ip-192-168-17-133.ec2.internal Ready <none> 21m v1.21.5-eks-bc4871b
ip-192-168-63-179.ec2.internal Ready <none> 21m v1.21.5-eks-bc4871b
Now, your Amazon EKS cluster is provisioned and ready to deploy Pachyderm.
Pachyderm uses S3-compliant object storage to store data. Follow these steps to create an S3 object storage bucket:
$ export S3_BUCKET_NAME=s3.pachyderm
$ export EBS_STORAGE_SIZE=200
$ export AWS_REGION=us-east-1
$ aws s3api create-bucket --bucket ${S3_BUCKET_NAME} \
--region ${AWS_REGION}
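One caveat worth noting: outside us-east-1, `aws s3api create-bucket` also requires a LocationConstraint. A small sketch that builds (but does not run) the right command for the configured region:

```shell
# Build the create-bucket command string for the current region.
# us-east-1 is the one region that must NOT receive a
# --create-bucket-configuration argument.
S3_BUCKET_NAME=s3.pachyderm
AWS_REGION=us-east-1

CMD="aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${AWS_REGION}"
if [ "$AWS_REGION" != "us-east-1" ]; then
  CMD="$CMD --create-bucket-configuration LocationConstraint=${AWS_REGION}"
fi
echo "$CMD"
```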
$ aws s3api list-buckets --query 'Buckets[].Name'
The output of the preceding command should look as follows:
[
    "s3.pachyderm"
]
Now that we have an S3 bucket created, we are ready to deploy Pachyderm on Amazon EKS.
When you start learning Pachyderm, it is recommended to run experiments in a small local cluster. We have previously covered the local deployment of Pachyderm in Chapter 4, Installing Pachyderm Locally. In this chapter, we focus on a scalable production-grade deployment of Pachyderm using IAM roles on Amazon EKS clusters.
Follow these steps to install Pachyderm on your Amazon EKS cluster:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket>",
                "arn:aws:s3:::<s3-bucket>/*"
            ]
        }
    ]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
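Before attaching these policies, the `<s3-bucket>` placeholder in the access policy needs the real bucket name. A sketch of filling it in with sed; the file paths are examples, and the substituted output goes to a new file so the command is portable across Linux and macOS:

```shell
# Fill the <s3-bucket> placeholder in a policy template with the
# bucket name created earlier.
S3_BUCKET_NAME=s3.pachyderm

cat > /tmp/pachyderm-s3-policy.tpl.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::<s3-bucket>", "arn:aws:s3:::<s3-bucket>/*"]
        }
    ]
}
EOF

sed "s/<s3-bucket>/${S3_BUCKET_NAME}/g" /tmp/pachyderm-s3-policy.tpl.json \
  > /tmp/pachyderm-s3-policy.json
grep -F 'arn:aws:s3:::' /tmp/pachyderm-s3-policy.json
```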
$ helm repo add pach https://helm.pachyderm.com
$ helm repo update
$ helm install pachd pach/pachyderm \
--set deployTarget=AMAZON \
--set pachd.storage.amazon.bucket="AWS_S3_BUCKET" \
--set pachd.storage.amazon.id="AWS_ACCESS_KEY" \
--set pachd.storage.amazon.secret="AWS_SECRET" \
--set pachd.storage.amazon.region="us-east-1" \
--set pachd.externalService.enabled=true
If you have an enterprise key and would like to deploy it with Pachyderm's console user interface, execute the following command:
$ helm install pachd pach/pachyderm \
--set deployTarget=AMAZON \
--set pachd.storage.amazon.bucket="AWS_S3_BUCKET" \
--set pachd.storage.amazon.id="AWS_ACCESS_KEY" \
--set pachd.storage.amazon.secret="AWS_SECRET" \
--set pachd.storage.amazon.region="us-east-1" \
--set pachd.enterpriseLicenseKey=$(cat license.txt) \
--set console.enabled=true
Once the console is deployed successfully, follow the instructions under the Accessing the Pachyderm console section to access the console.
The command returns the release name and a STATUS: deployed confirmation.
Optional: Customizing Installation Parameters
You can also download and customize the values.yaml file in the Helm Chart repository, https://github.com/pachyderm/pachyderm/tree/master/etc/helm/pachyderm, to further optimize the components needed to run Pachyderm.
Execute the following command to create a local copy of the values.yaml file:
$ wget https://raw.githubusercontent.com/pachyderm/pachyderm/master/etc/helm/pachyderm/values.yaml
Once customized, you can use the same YAML file and install your Helm Chart by executing the following command instead:
$ helm install pachd -f values.yaml pach/pachyderm
$ kubectl get deployments
The output of the preceding command should list the pachd deployment with all replicas ready.
$ kubectl get pods
The output of the preceding command should show the pachd and etcd Pods in the Running state.
$ kubectl get pv
The output of the preceding command should look as follows:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-cab1f435-02fb-42df-85d9-d49f6151b281 200Gi RWO Delete Bound default/etcd-storage-etcd-0 etcd-storage-class 81m
Important note
When Pachyderm is deployed using the --dynamic-etcd-nodes flag, it creates an etcd deployment to manage administrative metadata. Block storage used by etcd Pods is provisioned using the default AWS StorageClass, gp2. To use a different StorageClass during deployment, you will need to deploy an Amazon EBS CSI driver to your cluster and update the etcd.storageClass parameter (for example, to gp3) during the Helm chart deployment.
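As a sketch of that customization (the StorageClass name and parameters are illustrative, and the EBS CSI driver must already be installed in the cluster), a gp3 StorageClass could look like this:

```shell
# Write a gp3 StorageClass manifest for the EBS CSI driver; apply with:
#   kubectl apply -f /tmp/gp3-storageclass.yaml
# then pass --set etcd.storageClass=gp3 to the Helm deployment.
cat > /tmp/gp3-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
EOF
echo "Wrote /tmp/gp3-storageclass.yaml"
```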
$ pachctl version
The output of the preceding command should look as follows:
COMPONENT VERSION
pachctl 2.0.0
pachd 2.0.0
Now that we have installed Pachyderm on our AWS EKS cluster, we are ready to create our first pipeline.
If you need to delete your Pachyderm deployment or start afresh, you can wipe out your environment and start over again from the Preparing an EKS cluster to run Pachyderm instructions. Let's perform the following steps to delete your existing Pachyderm deployment:
$ helm ls | grep pachyderm
The output of the preceding command should look as follows:
pachd default 1 2021-12-27 20:20:33.168535538 -0800 PST deployed pachyderm-2.0.0 2.0.0
$ helm uninstall pachd
$ eksctl get cluster
The output of the preceding command should look as follows:
2021-05-05 21:53:56 [ℹ] eksctl version 0.47.0
2021-05-05 21:53:56 [ℹ] using region us-east-1
NAME REGION EKSCTL CREATED
exciting-badger-1620255089 us-east-1 True
$ eksctl delete cluster --name <name>
The output of the preceding command should complete similar to the following:
...
2021-05-05 22:00:54 [ℹ] will delete stack "eksctl-exciting-badger-1620255089-cluster"
2021-05-05 22:00:54 [✔] all cluster resources were deleted
Now you have completely removed Pachyderm and your EKS cluster from your AWS account.
If you use Google Cloud, a managed Kubernetes service can be deployed on Google Cloud Platform (GCP) using automation and command-line tools with the help of kOps, kubespray, Terraform, and others. For additional configuration details, you can refer to Kubernetes' official documentation at https://kubernetes.io/docs/setup/production-environment/. Let's learn the simplest way to get the services required by Pachyderm up and running on Google Cloud's managed Kubernetes service, GKE.
Follow these steps to provision a GKE cluster using the Google Cloud SDK. You will need to have the Google Cloud SDK installed and its credentials configured. If you have a cluster, you can skip these instructions and jump to the Deploying Pachyderm on GKE section. Also, you can refer to the Google Cloud SDK official documentation at https://cloud.google.com/sdk/docs/install:
$ gcloud container clusters create pachyderm-cluster \
--scopes compute-rw,storage-rw,service-management,service-control,logging-write,monitoring \
--machine-type n2-standard-4
The output of the preceding command should complete similar to the following:
...
kubeconfig entry generated for pachyderm-cluster.
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
pachyderm-cluster us-central1-a 1.18.16-gke.2100 35.238.200.52 n2-standard-4 1.18.16-gke.2100 1 RUNNING
Important note
To customize the GKE cluster parameters easily, you can use the GCP console's Kubernetes Engine creation wizard. After you configure the parameters in the wizard, click the command line button to convert the configuration into an equivalent gcloud CLI command.
$ kubectl cluster-info && kubectl get nodes
The output of the preceding command should look as follows:
Kubernetes control plane is running at https://<IP_ADDRESS>
GLBCDefaultBackend is running at https://<IP_ADDRESS>/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
KubeDNS is running at https://<IP_ADDRESS>/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://<IP_ADDRESS>/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
NAME STATUS ROLES AGE VERSION
gke-pachyderm-cluster-default-pool-26cf3a77-1vr1 Ready <none> 12m v1.18.16-gke.2100
gke-pachyderm-cluster-default-pool-26cf3a77-5sgs Ready <none> 12m v1.18.16-gke.2100
gke-pachyderm-cluster-default-pool-26cf3a77-lkr4 Ready <none> 12m v1.18.16-gke.2100
Now, your GKE cluster is provisioned and ready to deploy Pachyderm.
Pachyderm uses object storage to store data. Follow these steps to create a Google Cloud object storage bucket:
$ export GCS_BUCKET_NAME=pachyderm-bucket
$ export GKE_STORAGE_SIZE=200
$ gsutil mb gs://${GCS_BUCKET_NAME}
$ gsutil ls
The output of the preceding command should look as follows:
gs://pachyderm-bucket/
Now, you have a GCS bucket created to store Pachyderm data. We are ready to deploy Pachyderm on GKE.
When you start learning Pachyderm, it is suggested to run experiments in a small local cluster. We have previously covered the local deployment of Pachyderm in Chapter 4, Installing Pachyderm Locally. In this chapter, we are going to focus on a scalable production-grade deployment of Pachyderm using IAM roles on GKE clusters.
Follow these steps to install Pachyderm on your GKE cluster:
$ gcloud iam service-accounts create my-service-account --display-name=my-account
$ gcloud projects add-iam-policy-binding pachyderm-book \
--role roles/owner \
--member serviceAccount:my-service-account@pachyderm-book.iam.gserviceaccount.com
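The service account e-mail used in the --member flag is not arbitrary: GCP always derives it as `<account-name>@<project-id>.iam.gserviceaccount.com`, so it can be built from the two values used above (the project ID here is illustrative):

```shell
# Derive the GCP service-account e-mail from its name and project ID.
SA_NAME=my-service-account
PROJECT_ID=pachyderm-book   # illustrative project ID
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
echo "serviceAccount:${SA_EMAIL}"
```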
$ helm repo add pach https://helm.pachyderm.com
$ helm repo update
$ helm install pachd pach/pachyderm \
--set deployTarget=GOOGLE \
--set pachd.storage.google.bucket="GOOGLE_BUCKET" \
--set pachd.storage.google.cred="GOOGLE_CRED" \
--set pachd.externalService.enabled=true
If you have an enterprise key and would like to deploy it with Pachyderm's console user interface, execute the following command:
$ helm install pachd pach/pachyderm \
--set deployTarget=GOOGLE \
--set pachd.storage.google.bucket="GOOGLE_BUCKET" \
--set pachd.storage.google.cred="GOOGLE_CRED" \
--set pachd.enterpriseLicenseKey=$(cat license.txt) \
--set console.enabled=true
Once the console is deployed successfully, follow the instructions under the Accessing the Pachyderm console section to access the console.
$ kubectl get deployments
The output of the preceding command should look as follows:
NAME READY UP-TO-DATE AVAILABLE AGE
dash 1/1 1 1 44s
pachd 1/1 1 1 45s
$ kubectl get pods
The output of the preceding command should look as follows:
NAME READY STATUS RESTARTS AGE
dash-cf6f47d7d-xpvvp 2/2 Running 0 104s
etcd-0 1/1 Running 0 104s
pachd-6c99f6fb7-dnjhn 1/1 Running 0 104s
$ kubectl get pv
The output of the preceding command should look as follows:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-c4eac147-8571-4ccb-8cd0-7c1cb68a627d 200Gi RWO Delete Bound default/etcd-storage-etcd-0 etcd-storage-class 3m4s
$ pachctl version
The output of the preceding command should look as follows:
COMPONENT VERSION
pachctl 2.0.0
pachd 2.0.0
Now that we have installed Pachyderm on your GKE cluster, you are ready to create your first pipeline.
If you need to delete your deployment and start afresh, you can wipe out your environment and start over again using the Preparing a GKE cluster to run Pachyderm instructions. Let's perform the following steps to delete your existing Pachyderm deployment:
$ helm ls | grep pachyderm
The output of the preceding command should look as follows:
pachd default 1 2021-12-27 20:20:33.168535538 -0800 PST deployed pachyderm-2.0.0 2.0.0
$ helm uninstall pachd
$ gcloud container clusters list
The output of the preceding command should look as follows:
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
pachyderm-cluster us-central1-a 1.18.16-gke.2100 35.238.200.52 n2-standard-2 1.18.16-gke.2100 3 RUNNING
$ gcloud container clusters delete <name>
The output of the preceding command should complete similar to the following:
...
Deleting cluster pachyderm-cluster...done.
Deleted [https://container.googleapis.com/v1/projects/pachydermbook/zones/us-central1-a/clusters/pachyderm-cluster].
Now you have completely removed Pachyderm and your Kubernetes cluster from your GCP account.
If you use Microsoft Azure, a managed Kubernetes service can be deployed on the Azure platform using automation and command-line tools with the help of kOps, kubespray, Terraform, and others. For additional configuration details, you can refer to Kubernetes' official documentation at https://kubernetes.io/docs/setup/production-environment/. Let's learn the simplest way to get the services required by Pachyderm up and running on AKS.
Follow these steps to provision an AKS cluster using the Azure CLI. You will need to have the Azure CLI installed and its credentials configured. If you have a cluster, you can skip these instructions and jump to the Deploying Pachyderm on Microsoft AKS section. Also, you can refer to the Azure CLI official documentation at https://docs.microsoft.com/en-us/cli/azure/:
$ az aks create --resource-group pachyderm-group --name pachyderm-cluster --generate-ssh-keys --node-vm-size Standard_DS4_v2
The output of the preceding command should complete similar to the following:
...
"privateFqdn": null,
"provisioningState": "Succeeded",
"resourceGroup": "pachyderm-group",
"servicePrincipalProfile": {
"clientId": "msi",
"secret": null
},...
Important note
If you don't remember your resource group name, you can use the az group list command to list the previously created resource groups.
$ az aks get-credentials --resource-group pachyderm-group --name pachyderm-cluster
$ kubectl get nodes
The output of the preceding command should look as follows:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-34139531-vmss000000 Ready agent 5m57s v1.19.9
aks-nodepool1-34139531-vmss000001 Ready agent 5m58s v1.19.9
aks-nodepool1-34139531-vmss000002 Ready agent 5m58s v1.19.9
Now your AKS cluster is provisioned and ready to deploy Pachyderm.
Pachyderm uses blob storage to store data and block storage for metadata. It is recommended to use SSDs rather than the Standard HDD-based slower storage option.
Follow these steps to create a Premium LRS Block blobs storage container:
$ export RESOURCE_GROUP=pachyderm-group
$ export STORAGE_ACCOUNT=pachydermstorageaccount
$ export CONTAINER_NAME=pachydermblobcontainer
$ export AZURE_STORAGE_SIZE=200
$ az storage account create \
--resource-group="${RESOURCE_GROUP}" \
--location="centralus" \
--sku=Premium_LRS \
--name="${STORAGE_ACCOUNT}" \
--kind=BlockBlobStorage
$ az storage account list
The output of the preceding command should look as follows:
...
"web": "https://pachydermstorageaccount.z19.web.core.windows.net/"
},
"primaryLocation": "centralus",
"privateEndpointConnections": [],
"provisioningState": "Succeeded",
"resourceGroup": "pachyderm-group",
...
$ STORAGE_KEY="$(az storage account keys list \
--account-name="${STORAGE_ACCOUNT}" \
--resource-group="${RESOURCE_GROUP}" \
--output=json \
| jq '.[0].value' -r \
)"
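To see exactly what the jq filter extracts, here it is run against a hardcoded sample of the JSON that `az storage account keys list` returns (the key values are fake):

```shell
# jq '.[0].value' -r selects the value of the first key in the array
# and prints it raw (without surrounding quotes).
keys_json='[{"keyName": "key1", "value": "FAKEKEY1=="}, {"keyName": "key2", "value": "FAKEKEY2=="}]'
STORAGE_KEY="$(echo "$keys_json" | jq -r '.[0].value')"
echo "$STORAGE_KEY"
```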
$ az storage container create --name ${CONTAINER_NAME} \
--account-name ${STORAGE_ACCOUNT} \
--account-key "${STORAGE_KEY}"
The output of the preceding command should look as follows:
{
"created": true
}
Now, you have an Azure data storage container created in your Azure storage account to store Pachyderm data.
When you start learning Pachyderm, it is suggested to run experiments in a small local cluster. We have previously covered the local deployment of Pachyderm in Chapter 4, Installing Pachyderm Locally. In this chapter, we are going to focus on a scalable production-grade deployment of Pachyderm on an AKS cluster.
Follow these steps to install Pachyderm on your AKS cluster:
$ az aks get-credentials --resource-group pachyderm-group --name pachyderm-cluster
$ helm install pachd pach/pachyderm \
--set deployTarget=MICROSOFT \
--set pachd.storage.microsoft.container="CONTAINER_NAME" \
--set pachd.storage.microsoft.id="AZURE_ID" \
--set pachd.storage.microsoft.secret="AZURE_SECRET" \
--set pachd.externalService.enabled=true
If you have an enterprise key and you would like to deploy it with Pachyderm's console user interface, execute the following command:
$ helm install pachd pach/pachyderm \
--set deployTarget=MICROSOFT \
--set pachd.storage.microsoft.container="CONTAINER_NAME" \
--set pachd.storage.microsoft.id="AZURE_ID" \
--set pachd.storage.microsoft.secret="AZURE_SECRET" \
--set pachd.enterpriseLicenseKey=$(cat license.txt) \
--set console.enabled=true
Once the console is deployed successfully, follow the instructions under the Accessing the Pachyderm console section to access the console.
$ kubectl get deployments
The output of the preceding command should look as follows:
NAME READY UP-TO-DATE AVAILABLE AGE
dash 1/1 1 1 39s
pachd 1/1 1 1 39s
$ kubectl get pods
The output of the preceding command should look as follows:
NAME READY STATUS RESTARTS AGE
dash-866fd997-z79jj 2/2 Running 0 54s
etcd-0 1/1 Running 0 54s
pachd-8588c44f56-skmkl 1/1 Running 0 54s
$ kubectl get pv
The output of the preceding command should look as follows:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-9985a602-789d-40f3-9249-7445a9c15bc3 200Gi RWO Delete Bound default/etcd-storage-etcd-0 default 89s
Important note
When Pachyderm is deployed using the --dynamic-etcd-nodes flag, it creates an etcd deployment to manage administrative metadata. In Azure, block storage used by etcd Pods is provisioned using the default StorageClass. This Storage Class uses the azure-disk provisioner with StandardSSD_LRS volumes. To use a different StorageClass during the deployment, you can customize the values.yaml file and update the etcd.storageClass parameter prior to the deployment.
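As a sketch of such a customization (the StorageClass name is illustrative; the in-tree azure-disk provisioner parameters are shown), a Premium SSD StorageClass could look like this:

```shell
# Write a Premium_LRS StorageClass manifest for Azure managed disks;
# apply with:
#   kubectl apply -f /tmp/azure-premium-sc.yaml
# then set etcd.storageClass: premium-ssd in values.yaml.
cat > /tmp/azure-premium-sc.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed
EOF
echo "Wrote /tmp/azure-premium-sc.yaml"
```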
$ pachctl version
The output of the preceding command should look as follows:
COMPONENT VERSION
pachctl 2.0.0
pachd 2.0.0
Now that you have installed Pachyderm on your AKS cluster, you are ready to create your first pipeline.
If you need to delete your deployment or start afresh, you can wipe out your environment and start over again using the Preparing an AKS cluster to run Pachyderm instructions.
Let's perform the following steps to delete your existing Pachyderm deployment:
$ helm ls | grep pachyderm
The output of the preceding command should look as follows:
pachd default 1 2021-12-27 20:20:33.168535538 -0800 PST deployed pachyderm-2.0.0 2.0.0
$ helm uninstall pachd
$ az aks list
The output of the preceding command should look as follows:
...
"location": "centralus",
"maxAgentPools": 100,
"name": "pachyderm-cluster",
"networkProfile": {
…
$ az aks delete --name <name> --resource-group pachyderm-group
The command asks for confirmation and returns once the cluster and all of its resources have been deleted.
Now you have completely removed Pachyderm and your Kubernetes cluster from your Azure account.
Pachyderm Enterprise Edition offers a graphical user interface where you can see pipelines and repositories. Accessing the Pachyderm console using port forwarding was covered in Chapter 4, Installing Pachyderm Locally.
In addition, for cloud deployments, you can deploy a Kubernetes ingress to access the Pachyderm console securely. For more information, refer to the official Pachyderm documentation.
In this chapter, we learned the software prerequisites for getting Pachyderm up and running on managed Kubernetes services from major cloud providers including AWS, Google Cloud, and Microsoft Azure.
We acquired basic knowledge of cloud providers' command-line tools and learned how to install and operate them on your local machine to provide production-grade Kubernetes clusters.
We created an object storage bucket and also deployed highly available multi-node managed Kubernetes clusters using the most common configuration options. And finally, we deployed a Pachyderm instance.
In the next chapter, we will learn in detail about creating your first pipeline. You will learn a simple data science example and a pipeline creation workflow.
You can refer to the following links for more information on the topics covered in this chapter: