Overview
In this self-paced exercise, you will learn how to configure, deploy and test Karpenter within an Amazon EKS cluster using add-ons, blueprints and workloads. You will configure Karpenter to deploy Spot instances to save costs.
Karpenter helps improve your application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application loads. Once installed in your cluster, Karpenter observes the aggregate resource requests of unscheduled pods and decides when to launch new nodes and when to terminate them, reducing scheduling latency and infrastructure cost.
Karpenter provisions instances and schedules pods faster than Cluster Autoscaler. In addition to being faster, Karpenter chooses the most suitable instance type for the pending pod workloads, which leads to more efficient node sizing and less administrative configuration. Unlike Cluster Autoscaler, Karpenter does not require you to configure and manage a node group for each instance type; it has native access to all instance types.
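The right-sizing idea described above can be illustrated with a minimal Python sketch. This is not Karpenter's actual algorithm, which weighs many more constraints; the instance catalog, its prices, and all function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu_millicores: int
    memory_mib: int
    hourly_price: float  # illustrative number, not a real price


# Hypothetical catalog; real instance types and prices will differ.
CATALOG = [
    InstanceType("small", 2000, 4096, 0.04),
    InstanceType("medium", 4000, 8192, 0.08),
    InstanceType("large", 8000, 16384, 0.16),
]


def aggregate_requests(pending_pods):
    """Sum the CPU and memory requests of all unscheduled pods."""
    total_cpu = sum(p["cpu_millicores"] for p in pending_pods)
    total_mem = sum(p["memory_mib"] for p in pending_pods)
    return total_cpu, total_mem


def cheapest_fitting_instance(pending_pods, catalog=CATALOG):
    """Return the cheapest instance type that fits the aggregate request,
    or None if no single instance type in the catalog is large enough."""
    cpu, mem = aggregate_requests(pending_pods)
    candidates = [
        i for i in catalog
        if i.cpu_millicores >= cpu and i.memory_mib >= mem
    ]
    return min(candidates, key=lambda i: i.hourly_price, default=None)


pending = [
    {"cpu_millicores": 1500, "memory_mib": 2048},
    {"cpu_millicores": 1000, "memory_mib": 3072},
]
choice = cheapest_fitting_instance(pending)
print(choice.name if choice else "no single instance type fits")
```

In this simplified model the two pending pods together request 2500 millicores and 5120 MiB, so the smallest catalog entry is skipped and the next one up is selected. No node group has to exist in advance for that instance type.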
As of version 0.33, Karpenter drift detection is enabled by default. This lets Karpenter detect when a node's AMI has drifted from its expected state as defined in the SSM Parameter Store. Once the Karpenter controller detects AMI drift, the affected node is replaced. This allows us to upgrade our Karpenter-managed nodes during our normal upgrade window.
Upgrading a cluster with Karpenter nodes is straightforward. When you upgrade the cluster through your preferred interface, Rafay kicks off the upgrade of the control plane and the node group(s). The control plane upgrade runs first and is managed by EKS. Once the control plane upgrade has completed, the Kubernetes version of the control plane differs from that of the nodes. When Karpenter detects this drift, it pulls the AMIs from the SSM Parameter Store and performs a rolling upgrade of the Karpenter-managed nodes. Nodes managed by a cluster's node group are also upgraded to the new Kubernetes version with a rolling upgrade. The upgrades of the Karpenter-managed nodes and the node group-managed nodes run in parallel. In both cases, new nodes are brought online, the older nodes are drained of all pods, and the pods are scheduled on the updated nodes. Once this is complete, the older nodes are terminated and the cluster upgrade completes.
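The drift-then-replace sequence described above can be sketched as a small Python simulation. It makes no AWS or Kubernetes calls; the node dictionaries, AMI IDs, and function names are hypothetical. In a real cluster the Kubernetes scheduler places the evicted pods, whereas this sketch simply moves them to the replacement node to keep the example short.

```python
def find_drifted(nodes, expected_ami):
    """Return the nodes whose AMI differs from the expected AMI."""
    return [n for n in nodes if n["ami"] != expected_ami]


def rolling_replace(nodes, expected_ami):
    """Replace drifted nodes one at a time.

    Returns the resulting node list and an ordered event log.
    The input list is not modified.
    """
    current = [dict(n, pods=list(n["pods"])) for n in nodes]  # work on copies
    events = []
    for old in find_drifted(current, expected_ami):
        # 1. Bring a new node online with the expected AMI.
        replacement = {
            "name": f"{old['name']}-replacement",
            "ami": expected_ami,
            "pods": [],
        }
        current.append(replacement)
        events.append(f"launch {replacement['name']}")
        # 2. Drain the old node; its pods land on the updated node.
        replacement["pods"], old["pods"] = old["pods"], []
        events.append(f"drain {old['name']}")
        # 3. Terminate the old node once it is drained.
        current = [n for n in current if n is not old]
        events.append(f"terminate {old['name']}")
    return current, events


nodes = [
    {"name": "node-a", "ami": "ami-old", "pods": ["web-1"]},
    {"name": "node-b", "ami": "ami-new", "pods": ["web-2"]},
]
result, events = rolling_replace(nodes, expected_ami="ami-new")
for event in events:
    print(event)
```

Only `node-a` is treated as drifted here, so it is the only node that is launched anew, drained, and terminated; `node-b` already matches the expected AMI and is left alone.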
What Will You Do by Part
Part | What will you do? |
---|---|
1 | Setup and Configuration |
2 | Provision an Amazon EKS Cluster |
3 | Cluster Blueprint with Karpenter |
4 | Deploy a test Workload to the EKS cluster to activate Karpenter |
5 | Upgrade a cluster |
6 | Deprovision the EKS cluster |
Architecture & Design
- Create a dedicated managed node group for system-level resources, and let Karpenter manage Spot Instance-based nodes that run non-system workloads.
- This configuration gives system-level resources the consistency of On-Demand or Reserved Instances while optimizing the cost and performance of workloads.
- Multi-tenant clusters can use multiple NodePools. NodePools should be mutually exclusive, and pods can be directed to a specific NodePool using taints and tolerations.
- When using Spot instances, it is a best practice to implement an instance diversification strategy.
- For node disruption, Karpenter uses WhenEmptyOrUnderutilized as the default consolidation policy. Under this policy, a node with no running non-daemon pods is marked as empty and deleted. If enabled, Karpenter can also replace underutilized nodes with a smaller instance type.
- Try to define accurate requests and limits so Karpenter can allocate the instance type that fits the requirements.
- Exclude instance types that do not fit your workloads.
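The "empty node" rule from the consolidation bullet above can be sketched in Python as follows. This is an illustration of the rule as stated, not Karpenter's implementation; the node and pod dictionaries and the `owner_kind` field are hypothetical stand-ins for real Kubernetes objects.

```python
def non_daemon_pods(node):
    """Return the pods on a node that are not owned by a DaemonSet."""
    return [p for p in node["pods"] if p["owner_kind"] != "DaemonSet"]


def is_empty(node):
    """A node counts as empty when it runs no non-daemon pods."""
    return not non_daemon_pods(node)


nodes = [
    {
        "name": "node-a",
        "pods": [{"name": "kube-proxy-x", "owner_kind": "DaemonSet"}],
    },
    {
        "name": "node-b",
        "pods": [
            {"name": "kube-proxy-y", "owner_kind": "DaemonSet"},
            {"name": "web-1", "owner_kind": "ReplicaSet"},
        ],
    },
]

# Nodes that would be candidates for deletion under the "empty" rule.
empty_nodes = [n["name"] for n in nodes if is_empty(n)]
print(empty_nodes)
```

Here `node-a` runs only a daemon pod and is reported as empty, while `node-b` still hosts an application pod and is kept.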
In this exercise, we will create an EKS cluster that contains one managed node group for system resources. The Spot instances will be managed by Karpenter. A visual representation of this cluster design is shown below.
```mermaid
flowchart LR
  subgraph dataplane[EKS Data Plane]
    direction LR
    subgraph NG1[System Node Group]
      direction RL
      i1[Rafay Operator]
      karpenter[Karpenter]
    end
    node1[EC2 Spot Instance]
    node2[EC2 Spot Instance]
    karpenter-->node1
    karpenter-->node2
  end
  controlplane[EKS Control Plane] <===> dataplane
```
Assumptions
- You have access to an AWS account with privileges to create an IAM Role with the default Full IAM Policy, which allows resources to be provisioned on your behalf as part of the EKS cluster lifecycle.
- You have the AWS CLI installed and configured.