An EKS cluster consists of two VPCs:
- The first VPC, managed by AWS, hosts the Kubernetes control plane.
- The second VPC, managed by the customer, hosts the Kubernetes worker nodes (EC2 instances) where containers run, as well as other AWS infrastructure (such as load balancers) used by the cluster.
All worker nodes need the ability to connect to the managed API server endpoint. This connection allows each worker node to register itself with the Kubernetes control plane and to receive requests to run application pods. The worker nodes connect through the EKS-managed elastic network interfaces (ENIs) that are placed in the subnets that you provide when you create the cluster.
Amazon EKS node groups are immutable by design: once a node group is created, it is not possible to change its type (managed/unmanaged), AMI, or instance type. Node groups can be scaled up or down at any time. The same EKS cluster can have multiple node groups to accommodate different types of workloads. A node group can have mixed instance types when configured to use Spot.
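As a rough illustration of a Spot node group with mixed instance types, the fragment below uses the eksctl `ClusterConfig` schema. This is an assumption made for illustration only: the Controller's own node group specification may look different, and the cluster name, region, node group name, and instance types are hypothetical.

```yaml
# Sketch only: eksctl ClusterConfig schema, not necessarily the Controller's format.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-west-2         # hypothetical region
nodeGroups:
  - name: ng-spot-mixed     # hypothetical node group name
    minSize: 2
    maxSize: 5
    instancesDistribution:
      instanceTypes: ["t3.small", "t3.medium"]   # mixed instance types
      onDemandBaseCapacity: 0                    # no On-Demand baseline
      onDemandPercentageAboveBaseCapacity: 0     # capacity above the baseline is all Spot
      spotInstancePools: 2
```

Because node groups are immutable, changing the instance types in a definition like this would mean creating a new node group rather than editing the existing one.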
Users can use the Controller to provision Amazon EKS Clusters with either "Self Managed" or "AWS Managed" node groups.
Comparing Node Group Types¶
| Feature | Self Managed | AWS Managed |
|---|---|---|
| Custom Security Group Rules | Yes | Limited |
| Custom SSH Auth | Yes | Limited |
Users can select from multiple Node AMI family types for the node group. In addition, users can also bring their own "Custom AMI".
| OS | Node AMI Family |
|---|---|
| Linux | Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04, Bottlerocket |
| Windows | Windows Server 2019 Full, Windows Server 2019 Core, Windows Server 1909 Core, Windows Server 2004 Core |
Self Managed Node Groups¶
Self Managed node groups are essentially user-provisioned EC2 instances or Auto Scaling Groups that are registered as worker nodes with the EKS control plane. To provision EC2 instances as EKS workers, you need to ensure that the following criteria are satisfied:
- The AMI has all the components installed to act as a Kubernetes node (at minimum, the kubelet and a container engine).
- The associated Security Group needs to allow communication with the Control Plane and other Workers in the cluster.
- User data or boot scripts of the instances need to include a step to register with the EKS control plane.
- The IAM role used by the worker nodes is registered as a user in the cluster.
On EKS optimized AMIs, the user data is handled by the bootstrap.sh script installed on the AMI.
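The IAM role criterion in the list above is commonly met through the `aws-auth` ConfigMap in the `kube-system` namespace, which maps the node IAM role to Kubernetes groups. A minimal sketch follows; the account ID and role name are placeholders, and the Controller is described as handling this step during provisioning.

```yaml
# Sketch only: maps a node IAM role to the groups that worker nodes need.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Placeholder account ID and role name; replace with the node instance role.
    - rolearn: arn:aws:iam::111122223333:role/demo-node-instance-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```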
The Controller streamlines and automates all these steps as part of the provisioning process, essentially providing a custom, managed experience for users.
Self managed node groups do not benefit from any managed services provided by AWS. The user needs to configure everything including the AMI to use, Kubernetes API access on the node, registering nodes to EKS, graceful termination, etc. The Controller helps streamline and automate the entire workflow. On the flip side, self managed node groups give users the most flexibility in configuring their worker nodes. Users have complete control over the underlying infrastructure and can customize all the nodes to suit their preference.
Managed Node Groups¶
Managed Node Groups automate the provisioning and lifecycle management of the EKS cluster's worker nodes. With this configuration, AWS takes on the operational burden for the following items:
- Running the latest EKS optimized AMI.
- Gracefully draining nodes before termination during a scale down event.
- Gracefully rotating nodes to update the underlying AMI.
- Applying labels to the resulting Kubernetes Node resources.
While Managed Node Groups provide a managed experience for the provisioning and lifecycle of EC2 instances, they do not configure horizontal or vertical auto-scaling. Managed Node Groups also do not automatically update the underlying AMI to handle OS patches or Kubernetes version updates. The user still needs to manually trigger a Managed Node Group update.
With Managed Node Groups:
- Users do not have control over the underlying AMI.
- Only the EKS optimized Amazon Linux 2 AMIs are supported.
- SSH access is possible only with an EC2 key pair, i.e. you have to use a single, shared key pair for all SSH access.
- Users cannot set a user data script or update the packages installed in the AMI while the instances are booting.
- Users have limited control over the security group rules for remote access. For example, when you specify an EC2 key pair on the Managed Node Group, the security group by default opens port 22 to the whole world (0.0.0.0/0). You can restrict access by specifying source security group IDs, but you cannot restrict access by CIDR block. This makes it hard to expose access over a peered VPC connection or Direct Connect, where the security group may not live in the same account.
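To make the SSH restriction concrete, here is a hedged sketch of a managed node group that limits port 22 to a source security group, again written in the eksctl `ClusterConfig` schema as an assumption (the Controller's own specification may differ). The node group name, key pair name, label, and security group ID are hypothetical.

```yaml
# Sketch only: eksctl ClusterConfig schema with placeholder names and IDs.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster            # hypothetical cluster name
  region: us-west-2             # hypothetical region
managedNodeGroups:
  - name: mng-linux             # hypothetical node group name
    instanceType: t3.large
    desiredCapacity: 2
    labels:
      role: general             # applied to the resulting Kubernetes Node resources
    ssh:
      allow: true
      publicKeyName: demo-ec2-keypair                    # single shared EC2 key pair
      sourceSecurityGroupIds: ["sg-0123456789abcdef0"]   # restricts port 22 to this group
```

Leaving out `sourceSecurityGroupIds` corresponds to the default described above, where port 22 is open to 0.0.0.0/0.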
Windows Node Groups¶
Amazon EKS supports Windows Nodes that allow running Windows containers.
A Linux node group with active Linux nodes is required to run the VPC resource controller and CoreDNS (Microsoft Windows does not support host-networking mode). Since the Linux node group is critical to the functioning of the cluster, it is recommended to have at least two t2.large Linux nodes to ensure high availability.
Add Windows Node Group¶
Users can add a Windows node group in the same way that they add a Linux node group.
The self-service wizard does not show or allow the option to add a Windows node group until at least one Linux-based node group is attached to the EKS cluster.
The Amazon EKS optimized AMIs for Windows are built on top of Windows Server 2019 and are configured to serve as the base image for Amazon EKS nodes. The AMI includes Docker and kubelet out of the box. There are two primary release channels for Windows Server:
Long-Term Servicing Channel (LTSC)
- Currently Windows Server 2019.
- A new major version of Windows Server is released every 2-3 years.
- 5 years of mainstream support and 5 years of extended support.
Semi-Annual Channel (SAC)
- Currently Windows Server 2004.
- New releases available twice a year, in spring and fall.
- Each release in this channel will be supported for 18 months from the initial release.
VPC Resource Controller¶
The Controller automatically installs and configures the VPC resource controller as part of the cluster provisioning process.
Visibility and Monitoring¶
Users can use the console to view details about their Windows node groups and scale them up or down as required.
Scale Node Group¶
The process to scale a Windows node group using the Controller is identical to the process for Linux node groups.
There are a number of considerations that need to be factored in to use Windows worker nodes on Amazon EKS.
Ensure that the workloads use the correct "node selectors" to ensure they are scheduled on the correct nodes (Windows or Linux).
For Windows workloads:
nodeSelector:
  kubernetes.io/os: windows
  kubernetes.io/arch: amd64
For Linux workloads:
nodeSelector:
  kubernetes.io/os: linux
  kubernetes.io/arch: amd64
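To show where these selectors sit in a full manifest, here is a minimal sketch of a Deployment pinned to Windows nodes. The Deployment name, labels, and container image are assumptions chosen for illustration; in practice the image's Windows version needs to be compatible with the Windows Server version on the nodes.

```yaml
# Sketch only: a Deployment scheduled onto Windows worker nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-sample          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: windows-sample
  template:
    metadata:
      labels:
        app: windows-sample
    spec:
      nodeSelector:             # lives under the pod spec, not the Deployment spec
        kubernetes.io/os: windows
        kubernetes.io/arch: amd64
      containers:
        - name: iis
          # Assumed example image; pick a tag that matches the node's Windows version.
          image: mcr.microsoft.com/windows/servercore/iis
```

A Linux workload would use the same structure with `kubernetes.io/os: linux`.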
AWS Fargate is a managed serverless compute engine for containers that works with Amazon EKS. Fargate removes the need to provision and manage servers. Fargate allows developers to specify and pay for resources per application. The use of Fargate can also improve security because applications are isolated by design.
EKS clusters require a Fargate profile that contains the information needed to instantiate pods in Fargate. A profile specifies:
- Pod Execution Role: Defines the permissions required to run the pod and the networking location (subnet) to run the pod. This allows the same networking and security permissions to be applied to multiple Fargate pods and makes it easier to migrate existing pods on a cluster to Fargate.
- Selector: Defines which pods should run on Fargate (by namespace and labels).
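The two parts of a Fargate profile can be sketched as follows, using the eksctl `ClusterConfig` schema as an assumption (the Controller's own specification may differ). The profile name, role ARN, namespace, and label are hypothetical.

```yaml
# Sketch only: eksctl ClusterConfig schema with placeholder values.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster            # hypothetical cluster name
  region: us-west-2             # hypothetical region
fargateProfiles:
  - name: fp-dev                # hypothetical profile name
    # Pod execution role (placeholder account ID and role name).
    podExecutionRoleARN: arn:aws:iam::111122223333:role/demo-fargate-pod-execution-role
    selectors:
      # Pods in namespace "dev" carrying this label are scheduled on Fargate.
      - namespace: dev
        labels:
          compute: fargate
```

Pods that do not match any selector continue to be scheduled on the cluster's node groups.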
Amazon EKS clusters can contain managed/self managed node groups and Fargate at the same time.
Node Group Lifecycle¶
Amazon EKS clusters provisioned by the Controller start life with one node group. Additional node groups can be added after initial provisioning. Users can also use the Controller to perform actions on node groups.
View Node Group Details¶
Click on the node group to view all the node groups and their details. In the example below, the EKS cluster has one node group.
Scale Node Group¶
Click the gear icon on the far right of a node group to view the actions available for it.
This presents the user with a prompt for the "desired" number of worker nodes. Depending on what is entered, the node group is either "Scaled Up" or "Scaled Down".
Scaling a node group can take ~5 minutes to ensure that the EC2 instances are provisioned, fully operational, and attached to the cluster. The user is provided with feedback and status. Illustrative screenshot below.
Scaling down a node group does not explicitly drain the node before removing the nodes from the Auto Scaling Group (ASG). Pods running on the node are terminated and will be restarted by Kubernetes on available nodes.
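Because pods on removed nodes are terminated rather than drained, one way to reduce the impact of a scale-down is to run several replicas spread across nodes. The manifest below is a minimal sketch of that idea, not a recommendation from the Controller; the Deployment name, labels, and image are hypothetical.

```yaml
# Sketch only: spread replicas across nodes so losing one node leaves others serving.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread per node
          whenUnsatisfiable: ScheduleAnyway     # prefer spreading, but do not block scheduling
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25     # assumed example image
```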
Add Node Group¶
Click on "Add Node Group" on the far right. The user is presented with a configuration screen for node groups. Enter the required details and click "Add".
Adding a new node group can take ~5 minutes to ensure that the EC2 instances are provisioned, fully operational, and attached to the cluster. The user is provided with feedback and status. Illustrative screenshot below.
Drain Node Group¶
When the user drains a node group, the nodes are cordoned and the pods running on them are evicted. This ensures that existing pods are relocated from these nodes and that new pods cannot be scheduled on them.
The user is provided a warning before the node group is drained.
Draining a node group can take a few minutes. The user is provided with feedback and status once this is completed. Illustrative screenshot below.
Users can leave a node group in a "drained" state for extended periods of time.
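For reference, a cordoned node is marked as unschedulable in its Node object. The fragment below is a sketch of the relevant fields only; the node name is hypothetical and a real Node object carries many more fields.

```yaml
# Sketch only: fields of a Node object that reflect the cordoned state.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal   # hypothetical node name
spec:
  unschedulable: true           # set when the node is cordoned
  taints:
    - key: node.kubernetes.io/unschedulable
      effect: NoSchedule        # added by Kubernetes while the node is unschedulable
```

Nodes in this state keep running until the node group is deleted or made schedulable again, which is why a node group can stay "drained" for a long time.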
Delete Node Group¶
When the user deletes a node group, the Controller ensures that the node group is drained before it is deleted.
Deleting a node group can take ~5 minutes to ensure that the EC2 instances are deprovisioned and the CloudFormation (CF) templates are appropriately reconciled. The user is provided with feedback and status during this process. Illustrative screenshot below.