Day-2 Operations
Day-2 Operations Support
Only the operations listed below are supported on Day 2.
Supported Day-2 Operations
- Add Cluster to OIDC Provider (without Service accounts)
- Add/scale/delete Node groups
- Add/update Managed Addons
- Add/update Secret Encryption
- Update Blueprint and Blueprint versions
- Update Cloud provider
- Upgrade Cluster K8s version
- Update Cluster labels
- Update Control plane endpoints
- Update Managed Node-group Labels, Taints, and Tags
- Update Spot properties
- Upgrade select nodegroup with Custom AMI
- Upgrade select nodegroup's K8s version
- Update Toleration, Node-Selectors, and Daemonset override
- Update the security group for a managed node group
Important Notes
When updating the security group for a managed node group, note the following:
- The update may cause temporary unavailability of the control plane if AWS system pods and the Rafay edge-client are scheduled on the node group being edited. The current nodes are replaced with new nodes that carry the updated security groups, which can temporarily affect control plane reachability.
- Deleting a cluster from the Rafay console may fail after one or more managed node groups have been edited with new security groups. The newly created security groups must be removed before the CloudFormation stack can be deleted. Deletion of the CloudFormation stack succeeds only after the new security groups are deleted.
- For node groups created by Rafay, a security group specified during creation is added on top of the cluster's default security group. Explicitly providing the cluster security group for a Rafay-created node group generates an error.
- For imported or taken-over node groups, a security group must be provided during the edit node group operation. If no security group is provided, an error occurs. For Rafay-created node groups, the cluster security group is added implicitly and the operation proceeds.
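The last two notes amount to a small set of validation rules. The following is a minimal sketch of those rules in Python, not Rafay's actual implementation: the `NodeGroup` class, the `effective_security_groups` function, and the security group IDs are all hypothetical names introduced for illustration. How the cluster security group is treated for imported node groups is not stated above, so the sketch assumes only the provided groups apply in that case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """Hypothetical model of a managed node group (illustration only)."""
    name: str
    rafay_created: bool  # False means imported or taken over
    security_groups: List[str] = field(default_factory=list)


def effective_security_groups(ng: NodeGroup, cluster_sg: str) -> List[str]:
    """Return the security groups that would apply, or raise ValueError."""
    if ng.rafay_created:
        # The cluster security group is added implicitly, so passing it
        # explicitly is treated as an error.
        if cluster_sg in ng.security_groups:
            raise ValueError(
                f"{ng.name}: do not pass the cluster security group explicitly"
            )
        return [cluster_sg, *ng.security_groups]

    # Imported or taken-over node groups must name a security group on edit.
    if not ng.security_groups:
        raise ValueError(f"{ng.name}: a security group is required")
    # Assumption: only the provided groups apply to imported node groups.
    return list(ng.security_groups)
```

For example, under these assumptions a Rafay-created node group edited with `["sg-extra"]` on a cluster whose security group is `sg-cluster` would end up with both groups, while an imported node group edited with an empty list would be rejected.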
Provision Taskset Reaper Limitations
Tasksets represent the actions users undertake on each cluster, such as scaling or upgrading nodes. Occasionally, certain tasks within a taskset may encounter delays or failures, preventing users from executing subsequent tasks. To address this, Reaper runs periodic scans for tasks in progress that exceed a 5-hour duration. If any taskset has been on hold for over 5 hours, Reaper terminates the ongoing tasks within that set. This enables users to re-initiate any failed tasks, ensuring smoother cluster management.
- Taskset Reaper manages Amazon Elastic Kubernetes Service (EKS) clusters and facilitates the following Day 2 operations on managed nodegroups:
    - Managed Nodegroup Creation
    - Managed Nodegroup Deletion
    - Control Plane Upgrade
    - Managed Nodegroup Upgrade
    - Cluster Upgrade with control plane and one or more managed nodegroup(s)
- Taskset reaper is supported for the listed Day 2 operations via terraform apply, rctl apply, system sync, and v3 APIs. It is not supported for UI operations.
- Taskset reaper actions are limited to managed nodegroups and are not supported for self-managed nodegroups. The target cluster can have self-managed nodegroups, but reaper actions apply only to the managed nodegroup operations listed above.
- Reaper reaps the listed operations when edgesrv crashes. Crashes in other edge services are not supported.
- The expiry time of the provision taskset reaper is 5 hours, after which task-specific reaping actions are triggered.
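The limitations above can be read as an eligibility check that the reaper applies to each taskset. The sketch below is illustrative only and does not reflect Rafay's internal code: the `Taskset` class, the `is_reapable` function, and the string identifiers for operations and sources are assumptions chosen to mirror the lists in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Expiry time stated in this section.
REAPER_EXPIRY = timedelta(hours=5)

# Hypothetical identifiers for the supported Day 2 operations.
SUPPORTED_OPERATIONS = {
    "managed-nodegroup-create",
    "managed-nodegroup-delete",
    "control-plane-upgrade",
    "managed-nodegroup-upgrade",
    "cluster-upgrade",
}

# Hypothetical identifiers for the supported request sources (UI is excluded).
SUPPORTED_SOURCES = {"terraform", "rctl", "system-sync", "v3-api"}


@dataclass
class Taskset:
    """Hypothetical record of one taskset on a cluster."""
    operation: str
    source: str
    started_at: datetime
    in_progress: bool = True


def is_reapable(ts: Taskset, now: datetime) -> bool:
    """True if the taskset is in progress, supported, and older than 5 hours."""
    return (
        ts.in_progress
        and ts.operation in SUPPORTED_OPERATIONS
        and ts.source in SUPPORTED_SOURCES
        and now - ts.started_at > REAPER_EXPIRY
    )
```

A periodic scan would call `is_reapable` for each in-progress taskset and terminate the tasks of those that qualify, which frees the cluster for the operation to be re-initiated.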