Troubleshooting
This section describes errors that frequently occur during cluster provisioning.
Resource Provisioning Failures
Scenario 1: Instance Type Not Supported
The following error is an example of what might occur when provisioning a cluster or adding a new node group to an existing cluster.
Validation
To resolve this issue, validate the instance types for the region as follows:
- Check that your cloud credentials (role-based, or access key ID and secret access key) have the permissions required to call the AWS EC2 APIs. If the cloud credentials are role-based, ensure that all the required IAM policies are attached
- Check whether the configuration specifies an instance type that is not available in the selected region
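The second check above can be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the function names `unavailable_types` and `fetch_offerings` are hypothetical helpers, not part of any product API. It uses the EC2 `DescribeInstanceTypeOfferings` call to list which of the requested instance types a region offers, then reports the ones that are missing.

```python
from typing import Dict, Iterable, List, Set


def unavailable_types(requested: Iterable[str],
                      offerings: List[Dict[str, str]]) -> Set[str]:
    """Return the requested instance types that are absent from the offerings."""
    offered = {o["InstanceType"] for o in offerings}
    return set(requested) - offered


def fetch_offerings(region: str, instance_types: List[str]) -> List[Dict[str, str]]:
    """Ask EC2 which of the given instance types are offered in the region.

    Requires boto3 and valid AWS credentials (hypothetical helper).
    """
    import boto3  # imported lazily so unavailable_types() works without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    params = {
        "LocationType": "region",
        "Filters": [{"Name": "instance-type", "Values": instance_types}],
    }
    offerings: List[Dict[str, str]] = []
    while True:
        page = ec2.describe_instance_type_offerings(**params)
        offerings.extend(page["InstanceTypeOfferings"])
        token = page.get("NextToken")
        if not token:
            return offerings
        params["NextToken"] = token  # follow pagination until exhausted
```

For example, `unavailable_types(wanted, fetch_offerings("us-west-2", wanted))` returns an empty set when every type in `wanted` is offered in that region. An `UnauthorizedOperation` error from the call itself would point to the first check (missing EC2 permissions) instead.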
Scenario 2: Availability Zones
The following error is an example of what might occur during EKS cluster provisioning when the cloud credentials do not have permission to create resources in the selected region.
Validation
Validate that the cloud credentials used for cluster provisioning have permission to create resources in the configured region.
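One possible way to run this validation is the IAM policy simulator. The sketch below assumes boto3, credentials that are allowed to call `iam:SimulatePrincipalPolicy`, and a role or user ARN for the cloud credentials. The helper names and the example action list are illustrative assumptions, not the full set of permissions that provisioning needs. The simulator may not reflect every control applied to an account, so treat the result as a first check only.

```python
from typing import Dict, List


def denied_actions(evaluation_results: List[Dict[str, str]]) -> List[str]:
    """Return the names of actions whose simulated decision is not 'allowed'."""
    return [
        result["EvalActionName"]
        for result in evaluation_results
        if result["EvalDecision"] != "allowed"
    ]


def simulate_in_region(principal_arn: str, actions: List[str],
                       region: str) -> List[Dict[str, str]]:
    """Simulate the actions for a principal as if requested in the given region.

    Requires boto3 and valid AWS credentials (hypothetical helper).
    Only the first page of results is read, which is enough for a short action list.
    """
    import boto3  # imported lazily so denied_actions() works without AWS access

    iam = boto3.client("iam")
    response = iam.simulate_principal_policy(
        PolicySourceArn=principal_arn,
        ActionNames=actions,
        ContextEntries=[{
            # Lets region-restricting policy conditions take effect in the simulation
            "ContextKeyName": "aws:RequestedRegion",
            "ContextKeyValues": [region],
            "ContextKeyType": "string",
        }],
    )
    return response["EvaluationResults"]
```

A call such as `denied_actions(simulate_in_region(arn, ["eks:CreateCluster", "ec2:CreateVpc"], "eu-west-1"))` returns the actions that the simulator reports as implicitly or explicitly denied in that region.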
Scenario 3: Instance Type Permission
The following error is an example of what might occur when the cloud credentials do not have permission to use a particular instance type specified in the EKS cluster configuration.
Validation
- Check the permissions of the cloud credentials and use an instance type that they are allowed to use
- Alternatively, update the permissions in AWS so that the configured instance type can be used
Scenario 4: K8s Version Upgrade
During a k8s version upgrade to 1.25, the following error occurs if the aws-load-balancer-controller version is 2.4.6. The preflight check fails and the upgrade is halted.
Validation
Update the aws-load-balancer-controller to version v2.4.7, and then upgrade the k8s version to 1.25.
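Before starting the upgrade, the controller version can be checked from its image tag. The following is a minimal sketch: the helper names are hypothetical, and treating every version at or above 2.4.7 as acceptable is an assumption, since the guidance above names only v2.4.7.

```python
import re
from typing import Tuple

# Minimum controller version named in the guidance above
MIN_CONTROLLER_VERSION = (2, 4, 7)


def parse_version(image_or_tag: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a trailing tag such as 'v2.4.7'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)$", image_or_tag.strip())
    if match is None:
        raise ValueError(f"no semantic version found in {image_or_tag!r}")
    return tuple(int(part) for part in match.groups())


def is_upgrade_safe(image_or_tag: str) -> bool:
    """Return True when the controller version is at least MIN_CONTROLLER_VERSION.

    Assumption: versions newer than 2.4.7 also pass the preflight check.
    """
    return parse_version(image_or_tag) >= MIN_CONTROLLER_VERSION
```

The input can be the image string of the controller deployment, for example the output of `kubectl get deployment aws-load-balancer-controller -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'`, assuming the controller is installed under that name in the `kube-system` namespace.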
Scenario 5: Removal of PSPs
The following error is an example of what might occur when PodSecurityPolicies (PSPs) are found during a k8s version upgrade to 1.25.
Validation
PSPs are no longer supported in k8s v1.25. Remove the PSPs and then retry the upgrade.
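To see which PSPs remain on the cluster before retrying, the output of `kubectl get podsecuritypolicies -o json` can be parsed. The sketch below assumes `kubectl` is on the PATH and points at the cluster being upgraded; both helper names are hypothetical.

```python
import json
import subprocess
from typing import List


def psp_names(kubectl_json: str) -> List[str]:
    """Return the names of the PSPs in a 'kubectl get ... -o json' document."""
    document = json.loads(kubectl_json)
    return [item["metadata"]["name"] for item in document.get("items", [])]


def fetch_psp_json() -> str:
    """Run kubectl against the current context and return the raw JSON output.

    Requires kubectl and a cluster that still serves the PSP API (hypothetical helper).
    """
    completed = subprocess.run(
        ["kubectl", "get", "podsecuritypolicies", "-o", "json"],
        check=True,           # raise if kubectl exits with a non-zero status
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Calling `psp_names(fetch_psp_json())` lists the policies to review. Each one can then be removed with `kubectl delete podsecuritypolicy <name>` once the workloads that depend on it have an alternative in place.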
AWS Cloud Errors
Provisioning an EKS cluster might fail because of AWS Cloud errors. These errors can stem from resource limits, network connectivity issues, misconfigurations in the provisioning process, insufficient permissions, outages in required AWS services, software bugs, and region-specific constraints. Troubleshooting is needed to identify and resolve the underlying issue before the cluster can be deployed successfully.
To find out why provisioning failed, click Provision Status for the failed cluster.
Expand the Cloud Error(s) section to view detailed information about the AWS CloudFormation errors. These details describe the issues encountered during cluster provisioning, which helps you identify the root cause and take corrective action.
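The same CloudFormation errors can also be read directly from AWS. The following is a minimal sketch, assuming boto3, configured credentials, and that you know the name of the CloudFormation stack created for the cluster; the stack name and both helper names are hypothetical. It pulls the stack events and keeps only the failed ones along with their reasons.

```python
from typing import Any, Dict, List, Tuple


def failed_events(events: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (logical resource ID, status, reason) for every failed stack event."""
    return [
        (
            event["LogicalResourceId"],
            event["ResourceStatus"],
            event.get("ResourceStatusReason", ""),  # reason is not always present
        )
        for event in events
        if event["ResourceStatus"].endswith("_FAILED")
    ]


def fetch_stack_events(stack_name: str, region: str) -> List[Dict[str, Any]]:
    """Collect all events for a CloudFormation stack.

    Requires boto3 and valid AWS credentials (hypothetical helper).
    """
    import boto3  # imported lazily so failed_events() works without AWS access

    cfn = boto3.client("cloudformation", region_name=region)
    paginator = cfn.get_paginator("describe_stack_events")
    events: List[Dict[str, Any]] = []
    for page in paginator.paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events
```

Printing `failed_events(fetch_stack_events(stack_name, region))` gives a compact list of the resources that failed and the reason AWS reported for each, which can be compared against the scenarios earlier in this section.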