Troubleshooting
This section describes errors that frequently occur during GKE cluster provisioning.
Scenario 1: Invalid credentials/project details
This error can occur, for example, when a GKE cluster is provisioned in a newly created GCP project where the Compute Engine API has not been enabled.
Validation
To resolve this issue, perform the following validations:
- Verify that the credentials are valid on the controller's Cloud Credentials page
- Correct the project name so that it matches the project created in the GCP console
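The project and API state can also be checked from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The helper name `compute_api_enabled` and the `<project_id>` placeholder are illustrative and are not part of the controller.

```shell
# Hypothetical helper: succeeds if the Compute Engine API is enabled in project $1.
# Assumes the default table output of `gcloud services list`, whose first column
# is the service name.
compute_api_enabled() {
  gcloud services list --enabled --project "$1" | grep -q '^compute\.googleapis\.com'
}

# Example usage (not run here):
#   gcloud projects describe <project_id>   # confirms the project ID exists and is accessible
#   compute_api_enabled <project_id> || \
#     gcloud services enable compute.googleapis.com --project <project_id>
```

Enabling an API can take a short while to propagate, so it may be worth rechecking before retrying the provisioning.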
Scenario 2: Cluster's control plane IP range
This error can occur when the cluster's control plane IP range is not a /28 (28-bit prefix) CIDR range.
Validation
- When the cluster privacy is set to Private, specify a control plane IP range with a /28 prefix
- Check the Control plane IP range field and make sure it contains a /28 CIDR. This field defines the internal IP address range for the control plane
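A quick local check of the value before submitting it can catch the wrong prefix length early. The sketch below is illustrative only: `is_slash28_cidr` is a hypothetical helper, and it assumes the range is written in canonical CIDR form, where a /28 block starts on a multiple of 16 in the last octet.

```shell
# Hypothetical helper: succeeds if $1 is an IPv4 CIDR with a /28 prefix
# whose address starts on a 16-address block boundary.
is_slash28_cidr() {
  case "$1" in
    */28) addr=${1%/28} ;;
    *) return 1 ;;
  esac
  # Split the address into four octets.
  IFS=. read -r o1 o2 o3 o4 <<EOF
$addr
EOF
  # Reject input with fewer than four dot-separated fields.
  [ "$o1.$o2.$o3.$o4" = "$addr" ] || return 1
  for octet in "$o1" "$o2" "$o3" "$o4"; do
    # Reject empty, non-numeric, zero-padded, or overlong octets.
    case "$octet" in
      ''|*[!0-9]*|0?*|????*) return 1 ;;
    esac
    [ "$octet" -le 255 ] || return 1
  done
  # A /28 block holds 16 addresses, so the last octet must be a multiple of 16.
  [ $(( o4 % 16 )) -eq 0 ]
}
```

For example, `is_slash28_cidr 172.16.0.32/28` succeeds, while `172.16.0.0/24` and `172.16.0.8/28` are rejected.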
Scenario 3: Invalid Region or Zone
This error can occur when invalid region or zone details are provided.
Validation
Edit and correct the region and zone details. Make sure that every specified zone is a valid zone in the chosen region.
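As a rough pre-check, a zone name can be compared against its region by name. This sketch assumes the usual `<region>-<letter>` zone naming (for example, `us-central1-a` in `us-central1`); `zone_in_region` is a hypothetical helper, and a name match does not prove that the zone actually exists or is available to the project.

```shell
# Hypothetical helper: succeeds if zone $1 follows the <region>-<letter>
# naming pattern for region $2.
zone_in_region() {
  case "$1" in
    "$2"-[a-z]) return 0 ;;
    *) return 1 ;;
  esac
}

# The name check only catches obvious mismatches. To see which zones
# actually exist (not run here):
#   gcloud compute zones list
```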
Scenario 4: Mismatch Between GCP Reservation and Requested Cluster
This failure occurs, with a warning, when the GCP Reservation and the requested cluster are in different zones, that is, when the cluster's location type is zonal and its zone differs from the zone of the reservation.
Validation
- Review the specified zones for the GCP Reservation and the requested cluster
- Adjust either the GCP Reservation or the cluster's specified location to ensure they are in the same zone
- Ensure consistency in the specified zones to prevent the mismatch error
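The zone comparison above can be sketched with the gcloud CLI. The helper names and arguments below are hypothetical, and the sketch assumes the reservation is visible to the authenticated account in the given project.

```shell
# Hypothetical helper: prints the zone of reservation $1 in project $2.
# zone.basename() trims the full zone URL down to the zone name.
reservation_zone() {
  gcloud compute reservations list --project "$2" \
    --filter="name=$1" --format="value(zone.basename())"
}

# Hypothetical helper: succeeds if reservation $1 (in project $2) is in the
# zone $3 requested for the cluster.
reservation_matches_zone() {
  [ "$(reservation_zone "$1" "$2")" = "$3" ]
}

# Example usage (not run here):
#   reservation_matches_zone <reservation_name> <project_id> <cluster_zone> \
#     || echo "reservation and cluster are in different zones"
```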
Scenario 5: Insufficient Capacity in GCP Reservation
This failure occurs, with a warning, when the VM capacity of the GCP Reservation is insufficient for the number of nodes requested in a node pool.
Validation
- Increase the capacity of VMs in the GCP Reservation to accommodate the requested number of nodes
- Review and adjust the configurations of the node pool, ensuring it aligns with the available capacity in the GCP Reservation
- Consider optimizing the usage of resources or upgrading the GCP Reservation for increased capacity
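The capacity comparison is simple arithmetic: the reservation's total VM count minus the VMs already in use must cover the requested nodes. A minimal sketch follows; `reservation_has_capacity` is a hypothetical helper, and the counts are assumed to be supplied by the caller.

```shell
# Hypothetical helper: succeeds if a reservation with $1 total VMs, of which
# $2 are already in use, can still supply $3 nodes.
reservation_has_capacity() {
  available=$(( $1 - $2 ))
  [ "$3" -le "$available" ]
}

# The counts could be read from the reservation itself (not run here):
#   gcloud compute reservations describe <reservation_name> --zone <zone> \
#     --format="value(specificReservation.count,specificReservation.inUseCount)"
```

For example, a reservation of 10 VMs with 4 in use can serve a 6-node pool but not a 7-node pool.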
Scenario 6: Provisioning Halting at 'Cluster Control Plane Ready' Phase
This error occurs when the GKE cluster provisioning state becomes unresponsive and remains stuck at the 'Cluster Control Plane Ready' phase. At this point the target cluster has been created on GKE, but the controller is still waiting for confirmation that the control plane on the target cluster is ready.
Validation
When creating a private GKE cluster:
- Ensure 'Access Control Plane ExternalIP' is disabled and 'Control Plane Authorized Networks' is enabled
- Provide a CIDR that covers all IPs requiring access to your private cluster
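To confirm that a given client IP is covered by the authorized network CIDR, the membership test can be done locally. This is a sketch for IPv4 only, with no input validation; `ip_to_int` and `ip_in_cidr` are hypothetical helpers.

```shell
# Hypothetical helper: prints the 32-bit integer value of dotted IPv4 address $1.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeeds if IPv4 address $1 falls inside CIDR $2.
ip_in_cidr() {
  prefix=${2#*/}
  # Build the network mask for the prefix length.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  # The address is in range if its network bits match those of the CIDR.
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, `ip_in_cidr 203.0.113.10 203.0.113.0/24` succeeds, while `ip_in_cidr 203.0.114.10 203.0.113.0/24` does not.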
Scenario 7: Initialization of Cluster Provider Infrastructure Stage
This error occurs when the Infra-Agent installed on the bootstrap VM cannot establish a connection back to the controller.
Validation
- Verify that the Infra-Agent was installed successfully on the GKE bootstrap VM by checking the logs at /var/log/infra_agent.log
- If the agent is not installed, review Google startup script logs and, if necessary, create a Google Cloud NAT for internet connectivity
- If the agent is installed, address specific errors in the logs, such as expired certificates, to maintain proper functionality
- Regularly monitor and troubleshoot issues to ensure a healthy connection between the Infra-Agent and the controller
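The log checks above can be sketched as follows, run on the bootstrap VM. `scan_agent_log` is a hypothetical helper, the search terms are only a guess at what is worth looking for, and the Cloud NAT commands use placeholder names that must be replaced with real values.

```shell
# Hypothetical helper: prints lines from log file $1 that mention errors or
# certificates. Returns 2 if the file is missing (the agent may never have been
# installed) and 1 if the file exists but nothing matches.
scan_agent_log() {
  [ -f "$1" ] || { echo "log not found: $1" >&2; return 2; }
  grep -i -E 'error|certificate|x509' "$1"
}

# Example usage on the bootstrap VM (not run here):
#   scan_agent_log /var/log/infra_agent.log
#   sudo journalctl -u google-startup-scripts.service   # startup script logs
#
# Cloud NAT sketch, if the VM has no route to the internet (not run here):
#   gcloud compute routers create <router_name> --network <network> --region <region>
#   gcloud compute routers nats create <nat_name> --router <router_name> \
#     --region <region> --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
```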
Scenario 8: Provisioning GKE Cluster with Shared VPC
This error occurs when the service project is not attached to the host project while creating a GKE cluster with a shared VPC.
Validation
Attach the service project to the host project and confirm that it is attached, as shown below:
$ gcloud compute shared-vpc associated-projects add <serviceproject_id> --host-project <hostproject_id>
Updated [https://www.googleapis.com/compute/v1/projects/<hostproject_id>].
$ gcloud compute shared-vpc list-associated-resources <hostproject_id>
RESOURCE_ID RESOURCE_TYPE
demoproject-1234 PROJECT
defaultproject-5678 PROJECT
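The confirmation step can also be scripted. The sketch below parses the two-column table output shown above; `service_project_attached` is a hypothetical helper, and it assumes the default output format of the command is unchanged.

```shell
# Hypothetical helper: succeeds if service project $1 is listed as an
# associated PROJECT resource of host project $2.
service_project_attached() {
  gcloud compute shared-vpc list-associated-resources "$2" |
    awk -v id="$1" '$1 == id && $2 == "PROJECT" { found = 1 } END { exit !found }'
}

# Example usage (not run here):
#   service_project_attached <serviceproject_id> <hostproject_id> \
#     || echo "service project is not attached to the host project"
```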
Scenario 9: Kubernetes Engine Host Service Agent User Role to IAM Permissions
This error occurs when the Kubernetes Engine Host Service Agent User role has not been added to the IAM permissions of the host project.
Validation
- In the host project, navigate to IAM and select the checkbox at the top right corner labeled "Include Google-provided role grants"
- Grant the Kubernetes Engine Host Service Agent User role in the host project to the Principal of the service project
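The same role grant can be sketched with the gcloud CLI. This assumes the Principal in question is the service project's default GKE service agent, named after the service project's number, and that the role ID is `roles/container.hostServiceAgentUser`. The helper `gke_service_agent_member` is hypothetical.

```shell
# Hypothetical helper: prints the IAM member string for the GKE service agent
# of the service project with project number $1 (assumes default naming).
gke_service_agent_member() {
  echo "serviceAccount:service-$1@container-engine-robot.iam.gserviceaccount.com"
}

# Example usage (not run here):
#   number=$(gcloud projects describe <serviceproject_id> --format="value(projectNumber)")
#   gcloud projects add-iam-policy-binding <hostproject_id> \
#     --member "$(gke_service_agent_member "$number")" \
#     --role roles/container.hostServiceAgentUser
```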