Troubleshooting
This section describes errors that frequently occur during cluster provisioning.
Scenario 1: Valid Cloud Credential
This error can occur when an invalid cloud credential is selected during cluster provisioning.
Validation
Provide a valid tenant ID, client ID, subscription ID, and client secret when creating the cloud credential.
Scenario 2: Resource Group Existence
This error can occur when the specified resource group does not exist in Azure.
Validation
To resolve this issue, create the resource group in the Azure portal or through the Azure CLI.
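Because the rest of this page uses Terraform, the resource group could also be created ahead of time with the azurerm provider. The following is a minimal sketch under stated assumptions, not part of the original procedure: the resource label aks_rg is illustrative, and the name must match the resource group referenced in the cluster specification.

```hcl
# Hypothetical sketch: pre-create the resource group that the cluster
# specification references. The label "aks_rg" is illustrative only.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "<your-resource-group>"
  location = "<azure-region>"
}
```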
Scenario 3: Role Permission
The following error is an example of what might occur when the user does not have permission to perform the action 'Microsoft.Authorization/roleAssignments/write':
error while getting guture deployment, Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"BadRequest","message":"{\r\n \"error\": {\r\n \"code\": \"InvalidTemplateDeployment\",\r\n \"message\": \"The template deployment failed with error: 'Authorization failed for template resource 'shobhit-azure-1/Microsoft.Authorization/0be8b9b7-15ce-5aa4-98af-4fdb8424f279' of type 'Microsoft.ContainerService/managedClusters/providers/roleAssignments'. The client 'de893500-3bd0-40a1-9773-8cb226b084de' with object id 'de893500-3bd0-40a1-9773-8cb226b084de' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/a2252eb2-7a25-432b-a5ec-e18eba6f26b1/resourceGroups/shobhit-central-playground/providers/Microsoft.ContainerService/managedClusters/shobhit-azure-1/providers/Microsoft.Authorization/roleAssignments/0be8b9b7-15ce-5aa4-98af-4fdb8424f279'.'.\"\r\n }\r\n}"}]
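Validation
The Service Principal must have at least the Contributor role across the subscription. Because the built-in Contributor role does not include Microsoft.Authorization/roleAssignments/write, operations that create role assignments also require a role that grants that action, such as User Access Administrator or Owner, at the relevant scope.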
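One way to grant these permissions is with the azurerm provider's role assignment resource. The sketch below is illustrative only: the variable service_principal_object_id and the resource labels are assumptions, and it must be applied by an account that is itself allowed to create role assignments on the subscription.

```hcl
provider "azurerm" {
  features {}
}

# Look up the subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

# Assumption: object ID of the Service Principal behind the cloud credential.
variable "service_principal_object_id" {
  type = string
}

# Subscription-wide Contributor role for the Service Principal.
resource "azurerm_role_assignment" "sp_contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = var.service_principal_object_id
}

# Optional: the built-in Contributor role excludes
# Microsoft.Authorization/*/Write, so a role such as
# "User Access Administrator" is needed for roleAssignments/write.
resource "azurerm_role_assignment" "sp_user_access_admin" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "User Access Administrator"
  principal_id         = var.service_principal_object_id
}
```

The scope of the second assignment could be narrowed to a resource group if subscription-wide access is broader than required.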
Scenario 4: PVC Mount Failure with Customer Managed Encryption Keys
The following error occurs when using Customer-Managed Keys (CMK) for volume encryption in an AKS cluster and the identity associated with the node pool lacks the required permissions on the Disk Encryption Set (diskEncryptionSets):
AttachVolume.Attach failed for volume "pvc-..." :
rpc error: code = Internal desc = Attach volume ... failed with
HTTPStatusCode: 403, RawError: {"error":{"code":"LinkedAuthorizationFailed",
"message":"The client '...' with object id '...' has permission to perform action
'Microsoft.Compute/virtualMachineScaleSets/virtualMachines/write' but does not have
permission to perform action(s) 'Microsoft.Compute/diskEncryptionSets/read' on the
linked scope ..."}}
Root Cause
This error occurs when the User Assigned Managed Identity (UAMI) of the AKS node pool does not have the Microsoft.Compute/diskEncryptionSets/read permission on the Disk Encryption Set. As a result, attaching an encrypted disk fails with an authorization error.
Solution
To avoid this issue when enabling CMK for persistent volumes in AKS:
- Pre-create the required User Assigned Managed Identities
- Grant those identities the Microsoft.Compute/diskEncryptionSets/read permission on the Disk Encryption Set
- Reference the Disk Encryption Set ID during cluster creation
Required Components
- Control Plane Identity: Grants the control plane access to Azure resources
- Kubelet Identity: Allows worker nodes to interact with Azure services (e.g., pulling images, accessing Key Vault)
- Disk Encryption Set: Defines the customer-managed encryption key used for disk encryption
Note: Ensure that all user-assigned identities are created ahead of time and are granted the required permissions on the Disk Encryption Set (diskEncryptionSets). The Disk Encryption Set ID must be referenced during cluster creation to enable CMK-backed PVC support. For more information, refer to Encrypt AKS cluster data disk with customer-managed keys.
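The prerequisites above could be prepared with the azurerm provider, as in the following sketch. It is illustrative rather than the documented procedure: the variable disk_encryption_set_id and the resource labels are assumptions, the Disk Encryption Set is assumed to exist already, and the built-in Reader role is used here only because it includes the Microsoft.Compute/diskEncryptionSets/read action. A narrower custom role may be preferable.

```hcl
provider "azurerm" {
  features {}
}

# Assumption: the Disk Encryption Set already exists and its full
# resource ID is supplied by the caller.
variable "disk_encryption_set_id" {
  type = string
}

# Identity used by the AKS control plane.
resource "azurerm_user_assigned_identity" "control_plane" {
  name                = "<control-plane-identity>"
  resource_group_name = "<resource-group>"
  location            = "<azure-region>"
}

# Identity used by the kubelet on the worker nodes.
resource "azurerm_user_assigned_identity" "kubelet" {
  name                = "<kubelet-identity>"
  resource_group_name = "<resource-group>"
  location            = "<azure-region>"
}

# Grant both identities read access on the Disk Encryption Set.
# The map keys are static, so for_each can be planned before the
# identities exist.
resource "azurerm_role_assignment" "des_reader" {
  for_each = {
    control_plane = azurerm_user_assigned_identity.control_plane.principal_id
    kubelet       = azurerm_user_assigned_identity.kubelet.principal_id
  }

  scope                = var.disk_encryption_set_id
  role_definition_name = "Reader"
  principal_id         = each.value
}
```

The resource IDs of the two identities can then be used for the control plane identity and kubelet identity fields in the cluster configuration.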
When creating an AKS cluster using Terraform, make sure to configure user-assigned identities and assign the necessary permissions.
Example: Terraform Configuration
resource "rafay_aks_cluster" "example_aks_cluster" {
  apiversion = "rafay.io/v1alpha1"
  kind       = "Cluster"
  metadata {
    name    = "example-aks-cluster"
    project = "<your-project-name>"
  }
  spec {
    type          = "aks"
    blueprint     = "<your-blueprint-name>"
    cloudprovider = "<your-cloud-provider-name>"
    cluster_config {
      apiversion = "rafay.io/v1alpha1"
      kind       = "aksClusterConfig"
      metadata {
        name = "example-aks-cluster"
      }
      spec {
        resource_group_name = "<your-resource-group>"
        managed_cluster {
          apiversion = "2022-07-01"
          location   = "<azure-region>"
          # Pre-created user-assigned identity for the control plane
          identity {
            type = "UserAssigned"
            user_assigned_identities = {
              "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<control-plane-identity>" = "{}"
            }
          }
          properties {
            api_server_access_profile {
              enable_private_cluster = true
            }
            dns_prefix         = "<your-dns-prefix>"
            kubernetes_version = "<k8s-version>"
            network_profile {
              network_plugin = "kubenet"
            }
            # Pre-created user-assigned identity for the kubelet
            identity_profile {
              kubelet_identity {
                resource_id = "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<kubelet-identity>"
              }
            }
            # Disk Encryption Set that holds the customer-managed key
            disk_encryption_set_id = "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.Compute/diskEncryptionSets/<disk-encryption-set>"
            enable_rbac            = true
          }
          type = "Microsoft.ContainerService/managedClusters"
        }
        node_pools {
          apiversion = "2022-07-01"
          name       = "system-nodepool"
          location   = "<azure-region>"
          properties {
            count                = 2
            enable_auto_scaling  = true
            max_count            = 3
            min_count            = 1
            max_pods             = 40
            mode                 = "System"
            orchestrator_version = "<k8s-version>"
            os_type              = "Linux"
            type                 = "VirtualMachineScaleSets"
            vm_size              = "<vm-size>"
          }
          type = "Microsoft.ContainerService/managedClusters/agentPools"
        }
      }
    }
  }
}