Troubleshooting

This section describes errors that frequently occur during cluster provisioning.


Scenario 1: Valid Cloud Credential

The following error is an example of what might occur when an invalid cloud credential is selected during cluster provisioning.

Error 1

Validation

Provide a valid tenant ID, client ID, subscription ID, and client secret when creating the cloud credential.
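One way to sanity-check the four values before using them in a cloud credential is to authenticate with them directly against Azure. The sketch below is only an illustration and assumes the azurerm Terraform provider is available; the variable names and the `check` label are hypothetical and not part of the product. If any value is wrong, `terraform plan` is expected to fail during authentication instead of returning the subscription name.

```hcl
# Hypothetical variables holding the same values entered in the cloud credential.
variable "tenant_id" { type = string }
variable "subscription_id" { type = string }
variable "client_id" { type = string }

variable "client_secret" {
  type      = string
  sensitive = true
}

# Authenticate as the service principal.
provider "azurerm" {
  features {}

  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
}

# Reading the subscription only succeeds when the credentials are valid.
data "azurerm_subscription" "check" {}

output "subscription_display_name" {
  value = data.azurerm_subscription.check.display_name
}
```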


Scenario 2: Resource Group existence

The following error is an example of what might occur when the specified resource group does not exist in Azure.

Error 2

Validation

To resolve this issue, create the resource group in the Azure portal or through the Azure CLI before provisioning the cluster.
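If the rest of the environment is managed with Terraform, the resource group can also be created there. This is a minimal sketch that assumes the azurerm provider is already configured; the resource label `aks` is hypothetical, and the placeholders must match the values used in the cluster specification (`resource_group_name` and `location`).

```hcl
# Create the resource group ahead of cluster provisioning.
resource "azurerm_resource_group" "aks" {
  name     = "<your-resource-group>"
  location = "<azure-region>"
}
```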


Scenario 3: Role Permission

The following error is an example of what might occur when the user does not have permission to perform the action 'Microsoft.Authorization/roleAssignments/write'.

error while getting guture deployment, Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"BadRequest","message":"{\r\n \"error\": {\r\n \"code\": \"InvalidTemplateDeployment\",\r\n \"message\": \"The template deployment failed with error: 'Authorization failed for template resource 'shobhit-azure-1/Microsoft.Authorization/0be8b9b7-15ce-5aa4-98af-4fdb8424f279' of type 'Microsoft.ContainerService/managedClusters/providers/roleAssignments'. The client 'de893500-3bd0-40a1-9773-8cb226b084de' with object id 'de893500-3bd0-40a1-9773-8cb226b084de' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/a2252eb2-7a25-432b-a5ec-e18eba6f26b1/resourceGroups/shobhit-central-playground/providers/Microsoft.ContainerService/managedClusters/shobhit-azure-1/providers/Microsoft.Authorization/roleAssignments/0be8b9b7-15ce-5aa4-98af-4fdb8424f279'.'.\"\r\n }\r\n}"}]

Validation

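The service principal must have at least the Contributor role across the subscription. The built-in Contributor role does not itself include Microsoft.Authorization/roleAssignments/write, so if this error persists, the service principal also needs a role that grants that action (for example, Owner or User Access Administrator) at the scope shown in the error.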
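A subscription-level role assignment for the service principal can be sketched in Terraform as follows. This assumes the azurerm provider is configured with an account that is itself allowed to create role assignments; the variable names are hypothetical, and the role name is left as a variable so that a different built-in role can be substituted if your setup requires one.

```hcl
# Object ID of the service principal used by the cloud credential (hypothetical variable).
variable "service_principal_object_id" {
  type = string
}

# Built-in role to assign; defaults to the role named in this scenario.
variable "service_principal_role" {
  type    = string
  default = "Contributor"
}

# Subscription the provider is currently authenticated against.
data "azurerm_subscription" "current" {}

# Assign the role to the service principal across the whole subscription.
resource "azurerm_role_assignment" "sp_subscription_role" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = var.service_principal_role
  principal_id         = var.service_principal_object_id
}
```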


Scenario 4: PVC Mount Failure with Customer Managed Encryption Keys

The following error occurs when Customer-Managed Keys (CMK) are used for volume encryption in an AKS cluster and the identity associated with the node pool lacks the required permissions on the Disk Encryption Set:

AttachVolume.Attach failed for volume "pvc-..." :
rpc error: code = Internal desc = Attach volume ... failed with
HTTPStatusCode: 403, RawError: {"error":{"code":"LinkedAuthorizationFailed",
"message":"The client '...' with object id '...' has permission to perform action
'Microsoft.Compute/virtualMachineScaleSets/virtualMachines/write' but does not have
permission to perform action(s) 'Microsoft.Compute/diskEncryptionSets/read' on the
linked scope ..."}}

Root Cause

This error occurs when the User Assigned Managed Identity (UAMI) of the AKS node pool does not have the Microsoft.Compute/diskEncryptionSets/read permission on the Disk Encryption Set. As a result, attaching an encrypted disk fails with an authorization error (HTTP 403, LinkedAuthorizationFailed).

Solution

To avoid this issue when enabling CMK for persistent volumes in AKS:

  • Pre-create the required User Assigned Managed Identities
  • Grant each identity a role that includes the Microsoft.Compute/diskEncryptionSets/read action (for example, the built-in Reader role) on the Disk Encryption Set
  • Reference the Disk Encryption Set ID during cluster creation
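The first two steps above can be sketched in Terraform as follows. This is an illustration under stated assumptions, not the product's prescribed method: it assumes the azurerm provider is configured, that the Disk Encryption Set already exists, and that the built-in Reader role is an acceptable way to grant Microsoft.Compute/diskEncryptionSets/read. All resource labels, identity names, and variable names are hypothetical.

```hcl
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "disk_encryption_set_name" { type = string }

# Pre-create the control plane identity.
resource "azurerm_user_assigned_identity" "control_plane" {
  name                = "example-aks-control-plane"
  resource_group_name = var.resource_group_name
  location            = var.location
}

# Pre-create the kubelet identity used by the worker nodes.
resource "azurerm_user_assigned_identity" "kubelet" {
  name                = "example-aks-kubelet"
  resource_group_name = var.resource_group_name
  location            = var.location
}

# Look up the existing Disk Encryption Set.
data "azurerm_disk_encryption_set" "cmk" {
  name                = var.disk_encryption_set_name
  resource_group_name = var.resource_group_name
}

# Grant both identities read access, scoped to the Disk Encryption Set only.
resource "azurerm_role_assignment" "des_reader" {
  for_each = {
    control_plane = azurerm_user_assigned_identity.control_plane.principal_id
    kubelet       = azurerm_user_assigned_identity.kubelet.principal_id
  }

  scope                = data.azurerm_disk_encryption_set.cmk.id
  role_definition_name = "Reader"
  principal_id         = each.value
}
```

The `id` attributes of the two identities and of the Disk Encryption Set are the values to supply wherever the cluster configuration expects the corresponding resource IDs.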

Required Components

  • Control Plane Identity: Grants the control plane access to Azure resources
  • Kubelet Identity: Allows worker nodes to interact with Azure services (e.g., pulling images, accessing Key Vault)
  • Disk Encryption Set: Defines the customer-managed encryption key used for disk encryption

Note: Ensure that all user-assigned identities are created ahead of time and are granted the required permissions on the Disk Encryption Set. The Disk Encryption Set ID must be referenced during cluster creation to enable CMK-backed PVC support. For more information, refer to "Encrypt AKS cluster data disk with customer-managed keys" in the Azure documentation.

When creating an AKS cluster using Terraform, make sure to configure user-assigned identities and assign the necessary permissions.

Example: Terraform Configuration

resource "rafay_aks_cluster" "example_aks_cluster" {
  apiversion = "rafay.io/v1alpha1"
  kind       = "Cluster"

  metadata {
    name    = "example-aks-cluster"
    project = "<your-project-name>"
  }

  spec {
    type          = "aks"
    blueprint     = "<your-blueprint-name>"
    cloudprovider = "<your-cloud-provider-name>"

    cluster_config {
      apiversion = "rafay.io/v1alpha1"
      kind       = "aksClusterConfig"

      metadata {
        name = "example-aks-cluster"
      }

      spec {
        resource_group_name = "<your-resource-group>"

        managed_cluster {
          apiversion = "2022-07-01"
          location   = "<azure-region>"

          # Control plane identity: a pre-created user-assigned managed identity.
          identity {
            type = "UserAssigned"
            user_assigned_identities = {
              "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<control-plane-identity>" = "{}"
            }
          }

          properties {
            api_server_access_profile {
              enable_private_cluster = true
            }

            dns_prefix         = "<your-dns-prefix>"
            kubernetes_version = "<k8s-version>"

            network_profile {
              network_plugin = "kubenet"
            }

            # Kubelet identity: the pre-created user-assigned identity used by the worker nodes.
            identity_profile {
              kubelet_identity {
                resource_id = "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<kubelet-identity>"
              }
            }

            # Disk Encryption Set holding the customer-managed key; the identities need read access to it.
            disk_encryption_set_id = "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.Compute/diskEncryptionSets/<disk-encryption-set>"
            enable_rbac            = true
          }

          type = "Microsoft.ContainerService/managedClusters"
        }

        node_pools {
          apiversion = "2022-07-01"
          name       = "system-nodepool"
          location   = "<azure-region>"

          properties {
            count                = 2
            enable_auto_scaling  = true
            max_count            = 3
            min_count            = 1
            max_pods             = 40
            mode                 = "System"
            orchestrator_version = "<k8s-version>"
            os_type              = "Linux"
            type                 = "VirtualMachineScaleSets"
            vm_size              = "<vm-size>"
          }

          type = "Microsoft.ContainerService/managedClusters/agentPools"
        }
      }
    }
  }
}