
Dynamic Resource Allocation for GPUs on Rafay's MKS (Kubernetes 1.34)

This blog demonstrates how to leverage Dynamic Resource Allocation (DRA) for efficient GPU allocation using NVIDIA Multi-Instance GPU (MIG) on Rafay's Managed Kubernetes Service (MKS) running Kubernetes 1.34.

In our previous blog series, we covered various aspects of Dynamic Resource Allocation (DRA) in Kubernetes.

DRA is GA in Kubernetes 1.34

With Kubernetes 1.34, Dynamic Resource Allocation (DRA) is Generally Available (GA) and enabled by default on MKS clusters. This means you can immediately start using DRA features without additional configuration.
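
Before relying on that, it only takes a moment to confirm the GA resource.k8s.io API group is being served by your cluster. The check below is read-only and uses standard kubectl:

# List the DRA API resources served by the cluster
# (expect deviceclasses, resourceclaims, resourceclaimtemplates, resourceslices)
kubectl api-resources --api-group=resource.k8s.io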

Prerequisites

Before we begin, ensure you have:

  • A Rafay MKS cluster running Kubernetes 1.34 (see MKS v1.34 Blog)
  • GPU nodes with compatible NVIDIA GPUs (A100, H100, or similar MIG-capable GPUs)
  • Container Device Interface (CDI) support (enabled automatically in MKS for Kubernetes 1.34)
  • Basic understanding of Dynamic Resource Allocation concepts (covered in our previous blog series)
  • Active Rafay account with appropriate permissions to manage MKS clusters and addons

Setting Up DRA and GPU Support on MKS

This section walks you through installing the necessary components to enable DRA for GPU allocation on your MKS cluster. We'll use Rafay's blueprint workflow to deploy both the DRA driver and GPU operator.

Understanding the Components

Before we begin, let's understand what we're installing:

  • DRA Driver: Enables dynamic resource allocation for GPUs
  • GPU Operator: Manages NVIDIA GPU drivers and MIG configuration
  • Blueprint: Rafay's way to manage cluster addons

Current DRA Limitations

The current DRA driver implementation doesn't include all GPU functionality. Advanced features like MIG management still require the GPU operator to be deployed separately; this functionality is expected to be integrated into future DRA driver releases.

Step 1: Create Addons for DRA and GPU Support

  1. Navigate to Infrastructure > Addons in the Rafay console
  2. Create the DRA Driver Addon:
  3. Search for "dra-driver" in the catalog
  4. This addon provides the core DRA functionality for resource allocation

DRA Catalog

  3. Create the GPU Operator Addon:
     • Search for "gpu-operator" in the catalog
     • This addon manages NVIDIA GPU drivers and MIG configuration

GPU Operator Catalog

  4. Review and confirm both addons are ready for deployment

Addons List

Step 2: Create a Blueprint

  1. Navigate to Infrastructure > Blueprints
  2. Create a new blueprint that includes both addons:
     • DRA Driver addon
     • GPU Operator addon

Blueprint Configuration

Step 3: Apply Blueprint to MKS Cluster

  1. Navigate to Infrastructure > Clusters
  2. Select your MKS cluster and click the gear icon
  3. Choose "Update Blueprint" from the dropdown menu
  4. Select the blueprint you created in Step 2
  5. Apply the blueprint to deploy the addons

Cluster Blueprint Update
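
Once the blueprint sync finishes, a quick namespace-agnostic check that the addon workloads came up can be done from kubectl. Pod and namespace names will vary with the addon versions, so treat the filter below as a rough sketch:

# Look for the GPU operator and DRA driver pods across all namespaces
kubectl get pods -A | grep -Ei 'gpu-operator|dra'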

Step 4: Configure Node Labels for DRA Driver

The DRA driver has specific node affinity requirements that determine which nodes it will be deployed to. By default, the DRA kubelet plugin will only be scheduled on nodes that meet one of these criteria:

Default Node Affinity Requirements:

  • Nodes with feature.node.kubernetes.io/pci-10de.present=true (NVIDIA PCI vendor ID detected by NFD)
  • Nodes with feature.node.kubernetes.io/cpu-model.vendor_id=NVIDIA (Tegra-based systems)
  • Nodes with nvidia.com/gpu.present=true (manually labeled GPU nodes)
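
If Node Feature Discovery (NFD, bundled with the GPU operator by default) is already running on the cluster, it may have applied the PCI-based label for you. A quick way to check is to print the relevant labels as columns (standard kubectl, read-only):

# Show the GPU-related labels for every node
kubectl get nodes -L nvidia.com/gpu.present -L feature.node.kubernetes.io/pci-10de.present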

For this blog, we'll manually label our GPU node:

  1. Identify your GPU node:

    kubectl get nodes
    

  2. Add the required label to your GPU node:

    kubectl label node <gpu-node-name> nvidia.com/gpu.present=true
    

  3. Verify the label was applied:

    kubectl get nodes --show-labels | grep nvidia.com/gpu.present
    

Step 5: Verify DRA Driver Installation

After the blueprint is applied and nodes are properly labeled, verify that the DRA components are running correctly:

  1. Access your cluster using ZTKA (Zero Trust Kubectl Access)
  2. Check the DRA pods:

    kubectl get pods -n kube-system | grep dra

    You should see two main DRA components:

     • dra-controller: Manages resource allocation requests
     • dra-kubelet-plugin: Handles resource allocation on each node

DRA Driver Pods

  3. Verify DRA resources are available:

    kubectl get deviceclass
    kubectl get resourceslices

    This confirms that DRA has created the necessary device classes and resource slices for GPU allocation.

DRA Device Classes
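
The resource slices are also where the DRA driver publishes the device attributes that CEL selectors match on later in this post (for example, the MIG profile of each device). Dumping them is a handy way to see exactly what your driver version advertises; attribute names can differ between driver releases, so verify them here before writing selectors:

# Inspect the devices and attributes advertised by the DRA driver
kubectl get resourceslices -o yaml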

Configuring Multi-Instance GPU (MIG)

Before deploying workloads, we need to configure MIG on our GPU nodes. MIG allows us to split a single GPU into multiple smaller instances for better resource utilization.

Step 6: Enable MIG Configuration

  1. Label your GPU nodes to enable MIG with the "all-balanced" configuration:

    kubectl label nodes <node-name> nvidia.com/mig.config=all-balanced
    

  2. Verify MIG is enabled:

    kubectl get nodes -l nvidia.com/mig.config=all-balanced
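    # Note: the label only requests the layout; the GPU operator's MIG manager
    # applies it asynchronously, which can take minutes and may briefly take the
    # GPU offline. The MIG manager typically reports progress via the
    # nvidia.com/mig.config.state label, which should eventually read "success":
    kubectl get nodes -L nvidia.com/mig.config -L nvidia.com/mig.config.state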
    

Static MIG Configuration

The current DRA driver doesn't support dynamic MIG configuration. You need to statically configure MIG devices using the GPU operator, similar to the traditional device plugin approach.
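
If the built-in profiles (such as all-balanced) don't match your workload mix, the usual approach is to give the GPU operator a custom mig-parted configuration and reference it from the nvidia.com/mig.config label. The sketch below is illustrative only: the ConfigMap name, the profile counts (shown for an A100-40GB, mirroring what all-balanced typically produces), and the wiring through the operator's migManager.config setting are assumptions to adapt to your environment.

# Hypothetical custom MIG layout; create it in the GPU operator's namespace
# and point migManager.config.name at it in the addon's Helm values
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-config
data:
  config.yaml: |
    version: v1
    mig-configs:
      custom-balanced:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.5gb": 2
            "2g.10gb": 1
            "3g.20gb": 1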

  1. Check available MIG instances:
    kubectl describe node <node-name> | grep nvidia.com/mig
    

Deploying Workloads with DRA and MIG

Now let's deploy a workload that demonstrates how to use DRA with MIG for efficient GPU resource allocation.

Example: Multi-Container Pod with MIG Allocation

This example demonstrates how to create a single pod with multiple containers, each requesting different MIG instances from the same physical GPU. This showcases the power of DRA + MIG for efficient resource sharing.

What this example does:

  • Creates 4 containers in a single pod
  • Each container requests a different MIG profile
  • All containers share the same physical GPU through MIG
  • Uses a ResourceClaimTemplate for declarative resource allocation

---
# ResourceClaimTemplate defines the GPU resources we want to allocate
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test4
  name: mig-devices
spec:
  spec:
    devices:
      requests:
      - name: mig-1g-5gb-0
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
      - name: mig-1g-5gb-1
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
      - name: mig-2g-10gb
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '2g.10gb'"
      - name: mig-3g-20gb
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"

      constraints:
      - requests: []
        matchAttribute: "gpu.nvidia.com/parentUUID"

---
# Deployment with multiple containers sharing GPU resources
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: gpu-test4
  name: gpu-workload
  labels:
    app: gpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      resourceClaims:
      - name: mig-devices
        resourceClaimTemplateName: mig-devices
      containers:
      - name: small-workload-1
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: mig-devices
            request: mig-1g-5gb-0
      - name: small-workload-2
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: mig-devices
            request: mig-1g-5gb-1
      - name: medium-workload
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: mig-devices
            request: mig-2g-10gb
      - name: large-workload
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: mig-devices
            request: mig-3g-20gb
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"

Deploy the Workload

  1. Create the namespace:

    kubectl create namespace gpu-test4
    

  2. Apply the configuration:

    kubectl apply -f gpu-workload.yaml
    

  3. Verify the deployment:

    kubectl get pods -n gpu-test4
    kubectl get resourceclaims -n gpu-test4
    

Resource Claims

Running Pods

Verifying GPU Allocation

After deploying your workload, it's important to verify that GPU resources are properly allocated and accessible to your containers.

Step 7: Verify Resource Claims

  1. Check ResourceClaims status:
    kubectl get resourceclaims -n gpu-test4
    

Each claim should show an allocated state, which confirms that MIG devices have been reserved for your pods.
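
To see exactly which MIG device was bound to each request in a claim, describe the claim and read its allocation status (standard kubectl; the device names you see depend on your driver version and MIG layout):

kubectl describe resourceclaims -n gpu-test4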

Pod Resource Allocation

Step 8: Test GPU Access in Containers

Execute into a container to test GPU access:

kubectl exec -it -n gpu-test4 <pod-name> -c small-workload-1 -- nvidia-smi

GPU Device Access
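
Because each container was given a different request from the same claim, each one should see only its own MIG instance. A small convenience loop to compare them all (the label selector and container names come from the example deployment above):

# Pick one pod from the deployment and list the GPU seen by each container
POD=$(kubectl get pods -n gpu-test4 -l app=gpu-workload -o jsonpath='{.items[0].metadata.name}')
for c in small-workload-1 small-workload-2 medium-workload large-workload; do
  echo "== $c =="
  kubectl exec -n gpu-test4 "$POD" -c "$c" -- nvidia-smi -L
done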

Troubleshooting

DRA Kubelet Plugin Not Scheduled

If you don't see the DRA kubelet plugin running on your GPU nodes, check that the node has the required label:

# Check if the label is present
kubectl get nodes --show-labels | grep nvidia.com/gpu.present

# If missing, add the label
kubectl label node <gpu-node-name> nvidia.com/gpu.present=true

The DRA kubelet plugin requires one of these node labels to be scheduled:

  • nvidia.com/gpu.present=true (manually added)
  • feature.node.kubernetes.io/pci-10de.present=true (detected by NFD)
  • feature.node.kubernetes.io/cpu-model.vendor_id=NVIDIA (Tegra systems)
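
After adding the label, the kubelet plugin DaemonSet should schedule a pod onto that node within a minute or so. Confirm the placement with:

# -o wide shows which node each DRA pod landed on
kubectl get pods -n kube-system -o wide | grep dra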

Best Practices

Resource Planning

  1. Plan MIG profiles based on your workload requirements
  2. Monitor GPU utilization to optimize MIG configurations
  3. Use appropriate tolerations for GPU nodes

Conclusion

This blog demonstrated how to leverage Dynamic Resource Allocation (DRA) with Multi-Instance GPU (MIG) on Rafay's MKS clusters running Kubernetes 1.34.


With DRA now GA in Kubernetes 1.34 and available on MKS, you can start implementing more efficient GPU resource management strategies for your AI/ML workloads today!