Custom GPU Resource Classes in Kubernetes

In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier, but managing GPU utilization efficiently requires more than assigning a request like

nvidia.com/gpu: 1

Custom GPU resource classes are a powerful technique for fine-grained GPU management in multi-tenant, cost-sensitive, and performance-critical environments. In this blog post, we will explore what they are, why they matter, and when to use them for maximum impact.

Info

If you are new to GPU sharing approaches, we recommend reading the following introductory blogs: Demystifying Fractional GPUs in Kubernetes and Choosing the Right Fractional GPU Strategy.


What Are Custom GPU Resource Classes?

By default, Kubernetes exposes GPUs through a single resource name: nvidia.com/gpu. As an end user, you have no idea how the underlying GPU is set up and configured. For example, the GPU you are allocated may be any of the following:

  1. Full exclusive GPUs
  2. Time-sliced shared GPUs
  3. MIG (Multi-Instance GPU) slices
  4. Fractional (e.g., ¼) allocations

Custom resource classes allow administrators to define new GPU resource names that are more descriptive for users. These names are configured by the GPU device plugin (typically via the NVIDIA GPU Operator) and allow you to expose multiple logical GPU types from the same physical hardware.

Some examples are shown below.

1. nvidia.com/gpu-time-slice

As the custom resource class name suggests, this is a time-sliced GPU.

2. nvidia.com/gpu-mig-1g.5gb

As the custom resource class name suggests, this is a MIG (Multi-Instance GPU) slice using the 1g.5gb profile (one compute slice with 5 GB of memory).

3. nvidia.com/gpu-fraction-0.25

As the custom resource class name suggests, this is a fractional (0.25) GPU.
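Once such classes are configured, they appear alongside the standard resource in each node's allocatable list. A hypothetical node advertising all three classes might report something like the following (resource names and counts are illustrative, not output from a real cluster):

```yaml
# Hypothetical excerpt from `kubectl get node <node-name> -o yaml`;
# resource names and counts are illustrative.
status:
  allocatable:
    cpu: "64"
    memory: 512Gi
    nvidia.com/gpu-time-slice: "8"     # e.g., 2 physical GPUs x 4 replicas
    nvidia.com/gpu-mig-1g.5gb: "7"     # e.g., MIG slices carved from one A100
    nvidia.com/gpu-fraction-0.25: "4"
```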


Why Do Custom Resource Classes Matter?

Customers sometimes ask us why custom resource classes matter. Here are the most common reasons:

Better Scheduling and Workload Matching

Different workloads can have vastly different GPU requirements. For example:

  • Dev notebooks or small inference tasks only need a fraction of a GPU.
  • Real-time inference needs isolated and predictable performance.
  • Training jobs require full, exclusive access.

Custom classes can help align GPU access mode with application intent, improving performance and minimizing waste.


Enabling Multi-Tenancy

In shared environments—such as internal ML platforms, GPU clouds, or research clusters—custom classes allow administrators to achieve the following:

  • Partition GPU usage across teams
  • Enforce resource quotas per class
  • Prevent one user from monopolizing all full GPUs

This ensures fair access, cost visibility, and clear accountability.
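Quotas on extended resources such as GPUs use the requests.&lt;resource-name&gt; syntax in a standard Kubernetes ResourceQuota. As a sketch, a per-team namespace quota might look like this (the namespace name and counts are illustrative):

```yaml
# Hypothetical per-team quota; namespace and counts are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a
spec:
  hard:
    requests.nvidia.com/gpu: "2"             # at most 2 full GPUs per team
    requests.nvidia.com/gpu-time-slice: "8"  # more generous with shared slices
```

Because each access mode has its own resource name, the quota can be strict on expensive full GPUs while remaining generous with shared slices.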


Cost Optimization

GPU costs add up quickly, and using full GPUs for lightweight jobs is inefficient. Custom classes enable the following:

  • Time-sliced sharing for low-duty jobs
  • MIG slices for sandbox or model testing
  • Fine-grained billing per resource type

By aligning consumption with actual needs, you reduce idle capacity and lower cloud or on-prem GPU costs.


Transparency and Observability

Custom resource names make GPU usage explicit in YAML manifests and dashboards. For example, consider the following resource request:

resources:
  limits:
    nvidia.com/gpu-time-slice: 1

It tells the user (and the platform) exactly what type of resource is being requested. This clarity supports better monitoring, debugging, and user education.
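In context, a complete (illustrative) Pod manifest requesting one time-sliced unit would look like this; the Pod name and image are placeholders:

```yaml
# Minimal illustrative Pod; name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: notebook
spec:
  containers:
    - name: notebook
      image: jupyter/base-notebook   # placeholder image
      resources:
        limits:
          nvidia.com/gpu-time-slice: 1
```

Note that for extended resources Kubernetes requires requests to equal limits, so specifying only the limit is sufficient.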


How to Set It Up

Custom resource classes are defined through the GPU device plugin's sharing configuration, typically supplied in the NVIDIA GPU Operator's Helm values.yaml file. A time-slicing override looks roughly like the following (the exact schema can vary across device plugin versions; the rename field maps the shared replicas onto the custom resource name):

devicePlugin:
  config:
    create: true
    name: time-slicing-config
    default: any
    data:
      any: |-
        version: v1
        sharing:
          timeSlicing:
            resources:
              - name: nvidia.com/gpu
                rename: nvidia.com/gpu-time-slice
                replicas: 4

In this example, the configuration exposes each physical GPU as 4 logical time-sliced units. Users can then request a time-sliced unit with the following YAML.

resources:
  limits:
    nvidia.com/gpu-time-slice: 1

Conclusion

Custom GPU resource classes offer the flexibility, cost-efficiency, and isolation required for scalable and sustainable GPU operations in Kubernetes. Whether you’re a platform engineer, ML researcher, or infrastructure architect, adopting this pattern can dramatically improve your cluster’s GPU utilization and user experience.