
2026

When Do You Need an NVIDIA AI Enterprise License with GPU Virtualization?

If you're deploying NVIDIA GPUs in a virtualized or cloud-native environment, understanding NVIDIA AI Enterprise (NVAIE) licensing can save you from unexpected compliance issues — or unnecessary spending. The rules vary significantly depending on how your GPUs are deployed. Here's a practical breakdown.


What Is NVIDIA AI Enterprise?

NVIDIA AI Enterprise (NVAIE) is NVIDIA's end-to-end software platform for AI workloads, covering everything from GPU drivers and Kubernetes operators to NIM microservices, NeMo, Triton Inference Server, and RAPIDS. It's licensed on a per-GPU basis and available as an annual subscription or perpetual license with 5-year support.

The key thing to understand: NVAIE licensing is about software access and support, not raw GPU compute. The GPU itself will generally work without a license — but your access to enterprise-grade software, NGC containers, and SLA-backed support depends on it.


Deployment Models and When a License Is Required

1. Bare Metal (No Hypervisor)

License required? Conditionally.

On bare metal, the NVIDIA driver works without an NVAIE license for standard CUDA workloads. However, a license is required to:

  • Access NVAIE-gated NGC containers (NIM microservices, enterprise frameworks)
  • Use the vGPU for Compute guest driver for licensing enforcement
  • Receive enterprise support with SLA

Some GPUs bundle NVAIE automatically. Each NVIDIA H100 PCIe or NVL and H200 NVL GPU includes a 5-year NVAIE subscription that activates with the GPU serial number. Notably, Blackwell (B200/B300) DGX systems do not include NVAIE — licenses must be purchased separately.


2. GPU Passthrough (PCIe Passthrough / VFIO)

License required? Same as bare metal for the guest VM.

In GPU passthrough, the entire physical GPU is assigned to a single VM. The guest VM behaves identically to a bare metal system from a driver perspective. NVAIE licensing requirements inside the VM mirror bare metal:

  • Standard CUDA compute works without a license
  • Access to NVAIE NGC software and NIM microservices requires a license
  • One license per physical GPU passed through


3. NVIDIA vGPU for Compute (Time-Sliced / MIG-backed vGPU)

License required? Yes — software enforced.

This is where licensing is strictly enforced. NVIDIA vGPU for Compute is licensed exclusively through NVAIE. One license is required per vGPU instance assigned to a VM.

How enforcement works:

  • When a vGPU VM boots on a supported GPU, it initially operates at full capability
  • If it fails to obtain a license from the NVIDIA License System (NLS), performance degrades over time
  • The license is checked out from either a Cloud License Server (CLS) or a Delegated License Server (DLS) for air-gapped environments
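One way to confirm that a guest actually checked out a license is to inspect the driver's status report. The sketch below parses a "License Status" field of the kind that nvidia-smi -q prints on licensed vGPU guests. The sample text, including its product name and expiry date, is illustrative only, and the exact layout may differ between driver releases, so treat the field name as an assumption to verify on your own system.

```python
import re

# Illustrative sample only: the real layout of `nvidia-smi -q` output
# varies by driver release, and the expiry date here is made up.
SAMPLE_OUTPUT = """
    vGPU Software Licensed Product
        Product Name                      : NVIDIA Virtual Compute Server
        License Status                    : Licensed (Expiry: 2026-12-31 23:59:59 GMT)
"""


def parse_license_status(smi_text):
    """Return the text after 'License Status :' or None if the field is absent."""
    match = re.search(r"License Status\s*:\s*(.+)", smi_text)
    return match.group(1).strip() if match else None


def is_licensed(smi_text):
    """True only when the status field starts with 'Licensed'."""
    status = parse_license_status(smi_text)
    return status is not None and status.startswith("Licensed")


if __name__ == "__main__":
    print(parse_license_status(SAMPLE_OUTPUT))
    print(is_licensed(SAMPLE_OUTPUT))
```

In practice the text would come from running nvidia-smi -q inside the guest VM; a check like this can feed a monitoring alert before performance starts to degrade.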

vGPU for Compute supports up to 16 vGPU instances per physical GPU, with each requiring its own license. It enables advanced capabilities not available in passthrough: live migration, suspend/resume, warm updates, and multi-tenant GPU sharing.


4. Cloud Deployments (AWS, Azure, GCP, OCI)

License required? Depends on the model.

  • Pay-as-you-go via CSP marketplace: License is included in the hourly per-GPU price. No separate purchase needed.
  • BYOL (Bring Your Own License): Use an on-prem NVAIE license on a certified cloud instance. One license per GPU, checked out from NLS.
  • H100 / H200 NVL instances: May include NVAIE bundled depending on the cloud provider's offering.

5. MIG (Multi-Instance GPU)

License required? Depends on the underlying deployment.

MIG allows a single physical GPU (A100, H100, H200, B200) to be partitioned into up to 7 isolated GPU instances, each with dedicated memory and compute. From a licensing perspective, MIG doesn't change the fundamental rules — what matters is how the host is deployed:

  • MIG on bare metal — same as bare metal, license required for NVAIE software access
  • MIG on vGPU (time-sliced) — each vGPU instance backed by a MIG slice still requires an NVAIE license per vGPU
  • MIG on passthrough — the VM receives a MIG-partitioned GPU, same rules as passthrough

MIG on Kubernetes has an additional operational consideration: the NVIDIA GPU Operator's MIG Manager handles reconfiguration automatically by watching the nvidia.com/mig.config node label. When the label changes, MIG Manager stops GPU pods, applies the new MIG geometry, and restarts them. On bare metal and passthrough nodes, one license is still required per physical GPU regardless of how many MIG slices are created; MIG-backed vGPUs are still licensed per vGPU instance.
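As a rough illustration of the label-driven workflow, the sketch below builds the kubectl command that sets nvidia.com/mig.config on a node. The node name gpu-node-1 and the profile name all-1g.10gb are placeholders: the profile must match an entry in the MIG Manager configuration for your GPU model, so check the available names before using it.

```python
import shlex

MIG_CONFIG_LABEL = "nvidia.com/mig.config"


def build_mig_label_command(node_name, mig_profile):
    """Build the kubectl command that sets the MIG config label on a node."""
    if not node_name or not mig_profile:
        raise ValueError("node_name and mig_profile must be non-empty")
    return [
        "kubectl", "label", "node", node_name,
        f"{MIG_CONFIG_LABEL}={mig_profile}",
        "--overwrite",  # replace any existing value so MIG Manager sees a change
    ]


if __name__ == "__main__":
    # Placeholder node and profile names; substitute your own.
    cmd = build_mig_label_command("gpu-node-1", "all-1g.10gb")
    print(shlex.join(cmd))
    # To apply it for real: subprocess.run(cmd, check=True)
```

Building the command as a list keeps it safe to pass to subprocess.run without shell quoting issues. Because MIG Manager stops GPU pods during reconfiguration, a change like this is best scheduled for a drained node.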


6. Kubernetes (Bare Metal or VMs)

License required? Conditionally, same rules as the underlying deployment.

The NVIDIA GPU Operator automates driver and plugin deployment on Kubernetes. If your K8s nodes are:

  • Bare metal: NVAIE license needed for NGC-gated software, not for basic CUDA
  • vGPU-backed VMs: NVAIE license required per vGPU, software enforced
  • Passthrough VMs: same as bare metal inside the VM

The GPU Operator can also manage license token distribution automatically when configured with NLS credentials.
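A common way to hand NLS credentials to the GPU Operator is a ConfigMap that carries the guest driver's licensing configuration and the client configuration token. The sketch below builds such a manifest as a Python dict. The ConfigMap name licensing-config, the gpu-operator namespace, and the two data keys are assumptions based on typical GPU Operator setups, so verify them against the documentation for your operator version. The operator's Helm values must also reference the ConfigMap, which is not shown here.

```python
import json


def build_licensing_configmap(gridd_conf, client_token,
                              name="licensing-config",
                              namespace="gpu-operator"):
    """Return a ConfigMap manifest (as a dict) holding NLS client files.

    The name, namespace, and data keys are assumed defaults, not
    values confirmed by the article.
    """
    if not client_token.strip():
        raise ValueError("client configuration token must not be empty")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "gridd.conf": gridd_conf,
            "client_configuration_token.tok": client_token,
        },
    }


if __name__ == "__main__":
    # Placeholder contents; real values come from your NLS instance.
    manifest = build_licensing_configmap("# licensing settings\n", "<token>")
    print(json.dumps(manifest, indent=2))
```

The JSON output can be piped to kubectl apply -f -, since kubectl accepts JSON manifests as well as YAML.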


Quick Reference

Deployment Model | License Enforced? | When Required
Bare metal (CUDA only) | No | Optional, for NGC software & support
Bare metal (NVAIE software) | No (EULA) | Required for NGC access & SLA
GPU Passthrough VM | No | Same as bare metal in guest
vGPU for Compute | Yes (software) | Always, per vGPU instance
Cloud pay-as-you-go | Included | No separate purchase
Cloud BYOL | Yes | Per GPU on instance
H100/H200 PCIe or NVL | Bundled | 5-year subscription included
B200/B300 (Blackwell) | Not bundled | Must purchase separately
GH200 / GB200 | Bare metal only | vGPU not supported
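The counting rules in this reference can be sketched as a small helper for capacity planning. This is a reading aid transcribed from the rules above, not an official entitlement calculator. The model names (bare_metal, passthrough, vgpu, cloud_payg, cloud_byol) are hypothetical labels chosen for this example, and the function ignores bundled subscriptions such as those shipped with some H100 and H200 NVL GPUs.

```python
from dataclasses import dataclass

# Hypothetical model labels; the grouping mirrors the quick reference.
PER_VGPU = {"vgpu"}
PER_PHYSICAL_GPU = {"bare_metal", "passthrough"}
ALWAYS_PER_GPU = {"cloud_byol"}
INCLUDED_IN_PRICE = {"cloud_payg"}


@dataclass
class Deployment:
    model: str
    physical_gpus: int
    vgpu_instances: int = 0
    uses_nvaie_software: bool = True  # NGC-gated software or SLA support


def licenses_to_purchase(d):
    """Return how many NVAIE licenses the deployment needs to buy."""
    if d.model in INCLUDED_IN_PRICE:
        return 0  # already part of the hourly per-GPU price
    if d.model in PER_VGPU:
        return d.vgpu_instances  # one per vGPU instance, software enforced
    if d.model in ALWAYS_PER_GPU:
        return d.physical_gpus
    if d.model in PER_PHYSICAL_GPU:
        # Basic CUDA needs no license; NVAIE software does, per GPU.
        return d.physical_gpus if d.uses_nvaie_software else 0
    raise ValueError(f"unknown deployment model: {d.model}")


if __name__ == "__main__":
    print(licenses_to_purchase(Deployment("vgpu", 2, vgpu_instances=8)))
    print(licenses_to_purchase(Deployment("bare_metal", 4, uses_nvaie_software=False)))
```

For example, two physical GPUs carved into eight vGPU instances need eight licenses, while four bare metal GPUs running only basic CUDA need none.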

License Server Options

Cloud License Server (CLS): Hosted by NVIDIA, requires internet connectivity. Easiest to set up.

Delegated License Server (DLS): On-premises, air-gapped environments. You deploy and manage it. Required for secure/sovereign deployments.

Both are managed through the NVIDIA License System portal at nvid.nvidia.com.


Key Takeaways

  1. vGPU for Compute always requires a license — it's software enforced and performance degrades without one.
  2. GPU passthrough and bare metal only require a license for NVAIE software access, not for basic CUDA functionality.
  3. H100 PCIe/NVL and H200 NVL GPUs bundle 5 years of NVAIE; Blackwell (B200/B300) does not.
  4. GH200 and GB200 are bare metal only — vGPU is not supported on these platforms.
  5. HGX systems only support full board passthrough to a single VM — no partial passthrough.
  6. For production AI workloads, the NVAIE license is almost always worth it — it unlocks NIM microservices, enterprise support SLAs, security patches, and the full NGC catalog.

Accelerating the AI Factory: Rafay & NVIDIA NCX Infra Controller (NICo)

Acquiring GPU hardware is the easy part. Turning it into a productive, multi-tenant AI service with proper isolation, self-service provisioning, and the governance to operate it at scale is where most operators get stuck. Custom integration work piles up, timelines slip, and the gap between racked hardware and revenue widens.

Rafay is closing that gap through a new integration with the NVIDIA NCX Infrastructure Controller (NICo), NVIDIA's open-source component for automated bare-metal lifecycle management. Together, Rafay and NICo give operators a unified platform to manage their GPU fleet and deliver cloud-like, self-service experiences to end users.

How Rafay and NVIDIA Help Neoclouds Monetize Accelerated Computing with Token Factories

The AI boom has created an unprecedented demand for GPUs. In response, a new generation of GPU-first cloud providers purpose-built for AI workloads—known as neoclouds—has emerged to deliver the AI infrastructure needed to power AI applications.

However, a critical shift is happening in the market. Selling raw GPU infrastructure is no longer enough. The real opportunity lies in turning GPU capacity into AI services. Developers and enterprises don't want GPUs. They want models, APIs, and intelligence on demand.

With Rafay's Token Factory offering, neoclouds can transform GPU clusters into a self-service AI platform that exposes models through token-metered APIs. The result is a marketplace where neoclouds monetize infrastructure, model developers reach users, and developers build applications, all on the same platform.

This is where Rafay and NVIDIA have come together to unlock a powerful new business model for AI infrastructure providers.


Stop Paying for Resources Your Pods Don't Need

If you manage Kubernetes infrastructure at scale, you already know the pattern. Development teams request CPU and memory "just to be safe." Nobody wants their app to OOM. Nobody wants to get paged at 2am because a pod got throttled. So requests get padded and they stay padded.

The result? Clusters are full of pods consuming far less than what they've been allocated. Nodes are running hot on paper but idle in practice. And the platform team responsible for cost governance across dozens of clusters, projects, and namespaces has no easy way to prove it.

Scaling Trust: The Fortanix and Rafay Integration for Enterprise Confidential AI

In the modern enterprise, Artificial Intelligence (AI) has moved from a "nice-to-have" experimental phase to a core business driver. However, for organizations in highly regulated sectors—such as banking, healthcare, and government—the path to AI adoption is fraught with security hurdles.

The primary concern is protecting sensitive data not just at rest or in transit, but in use. In the image below, the app uses a proprietary model which needs to be secured using confidential computing.

[Image: Confidential VM]

Traditional security measures often fall short when data must be decrypted to be processed by an AI model. This is where Confidential Computing changes the game, and why the joint integration between Fortanix and Rafay is a landmark development for the "AI Factory" of the future.

NVIDIA AICR Generates It. Rafay Runs It. Your GPU Clusters, Finally Under Control

Deploying GPU-accelerated Kubernetes infrastructure for AI workloads has never been simple. Administrators face a relentless compatibility matrix: matching GPU driver versions to CUDA releases, pinning Kubernetes versions to container runtimes, tuning configurations differently for NVIDIA H100s versus A100s, and doing all of it differently again for training versus inference.

One wrong version combination and workloads fail silently, or worse, perform far below hardware capability. For years, the answer was static documentation, tribal knowledge, and hoping that whoever wrote the runbook last week remembered to update it.

NVIDIA's AI Cluster Runtime (AICR) and the Rafay Platform represent a new approach — one where GPU infrastructure configuration is treated as code, generated deterministically, validated against real hardware, and enforced continuously across fleets of clusters.

Together, they cover the full lifecycle from first aicr snapshot to production-grade day-2 operations, with cluster blueprints as the critical bridge between the two.


From Slurm to Kubernetes: A Guide for HPC Users

If you've spent years submitting batch jobs with Slurm, moving to a Kubernetes-based cluster can feel like learning a new language. The concepts are familiar — resource requests, job queues, priorities — but the vocabulary and tooling are different. This guide bridges that gap, helping HPC veterans understand how Kubernetes handles workloads and what that means day-to-day.
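To make the vocabulary shift concrete, here is a minimal sketch that maps a few sbatch-style resource requests onto a Kubernetes batch/v1 Job manifest. The function name and its parameters are hypothetical, and the image name in the usage example is a placeholder. The mapping shown (--job-name to metadata.name, --cpus-per-task to cpu, --mem to memory, --gres=gpu:N to nvidia.com/gpu) is one reasonable reading, not the only one.

```python
import json


def slurm_to_k8s_job(job_name, image, command, gpus=0, cpus=1, memory_gb=1):
    """Map a few sbatch-style resource requests onto a batch/v1 Job manifest."""
    resources = {"cpu": str(cpus), "memory": f"{memory_gb}Gi"}
    if gpus:
        # GPUs are an extended resource exposed by the NVIDIA device plugin.
        resources["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "backoffLimit": 0,  # do not retry, similar to a plain batch job
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": job_name,
                        "image": image,
                        "command": command,
                        "resources": {
                            "requests": dict(resources),
                            "limits": dict(resources),
                        },
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and command for illustration.
    job = slurm_to_k8s_job("train", "example.com/train:latest",
                           ["python", "train.py"], gpus=2, cpus=8, memory_gb=64)
    print(json.dumps(job, indent=2))
```

The biggest conceptual difference is that the workload runs from a container image rather than from a shared filesystem environment, so the image takes over the role of module loads and environment setup in a batch script.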


Run nvidia-smi on Remote GPU Kubernetes Clusters Using Rafay Zero Trust Access

Infra operators managing GPU-enabled Kubernetes clusters often need a fast and secure way to validate GPU visibility, driver health, and runtime readiness without exposing the cluster directly or relying on bastion hosts, VPNs, or manually managed kubeconfigs.

With Rafay's zero trust kubectl, operators can securely access remote Kubernetes resources and execute commands inside running pods from the Rafay platform. A simple but powerful example is running nvidia-smi inside a GPU Operator pod to confirm that the NVIDIA driver stack, CUDA runtime, and GPU devices are functioning correctly on a remote cluster.

In this post, we walk through how infra operators can use Rafay's zero trust access workflow to run nvidia-smi on a remote GPU-based Kubernetes cluster.


How Rafay Helps GPU Clouds Run Complex Hackathons at Scale

Running a hackathon is hard. Running a GPU-powered hackathon for thousands of participants, where every developer needs a fully configured environment (notebooks, developer pods, etc.) with dedicated GPU resources ready to go the moment the event kicks off, is an entirely different class of problem. This is exactly where Rafay's platform has helped change the game for GPU Cloud providers.


Interact with Your Rafay Managed Kubernetes Clusters Using MCP-compatible AI clients

The Model Context Protocol (MCP) is an open standard that enables AI assistants to securely interact with external tools and systems. When used with Kubernetes, MCP allows an AI assistant to execute operations (for example, kubectl commands), retrieve live cluster state, and reason about results without requiring users to manually copy and paste output into a chat interface.

This blog uses Claude Desktop as an example AI assistant. The same approach applies to any MCP-compatible AI client.

For platform administrators, this capability enables controlled, auditable, and policy-driven AI-assisted cluster operations.


For production environments, the recommended approach is to run the MCP server locally and connect to your Kubernetes cluster using a Rafay Zero Trust Kubectl Access (ZTKA) kubeconfig.

In this model:

  • The MCP server runs on the administrator’s workstation
  • Cluster access is established through Rafay’s ZTKA secure relay
  • No inbound access to the cluster is required
  • No VPN tunnels or exposed Kubernetes API endpoints are needed

This architecture aligns with zero-trust security principles and enterprise compliance requirements.