2026

March 23, 2026
in Product Blog, GPU, NVIDIA
4 min read

Flexible GPU Billing Models for Modern Cloud Providers — Powering the AI Factory with Rafay

The GPU cloud market is evolving fast. At NVIDIA GTC 2026, one theme rang loud and clear: enterprises are no longer experimenting with AI; they are committing to it at scale. Training frontier models, fine-tuning domain-specific LLMs, and running large-scale inference workloads on NVIDIA hardware all require sustained, predictable access to high-end GPU infrastructure. That kind of commitment demands a billing model to match.

If you are running a GPU cloud business, you already know that a simple pay-as-you-go model doesn't cut it anymore. Your enterprise customers want options, and your ability to offer those options is a direct competitive advantage. That's where Rafay comes in.

March 22, 2026
in Product Blog, AI/ML, Developer Pods, Kubernetes, KubeCon EU 2026
6 min read

Developer Pods: A Self-Service GPU Experience That Feels Instant

In Part 1, we discussed the core problem: most organizations still deliver GPU access through the wrong abstraction. Developers do not want tickets, YAML, and long wait times. They want a working environment with the right tools and GPU access, available when they need it.

In this post, let’s look at the other half of the story: the end-user experience. Specifically, what does self-service actually look like for a developer or data scientist using Rafay Developer Pods?

The answer is simple: a familiar UI, a few guided choices, and a running environment they can SSH into in about 30 seconds.

[Image: New Developer Pod]

March 21, 2026
in Product Blog, AI/ML, Developer Pods, Kubernetes, KubeCon EU 2026
4 min read

Instant Developer Pods: Rethinking GPU Access for AI Teams

It's the week of KubeCon Europe 2026 in Amsterdam. Many of the conversations will be about Kubernetes, AI, and GPUs. Let's have an honest discussion.

We are in 2026 and we’re still handing out infrastructure like it’s 2008. The entire workflow is slow, expensive, and wildly inefficient. Meanwhile, your most expensive resources, GPUs, sit idle or underutilized.

The way most enterprises deliver GPU access today is completely misaligned with how developers and data scientists actually work. A developer wants to:

Run a PyTorch experiment
Fine-tune a model
Test a pipeline

What do they get instead?

A ticketing system with a multi-day wait time, and then finally a bloated VM or an entire bare-metal GPU server.

There has to be a better way. This is the first part of a blog series on Rafay's Developer Pods. In it, we describe why and how many of our customers have completely transformed the way they give their end users self-service access to GPUs.

[Image: Dev Pod]

March 20, 2026
in Product Blog, Kubernetes, Bare Metal
4 min read

No More SSH: Control Plane Overrides for Rafay MKS Clusters

Customizing a Kubernetes control plane has always been an uncomfortable exercise. You SSH into a master node, carefully edit a static pod manifest, and then hope nothing breaks. With our latest release, we are replacing that workflow entirely. Control Plane Overrides give you a safe, declarative way to customize the API Server, Controller Manager, and Scheduler for MKS (Managed Kubernetes Service) clusters — Rafay's upstream Kubernetes offering for bare metal and VMs — directly from the Rafay Console or cluster specification.

March 20, 2026
in NVAIE, GPU Virtualization
4 min read

When Do You Need an NVIDIA AI Enterprise License with GPU Virtualization?

If you're deploying NVIDIA GPUs in a virtualized or cloud-native environment, understanding NVIDIA AI Enterprise (NVAIE) licensing can save you from unexpected compliance issues — or unnecessary spending. The rules vary significantly depending on how your GPUs are deployed. Here's a practical breakdown.

What Is NVIDIA AI Enterprise?

NVIDIA AI Enterprise (NVAIE) is NVIDIA's end-to-end software platform for AI workloads, covering everything from GPU drivers and Kubernetes operators to NIM microservices, NeMo, Triton Inference Server, and RAPIDS. It's licensed on a per-GPU basis and available as an annual subscription or perpetual license with 5-year support.

The key thing to understand: NVAIE licensing is about software access and support, not raw GPU compute. The GPU itself will generally work without a license — but your access to enterprise-grade software, NGC containers, and SLA-backed support depends on it.

Deployment Models and When a License Is Required

1. Bare Metal (No Hypervisor)

License required? Conditionally.

On bare metal, the NVIDIA driver works without an NVAIE license for standard CUDA workloads. However, a license is required to:

Access NVAIE-gated NGC containers (NIM microservices, enterprise frameworks)
Use the vGPU for Compute guest driver for licensing enforcement
Receive enterprise support with SLA

Some GPUs bundle NVAIE automatically. Each NVIDIA H100 PCIe or NVL and H200 NVL GPU includes a 5-year NVAIE subscription that activates with the GPU serial number. Notably, Blackwell (B200/B300) DGX systems do not include NVAIE — licenses must be purchased separately.

2. GPU Passthrough (PCIe Passthrough / VFIO)

License required? Same as bare metal for the guest VM.

In GPU passthrough, the entire physical GPU is assigned to a single VM. The guest VM behaves identically to a bare metal system from a driver perspective. NVAIE licensing requirements inside the VM mirror bare metal:

Standard CUDA compute works without a license
Access to NVAIE NGC software and NIM microservices requires a license
One license per physical GPU passed through

3. NVIDIA vGPU for Compute (Time-Sliced / MIG-backed vGPU)

License required? Yes — software enforced.

This is where licensing is strictly enforced. NVIDIA vGPU for Compute is licensed exclusively through NVAIE. One license is required per vGPU instance assigned to a VM.

How enforcement works:

When a vGPU VM boots on a supported GPU, it initially operates at full capability
If it fails to obtain a license from the NVIDIA License System (NLS), performance degrades over time
The license is checked out from either a Cloud License Server (CLS) or a Delegated License Server (DLS) for air-gapped environments

vGPU for Compute supports up to 16 vGPU instances per physical GPU, with each requiring its own license. It enables advanced capabilities not available in passthrough: live migration, suspend/resume, warm updates, and multi-tenant GPU sharing.
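To make the per-instance rule concrete, here is a minimal Python sketch of the license arithmetic. The function name and its parameter are hypothetical, and the limit of 16 is simply the figure quoted above; actual limits depend on the GPU model and the vGPU profile in use.

```python
MAX_VGPUS_PER_GPU = 16  # upper bound quoted in this post; real limits vary by GPU and profile


def vgpu_licenses_needed(vgpus_per_gpu: list[int]) -> int:
    """Return the NVAIE license count for a vGPU for Compute deployment.

    vgpus_per_gpu holds, for each physical GPU, how many vGPU instances
    are assigned to VMs. One license is needed per vGPU instance.
    """
    for count in vgpus_per_gpu:
        if not 0 <= count <= MAX_VGPUS_PER_GPU:
            raise ValueError(f"vGPU count {count} is outside 0..{MAX_VGPUS_PER_GPU}")
    return sum(vgpus_per_gpu)


# Hypothetical host with four physical GPUs, sliced unevenly.
print(vgpu_licenses_needed([4, 4, 8, 2]))
```

The count follows the number of vGPU instances, not the number of physical GPUs, which is why dense slicing raises the license bill even on a small host.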

4. Cloud Deployments (AWS, Azure, GCP, OCI)

License required? Depends on the model.

Pay-as-you-go via CSP marketplace: License is included in the hourly per-GPU price. No separate purchase needed.
BYOL (Bring Your Own License): Use an on-prem NVAIE license on a certified cloud instance. One license per GPU, checked out from NLS.
H100 / H200 NVL instances: May include NVAIE bundled depending on the cloud provider's offering.

5. MIG (Multi-Instance GPU)

License required? Depends on the underlying deployment.

MIG allows a single physical GPU (A100, H100, H200, B200) to be partitioned into up to 7 isolated GPU instances, each with dedicated memory and compute. From a licensing perspective, MIG doesn't change the fundamental rules — what matters is how the host is deployed:

MIG on bare metal — same as bare metal, license required for NVAIE software access
MIG on vGPU (time-sliced) — each vGPU instance backed by a MIG slice still requires an NVAIE license per vGPU
MIG on passthrough — the VM receives a MIG-partitioned GPU, same rules as passthrough

MIG on Kubernetes has an additional operational consideration: the NVIDIA GPU Operator's MIG Manager handles reconfiguration automatically by watching the nvidia.com/mig.config node label. When the label changes, MIG Manager stops GPU pods, applies the new MIG geometry, and restarts them. One license is still required per physical GPU regardless of how many MIG slices are created.
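As an illustration of the label-driven flow, the Python sketch below assembles the kubectl command an operator might run to request a new MIG layout. The node name gpu-node-1 and the profile all-1g.10gb are placeholders; valid profile names depend on your GPU model and on the MIG Manager configuration in your cluster. The sketch only builds and prints the command; it does not run it.

```python
import shlex

MIG_CONFIG_LABEL = "nvidia.com/mig.config"  # node label watched by MIG Manager


def mig_label_command(node: str, profile: str) -> list[str]:
    """Build the kubectl arguments that set the MIG config label on a node."""
    if not node or not profile:
        raise ValueError("node and profile must be non-empty")
    return [
        "kubectl", "label", "node", node,
        f"{MIG_CONFIG_LABEL}={profile}",
        "--overwrite",  # replace the label if the node already carries one
    ]


# Placeholder node and profile names, for illustration only.
cmd = mig_label_command("gpu-node-1", "all-1g.10gb")
print(shlex.join(cmd))
```

Because MIG Manager stops GPU pods before applying a new geometry, a label change like this is best treated as a disruptive operation and scheduled accordingly.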

6. Kubernetes (Bare Metal or VMs)

License required? Conditionally, same rules as the underlying deployment.

The NVIDIA GPU Operator automates driver and plugin deployment on Kubernetes. If your K8s nodes are:

Bare metal: NVAIE license needed for NGC-gated software, not for basic CUDA
vGPU-backed VMs: NVAIE license required per vGPU, software enforced
Passthrough VMs: same as bare metal inside the VM

The GPU Operator can also manage license token distribution automatically when configured with NLS credentials.

Quick Reference

Deployment Model	License Enforced?	When Required
Bare metal (CUDA only)	No	Optional — for NGC software & support
Bare metal (NVAIE software)	No (EULA)	Required for NGC access & SLA
GPU Passthrough VM	No	Same as bare metal in guest
vGPU for Compute	Yes (software)	Always — per vGPU instance
Cloud pay-as-you-go	Included	No separate purchase
Cloud BYOL	Yes	Per GPU on instance
H100/H200 PCIe or NVL	Bundled	5-year subscription included
B200/B300 (Blackwell)	Not bundled	Must purchase separately
GH200 / GB200	Bare metal only	vGPU not supported
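The table can also be read as a small decision function. The sketch below is one hypothetical encoding in Python: the enum and function names are invented for illustration, the rules are only those stated in this post, and hardware bundles (such as the subscription included with H100 NVL) are deliberately ignored.

```python
from enum import Enum


class Deployment(Enum):
    BARE_METAL = "bare metal"
    PASSTHROUGH = "gpu passthrough vm"
    VGPU = "vgpu for compute"
    CLOUD_PAYG = "cloud pay-as-you-go"
    CLOUD_BYOL = "cloud byol"


def must_bring_license(deployment: Deployment, uses_nvaie_software: bool) -> bool:
    """Return True if you need to supply your own NVAIE license."""
    if deployment is Deployment.CLOUD_PAYG:
        return False  # license is included in the hourly per-GPU price
    if deployment in (Deployment.VGPU, Deployment.CLOUD_BYOL):
        return True  # always required for these models
    # Bare metal and passthrough: only for NVAIE software access and support.
    return uses_nvaie_software


print(must_bring_license(Deployment.BARE_METAL, uses_nvaie_software=False))
print(must_bring_license(Deployment.VGPU, uses_nvaie_software=False))
```

Keeping the rule in one function makes it easy to audit a fleet inventory against the table, at the cost of flattening nuances such as per-vGPU versus per-GPU counting.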

License Server Options

Cloud License Server (CLS): Hosted by NVIDIA, requires internet connectivity. Easiest to set up.

Delegated License Server (DLS): On-premises, air-gapped environments. You deploy and manage it. Required for secure/sovereign deployments.

Both are managed through the NVIDIA License System portal at nvid.nvidia.com.
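The choice between the two options reduces to a connectivity question. The Python sketch below is a hypothetical helper, not an NVIDIA API; the function and parameter names are assumptions made for illustration.

```python
def pick_license_server(air_gapped: bool, has_internet: bool = True) -> str:
    """Return "CLS" or "DLS" following the guidance in this post."""
    if air_gapped or not has_internet:
        return "DLS"  # on-premises, deployed and managed by you
    return "CLS"  # hosted by NVIDIA, needs internet connectivity


print(pick_license_server(air_gapped=False))
print(pick_license_server(air_gapped=True))
```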

Key Takeaways

vGPU for Compute always requires a license — it's software enforced and performance degrades without one.
GPU passthrough and bare metal only require a license for NVAIE software access, not for basic CUDA functionality.
H100 (PCIe or NVL) and H200 NVL GPUs bundle 5 years of NVAIE; Blackwell (B200/B300) does not.
GH200 and GB200 are bare metal only — vGPU is not supported on these platforms.
HGX systems only support full board passthrough to a single VM — no partial passthrough.
For production AI workloads, the NVAIE license is almost always worth it — it unlocks NIM microservices, enterprise support SLAs, security patches, and the full NGC catalog.

March 17, 2026
in Product Blog, GPU, NVIDIA, Bare Metal
3 min read

Accelerating the AI Factory: Rafay & NVIDIA NCX Infra Controller (NICo)

Acquiring GPU hardware is the easy part. Turning it into a productive, multi-tenant AI service with proper isolation, self-service provisioning, and the governance to operate it at scale is where most get stuck. Custom integration work piles up, timelines slip, and the gap between racked hardware and revenue widens.

Rafay is closing that gap through a new integration with the NVIDIA NCX Infrastructure Controller (NICo), NVIDIA's open-source component for automated bare-metal lifecycle management. Together, Rafay and NICo give operators a unified platform to manage their GPU fleet and deliver cloud-like, self-service experiences to end users.

March 17, 2026
in Token Factory, GPU
5 min read

How Rafay and NVIDIA Help Neoclouds Monetize Accelerated Computing with Token Factories

The AI boom has created an unprecedented demand for GPUs. In response, a new generation of GPU-first cloud providers purpose-built for AI workloads—known as neoclouds—has emerged to deliver the AI infrastructure needed to power AI applications.

However, a critical shift is happening in the market. Selling raw GPU infrastructure is no longer enough. The real opportunity lies in turning GPU capacity into AI services. Developers and enterprises don't want GPUs. They want models, APIs, and intelligence on demand.

With Rafay's Token Factory offering, neoclouds can transform GPU clusters into a self-service AI platform that exposes models through token-metered APIs. The result is a marketplace where neoclouds monetize infrastructure, model developers reach users, and developers build applications, all on the same platform.

This is where Rafay and NVIDIA have come together to unlock a powerful new business model for AI infrastructure providers.

[Image: End User Portal Token Factory]

March 16, 2026
in Product Blog, Cost Management, Pod Resizing
3 min read

Stop Paying for Resources Your Pods Don't Need

If you manage Kubernetes infrastructure at scale, you already know the pattern. Development teams request CPU and memory "just to be safe." Nobody wants their app to OOM. Nobody wants to get paged at 2am because a pod got throttled. So requests get padded and they stay padded.

The result? Clusters are full of pods consuming far less than what they've been allocated. Nodes are running hot on paper but idle in practice. And the platform team responsible for cost governance across dozens of clusters, projects, and namespaces has no easy way to prove it.

March 16, 2026
in Product Blog, Confidential Computing, Fortanix
4 min read

Scaling Trust: The Fortanix and Rafay Integration for Enterprise Confidential AI

In the modern enterprise, Artificial Intelligence (AI) has moved from a "nice-to-have" experimental phase to a core business driver. However, for organizations in highly regulated sectors—such as banking, healthcare, and government—the path to AI adoption is fraught with security hurdles.

The primary concern is protecting sensitive data not just at rest or in transit, but in use. In the image below, the app uses a proprietary model which needs to be secured using confidential computing.

[Image: Confidential VM]

Traditional security measures often fall short when data must be decrypted to be processed by an AI model. This is where Confidential Computing changes the game, and why the joint integration between Fortanix and Rafay is a landmark development for the "AI Factory" of the future.

March 15, 2026
in GPU, NVIDIA, AICR, Kubernetes
6 min read

NVIDIA AICR Generates It. Rafay Runs It. Your GPU Clusters, Finally Under Control

Deploying GPU-accelerated Kubernetes infrastructure for AI workloads has never been simple. Administrators face a relentless compatibility matrix: matching GPU driver versions to CUDA releases, pinning Kubernetes versions to container runtimes, tuning configurations differently for NVIDIA H100s versus A100s, and doing all of it differently again for training versus inference.

One wrong version combination and workloads fail silently, or worse, perform far below hardware capability. For years, the answer was static documentation, tribal knowledge, and hoping that whoever wrote the runbook last week remembered to update it.

NVIDIA's AI Cluster Runtime (AICR) and the Rafay Platform represent a new approach — one where GPU infrastructure configuration is treated as code, generated deterministically, validated against real hardware, and enforced continuously across fleets of clusters.

Together, they cover the full lifecycle from first aicr snapshot to production-grade day-2 operations, with cluster blueprints as the critical bridge between the two.

[Image: Baton Pass]