
Mohan Atreya

Instant Developer Pods: Rethinking GPU Access for AI Teams

It's the week of KubeCon Europe 2026 in Amsterdam. Many of the conversations will be about Kubernetes, AI, and GPUs. Let's have an honest discussion.

We are in 2026 and we’re still handing out infrastructure like it’s 2008. The entire workflow is slow, expensive and wildly inefficient. Meanwhile, your most expensive resource—GPUs—sit idle or underutilized.

The way most enterprises deliver GPU access today is completely misaligned with how developers and data scientists actually work. A developer wants to:

  • Run a PyTorch experiment
  • Fine-tune a model
  • Test a pipeline

What do they get instead?

A ticketing system with a multi-day wait time, and then finally a bloated VM or an entire bare-metal GPU server.

There has to be a better way. This is the first part of a blog series on Rafay's Developer Pods. In it, we describe why and how many of our customers have completely transformed GPU delivery, giving their end users a self-service experience.


How Rafay and NVIDIA Help Neoclouds Monetize Accelerated Computing with Token Factories

The AI boom has created an unprecedented demand for GPUs. In response, a new generation of GPU-first cloud providers purpose-built for AI workloads—known as neoclouds—has emerged to deliver the AI infrastructure needed to power AI applications.

However, a critical shift is happening in the market. Selling raw GPU infrastructure is no longer enough. The real opportunity lies in turning GPU capacity into AI services. Developers and enterprises don't want GPUs. They want models, APIs, and intelligence on demand.

With Rafay's Token Factory offering, neoclouds can transform GPU clusters into a self-service AI platform that exposes models through token-metered APIs. The result is a marketplace where neoclouds monetize infrastructure, model developers reach users, and developers build applications, all on the same platform.

This is where Rafay and NVIDIA have come together to unlock a powerful new business model for AI infrastructure providers.
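Token metering turns raw inference usage into revenue by charging per token rather than per GPU-hour. The sketch below shows the core billing arithmetic; the model names and per-token rates are illustrative placeholders, not actual Rafay or NVIDIA pricing.

```python
# Sketch of token-metered billing for a model-serving API.
# Model names and per-1K-token rates are illustrative only.
RATES_PER_1K = {
    "llama-3-70b": {"prompt": 0.0008, "completion": 0.0016},
    "mistral-7b":  {"prompt": 0.0002, "completion": 0.0004},
}

def charge(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the cost in dollars for one metered API call."""
    rates = RATES_PER_1K[model]
    cost = (prompt_tokens / 1000) * rates["prompt"] \
         + (completion_tokens / 1000) * rates["completion"]
    return round(cost, 6)

# A call that consumed 1,500 prompt and 500 completion tokens:
print(charge("llama-3-70b", 1500, 500))  # 0.002
```

Splitting prompt and completion rates mirrors how most model-serving APIs meter usage, since output tokens are typically more expensive to generate than input tokens are to process.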


Scaling Trust: The Fortanix and Rafay Integration for Enterprise Confidential AI

In the modern enterprise, Artificial Intelligence (AI) has moved from a "nice-to-have" experimental phase to a core business driver. However, for organizations in highly regulated sectors—such as banking, healthcare, and government—the path to AI adoption is fraught with security hurdles.

The primary concern is protecting sensitive data not just at rest or in transit, but in use. In the image below, the app uses a proprietary model which needs to be secured using confidential computing.

Confidential VM

Traditional security measures often fall short when data must be decrypted to be processed by an AI model. This is where Confidential Computing changes the game, and why the joint integration between Fortanix and Rafay is a landmark development for the "AI Factory" of the future.

NVIDIA AICR Generates It. Rafay Runs It. Your GPU Clusters, Finally Under Control

Deploying GPU-accelerated Kubernetes infrastructure for AI workloads has never been simple. Administrators face a relentless compatibility matrix: matching GPU driver versions to CUDA releases, pinning Kubernetes versions to container runtimes, tuning configurations differently for NVIDIA H100s versus A100s, and doing all of it differently again for training versus inference.

One wrong version combination and workloads fail silently, or worse, perform far below hardware capability. For years, the answer was static documentation, tribal knowledge, and hoping that whoever wrote the runbook last week remembered to update it.

NVIDIA's AI Cluster Runtime (AICR) and the Rafay Platform represent a new approach — one where GPU infrastructure configuration is treated as code, generated deterministically, validated against real hardware, and enforced continuously across fleets of clusters.

Together, they cover the full lifecycle from first aicr snapshot to production-grade day-2 operations, with cluster blueprints as the critical bridge between the two.
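Treating configuration as code means compatibility rules can be checked before rollout instead of discovered at runtime. A minimal sketch of that idea, with illustrative version pairings (consult NVIDIA's official support matrix for real deployments):

```python
# Sketch: validate a GPU software stack against a compatibility matrix
# before rollout. Version pairings below are illustrative only; use
# NVIDIA's published support matrix for real deployments.

# Minimum driver branch assumed to support each CUDA release.
MIN_DRIVER_FOR_CUDA = {
    "12.2": 535,
    "12.4": 550,
    "12.6": 560,
}

def validate_stack(cuda: str, driver_branch: int) -> list:
    """Return a list of problems; an empty list means the combo passes."""
    problems = []
    required = MIN_DRIVER_FOR_CUDA.get(cuda)
    if required is None:
        problems.append(f"unknown CUDA release {cuda}")
    elif driver_branch < required:
        problems.append(
            f"driver branch {driver_branch} too old for CUDA {cuda} "
            f"(needs >= {required})"
        )
    return problems

print(validate_stack("12.4", 535))  # flags the driver as too old
print(validate_stack("12.4", 550))  # [] -> combination passes
```

The same check can run continuously against every cluster in a fleet, which is the "enforced continuously" half of the approach described above.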


From Slurm to Kubernetes: A Guide for HPC Users

If you've spent years submitting batch jobs with Slurm, moving to a Kubernetes-based cluster can feel like learning a new language. The concepts are familiar — resource requests, job queues, priorities — but the vocabulary and tooling are different. This guide bridges that gap, helping HPC veterans understand how Kubernetes handles workloads and what that means day-to-day.
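To make the mapping concrete, here is roughly how a familiar `sbatch` script translates into a Kubernetes Job manifest. The image, command, and resource values are placeholders for illustration:

```yaml
# Roughly the Kubernetes equivalent of:
#   #SBATCH --job-name=train --gres=gpu:1 --cpus-per-task=8 --mem=32G
# (illustrative values; image and command are placeholders)
apiVersion: batch/v1
kind: Job
metadata:
  name: train               # --job-name
spec:
  backoffLimit: 0           # fail fast, like a non-requeued Slurm job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: train
          image: pytorch/pytorch:latest
          command: ["python", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 1   # --gres=gpu:1
              cpu: "8"            # --cpus-per-task
              memory: 32Gi        # --mem
```

Where Slurm expresses resources as scheduler directives in the script header, Kubernetes expresses them as declarative fields on the container spec; the scheduler then finds a node that satisfies them.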


Run nvidia-smi on Remote GPU Kubernetes Clusters Using Rafay Zero Trust Access

Infra operators managing GPU-enabled Kubernetes clusters often need a fast and secure way to validate GPU visibility, driver health, and runtime readiness without exposing the cluster directly or relying on bastion hosts, VPNs, or manually managed kubeconfigs.

With Rafay's zero trust kubectl, operators can securely access remote Kubernetes resources and execute commands inside running pods from the Rafay platform. A simple but powerful example is running nvidia-smi inside a GPU Operator pod to confirm that the NVIDIA driver stack, CUDA runtime, and GPU devices are functioning correctly on a remote cluster.

In this post, we walk through how infra operators can use Rafay's zero trust access workflow to run nvidia-smi on a remote GPU-based Kubernetes cluster.


How Rafay Helps GPU Clouds Run Complex Hackathons at Scale

Running a hackathon is hard. Running a GPU-powered hackathon for thousands of participants — where every developer needs a fully configured environment (notebooks, developer pod etc) with dedicated GPU resources, ready to go the moment the event kicks off — is an entirely different class of problem. This is exactly where Rafay's platform has helped change the game for GPU Cloud providers.


GPU Cloud Billing: From Usage Metering to Billing

Cloud providers building GPU or neocloud services face a universal challenge: how to turn resource consumption into revenue with accuracy, automation, and operational efficiency. In our previous blog, we demonstrated how to programmatically retrieve usage data from Rafay's Usage Metering APIs and generate structured CSVs for downstream processing in an external billing platform.

In this follow-up blog, we take the next step toward a complete billing workflow—automatically transforming usage into billable cost using SKU-specific pricing. With GPU clouds scaling faster than ever and enterprise AI workloads becoming increasingly dynamic, providers must ensure their billing engine is consistent, transparent, and tightly integrated with their platform. The enhancements described in this blog are designed exactly for that.
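The usage-to-cost step reduces to applying SKU-specific rates to metered rows. A minimal sketch, where the SKU names, hourly rates, and row shape are illustrative stand-ins for the CSV output of the Usage Metering APIs:

```python
# Sketch of the usage -> cost step: apply SKU-specific hourly rates
# to metered usage rows. SKUs and rates are illustrative only.
SKU_HOURLY_RATE = {
    "gpu-h100-80gb": 4.50,
    "gpu-a100-40gb": 2.10,
}

def to_invoice_lines(usage_rows):
    """usage_rows: iterable of (tenant, sku, hours) tuples."""
    lines = []
    for tenant, sku, hours in usage_rows:
        rate = SKU_HOURLY_RATE[sku]
        lines.append({
            "tenant": tenant,
            "sku": sku,
            "hours": hours,
            "rate": rate,
            "cost": round(hours * rate, 2),
        })
    return lines

rows = [("acme", "gpu-h100-80gb", 10.0), ("acme", "gpu-a100-40gb", 5.0)]
for line in to_invoice_lines(rows):
    print(line)  # 10h * 4.50 = 45.00; 5h * 2.10 = 10.50
```

Keeping the rate and the computed cost on each invoice line makes the bill auditable: a customer can re-derive every charge from the metered hours and the published SKU price.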


Goodbye to Ingress NGINX – What Happens Next?

The Kubernetes community has officially started the countdown to retire Ingress NGINX, one of the most widely used ingress controllers in the ecosystem.

SIG Network and the Security Response Committee have announced that Ingress NGINX will move to best-effort maintenance until March 2026, after which there will be no new releases, no bug fixes, and no security updates. 

At the same time, the broader networking story in Kubernetes is evolving: Gateway API is now positioned as the successor to Ingress. In this blog, we describe why this is happening, when a replacement makes sense, and how and when you should migrate.
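For a sense of what migration looks like, here is a minimal HTTPRoute covering the same ground as a typical single-host Ingress rule. The Gateway name, hostname, and backend service are placeholders:

```yaml
# A minimal HTTPRoute replacing a typical Ingress rule
# (Gateway, hostname, and service names are placeholders).
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web
spec:
  parentRefs:
    - name: shared-gateway    # a Gateway owned by the platform team
  hostnames:
    - "app.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: web-svc
          port: 80
```

Note the role split Gateway API introduces: the platform team owns the Gateway (listeners, TLS, IPs), while application teams attach HTTPRoutes to it, which is a cleaner separation than Ingress annotations allowed.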

How GPU Clouds Deliver NVIDIA Run:ai as Self-Service with Rafay GPU PaaS

As the demand for AI training and inference surges, GPU Clouds are increasingly looking to offer their users higher-level, turnkey AI services, not just raw GPU instances. Some customers may be familiar with NVIDIA Run:ai as an AI workload and GPU orchestration platform.

Delivering NVIDIA Run:ai as a scalable, repeatable managed service—something customers can select and provision with a few clicks—requires deep automation, lifecycle management, and tenant isolation capabilities. This is exactly what Rafay provides.

With Rafay, GPU Clouds, including NVIDIA Cloud Partners, can deliver NVIDIA Run:ai as a managed service with self-service provisioning, ensuring customers receive a fully configured NVIDIA Run:ai environment automatically, complete with GPU infrastructure, a Kubernetes cluster, necessary operators, and a ready-to-use NVIDIA Run:ai tenant. This post explains how Rafay enables cloud providers to industrialize NVIDIA Run:ai provisioning into a consistent, production-ready managed service.
