
How GPU Clouds can deliver Run:AI via Self Service using Rafay GPU PaaS

As the demand for AI training and inference surges, GPU Clouds are increasingly looking to offer higher-level, turnkey AI services—not just raw GPU instances. Some customers may be familiar with Run:AI from NVIDIA as an AI workload orchestration and optimization platform. Delivering Run:AI as a scalable, repeatable SKU—something customers can select and provision with a few clicks—requires deep automation, lifecycle management, and tenant isolation capabilities. This is exactly what Rafay provides.

With Rafay, GPU Clouds can deliver Run:AI as a self-service SKU, ensuring customers receive a fully configured Run:AI environment—complete with GPU infrastructure, a Kubernetes cluster, necessary operators, and a ready-to-use Run:AI tenant—all deployed automatically. This blog explains how Rafay enables cloud providers to industrialize Run:AI provisioning into a consistent, production-ready SKU.

Run:AI via Self Service


Rationale

For GPU Clouds, SKU-based managed services offer tremendous benefits:

  1. Predictable, standardized offerings for customers
  2. Reduced complexity, since the SKU hides all underlying infrastructure
  3. Faster onboarding, enabling customers to begin using Run:AI in minutes
  4. Higher margins, by offering value-added services instead of raw compute
  5. Scalability, allowing dozens or hundreds of customers/tenants to onboard seamlessly

In short, turning Run:AI into a cloud SKU transforms it from a complex integration into a consumption-ready product. The experience begins in the GPU Cloud provider’s marketplace or self-service portal. Customers simply choose the Run:AI SKU, which can come in variants such as:

  • Run:AI Standard — 4 GPUs (e.g., L40S or A100)
  • Run:AI Enterprise — 8 GPUs (e.g., H100)
  • Multi-node Run:AI SKU (e.g., 2× H100 nodes)
  • Bare metal or VM-backed Infrastructure

Each SKU is pre-defined by the cloud provider and backed by Rafay, which performs sophisticated automation behind the scenes: orchestrating the required infrastructure, deploying and configuring the required software, and so on. An illustrative example is shown below.
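To make the SKU tiers above concrete, here is a minimal sketch of how a provider's catalog might be modeled. The field names and SKU identifiers are hypothetical, not a Rafay or Run:AI schema; the GPU types and counts mirror the example variants listed earlier.

```python
# Illustrative SKU catalog a GPU Cloud might define behind its portal.
# Identifiers and field names are assumptions for illustration only.
SKU_CATALOG = {
    "runai-standard":   {"gpu_type": "L40S", "gpu_count": 4,  "nodes": 1},
    "runai-enterprise": {"gpu_type": "H100", "gpu_count": 8,  "nodes": 1},
    "runai-multinode":  {"gpu_type": "H100", "gpu_count": 16, "nodes": 2},
}

def resolve_sku(name: str) -> dict:
    """Look up a SKU selection and fail fast on unknown names."""
    try:
        return SKU_CATALOG[name]
    except KeyError:
        raise ValueError(f"Unknown SKU: {name!r}") from None
```

Keeping the catalog declarative like this is what lets the provider add or retire SKU variants without touching the automation that consumes them.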

Run:AI via Self Service Inputs


Seamless Orchestration under the Covers

A number of steps are performed automatically behind the scenes once the user deploys the SKU. The sequence diagram below describes the high-level steps.

Run:AI Self Service Sequence

Once deployment is complete, Rafay will present the user with the following details:

  1. Run:AI Administrative Portal URL
  2. Run:AI tenant admin credentials

Users now have a complete Run:AI deployment—delivered via a single SKU request. The Run:AI administrator can now add end users via the Run:AI console and have them use the available GPU resources.
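The end-to-end flow can be sketched as a single pipeline, from SKU request to the details handed back to the user. Every helper below is a stub standing in for automation Rafay performs; none of these function names or URLs are a real Rafay or Run:AI API.

```python
# Hypothetical orchestration pipeline mirroring the sequence above.
# All helpers are stubs; names and URLs are illustrative assumptions.

def provision_gpu_servers(sku):
    """Stand-in for provisioning bare metal servers or GPU-enabled VMs."""
    return [f"node-{i}" for i in range(sku["nodes"])]

def create_kubernetes_cluster(servers):
    """Stand-in for bringing up a managed cluster (e.g., Rafay MKS)."""
    return {"name": "customer-cluster", "nodes": servers}

def create_runai_tenant(customer_id):
    """Stand-in for creating a tenant on the Run:AI Control Plane."""
    return {
        "portal_url": f"https://runai.example.com/{customer_id}",
        "admin_credentials": "<generated-secret>",
    }

def deploy_runai_sku(customer_id, sku):
    servers = provision_gpu_servers(sku)           # GPU infrastructure
    cluster = create_kubernetes_cluster(servers)   # Kubernetes cluster
    tenant = create_runai_tenant(customer_id)      # Run:AI tenant
    cluster["tenant"] = tenant["portal_url"]       # register cluster with tenant
    # Details surfaced to the user once deployment completes:
    return {
        "portal_url": tenant["portal_url"],
        "admin_credentials": tenant["admin_credentials"],
    }
```

The return value corresponds to the two items presented to the user above: the administrative portal URL and the tenant admin credentials.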


Infrastructure

Rafay automatically provisions the specified GPU infrastructure in the GPU Cloud's datacenter:

  • Provisions physical GPU servers or GPU-enabled VMs
  • Sets up networking, storage, security groups, and VPC isolation
  • Provisions a production-grade Kubernetes cluster (e.g. Rafay MKS) with the control plane and worker nodes
  • Deploys and configures cluster add-ons, monitoring, logging, and observability components
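A deployment is only handed to the customer once the cluster add-ons listed above are in place. Here is a minimal sketch of such a readiness gate; the add-on names are examples, and a real check would query the cluster (e.g., via the Kubernetes API) rather than compare against a static set.

```python
# Illustrative readiness gate for the cluster add-ons described above.
# Add-on names are assumptions; a real check would query the cluster.
REQUIRED_ADDONS = {"gpu-operator", "monitoring", "logging"}

def missing_addons(installed):
    """Return the required add-ons not yet reported as installed."""
    return REQUIRED_ADDONS - set(installed)
```

Automation would block (or retry) until `missing_addons(...)` returns an empty set, so customers never see a half-configured cluster.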

Run:AI Tenant

To truly deliver Run:AI “as a SKU,” the creation of the Run:AI tenant on the Run:AI Control Plane also needs to be automated. Rafay handles this end-to-end as well:

  • Creates a Run:AI Tenant via API
  • Registers the newly provisioned Kubernetes cluster
  • Verifies Run:AI operator status and successful onboarding
  • Ensures isolation for the tenant’s environment

In a nutshell, customers get a dedicated Run:AI instance without ever touching infrastructure.
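As a rough illustration of the "Creates a Run:AI Tenant via API" step, the sketch below builds the kind of authenticated REST request such automation would issue. The endpoint path, payload fields, and bearer-token auth are assumptions for illustration; consult the actual Run:AI Control Plane API documentation for the real contract.

```python
# Hedged sketch: constructing (not sending) a tenant-creation request.
# Endpoint, payload shape, and auth scheme are illustrative assumptions.
import json
import urllib.request

def create_tenant_request(base_url, token, tenant_name):
    """Build the HTTP request that would create a tenant."""
    payload = json.dumps({"name": tenant_name}).encode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/tenants",  # hypothetical endpoint
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Because tenant creation is just an API call, it slots naturally into the same automated pipeline as infrastructure provisioning and cluster onboarding.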


Conclusion

Rafay transforms Run:AI from a manually deployed platform into a self-service SKU that GPU Cloud providers can expose to customers with confidence. By automating everything—from provisioning GPU infrastructure to tenant creation to cluster onboarding—Rafay ensures that customers can begin using Run:AI within minutes of selecting a SKU.

For customers, it means instant access to Run:AI. For cloud operators, it means:

  • Higher operational efficiency
  • Scalable onboarding of new customers
  • Stronger differentiation in the GPU Cloud market
  • A future-proof platform for expanding GPU-accelerated services