
How GPU Clouds can deliver Run:AI via Self Service using Rafay GPU PaaS

As the demand for AI training and inference surges, GPU Clouds are increasingly looking to offer higher-level, turnkey AI services—not just raw GPU instances. Some customers may be familiar with Run:AI from NVIDIA as an AI workload orchestration and optimization platform. Delivering Run:AI as a scalable, repeatable SKU—something customers can select and provision with a few clicks—requires deep automation, lifecycle management, and tenant isolation capabilities. This is exactly what Rafay provides.

With Rafay, GPU Clouds can deliver Run:AI as a self-service SKU, ensuring customers receive a fully configured Run:AI environment—complete with GPU infrastructure, a Kubernetes cluster, necessary operators, and a ready-to-use Run:AI tenant—all deployed automatically. This blog explains how Rafay enables cloud providers to industrialize Run:AI provisioning into a consistent, production-ready SKU.

Run:AI via Self Service


Rationale

For GPU Clouds, SKU-based managed services offer tremendous benefits:

  1. Predictable, standardized offerings for customers
  2. Reduced complexity, since the SKU hides all underlying infrastructure
  3. Faster onboarding, enabling customers to begin using Run:AI in minutes
  4. Higher margins, by offering value-added services instead of raw compute
  5. Scalability, allowing dozens or hundreds of customers/tenants to onboard seamlessly

In short, turning Run:AI into a cloud SKU transforms it from a complex integration into a consumption-ready product. The experience begins in the GPU Cloud provider’s marketplace or self-service portal. Customers simply choose the Run:AI SKU, which can come in variants such as:

  • Run:AI Standard — 4 GPUs (e.g., L40S or A100)
  • Run:AI Enterprise — 8 GPUs (e.g., H100)
  • Multi-node Run:AI SKU (e.g., 2× H100 nodes)
  • Bare metal or VM-backed Infrastructure

Each SKU is pre-defined by the cloud provider and backed by Rafay, which performs sophisticated automation behind the scenes: orchestrating the required infrastructure, deploying and configuring the required software, and so on. An illustrative example is shown below.
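To make the SKU tiers above concrete, here is a minimal sketch of how a provider's catalog might be modeled. The field names and SKU identifiers are hypothetical, not a Rafay or Run:AI schema; the GPU types and counts mirror the example variants listed earlier.

```python
# Illustrative SKU catalog a GPU Cloud might define behind its portal.
# Identifiers and field names are assumptions for illustration only.
SKU_CATALOG = {
    "runai-standard":   {"gpu_type": "L40S", "gpu_count": 4,  "nodes": 1},
    "runai-enterprise": {"gpu_type": "H100", "gpu_count": 8,  "nodes": 1},
    "runai-multinode":  {"gpu_type": "H100", "gpu_count": 16, "nodes": 2},
}

def resolve_sku(name: str) -> dict:
    """Look up a SKU selection and fail fast on unknown names."""
    try:
        return SKU_CATALOG[name]
    except KeyError:
        raise ValueError(f"Unknown SKU: {name!r}") from None
```

Keeping the catalog declarative like this is what lets the provider add or retire SKU variants without touching the automation that consumes them.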

Run:AI via Self Service Inputs


Seamless Orchestration under the Covers

A number of steps are performed automatically behind the scenes once the user deploys the SKU. The sequence diagram below describes the high-level steps.

Run:AI Self Service Sequence

Once deployment is complete, Rafay will present the user with the following details:

  1. Run:AI Administrative Portal URL
  2. Run:AI tenant admin credentials

Users now have a complete Run:AI deployment—delivered via a single SKU request. The Run:AI administrator can now add end users via the Run:AI console and have them use the available GPU resources.
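The end-to-end flow can be sketched as a single pipeline, from SKU request to the details handed back to the user. Every helper below is a stub standing in for automation Rafay performs; none of these function names or URLs are a real Rafay or Run:AI API.

```python
# Hypothetical orchestration pipeline mirroring the sequence above.
# All helpers are stubs; names and URLs are illustrative assumptions.

def provision_gpu_servers(sku):
    """Stand-in for provisioning bare metal servers or GPU-enabled VMs."""
    return [f"node-{i}" for i in range(sku["nodes"])]

def create_kubernetes_cluster(servers):
    """Stand-in for bringing up a managed cluster (e.g., Rafay MKS)."""
    return {"name": "customer-cluster", "nodes": servers}

def create_runai_tenant(customer_id):
    """Stand-in for creating a tenant on the Run:AI Control Plane."""
    return {
        "portal_url": f"https://runai.example.com/{customer_id}",
        "admin_credentials": "<generated-secret>",
    }

def deploy_runai_sku(customer_id, sku):
    servers = provision_gpu_servers(sku)           # GPU infrastructure
    cluster = create_kubernetes_cluster(servers)   # Kubernetes cluster
    tenant = create_runai_tenant(customer_id)      # Run:AI tenant
    cluster["tenant"] = tenant["portal_url"]       # register cluster with tenant
    # Details surfaced to the user once deployment completes:
    return {
        "portal_url": tenant["portal_url"],
        "admin_credentials": tenant["admin_credentials"],
    }
```

The return value corresponds to the two items presented to the user above: the administrative portal URL and the tenant admin credentials.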


Infrastructure

Rafay automatically provisions the specified GPU infrastructure in the GPU Cloud's datacenter:

  • Provisions physical GPU servers or GPU-enabled VMs
  • Sets up networking, storage, security groups, and VPC isolation
  • Provisions a production-grade Kubernetes cluster (e.g. Rafay MKS) with the control plane and worker nodes
  • Deploys and configures cluster add-ons, monitoring, logging, and observability components
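A deployment is only handed to the customer once the cluster add-ons listed above are in place. Here is a minimal sketch of such a readiness gate; the add-on names are examples, and a real check would query the cluster (e.g., via the Kubernetes API) rather than compare against a static set.

```python
# Illustrative readiness gate for the cluster add-ons described above.
# Add-on names are assumptions; a real check would query the cluster.
REQUIRED_ADDONS = {"gpu-operator", "monitoring", "logging"}

def missing_addons(installed):
    """Return the required add-ons not yet reported as installed."""
    return REQUIRED_ADDONS - set(installed)
```

Automation would block (or retry) until `missing_addons(...)` returns an empty set, so customers never see a half-configured cluster.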

Run:AI Tenant

To truly deliver Run:AI “as a SKU,” the creation of the Run:AI tenant on the Run:AI Control Plane also needs to be automated. Rafay handles this end-to-end as well:

  • Creates a Run:AI Tenant via API
  • Registers the newly provisioned Kubernetes cluster
  • Verifies Run:AI operator status and successful onboarding
  • Ensures isolation for the tenant’s environment

In a nutshell, customers get a dedicated Run:AI instance without ever touching infrastructure.
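As a rough illustration of the "Creates a Run:AI Tenant via API" step, the sketch below builds the kind of authenticated REST request such automation would issue. The endpoint path, payload fields, and bearer-token auth are assumptions for illustration; consult the actual Run:AI Control Plane API documentation for the real contract.

```python
# Hedged sketch: constructing (not sending) a tenant-creation request.
# Endpoint, payload shape, and auth scheme are illustrative assumptions.
import json
import urllib.request

def create_tenant_request(base_url, token, tenant_name):
    """Build the HTTP request that would create a tenant."""
    payload = json.dumps({"name": tenant_name}).encode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/tenants",  # hypothetical endpoint
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Because tenant creation is just an API call, it slots naturally into the same automated pipeline as infrastructure provisioning and cluster onboarding.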


Conclusion

Rafay transforms Run:AI from a manually deployed platform into a self-service SKU that GPU Cloud providers can expose to customers with confidence. By automating everything—from provisioning GPU infrastructure to tenant creation to cluster onboarding—Rafay ensures that customers can begin using Run:AI within minutes of selecting a SKU.

For customers, it means instant access to Run:AI. For cloud operators, it means:

  • Higher operational efficiency
  • Scalable onboarding of new customers
  • Stronger differentiation in the GPU Cloud market
  • A future-proof platform for expanding GPU-accelerated services