Critical Capabilities

As demand for AI/ML and GenAI workloads skyrockets, a growing number of cloud providers are stepping up to offer GPU cloud services. These GPU Cloud Providers, whether hyperscalers, regional data center operators, or internal enterprise platform teams, require a modern, multi-tenant platform that abstracts infrastructure complexity, accelerates onboarding, and provides secure, policy-governed access to powerful compute resources.

Rafay GPU PaaS is built specifically to empower GPU Cloud Providers to launch and scale fully managed, self-service GPU cloud offerings. It serves as the control plane, developer portal, and orchestration engine for turning bare-metal infrastructure into a turnkey cloud experience optimized for AI, ML, and GenAI use cases. By combining developer self-service, automated infrastructure operations, advanced cloud management, and monetization capabilities, Rafay transforms bare metal GPU infrastructure into a consumable, profitable service layer.

High Level Architecture

Note

GPU Cloud Providers can now compete with hyperscalers—not just in performance, but in agility, experience, and control.


Purpose-Built for GPU Cloud Providers

Rafay GPU PaaS is not just a set of APIs or tools. It is a vertically integrated platform designed to operate and commercialize GPU infrastructure as a service. It delivers the following critical capabilities, as shown in the high-level architecture above.

  1. Self-Service Access for End Users (data scientists, ML engineers, GenAI developers)
  2. Cloud Management for Providers (quotas, metering, policies, visibility)
  3. Infrastructure Automation (bare metal provisioning, network services, inventory)
  4. Multi-Tenancy and Security Isolation (for B2B or internal segmentation)
  5. Marketplace Enablement (application onboarding, white labeling, licensing)

Unified Self-Service Portal

At the heart of Rafay GPU PaaS is a white-label ready, Self-Service Portal that simplifies access to GPU-powered resources. This portal supports multiple personas and can be fully branded and personalized by providers to match their identity and customer experience.

Examples of user roles that use the self-service portal include:

  • Data Scientists & ML Researchers: Launch notebooks, train models, run inference.
  • GenAI App Developers: Fine-tune LLMs, serve models via token-based APIs, explore foundation models.
  • Application Developers: Integrate ML services into apps or pipelines.
  • Enterprise Admins & FinOps: Manage cost controls, access policies, and consumption quotas.

Info

Cloud Providers can use this portal to offer a curated, role-specific experience to their tenants, all while enforcing policies and quotas under the hood.


Cloud Services Designed for AI/ML Workloads

GPU Cloud Providers need to serve a variety of workloads, from exploratory notebooks to large-scale model training. Rafay’s Cloud Services layer includes the following categories:

Compute Services

Rafay GPU PaaS offers a flexible, high-performance compute layer that caters to a wide spectrum of AI/ML and GenAI workloads. These services are modular, policy-governed, and tenant-aware—enabling GPU Cloud Providers to offer customers the exact compute environment they need, with full operational visibility and control.

Bare Metal Servers

Provides direct, low-latency access to physical GPU servers without the overhead of virtualization. Ideal for workloads that require maximum performance and fine-tuned control over the hardware. Cloud Providers can expose GPUs like NVIDIA A100, H100, L40S, and L4 via Rafay’s orchestration layer, allowing users to run custom frameworks, multi-GPU training jobs, or performance-sensitive inference pipelines.

  • Use Cases: LLM training, computer vision, HPC workloads.
  • Benefits: Maximum GPU throughput, no virtualization tax, full driver/kernel control.
  • Tenant Features: Secure isolation, dynamic provisioning, lifecycle tracking.
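To make this concrete, here is a minimal sketch of what tenant-facing bare-metal provisioning might look like through a provider's API. The endpoint, payload fields, and token are hypothetical placeholders for illustration, not the actual Rafay API surface.

    import requests

    # Hypothetical provider endpoint and tenant-scoped API token.
    API = "https://console.example-gpu-cloud.com/api/v1"
    TOKEN = "tenant-api-token"

    # Request an H100 node from the provider's published SKU list.
    payload = {
        "name": "llm-train-01",
        "gpu_type": "H100",
        "gpu_count": 8,
        "image": "ubuntu-22.04-cuda-12",
    }

    resp = requests.post(
        f"{API}/baremetal/instances",
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # e.g. instance ID and a "provisioning" status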

Virtual Machines

Fully managed GPU-backed virtual machines provide a familiar environment for users who need traditional OS-based compute instances. VMs can be configured with custom images, storage, and networking, making them ideal for application development, packaging, and hybrid workflows.

  • Use Cases: Enterprise app hosting, pre-ML data pipelines, multi-user environments.
  • Benefits: Fast provisioning, OS-level control, compatibility with legacy tools.
  • Provider Features: SKU-based templates, quota enforcement, snapshotting.

Managed Kubernetes Clusters

Rafay allows GPU Cloud Providers to expose GPU infrastructure via managed Kubernetes clusters. These clusters abstract container orchestration complexity, making it easier for users to run distributed training, serve inference models, and scale services elastically.

  • Use Cases: Microservices, distributed model training, CI/CD pipelines.
  • Benefits: Native scaling, pod-level scheduling, built-in observability.
  • Provider Features: Policy enforcement, role-based access, monitoring, and logging.
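Once a tenant receives a kubeconfig for a managed cluster, requesting GPU capacity follows standard Kubernetes conventions. The sketch below uses the official Kubernetes Python client and the NVIDIA device plugin's nvidia.com/gpu resource name; the namespace, image, and entrypoint are illustrative.

    from kubernetes import client, config

    # Load the kubeconfig issued for the tenant's managed cluster.
    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="train-job", namespace="team-ml"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="nvcr.io/nvidia/pytorch:24.01-py3",  # example image
                    command=["python", "train.py"],            # example entrypoint
                    resources=client.V1ResourceRequirements(
                        # The NVIDIA device plugin schedules this pod onto a
                        # node with two free GPUs.
                        limits={"nvidia.com/gpu": "2"},
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="team-ml", body=pod)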

vClusters (Virtual Clusters)

Lightweight, tenant-isolated Kubernetes environments hosted on shared infrastructure. Each vCluster behaves like a full Kubernetes cluster but is cost-efficient and ideal for secure experimentation, SaaS multi-tenancy, or CI jobs without the overhead of full cluster provisioning.

  • Use Cases: R&D, tenant sandboxing, customer onboarding for AI services.
  • Benefits: Fast startup, lower resource footprint, namespace isolation.
  • Provider Features: Resource quota enforcement, custom cluster templates, soft-multitenancy controls.
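As one way to picture the mechanics, the open-source vcluster CLI creates a virtual cluster inside a namespace of a shared host cluster. A provider platform would typically drive the equivalent through its API and templates rather than shelling out; the commands below are only a sketch.

    import subprocess

    tenant = "acme-sandbox"  # illustrative tenant name

    # Create a lightweight virtual cluster inside a dedicated namespace on
    # the shared host cluster.
    subprocess.run(
        ["vcluster", "create", tenant, "--namespace", f"tenant-{tenant}"],
        check=True,
    )

    # Print a kubeconfig scoped to the virtual cluster for the tenant to use.
    subprocess.run(["vcluster", "connect", tenant, "--print"], check=True)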

SLURM Clusters

Integrated support for SLURM, a widely used open-source job scheduler for high-performance computing environments. Enables researchers and engineers to submit large-scale training or simulation jobs with GPU resource requirements, queue management, and dependency control.

  • Use Cases: LLM training, batch model training, parameter sweeps.
  • Benefits: Optimized job orchestration, GPU-aware scheduling, advanced queue management.
  • Provider Features: Multi-user support, per-project accounting, integration with quota and metering.
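Job submission itself is standard SLURM. The batch script below shows the usual shape of a GPU job; the partition, account, and entrypoint are illustrative and would map to the provider's SKUs and per-project accounting.

    import subprocess

    # A conventional SLURM batch script requesting 8 GPUs on each of 2 nodes.
    job_script = """#!/bin/bash
    #SBATCH --job-name=llm-finetune
    #SBATCH --partition=gpu-h100
    #SBATCH --nodes=2
    #SBATCH --gres=gpu:8
    #SBATCH --time=12:00:00
    #SBATCH --account=tenant-acme

    srun python train.py --config sweep.yaml
    """

    with open("finetune.sbatch", "w") as f:
        f.write(job_script)

    # sbatch prints the job ID; SLURM handles queueing, dependencies, and
    # GPU-aware placement from here.
    subprocess.run(["sbatch", "finetune.sbatch"], check=True)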

Serverless Pods

Provides ephemeral, auto-scaling compute for stateless or event-driven workloads. Perfect for model inferencing, image processing, and API-based ML services. Users only consume resources when their pods are active, making it a cost-effective and scalable option for unpredictable workloads.

  • Use Cases: On-demand model serving, webhook handlers, token-based LLM inference.
  • Benefits: Zero-provisioning time, scale-to-zero, per-request billing support.
  • Provider Features: Built-in metering, namespace isolation, auto-scaling policies.
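The source does not specify Rafay's serverless substrate; Knative is shown below purely as one well-known way to get scale-to-zero pods on Kubernetes. The names, image, and scaling bounds are illustrative.

    from kubernetes import client, config

    config.load_kube_config()

    # A Knative Service that scales to zero when idle and bursts to 10 replicas.
    service = {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": "resnet-infer", "namespace": "tenant-acme"},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "autoscaling.knative.dev/min-scale": "0",
                        "autoscaling.knative.dev/max-scale": "10",
                    }
                },
                "spec": {
                    "containers": [{
                        "image": "registry.example.com/resnet-serving:latest",
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},
                    }]
                },
            }
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="serving.knative.dev", version="v1",
        namespace="tenant-acme", plural="services", body=service,
    )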

AI/ML Applications/Tools

Notebooks

Pre-built environments for rapid prototyping and experimentation.

Training Pipelines

Support for both interactive and batch-based model training.

Inference Services

Deploy ML models as scalable APIs.
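A minimal sketch of the model-as-API pattern, using FastAPI with a stand-in predictor; a real service would load a trained artifact, and the platform would front it with autoscaling, auth, and metering.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class PredictRequest(BaseModel):
        features: list[float]

    # Stand-in for a real model; a tenant would load a trained artifact here.
    def predict(features: list[float]) -> float:
        return sum(features) / max(len(features), 1)

    @app.post("/predict")
    def predict_endpoint(req: PredictRequest) -> dict:
        return {"prediction": predict(req.features)}

    # Run with: uvicorn main:app --host 0.0.0.0 --port 8080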


GenAI Capabilities

Model Catalogs

Lets tenants browse and use popular open-source LLMs.

Fine-Tuning Workflows

Allows end users to customize base models for domain-specific applications.

Serverless Inference

Token-based LLM APIs with metering and autoscaling.
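Token-metered serving is commonly exposed through an OpenAI-compatible endpoint; the sketch below assumes that pattern, with an illustrative base URL and model name (the source does not state Rafay's exact API shape).

    from openai import OpenAI

    # Hypothetical provider endpoint and tenant API key.
    client = OpenAI(
        base_url="https://llm.example-gpu-cloud.com/v1",
        api_key="tenant-api-key",
    )

    resp = client.chat.completions.create(
        model="llama-3-8b-instruct",  # illustrative model from the catalog
        messages=[{"role": "user", "content": "Summarize this quarter's usage trends."}],
    )

    print(resp.choices[0].message.content)
    # The usage block is what metering and billing consume.
    print(resp.usage.prompt_tokens, resp.usage.completion_tokens)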

Storage Namespaces

Isolated environments for prompt libraries, model artifacts, and datasets.


App Marketplace & Monetization

Rafay enables GPU Cloud Providers to build a Marketplace of AI/ML applications and services. This creates opportunities for monetization, ecosystem development, and customer stickiness. Key features include:

App Onboarding

Package internal or partner-developed apps for self-service use by end users.

One-Click Deployment

Blueprints to instantly provision environments like RAG stacks, AutoML tools, or vector DBs.

Metering & Licensing

Support for pay-per-use or subscription-based consumption.

Branding & Personalization

Create a unified and differentiated customer experience with UI, domain, and theme customization.


Cloud Management for Providers

To operate a reliable GPU cloud at scale, providers need advanced management capabilities across infrastructure, usage, access, and policy. Rafay delivers these out of the box.

Multi-Tenancy and Access Control

Tenant Isolation with support for logical and network-level separation between tenants (i.e., Rafay Orgs). Integration with SSO and identity providers (IdPs) to manage user roles and access scopes per tenant.

Quota & SKU Management

Quota Enforcement prevents noisy neighbors and overconsumption with hard and soft quotas. SKU Management packages GPU types and configurations into named offerings (e.g., A100-highmem, L40S-standard).
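On Kubernetes-backed tenancy, hard quotas map naturally onto ResourceQuota objects. The sketch below uses the official Kubernetes Python client with the NVIDIA extended-resource name; the namespace and limits are illustrative.

    from kubernetes import client, config

    config.load_kube_config()

    # Hard caps for one tenant namespace.
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="tenant-acme-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.nvidia.com/gpu": "16",  # at most 16 GPUs requested
                "requests.cpu": "256",
                "requests.memory": "2Ti",
            }
        ),
    )

    client.CoreV1Api().create_namespaced_resource_quota(
        namespace="tenant-acme", body=quota,
    )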

Policy & Workflow Management

Integrated Policy Engine to define constraints on GPU usage, access patterns, image repositories, and more. A sophisticated workflow engine to automate onboarding, training jobs, deployment flows, and approvals.

Cost & Usage Visibility

Metering provides real-time and historical metrics across GPU-hours, token usage, and tenant utilization. Billing Integration exports usage data to external billing systems or presents showback dashboards.
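The arithmetic behind showback is simple once metering records exist. The record shape and rates below are toy values, not Rafay's metering schema.

    from collections import defaultdict

    # Illustrative per-GPU-hour list prices.
    RATE_PER_GPU_HOUR = {"H100": 4.50, "A100": 2.80, "L40S": 1.20}

    records = [
        {"tenant": "acme", "gpu": "H100", "gpu_count": 8, "hours": 12.0},
        {"tenant": "acme", "gpu": "L40S", "gpu_count": 2, "hours": 40.0},
        {"tenant": "globex", "gpu": "A100", "gpu_count": 4, "hours": 6.5},
    ]

    showback = defaultdict(float)
    for r in records:
        showback[r["tenant"]] += r["gpu_count"] * r["hours"] * RATE_PER_GPU_HOUR[r["gpu"]]

    for tenant, cost in sorted(showback.items()):
        print(f"{tenant}: ${cost:,.2f}")  # acme: $528.00, globex: $72.80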

Network & Service Isolation

Built-in support for VPCs, NAT, and segmentation to isolate tenants or workloads with sensitive data.


Hardware Automation & Inventory Management

Rafay’s platform includes a dedicated layer for integrating and automating physical infrastructure, giving GPU Cloud Providers full-stack control.

Networking

Turnkey support for Ethernet and InfiniBand, delivering high-throughput, low-latency connectivity for training traffic.

Server Management

Bare Metal Inventory Management tracks and manages the availability of GPU nodes by type, health, and status. Automated provisioning and allocation through integration with PXE, Redfish, or IPMI for node onboarding and imaging.
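Redfish is a standard DMTF REST API, so a health sweep over the inventory reduces to plain HTTP. The BMC address and credentials below are placeholders.

    import requests
    from requests.auth import HTTPBasicAuth

    BMC = "https://10.0.0.42"  # placeholder BMC address
    auth = HTTPBasicAuth("admin", "bmc-password")

    # Self-signed BMC certificates are common, hence verify=False in this sketch.
    systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=auth, verify=False).json()
    for member in systems["Members"]:
        node = requests.get(f"{BMC}{member['@odata.id']}", auth=auth, verify=False).json()
        print(node["Id"], node["PowerState"], node["Status"]["Health"])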

Storage & Security

Seamless integrations with leading storage providers deliver object and block storage for model checkpoints, datasets, and fine-tuning artifacts. Integration with firewalls ensures infrastructure access is secure, isolated, and controlled.