AI/ML and GenAI¶
Core Platform¶
The Core Platform delivers the essential building blocks for constructing a production-grade GPU Cloud, including dynamic GPU partitioning, multi-tenancy isolation, policy enforcement, and automated lifecycle management. These capabilities enable reliable and cost-efficient execution of AI/ML and GenAI workloads across diverse infrastructure environments.
- **GPU PaaS**
  Convert a stack of GPUs into a dynamically partitioned, multi-tenant GPU Cloud for data scientists and GenAI developers.
  Overview | Administration | End Users | GPU Cloud Providers | Get Started
- **GPU Sharing**
  Share your GPU resources with multiple users and applications.
  Overview | Time Slicing | Nvidia MIG-Single | Nvidia MIG-Mixed | Get Started
AI Infrastructure SKUs¶
Rafay’s AI infrastructure SKUs deliver turnkey, production-grade compute environments—ranging from bare metal servers and GPU-enabled VMs to Kubernetes-based clusters and serverless execution frameworks. Each SKU is optimized for multi-tenancy, automated lifecycle management, and high-performance AI/ML workloads, enabling teams to build and scale modern AI platforms with ease.
- **Bare Metal Servers**
  Instant self-service provisioning of GPU-enabled bare metal servers with a single click.
- **Virtual Machines**
  1-Click, self-service provisioning of GPU-enabled virtual machines.
- **Managed Kubernetes**
  1-Click, self-service provisioning of GPU-enabled Managed Kubernetes Clusters.
- **Virtual Clusters**
  1-Click virtual clusters that provide users with lightweight, fully isolated virtual Kubernetes clusters.
- **Serverless Pods**
  Deploy custom containers from prebuilt templates in isolated environments, enabling users to run CPU- and GPU-intensive tasks efficiently without managing physical infrastructure.
  Overview | Capabilities | Architecture | Requirements
- **SLURM on Bare Metal Servers**
  Batch or interactive workloads on SLURM-based HPC clusters built on bare metal servers. Run CPU- and GPU-intensive tasks efficiently across distributed compute resources.
- **SLURM on Kubernetes**
  Batch or interactive workloads on SLURM-based HPC clusters running on Kubernetes.
AIML App SKUs¶
Rafay’s AIML SKUs provide fully managed, GPU-accelerated environments for interactive development, distributed training, automated MLOps pipelines, and real-time inference. Each SKU is optimized for multi-tenancy, lifecycle automation, and operational consistency, enabling organizations to streamline the entire AI/ML model lifecycle from experimentation to production.
- **Jupyter Notebooks**
  1-Click Jupyter Notebook-based Interactive Development Environment with instant access to NVIDIA GPUs – start building in minutes.
- **Inferencing with Hourly Metering**
  Deploy and operate real-time AI inference for popular LLMs with support for hourly metering/billing.
- **Serverless Inference with Token-based Metering**
  Deliver a fully managed serverless inference service for popular LLMs with support for token-based metering/billing.
- **MLOps-Kubeflow**
  Deploy and operate a multi-tenant MLOps platform on your infrastructure based on Kubeflow, MLflow, TensorBoard, etc.
- **MLOps-KubeRay**
  Provide your users with a multi-tenant Ray-as-a-Service offering on your infrastructure.
- **AI Lab**
  A structured, scalable solution for managing lab environments on shared GPU infrastructure.
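The two inference SKUs above differ mainly in what is billed: hourly metering charges for the time an endpoint is provisioned, while token-based metering charges only for tokens processed. As a minimal illustration of the two models (all rates and token counts below are hypothetical examples, not Rafay pricing), the cost formulas can be sketched as:

```python
# Illustrative sketch of the two metering models. Rates and usage figures
# are hypothetical placeholders, not actual Rafay pricing.

def hourly_cost(hours_running: float, rate_per_hour: float) -> float:
    """Hourly metering: billed for wall-clock time the endpoint is up."""
    return hours_running * rate_per_hour

def token_cost(input_tokens: int, output_tokens: int,
               rate_in_per_1k: float, rate_out_per_1k: float) -> float:
    """Token-based metering: billed only for tokens actually processed."""
    return (input_tokens / 1000) * rate_in_per_1k \
         + (output_tokens / 1000) * rate_out_per_1k

# Hypothetical month: an always-on endpoint (720 hours) vs. a serverless
# endpoint that processed 2M input and 500K output tokens.
print(hourly_cost(720, 2.50))                       # always-on billing
print(token_cost(2_000_000, 500_000, 0.10, 0.30))   # pay-per-use billing
```

In general, token-based metering favors bursty or low-utilization workloads, while hourly metering suits steadily loaded, latency-sensitive endpoints.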
App Marketplace¶
The App Marketplace enables service providers to publish open-source or custom applications that can be consumed by end users across hundreds of tenants, with all deployments running on the provider’s managed infrastructure. This delivers a seamless, one-click self-service experience for every tenant while ensuring consistency, security, and operational efficiency.
- **Overview**
- **Kubernetes Apps**
  Prepackaged or custom applications deployable on Kubernetes clusters using manifests, Helm charts, or other Kubernetes-native formats.
- **Docker Apps**
  Containerized applications deployable via Docker images from DockerHub or private registries, enabling fast and portable deployments.