AI/ML and GenAI¶
Core Platform¶
The Core Platform delivers the essential building blocks for constructing a production-grade GPU Cloud, including dynamic GPU partitioning, multi-tenancy isolation, policy enforcement, and automated lifecycle management. These capabilities enable reliable and cost-efficient execution of AI/ML and GenAI workloads across diverse infrastructure environments.
- **GPU PaaS**
  Convert a stack of GPUs into a dynamically partitioned, multi-tenant GPU Cloud for data scientists and GenAI developers.
  Overview | Administration | End Users | GPU Cloud Providers | Get Started
- **GPU Sharing**
  Share your GPU resources with multiple users and applications.
  Overview | Time Slicing | Nvidia MIG-Single | Nvidia MIG-Mixed | Get Started
AI Infrastructure SKUs¶
Rafay’s AI infrastructure SKUs deliver turnkey, production-grade compute environments—ranging from bare metal servers and GPU-enabled VMs to Kubernetes-based clusters and serverless execution frameworks. Each SKU is optimized for multi-tenancy, automated lifecycle management, and high-performance AI/ML workloads, enabling teams to build and scale modern AI platforms with ease.
- **Bare Metal Servers**
  Instant self-service provisioning of GPU-enabled bare metal servers with a single click.
- **Virtual Machines**
  1-Click, self-service provisioning of GPU-enabled virtual machines.
- **Managed Kubernetes**
  1-Click, self-service provisioning of GPU-enabled Managed Kubernetes Clusters.
- **Virtual Clusters**
  1-Click virtual clusters that provide users with lightweight, fully isolated virtual Kubernetes clusters.
- **Serverless Pods**
  Deploy custom containers from prebuilt templates in isolated environments, enabling users to run CPU- and GPU-intensive tasks efficiently without managing physical infrastructure.
  Overview | Capabilities | Architecture | Requirements
- **SLURM on Bare Metal Servers**
  Batch or interactive workloads on SLURM-based HPC clusters built on bare metal servers. Run CPU- and GPU-intensive tasks efficiently across distributed compute resources.
- **SLURM on Kubernetes**
  Batch or interactive workloads on SLURM-based HPC clusters running on Kubernetes.
AIML App SKUs¶
Rafay’s AIML SKUs provide fully managed, GPU-accelerated environments for interactive development, distributed training, automated MLOps pipelines, and real-time inference. Each SKU is optimized for multi-tenancy, lifecycle automation, and operational consistency, enabling organizations to streamline the entire AI/ML model lifecycle from experimentation to production.
- **Jupyter Notebooks**
  1-Click Jupyter Notebook-based Interactive Development Environment with instant access to NVIDIA GPUs – start building in minutes.
- **Inferencing with Hourly Metering**
  Deploy and operate real-time AI inference for popular LLMs with support for hourly metering/billing.
- **Serverless Inference with Token-based Metering**
  Deliver a fully managed serverless inference service for popular LLMs with support for token-based metering/billing.
- **MLOps-Kubeflow**
  Deploy and operate a multi-tenant MLOps platform on your infrastructure based on Kubeflow, MLflow, TensorBoard, etc.
- **MLOps-KubeRay**
  Provide your users with a multi-tenant Ray-as-a-Service offering on your infrastructure.
- **AI Lab**
  A structured, scalable solution for managing lab environments on shared GPU infrastructure.
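The two inference SKUs above differ mainly in what is billed: hourly metering charges for the time an endpoint is provisioned, while token-based metering charges only for tokens processed. As a minimal illustration of the two models (all rates and token counts below are hypothetical examples, not Rafay pricing), the cost formulas can be sketched as:

```python
# Illustrative sketch of the two metering models. Rates and usage figures
# are hypothetical placeholders, not actual Rafay pricing.

def hourly_cost(hours_running: float, rate_per_hour: float) -> float:
    """Hourly metering: billed for wall-clock time the endpoint is up."""
    return hours_running * rate_per_hour

def token_cost(input_tokens: int, output_tokens: int,
               rate_in_per_1k: float, rate_out_per_1k: float) -> float:
    """Token-based metering: billed only for tokens actually processed."""
    return (input_tokens / 1000) * rate_in_per_1k \
         + (output_tokens / 1000) * rate_out_per_1k

# Hypothetical month: an always-on endpoint (720 hours) vs. a serverless
# endpoint that processed 2M input and 500K output tokens.
print(hourly_cost(720, 2.50))                       # always-on billing
print(token_cost(2_000_000, 500_000, 0.10, 0.30))   # pay-per-use billing
```

In general, token-based metering favors bursty or low-utilization workloads, while hourly metering suits steadily loaded, latency-sensitive endpoints.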
App Marketplace¶
The App Marketplace enables service providers to publish open-source or custom applications that can be consumed by end users across hundreds of tenants, with all deployments running on the provider’s managed infrastructure. This delivers a seamless, one-click self-service experience for every tenant while ensuring consistency, security, and operational efficiency.
- **Overview**
- **Kubernetes Apps**
  Prepackaged or custom applications deployable on Kubernetes clusters using manifests, Helm charts, or other Kubernetes-native formats.
- **Docker Apps**
  Containerized applications deployable via Docker images from DockerHub or private registries, enabling fast and portable deployments.