GPU PaaS
The following sections summarize the requirements and supported environments for the GPU PaaS capabilities. The GPU PaaS offering builds on and extends the Rafay Kubernetes Manager and Environment Manager. Please review the support matrices for those offerings to understand dependencies.
Infrastructure Types¶
- Private Data Center (bare metal or virtualized, e.g. vSphere or OpenStack)
- Public Cloud Providers (AWS, Azure, GCP, and OCI)
GPU Providers¶
- NVIDIA GPUs based on the Ampere, Hopper, and Blackwell architectures
- Intel Gaudi 3 AI accelerators
- AMD Instinct MI200 and MI300 Series
Note
Spatial partitioning of GPUs is supported only on NVIDIA GPUs that support Multi-Instance GPU (MIG).
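For illustration, a workload can request a MIG slice by name in its resource limits. The sketch below is a hypothetical pod spec; the `nvidia.com/mig-1g.5gb` resource name assumes the NVIDIA Kubernetes device plugin is configured with the "mixed" MIG strategy, and the image tag is an example, not a Rafay requirement.

```yaml
# Hypothetical pod requesting a single MIG slice (1g.5gb profile).
# Assumes the NVIDIA device plugin advertises MIG profiles as
# extended resources (mixed MIG strategy).
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1
```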
Kubernetes Versions¶
Any version supported by Rafay Kubernetes Manager.
Note
GPU resources in "imported" Kubernetes clusters cannot be automatically scaled up or down.
Storage¶
Private Cloud¶
For data center environments running upstream Kubernetes via Rafay MKS, organizations have the following options:
- Rafay's Integrated Managed Storage (based on Rook/Ceph)
- Bring Your Own Storage (BYOS)
- 3rd-party AI/HPC storage specialist providers such as DDN
Public Cloud¶
All storage options recommended and supported by the cloud provider. For example, on AWS, users can use EBS, EFS, FSx for Lustre, or S3.
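As one example of wiring up cloud-provider storage, the following is a minimal sketch of a `StorageClass` backed by the AWS EBS CSI driver. The class name is hypothetical; the `ebs.csi.aws.com` provisioner and `gp3` volume type are standard AWS EBS CSI driver settings, and the driver must already be installed on the cluster.

```yaml
# Minimal sketch: dynamic EBS gp3 volumes via the AWS EBS CSI driver.
# Assumes the EBS CSI driver is installed; the class name is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```

Workloads can then reference the class by name in a `PersistentVolumeClaim`, and the volume is provisioned when the consuming pod is scheduled.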
Frameworks¶
- PyTorch (vx.y)
- TensorFlow (vx.y)
User Applications¶
- Workloads: Kubernetes YAML or Helm Charts
- Jobs: Kubernetes YAML or Helm Charts
- Jupyter Notebooks: v7.2.1 or later
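Since Workloads and Jobs are expressed as Kubernetes YAML or Helm Charts, a GPU-consuming job is simply a standard Kubernetes `Job` with a GPU resource limit. The sketch below is hypothetical: the image, command, and job name are illustrative placeholders, and `nvidia.com/gpu` assumes the NVIDIA device plugin is present on the cluster.

```yaml
# Hypothetical batch job requesting one full GPU.
# Image and command are placeholders; nvidia.com/gpu assumes the
# NVIDIA device plugin is deployed on the cluster.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-training-job
spec:
  template:
    spec:
      containers:
        - name: train
          image: pytorch/pytorch:latest
          command: ["python", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 1
      restartPolicy: Never
```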
Integrations¶
- Git (GitHub, GitLab)
- Helm Repositories