
GPU PaaS

The following tables summarize the requirements and supported environments for the GPU PaaS capabilities. The GPU PaaS offering builds on and extends Rafay Kubernetes Manager and Environment Manager; please review the support matrices for those offerings to understand dependencies.


Infrastructure Types

  • Private Data Center (Bare Metal or Virtualized: vSphere, OpenStack, etc.)
  • Public Cloud Providers (AWS, Azure, GCP, and OCI)

GPU Providers

Note

Spatial partitioning of GPUs is supported only on Nvidia GPUs that support Multi-Instance GPU (MIG).
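
For reference, a pod consumes a MIG slice the same way it consumes a full GPU: by requesting the extended resource name that the Nvidia device plugin advertises for the configured MIG profile. The sketch below is illustrative only; the resource name (nvidia.com/mig-1g.5gb), the container image, and the profile's availability depend on the GPU model, the MIG profile configured on the node, and the device plugin's MIG strategy.

```yaml
# Illustrative sketch: assumes an A100-class GPU partitioned with the 1g.5gb
# MIG profile and the Nvidia device plugin exposing each slice as the
# extended resource nvidia.com/mig-1g.5gb (mixed MIG strategy).
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA-enabled image works
      command: ["nvidia-smi", "-L"]                # lists the MIG device visible to the container
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1                 # request one 1g.5gb MIG slice
```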


Kubernetes Versions

Any version supported by Rafay Kubernetes Manager.

Note

GPU resources in "imported" Kubernetes clusters cannot be automatically scaled up/down.


Storage

Private Cloud

For data center environments running upstream Kubernetes based on Rafay MKS, organizations have the following options:

  • Rafay's Integrated Managed Storage (based on Rook/Ceph); a sample PersistentVolumeClaim is sketched after this list
  • Bring Your Own Storage (BYOS)
  • Third-party AI/HPC storage specialist providers such as DDN
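
As an illustration of how applications consume the integrated Rook/Ceph storage, the manifest below requests a block volume through a StorageClass. The class name rook-ceph-block is an assumption for this sketch; use whichever class the storage installation actually provisions.

```yaml
# Illustrative sketch: assumes the cluster exposes a Rook/Ceph-backed
# StorageClass named "rook-ceph-block" (the actual name depends on how
# the integrated storage was installed).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 100Gi
```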

Public Cloud

All storage options recommended and supported by the cloud provider are available. For example, on AWS, users can use EBS, EFS, FSx for Lustre, or S3.
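
As a concrete AWS example, the sketch below defines a StorageClass that provisions gp3 EBS volumes through the AWS EBS CSI driver. It assumes the EBS CSI driver is already installed in the cluster; the class name and volume type are illustrative, not required values.

```yaml
# Illustrative sketch: assumes the AWS EBS CSI driver is installed.
# The class name and volume type are examples, not requirements.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```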


Frameworks

  • PyTorch (vx.y)
  • TensorFlow (vx.y)

User Applications

  • Workloads: Kubernetes YAML or Helm Charts
  • Jobs: Kubernetes YAML or Helm Charts (a sample GPU Job manifest is sketched after this list)
  • Jupyter Notebooks: v7.2.1 or later
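
To show what a GPU-consuming job looks like in practice, here is a minimal Kubernetes Job that requests a single GPU through the nvidia.com/gpu extended resource. The image and command are placeholders; a real training job would mount data volumes and run the actual framework code.

```yaml
# Minimal sketch of a GPU Job: assumes the Nvidia device plugin is running
# so that nvidia.com/gpu is schedulable. Image and command are placeholders
# for a real training workload.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-training-job
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: nvidia/cuda:12.4.1-base-ubuntu22.04
          command: ["nvidia-smi"]     # placeholder; a real job runs the training script
          resources:
            limits:
              nvidia.com/gpu: 1       # request one full GPU
```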

Integrations

  • Git (GitHub, GitLab)
  • Helm Repositories