
Support Matrix

This page summarizes requirements and supported environments for Rafay's Token Factory. It focuses primarily on the Token Factory data plane, where models are deployed onto compute clusters.

Info

This support matrix is valid as of the GPU PaaS v3.1-39 release.


Compute Clusters

Any CNCF-conformant Kubernetes cluster can be registered as a compute cluster for Token Factory. The following distributions are actively tested:

  • Rafay MKS Kubernetes Clusters (k8s v1.34 or later releases)
  • Red Hat OpenShift (v4.21.7 or later releases)

AI Accelerators

Token Factory itself does not impose any constraints on the AI accelerator vendor or model. The selection of an accelerator will depend on your choice of inference engine. Click the links below for additional details.


Inference Engines

The Token Factory supports multiple inference engines (also known as backends) so that operators have maximum flexibility and choice.

  • vLLM
  • NVIDIA NIM
  • NVIDIA Dynamo (choice of vLLM, SGLang, TensorRT-LLM)

Note

Token Factory does not support engines such as Ollama or llama.cpp because they are not designed for production-scale, multi-user, multi-tenant deployments.


Model Providers

Token Factory does not impose any constraints on model providers (e.g. Llama, Qwen, Gemma). Please refer to each inference engine's support matrix below.


Model Distributors

Models and their associated weights are distributed by providers over the Internet. The supported distribution methods are:

Internet Download

Locally Staged Model Weights

  • Storage Namespace (backed by object storage)
  • Local PVC

Note that model weights can also be downloaded out of band by operators and staged locally via storage namespaces or a local PVC.
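Out-of-band staging can be as simple as downloading the weights file and copying it into a locally mounted directory (e.g. a PVC mount or a directory synced to a storage namespace). A minimal sketch using only the Python standard library; the function name, URL, and directory layout are illustrative, not part of the Token Factory API:

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def stage_model_weights(source_url: str, staging_dir: str) -> Path:
    """Download a weights file and stage it in a local directory.

    staging_dir would typically be a mounted PVC path or a directory
    that is later uploaded to a storage namespace.
    """
    dest_dir = Path(staging_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Derive the local filename from the last path segment of the URL.
    filename = Path(urllib.parse.urlparse(source_url).path).name
    target = dest_dir / filename

    # Stream the download to disk to avoid holding large weights in memory.
    with urllib.request.urlopen(source_url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

In practice, operators would point this at their model provider's download URL (or use the provider's own CLI) and then stage the resulting files via a storage namespace or local PVC.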

Info

Model weights uploaded to storage namespaces must be in SafeTensors format.
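A SafeTensors file can be sanity-checked before upload without loading any tensors: the format begins with an 8-byte little-endian header length followed by a JSON header describing each tensor. A minimal validation sketch (the function name is illustrative, not a Token Factory API):

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file.

    Per the safetensors format, the first 8 bytes are a little-endian
    uint64 giving the header size in bytes, followed by the JSON header
    itself; tensor data follows after the header.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header
```

A successful parse returns a dict mapping tensor names to their dtype, shape, and data offsets (plus an optional `__metadata__` entry); a file that fails here is not valid SafeTensors and would be rejected.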


Storage Namespace

Model weights can be downloaded and staged locally in a storage namespace backed by an object storage provider. Any S3-compatible object storage provider should work. Currently supported providers are:

  • DDN
  • Vast Data
  • Weka
  • Dell Object Store
  • Ceph
  • MinIO
  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage

Languages

The user self-service portal supports multiple languages (admin configured, user selectable). The languages supported out of the box are:

  • English
  • French
  • Spanish
  • Japanese
  • Turkish
  • Arabic

Operators can also add new languages using the Operations Console. They can also customize the text for the default languages.