Support Matrix
This page summarizes the requirements and supported environments for Rafay Token Factory. It focuses primarily on the Token Factory data plane, where models are deployed onto compute clusters.
Info
This support matrix is valid as of GPU PaaS v3.1-39 release.
Compute Clusters¶
Any CNCF-conformant Kubernetes cluster can be registered as a compute cluster for Token Factory. The following are actively tested:
- Rafay MKS Kubernetes Clusters (k8s v1.34 or later releases)
- RedHat OpenShift (v4.21.7 or later releases)
AI Accelerators¶
Token Factory itself does not impose any limitation or constraint on the AI accelerator vendor or model. The selection of accelerator and model depends on your choice of inference engine. Follow the links below for additional details.
- vLLM: NVIDIA, AMD and Google TPU
- NVIDIA NIM: NVIDIA GPUs
- NVIDIA Dynamo: NVIDIA GPUs
Inference Engines¶
Token Factory supports multiple inference engines (also known as backends) to give operators maximum flexibility and choice.
- vLLM
- NVIDIA NIM
- NVIDIA Dynamo (choice of vLLM, SGLang, TensorRT-LLM)
Note
Token Factory does not support engines such as Ollama or llama.cpp because they are not designed for production-scale, multi-user, multi-tenant deployments.
Model Providers¶
Token Factory does not impose any limitation or constraint on model providers (e.g. Llama, Qwen, Gemma). Refer to each inference engine's support matrix for details.
Model Distributors¶
Models and their associated weights are distributed by providers over the Internet. Supported distribution methods are:

- Internet Download
- Locally Staged Model Weights
    - Storage Namespace (backed by object storage)
    - Local PVC

Note that model weights can also be downloaded out of band by operators and staged locally via storage namespaces or local PVCs.
Info
Model weights uploaded to storage namespaces must be in the Safetensors format.
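As a quick illustration of that requirement, the following is a minimal, stdlib-only Python sketch that sanity-checks the published Safetensors layout (an 8-byte little-endian header length followed by a JSON header). The tensor name and contents below are made up for the example and are not tied to any particular model or to Token Factory itself.

```python
import json
import struct

def looks_like_safetensors(blob: bytes) -> bool:
    """Cheap structural check of the Safetensors layout:
    8-byte little-endian header length, then a JSON header."""
    if len(blob) < 8:
        return False
    (header_len,) = struct.unpack("<Q", blob[:8])
    if 8 + header_len > len(blob):
        return False
    try:
        header = json.loads(blob[8 : 8 + header_len])
    except ValueError:
        return False
    return isinstance(header, dict)

# Build a tiny in-memory example: one float32 tensor of two elements.
header = json.dumps(
    {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode()
blob = struct.pack("<Q", len(header)) + header + b"\x00" * 8

print(looks_like_safetensors(blob))            # True
print(looks_like_safetensors(b"not-a-model"))  # False
```

A check like this only validates the container layout, not the tensors themselves; loading weights for inference is the engine's job.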
Storage Namespace¶
Model weights can be downloaded and staged locally in a storage namespace backed by an object storage provider. Any S3-compatible object storage provider should work. Currently supported providers are:
- DDN
- Vast Data
- Weka
- Dell Object Store
- Ceph
- MinIO
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
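As one possible out-of-band staging workflow (a sketch, not a Rafay-prescribed procedure), weights downloaded locally can be copied into an S3-compatible bucket with the standard AWS CLI by pointing `--endpoint-url` at the provider. The endpoint, bucket, and paths below are illustrative, not defaults.

```shell
# Hypothetical example: stage locally downloaded weights into an
# S3-compatible bucket (MinIO shown; endpoint and bucket names are
# illustrative). Credentials come from the usual AWS CLI config.
aws s3 cp ./llama-3-8b/ s3://model-weights/llama-3-8b/ \
    --recursive \
    --endpoint-url https://minio.example.com:9000
```

The same invocation works against any of the providers listed above, since they all expose the S3 API.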
Languages¶
The user self-service portal supports multiple languages (admin configured, user selectable). The following languages are supported out of the box:
- English
- French
- Spanish
- Japanese
- Turkish
- Arabic
Operators can add new languages using the Operations Console and can also customize the text for the default languages.