Support Matrix
This page summarizes the requirements and supported environments for Rafay Token Factory. It focuses primarily on the Token Factory data plane, where models are deployed onto compute clusters.
Info
This support matrix is valid as of GPU PaaS v3.1-39 release.
Compute Clusters¶
Any CNCF-conformant Kubernetes cluster can be registered as a compute cluster for Token Factory. The following are actively tested:
- Rafay MKS Kubernetes Clusters (k8s v1.34 or later releases)
- RedHat OpenShift (v4.21.7 or later releases)
AI Accelerators¶
Token Factory itself does not impose any limitation or constraint on the AI accelerator vendor or model. The selection of accelerator and model depends on your choice of inference engine. Follow the links below for additional details.
- vLLM: NVIDIA, AMD and Google TPU
- NVIDIA NIM: NVIDIA GPUs
- NVIDIA Dynamo: NVIDIA GPUs
Inference Engines¶
Token Factory supports multiple inference engines (also known as backends) to give operators maximum flexibility and choice.
- vLLM
- NVIDIA NIM
- NVIDIA Dynamo (choice of vLLM, SGLang, TensorRT-LLM)
Note
Token Factory does not support engines such as Ollama or llama.cpp because they are not designed for production-scale, multi-user, multi-tenant deployments.
Model Providers¶
Token Factory does not impose any limitation or constraint on model providers (e.g. Llama, Qwen, Gemma). Refer to each inference engine's support matrix for details.
Model Distributors¶
Models and their associated weights are distributed by providers over the Internet. Supported distribution methods are:

- Internet Download
- Locally Staged Model Weights
    - Storage Namespace (backed by object storage)
    - Local PVC

Note that model weights can also be downloaded out of band by operators and staged locally via storage namespaces or local PVCs.
Info
Model weights uploaded to storage namespaces must be in the Safetensors format.
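As a quick illustration of that requirement, the following is a minimal, stdlib-only Python sketch that sanity-checks the published Safetensors layout (an 8-byte little-endian header length followed by a JSON header). The tensor name and contents below are made up for the example and are not tied to any particular model or to Token Factory itself.

```python
import json
import struct

def looks_like_safetensors(blob: bytes) -> bool:
    """Cheap structural check of the Safetensors layout:
    8-byte little-endian header length, then a JSON header."""
    if len(blob) < 8:
        return False
    (header_len,) = struct.unpack("<Q", blob[:8])
    if 8 + header_len > len(blob):
        return False
    try:
        header = json.loads(blob[8 : 8 + header_len])
    except ValueError:
        return False
    return isinstance(header, dict)

# Build a tiny in-memory example: one float32 tensor of two elements.
header = json.dumps(
    {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode()
blob = struct.pack("<Q", len(header)) + header + b"\x00" * 8

print(looks_like_safetensors(blob))            # True
print(looks_like_safetensors(b"not-a-model"))  # False
```

A check like this only validates the container layout, not the tensors themselves; loading weights for inference is the engine's job.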
Storage Namespace¶
Model weights can be downloaded and staged locally in a storage namespace backed by an object storage provider. Any S3-compatible object storage provider should work. Currently supported providers are:
- DDN
- Vast Data
- Weka
- Dell Object Store
- Ceph
- MinIO
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
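As one possible out-of-band staging workflow (a sketch, not a Rafay-prescribed procedure), weights downloaded locally can be copied into an S3-compatible bucket with the standard AWS CLI by pointing `--endpoint-url` at the provider. The endpoint, bucket, and paths below are illustrative, not defaults.

```shell
# Hypothetical example: stage locally downloaded weights into an
# S3-compatible bucket (MinIO shown; endpoint and bucket names are
# illustrative). Credentials come from the usual AWS CLI config.
aws s3 cp ./llama-3-8b/ s3://model-weights/llama-3-8b/ \
    --recursive \
    --endpoint-url https://minio.example.com:9000
```

The same invocation works against any of the providers listed above, since they all expose the S3 API.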
Languages¶
The user self-service portal supports multiple languages (admin configured, user selectable). The following languages are supported out of the box:
- English
- French
- Spanish
- Japanese
- Turkish
- Arabic
Operators can add new languages using the Operations Console and can also customize the text for the default languages.