Compute Cluster
Compute clusters are Kubernetes clusters that make up the "Data Plane" of Rafay's Serverless Inferencing offering. Multiple models can be deployed and operated on the Data Plane concurrently.
Info
The data plane should work on any CNCF-conformant Kubernetes cluster. We have validated this extensively with Rafay's MKS Kubernetes Distribution.
Lifecycle management of the Kubernetes cluster underlying a compute cluster is not handled by Rafay's Serverless Inference solution. Infrastructure administrators can manage it using Rafay's Kubernetes Management capabilities.
New Compute Cluster
In the Ops Console, click GenAI and then Compute Cluster. Click "New Compute Cluster" to start the workflow that registers an existing Kubernetes cluster as a compute cluster for model deployments.
- Download the generated Kubernetes YAML file.
- Use kubectl (as a cluster administrator) to apply the manifest on the target Kubernetes cluster.
Applying the manifest deploys the Kubernetes resources for Rafay's Serverless Inferencing data plane on the cluster. Once this is complete, the resources on this compute cluster become part of the Data Plane for Serverless Inferencing.
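The kubectl step above could be scripted along the following lines. This is a minimal sketch, not an official Rafay tool: the manifest filename `compute-cluster.yaml` and the kubeconfig context name `gpu-cluster-admin` are hypothetical placeholders, and the script defaults to a dry run that only prints the command it would execute.

```python
import shlex
import subprocess
from pathlib import Path


def build_apply_command(manifest_path, kube_context=None):
    """Return the kubectl argument list that applies the downloaded manifest."""
    cmd = ["kubectl"]
    if kube_context:
        # --context selects which kubeconfig context (target cluster) to use.
        cmd += ["--context", kube_context]
    cmd += ["apply", "-f", str(manifest_path)]
    return cmd


def apply_manifest(manifest_path, kube_context=None, dry_run=True):
    """Print the command (dry run) or run it against the target cluster."""
    cmd = build_apply_command(manifest_path, kube_context)
    if dry_run:
        print("Would run:", shlex.join(cmd))
        return cmd
    if not Path(manifest_path).is_file():
        raise FileNotFoundError(manifest_path)
    # check=True raises CalledProcessError if kubectl exits with an error.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical filename and context; substitute your own values.
    apply_manifest("compute-cluster.yaml", kube_context="gpu-cluster-admin")
```

Passing `dry_run=False` runs the command for real, so the selected context must have cluster administrator privileges on the target cluster.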
Info
Operators can aggregate capacity from multiple compute clusters into a unified inventory that can be used for multiple model deployments.
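To illustrate what a unified inventory means, the sketch below sums GPU counts across several compute clusters. It is purely conceptual: the cluster names, GPU types, counts, and the dictionary layout are hypothetical and do not reflect Rafay's actual data model or API.

```python
from collections import Counter


def aggregate_capacity(clusters):
    """Sum per-GPU-type counts across all compute clusters."""
    total = Counter()
    for gpu_counts in clusters.values():
        # Counter.update adds the counts of a mapping to the running totals.
        total.update(gpu_counts)
    return dict(total)


# Hypothetical per-cluster capacity, keyed by cluster name.
clusters = {
    "cluster-a": {"H100": 8, "A100": 4},
    "cluster-b": {"H100": 16},
}

inventory = aggregate_capacity(clusters)
print(inventory)
```

In this toy example, a model deployment would see the combined pool rather than the capacity of any single cluster.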
List All Compute Clusters
In the Ops Console, click GenAI and then Compute Cluster. This displays the list of registered compute clusters that can be used for model deployments.
Delete Compute Cluster
To delete a compute cluster, click the ellipsis (three dots) under Actions for the selected compute cluster.
When you click Delete, the Serverless Inferencing operator resources are removed from the underlying compute cluster, and the cluster is no longer used for subsequent model deployments.
Info
This step does not deprovision the underlying Kubernetes cluster or its associated infrastructure. Administrators need to do that out of band.


