
Compute Cluster

Compute Clusters are Kubernetes clusters that make up the "Data Plane" of Rafay's Serverless Inferencing offering. Multiple models can be deployed and operated on the Data Plane concurrently.

Info

The data plane should work on any CNCF-conformant Kubernetes cluster. We have validated this extensively with Rafay's MKS Kubernetes Distribution.

Note that lifecycle management of the Kubernetes cluster underlying a compute cluster is not handled by Rafay's Serverless Inference solution. Infrastructure administrators can manage it using Rafay's Kubernetes Management capabilities.


New Compute Cluster

In the Ops Console, click on GenAI and then Compute Cluster. Then click on "New Compute Cluster" to start the workflow that registers an existing Kubernetes cluster as a compute cluster for model deployments.

Register Compute Cluster

  • Download the generated Kubernetes YAML manifest.
  • Use kubectl (as a cluster administrator) to apply the manifest to the target Kubernetes cluster.

This step deploys the Kubernetes resources for Rafay's Serverless Inferencing data plane on the cluster. Once it completes, the resources on this compute cluster become part of the Data Plane for Serverless Inferencing.
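The apply step above can be scripted. The following is a minimal sketch, assuming kubectl is installed and on the PATH and that the downloaded manifest was saved as compute-cluster.yaml. The file name, the helper names build_apply_command and apply_manifest, and the gpu-east context are all hypothetical and not part of Rafay's tooling.

```python
import shlex
import subprocess
from typing import List, Optional


def build_apply_command(manifest_path: str, context: Optional[str] = None) -> List[str]:
    """Build the kubectl argument list that applies a manifest file.

    `context` is an optional kubeconfig context name; when omitted,
    kubectl uses the current context.
    """
    cmd = ["kubectl"]
    if context:
        # --context is a global kubectl flag that selects the kubeconfig context.
        cmd += ["--context", context]
    cmd += ["apply", "-f", manifest_path]
    return cmd


def apply_manifest(manifest_path: str, context: Optional[str] = None) -> None:
    """Run `kubectl apply -f <manifest>` and raise if kubectl exits non-zero."""
    cmd = build_apply_command(manifest_path, context)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Example usage (hypothetical file name and context):
#   apply_manifest("compute-cluster.yaml", context="gpu-east")
```

Building the argument list separately from running it makes it easy to log or review the exact command before it touches the cluster.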

Info

Operators can aggregate capacity from multiple compute clusters into a unified inventory that can be used for multiple model deployments.
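As a purely conceptual illustration of what a unified inventory means, the sketch below sums per-cluster resource counts into one total. It does not reflect how Rafay computes or stores inventory. The cluster names and figures are made up, and aggregate_capacity is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def aggregate_capacity(clusters: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum resource counts across clusters into a single inventory."""
    total: Counter = Counter()
    for resources in clusters.values():
        # Counter.update adds the counts for matching resource names.
        total.update(resources)
    return dict(total)


# Hypothetical per-cluster capacity (illustrative numbers only).
clusters = {
    "cluster-a": {"nvidia.com/gpu": 8, "cpu": 96},
    "cluster-b": {"nvidia.com/gpu": 4, "cpu": 64},
}

inventory = aggregate_capacity(clusters)
print(inventory)
```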


List All Compute Clusters

In the Ops Console, click on GenAI and then Compute Cluster. This will display the list of registered compute clusters that can be used for model deployments.



Delete Compute Cluster

To delete a compute cluster, click on the ellipsis (three dots) under Actions for the selected compute cluster.

When you click Delete, the serverless inferencing operator resources are removed from the underlying compute cluster, and the compute cluster is no longer used for subsequent model deployments.


Info

Note that this step does not deprovision the underlying Kubernetes cluster or its associated infrastructure. Administrators need to do that out of band.