
Configure

In this section, you will create a standardized cluster blueprint with the kuberay-operator add-on. You can then reuse this blueprint with all your clusters.

Step 1: Create Namespace

In this step, you will create a namespace for the KubeRay operator.

  • Navigate to a project in your Org
  • Select Infrastructure -> Namespaces
  • Click New Namespace
  • Enter the name kuberay
  • Select Wizard for the type
  • Click Save
  • Click Discard Changes & Exit
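
For reference, this maps to a single kubectl command run directly against a cluster. In this workflow the controller creates and manages the namespace for you, so the following is illustrative only:

kubectl create namespace kuberay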

Step 2: Create Repository

In this step, you will create a repository in your project so that the controller can retrieve the KubeRay operator Helm chart automatically.

  • Select Infrastructure -> Repositories
  • Click New Repository
  • Enter the name kuberay
  • Select Helm for the type
  • Enter the endpoint of the KubeRay Helm chart repository (the upstream charts are published at https://ray-project.github.io/kuberay-helm/)
  • Click Create

Optionally, click the validate button on the repository to confirm connectivity.
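
If you want to verify the chart from a workstation first, you can add the same repository to a local Helm client and search it. This assumes the upstream endpoint above; adjust if you mirror the charts elsewhere:

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm search repo kuberay/kuberay-operator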


Step 3: Create kuberay-operator Add-On

In this step, you will create a custom add-on for the kuberay-operator that will pull the Helm chart from the previously created repository. This add-on will be added to a custom cluster blueprint in a later step.

  • Select Infrastructure -> Add-Ons
  • Click New Add-On -> Create New Add-On
  • Enter the name kuberay-operator
  • Select Helm 3 for the type
  • Select Pull files from repository
  • Select Helm for the repository type
  • Select kuberay for the namespace
  • Click Create

  • Click New Version

  • Enter a version name
  • Select the previously created repository
  • Enter kuberay-operator for the chart name
  • Click Save Changes
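
The add-on version is the controller-managed equivalent of a plain Helm install. Assuming the upstream repository above, the hand-run counterpart would look roughly like this:

helm install kuberay-operator kuberay/kuberay-operator --namespace kuberay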

Step 4: Create Blueprint

In this step, you will create a custom cluster blueprint that contains the kuberay-operator add-on created in the previous step. The cluster blueprint can be applied to one or more clusters.

  • Select Infrastructure -> Blueprints
  • Click New Blueprint
  • Enter the name kuberay
  • Click Save
  • Enter a version name
  • Select Minimal for the base blueprint
  • In the add-ons section, click Configure Add-Ons
  • Click the + symbol next to the previously created kuberay-operator add-on to add it to the blueprint
  • Click Save Changes

Step 5: Apply Blueprint

In this step, you will apply the previously created cluster blueprint to an existing cluster. The blueprint will deploy the kuberay-operator add-on to the cluster.

  • Select Infrastructure -> Clusters
  • Click the gear icon on the cluster card -> Update Blueprint
  • Select the previously created kuberay blueprint and version
  • Click Save and Publish

The controller will publish and reconcile the blueprint on the target cluster. This can take a few seconds to complete.
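
If you have kubectl access to the target cluster, you can confirm the operator came up; you should see a kuberay-operator pod in the Running state:

kubectl get pods -n kuberay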

Step 6: Create RayLLM Workload

In this step, you will create a workload for RayLLM.

  • Save the below YAML to a file named ray-service-llm.yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayllm
spec:
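  # Allow up to 1200s (20 min) for the service and its deployments to become healthy before KubeRay recreates them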
  serviceUnhealthySecondThreshold: 1200
  deploymentUnhealthySecondThreshold: 1200
  serveConfigV2: |
      applications:
      - name: router
        import_path: rayllm.backend:router_application
        route_prefix: /
        args:
          models:
            - ./models/continuous_batching/amazon--LightGPT.yaml
            - ./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
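        # Register custom logical resources with Ray's scheduler (value is an escaped JSON string)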
        resources: '"{\"accelerator_type_cpu\": 2}"'
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: anyscale/ray-llm:latest
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 8Gi
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
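    # GPU worker group: autoscales between 0 and 1 replicas and hosts the model-serving container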
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 0
      maxReplicas: 1
      groupName: gpu-group
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 1, \"accelerator_type_a10\": 1, \"accelerator_type_a100_80g\": 1}"'
      template:
        spec:
          containers:
          - name: llm
            image: anyscale/ray-llm:latest
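            # Stop Ray cleanly before the container is terminated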
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]
            resources:
              limits:
                cpu: "48"
                memory: "192G"
                nvidia.com/gpu: 1
              requests:
                cpu: "1"
                memory: "1G"
                nvidia.com/gpu: 1
            ports:
            - containerPort: 8000
              name: serve
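          # Matches the ray.io/node-type=gpu:NoSchedule taint on GPU nodes (see Note below)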
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "gpu"
              effect: "NoSchedule"

Note

Make sure the GPU nodes in the cluster carry the taint ray.io/node-type=gpu:NoSchedule. The worker group's toleration above matches this taint, so only GPU workloads are scheduled onto those nodes.
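
If the taint is not already present, you can apply it to each GPU node with kubectl (replace <gpu-node-name> with the actual node name; the placeholder is illustrative):

kubectl taint nodes <gpu-node-name> ray.io/node-type=gpu:NoSchedule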

  • Select Applications -> Workloads
  • Click New Workload -> Create New Workload
  • Enter the name kuberay-llm
  • Select K8s YAML for the type
  • Select Upload files manually
  • Select kuberay for the namespace
  • Click Create
  • Click Upload and select the previously saved ray-service-llm.yaml file
  • Click Save And Goto Placement
  • Select the cluster in the placement page
  • Click Save And Goto Publish
  • Click Publish
  • Wait for the pods to come up and rayllm-serve-svc to be created
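
With kubectl access to the cluster, you can watch the rollout and confirm the Serve service appears; rayllm-serve-svc should show up in the service list once the application is ready:

kubectl get pods -n kuberay -w
kubectl get svc -n kuberay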



Next Step

At this point, you have done everything required to get kuberay-operator installed and operational on your cluster along with RayLLM. In the next step, we will query the models deployed using RayLLM.