kubeRay

Machine Learning workloads such as deep learning and hyperparameter tuning are compute-intensive by nature. Parallel execution is key to reducing the learning time. The Ray Framework is a distributed middleware that provides primitives to seamlessly parallelize machine learning code execution across a cluster of compute node. Since Ray is not a Kubernetes-native project, in order to deploy Ray on Kubernetes, the OSS community has created KubeRay.

Think of this project as a toolkit to deploy Ray in Kubernetes clusters. Specifically, KubeRay is a Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. KubeRay offers several key components:

Component	Description
KubeRay core	The official, fully-maintained component of KubeRay that provides three custom resource definitions, RayCluster, RayJob, and RayService
RayCluster	KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.
RayJob	With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.
RayService	RayService is made up of two parts: a RayCluster and a Ray Serve deployment graph. RayService offers zero-downtime upgrades for RayCluster and high availability.

Ray Cluster¶

A Ray cluster consists of a single head node and a number of connected workers. With support for autoscaling, KubeRay can dynamically size your Ray clusters according to the requirements of your Ray workload. It will automatically add/remove Ray worker as required. KubeRay also supports heterogeneous compute nodes (including GPUs) as well as running multiple Ray clusters with different Ray versions in the same Kubernetes cluster.

Important

Ray nodes (i.e. head and workers) are implemented as pods on Kubernetes.

Shown below is a Ray cluster with two workers. Each worker runs Ray helper processes to facilitate distributed scheduling and memory management. The head node runs additional control processes.

Head Node¶

Every Ray cluster has one node which is designated as the head node of the cluster. The head node pod which acts as a management controller, responsible for the Ray driver processes that run the Ray Jobs. It also contains the cluster’s GCS (Global Control Store) which is the cluster’s metadata which holds information about the nodes, the resources of the nodes, and placement group information. The worker nodes communicate with GCS often via gRPC.

The head node is identical to other worker nodes, except that it also runs singleton processes responsible for cluster management such as the autoscaler, GCS and the Ray driver processes which run Ray jobs. Ray may schedule tasks and actors on the head node just like any other worker node, which is not desired in large-scale clusters.

Worker Node¶

Worker nodes do not run any head node management processes, and serve only to run user code in Ray tasks and actors. They participate in distributed scheduling, as well as the storage and distribution of Ray objects in cluster memory.

Autoscaling¶

The Ray autoscaler is a process that runs as a sidecar container in the head pod on Kubernetes. When the resource demands of the Ray workload exceed the current capacity of the cluster, the autoscaler will try to increase the number of worker nodes. When worker nodes sit idle, the autoscaler will remove worker nodes from the cluster.

RayJob¶

A Ray job is a single application, a collection of Ray tasks, objects, and actors that originate from the same script. The worker that runs the Python script is known as the driver of the job. The recommended way to run a Ray Job on a Ray Cluster is to submit the job using the Ray Jobs API.

RayService¶

RayService in KubeRay is a managed service feature that allows for the deployment of Ray applications with built-in support for high availability, scalability, and zero-downtime upgrades. RayService simplifies the process of deploying and managing Ray-based workloads in Kubernetes, making it easier for developers to run distributed machine learning, reinforcement learning, or other parallelized applications at scale.

It consists of two key components:

RayCluster¶

This is the Ray cluster that runs distributed applications, managing the lifecycle of nodes and ensuring that the cluster can scale up or down based on workload demand.

Ray Serve Deployment Graph¶

RayService integrates with Ray Serve, a scalable model serving library, to deploy and manage inference workloads. The deployment graph manages how models are deployed and served, ensuring that incoming requests are handled efficiently.

The key features of RayService are:

Zero-Downtime Upgrades RayService ensures that updates to Ray clusters or deployment graphs can be made without interrupting ongoing tasks or services, providing continuous availability.

High Availability It provides fault tolerance and can recover from failures, ensuring that the Ray cluster remains operational even if some nodes fail.

Scalability RayService helps scale applications automatically based on demand, adjusting the number of nodes in the RayCluster to optimize resource utilization.