From Slurm to Kubernetes: A Guide for HPC Users¶
If you've spent years submitting batch jobs with Slurm, moving to a Kubernetes-based cluster can feel like learning a new language. The concepts are familiar — resource requests, job queues, priorities — but the vocabulary and tooling are different. This guide bridges that gap, helping HPC veterans understand how Kubernetes handles workloads and what that means day-to-day.
The Core Concept: Jobs and Pods¶
In Slurm, you write a batch script and submit it with sbatch. Kubernetes has its own equivalent: the Job resource. A Kubernetes Job defines your workload and manages one or more Pods — the smallest schedulable unit, roughly analogous to a single task or process in Slurm.
Where a Slurm job script defines the compute environment via #SBATCH directives, a Kubernetes Job is defined in a YAML manifest. Let's compare what these look like:
Slurm:
```bash
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

python train.py
```
Kubernetes:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  template:
    spec:
      containers:
      - name: my-job
        image: python:3.11
        command: ["python", "train.py"]
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
          limits:
            cpu: "4"
            memory: "16Gi"
      restartPolicy: Never
```
In Kubernetes, you submit this with kubectl apply -f job.yaml, much like sbatch job.sh.
Requesting Resources¶
Kubernetes uses a requests and limits model for resource allocation:
Requests¶
Requests are the minimum resources guaranteed to your Pod. The scheduler uses this value to decide where to place your workload.
Limits¶
Limits are the maximum resources your Pod is allowed to consume.
For CPU-bound and memory-intensive HPC jobs, it's best practice to set requests equal to limits to get predictable, dedicated allocations — similar to how Slurm handles exclusive node access.
Requesting GPUs¶
GPU resources are requested as extended resources. For NVIDIA GPUs:
```yaml
resources:
  limits:
    nvidia.com/gpu: 2  # Request 2 GPUs
```
Info
This mirrors Slurm's --gres=gpu:2 directive. Note that unlike CPU and memory, GPU resources are specified only under limits; if you set requests as well, the two values must be equal.
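Putting the pieces together, here is a sketch of a complete GPU job manifest. The job name, container image, and GPU count are illustrative, and it assumes the cluster has the NVIDIA device plugin installed:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-train                   # illustrative name
spec:
  template:
    spec:
      containers:
      - name: train
        image: my-registry/train:latest   # illustrative image
        command: ["python", "train.py"]
        resources:
          limits:
            nvidia.com/gpu: 2       # Slurm equivalent: --gres=gpu:2
      restartPolicy: Never
```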
How Jobs Are Prioritized¶
This is where things differ most significantly from Slurm. Kubernetes uses a Priority Class system rather than a traditional fair-share queue.
- Each job is assigned a PriorityClass, a named resource that maps to an integer value.
- Higher numbers mean higher priority.
When the cluster is resource-constrained, the scheduler may preempt (that is, evict) lower-priority Pods to make room for higher-priority ones. This is analogous to Slurm's --qos and preemption policies.
Common priority classes might look like:
| Priority Class | Value | Use Case |
|---|---|---|
| `critical` | 1000 | Production / time-sensitive jobs |
| `high` | 500 | Standard research workloads |
| `low` | 100 | Background / exploratory jobs |
| `batch` | 0 | Opportunistic / best-effort jobs |
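Priority classes themselves are cluster-scoped resources, typically created by an administrator rather than by individual users. As a sketch, the `high` class above might be defined like this (the value and description are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 500                 # higher number = higher priority
globalDefault: false       # not applied to jobs that omit priorityClassName
description: "Standard research workloads"
```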
You assign a priority class in your Job manifest:
```yaml
spec:
  template:
    spec:
      priorityClassName: high
```
If no priority class is specified, the cluster default applies (typically low or batch priority).
Queue Behavior and the Scheduler¶
By default, Kubernetes uses the built-in kube-scheduler. But most HPC-oriented Kubernetes clusters run an advanced scheduler such as Volcano, Kueue, or KAI. These add batch-aware scheduling features to Kubernetes:
Gang Scheduling¶
Ensures all Pods in a job start simultaneously — critical for MPI workloads where partial starts are wasteful. This is the Kubernetes equivalent of Slurm's --nodes allocation behavior.
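As one hedged example, assuming Volcano is installed, its Job API expresses gang scheduling through a `minAvailable` field; the replica count and image below are illustrative:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mpi-job
spec:
  schedulerName: volcano
  minAvailable: 4            # gang: all 4 Pods must be placeable before any start
  tasks:
  - name: worker
    replicas: 4
    template:
      spec:
        containers:
        - name: worker
          image: my-registry/mpi-app:latest   # illustrative image
        restartPolicy: Never
```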
Queue Management¶
Workloads are submitted to named queues with configurable capacity and borrowing policies, similar to Slurm partitions.
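With Kueue, for instance, a standard Job is pointed at a named queue via a label, and Kueue admits it when the queue has capacity. The queue name here is illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue   # illustrative queue name
spec:
  suspend: true   # Kueue admits the Job by unsuspending it
```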
Fairshare¶
Kueue and Volcano both support weighted fairshare across teams or namespaces, ensuring no single group monopolizes cluster resources over time.
Info
Read our blogs on Kubernetes Schedulers: Custom Schedulers; Schedulers Compared; Fractional GPUs using KAI Scheduler.
Practical Tips for Slurm Users¶
Making the transition smoother comes down to mapping what you know onto the new model:
Slurm Partitions → Kubernetes queues or namespaces.¶
Your team or project will likely have an assigned namespace with resource quotas.
Job arrays → Indexed Jobs.¶
Kubernetes supports completions and parallelism fields to run many parallel tasks from one Job definition.
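For example, the Slurm array job `--array=0-9%4` maps roughly onto an Indexed Job like the sketch below; each Pod receives its index in the `JOB_COMPLETION_INDEX` environment variable (script name is illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: array-job
spec:
  completions: 10            # Slurm: --array=0-9
  parallelism: 4             # at most 4 tasks at once (Slurm: %4)
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: task
        image: python:3.11
        command: ["sh", "-c", "python process.py --index=$JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```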
Interactive jobs → kubectl exec or ephemeral containers.¶
For debugging, you can shell into a running Pod much like srun --pty bash.
Job monitoring → kubectl get jobs and kubectl logs.¶
These replace squeue and sacct for checking status and output.
Getting Started¶
The learning curve is real, but the underlying logic is familiar:
- Request resources,
- Define your workload, and
- Let the scheduler find the right place to run it.
Most clusters will provide a kubeconfig file that authenticates your kubectl CLI. Ask your sysadmin for this, just as you would request Slurm account access. From there, submitting your first job is as simple as writing a YAML manifest and running kubectl apply.
Here are other Slurm-related blogs we've posted recently:
- Introduction to Slurm: The Backbone of HPC
- Project Slinky: Bringing Slurm Scheduling to Kubernetes
- Self-Service Slurm Clusters on Kubernetes with Rafay GPU PaaS
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.
- Rafay's AI/ML Products: Learn about Rafay's offerings in AI/ML Infrastructure and Tooling.
- About the Author: Read other blogs by the author, and connect with the author on LinkedIn.
