Slurm
Slurm compute instances are designed for High Performance Computing (HPC) workloads that require large-scale parallel processing and efficient scheduling of batch jobs. They provide an environment to run compute-intensive applications on bare metal CPU and GPU nodes, managed through a head node and optional login node. This compute option is well-suited for scientific simulations, AI/ML training, and workloads requiring advanced scheduling.
Create Slurm Compute Instance
To create a Slurm compute instance, navigate to the Developer Hub and select the Slurm type.
- Click New Slurm to create a new instance.
- Click View All to view and manage existing Slurm instances.
Users can also access Slurm from the left navigation pane under the Compute section.
Clicking on the Slurm option from the left pane or selecting "New Slurm Cluster" from the home page opens a wizard that allows users to select from the available Slurm Compute Profiles.
Once the profile is selected, provide the required details. If pricing for the selected profile is configured in Global Settings by the Org Admin, a monthly estimate will be displayed.
- Name: Enter a unique name for the compute instance
- Description: Provide a brief summary of the instance
- Compute Profile: Proceed with the selected profile
- Workspace: Select the workspace from the drop-down menu
- Configuration:
- Number of CPU nodes
- Number of GPU nodes (e.g., H100, L40S)
- Enable Login Node (optional)
- Click Deploy
The instance initially displays a status of In Progress.
Upon successful deployment, the status updates to Success.
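Once the status shows Success, you can optionally verify that every node has registered with the scheduler, for example by running `sinfo` through the Run Slurm Command action described below. This is a minimal sketch; partition names and node states depend on your deployment:

```bash
# Summarize partitions, node counts, and node states for the new cluster.
# Healthy, unallocated compute nodes normally report the "idle" state.
sinfo

# Show each node on its own line with detailed state information.
sinfo -N -l
```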
Important
The list of compute profiles presented to the end user is dynamic: if the administrator updates or publishes a new profile, it immediately becomes available for the end user to select.
View Compute Instances
When the user clicks on Slurm, they are presented with the list of Slurm compute instances in their workspace. The following details are shown for each instance:
- Name
- Workspace
- Created At
- Nodes
- Publish Status
- Actions
Actions
Once an instance has been launched and is operational, users can perform the actions described below.
Run Slurm Command
Users can execute Slurm commands on the deployed cluster without logging into the head node.
- Click Run Slurm Command from the Actions menu.
- Enter the command in the `slurm_command` field.
- Optionally configure the `timeout_seconds` value (default: 60).
- Click Apply to execute.
Example Commands:
- `sinfo`: View the state of the cluster
- `squeue`: List active jobs
- `sbatch`: Submit a batch job (via the login node)
- `scancel`: Cancel a job
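As a concrete illustration, a minimal batch script submitted from the login node with `sbatch` could look like the following. The job name, resource requests, and output path are placeholder values and should be adjusted to your cluster and workload:

```bash
#!/bin/bash
#SBATCH --job-name=hello-slurm     # placeholder job name
#SBATCH --nodes=1                  # request a single node
#SBATCH --ntasks=1                 # run one task
#SBATCH --time=00:05:00            # five-minute wall-clock limit
#SBATCH --output=hello_%j.out      # %j expands to the Slurm job ID

# Print the host that ran the job; replace this line with your real workload.
srun hostname
```

Save the script as, for example, `hello.sbatch`, submit it with `sbatch hello.sbatch`, monitor it with `squeue`, and cancel it with `scancel <job_id>` if required.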
Add Node
Use this action to scale your Slurm cluster by adding additional compute resources. You can choose the type and number of nodes to add based on your workload requirements.
When you select Add Node, you’ll be prompted to specify counts for the supported node types:
- `l40s_node_addon_count`: Number of additional GPU nodes with NVIDIA L40S to be provisioned.
- `h100_node_addon_count`: Number of additional GPU nodes with NVIDIA H100 to be provisioned.
- `cpu_node_addon_count`: Number of additional CPU-only nodes to be provisioned.
The platform automatically provisions the requested nodes and registers them with the Slurm cluster. Once added, these nodes become available for scheduling workloads.
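For example, to add two L40S GPU nodes and one CPU-only node (illustrative counts; choose values that match your workload), the Add Node inputs would be:

```
l40s_node_addon_count: 2
h100_node_addon_count: 0
cpu_node_addon_count: 1
```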
Decommission Node
Use this action to remove one or more nodes from the Slurm cluster. This is typically done when nodes are no longer needed, under maintenance, or being scaled down.
You must provide the node name(s) that should be removed. The specified nodes will be drained and decommissioned from the cluster.
Note: Make sure the nodes you enter exist in the current cluster.
Sample Input:
node-L40s-006,node-H100-002,node-CPU-010
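To confirm the exact node names before decommissioning, you can list them with standard Slurm commands via the Run Slurm Command action. This is a sketch; the node name shown is a placeholder and the output format depends on your cluster:

```bash
# Print each node's name and its current state.
sinfo -N -o "%N %T"

# Inspect a single node in detail (replace the name with a node from your cluster).
scontrol show node node-L40s-006
```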
Delete Compute Instance
If a compute instance is no longer required, the user can delete it by clicking the delete icon to the right of the instance.
Note
This action cannot be undone.