Slurm

Slurm compute instances are designed for High Performance Computing (HPC) workloads that require large-scale parallel processing and efficient scheduling of batch jobs. They provide an environment for running compute-intensive applications on bare-metal CPU and GPU nodes, managed through a head node and an optional login node. This compute option is well suited to scientific simulations, AI/ML training, and other workloads that need advanced scheduling.


Create Slurm Compute Instance

To create a Slurm compute instance, navigate to the Developer Hub and select the Slurm type.

  • Click New Slurm to create a new instance.
  • Click View All to view and manage existing Slurm instances.

Users can also access Slurm from the left navigation pane, under the Compute section.

Clicking on the Slurm option from the left pane or selecting "New Slurm Cluster" from the home page opens a wizard that allows users to select from the available Slurm Compute Profiles.

Once the profile is selected, provide the required details. If pricing for the selected profile is configured in Global Settings by the Org Admin, a monthly estimate will be displayed.

  • Name: Enter a unique name for the compute instance
  • Description: Provide a brief summary of the instance
  • Compute Profile: Proceed with the selected profile
  • Workspace: Select the workspace from the drop-down menu
  • Configuration:
      • Number of CPU nodes
      • Number of GPU nodes (e.g., H100, L40S)
      • Enable Login Node (optional)
  • Click Deploy

The instance initially displays a status of In Progress.

Upon successful deployment, the status updates to Success.

Important

The list of compute profiles presented to the end user is dynamic: if the administrator updates or publishes a new profile, it is immediately available for the end user to select.


View Compute Instances

When the user clicks Slurm, they are presented with the list of Slurm compute instances in their workspace. The list includes the following columns:

  • Name
  • Workspace
  • Created At
  • Nodes
  • Publish Status
  • Actions


Actions

Once an instance has been deployed and is operational, users can perform the actions described in this section.

Run Slurm Command

Users can execute Slurm commands on the deployed cluster without logging into the head node.

  • Click Run Slurm Command from the Actions menu.
  • Enter the command in the slurm_command field.
  • Optionally configure the timeout_seconds value (default: 60).
  • Click Apply to execute.

Example Commands:

  • sinfo: View the state of the cluster
  • squeue: List active jobs
  • sbatch: Submit a batch job (via login node)
  • scancel: Cancel a job
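
The two inputs of the Run Slurm Command action can be checked before they are submitted. The following is a minimal Python sketch, not part of the platform: the function name build_run_command_input and the allow-list KNOWN_COMMANDS are hypothetical, and the allow-list simply mirrors the four example commands above (the platform may well accept others). Only the field names slurm_command and timeout_seconds, and the default of 60, come from this page.

```python
import shlex

# Hypothetical allow-list: the four example commands named on this page.
KNOWN_COMMANDS = {"sinfo", "squeue", "sbatch", "scancel"}


def build_run_command_input(slurm_command: str, timeout_seconds: int = 60) -> dict:
    """Return the two form fields as a dict after basic sanity checks."""
    tokens = shlex.split(slurm_command)
    if not tokens:
        raise ValueError("slurm_command must not be empty")
    if tokens[0] not in KNOWN_COMMANDS:
        raise ValueError(f"unexpected command: {tokens[0]}")
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {"slurm_command": slurm_command, "timeout_seconds": timeout_seconds}


# Example: list jobs for one user, keeping the default timeout.
fields = build_run_command_input("squeue -u alice")
print(fields)
```

Checking the first token locally catches typos before a command is sent to the cluster; it does not replace whatever validation the platform itself performs.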

Add Node

Use this action to scale your Slurm cluster by adding compute resources. You can choose the type and number of nodes to add based on your workload requirements.

When you select Add Node, you’ll be prompted to specify counts for the supported node types:

  • l40s_node_addon_count: Number of additional GPU nodes with NVIDIA L40S GPUs to be provisioned.
  • h100_node_addon_count: Number of additional GPU nodes with NVIDIA H100 GPUs to be provisioned.
  • cpu_node_addon_count: Number of additional CPU-only nodes to be provisioned.

The platform automatically provisions the requested nodes and registers them with the Slurm cluster. Once added, these nodes become available for scheduling workloads.
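
As an illustration of the Add Node inputs, here is a small Python sketch that assembles and validates the three counts. The helper build_add_node_input is hypothetical, and treating an omitted count as 0 is an assumption made for this example, not documented platform behavior. Only the three field names come from this page.

```python
# Field names as listed for the Add Node action.
NODE_COUNT_FIELDS = (
    "l40s_node_addon_count",
    "h100_node_addon_count",
    "cpu_node_addon_count",
)


def build_add_node_input(**counts: int) -> dict:
    """Validate the requested counts and return all three fields."""
    unknown = set(counts) - set(NODE_COUNT_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Assumption for this sketch: an omitted count means 0 nodes of that type.
    payload = {field: counts.get(field, 0) for field in NODE_COUNT_FIELDS}
    for field, value in payload.items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{field} must be a non-negative integer")
    if sum(payload.values()) == 0:
        raise ValueError("request at least one node")
    return payload


# Example: add two H100 nodes and four CPU-only nodes.
print(build_add_node_input(h100_node_addon_count=2, cpu_node_addon_count=4))
```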

Decommission Node

Use this action to remove one or more nodes from the Slurm cluster. This is typically done when nodes are no longer needed, are under maintenance, or when the cluster is being scaled down.

You must provide the node name(s) that should be removed. The specified nodes will be drained and decommissioned from the cluster.

Note: Make sure the nodes you enter exist in the current cluster.

Sample Input:

node-L40s-006,node-H100-002,node-CPU-010
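
One way to confirm that every node in the list exists is to compare it with the node names reported by the cluster. The Python sketch below assumes the names were obtained by running `sinfo -N -h -o "%N"` (for example through Run Slurm Command) and pasted in as text; the helper names parse_node_list and missing_nodes are hypothetical.

```python
def parse_node_list(raw: str) -> list[str]:
    """Split a comma-separated node list, ignoring blanks and extra spaces."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_nodes(requested: list[str], sinfo_output: str) -> list[str]:
    """Return the requested nodes that do not appear in the sinfo output.

    sinfo_output is expected to hold one node name per line. A node that
    belongs to several partitions may appear more than once, so a set is used.
    """
    known = {line.strip() for line in sinfo_output.splitlines() if line.strip()}
    return [name for name in requested if name not in known]


# Example using the sample input from this page and made-up sinfo output.
requested = parse_node_list("node-L40s-006,node-H100-002,node-CPU-010")
sinfo_output = "node-L40s-006\nnode-CPU-010\nnode-CPU-010\n"
print(missing_nodes(requested, sinfo_output))
```

An empty result means every requested node is present in the cluster; any names returned should be corrected before the action is submitted.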


Delete Compute Instance

When a compute instance is no longer needed, users can delete it by clicking the delete icon to the right of the instance.

Note

This action is irreversible.