Slurm

Prerequisites

Before creating a Slurm Compute Profile, ensure the following are in place:

  • The system-slurm-bm environment template is published and accessible in the project
  • GitOps Agent and Agent Host details are configured
  • Bare metal resources are available in the inventory:
    • CPU nodes for head node and worker node provisioning
    • GPU nodes (e.g., H100, L40S) if required
  • API Key and Controller Endpoint are configured in the platform; these will be available as output variables for accessing the Slurm service

Create a Compute Profile for Slurm

Slurm compute profiles are created by PaaS Administrators to enable users to request and manage HPC clusters on tenant networks using BCM-based provisioning.

This profile simplifies the end-user experience by exposing only a minimal set of options (for example, number of nodes and login node toggle), while administrators configure advanced parameters such as SKU mappings, API endpoints, and GitOps agent details.

Slurm clusters are commonly used for job scheduling, resource allocation, and high-performance workloads in GPU/CPU-intensive environments.

Refer to the Compute Profile Overview for general information.

Steps to Create a Slurm Compute Profile

  • In the Developer Console, select Compute Profiles from the left navigation pane
  • Click the + New Compute Profile button
  • In the Compute Profile form:
    • Name: Provide a unique identifier (e.g., slurm-profile)
    • Display Name (Optional): Enter a user-friendly name
    • Description (Optional): Add relevant information about this profile's purpose or configuration
    • Environment Template: Select the available template:
      • system-slurm-bm: Use this when provisioning and managing a complete Slurm cluster on bare metal. This automates lifecycle operations such as PXE boot, IPMI configuration, and provisioning the head node with worker nodes.
    • Environment Template Version: Select the required version (e.g., v6)
    • Compute Type: Select Slurm

⚠️ This setting ensures that compute instances launched using this profile are provisioned for Slurm-based workload management.

Compute Type - SLURM

  • Once all fields are configured, click Save & Continue.

Compute Profile Configuration

Once saved, the Compute Profile Configuration page appears.

General

| Name | Default Value | Value Type | Description |
|---|---|---|---|
| Name | slurm-prod-profile | string | Internal identifier for the compute profile |
| Display Name | SLURM Production | string | User-friendly label for UI display |
| Description | Profile for SLURM production workloads | string | Notes describing the profile purpose or usage |
| Allocation Type | Shared | string | Determines whether nodes are dedicated or shared |

Advanced Configuration

| Name | Default Value | Value Type | Description |
|---|---|---|---|
| Labels | env=production, team=ai | key-value | Key-value pairs used for grouping or identifying resources |
| Annotations | owner=platform-team | key-value | Non-identifying metadata for resource management or documentation purposes |
| Extra Config | {"logLevel":"debug"} | key-value | Additional configuration in key-value or JSON format for advanced tuning |
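
To illustrate the value shapes in the table above, the following minimal sketch parses the sample Labels and Annotations strings into key-value pairs and reads the sample Extra Config as JSON. The `parse_key_values` helper is hypothetical. How the platform stores or parses these fields internally is not described here, so treat this only as a way to check values before entering them.

```python
import json


def parse_key_values(text):
    """Parse a comma-separated 'key=value' string into a dict (hypothetical helper)."""
    pairs = {}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


# Sample values copied from the Advanced Configuration table.
labels = parse_key_values("env=production, team=ai")
annotations = parse_key_values("owner=platform-team")
extra_config = json.loads('{"logLevel":"debug"}')

print(labels)
print(annotations)
print(extra_config)
```

A malformed entry such as `env production` raises `ValueError` in this sketch, which makes a typo visible before the value is saved.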

Display Settings

| Name | Default Value | Value Type | Description |
|---|---|---|---|
| Icon URL | https://example.com/icons/slurm.png | string | URL to a custom icon used to visually identify the compute profile |
| Read Me | SLURM profile for AI/ML workloads | string | Short summary describing the purpose or characteristics of the profile |

Input Settings

| Name | Value | Type | Description |
|---|---|---|---|
| API Key | Enter Value | envVars | Authentication key used for API access |
| Controller Endpoint | console.stage.shakticloud.ai | envVars | Endpoint of the controller managing the cluster |
| CPU Node SKU | bmass-cpu-slurm | json | SKU identifier for CPU-based worker nodes |
| CPU Nodes | 0 | text | Number of CPU worker nodes |
| Enable Login Node | false | text | Flag to enable or disable a dedicated login node |
| GitOps Agent Host IP | 10.0.7.62.73 | text | IP address of the GitOps agent host |
| GitOps Agent Host Password | ****** | text | Password for GitOps agent host authentication |
| GitOps Agent Host TAN Interface | bond0 | text | Network interface used by the GitOps agent host |
| GitOps Agent Host Username | rafayuser | text | Username for GitOps agent host login |
| H100 Node SKU | bmass-h100-slurm | text | SKU identifier for GPU-based H100 nodes |
| H100 Nodes | 0 | text | Number of GPU H100 nodes |
| Head Node SKU | vm-bom-head-node | text | SKU identifier for the cluster head node |
| L40S Node SKU | bmass-l40s-slurm | text | SKU identifier for GPU-based L40S nodes |
| L40S Nodes | 2 | text | Number of GPU L40S nodes |
| Login Node SKU | vm-slurm-login-node | text | SKU identifier for the login node |
| NCP Server API Key | ****** | text | API key for NCP server authentication |
| Ops Console Endpoint | ops-console.stage.shakticloud.ai | text | Endpoint of the Ops console |
| Partner API Key | ****** | text | API key for partner integrations |
| PXE Subnet Name | default-pxe | text | Subnet name for PXE boot |
| PXE VPC Name | default-pxe | text | VPC name for PXE boot |
| TAN Subnet Name | default | text | Subnet name for TAN |
| TAN VPC Name | default | text | VPC name for TAN |
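
Most of the inputs above are free-text fields, so a quick local check can catch malformed values (for example, a node count that is not a number, or a host address with too many octets) before they are saved. The sketch below is one possible way to do that. The dictionary keys, the placeholder IP address, and the `validate_inputs` helper are hypothetical and are not part of the platform.

```python
import ipaddress

# Hypothetical subset of the Input Settings, keyed by illustrative names.
inputs = {
    "cpu_nodes": "0",
    "h100_nodes": "0",
    "l40s_nodes": "2",
    "enable_login_node": "false",
    "gitops_agent_host_ip": "192.0.2.10",  # placeholder address, not a real host
}


def validate_inputs(values):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Node counts are entered as text but should be non-negative integers.
    for key in ("cpu_nodes", "h100_nodes", "l40s_nodes"):
        raw = values.get(key, "")
        if not raw.isdecimal():
            problems.append(f"{key}: expected a non-negative integer, got {raw!r}")
    # The login node toggle is entered as the text 'true' or 'false'.
    if values.get("enable_login_node") not in ("true", "false"):
        problems.append("enable_login_node: expected 'true' or 'false'")
    # ipaddress.ip_address raises ValueError for anything that is not a valid IP.
    try:
        ipaddress.ip_address(values.get("gitops_agent_host_ip", ""))
    except ValueError:
        problems.append("gitops_agent_host_ip: not a valid IP address")
    return problems


print(validate_inputs(inputs))
```

The function collects every problem instead of stopping at the first one, so a single run reports all fields that need attention.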

Input Configuration Controls (Slurm)

  • Override (Checkbox): Enables environment-level overrides for a specific input parameter in the Slurm configuration. When selected, users can customize values such as partition size, node count, or job submission parameters in their environment-specific settings.

  • Allow Override For All: A global toggle to enable override capability across all listed Slurm inputs in one click. This is useful when overrides need to be enabled for multiple cluster or job parameters simultaneously.

  • Preview Input Form: Clicking Preview Input Form displays how the configured Slurm inputs appear to users. It includes field labels, tooltips, input types, validation, and grouping as defined in the configuration (for example, partition definitions or resource limits).

  • Display Config (Edit): Opens a configuration panel that allows customization of how each Slurm input field appears in the environment form. It can be used to change the display name, add tooltips for guidance (e.g., how to set job limits), set default values, define input types, or group related parameters such as compute resources or scheduling options.
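
The override controls described above can be pictured with a small model: each input carries a profile default and an override flag, and an environment-level value is honored only when that flag is set. This is a simplified sketch under that assumption. The `ProfileInput` class and both functions are hypothetical names, and the platform's actual handling of a disallowed override (ignoring it versus rejecting it) is not stated in this document.

```python
from dataclasses import dataclass


@dataclass
class ProfileInput:
    name: str
    default: str
    allow_override: bool = False  # mirrors the per-input Override checkbox


def allow_override_for_all(profile_inputs):
    """Mirror the 'Allow Override For All' toggle by enabling every input."""
    for item in profile_inputs:
        item.allow_override = True


def resolve_inputs(profile_inputs, env_overrides):
    """Use the environment value only where the override flag is enabled."""
    resolved = {}
    for item in profile_inputs:
        if item.allow_override and item.name in env_overrides:
            resolved[item.name] = env_overrides[item.name]
        else:
            resolved[item.name] = item.default
    return resolved


profile = [
    ProfileInput("L40S Nodes", "2", allow_override=True),
    ProfileInput("Head Node SKU", "vm-bom-head-node"),
]
requested = {"L40S Nodes": "4", "Head Node SKU": "custom-sku"}

print(resolve_inputs(profile, requested))
```

In this model the administrator-controlled Head Node SKU keeps its profile default, while the node count takes the value supplied in the environment.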

Example: Edited Input for Controller Endpoint

| Field | Description |
|---|---|
| Alias | Internal reference name for the field (endpoint) |
| Tooltip | Help text shown when hovering over the info icon (empty in this case) |
| Disabled | Field is disabled and cannot be edited by end users |
| Order/Weight | Position of the field in the form (30; lower values appear first) |
| Type | Input type is set to File Upload (Text Only) |
| Validation Type | Validation is based on Length |
| Validation Pattern | Length validation allows a maximum of 20 characters |
| Section | Field is grouped under the General section |
| Section Description | Optional description for the section (currently empty) |
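
The two behaviors in this example that are easiest to misread are ordering by weight and length validation. The sketch below models both, assuming that fields are sorted in ascending order of weight and that a value passes when it has at most the configured number of characters. The `DisplayConfig` class, the helper functions, and the second field (`cpu_nodes` with weight 10) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DisplayConfig:
    alias: str
    order: int          # Order/Weight: lower values appear first
    max_length: int     # Length validation: maximum number of characters
    section: str = "General"
    disabled: bool = False


def form_order(fields):
    """Return field aliases in the order they would appear in the form."""
    return [f.alias for f in sorted(fields, key=lambda f: f.order)]


def passes_length(field, value):
    """Check a value against the field's maximum length."""
    return len(value) <= field.max_length


endpoint = DisplayConfig("endpoint", order=30, max_length=20, disabled=True)
cpu_nodes = DisplayConfig("cpu_nodes", order=10, max_length=5)  # hypothetical field

print(form_order([endpoint, cpu_nodes]))
print(passes_length(endpoint, "a" * 20), passes_length(endpoint, "a" * 21))
```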

Output Settings

PaaS Administrators can define outputs such as SSH keys, node details, or Slurm command results. These outputs are displayed to end users after they deploy the Slurm cluster, enabling them to access the login node and interact with the cluster using Slurm commands.

| Name | Label | Description |
|---|---|---|
| private_key_pem | Login-node SSH Private Key | SSH private key to access the login node |
| nodes | Slurm Nodes | Total number of nodes provisioned in the Slurm cluster |
| login_node_hostname | Login-node Hostname | Hostname of the provisioned login node |
| slurm_command_output | Slurm Command Output | Output of Slurm commands executed on the cluster |
| slurm_command_validation_error | Slurm Command Error | Error messages from Slurm command execution |
| login_node_ip | Login-node IP Address | IP address of the login node |
| login_node_username | Login-node Username | Username used to access the login node |
| login_node_password | Login-node Password | Password for accessing the login node (if enabled) |
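
As an illustration of how an end user might consume these outputs, the sketch below saves `private_key_pem` to a file readable only by its owner and assembles an `ssh` command that runs `sinfo` on the login node. The output values shown are placeholders, the helper functions are hypothetical, and the command is only built, not executed. How the outputs are retrieved from the platform is not covered here.

```python
import os
import tempfile


def write_private_key(private_key_pem, directory):
    """Write the key with owner-only permissions, as ssh expects for key files."""
    path = os.path.join(directory, "login_node_key.pem")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(private_key_pem)
    return path


def build_ssh_command(outputs, key_path, remote_command="sinfo"):
    """Build an ssh argument list; prefer the hostname, fall back to the IP."""
    host = outputs.get("login_node_hostname") or outputs["login_node_ip"]
    target = f"{outputs['login_node_username']}@{host}"
    return ["ssh", "-i", key_path, target, remote_command]


# Placeholder output values for illustration only.
outputs = {
    "private_key_pem": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "login_node_hostname": "login-01",
    "login_node_ip": "192.0.2.10",
    "login_node_username": "slurmuser",
}

with tempfile.TemporaryDirectory() as workdir:
    key_file = write_private_key(outputs["private_key_pem"], workdir)
    command = build_ssh_command(outputs, key_file)
    print(" ".join(command))
```

The argument list can be passed to `subprocess.run` when an actual connection is wanted. Keeping it as a list rather than a single string avoids shell quoting issues.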

Once all configurations are complete, click Save Changes to apply the updates.