Containerized Jobs

The SLURM cluster is configured with Apptainer and Enroot/Pyxis for running containerized jobs.


Apptainer

With SLURM clusters, you can run workloads inside containers using Apptainer, a lightweight and secure container engine built for HPC and scientific computing environments. Formerly known as Singularity, Apptainer integrates seamlessly with SLURM to provide portable, reproducible job execution and continues to use the familiar .sif image format.

Follow these steps to run containers with Apptainer.

  • Access the Login node via SSH
  • Run the following command to pull the container image and convert it to the .sif format used by Apptainer:
srun apptainer pull cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
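
Optionally, you can check the metadata of the converted image before submitting a job; this is only a quick sanity check, not a required step:

apptainer inspect cuda_image.sif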

Create a file named apptainer.sbatch on the shared volume, /mnt/data, with the following content:

#!/bin/bash
#SBATCH --job-name=apptainer
#SBATCH --output=output.log
#SBATCH --error=error.log

# --nv exposes the host GPUs and NVIDIA driver libraries to the container
apptainer exec --nv cuda_image.sif nvidia-smi

For more information about other options available for apptainer exec, see the Apptainer documentation.

This job script starts the container and executes the nvidia-smi command inside it.

Once the image is converted, run the job by executing the following command:

sbatch apptainer.sbatch
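
sbatch prints a job ID when it queues the job. If you want to monitor the job while it runs, the standard SLURM tools apply; <jobid> below is a placeholder for the ID returned by sbatch:

squeue -j <jobid>
sacct -j <jobid>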

Check the output.log file in the directory where the script was run to see the output of the nvidia-smi command run inside the container.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:18:00.0 Off |                    0 |
| N/A   27C    P0             76W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:2A:00.0 Off |                    0 |
| N/A   28C    P0             74W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:3A:00.0 Off |                    0 |
| N/A   29C    P0             77W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:5D:00.0 Off |                    0 |
| N/A   28C    P0             77W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:9A:00.0 Off |                    0 |
| N/A   27C    P0             77W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:AB:00.0 Off |                    0 |
| N/A   30C    P0             77W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:BA:00.0 Off |                    0 |
| N/A   29C    P0             75W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:DB:00.0 Off |                    0 |
| N/A   30C    P0             76W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
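
Beyond this minimal example, you will usually also request resources explicitly in the job script. The following is only a sketch: the partition name, GPU count, and bind mount are assumptions and need to be adjusted to your cluster's configuration:

#!/bin/bash
#SBATCH --job-name=apptainer-gpu
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --partition=gpu        # assumed partition name; adjust for your cluster
#SBATCH --gres=gpu:1           # request one GPU (requires GRES to be configured)
#SBATCH --cpus-per-task=8

# --bind makes the shared volume visible inside the container
apptainer exec --nv --bind /mnt/data:/mnt/data cuda_image.sif nvidia-smi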

Enroot/Pyxis

SLURM clusters integrate Enroot and Pyxis to enable containerized job execution:

  • Enroot is a lightweight container runtime developed by NVIDIA for machine learning and high-performance computing (HPC) workloads. It supports Docker-compatible images and runs them efficiently within SLURM environments.
  • Pyxis is a SLURM plugin that extends Enroot’s functionality, allowing users to launch containers directly with the srun command by adding --container-* options.
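
For example, the following one-off srun invocation is a sketch of this usage; the image tag and mount path are assumptions, and the partition matches the batch example below:

srun --partition=pyxis --container-image=ubuntu:22.04 --container-mounts=/mnt/data:/mnt/data ls /mnt/data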

With this setup, you can easily execute a SLURM job inside a container, whether the container image resides in a registry or is stored locally. Follow these steps to run containers with Enroot/Pyxis.

  • Access the Login node via SSH
  • Create a file named pyxis.sbatch on the shared volume, /mnt/data, with the following content:
#!/bin/bash

#SBATCH --job-name=pyxis
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --partition=pyxis

# Pyxis imports alpine:latest and runs grep inside the resulting container
srun --partition=pyxis --container-image=alpine:latest grep PRETTY /etc/os-release

For more information about other --container-* parameters available for srun, see the Pyxis documentation.

Run the job by executing the following command:

sbatch pyxis.sbatch

Check the output.log file in the directory where the script was run to see the output of the grep command run inside the container.

pyxis: importing docker image: alpine:latest
pyxis: imported docker image: alpine:latest
PRETTY_NAME="Alpine Linux v3.22"
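
The container image does not have to be pulled at job-launch time. As a sketch of the locally stored case mentioned above, you can first import the image into a squashfs file with Enroot and then point --container-image at that file; the output file name here is only an example:

srun enroot import --output alpine.sqsh docker://alpine:latest
srun --partition=pyxis --container-image=./alpine.sqsh grep PRETTY /etc/os-release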