GPU Cluster Commands

This example shows how administrators can remotely execute commands on GPU nodes using the Rafay API.

Request API

The request below sends the following two commands to the GPU nodes in a single API call:

  • nvidia-smi -q
  • nvidia-smi -L

API Request to Execute Commands

Use the following curl command to send the commands:

curl -X 'POST' \
  'https://console.rafay.dev/cmdexec/v1/projects/gkjzn02/edges/ky5x842/execute/' \
  -H 'accept: application/json' \
  -H 'X-RAFAY-API-KEYID: ra2.*******************' \
  -H 'Content-Type: application/json' \
  -d '{
  "target_type": "cluster",
  "command": "nvidia-smi -q;nvidia-smi -L",
  "timeout": 60
}'

In the example above, gkjzn02 is the project ID and ky5x842 is the cluster ID. The two commands are joined with a semicolon in the command field.
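
For scripted use, the same request can be sketched in Python. This is a minimal sketch using only the standard library, not an official Rafay client: the function names build_execute_request and send_request are hypothetical, while the URL path, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

# Host taken from the curl example; adjust if your console URL differs.
BASE_URL = "https://console.rafay.dev"


def build_execute_request(project_id, cluster_id, commands, api_key, timeout=60):
    """Build (but do not send) the POST request shown in the curl example."""
    url = f"{BASE_URL}/cmdexec/v1/projects/{project_id}/edges/{cluster_id}/execute/"
    payload = {
        "target_type": "cluster",
        # Multiple commands are joined with ';' into one string, as in the curl example.
        "command": ";".join(commands),
        "timeout": timeout,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "X-RAFAY-API-KEYID": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as send_request(build_execute_request("gkjzn02", "ky5x842", ["nvidia-smi -q", "nvidia-smi -L"], api_key)) would then submit both commands, with api_key holding your own API key. Building the request separately from sending it makes it possible to inspect the URL and body before anything goes over the network.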

Output

The response includes an Exec ID, returned as Id, which is used to query the results of the command execution as described in the next section.
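
The full response body is not reproduced here, so the following sketch rests on an assumption: that the Exec ID arrives as a top-level JSON field named Id. The helper name extract_exec_id and the sample body in the usage note are illustrative only.

```python
import json


def extract_exec_id(response_body):
    """Return the Exec ID from a JSON response body.

    Assumes the ID is a top-level field named "Id"; adjust the lookup
    if the actual response nests it differently.
    """
    data = json.loads(response_body)
    if not isinstance(data, dict) or "Id" not in data:
        raise KeyError("response does not contain an 'Id' field")
    return data["Id"]
```

For instance, calling extract_exec_id on a hypothetical body of '{"Id": "z24w40m"}' yields the string z24w40m. Raising an error when the field is absent avoids silently querying results with an empty ID.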

Retrieve Results

To retrieve the results of the executed commands, pass the Exec ID from the response above (z24w40m in this example) in the following API call:

curl -X 'GET' \
  'https://console.rafay.dev/cmdexec/v1/projects/gkjzn02/edges/ky5x842/execution/z24w40m/' \
  -H 'accept: application/json' \
  -H 'X-RAFAY-API-KEYID: ra2.*************************'

This API call will return the results of the commands executed on the GPU nodes.
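
The retrieval call can be sketched in Python in the same way. This is a minimal, standard-library sketch under stated assumptions: build_result_request and fetch_json are hypothetical helper names, and the URL path and headers mirror the GET curl example above. The structure of the returned results is not documented here, so the sketch simply decodes whatever JSON comes back.

```python
import json
import urllib.request

# Host taken from the curl example; adjust if your console URL differs.
BASE_URL = "https://console.rafay.dev"


def build_result_request(project_id, cluster_id, exec_id, api_key):
    """Build (but do not send) the GET request for a given Exec ID."""
    url = (
        f"{BASE_URL}/cmdexec/v1/projects/{project_id}"
        f"/edges/{cluster_id}/execution/{exec_id}/"
    )
    return urllib.request.Request(
        url,
        headers={
            "accept": "application/json",
            "X-RAFAY-API-KEYID": api_key,
        },
        method="GET",
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With your own API key in api_key, fetch_json(build_result_request("gkjzn02", "ky5x842", "z24w40m", api_key)) would return the decoded results for the example execution.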

A downloadable tarball containing a shell script, along with a README that explains how to use it, is also available.