GPU Cluster Commands
This is an example showcasing how admins can remotely execute a command on GPU nodes using the Rafay API.
Request API¶
In this example, we demonstrate how to send two distinct commands to specified GPU nodes using the Rafay API.
nvidia-smi -q
nvidia-smi -L
API Request to Execute Commands¶
Use the following curl
command to send the commands:
curl -X 'POST' \
'https://console.rafay.dev/cmdexec/v1/projects/gkjzn02/edges/ky5x842/execute/' \
-H 'accept: application/json' \
-H 'X-RAFAY-API-KEYID: ra2.*******************' \
-H 'Content-Type: application/json' \
-d '{
"target_type": "cluster",
"command": "nvidia-smi -q;nvidia-smi -L",
"timeout": 60
}'
gkjzn02
is the project ID, and ky5x842
is the cluster ID.
Output
The output response includes an Exec ID in the form of Id, which is used to query the results of the command execution as shown below
Retrieve Results¶
To retrieve the results of the executed commands, use the Exec ID (e.g.,
curl -X 'GET' \
'https://console.rafay.dev/cmdexec/v1/projects/gkjzn02/edges/ky5x842/execution/z24w40m/' \
-H 'accept: application/json' \
-H 'X-RAFAY-API-KEYID: ra2.*************************'
This API call will return the results of the commands executed on the GPU nodes.
A downloadable tarball containing a shell script and a README explaining how to use the script is available: here