Deep Dive into `nvidia-smi`: Monitoring Your NVIDIA GPU with Real Examples¶
Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, `nvidia-smi` is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.
In this blog, we’ll explore what `nvidia-smi` is, how to use it, and walk through real output from a system using an NVIDIA T1000 8GB GPU.
What is `nvidia-smi`?¶
`nvidia-smi` is a CLI utility bundled with the NVIDIA driver. It enables:
- Real-time GPU monitoring
- Driver and CUDA version discovery
- Process visibility and control
- GPU configuration and performance tuning
You can execute it using:
nvidia-smi
Breakdown by Section¶
Let's use real-life output from a system with an NVIDIA T1000 8GB GPU to review each section in detail.
Driver and CUDA Info¶
Driver Version: 550.163.01
CUDA Version: 12.4
- Driver Version: The installed NVIDIA kernel driver
- CUDA Version: The maximum CUDA runtime version this driver supports (not necessarily the version of any installed CUDA toolkit)
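If you need these values in script-friendly form, the query interface can return them directly. The fields below are standard `--query-gpu` properties; run `nvidia-smi --help-query-gpu` to list everything your driver supports.

```bash
# Print the GPU name and installed driver version as CSV
nvidia-smi --query-gpu=name,driver_version --format=csv
```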
GPU Status Table¶
Metric | Value |
---|---|
GPU Name | NVIDIA T1000 8GB |
Temp | 69°C |
Fan Speed | 52% |
Power Cap | 50W |
GPU Utilization | 4% |
Memory Usage | 473 MiB / 8192 MiB |
Performance State | P0 (max performance) |
In this case, the GPU is mostly idle, used lightly by background processes.
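The same metrics shown in the table above can be pulled as plain text for scripting. These are standard `--query-gpu` fields, though the availability of fan speed and power readings varies by GPU.

```bash
# Query temperature, fan, power, utilization, memory, and performance state
nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw,power.limit,utilization.gpu,memory.used,memory.total,pstate --format=csv
```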
Running GPU Processes¶
GPU PID Type Process Name GPU Memory
0 2154 G /usr/lib/xorg/Xorg 99 MiB
0 2430 C+G gnome-remote-desktop-daemon 132 MiB
0 2485 G /usr/bin/gnome-shell 150 MiB
0 4535 G seed-version-20250825-050038.168000 49 MiB
- `G`: Graphics process
- `C+G`: Uses both compute and graphics
- `seed-version-...`: Likely a custom or sandboxed job with a version tag
To investigate that last process (PID 4535) further:
ps -fp 4535
ls -l /proc/4535/exe
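You can also cross-reference PIDs reported by the driver itself. The sketch below uses the compute-apps query, which lists only compute (C / C+G) processes, not graphics-only (G) clients; see `nvidia-smi --help-query-compute-apps` for the exact field names your driver supports.

```bash
# List compute processes currently using the GPU, with their memory footprint
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```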
Practical Use Cases¶
Monitor GPU Live¶
watch -n 1 nvidia-smi
Updates output every second.
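If `watch` is not available, `nvidia-smi` has a built-in loop mode, and the `dmon` subcommand streams one line of device metrics per interval.

```bash
# Refresh the full report every second using the built-in loop flag
nvidia-smi -l 1

# Stream per-GPU utilization, memory, power, and clock samples
nvidia-smi dmon
```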
Kill a GPU Process¶
sudo kill -9 <PID>
Or reset the GPU (this only succeeds when no processes are using it):
sudo nvidia-smi --gpu-reset -i 0
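A reset will fail if anything still has the device open. Before retrying, it can help to see which processes hold the NVIDIA device files; the `/dev/nvidia*` paths assume a standard Linux driver installation.

```bash
# Show processes holding the NVIDIA device nodes open
sudo fuser -v /dev/nvidia*
```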
Query Usage via Script¶
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
Useful for logging and dashboards.
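For a simple logger, you can wrap that query in a loop and add a timestamp. This is a minimal sketch; the file name `gpu_metrics.csv` and the 5-second interval are arbitrary choices.

```bash
#!/usr/bin/env bash
# Append a timestamped utilization/memory sample to a CSV file every 5 seconds
LOGFILE=gpu_metrics.csv
while true; do
    printf '%s,' "$(date -Iseconds)" >> "$LOGFILE"
    nvidia-smi --query-gpu=utilization.gpu,memory.used \
        --format=csv,noheader,nounits >> "$LOGFILE"
    sleep 5
done
```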
Tips for Advanced Users¶
Enable persistence mode (keeps the driver loaded even when no clients are connected, reducing CUDA start-up latency):
sudo nvidia-smi -pm 1
Restrict compute access so only one process can use the GPU at a time:
sudo nvidia-smi -c EXCLUSIVE_PROCESS
View current and application clocks:
nvidia-smi -q -d CLOCK
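Application clocks can also be pinned on GPUs that support it (mainly data-center and some workstation cards; the T1000 may not allow this). The clock pair below is a placeholder, not a value for this GPU; list the valid combinations first and pick one from that output.

```bash
# List the <memory, graphics> clock pairs the GPU supports
nvidia-smi -q -d SUPPORTED_CLOCKS

# Pin application clocks to one of those pairs (placeholder values)
sudo nvidia-smi -ac 5001,1590
```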
Summary¶
With tools like `nvidia-smi`, you gain critical visibility into GPU usage and health. It’s an essential part of any ML or HPC workflow. We have developed integrated GPU Dashboards in the Rafay Platform to provide the same information in a graphical manner. In addition, users do not require any form of privileged root access to visualize this critical data.
Feature | Benefit |
---|---|
Monitor Usage | Check GPU load, temp, memory |
Debug Issues | Kill or trace problematic PIDs |
Multi-GPU Support | Check all GPUs in one place |
Script Integration | Log metrics via CSV |
Lightweight Tool | Works even without CUDA toolkit |
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.