Deep Dive into `nvidia-smi`: Monitoring Your NVIDIA GPU with Real Examples¶
Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, `nvidia-smi` is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.
In this blog, we’ll explore what `nvidia-smi` is, how to use it, and walk through real output from a system using an NVIDIA T1000 8GB GPU.
What is `nvidia-smi`?¶
`nvidia-smi` is a CLI utility bundled with the NVIDIA driver. It enables:
- Real-time GPU monitoring
- Driver and CUDA version discovery
- Process visibility and control
- GPU configuration and performance tuning
You can execute it using:
nvidia-smi
Breakdown by Section¶
Let's use real-life output from a system with an NVIDIA T1000 8GB GPU to review each section in detail.
Driver and CUDA Info¶
Driver Version: 550.163.01
CUDA Version: 12.4
- Driver Version: The installed NVIDIA kernel driver
- CUDA Version: The maximum CUDA runtime version this driver supports (not necessarily the version of any installed CUDA toolkit)
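If you need these values in script-friendly form, the query interface can return them directly. The fields below are standard `--query-gpu` properties; run `nvidia-smi --help-query-gpu` to list everything your driver supports.

```bash
# Print the GPU name and installed driver version as CSV
nvidia-smi --query-gpu=name,driver_version --format=csv
```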
GPU Status Table¶
Metric | Value |
---|---|
GPU Name | NVIDIA T1000 8GB |
Temp | 69°C |
Fan Speed | 52% |
Power Cap | 50W |
GPU Utilization | 4% |
Memory Usage | 473 MiB / 8192 MiB |
Performance State | P0 (max performance) |
In this case, the GPU is mostly idle, used lightly by background processes.
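The same metrics shown in the table above can be pulled as plain text for scripting. These are standard `--query-gpu` fields, though the availability of fan speed and power readings varies by GPU.

```bash
# Query temperature, fan, power, utilization, memory, and performance state
nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw,power.limit,utilization.gpu,memory.used,memory.total,pstate --format=csv
```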
Running GPU Processes¶
GPU PID Type Process Name GPU Memory
0 2154 G /usr/lib/xorg/Xorg 99 MiB
0 2430 C+G gnome-remote-desktop-daemon 132 MiB
0 2485 G /usr/bin/gnome-shell 150 MiB
0 4535 G seed-version-20250825-050038.168000 49 MiB
- `G`: Graphics process
- `C+G`: Uses both compute and graphics
- `seed-version-...`: Likely a custom or sandboxed job with a version tag
To investigate that last process (PID 4535) further:
ps -fp 4535
ls -l /proc/4535/exe
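You can also cross-reference PIDs reported by the driver itself. The sketch below uses the compute-apps query, which lists only compute (C / C+G) processes, not graphics-only (G) clients; see `nvidia-smi --help-query-compute-apps` for the exact field names your driver supports.

```bash
# List compute processes currently using the GPU, with their memory footprint
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```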
Practical Use Cases¶
Monitor GPU Live¶
watch -n 1 nvidia-smi
Updates output every second.
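If `watch` is not available, `nvidia-smi` has a built-in loop mode, and the `dmon` subcommand streams one line of device metrics per interval.

```bash
# Refresh the full report every second using the built-in loop flag
nvidia-smi -l 1

# Stream per-GPU utilization, memory, power, and clock samples
nvidia-smi dmon
```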
Kill a GPU Process¶
sudo kill -9 <PID>
Or reset the GPU (this only succeeds when no processes are using it):
sudo nvidia-smi --gpu-reset -i 0
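A reset will fail if anything still has the device open. Before retrying, it can help to see which processes hold the NVIDIA device files; the `/dev/nvidia*` paths assume a standard Linux driver installation.

```bash
# Show processes holding the NVIDIA device nodes open
sudo fuser -v /dev/nvidia*
```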
Query Usage via Script¶
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
Useful for logging and dashboards.
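For a simple logger, you can wrap that query in a loop and add a timestamp. This is a minimal sketch; the file name `gpu_metrics.csv` and the 5-second interval are arbitrary choices.

```bash
#!/usr/bin/env bash
# Append a timestamped utilization/memory sample to a CSV file every 5 seconds
LOGFILE=gpu_metrics.csv
while true; do
    printf '%s,' "$(date -Iseconds)" >> "$LOGFILE"
    nvidia-smi --query-gpu=utilization.gpu,memory.used \
        --format=csv,noheader,nounits >> "$LOGFILE"
    sleep 5
done
```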
Tips for Advanced Users¶
Enable persistence mode (keeps the driver loaded even when no clients are connected, reducing CUDA start-up latency):
sudo nvidia-smi -pm 1
Restrict compute access so only one process can use the GPU at a time:
sudo nvidia-smi -c EXCLUSIVE_PROCESS
View current and application clocks:
nvidia-smi -q -d CLOCK
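Application clocks can also be pinned on GPUs that support it (mainly data-center and some workstation cards; the T1000 may not allow this). The clock pair below is a placeholder, not a value for this GPU; list the valid combinations first and pick one from that output.

```bash
# List the <memory, graphics> clock pairs the GPU supports
nvidia-smi -q -d SUPPORTED_CLOCKS

# Pin application clocks to one of those pairs (placeholder values)
sudo nvidia-smi -ac 5001,1590
```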
Summary¶
With tools like `nvidia-smi`, you gain critical visibility into GPU usage and health. It’s an essential part of any ML or HPC workflow. We have developed integrated GPU Dashboards in the Rafay Platform to provide the same information in a graphical manner. In addition, users do not require any form of privileged root access to visualize this critical data.
Feature | Benefit |
---|---|
Monitor Usage | Check GPU load, temp, memory |
Debug Issues | Kill or trace problematic PIDs |
Multi-GPU Support | Check all GPUs in one place |
Script Integration | Log metrics via CSV |
Lightweight Tool | Works even without CUDA toolkit |
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.