
Deep Dive into nvidia-smi: Monitoring Your NVIDIA GPU with Real Examples

Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.

In this blog, we’ll explore what nvidia-smi is, how to use it, and walk through a real output from a system using an NVIDIA T1000 8GB GPU.


What is nvidia-smi?

nvidia-smi is a CLI utility bundled with the NVIDIA driver. It enables:

  • Real-time GPU monitoring
  • Driver and CUDA version discovery
  • Process visibility and control
  • GPU configuration and performance tuning

You can execute it using:

nvidia-smi
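
If the command is not found, the NVIDIA driver (which bundles nvidia-smi) is likely missing or not on your PATH. A quick check:

command -v nvidia-smi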

Breakdown by Section

Let's walk through real-life output from a system with an NVIDIA T1000 8GB GPU and review each section in detail.

nvidia-smi Example Output

Driver and CUDA Info

Driver Version: 550.163.01
CUDA Version: 12.4
  • Driver Version: The installed NVIDIA kernel driver version
  • CUDA Version: The highest CUDA runtime version the driver supports (not necessarily the CUDA toolkit installed on the system)
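
If you want these values on their own (for scripts or quick checks), the driver version can be pulled through the query interface, and the CUDA version line can be grepped out of the full query report on recent drivers. A minimal sketch:

# Driver version only, without the table layout
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# CUDA version as reported by the driver (present in recent driver releases)
nvidia-smi -q | grep "CUDA Version"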

GPU Status Table

Metric               Value
GPU Name             NVIDIA T1000 8GB
Temperature          69°C
Fan Speed            52%
Power Cap            50 W
GPU Utilization      4%
Memory Usage         473 MiB / 8192 MiB
Performance State    P0 (maximum performance)

In this case, the GPU is mostly idle, used lightly by background processes.
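
The same metrics can also be pulled in machine-readable form via the query interface (the available field names are listed by nvidia-smi --help-query-gpu). A minimal sketch:

nvidia-smi --query-gpu=name,temperature.gpu,fan.speed,power.limit,utilization.gpu,memory.used,memory.total,pstate --format=csv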


Running GPU Processes

GPU   PID   Type   Process Name                                GPU Memory
0     2154   G     /usr/lib/xorg/Xorg                            99 MiB
0     2430   C+G   gnome-remote-desktop-daemon                  132 MiB
0     2485   G     /usr/bin/gnome-shell                         150 MiB
0     4535   G     seed-version-20250825-050038.168000           49 MiB
  • G: Graphics process
  • C+G: Uses both compute and graphics
  • seed-version-...: Likely a custom or sandboxed job with a version tag

To investigate it further:

ps -fp 4535
ls -l /proc/4535/exe
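
If you only care about compute (CUDA) processes, nvidia-smi can list them directly; note that this query excludes graphics-only processes such as Xorg. A minimal sketch:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv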

Practical Use Cases

Monitor GPU Live

watch -n 1 nvidia-smi

Updates output every second.
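
nvidia-smi also has a built-in loop flag and a compact device-monitoring mode, so you can skip watch entirely if you prefer:

# Re-run the full report every second
nvidia-smi -l 1

# Compact per-second device stats (power, temperature, utilization, clocks)
nvidia-smi dmon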

Kill a GPU Process

sudo kill -9 <PID>

Or reset the GPU (this only succeeds when no processes are using it):

sudo nvidia-smi --gpu-reset -i 0

Query Usage via Script

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv

Useful for logging and dashboards.
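
For continuous logging, combine the query with the loop flag and append to a file (the file name below is just an example):

# Sample every 5 seconds and append to a CSV log
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv,noheader -l 5 >> gpu_usage.csv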


Tips for Advanced Users

Enable persistence mode (keeps the driver initialized between jobs, reducing startup latency)

sudo nvidia-smi -pm 1

Restrict compute access to one process at a time (exclusive process mode)

sudo nvidia-smi -c EXCLUSIVE_PROCESS

View clock details (current, application, and maximum clocks)

nvidia-smi -q -d CLOCK
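
To undo the first two settings, the corresponding defaults can be restored:

# Return compute mode to the default (shared) setting
sudo nvidia-smi -c DEFAULT

# Disable persistence mode
sudo nvidia-smi -pm 0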

Summary

With tools like nvidia-smi, you gain critical visibility into GPU usage and health. It’s an essential part of any ML or HPC workflow. We have developed the integrated GPU Dashboards in the Rafay Platform to provide the same information in a graphical manner. In addition, users do not need any form of privileged (root) access to visualize this critical data.

Feature              Benefit
Monitor Usage        Check GPU load, temperature, and memory
Debug Issues         Kill or trace problematic PIDs
Multi-GPU Support    Check all GPUs in one place
Script Integration   Log metrics via CSV
Lightweight Tool     Works even without the CUDA toolkit