Utilyze, an open source GPU monitoring tool more accurate than nvtop

by ManyaGhobadi·Apr 27, 2026·128 points·28 comments

AI Analysis

●● Solid · Big Brain · Solve My Problem

Throughput-based GPU metrics can reveal as little as 1% real utilization while nvtop reports 100%.

Strengths
  • Hardware performance counter sampling reveals actual compute throughput
  • Estimates attainable utilization ceiling for specific workloads
  • Open-source Apache 2.0 with negligible runtime overhead
Weaknesses
  • NVIDIA-focused, limited ROCm and other accelerator support
  • GPU monitoring space already has nvtop, cloud dashboards, and Datadog
Target Audience

ML engineers and AI infrastructure teams

Similar To

nvtop · nvidia-smi · Datadog GPU monitoring

Post Description

The standard GPU utilization metric reported by nvidia-smi, nvtop, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor is highly misleading. It reports the fraction of time that any kernel is running on the GPU, which means a GPU can report 100% utilization even if only a small portion of its compute capacity is actually being used. In practice, we've seen workloads with ~1–10% real compute throughput while dashboards show 100%.
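
To make the gap concrete, here is a minimal sketch comparing the two definitions on a single one-second sample window. The kernel trace, FLOP counts, and the 100 TFLOP/s peak are hypothetical numbers chosen for illustration, not measurements. `time_busy_fraction` mirrors the "any kernel is running" definition described above, while `throughput_fraction` divides the achieved FLOP rate by what the hardware could have delivered in the same window.

```python
from dataclasses import dataclass


@dataclass
class KernelInterval:
    start_s: float  # kernel start time within the sample window, in seconds
    end_s: float    # kernel end time within the sample window, in seconds
    flops: float    # floating-point operations the kernel actually performed


def time_busy_fraction(kernels, window_s):
    """Fraction of the window during which at least one kernel was running."""
    # Merge overlapping intervals so concurrent kernels are not double-counted.
    spans = sorted((k.start_s, k.end_s) for k in kernels)
    busy = 0.0
    cur_start, cur_end = None, None
    for s, e in spans:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / window_s


def throughput_fraction(kernels, window_s, peak_flops_per_s):
    """Achieved FLOP rate over the window divided by the hardware peak."""
    achieved = sum(k.flops for k in kernels) / window_s
    return achieved / peak_flops_per_s


# Hypothetical trace: one lightly loaded kernel that spans the whole window.
WINDOW_S = 1.0
PEAK = 100e12  # assumed peak of 100 TFLOP/s
trace = [KernelInterval(0.0, 1.0, flops=2e12)]

print(f"time-based:       {time_busy_fraction(trace, WINDOW_S):.0%}")  # 100%
print(f"throughput-based: {throughput_fraction(trace, WINDOW_S, PEAK):.0%}")  # 2%
```

The same trace scores 100% on the time-based metric and 2% on the throughput-based one, which is the kind of discrepancy the post describes.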

This becomes a problem when teams rely on that metric for capacity planning or optimization decisions: it can make underutilized systems look saturated.

We're releasing an open-source (Apache 2.0) tool, Utilyze, to measure GPU utilization differently. It samples hardware performance counters and reports compute and memory throughput relative to the hardware's theoretical limits. It also estimates an attainable utilization ceiling for a given workload.
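
The post does not say how Utilyze computes its attainable ceiling. One standard way to reason about such a ceiling is the roofline model, in which attainable compute throughput is bounded by min(peak compute, arithmetic intensity × peak memory bandwidth). The sketch below applies that model; the function names, peak figures, and kernel numbers are hypothetical and are not Utilyze's API or output.

```python
def roofline_attainable_flops(arith_intensity, peak_flops, peak_bw):
    """Roofline bound on FLOP/s for a kernel with the given FLOPs-per-byte intensity."""
    return min(peak_flops, arith_intensity * peak_bw)


def utilization_report(achieved_flops, achieved_bw, arith_intensity, peak_flops, peak_bw):
    """Throughput relative to hardware peaks, plus the roofline ceiling (hypothetical helper)."""
    ceiling = roofline_attainable_flops(arith_intensity, peak_flops, peak_bw)
    return {
        "compute_util": achieved_flops / peak_flops,        # share of peak FLOP/s
        "memory_util": achieved_bw / peak_bw,                # share of peak bandwidth
        "attainable_compute_util": ceiling / peak_flops,     # best this kernel could reach
        "fraction_of_ceiling": achieved_flops / ceiling,     # how close it gets to that best
    }


# Hypothetical accelerator: 100 TFLOP/s peak compute, 2 TB/s peak memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BW = 2e12
# Hypothetical memory-bound kernel: 10 FLOPs/byte, achieving 15 TFLOP/s at 1.5 TB/s.
report = utilization_report(15e12, 1.5e12, 10.0, PEAK_FLOPS, PEAK_BW)
for name, value in report.items():
    print(f"{name:>24}: {value:.0%}")
```

Here raw compute utilization reads 15%, but because the kernel's arithmetic intensity caps it at 20% of peak, it is already running at 75% of what it could reach. Separating "low utilization" from "low headroom" in this way is the kind of distinction an attainable ceiling is meant to surface.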

GitHub link: https://github.com/systalyze/utilyze

We'd love to hear your thoughts!

Similar Projects

Developer Tools · ●●● Banger

WattSeal – PC power consumption monitor

Per-app wattage attribution using RAPL and GPU counters, where other monitors only show component totals.

Big Brain · Niche Gem
Daminoup
40 · 2mo ago
Health · ●● Solid

I vibecoded a glucose analysis tool

Comprehensive AGP computation with LBGI, HBGI, GRADE—medical-grade metrics in open source.

Niche Gem · Solve My Problem
dclavijo
20 · 3mo ago

GPU-hot Dashboard for monitoring Nvidia GPUs on remote servers

Shoots for zero-setup GPU visibility: one docker run spins up a service you open in the browser to see live NVIDIA metrics without Prometheus, SSH, or dashboards to configure. The UI and interactive demo show attention to UX and make it instantly useful for small clusters or single-node setups. It doesn’t reinvent observability — if you need long-term metrics, alerting, or enterprise integrations you’ll still reach for exporters + Grafana — but for lightweight, immediate GPU troubleshooting this is convenient and focused.

Niche Gem · Slick
github-trending
21 · 3mo ago