GitHub Repository · 49 stars · Rust

Nabla – Pure Rust GPU math engine, 7.5× faster matmul than PyTorch

by fumishiki · Mar 1, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Wizardry · Big Brain

Pure Rust autodiff and GPU math avoid C++ FFI hell, but the matmul claim needs apples-to-apples benchmarks.
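The summary credits Nabla with pure Rust autodiff. A backward() call typically triggers reverse-mode differentiation over a recorded tape. Below is a minimal scalar sketch of that technique in plain Rust, under a few stated assumptions:

- It is not Nabla's API. Tape, var, add, sub, mul and backward are hypothetical names chosen for illustration.
- A real GPU engine would record tensor operations and dispatch kernels rather than do scalar CPU arithmetic.

```rust
/// A recorded operation: two (parent index, local partial derivative) pairs.
/// Leaves point at themselves with weight 0.0 so they contribute nothing.
#[derive(Clone, Copy)]
struct Node {
    parents: [(usize, f64); 2],
}

/// Hypothetical scalar tape (Wengert list) for reverse-mode autodiff.
struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: [(usize, f64); 2]) -> usize {
        self.nodes.push(Node { parents });
        self.values.push(value);
        self.nodes.len() - 1
    }

    /// Create an input (leaf) variable and return its index on the tape.
    fn var(&mut self, value: f64) -> usize {
        let idx = self.nodes.len();
        self.push(value, [(idx, 0.0), (idx, 0.0)])
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.push(v, [(a, 1.0), (b, 1.0)])
    }

    fn sub(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] - self.values[b];
        self.push(v, [(a, 1.0), (b, -1.0)])
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        // d(a*b)/da = b, d(a*b)/db = a
        self.push(va * vb, [(a, vb), (b, va)])
    }

    /// Reverse sweep: seed d(out)/d(out) = 1, then walk the tape backwards,
    /// adding each node's gradient times the local derivative into its parents.
    fn backward(&self, out: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[out] = 1.0;
        for i in (0..=out).rev() {
            let g = grads[i];
            for &(p, w) in &self.nodes[i].parents {
                grads[p] += w * g;
            }
        }
        grads
    }
}

fn main() {
    // Squared error for a one-weight linear model: loss = (w * x + b - t)^2
    let mut tape = Tape::new();
    let (w, x, b, t) = (tape.var(2.0), tape.var(3.0), tape.var(1.0), tape.var(5.0));
    let wx = tape.mul(w, x);
    let y = tape.add(wx, b);
    let diff = tape.sub(y, t);
    let loss = tape.mul(diff, diff);
    let grads = tape.backward(loss);
    // Prints: loss = 4, dL/dw = 12, dL/db = 4
    println!("loss = {}, dL/dw = {}, dL/db = {}", tape.values[loss], grads[w], grads[b]);
}
```

Each backward step only uses the local derivative stored when the operation was recorded. This is why a reverse sweep costs roughly as much as the forward pass, however many inputs there are.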

Strengths
  • The pure Rust implementation eliminates C++ FFI complexity and dependency bloat, enabling cross-platform CUDA, Vulkan, and AMD kernels from one codebase
  • A loss.backward() autodiff API, kernel fusion via fuse!(), and einsum! macros reduce boilerplate compared with hand-rolled CUDA
  • Benchmarks were run on a GH200, which is credible hardware, though the TF32-vs-FP32 comparison requires scrutiny
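Kernel fusion means merging consecutive element-wise operations into a single kernel, so intermediate results never round-trip through memory. The CPU analogy below shows the idea. The function names are hypothetical, and this is not how fuse!() is implemented.

```rust
/// Unfused: two separate passes (think two kernel launches) with an
/// intermediate buffer that is written to memory and read back.
fn scale_shift_relu_unfused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    let tmp: Vec<f32> = x.iter().map(|&v| a * v + b).collect(); // pass 1
    tmp.iter().map(|&v| v.max(0.0)).collect() // pass 2
}

/// Fused: a single pass computes relu(a * x + b) per element,
/// with no intermediate buffer.
fn scale_shift_relu_fused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    x.iter().map(|&v| (a * v + b).max(0.0)).collect()
}

fn main() {
    let x = [-2.0f32, -0.5, 0.0, 1.5, 3.0];
    let fused = scale_shift_relu_fused(&x, 2.0, 1.0);
    // Same arithmetic, same result; only the memory traffic differs.
    assert_eq!(fused, scale_shift_relu_unfused(&x, 2.0, 1.0));
    println!("{:?}", fused); // [0.0, 0.0, 1.0, 4.0, 7.0]
}
```

The saving is in memory traffic. For n f32 elements, the unfused version moves about 16n bytes: it reads x, writes and re-reads the temporary, then writes y. The fused version moves 8n bytes. On a GPU, fusion also saves one kernel launch.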
Weaknesses
  • The 7.5× speedup is precision-qualified (Nabla in TF32 versus PyTorch's default FP32); an apples-to-apples FP32 comparison shows only a 1.6× advantage
  • Minimal ecosystem maturity: only 7 GitHub stars, unclear adoption, and no real-world training examples beyond microbenchmarks
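Suppose both headline figures are measured against the same PyTorch FP32 baseline. Then switching Nabla from FP32 to TF32 accounts for roughly 7.5 / 1.6 ≈ 4.7× of the gap.

The accuracy side of that trade is easy to see numerically. The sketch below is plain CPU Rust, not Nabla or cuBLAS code, and it rests on these assumptions:

- to_tf32, matmul_f32, matmul_f64, lcg_values and max_abs_err are hypothetical helper names.
- It emulates TF32 by rounding each FP32 input to 10 mantissa bits, assuming round-to-nearest-even, while accumulating in FP32.
- It measures error only, not speed.

```rust
/// Round an f32 to TF32 precision (10 explicit mantissa bits, same 8-bit exponent)
/// using round-to-nearest-even; the result is still stored in an f32.
/// Assumption: NaN/Inf inputs are not special-cased here.
fn to_tf32(x: f32) -> f32 {
    const DROP: u32 = 13; // 23 FP32 mantissa bits minus 10 TF32 mantissa bits
    let bits = x.to_bits();
    let lsb = (bits >> DROP) & 1; // lowest surviving bit, used for ties-to-even
    let bias = (1u32 << (DROP - 1)) - 1 + lsb;
    f32::from_bits(bits.wrapping_add(bias) & !((1u32 << DROP) - 1))
}

/// Deterministic pseudo-random values in [-1, 1) from a 64-bit LCG (no external crates).
fn lcg_values(len: usize, mut state: u64) -> Vec<f32> {
    (0..len)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Top 24 bits -> [0, 1) -> [-1, 1)
            (state >> 40) as f32 / (1u64 << 24) as f32 * 2.0 - 1.0
        })
        .collect()
}

/// Naive row-major C = A (m x k) * B (k x n), accumulating in f32.
/// With tf32_inputs, each operand is rounded to TF32 first, loosely mimicking
/// a TF32 tensor-core path (reduced-precision inputs, FP32 accumulation).
fn matmul_f32(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, tf32_inputs: bool) -> Vec<f32> {
    let q = |x: f32| if tf32_inputs { to_tf32(x) } else { x };
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += q(a[i * k + p]) * q(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    c
}

/// The same product computed in f64, used as the accuracy reference.
fn matmul_f64(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f64> {
    let mut c = vec![0.0f64; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                c[i * n + j] += a[i * k + p] as f64 * b[p * n + j] as f64;
            }
        }
    }
    c
}

fn max_abs_err(c: &[f32], reference: &[f64]) -> f64 {
    c.iter()
        .zip(reference)
        .map(|(&x, &r)| (x as f64 - r).abs())
        .fold(0.0, f64::max)
}

fn main() {
    let (m, k, n) = (64, 64, 64);
    let a = lcg_values(m * k, 1);
    let b = lcg_values(k * n, 2);
    let reference = matmul_f64(&a, &b, m, k, n);
    let fp32_err = max_abs_err(&matmul_f32(&a, &b, m, k, n, false), &reference);
    let tf32_err = max_abs_err(&matmul_f32(&a, &b, m, k, n, true), &reference);
    println!("max |error| vs f64: FP32 = {fp32_err:e}, TF32-emulated = {tf32_err:e}");
}
```

TF32 keeps FP32's exponent range but only about 3 significant decimal digits. Expect the emulated TF32 error to be orders of magnitude larger than the FP32 error. That gap is why a TF32-versus-FP32 speedup is not an apples-to-apples comparison.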
Target Audience

Rust developers building ML systems, inference pipelines, and numerical computing software

Similar To

PyTorch · tch-rs · Candle

Similar Projects