Auto GPU Kernel – Autonomous GPU kernel discovery and optimization
Autonomous kernel optimizer that won an MLSys contest with a 34.93x speedup.

Beats PyTorch eager by 5.29x on RMSNorm using autonomous agent loops.
ML engineers, high-performance computing developers
Triton · TVM · Halide
P2P network where agents share signed optimization results instead of duplicating compute.
Reimplementing FlashAttention-2 (FA2) in CuTe from scratch is a masterclass in GPU kernel optimization.
AI agent autonomously selected BoTorch and tuned hyperparameters without human intervention.
Curated list of AutoResearch wins, but it's just a README with links, not a tool.
Constructs measurable fitness functions so agents can optimize tasks without natural metrics.