AutoKernel: Auto GPU Kernel Optimization
Beats PyTorch eager by 5.29x on RMSNorm using autonomous agent loops.
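RMSNorm, the operation behind the 5.29x figure, divides each row by its root-mean-square and then multiplies by a learned per-channel weight: y_i = x_i / sqrt(mean(x^2) + eps) * w_i. The plain-Python sketch below states that math as a correctness oracle that a generated kernel could be checked against. It is not AutoKernel's code. The `rms_norm` name and the `eps=1e-6` default are assumptions (frameworks choose different defaults), and the sketch says nothing about how the speedup was measured.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.

    `eps=1e-6` is an assumed default; real frameworks may use a different value.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)  # mean of squares across the row
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # reciprocal root-mean-square
    return [v * inv_rms * w for v, w in zip(x, weight)]  # normalize, then apply per-channel weight
```

With unit weights and `eps=0.0`, the output row has a mean square of 1. This gives a quick sanity check for any faster implementation.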
Winner 🏆 (Agent-only) of the DeepSeek Sparse Attention (DSA) track at the MLSys 2026 FlashInfer AI Kernel Generation Contest, with an average speedup of 34.93x.
Autonomous kernel optimizer that won an MLSys 2026 contest track with a 34.93x average speedup.
ML engineers, GPU kernel developers, researchers working on attention mechanisms
Cursor · Triton GPT · AI-assisted kernel compilers
The recursive benchmarking loop is clever, but a history of only 4 commits and a gimmick license raise concerns.
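The review does not describe how the loop works internally. What follows is only a minimal sketch of a generic benchmark-driven optimization loop, with plain Python functions standing in for GPU kernels. A hypothetical `candidates` callback plays the agent's role by proposing new kernels derived from the current best. Each proposal must first match a reference RMSNorm, and crashes count as failures. Its best-of-N wall-clock time then serves as the fitness, and it replaces the incumbent only if it is faster. All names are illustrative and are not AutoKernel's API.

```python
import math
import time
from typing import Callable, Iterable, List, Sequence, Tuple

# A "kernel" here is any function (x, weight) -> output row; plain Python stands in for GPU code.
Kernel = Callable[[Sequence[float], Sequence[float]], List[float]]


def reference(x: Sequence[float], w: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Eager-style RMSNorm used as the correctness oracle and the starting point."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * g for v, g in zip(x, w)]


def allclose(a: Sequence[float], b: Sequence[float], tol: float = 1e-5) -> bool:
    """Mixed absolute/relative tolerance check between two output rows."""
    return len(a) == len(b) and all(abs(p - q) <= tol * (1.0 + abs(q)) for p, q in zip(a, b))


def time_kernel(fn: Kernel, x: Sequence[float], w: Sequence[float], repeats: int = 20) -> float:
    """Fitness: best-of-N wall-clock time in seconds (lower is better)."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn(x, w)
        best = min(best, time.perf_counter() - start)
    return best


def optimize(
    candidates: Callable[[Kernel], Iterable[Kernel]],  # hypothetical agent: proposals from current best
    x: Sequence[float],
    w: Sequence[float],
    rounds: int = 3,
) -> Tuple[Kernel, float]:
    """Keep the fastest proposal whose output still matches the reference."""
    expected = reference(x, w)
    best_fn: Kernel = reference
    best_t = time_kernel(reference, x, w)
    for _ in range(rounds):
        for cand in candidates(best_fn):  # each round builds on the current incumbent
            try:
                ok = allclose(cand(x, w), expected)
            except Exception:
                ok = False  # crashing proposals are discarded like incorrect ones
            if not ok:
                continue  # never spend benchmark time on a wrong kernel
            t = time_kernel(cand, x, w)
            if t < best_t:
                best_fn, best_t = cand, t
    return best_fn, best_t
```

Checking correctness before timing stops the loop from rewarding kernels that are fast only because they skip work.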
Autonomous code optimization loops using fitness functions inside Claude Code plugins.
The PSHUFB nibble trick beats a GCC-compiled lookup table by 4.8–6.3x on base64; the fuzzing methodology is reproducible.
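The "nibble trick" uses PSHUFB as a 16-entry table lookup indexed by the low or high four bits of each byte. Below is a scalar Python model of one widely used formulation for base64 decoding. Two nibble-indexed bitmask tables reject invalid characters. A third table, indexed by the high nibble, supplies the offset that maps each ASCII byte to its 6-bit value. The listing does not say which variant the project uses, or whether it targets encoding, decoding, or both. The table names and the `pshufb`, `decode_block`, and `decode` helpers are hypothetical, and '=' padding is not handled.

```python
from typing import List, Sequence

# 16-entry tables indexed by a nibble (0..15), as a PSHUFB table operand would hold them.
# LUT_LO / LUT_HI: class bitmasks; a byte is invalid iff LUT_LO[lo] & LUT_HI[hi] != 0.
LUT_LO = [0x15] + [0x11] * 9 + [0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A]
LUT_HI = [0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08] + [0x10] * 8
# LUT_ROLL: offset added to each ASCII byte to get its 6-bit value, indexed by high nibble.
# Index 1 is reserved for '/', which shares high nibble 2 with '+'.
LUT_ROLL = [0, 16, 19, 4, -65, -65, -71, -71] + [0] * 8


def pshufb(table: Sequence[int], indices: Sequence[int]) -> List[int]:
    """Scalar model of PSHUFB on one 16-byte lane: 0 if bit 7 is set, else table[idx & 0x0F]."""
    return [0 if i & 0x80 else table[i & 0x0F] for i in indices]


def decode_block(block: bytes) -> List[int]:
    """Map 16 base64 ASCII bytes to 16 sextets (values 0..63) using only nibble lookups."""
    hi = [b >> 4 for b in block]    # high nibbles
    lo = [b & 0x0F for b in block]  # low nibbles
    if any(l & h for l, h in zip(pshufb(LUT_LO, lo), pshufb(LUT_HI, hi))):
        raise ValueError("invalid base64 character")
    idx = [h - 1 if b == 0x2F else h for h, b in zip(hi, block)]  # '/' uses index 1
    return [(b + r) & 0xFF for b, r in zip(block, pshufb(LUT_ROLL, idx))]


def decode(text: str) -> bytes:
    """Decode unpadded base64 whose length is a multiple of 16 characters."""
    data = text.encode("ascii")
    if len(data) % 16:
        raise ValueError("this sketch only handles whole 16-character blocks")
    out = bytearray()
    for i in range(0, len(data), 16):
        s = decode_block(data[i:i + 16])
        for j in range(0, 16, 4):  # pack 4 sextets (24 bits) into 3 bytes
            v = (s[j] << 18) | (s[j + 1] << 12) | (s[j + 2] << 6) | s[j + 3]
            out += v.to_bytes(3, "big")
    return bytes(out)
```

Real SIMD code evaluates all 16 lookups per table in a single instruction. The list comprehensions here only model that data flow and say nothing about the reported speedup.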
Yet another prompt optimizer when DSPy and LangChain already exist.
Vision-based VM debugging loop lets AI fix kernel panics without text logs.