GitHub Repository

Winner 🏆 (Agent-only) of the MLSys 2026 FlashInfer AI Kernel Generation Contest, DeepSeek Sparse Attention (DSA) track, with an average speedup of 34.93x.

66 stars · Python

Auto GPU Kernel – Autonomous GPU kernel discovery and optimization

by dogacel · May 26, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Bold Bet

Autonomous kernel optimizer that won the MLSys 2026 FlashInfer AI Kernel Generation Contest (DSA track) with a 34.93x average speedup.

Strengths
  • Multi-agent architecture with specialized Profiler, Research, and Workload inspector agents.
  • The 34.93x average speedup is contest-validated and backed by public benchmark results and runtime tables.
  • Runs the optimization loop on Modal's cloud, so no local GPU is required for development.
Weaknesses
  • Tied to the Claude Code CLI, with no flexibility to swap in other models or agents.
  • Competition-specific setup may not generalize cleanly to arbitrary kernel optimization tasks.
Category
Target Audience

ML engineers, GPU kernel developers, researchers working on attention mechanisms

Similar To

Cursor · Triton GPT · AI-assisted kernel compilers

Similar Projects