GitHub Repository

Winner 🏆 (Agent-only) of the MLSys 2026 FlashInfer AI Kernel Generation Contest, DeepSeek Sparse Attention (DSA) track, with an average speedup of 34.93x

66 stars · Python

Auto GPU Kernel – Autonomous GPU-kernel discovery and optimizer

by dogacel · May 26, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Bold Bet

Autonomous kernel optimizer that won the MLSys 2026 FlashInfer AI Kernel Generation Contest (DSA track, agent-only) with a 34.93x average speedup.

Strengths

• Multi-agent architecture with specialized Profiler, Research, and Workload Inspector agents.
• The 34.93x speedup is contest-validated, with public benchmark results and runtime tables.
• Runs the optimization loop on Modal cloud, so no local GPU is required for development.
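
For readers wondering how a headline figure like "34.93x average speedup" relates to a runtime table, here is a minimal sketch. It assumes the average is an arithmetic mean of per-workload baseline/optimized runtime ratios; the contest may aggregate differently (for example with a geometric mean). The function names, workload names, and runtimes below are all hypothetical and are not taken from the project or the contest.

```python
from statistics import mean


def per_workload_speedups(baseline_ms, optimized_ms):
    """Return the baseline/optimized runtime ratio for each workload."""
    if baseline_ms.keys() != optimized_ms.keys():
        raise ValueError("baseline and optimized tables cover different workloads")
    return {name: baseline_ms[name] / optimized_ms[name] for name in baseline_ms}


def average_speedup(baseline_ms, optimized_ms):
    """Arithmetic mean of per-workload speedups (an assumed aggregate)."""
    return mean(per_workload_speedups(baseline_ms, optimized_ms).values())


# Hypothetical runtimes in milliseconds, for illustration only.
baseline = {"workload_a": 8.0, "workload_b": 3.0}
optimized = {"workload_a": 2.0, "workload_b": 1.5}

print(average_speedup(baseline, optimized))  # prints 3.0
```

With an arithmetic mean, one workload with a very large ratio can dominate the average, so the per-workload runtime tables are worth reading alongside the headline number.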

Weaknesses

• Tied to the Claude Code CLI, with no flexibility to swap in other models or agents.
• The competition-specific setup may not generalize cleanly to arbitrary kernel optimization tasks.