All the LM solutions on SWE-bench are bloated compared to humans

by lieret · Mar 4, 2026 · 1 point · 0 comments

AI Analysis

Pass

Twitter thread with a chart; not a product or tool.

Strengths
  • Clear visual comparison of patch bloat across multiple models.
  • Identifies a real pattern in LLM code generation (verbose output).
Weaknesses
  • No actionable tool, framework, or code provided; the analysis exists only as a Twitter thread.
  • No link to reproducible methodology or dataset beyond the thread.
Category
Target Audience

AI researchers, ML practitioners studying code generation models
