All the LM solutions on SWE-bench are bloated compared to humans

Author: lieret

by lieret · Mar 4, 2026 · 1 point · 0 comments

AI Analysis

○ Pass

Twitter thread with a chart; not a product or tool.

Strengths

Weaknesses

• No actionable tool, framework, or code provided; this is pure analysis posted to Twitter.
• No link to a reproducible methodology or dataset beyond the thread.

AI/ML ●●● Banger

Game-based AI benchmark measuring spatial reasoning against human speedrun records.

Big Brain · Niche Gem

ClassicRob

3027d ago

AI/ML ● Mid

Pre-registered methodology shows 17/50 wrong fixes ungated vs 0/50 gated, but the actual tool is private.

Big Brain

kolesnikov-arch

2114d ago

Transparent proxy cuts Codex context tokens by 87% via working memory.

Big Brain · Niche Gem

george_ciobanu

1022mo ago

AI/ML ●● Solid

Multilingual tokenization comparison across Arabic, Chinese, and French, which LangSmith ignores.

Big Brain · Niche Gem

lognebudo

104mo ago

AI/ML ●●● Banger

97% on SWE-bench Verified with full artifact transparency, not just a score claim.

Big Brain · Zero to One

kimjune01

201mo ago

AI/ML ●● Solid

Beats humans at pronunciation scoring but doesn't ship product integration yet.

Big Brain · Wizardry

fabiosuizu

1315mo ago