Codex context bloat? 87% avg reduction on SWE-bench Verified traces

Author: george_ciobanu

by george_ciobanu · Apr 24, 2026 · 10 points · 2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Transparent proxy cuts Codex context tokens by 87% via working memory.

Strengths

• Replay over 3,807 rounds of SWE-bench Verified traces cuts the average prompt from 44k to 6k tokens (-87%).
• Requires no code changes: the proxy sits between Codex and OpenAI.
• Open-source TypeScript implementation that includes a replay testing suite.

Weaknesses

• Tied specifically to Codex, so it is brittle if the upstream API changes.
• The working-memory logic is opaque, with a risk of dropping critical context.

Post Description

If you had to build a context window manager in 24h, would you stick to the existing model or come up with something better?

Here's what I did:

1. Built a proxy that intercepts Codex's calls to OpenAI and rewrites them on the fly.

2. Replayed 3,807 rounds of SWE-bench Verified traces through it: avg prompt 44k → 6k tokens (-87%).

3. Posted it to HN to get the next reduction applied to my confidence interval — starting with the inevitable "How about accuracy?"
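
The replay measurement in step 2 can be sketched as follows. This is a minimal illustration, not pando-proxy's code: the `Message` shape, the `keepRecent` rewriter, and the 4-characters-per-token estimate are all assumptions made for the example, and the project's actual working-memory logic is not described in the post. The sketch reports the reduction in average prompt size; the post does not say whether its 87% is computed this way or as a mean of per-round reductions (the rounded figures 44k and 6k alone give about 86%).

```typescript
// Hypothetical message shape; illustrative only, not taken from pando-proxy.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// A rewriter takes the messages of one request and returns a smaller set.
type Rewriter = (messages: Message[]) => Message[];

// Crude estimate (about 4 characters per token). A real replay would use
// the model's own tokenizer instead.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Placeholder rewriter: keep system messages plus the last `keep` others.
// This is NOT the project's working-memory logic, only a stand-in.
function keepRecent(keep: number): Rewriter {
  return (messages) => {
    const system = messages.filter((m) => m.role === "system");
    const rest = messages.filter((m) => m.role !== "system");
    return [...system, ...(keep > 0 ? rest.slice(-keep) : [])];
  };
}

interface ReplayStats {
  rounds: number;
  avgBefore: number;
  avgAfter: number;
  reductionPct: number;
}

// Replays recorded rounds through a rewriter and compares average prompt size.
function replay(rounds: Message[][], rewrite: Rewriter): ReplayStats {
  let before = 0;
  let after = 0;
  for (const round of rounds) {
    before += estimateTokens(round);
    after += estimateTokens(rewrite(round));
  }
  const n = rounds.length;
  return {
    rounds: n,
    avgBefore: n > 0 ? before / n : 0,
    avgAfter: n > 0 ? after / n : 0,
    // Reduction of the average prompt size, as a percentage.
    reductionPct: before > 0 ? (1 - after / before) * 100 : 0,
  };
}
```

Calling `replay(recordedRounds, keepRecent(4))` on an array of recorded requests returns the before and after averages together with the percentage reduction. Token savings alone say nothing about task accuracy, which is the question the post itself anticipates.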

npx -y pando-proxy · github.com/human-software-us/pando-proxy

Similar Projects

AI/ML · ●●● Banger

97% on SWE-bench Verified with subscription-token agents

97% on SWE-bench Verified with full artifact transparency, not just a score claim.

Big Brain · Zero to One

kimjune01

2019d ago

AI/ML · ○ Pass

All the LM solutions on SWE-bench are bloated compared to humans

Twitter thread with a chart; not a product or tool.

lieret

103mo ago

Developer Tools · ●● Solid

An open-source, local trace viewer for Claude Code and Codex sessions

Chrome DevTools for Claude Code sessions when LangSmith drops local tool calls.

Solve My Problem · Niche Gem

MediumD

101mo ago

Developer Tools · ● Mid

Salacia – The First Runtime OS for Agentic Coding

Fault-localization scaffolding for AI agents; claims 93% top-5 recall, but Cursor/Cline already integrate similar.

Big Brain · Bold Bet

alfredhua

203mo ago

AI/ML · ●● Solid

PrismoDev – local CLI for finding token waste in Claude Code/Codex

Finally, a tool that tells you why your AI coding bill is exploding.

Solve My Problem · Big Brain

shanirshad

2124d ago

Developer Tools · ●● Solid

We achieved 72.2% issue resolution on SWE-bench Verified using AI teams

They split responsibilities across isolated agents (engineer, reviewer, manager) that get real shell access and independent filesystems, which makes failures traceable and lets you tune model capacity per role. Hitting 72.2% on SWE-bench Verified with no benchmark-specific tuning is an impressive empirical result — interesting architecture and strong evidence — though the security and long-term reliability of autonomous shell-executing agents remain the big open questions.

Wizardry · Big Brain

NBenkovich

204mo ago