Stop Losing LangGraph Progress to 429 Errors
Races providers, coordinates retries, resumes workflows—turns 429 crashes into idempotent recovery.
Catch common retry anti-patterns (429, infinite retries, missing Retry-After) before they hit production.
Found real retry bugs in OpenClaw's 323k-star repo that ignored Retry-After headers.
AI agent developers, API integration engineers, backend developers
eslint-plugin-import · SonarQube · Semgrep
Looks fine at first. Under load it turns rate limits into request storms.
I wrote a small CLI to catch it:
npx pitstop-check ./src
It scans TS/JS and flags things like:
- 429 handled without Retry-After
- blanket retry of all 429s (no CAP vs WAIT distinction)
- unbounded retry loops (no max elapsed)
Example (ran against OpenClaw):
[WARN] src/agents/venice-models.ts:24 — 429 handled without Retry-After
[WARN] src/agents/venice-models.ts:24 — All 429s treated as retryable — CAP vs WAIT not distinguished
The retry primitive supports Retry-After. The callers just don’t wire it up.
So when the API returns Retry-After: 600, the client retries on its own schedule instead of backing off.
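As a rough sketch of what "wiring it up" could look like: the helper below turns a Retry-After value into a delay in milliseconds. The name parseRetryAfterMs and its signature are made up for illustration; this is not code from OpenClaw or from pitstop-check. It handles the two forms the header can take, a number of seconds or an HTTP date.

```typescript
// Hypothetical helper (not from OpenClaw or pitstop-check).
// Converts a Retry-After header value into a delay in milliseconds.
// Returns null when the header is missing or cannot be parsed.
function parseRetryAfterMs(
  value: string | null,
  nowMs: number = Date.now(),
): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (trimmed === "") return null;

  // Form 1: delay in seconds, a non-negative integer such as "600".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;

  // A date in the past means "retry now", so clamp at zero.
  return Math.max(0, dateMs - nowMs);
}
```

With fetch, the caller would pass response.headers.get("retry-after") and sleep for the returned delay instead of its own backoff schedule. For Retry-After: 600 that is a 600,000 ms wait.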
What’s going on is basically collapsing different failure modes into one:
- WAIT: respect Retry-After
- CAP: limit retries / concurrency
- STOP: don’t retry
Most code just does: retry()
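Here is a minimal sketch of what a caller could do instead of a bare retry(). Everything in it is an assumption for illustration: the names decide429, RetryState and RetryPolicy are hypothetical, and the mapping rules (Retry-After present means WAIT, absent means CAP with bounded backoff, budget exhausted means STOP) are one plausible reading of the three modes, not the tool's actual logic.

```typescript
// Hypothetical types and names, for illustration only.
type Decision =
  | { kind: "WAIT"; delayMs: number } // server told us how long to wait
  | { kind: "CAP"; delayMs: number }  // no hint: bounded exponential backoff
  | { kind: "STOP"; reason: string }; // give up instead of retrying

interface RetryState {
  attempt: number;   // retries already made
  elapsedMs: number; // time spent on this request so far
}

interface RetryPolicy {
  maxAttempts: number;
  maxElapsedMs: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Decide how to react to a 429. retryAfterMs is the parsed Retry-After
// delay, or null when the header was missing or unparseable.
function decide429(
  retryAfterMs: number | null,
  state: RetryState,
  policy: RetryPolicy,
): Decision {
  if (state.attempt >= policy.maxAttempts) {
    return { kind: "STOP", reason: "max attempts reached" };
  }

  if (retryAfterMs !== null) {
    // Assumption: a wait that blows the time budget is treated as STOP.
    if (state.elapsedMs + retryAfterMs > policy.maxElapsedMs) {
      return { kind: "STOP", reason: "Retry-After exceeds time budget" };
    }
    return { kind: "WAIT", delayMs: retryAfterMs };
  }

  // No Retry-After: exponential backoff, capped per attempt and in total.
  const delayMs = Math.min(
    policy.maxDelayMs,
    policy.baseDelayMs * 2 ** state.attempt,
  );
  if (state.elapsedMs + delayMs > policy.maxElapsedMs) {
    return { kind: "STOP", reason: "time budget exhausted" };
  }
  return { kind: "CAP", delayMs };
}
```

With a 60-second budget, Retry-After: 600 comes back as STOP rather than a ten-minute sleep or an immediate retry, and a 429 with no header gets a capped backoff instead of an unbounded loop. Limiting concurrency, the other half of CAP, is left out of this sketch.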
The tool is heuristic (will flag some test files), but it’s been useful for quickly spotting this in real repos.