Sift, a small CLI that groups noisy test failures into root causes
Compresses 198k tokens to 129 by grouping test failures before the agent sees them.
Turn noisy command output into a short, actionable first pass for coding agents.
Heuristic-first parsing cuts 198k tokens to 129 before the LLM ever sees the output.
Developers using AI coding agents for debugging
pytest-rich · pytest-sugar
A test run fails, you get a huge wall of output, and most of the effort goes into figuring out what actually went wrong.
In many cases, the failures are not independent: they’re the same issue repeated across many tests.
In one case: 128 failures → 2 root causes
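One way repeated failures can collapse like this is to reduce each error message to a signature by masking the parts that vary between tests, such as numbers, quoted values and memory addresses. The sketch below assumes that approach. `failure_signature`, its masking rules and the sample messages are all hypothetical and not taken from Sift.

```python
import re

# Hypothetical masking rules, applied in order. Each pattern is replaced by a
# placeholder so messages that differ only in volatile details share a signature.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<addr>"),      # memory addresses
    (re.compile(r"'[^']*'|\"[^\"]*\""), "<str>"),   # quoted values
    (re.compile(r"\d+"), "<n>"),                    # ids, ports, durations
]


def failure_signature(message: str) -> str:
    """Return a normalized signature for one failure message."""
    sig = message.strip()
    for pattern, placeholder in _MASKS:
        sig = pattern.sub(placeholder, sig)
    return sig


if __name__ == "__main__":
    # Made-up messages: four failures that come from two underlying problems.
    messages = [
        "TimeoutError: GET /api/users/17 timed out after 30s",
        "TimeoutError: GET /api/users/42 timed out after 30s",
        "ValueError: <Session object at 0x7f3a2c1d9e80> is closed",
        "ValueError: <Session object at 0x7f3a2c1e0b50> is closed",
    ]
    distinct = {failure_signature(m) for m in messages}
    print(len(messages), "failures ->", len(distinct), "signatures")  # prints: 4 failures -> 2 signatures
```

Masking quoted values is the bluntest rule here: two different missing keys would both become `KeyError: <str>`, so how aggressively to normalize is a tuning decision rather than a fixed recipe.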
I built a small CLI that groups repeated failures into shared root causes before passing the result to the model.
It’s mainly built for coding agents, but works on raw CLI output as well.
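The grouping step can be sketched in a few dozen lines of Python. This is a hypothetical illustration, not Sift's code: it assumes the input is pytest's short test summary (`FAILED path::test - message` lines), groups failures by a crude signature that only masks digits, and renders one line per group so an agent sees the count, the shared message and one example test instead of every traceback.

```python
import re
from collections import defaultdict

# Matches pytest "short test summary info" lines such as:
#   FAILED tests/test_users.py::test_get_user - TimeoutError: ...
# The " - message" suffix is optional because not every line carries one.
SUMMARY_LINE = re.compile(r"^FAILED (?P<test>\S+)(?: - (?P<message>.*))?$")


def signature(message: str) -> str:
    """Hypothetical, deliberately crude signature: mask every run of digits."""
    return re.sub(r"\d+", "<n>", message)


def group_failures(output: str) -> dict[str, list[str]]:
    """Map each failure signature to the test ids that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:  # non-summary lines (headers, tracebacks) are ignored
            message = match.group("message") or "<no message>"
            groups[signature(message)].append(match.group("test"))
    return dict(groups)


def render(groups: dict[str, list[str]]) -> str:
    """One line per group, largest first, with a single example test."""
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return "\n".join(
        f"{len(tests)} x {sig}  (e.g. {tests[0]})" for sig, tests in ranked
    )


if __name__ == "__main__":
    # Made-up sample output for illustration.
    sample = """\
=========================== short test summary info ============================
FAILED tests/test_users.py::test_get_user - TimeoutError: GET /api/users/17 timed out after 30s
FAILED tests/test_users.py::test_list_users - TimeoutError: GET /api/users/42 timed out after 30s
FAILED tests/test_orders.py::test_create_order - KeyError: 'tenant_id'
========================= 3 failed, 12 passed in 4.21s =========================
"""
    print(render(group_failures(sample)))
```

To try it on a real run, capture the output (for example `subprocess.run(["pytest", "-rf"], capture_output=True, text=True).stdout`) and pass that string to `group_failures`. Matching only summary lines keeps the sketch small; handling arbitrary CLI output, as the post describes, would need parsers for other formats as well.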
On my backend tests, this reduced debugging time and token usage quite a bit.
198k tokens down to 129: local heuristics beat LLM summarization.