Sift, a small CLI that groups noisy test failures into root causes
Compresses 198k tokens to 129 by grouping test failures before the agent sees them.

Auto-clusters failure patterns across sessions and suggests prompt patches.
AI engineers, LLM application developers
Langfuse · Arize Phoenix · Logfire
AI agents don't crash. They just quietly give wrong answers. You end up scrolling through traces one by one, trying to find a pattern across hundreds of sessions.
Kelet automates that investigation. Here's how it works:
1. You connect your traces and signals (user feedback, edits, clicks, sentiment, LLM-as-a-judge, etc.)
2. Kelet processes those signals and extracts facts about each session
3. It forms hypotheses about what went wrong in each case
4. It clusters similar hypotheses across sessions and investigates them together
5. It surfaces a root cause with a suggested fix you can review and apply
The key insight: individual session failures look random. But when you cluster the hypotheses, failure patterns emerge.
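To make the clustering step concrete, here is a minimal sketch in Python. It is not Kelet's implementation or SDK: the `Hypothesis` class, the word-overlap similarity, and the 0.5 threshold are all assumptions chosen for illustration, and a real system would more likely compare embeddings than raw words. It only shows how per-session hypotheses that look unrelated on their own can collapse into a few groups.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical record: one guess about what went wrong in one session."""
    session_id: str
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real embedding."""
    words = {w.strip(".,:;!?()\"'").lower() for w in text.split()}
    return words - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two sets have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_hypotheses(hyps: list[Hypothesis], threshold: float = 0.5) -> list[list[Hypothesis]]:
    """Greedy single pass: join the first cluster whose seed is similar
    enough, otherwise start a new cluster. Largest clusters come first."""
    clusters: list[tuple[set[str], list[Hypothesis]]] = []
    for h in hyps:
        t = tokens(h.text)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(h)
                break
        else:
            # No existing cluster matched, so this hypothesis seeds a new one.
            clusters.append((t, [h]))
    return sorted((members for _, members in clusters), key=len, reverse=True)


# Made-up example data, not real traces.
hyps = [
    Hypothesis("s1", "retrieval returned stale pricing document"),
    Hypothesis("s2", "retrieval returned stale pricing page"),
    Hypothesis("s3", "model ignored the user's date format"),
    Hypothesis("s4", "Retrieval returned a stale pricing document"),
]
clusters = cluster_hypotheses(hyps)
for group in clusters:
    print(len(group), [h.session_id for h in group])
```

In this toy data, three of the four sessions land in the same group, which is the kind of pattern that is hard to spot when reading traces one at a time. The greedy pass depends on input order and on the threshold, so treat it as a way to picture the idea rather than a recommended algorithm.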
The fastest way to integrate is through the Kelet Skill for coding agents — it scans your codebase, discovers where signals should be collected, and sets everything up for you. There are also Python and TypeScript SDKs if you prefer manual setup.
It’s currently free during beta. No credit card required. Docs: https://kelet.ai/docs/
I'd love feedback on the approach, especially from anyone running agents in prod. Does automating this manual error analysis sound like the right direction?
Heuristic-first parsing cuts 198K tokens to 129 before the LLM ever sees output.
Kubernetes root cause via dependency graphs, but kubectl debug and observability tools already solve this.
Toyota factory discipline for runaway LLM agents—stops bad deploys, learns from failures.
198k tokens down to 129 — local heuristics beat LLM summarization.
Multi-cloud diagnosis in <30s, but infra observability (Datadog, New Relic) already solves this better.