Agent-triage – diagnosis of agent failures from production traces
Replays agent traces step-by-step to pinpoint exact failure turns automatically.
Yet another CI debugger when act and devcontainers already solve this.
Backend developers, DevOps engineers debugging CI/CD pipelines
act · nektos/act · devcontainers
Turn any CI failure into a replayable artifact. Works on Linux, macOS, and Termux (Android).
- Portable .cit format
- Honest runtime validation
- Hermetic replay semantics
- No hidden environment mutation
Install: `curl -fsSL https://github.com/hknzer/citadeld/releases/download/v1.0.0/... -o ~/.local/bin/citadeld && chmod +x ~/.local/bin/citadeld`
Git for agent cognition—clever framework, but no working implementation yet.
Turns failing agent runs into a self-contained, inspectable package: report.html for human review and compare-report.json for automatic CI decisions. The evidence manifest + integrity checks and the option to apply redaction before artifacts are written are smart, practical details that make offline handoff and automated gating actually usable for teams building agents.
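The evidence manifest and integrity checks mentioned above can be illustrated with a minimal sketch. It is not the project's actual implementation: the `manifest.json` name, its `{"files": {...}}` layout, and the helper names `write_manifest` and `verify_manifest` are all assumptions made for illustration. Only `report.html` and `compare-report.json` come from the description.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(bundle_dir: Path, manifest_name: str = "manifest.json") -> Path:
    """Record a digest for every file in the bundle except the manifest itself.

    The manifest name and layout are hypothetical, not taken from the tool.
    """
    manifest_path = bundle_dir / manifest_name
    entries = {
        p.relative_to(bundle_dir).as_posix(): sha256_of(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file() and p != manifest_path
    }
    manifest_path.write_text(json.dumps({"files": entries}, indent=2, sort_keys=True))
    return manifest_path


def verify_manifest(bundle_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return relative paths that are missing or whose digest no longer matches."""
    manifest = json.loads((bundle_dir / manifest_name).read_text())
    problems = []
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A CI gate could call `verify_manifest` on a received bundle and refuse to act on `compare-report.json` unless the returned list is empty.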
Compresses 198k tokens to 129 by grouping test failures before the agent sees them.
Heuristic-first parsing cuts 198K tokens to 129 before the LLM ever sees output.
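Grouping failures before a model sees them can be sketched as follows. This is an assumption-laden illustration, not the tool's parser: the pytest-style `FAILED <test> - <message>` line format, the regex, and the function names are hypothetical, and the sketch makes no attempt to reproduce the 198K-to-129 figure.

```python
import re
from collections import defaultdict

# Hypothetical input format: pytest-style summary lines, "FAILED <test id> - <message>".
FAILURE_RE = re.compile(r"^FAILED (?P<test>\S+) - (?P<message>.+)$")


def normalize(message: str) -> str:
    """Replace volatile details (hex addresses, numbers) so similar errors share a key."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)
    return re.sub(r"\d+", "<n>", message)


def group_failures(log: str) -> dict[str, list[str]]:
    """Map each normalized error message to the tests that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in log.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            groups[normalize(match["message"])].append(match["test"])
    return dict(groups)


def summarize(groups: dict[str, list[str]]) -> str:
    """Emit one line per group, largest group first, with one example test each."""
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return "\n".join(
        f"{len(tests)}x {message} (e.g. {tests[0]})" for message, tests in ordered
    )
```

The summary grows with the number of distinct failure shapes, not with the number of failing tests, which is where the token savings would come from when many tests fail for the same reason.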
Deterministic capture + replay for LLM agents is a practical, under-served problem, and this repo actually ships a 'golden run' zip with cold-run verification hashes, which is the kind of evidence chain auditors want. The focus on portable evidence bundles and stress verification suggests useful forensics and load testing of agent logic, but the release page looks early-stage. I'd like to see integrations (tooling for popular agent frameworks), richer docs, and example pipelines before I'd evangelize it.
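One way to picture verifying a replay against a golden run is to hash each turn and report the first index where the two runs disagree. The sketch below assumes turns are JSON-serializable dictionaries. The function names and the turn schema are hypothetical and do not describe the repo's actual format.

```python
import hashlib
import json
from typing import Optional


def turn_digest(turn: dict) -> str:
    """Hash a turn's canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(turn, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def first_divergence(golden: list[dict], replay: list[dict]) -> Optional[int]:
    """Return the index of the first turn that differs, or None if the runs match."""
    for index, (expected, actual) in enumerate(zip(golden, replay)):
        if turn_digest(expected) != turn_digest(actual):
            return index
    # All shared turns match; a length mismatch diverges where the shorter run ends.
    if len(golden) != len(replay):
        return min(len(golden), len(replay))
    return None
```

Storing only the per-turn digests from the golden run would be enough for this check, so a bundle could be verified without shipping the full recorded content.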