Time Machine – Debug AI Agents by Forking and Replaying from Any Step
Fork from step 8 and replay downstream — saves money when agents fail at step 9.
Orchid - Orchestration interactive debugger - Record, inspect, & replay AI agents
Deterministic replay of agent runs without mocking—that's genuinely new.
AI/ML developers, agent builders
LangSmith · Arize · Helicone
I built it because I was tired of debugging agent failures by grepping through logs, and the available AI observability tools all seemed to require intrusive instrumentation and/or sending my prompts and responses to a cloud service. I wanted something that would let me debug agent runs locally, without having to worry about vendor lock-in or data privacy.
Orchid is that tool. The call inspection features work extremely well, at least for my use cases, but the replay feature is perhaps more interesting. It makes LLM pipeline testing deterministic without mocking or re-running expensive API calls.
Free, self-hosted, runs on your machine or infrastructure: https://github.com/mario-guerra/orchid-trace
Would love feedback from anyone building multi-step agentic systems or struggling with non-deterministic LLM test failures.
Fork from step 8 and replay downstream — saves money when agents fail at step 9.
Catches silent MCP breakage VCR.py never could—schema drift detection.
Recording what an agent considered — not just what it executed — is a tidy, concrete insight. GhostTrace already gives record/replay commands, a .ghost.json schema and a --show-phantoms terminal replay so you can inspect rejected actions and the agent's reasoning. The thing that will decide if this takes off is integrations (LangChain/OpenAI Agents/CrewAI) and the promised web/VS Code UIs; without those it's a very useful niche tool, not yet a platform.
Flight recorder for AI agents: record, replay, enforce policies on every LLM call.
Turns an agent run into a verifiable .epi bundle you can hand to auditors or replay locally for debugging. Concrete engineering choices stand out — crash-safe SQLite WAL storage, Ed25519 sealing, and an embedded viewer — though wider integrations (Kubernetes/CICD hooks, verifier tooling) and stronger ecosystem docs will be needed for real adoption.
Record production Python bugs and step backwards from crash to cause in VS Code.