Gonfire – analyze Claude Code session logs to see how candidates think
Yet another coding assessment platform, but this one parses AI agent logs.

Diagnoses agent failure modes from existing logs without spending new API credits.
AI engineers and prompt developers debugging LLM agents
LangSmith · Arize Phoenix · Braintrust
Agent writes its own Python tools and saves rules to avoid repeating mistakes.
Scores agent logs on verifiability, separating internal traces from portable evidence.
Finally, an MCP server that uses your actual cookies instead of spawning headless browsers.
Controls your actual logged-in browser session instead of spinning up headless instances.
Landing page returns a 403 error; can't verify that the paper or any implementation exists.