Gonfire – analyze Claude Code session logs to see how candidates think

Name: Gonfire – analyze Claude Code session logs to see how candidates think
Availability: InStock
Author: abr0ahm

by abr0ahm·May 17, 2026·1 point·0 comments

Visit Project View on HN

AI Analysis

●MidSolve My Problem

Yet another coding assessment platform, but this one parses AI agent logs.

Strengths

•Targets the emerging 'AI Engineer' interview format specifically.
•Focuses on process analysis rather than just final code output.

Weaknesses

•Requires candidates to use specific tooling (Claude Code) which limits adoption.
•No clear differentiation from existing take-home assessment platforms like HackerRank.

Post Description

When I graduated from a CS program in 2020, leetcode was basically a SWE entrance exam. Your ability to solve a coding puzzle thrown at you on the spot determined your fate.

Recently, I’ve interviewed for a handful of “AI Engineer” positions at several startups and I noticed a shift in the format of technical assessments. Timed OAs and live leetcoding have been replaced with a “case study” format where AI use is encouraged. These were the two main patterns I saw:

1. Take home: Candidate clones a github repo or receives a zip file with starter code and README. They complete the assignment according to the instructions using any tools or resources that they would like, the final code gets pushed up to a github repo and the user submits a link to the repo. The hiring team evaluates the submission.

2. Live assessment: Candidate is live on a call with an interviewer with screenshare. Candidate clones a github repo or receives a zip file with starter code and README instructions. The interviewer observes the candidate think out loud to assess how they solve the problem using AI.

Both of these formats still seem sub-optimal. Reviewing a submitted take-home solution involves the HM sifting through a codebase that is entirely AI generated and reveals little about the candidate’s thought process or problem solving ability. Live “vibe” assessment takes a whole hour of time from the interviewer (which was often the CTO) per candidate.

Moreover they are throwing away the most valuable piece of info: the claude code session log.

I built Gonfire, which consists of a proxy which records and analyzes a candidate’s claude code interactions while solving the assessment and displays a digestible report to a hiring manager. *I’ve refrained from deriving any quantitative metrics of performance until I feel confident that there is a solid basis for any such metric, so the analysis is primarily qualitative for now.

I took an assessment myself, you can view my results in the demo.

Live demo: https://app.gonfire.io ([email protected] / Aa123123123123)

Relevant post from Anthropic: <https://www.anthropic.com/engineering/AI-resistant-technical...>

This could allow for some interesting directions in the future:

- “Anti-Spoiler” - Prevent LLMs from spoiling key problem insights/ideation

- Clustering candidates based on distinguishing features of their thinking process

Similar Projects

Developer Tools●●Solid

Tokenscope – see what your Claude Code session cost

Shows cache-read costs eating 66% of your Claude Code bill when dashboards only show totals.

Solve My ProblemBig Brain

wartzarbee

4017d ago

Developer Tools●●●Banger

We analyzed 1,573 Claude Code sessions to see how AI agents work

First analytics layer for Claude Code revealing 26% session abandonment rate.

Zero to OneDark Horse

keks0r

144863mo ago

Developer Tools●Mid

Argus – VSCode debugger for Claude Code sessions

Analyzes Claude Code sessions, but mostly visualizes what Claude already logs.

lydionfinance

34143mo ago

Developer Tools●●Solid

Recover bricked Claude Code sessions with "thinking blocks" error

The repo does one painful thing well: it inspects SESSION_ID.jsonl, pinpoints where streaming/interleaving corrupted 'thinking' blocks, and offers diagnose, fix (with auto-backup), and nuke fallbacks. It's pragmatic and ruthless — small CLI commands, clear recovery workflow, and rescue-first guidance — but it's narrowly targeted to Claude Code users and won't matter to most people.

Niche GemSolve My Problem

miteshashar

104mo ago

Developer Tools●●●Banger

Claud-ometer – See your Claude Code usage, costs, and sessions locally

Claude Code usage dashboard reading local files—fills exact gap Anthropic didn't address.

Solve My ProblemSlickDark Horse

deshraj

103mo ago

Developer Tools●●Solid

Claude-replay – A video-like player for Claude Code sessions

Self-contained HTML replays beat screenshots, but solves niche problem for Claude Code users.

Solve My ProblemShip It

es617

105363mo ago