Back to browse
Gonfire – analyze Claude Code session logs to see how candidates think

Gonfire – analyze Claude Code session logs to see how candidates think

by abr0ahm·May 17, 2026·1 point·0 comments

AI Analysis

MidSolve My Problem

Yet another coding assessment platform, but this one parses AI agent logs.

Strengths
  • Targets the emerging 'AI Engineer' interview format specifically.
  • Focuses on process analysis rather than just final code output.
Weaknesses
  • Requires candidates to use specific tooling (Claude Code) which limits adoption.
  • No clear differentiation from existing take-home assessment platforms like HackerRank.
Target Audience

Hiring managers and technical interviewers at startups

Similar To

HackerRank · CodeSignal · TakeHome.io

Post Description

When I graduated from a CS program in 2020, leetcode was basically a SWE entrance exam. Your ability to solve a coding puzzle thrown at you on the spot determined your fate.

Recently, I’ve interviewed for a handful of “AI Engineer” positions at several startups and I noticed a shift in the format of technical assessments. Timed OAs and live leetcoding have been replaced with a “case study” format where AI use is encouraged. These were the two main patterns I saw:

1. Take home: Candidate clones a github repo or receives a zip file with starter code and README. They complete the assignment according to the instructions using any tools or resources that they would like, the final code gets pushed up to a github repo and the user submits a link to the repo. The hiring team evaluates the submission.

2. Live assessment: Candidate is live on a call with an interviewer with screenshare. Candidate clones a github repo or receives a zip file with starter code and README instructions. The interviewer observes the candidate think out loud to assess how they solve the problem using AI.

Both of these formats still seem sub-optimal. Reviewing a submitted take-home solution involves the HM sifting through a codebase that is entirely AI generated and reveals little about the candidate’s thought process or problem solving ability. Live “vibe” assessment takes a whole hour of time from the interviewer (which was often the CTO) per candidate.

Moreover they are throwing away the most valuable piece of info: the claude code session log.

I built Gonfire, which consists of a proxy which records and analyzes a candidate’s claude code interactions while solving the assessment and displays a digestible report to a hiring manager. *I’ve refrained from deriving any quantitative metrics of performance until I feel confident that there is a solid basis for any such metric, so the analysis is primarily qualitative for now.

I took an assessment myself, you can view my results in the demo.

Live demo: https://app.gonfire.io ([email protected] / Aa123123123123)

Relevant post from Anthropic: <https://www.anthropic.com/engineering/AI-resistant-technical...>

This could allow for some interesting directions in the future:

- “Anti-Spoiler” - Prevent LLMs from spoiling key problem insights/ideation

- Clustering candidates based on distinguishing features of their thinking process

Similar Projects

Developer Tools●●Solid

Recover bricked Claude Code sessions with "thinking blocks" error

The repo does one painful thing well: it inspects SESSION_ID.jsonl, pinpoints where streaming/interleaving corrupted 'thinking' blocks, and offers diagnose, fix (with auto-backup), and nuke fallbacks. It's pragmatic and ruthless — small CLI commands, clear recovery workflow, and rescue-first guidance — but it's narrowly targeted to Claude Code users and won't matter to most people.

Niche GemSolve My Problem
miteshashar
104mo ago