Tokenscope – see what your Claude Code session cost
Shows cache-read costs eating 66% of your Claude Code bill when dashboards only show totals.

Yet another coding assessment platform, but this one parses AI agent logs.
Hiring managers and technical interviewers at startups
HackerRank · CodeSignal · TakeHome.io
Recently, I’ve interviewed for a handful of “AI Engineer” positions at several startups and I noticed a shift in the format of technical assessments. Timed OAs and live leetcoding have been replaced with a “case study” format where AI use is encouraged. These were the two main patterns I saw:
1. Take home: Candidate clones a github repo or receives a zip file with starter code and README. They complete the assignment according to the instructions using any tools or resources that they would like, the final code gets pushed up to a github repo and the user submits a link to the repo. The hiring team evaluates the submission.
2. Live assessment: Candidate is live on a call with an interviewer with screenshare. Candidate clones a github repo or receives a zip file with starter code and README instructions. The interviewer observes the candidate think out loud to assess how they solve the problem using AI.
Both of these formats still seem sub-optimal. Reviewing a submitted take-home solution involves the HM sifting through a codebase that is entirely AI generated and reveals little about the candidate’s thought process or problem solving ability. Live “vibe” assessment takes a whole hour of time from the interviewer (which was often the CTO) per candidate.
Moreover they are throwing away the most valuable piece of info: the claude code session log.
I built Gonfire, which consists of a proxy which records and analyzes a candidate’s claude code interactions while solving the assessment and displays a digestible report to a hiring manager. *I’ve refrained from deriving any quantitative metrics of performance until I feel confident that there is a solid basis for any such metric, so the analysis is primarily qualitative for now.
I took an assessment myself, you can view my results in the demo.
Live demo: https://app.gonfire.io ([email protected] / Aa123123123123)
Relevant post from Anthropic: <https://www.anthropic.com/engineering/AI-resistant-technical...>
This could allow for some interesting directions in the future:
- “Anti-Spoiler” - Prevent LLMs from spoiling key problem insights/ideation
- Clustering candidates based on distinguishing features of their thinking process
Shows cache-read costs eating 66% of your Claude Code bill when dashboards only show totals.
First analytics layer for Claude Code revealing 26% session abandonment rate.
Analyzes Claude Code sessions, but mostly visualizes what Claude already logs.
The repo does one painful thing well: it inspects SESSION_ID.jsonl, pinpoints where streaming/interleaving corrupted 'thinking' blocks, and offers diagnose, fix (with auto-backup), and nuke fallbacks. It's pragmatic and ruthless — small CLI commands, clear recovery workflow, and rescue-first guidance — but it's narrowly targeted to Claude Code users and won't matter to most people.
Claude Code usage dashboard reading local files—fills exact gap Anthropic didn't address.
Self-contained HTML replays beat screenshots, but solves niche problem for Claude Code users.