Is grep enough? A transparent benchmark for agentic code navigation

Author: bonigv

by bonigv · Jun 30, 2026 · 2 points · 2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Grep wins on correctness, but tree-sitter cuts tokens by 50% on hard tasks.

Strengths

• 150 blind-judged runs across 10 real codebases with full transparency
• Concrete token counts: 395k structural vs 780k baseline for the same answers
• All scripts, Docker images, and transcripts publicly available
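
As a quick check on the headline figure, the two token counts listed here work out to a reduction of roughly 49%, which rounds to the quoted 50%. A minimal sketch of the arithmetic, using only the two totals from the list (the function and variable names are illustrative, not taken from the project):

```python
def token_reduction(baseline_tokens: int, structural_tokens: int) -> float:
    """Return the fractional token saving of the structural run vs. the baseline."""
    return (baseline_tokens - structural_tokens) / baseline_tokens


# Totals quoted in the Strengths list (395k structural vs 780k baseline).
BASELINE_TOKENS = 780_000
STRUCTURAL_TOKENS = 395_000

reduction = token_reduction(BASELINE_TOKENS, STRUCTURAL_TOKENS)
print(f"{reduction:.1%}")  # prints 49.4%
```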

Weaknesses

• n=1 per cell with no significance testing limits statistical confidence
• Research findings, not a tool you can directly use in your workflow

Post Description

LSP servers felt too complex, and Bash tools alone too brutish. I wanted to see what happens when tree-sitter is a first-class tool. I ran a benchmark over 10 large codebases [bitcoin, django, rails, redis,...] at 5 levels of exploration complexity each. That's 150 context-isolated runs over the last few days. I'm sharing the results with full transparency: all scripts, Docker image scripts, and all transcripts. There is a TL;DR, but I hope you don't leave it at that. It has been quite a bit of work. Repo links are on the site.
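
The run count can be sanity-checked against the grid described in the post: 10 codebases at 5 complexity levels give 50 codebase-level pairs, so 150 runs means 3 runs per pair. Reading those 3 runs as three tool configurations being compared is an inference from the numbers, not something the post states. A small sketch of the check (constant names are illustrative):

```python
# Figures taken from the post description.
CODEBASES = 10
LEVELS = 5
TOTAL_RUNS = 150

# Number of (codebase, level) pairs in the grid.
pairs = CODEBASES * LEVELS

# How many runs each pair received, and whether the total divides evenly.
runs_per_pair, remainder = divmod(TOTAL_RUNS, pairs)
print(pairs, runs_per_pair, remainder)  # prints 50 3 0
```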