Is grep enough? A transparent benchmark for agentic code navigation

by bonigv·Jun 30, 2026·2 points·2 comments

AI Analysis

Solid · Big Brain · Niche Gem

Grep wins on correctness, but tree-sitter cuts token usage roughly in half on hard tasks.

Strengths
  • 150 blind-judged runs across 10 real codebases with full transparency
  • Concrete token counts: 395k structural vs 780k baseline for same answers
  • All scripts, Docker images, and transcripts publicly available
Weaknesses
  • n=1 per cell with no significance testing limits statistical confidence
  • Research findings, not a tool you can directly use in your workflow
Target Audience

AI agent developers, tooling engineers

Post Description

LSP servers felt too complex, and Bash tools alone too brutish. I wanted to see what happens when tree-sitter is a first-class tool. I ran a benchmark over 10 large codebases [bitcoin, django, rails, redis,...] at 5 levels of exploration complexity each. That's 150 context-isolated runs over the last few days. I'm sharing the results with full transparency: all scripts, Docker image scripts, and all transcripts. There is a TL;DR, but I hope you don't leave it at that. It has been quite a bit of work. Repo links are on the site.
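As a quick sanity check on the headline numbers, here is a minimal sketch that recomputes the token saving from the two totals quoted in the analysis (395k structural vs 780k baseline) and divides the 150 runs across the 10 codebases and 5 complexity levels. The function and constant names are illustrative only, and reading the resulting 3 runs per codebase-level pair as "one run per tool configuration" is an assumption, not something the post states.

```python
def reduction_pct(baseline_tokens: int, structural_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100.0 * (baseline_tokens - structural_tokens) / baseline_tokens


# Figures quoted in the listing above.
BASELINE_TOKENS = 780_000
STRUCTURAL_TOKENS = 395_000
CODEBASES = 10
LEVELS = 5
TOTAL_RUNS = 150

saving = reduction_pct(BASELINE_TOKENS, STRUCTURAL_TOKENS)

# Runs per (codebase, level) pair. Assumption: each of these is a
# separate tool configuration, which would match "n=1 per cell".
runs_per_pair = TOTAL_RUNS // (CODEBASES * LEVELS)

print(f"token saving: {saving:.1f}%")
print(f"runs per codebase-level pair: {runs_per_pair}")
```

The saving comes out just under 50%, which is consistent with the summary line, and the run count divides evenly into the 10 × 5 grid.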
