GitHub Repository

Smart(er) code reading for humans and AI agents. Reduces cost per correct answer by ~40% on average. Install: cargo install tilth -or- npx tilth

313 stars · Rust

Tilth v0.3 – 17% cheaper AI code navigation (279 runs, 3 Claude models)


by jahala · Feb 14, 2026 · 3 points · 0 comments


AI Analysis


A tree-sitter-based MCP server cuts the cost of Claude code-navigation tasks by 14–82% depending on the model (17% overall) while improving accuracy.

Strengths

• Rigorous benchmarking across 4 real repos and 3 Claude models, measuring cost per correct answer, not just token savings.
• Smart deduplication (session memory, shared expand budget) minimizes redundant token spend across multi-turn agent queries.
• Structural code intelligence (definitions vs. usages, callee resolution, outline folding) beats text search for AI reasoning.

Weaknesses

• Tree-sitter language support is limited, and tool adoption is model-specific: Haiku adopted the MCP tools only 9% of the time, and instruction tuning did not help.
• Early-stage adoption friction; the value is locked behind agent integration and is not immediately useful to individual developers.

Post Description

tilth gives AI agents structural code intelligence (tree-sitter definitions, callee resolution, smart outlining) via MCP. I benchmarked it on 21 code navigation tasks across 4 real repos (Express, FastAPI, Gin, ripgrep).

-> https://github.com/jahala/tilth

Results: Sonnet 4.5 — 26% cheaper per correct answer (79% → 86% accuracy). Opus 4.6 — 14% cheaper (and the only model+mode combo to crack the hardest task). Haiku 4.5 — 82% cheaper when forced to use tilth (69% → 100% accuracy at $0.04/answer).

We measure “cost per correct answer” — what you’d expect to spend before getting a usable answer under retry. A wrong answer isn’t a cheap success.
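One way to read this metric is sketched below. The post does not spell out the formula, so this is an assumption: each retry is treated as an independent attempt with the same average cost and the same chance of being correct, which makes the expected number of attempts 1 / accuracy. The function names are hypothetical and are not part of tilth.

```python
import math


def cost_per_correct_answer(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend before the first correct answer under retry.

    Assumes independent attempts with a fixed success probability
    (`accuracy`), so the expected attempt count is 1 / accuracy.
    This is an illustrative model, not the benchmark's confirmed formula.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


def relative_saving(baseline: float, with_tool: float) -> float:
    """Fraction by which `with_tool` is cheaper than `baseline`."""
    if baseline <= 0.0:
        raise ValueError("baseline must be positive")
    return 1.0 - with_tool / baseline


if __name__ == "__main__":
    # At 100% accuracy the metric equals the per-attempt cost, which matches
    # the Haiku figure quoted in the post ($0.04/answer at 100% accuracy).
    print(cost_per_correct_answer(0.04, 1.0))

    # Hypothetical numbers, for illustration only: a $0.10 attempt that is
    # right 80% of the time costs more per correct answer than $0.10.
    print(cost_per_correct_answer(0.10, 0.80))
```

Under this model a cheaper but less accurate run can still lose: lowering the per-attempt cost helps only if accuracy does not fall by a larger proportion.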

Interesting finding: smarter models adopt MCP tools voluntarily (Sonnet 95%, Opus 94%), but Haiku ignores them (9%). Instruction tuning didn’t help. Removing the overlapping built-in tools did.

https://github.com/jahala/tilth/blob/main/benchmark/README.m...

PS: I don't have the budget to run the benchmark many times with Opus, so if any token whales have the capacity to run some benchmarks, please feel free to PR the results.