Tilth v0.5.0: ~40% cheaper AI code navigation (160 runs, 3 models)
44% cheaper Claude code navigation via tree-sitter definitions + call resolution.
Smart(er) code reading for humans and AI agents. Reduces cost per correct answer by ~40% on average. Install: cargo install tilth -or- npx tilth
Tree-sitter MCP cuts Claude code task costs 17–82% while improving accuracy.
AI/LLM engineers, developers building code-aware agents, teams using Claude or other LLMs for code analysis
Sourcegraph Cody (code context for LLMs) · Continue.dev (agent code tools) · Cursor's built-in code navigation
-> https://github.com/jahala/tilth
Results: Sonnet 4.5 — 26% cheaper per correct answer (79% → 86% accuracy). Opus 4.6 — 14% cheaper (and the only model+mode combo to crack the hardest task). Haiku 4.5 — 82% cheaper when forced to use tilth (69% → 100% accuracy at $0.04/answer).
We measure “cost per correct answer” — what you’d expect to spend before getting a usable answer under retry. A wrong answer isn’t a cheap success.
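A minimal sketch of how such a metric could be computed. The post does not spell out the benchmark's exact formula, so this is an assumption: retries are treated as independent attempts with the same success probability, which makes the expected spend simply the cost per attempt divided by accuracy. The function names and the dollar figures in the usage note are hypothetical, not taken from the benchmark.

```python
def cost_per_correct_answer(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend until the first correct answer.

    Assumption (not confirmed by the post): each retry is independent and
    succeeds with probability `accuracy`, so the number of attempts needed
    is geometric with mean 1 / accuracy.
    """
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is cheaper than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - candidate / baseline


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    without_tool = cost_per_correct_answer(cost_per_attempt=0.10, accuracy=0.50)
    with_tool = cost_per_correct_answer(cost_per_attempt=0.12, accuracy=0.80)
    print(f"without tool: ${without_tool:.3f} per correct answer")
    print(f"with tool:    ${with_tool:.3f} per correct answer")
    print(f"saving:       {relative_saving(without_tool, with_tool):.0%}")
```

With these made-up inputs, the tool-assisted run costs more per attempt yet comes out about 25% cheaper per correct answer, because fewer retries are needed. That is the sense in which a wrong answer is not a cheap success.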
Interesting finding: smarter models adopt MCP tools voluntarily (Sonnet 95%, Opus 94%), but Haiku ignores them (9%). Instruction tuning didn’t help. Removing the overlapping built-in tools did.
https://github.com/jahala/tilth/blob/main/benchmark/README.m...
PS: I don't have the budget to run the benchmark many times with Opus, so if any token whales have capacity to run some benchmarks, please feel free to PR results.
Instruction tuning on tool descriptions cut Sonnet costs 29% without code changes.
Routes subagents at the gateway level instead of forcing the main agent to waste tokens on routing decisions.
Claude sees your Neuroglancer viewport and reasons about what it sees — genuine co-scientist workflow.
Tree-style tabs for agent sessions solve the flat-terminal scaling problem nicely.
Merkle tree hashing detects stale files before reusing subagent context.