Focused input cuts LLM output tokens by 63% bench on CC with FastAPI

by nicola_alessi·Mar 3, 2026·2 points·0 comments

AI Analysis

●●● Banger · Solve My Problem · Wizardry · Ship It

Dependency-graph filtering cuts output tokens by 63%, not just input tokens: Claude stops narrating when it is given focused context.

Strengths

• Output token reduction (63%) is genuinely surprising and unintuitive; most tools only optimize input.
• Local-first with session memory and no cloud service or account: real privacy, runs entirely on the local machine.
• Rigorous benchmarking methodology: 42 runs (7 tasks, 3 runs per task, 2 arms) in --strict-mcp-config isolation, reproducible against the open-source FastAPI repo.

Weaknesses

• Early-stage adoption: only 720 downloads, 12 agents supported; network effects matter for an agent ecosystem.
• Dependency graph accuracy directly impacts value, yet there is no public discussion of parse failures or edge cases.

Post Description

I built an MCP server (vexp) that pre-indexes a codebase into a dependency graph and serves only relevant code to AI coding agents. While benchmarking it, I found something I wasn't looking for. The expected results were straightforward: less input context → lower cost, fewer tool calls → faster. But the output token reduction was the surprise.

Benchmark: 7 tasks on FastAPI (the OSS repo, ~800 Python files), 3 runs/task/arm, 42 total runs, Claude Sonnet 4.6, both arms in --strict-mcp-config isolation.

Without graph: ~23 tool calls, ~40K input tokens, 504 output tokens, $0.78/task
With graph: ~2.3 tool calls, ~8K input tokens, 189 output tokens, $0.33/task

The 58% cost reduction and 22% speed improvement were expected. The 63% output token reduction was not. When Claude gets 40K tokens of context (most irrelevant), it generates a lot of "let me look at this file... I can see that..." narration while it orients itself. When it gets 8K tokens of pre-filtered, graph-ranked context, it skips straight to the answer. The exploration filler disappears.

This seems like a general property of these models: noisy input → verbose output, focused input → focused output. I'd be curious if others have observed this in different contexts.
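The headline percentages can be rechecked from the per-task averages quoted in the benchmark. The following is a minimal sketch of that arithmetic; the dictionary keys and function names are illustrative, and the inputs are the rounded figures from the post, so the results are approximate.

```python
# Per-task averages as reported in the post (rounded figures, so results are approximate).
BASELINE = {"tool_calls": 23, "input_tokens": 40_000, "output_tokens": 504, "cost_usd": 0.78}
WITH_GRAPH = {"tool_calls": 2.3, "input_tokens": 8_000, "output_tokens": 189, "cost_usd": 0.33}

# 7 tasks x 3 runs per task x 2 arms (with and without the graph).
TOTAL_RUNS = 7 * 3 * 2


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


def reductions() -> dict[str, float]:
    """Percent reduction for every reported metric, rounded to one decimal place."""
    return {
        metric: round(percent_reduction(BASELINE[metric], WITH_GRAPH[metric]), 1)
        for metric in BASELINE
    }


if __name__ == "__main__":
    print(f"total runs: {TOTAL_RUNS}")
    for metric, pct in reductions().items():
        print(f"{metric}: -{pct}%")
```

With these inputs the output-token reduction comes out at 62.5% and the cost reduction at about 57.7%, consistent with the 63% and 58% quoted in the post. The 22% speed figure cannot be rechecked this way because the post gives no timings.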

The approach: tree-sitter AST parsing → dependency graph in SQLite → single MCP tool (run_pipeline) that takes a task description, walks the graph, returns ranked context. Full source for high-centrality pivot nodes, compact skeletons for supporting code. Savings varied by task type — code understanding tasks saved the most (-64%), bug fixes the least (-30%). Makes sense: the more exploration a task normally requires, the more waste there is to cut.
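To make the graph-walk idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not vexp's implementation: the table layout, the toy node names, and the function `run_pipeline_sketch` are all hypothetical, in-degree stands in for whatever centrality measure the real tool uses, and the seed nodes are passed in directly rather than derived from a task description.

```python
import sqlite3
from collections import deque


def build_demo_graph() -> sqlite3.Connection:
    """Create a toy dependency graph (hypothetical schema, not vexp's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id TEXT PRIMARY KEY, source TEXT, skeleton TEXT);
        CREATE TABLE edges (src TEXT, dst TEXT);  -- src depends on dst
    """)
    names = ["app", "router", "models", "helpers", "auth"]
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(n, f"<full source of {n}>", f"<signatures of {n}>") for n in names],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        [("app", "router"), ("app", "models"), ("router", "models"),
         ("router", "helpers"), ("auth", "models")],
    )
    return conn


def walk(conn: sqlite3.Connection, seeds: list[str], max_depth: int = 2) -> dict[str, int]:
    """Breadth-first walk along dependency edges; returns node -> depth reached."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue
        rows = conn.execute("SELECT dst FROM edges WHERE src = ?", (node,)).fetchall()
        for (dst,) in rows:
            if dst not in depth:
                depth[dst] = depth[node] + 1
                queue.append(dst)
    return depth


def run_pipeline_sketch(conn: sqlite3.Connection, seeds: list[str], pivots: int = 1):
    """Rank reached nodes by in-degree; full source for the top `pivots`, skeletons for the rest."""
    depth = walk(conn, seeds)
    in_degree = dict(conn.execute("SELECT dst, COUNT(*) FROM edges GROUP BY dst").fetchall())
    ranked = sorted(depth, key=lambda n: (-in_degree.get(n, 0), depth[n], n))
    context = []
    for rank, node in enumerate(ranked):
        column = "source" if rank < pivots else "skeleton"  # fixed column names only
        (text,) = conn.execute(f"SELECT {column} FROM nodes WHERE id = ?", (node,)).fetchone()
        context.append((node, column, text))
    return context


if __name__ == "__main__":
    for node, kind, text in run_pipeline_sketch(build_demo_graph(), ["app"]):
        print(node, kind, text)
```

In this toy graph the most depended-upon node is returned with full source and the remaining reachable nodes as skeletons, while a node that is never reached from the seed is left out entirely. That omission is where the input-token savings would come from.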

Code: the graph resolution is handwritten Rust. The MCP transport, SQLite schema, and benchmark harness were built with Claude Code (felt appropriate). The benchmark analysis scripts were 100% Claude.

Free tier at https://vexp.dev — 2K nodes, 1 repo, no time limit. Runs locally (tree-sitter + SQLite, no cloud).

Similar Projects

AI/ML · ●● Solid

Memory for LLM apps that cuts input tokens up to 80% (avg 68%)

Cuts token bills 68% by swapping full history for vector-retrieved signals.

Solve My Problem · Big Brain

degutemesgen

3012d ago

Developer Tools · ●● Solid

Rocky-Project Hail Mary agent skill that cut output tokens ~47%

Persona-based prompting cuts tokens 47% without breaking code like Caveman styles do.

Big Brain · Niche Gem

hpbyte

10 · 1mo ago

Developer Tools · ●● Solid

OpenSlimedit – Cut AI coding token usage by 21-45% with zero config

It actually attacks a concrete, expensive nuisance: repeated token bloat from tool schemas and file blobs. The line-range edit expansion is a neat trick — let the model reference lines instead of pasting content — and the README ships per-model benchmarks (up to ~45% savings) plus one-line installation so you can try it without changing your workflow. Expect real wins in edit-heavy sessions, though results will vary with project size and tooling.

Big Brain · Niche Gem

aSidorenkoCode

28 · 3mo ago

AI/ML · ●● Solid

Token Saving Tinyscreenshot Skill

4x token savings on screenshots with readable text at 800px grey.

Solve My Problem · Big Brain

franze

21 · 1mo ago

AI/ML · ●●● Banger

Tier – Adaptive tool routing that makes small LLMs 10pt more accurate

Makes 1.5B models 10% more accurate by hiding 90% of tool descriptions.

Big Brain · Niche Gem

pranabsarkar

43 · 1mo ago

AI/ML · ●●● Banger

AgentUQ, a token-logprob runtime gate for LLM agents

Skips heavy judge loops by using logprobs to gate agent actions at runtime.

Big Brain · Ship It · Solve My Problem

AntoineN2

10 · 2mo ago