Reducing LLM input tokens by 70%
Cuts token costs 70% with receipts proving no accuracy drop on hard evals.
Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON for MCP and CLI agents.
Artifact-backed tool output cuts token usage 95% and lifts accuracy from 33% to 99%.
Developers building LLM agents and MCP-based systems
Mem0 · Antml.ai · LangGraph persistence patterns
I built Sift, a drop-in MCP gateway that stores tool outputs as local artifacts (filesystem blobs indexed in SQLite) and returns an `artifact_id` plus compact schema hints when responses are large or paginated.
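To make the storage idea concrete, here is a minimal sketch of an artifact store, not Sift's actual code. The class name `ArtifactStore`, the table layout, the hash-based ID, and the `schema_hints` helper are all assumptions made for illustration; only the general shape (blobs on disk, an SQLite index, an `artifact_id` plus schema hints returned to the model) comes from the description above.

```python
import hashlib
import json
import sqlite3
from pathlib import Path


class ArtifactStore:
    """Blobs on disk, one SQLite row per artifact (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "index.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS artifacts ("
            "artifact_id TEXT PRIMARY KEY, path TEXT, bytes INTEGER, schema TEXT)"
        )

    def put(self, tool_output):
        """Store a tool result and return the compact handle the model sees."""
        raw = json.dumps(tool_output, sort_keys=True).encode("utf-8")
        # Assumption: a content hash is used as the ID; the real scheme may differ.
        artifact_id = hashlib.sha256(raw).hexdigest()[:16]
        blob_path = self.root / f"{artifact_id}.json"
        blob_path.write_bytes(raw)
        hints = schema_hints(tool_output)
        self.db.execute(
            "INSERT OR REPLACE INTO artifacts VALUES (?, ?, ?, ?)",
            (artifact_id, str(blob_path), len(raw), json.dumps(hints)),
        )
        self.db.commit()
        return {"artifact_id": artifact_id, "schema": hints}

    def get(self, artifact_id):
        """Load the full payload back so a query can run against it."""
        row = self.db.execute(
            "SELECT path FROM artifacts WHERE artifact_id = ?", (artifact_id,)
        ).fetchone()
        if row is None:
            raise KeyError(artifact_id)
        return json.loads(Path(row[0]).read_bytes())


def schema_hints(value):
    """Summarize shape: field names and types, plus row count for lists."""
    if isinstance(value, list):
        hints = {"type": "list", "length": len(value)}
        if value and isinstance(value[0], dict):
            hints["fields"] = {k: type(v).__name__ for k, v in value[0].items()}
        return hints
    if isinstance(value, dict):
        return {"type": "dict", "fields": {k: type(v).__name__ for k, v in value.items()}}
    return {"type": type(value).__name__}
```

In this sketch the model would only ever see the small dict returned by `put`, while the full payload stays on disk until a query asks for it.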
Instead of reasoning over full JSON in the prompt, the model runs a small Python query:
def run(data, schema, params):
    # Return the place of the record with the largest magnitude.
    return max(data, key=lambda x: x["magnitude"])["place"]
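As a rough illustration of why this helps, the sketch below runs the same query over a few made-up earthquake records and compares the size of the raw JSON with the size of the returned answer. The sample data is hypothetical, and character counts are only a crude stand-in for tokens.

```python
import json

# Hypothetical earthquake records standing in for a large tool response.
data = [
    {"place": "Ridgecrest, CA", "magnitude": 4.2},
    {"place": "Hualien, Taiwan", "magnitude": 6.1},
    {"place": "Hilo, HI", "magnitude": 3.7},
]


def run(data, schema, params):
    # Pick the record with the largest magnitude and return only its place.
    return max(data, key=lambda x: x["magnitude"])["place"]


result = run(data, schema=None, params={})
raw_chars = len(json.dumps(data))       # what the baseline would put in the prompt
result_chars = len(json.dumps(result))  # what the model sees instead
print(result, raw_chars, result_chars)
```

With three records the gap is small; with thousands of records the raw payload grows linearly while the answer stays the same size.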
Query code runs in a constrained subprocess (AST/import guards + timeout/memory caps). Only the computed result is returned to the model.
Benchmark (Claude Sonnet 4.6, 103 questions across 12 datasets):
- Baseline (raw JSON in prompt): 34/103 (33%), 10.7M input tokens
- Sift (artifact + code query): 102/103 (99%), 489K input tokens
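For readers curious about the constrained execution mentioned earlier, here is a minimal sketch of an AST/import guard plus a subprocess timeout. It is not Sift's implementation: the allowlist, the banned names, and the functions `check_query` and `run_query` are assumptions for illustration, and memory caps are left out. An AST check like this reduces accidents but is not a complete security boundary on its own.

```python
import ast
import json
import subprocess
import sys

# Hypothetical rules; the gateway's real allowlist is not given in the post.
ALLOWED_IMPORTS = {"math", "statistics", "collections", "itertools"}
BANNED_NAMES = {"eval", "exec", "open", "__import__", "compile"}


def check_query(source: str) -> None:
    """Reject code that imports outside the allowlist or touches banned names."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module not in ALLOWED_IMPORTS:
                raise ValueError(f"import not allowed: {module!r}")
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            raise ValueError(f"name not allowed: {node.id!r}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"dunder access not allowed: {node.attr!r}")


# Runner executed in the child process: load the query, call run(), print JSON.
RUNNER = """
import json, sys
payload = json.load(sys.stdin)
namespace = {}
exec(payload["source"], namespace)
result = namespace["run"](payload["data"], payload["schema"], payload["params"])
json.dump(result, sys.stdout)
"""


def run_query(source, data, schema=None, params=None, timeout=5.0):
    """Vet the query, then run it in a separate interpreter with a timeout."""
    check_query(source)
    payload = json.dumps(
        {"source": source, "data": data, "schema": schema, "params": params or {}}
    )
    proc = subprocess.run(
        [sys.executable, "-I", "-c", RUNNER],  # -I: isolated mode
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired on overrun
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)
```

Only the JSON-encoded return value of `run` crosses back to the caller, which matches the idea that the model receives the computed result and never the full payload.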
Open benchmark + MIT code: https://github.com/lourencomaciel/sift-gateway
Install:
pipx install sift-gateway
sift-gateway init --from claude
Works with Claude Code, Cursor, Windsurf, Zed, and VS Code. Existing MCP servers and tools require no changes.
Token efficiency beats Stagehand — 2-5k vs 29-51k per action with cached selectors.
Self-repairing LLM output format that beats JSON on tokens and recovery.
Language purpose-built for token costs: 55 tokens vs 120 in JavaScript. Real compiler, 1291 tests.
Persona-based prompting cuts tokens 47% without breaking code like Caveman styles do.
Referenced element indexing cuts token spend 3-10x versus DOM-dumping AI browsers.