GitHub Repository

Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON for MCP and CLI agents.

30 stars · Python

Keep large tool output out of LLM context: 3x accuracy 95% fewer tokens

by loumaciel·Mar 5, 2026·10 points·1 comment

AI Analysis

●●● Banger · Wizardry · Solve My Problem

Artifact-backed tool output cuts token usage by 95%, and accuracy jumps from 33% to 99%.

Strengths

• Real benchmark across 12 datasets with 103 questions: 99% accuracy vs. a 33% baseline is substantial.
• Constrained Python execution (AST guards, timeouts, memory caps) is a clever approach to isolation.
• Drop-in MCP gateway architecture means no upstream server changes are required.

Weaknesses

• Early-stage (4 stars on GitHub); adoption and production maturity are unclear.
• Requires agents to learn the query syntax; may not work with all LLM-based agent frameworks.

Post Description

LLM agents often place raw JSON tool outputs directly in the prompt. After a few tool calls, earlier results get compacted or truncated and answers become incorrect or inconsistent.

I built Sift, a drop-in MCP gateway that stores tool outputs as local artifacts (filesystem blobs indexed in SQLite) and returns an `artifact_id` plus compact schema hints when responses are large or paginated.
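The storage idea can be sketched in a few lines. This is a minimal illustration, not Sift's actual code: the table layout, the `open_index` and `store_artifact` helpers, the hash-derived ID, and the shape of the returned handle are all assumptions made for the example, and it skips the "large or paginated" check that decides when to store at all.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path


def open_index(db_path=":memory:"):
    """Open (or create) the SQLite index. The table layout is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "artifact_id TEXT PRIMARY KEY, path TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    return conn


def store_artifact(conn, blob_dir, payload):
    """Write the payload to a blob file, index it, and return a compact handle."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    artifact_id = hashlib.sha256(raw).hexdigest()[:16]  # assumed ID scheme
    blob_path = Path(blob_dir) / f"{artifact_id}.json"
    blob_path.write_bytes(raw)
    conn.execute(
        "INSERT OR REPLACE INTO artifacts VALUES (?, ?, ?)",
        (artifact_id, str(blob_path), len(raw)),
    )
    conn.commit()
    # Schema hint: field names and value types taken from the first record only.
    first = payload[0] if isinstance(payload, list) and payload else payload
    hint = {k: type(v).__name__ for k, v in first.items()} if isinstance(first, dict) else {}
    return {"artifact_id": artifact_id, "size_bytes": len(raw), "schema_hint": hint}


def load_artifact(conn, artifact_id):
    """Look up the blob path in SQLite and read the full payload back."""
    row = conn.execute(
        "SELECT path FROM artifacts WHERE artifact_id = ?", (artifact_id,)
    ).fetchone()
    if row is None:
        raise KeyError(artifact_id)
    return json.loads(Path(row[0]).read_bytes())


if __name__ == "__main__":
    quakes = [{"place": "A", "magnitude": 5.1}, {"place": "B", "magnitude": 6.3}]  # made-up data
    with tempfile.TemporaryDirectory() as blob_dir:
        conn = open_index()
        print(store_artifact(conn, blob_dir, quakes))
```

Only the small handle would go back into the prompt; the full payload stays on disk until a query asks for it.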

Instead of reasoning over full JSON in the prompt, the model runs a small Python query:

def run(data, schema, params):
    return max(data, key=lambda x: x["magnitude"])["place"]

Query code runs in a constrained subprocess (AST/import guards + timeout/memory caps). Only the computed result is returned to the model.
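As a rough sketch of what "AST/import guards + timeout" could look like, the example below statically rejects disallowed imports and names, then runs the query in a separate interpreter with a wall-clock limit. The allowlist, the `check_query` and `run_query` helpers, and the JSON-over-stdin protocol are assumptions made for illustration; this is not Sift's implementation and not a hardened sandbox. Memory caps are omitted here (on POSIX they could be added with `resource.setrlimit`).

```python
import ast
import json
import subprocess
import sys

ALLOWED_IMPORTS = {"math", "statistics", "collections"}  # hypothetical allowlist
BLOCKED_NAMES = {"eval", "exec", "open", "__import__"}    # hypothetical blocklist


def check_query(source):
    """Return a list of violations found by walking the query's AST."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        problems += [f"import of {m!r} not allowed" for m in modules if m not in ALLOWED_IMPORTS]
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder attribute {node.attr!r} not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            problems.append(f"name {node.id!r} not allowed")
    return problems


# Child-process script: read a request from stdin, call run(), print the result as JSON.
RUNNER = """
import json, sys
request = json.load(sys.stdin)
namespace = {}
exec(request["source"], namespace)
result = namespace["run"](request["data"], request["schema"], request["params"])
json.dump(result, sys.stdout)
"""


def run_query(source, data, schema=None, params=None, timeout_s=5.0):
    """Validate the query, then execute it in a child interpreter with a timeout."""
    problems = check_query(source)
    if problems:
        raise ValueError("; ".join(problems))
    request = json.dumps({"source": source, "data": data, "schema": schema, "params": params})
    done = subprocess.run(
        [sys.executable, "-I", "-c", RUNNER],  # -I: isolated mode, ignores user site and env vars
        input=request, capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return json.loads(done.stdout)  # only the computed result comes back


QUERY = 'def run(data, schema, params): return max(data, key=lambda x: x["magnitude"])["place"]'

if __name__ == "__main__":
    quakes = [{"place": "A", "magnitude": 5.1}, {"place": "B", "magnitude": 6.3}]  # made-up data
    print(run_query(QUERY, quakes))
```

A static check like this is easy to bypass on its own, which is presumably why the post pairs it with process-level limits.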

Benchmark (Claude Sonnet 4.6, 103 questions across 12 datasets):

- Baseline (raw JSON in prompt): 34/103 (33%), 10.7M input tokens

- Sift (artifact + code query): 102/103 (99%), 489K input tokens
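The headline figures ("3x accuracy", "95% fewer tokens") follow from these two lines. A quick check, using the rounded token counts exactly as quoted in the post:

```python
questions = 103
baseline_correct, sift_correct = 34, 102
baseline_tokens, sift_tokens = 10_700_000, 489_000  # rounded figures as quoted

baseline_acc = baseline_correct / questions
sift_acc = sift_correct / questions
accuracy_ratio = sift_acc / baseline_acc
token_reduction = 1 - sift_tokens / baseline_tokens

print(f"baseline accuracy: {baseline_acc:.1%}")    # 33.0%
print(f"sift accuracy:     {sift_acc:.1%}")        # 99.0%
print(f"accuracy ratio:    {accuracy_ratio:.1f}x")  # 3.0x
print(f"token reduction:   {token_reduction:.1%}")  # 95.4%
```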

Open benchmark + MIT code: https://github.com/lourencomaciel/sift-gateway

Install:

pipx install sift-gateway
sift-gateway init --from claude

Works with Claude Code, Cursor, Windsurf, Zed, and VS Code. Existing MCP servers and tools require no changes.

Similar Projects

AI/ML · ●●● Banger

Reducing LLM input tokens by 70%

Cuts token costs 70% with receipts proving no accuracy drop on hard evals.

Zero to One · Solve My Problem

Jbunga

56331mo ago

AI/ML · ●● Solid

Sentinel – LLM browser automation using 10x fewer tokens

Token efficiency beats Stagehand — 2-5k vs 29-51k per action with cached selectors.

Solve My Problem · Slick

isoldex

102mo ago

AI/ML · ●● Solid

RAIF – an experimental structured I/O format for LLMs

Self-repairing LLM output format that beats JSON on tokens and recovery.

Big Brain · Solve My Problem

truehazker

211d ago

Developer Tools · ●●● Banger

Arc – A language that uses 27-63% fewer tokens than JavaScript

Language purpose-built for token costs: 55 tokens vs 120 in JavaScript. Real compiler, 1291 tests.

Zero to One · Big Brain

kai_builds

113mo ago

Developer Tools · ●● Solid

Rocky – Project Hail Mary agent skill that cut output tokens ~47%

Persona-based prompting cuts tokens 47% without breaking code like Caveman styles do.

Big Brain · Niche Gem

hpbyte

102mo ago

Developer Tools · ●●● Banger

Tappi Browser – Fastest AI browser, 3-10x fewer tokens, zero telemetry

Referenced element indexing cuts token spend 3-10x versus DOM-dumping AI browsers.

Wizardry · Big Brain · Ship It

shaihazher

103mo ago