Costile – open-source proxy, blocks AI API requests when budget is hit
Hard caps that block requests mid-flight beat provider dashboards that alert 6 hours too late.

SDK blocks over-budget calls without proxying traffic through their servers.
Engineering teams using AI APIs with multiple customers or features
LangFuse · Helicone · Portkey
Hard caps that block requests mid-flight beat provider dashboards that alert 6 hours too late.
One decorator reveals which feature burned $2,800 instead of two-day forensics.
MCP budget gating as a zero-dep npx proxy—solves the real friction of runaway tool costs.
Macaroon-based budget enforcement for AI agents—fills a real economic governance gap.
Proxying every LLM call to log tokens is the right kind of blunt instrument — you get per-developer, per-model cost telemetry immediately. Smart routing and the built-in semantic cache (claims 45–80% savings) are the most useful ideas here, but the default SQLite backend and admin/admin creds scream MVP rather than production-ready scale.
No-proxy LLM cost tracking beats Helicone for teams avoiding traffic rerouting.