GitHub Repository

OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.

7 stars · Rust

AI Cost Firewall – OpenAI-compatible gateway with semantic caching

Author: vcaluser

by vcaluser · Mar 28, 2026 · 1 point · 1 comment


AI Analysis

●● Solid · Slick · Ship It

LLM gateway with Redis + Qdrant caching, but LiteLLM does this.

Strengths

• Two-layer cache (exact Redis + semantic Qdrant) captures both identical and similar requests
• Hot config reload via SIGHUP means no downtime when updating routing rules
• Prometheus metrics and Grafana dashboard show real cost savings with embedding overhead
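
To make the two-layer idea in the first strength concrete, here is a minimal sketch of an exact-then-semantic lookup. It is not the project's code: in-memory structures stand in for Redis and Qdrant, and `TwoLayerCache`, `toy_embed`, and the similarity threshold are hypothetical names and values chosen only for illustration.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoLayerCache:
    """Exact-match layer (stand-in for Redis) in front of a similarity layer (stand-in for Qdrant)."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # hypothetical embedding function: str -> list[float]
        self.threshold = threshold  # assumed cutoff, not taken from the project
        self.exact = {}             # sha256(prompt) -> response
        self.vectors = []           # list of (embedding, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt):
        """Return (response, layer) where layer is 'exact', 'semantic', or 'miss'."""
        # Layer 1: byte-identical prompts are answered from the hash map.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit, "exact"
        # Layer 2: embed the prompt and take the most similar stored entry.
        query = self.embed(prompt)
        best_score, best_resp = 0.0, None
        for vec, resp in self.vectors:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_resp = score, resp
        if best_resp is not None and best_score >= self.threshold:
            return best_resp, "semantic"
        return None, "miss"

    def put(self, prompt, response):
        """Store the response in both layers."""
        self.exact[self._key(prompt)] = response
        self.vectors.append((self.embed(prompt), response))


def toy_embed(text):
    """Toy bag-of-letters embedding, for illustration only; a real setup would call an embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

In this sketch the exact layer is checked first because it needs no embedding call; only a miss there pays the embedding cost. How often the semantic layer hits depends on the threshold and on the embedding function, which is the same sensitivity the review raises about embedding model choice.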

Weaknesses

• LLM gateway caching is a crowded category with LiteLLM, CacheLLM, and others already established
• Semantic cache quality depends heavily on embedding model choice with no guidance provided

Similar Projects

Infrastructure · ● Mid

Nexus Gateway – Reduce LLM API Costs Using Semantic Caching

Semantic caching for LLM APIs exists (Anthropic prompt caching, LangChain, Miniplex, vLLM); gateway routing is table stakes.

Ship It · Solve My Problem

Sunnyanand_dev

213mo ago

AI/ML · ●●● Banger

CacheCore – semantic agent caching with dependency invalidation

Semantic caching with dependency invalidation beats standard Redis wrappers for agent costs.

Big Brain · Solve My Problem

fabriziorocco

241mo ago

Infrastructure · ●● Solid

LLM-Gateway – Zero-Trust LLM Gateway

Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.

Big Brain · Solve My Problem

michaelquigley

712mo ago

Developer Tools · ●●● Banger

Isartor – Pure-Rust prompt firewall, deflects 60-95% of LLM traffic

Local semantic caching cuts LLM costs without changing your code.

Solve My Problem · Slick

zippode

312mo ago

Developer Tools · ●● Solid

LLM Gateway for OpenAI/Anthropic Written in Golang

Runs as a single binary with embedded SQLite and zero-config start, acting as a transparent, provider-agnostic proxy that logs model, tokens, latency, cost and API key hashes while leaving full body capture opt-in. It also proxies streaming responses in real time and exposes stable JSON analytics endpoints — a practical, instrumentable way to get reproducible, audit-ready traces for real LLM traffic, though long-term value depends on how it handles provider edge-cases and SDK compatibility.

Solve My Problem · Niche Gem · Slick

oatmale

423mo ago

AI/ML · ● Mid

Apertis – OpenAI-compatible API gateway for 470 AI models

Yet another OpenAI-compatible gateway when LiteLLM and OpenRouter already exist.

Crowd Pleaser

thequert

109d ago