GitHub Repository

OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.

7 stars · Rust

AI Cost Firewall – OpenAI-compatible gateway with semantic caching

by vcaluser·Mar 28, 2026·1 point·1 comment

AI Analysis

●● Solid · Slick · Ship It

LLM gateway with Redis + Qdrant caching, but LiteLLM already offers comparable caching.

Strengths
  • Two-layer cache (exact Redis + semantic Qdrant) captures both identical and similar requests
  • Hot config reload via SIGHUP means no downtime when updating routing rules
  • Prometheus metrics and a Grafana dashboard show real cost savings alongside the embedding overhead
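
To make the two-layer idea concrete, here is a minimal sketch of an exact-then-semantic lookup. It is not the project's code: a plain dict stands in for Redis, a list scanned with cosine similarity stands in for Qdrant, and `TwoLayerCache`, `toy_embed`, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
import hashlib
import json
import math


def exact_key(model, messages):
    """Stable hash of the request; identical requests map to the same key."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Bag-of-letters vector; a placeholder for a real embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class TwoLayerCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # assumed embedding function: str -> list[float]
        self.threshold = threshold  # assumed similarity cutoff, not taken from the project
        self.exact = {}             # stand-in for the Redis exact cache
        self.vectors = []           # stand-in for Qdrant: (model, vector, response)

    def lookup(self, model, messages):
        # Layer 1: exact match on the hashed request.
        key = exact_key(model, messages)
        if key in self.exact:
            return "exact", self.exact[key]
        # Layer 2: nearest stored prompt for the same model, if similar enough.
        query = self.embed(messages[-1]["content"])
        best_score, best_response = 0.0, None
        for stored_model, vector, response in self.vectors:
            if stored_model != model:
                continue
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return "semantic", best_response
        return "miss", None

    def store(self, model, messages, response):
        # Populate both layers after a real upstream call.
        self.exact[exact_key(model, messages)] = response
        self.vectors.append((model, self.embed(messages[-1]["content"]), response))
```

On a miss, a gateway would call the upstream provider and then `store` the response. The threshold and the embedding function are the two knobs that decide how aggressive the semantic layer is, so they would need tuning against real traffic.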
Weaknesses
  • LLM gateway caching is a crowded category with LiteLLM, CacheLLM, and others already established
  • Semantic cache quality depends heavily on the choice of embedding model, and the project offers no guidance on picking one
Target Audience

Engineering teams running production LLM applications with cost concerns

Similar To

LiteLLM · CacheLLM · Portkey

Similar Projects

Infrastructure · ●● Solid

LLM-Gateway – Zero-Trust LLM Gateway

Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.

Big Brain · Solve My Problem
michaelquigley
712mo ago
Developer Tools · ●● Solid

LLM Gateway for OpenAI/Anthropic Written in Golang

Runs as a single binary with embedded SQLite and a zero-config start. It acts as a transparent, provider-agnostic proxy that logs model, tokens, latency, cost, and API key hashes, while leaving full body capture opt-in. It also proxies streaming responses in real time and exposes stable JSON analytics endpoints. This is a practical, instrumentable way to get reproducible, audit-ready traces for real LLM traffic, though its long-term value depends on how it handles provider edge cases and SDK compatibility.

Solve My Problem · Niche Gem · Slick
oatmale
423mo ago