Nexus Gateway – Reduce LLM API Costs Using Semantic Caching
Semantic caching for LLM APIs exists (Anthropic prompt caching, Langchain, Miniplex, vLLM); gateway routing is table stakes.
OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.
LLM gateway with Redis + Qdrant caching, but LiteLLM does this.
Engineering teams running production LLM applications with cost concerns
LiteLLM · CacheLLM · Portkey
Semantic caching for LLM APIs exists (Anthropic prompt caching, Langchain, Miniplex, vLLM); gateway routing is table stakes.
Semantic caching with dependency invalidation beats standard Redis wrappers for agent costs.
Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.
Local semantic caching cuts LLM costs without changing your code.
Runs as a single binary with embedded SQLite and zero-config start, acting as a transparent, provider-agnostic proxy that logs model, tokens, latency, cost and API key hashes while leaving full body capture opt-in. It also proxies streaming responses in real time and exposes stable JSON analytics endpoints — a practical, instrumentable way to get reproducible, audit-ready traces for real LLM traffic, though long-term value depends on how it handles provider edge-cases and SDK compatibility.
Yet another OpenAI-compatible gateway when LiteLLM and OpenRouter already exist.