NadirClaw – Open-source LLM router with 10ms classification
Smart LLM routing cuts costs, but competing against established OpenRouter and vLLM ecosystems.
Open-source LLM router & AI cost optimizer. Routes simple prompts to cheap/local models, complex ones to premium — automatically. Drop-in OpenAI-compatible proxy for Claude Code, Codex, Cursor, OpenClaw. Saves 40-70% on AI API costs. Self-hosted, no middleman.
If you're burning through Claude/OpenAI credits, this is a low-friction stopgap: it classifies prompts in ~10ms and routes trivial tasks to cheaper/local models while reserving premium APIs for complex work. The agentic-task detection, reasoning-aware routing, session pinning and context-window fallback are practical touches that avoid mid-thread model bouncing and 429 failures. It isn't reinventing the space (OpenRouter and others exist), but it's focused on real-world cost tradeoffs and drop-in compatibility.
Backend developers, AI/ML engineers, devs and hobbyists who integrate LLMs and want to reduce API cost
So I built NadirClaw. It's a Python proxy that sits between your app and your LLM providers. It classifies each prompt in about 10ms and routes simple ones to Gemini Flash, Ollama, or whatever cheap/local model you want. Only the complex prompts hit your premium API.
It's OpenAI-compatible, so you just point your existing tools at it. Works with OpenClaw, Cursor, Claude Code, or anything that talks to the OpenAI API.
In practice I went from burning through my Claude quota in 2 days to having it last the full week. Costs dropped around 60%.
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/... | sh
Still early. The classifier is simple (token count + pattern matching + optional embeddings), and I'm sure there are edge cases I'm missing. Curious what breaks first, and whether the routing logic makes sense to others.
Smart LLM routing cuts costs, but competing against established OpenRouter and vLLM ecosystems.
Three-line wrapper cuts LLM costs 80%+ via prompt classification and same-provider routing.
Komilion turns model sprawl into a cost-control layer you drop in by swapping a base_url: requests are classified (regex fast path + tiny LLM) and matched to ~400 models so cheap models handle the easy stuff and premium models only run when needed. The ~60% zero‑call regex fast path and benchmark-driven routing (LMArena) are clever, pragmatic moves; the hard questions left are model-quality drift across providers and how routing decisions map to real-world user satisfaction.
Three-line fix for GDPR Article 44 violations when LLM prompts contain EU user data.
Drop-in proxy that cuts GPT token costs 40-60% without changing app code.
Multi-vendor token comparison with specific cut recommendations and dollar savings at scale.