Back to browse
API router that picks the cheapest model that fits each query

API router that picks the cheapest model that fits each query

by robinbanner·Feb 16, 2026·1 point·1 comment

AI Analysis

●●SolidSolve My ProblemWizardrySlick
The Take

Komilion turns model sprawl into a cost-control layer you drop in by swapping a base_url: requests are classified (regex fast path + tiny LLM) and matched to ~400 models so cheap models handle the easy stuff and premium models only run when needed. The ~60% zero‑call regex fast path and benchmark-driven routing (LMArena) are clever, pragmatic moves; the hard questions left are model-quality drift across providers and how routing decisions map to real-world user satisfaction.

Target Audience

Backend developers, AI/ML engineers, startups and companies using large language models who need cost optimization and multi-provider routing

Post Description

I got frustrated paying $60/M tokens for reasoning queries when a $0.80/M model gives comparable results for most of them. So I built Komilion — a model router that classifies each API request and routes it to a cheaper model that fits.

- Drop-in replacement for the OpenAI SDK (change one line: base_url) - Each query gets classified (regex fast path + lightweight LLM classifier) and matched against ~390 models - Three tiers (Frugal/Balanced/Premium) to control the quality-cost tradeoff - Automatic failover if a provider goes down - Cost metadata in every response

The routing logic is benchmark-driven (LMArena, Artificial Analysis), not ML-based — simpler to debug and reason about. The regex fast path handles ~60% of requests in under 5ms with zero API calls.

Example: a customer support bot doing 10K conversations/month went from ~$250/mo (everything pinned to Opus 4.6) to ~$40/mo with routing. Most conversations were FAQ-level questions that a smaller model handled fine.

Stack: Next.js, Vercel, Neon PostgreSQL, OpenRouter upstream. Hosting cost: ~$20/month.

We ran a head-to-head benchmark: same 15 prompts through Opus, GPT-4o, Gemini Pro, and the router. Simple tasks cost 66% less with routing. Complex tasks produced 2x more detailed output because the router picked specialized models per task type. Full data: https://dev.to/robinbanner/we-benchmarked-4-ai-api-strategie...

Architecture writeup: https://dev.to/robinbanner/inside-komilions-architecture-how... — there's a free tier if you want to try it.

Similar Projects

Developer Tools●●Solid

NadirClaw – Open-source LLM router with 10ms classification

Smart LLM routing cuts costs, but competing against established OpenRouter and vLLM ecosystems.

Solve My ProblemBig Brain
amirdor
103mo ago