API router that picks the cheapest model that fits each query

Author: robinbanner

by robinbanner · Feb 16, 2026 · 1 point · 1 comment


AI Analysis


The Take

Komilion turns model sprawl into a cost-control layer that you drop in by swapping a base_url. Requests are classified (regex fast path plus a lightweight LLM classifier) and matched against ~390 models, so cheap models handle the easy requests and premium models run only when needed. The regex fast path, which handles ~60% of requests with zero API calls, and the benchmark-driven routing (LMArena) are clever, pragmatic moves. The hard open questions are model-quality drift across providers and how routing decisions map to real-world user satisfaction.

Post Description

I got frustrated paying $60/M tokens for reasoning queries when a $0.80/M model gives comparable results for most of them. So I built Komilion — a model router that classifies each API request and routes it to a cheaper model that fits.

- Drop-in replacement for the OpenAI SDK (change one line: base_url)
- Each query gets classified (regex fast path + lightweight LLM classifier) and matched against ~390 models
- Three tiers (Frugal/Balanced/Premium) to control the quality-cost tradeoff
- Automatic failover if a provider goes down
- Cost metadata in every response
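
To make the tiering, failover, and cost-metadata bullets concrete, here is a minimal Python sketch of one way a cheapest-fit router could work. It is not Komilion's actual code: the model names, prices, scores, the TIER_MARGIN rule, and the `call` callback are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str                # hypothetical identifier, not a real model ID
    usd_per_m_tokens: float  # illustrative price per million tokens
    score: float             # illustrative benchmark score, 0-100


# Hypothetical three-entry catalog; the post describes roughly 390 models.
CATALOG = [
    Model("small-model", 0.80, 62.0),
    Model("mid-model", 6.00, 78.0),
    Model("large-model", 60.00, 91.0),
]

# Assumed rule: a higher tier demands more headroom above the task's needs.
TIER_MARGIN = {"frugal": 0.0, "balanced": 10.0, "premium": 20.0}


def candidates(required_score: float, tier: str) -> list:
    """Return models that clear the tier's score floor, cheapest first."""
    floor = required_score + TIER_MARGIN[tier]
    fits = [m for m in CATALOG if m.score >= floor]
    if not fits:
        # Nothing clears the floor: fall back to the strongest model.
        return [max(CATALOG, key=lambda m: m.score)]
    return sorted(fits, key=lambda m: m.usd_per_m_tokens)


def route(prompt: str, required_score: float, tier: str,
          call: Callable[[Model, str], str]) -> dict:
    """Try candidates in price order; move on when a provider is down."""
    last_error = None
    for model in candidates(required_score, tier):
        try:
            text = call(model, prompt)
        except ConnectionError as exc:
            last_error = exc  # failover: try the next cheapest fit
            continue
        # Attach cost metadata to the response, as the post describes.
        return {"text": text, "model": model.name,
                "usd_per_m_tokens": model.usd_per_m_tokens}
    raise RuntimeError("all candidate models failed") from last_error
```

In this sketch, the tier only shifts the score floor, so a "frugal" request and a "premium" request with the same task difficulty can land on different models without any change to the caller's code.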

The routing logic is benchmark-driven (LMArena, Artificial Analysis), not ML-based — simpler to debug and reason about. The regex fast path handles ~60% of requests in under 5ms with zero API calls.
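
As an illustration of the regex fast path, the following Python sketch classifies a prompt with local patterns first and only calls a classifier model when no pattern matches. The patterns, labels, and the `llm_classifier` callback are hypothetical; the post does not publish the real rules.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns and labels; the actual rule set is not public.
FAST_PATH = [
    (re.compile(r"^\s*(hi|hello|hey|thanks|thank you)\b", re.I), "chitchat"),
    (re.compile(r"\b(translate|translation)\b", re.I), "translation"),
    (re.compile(r"\b(prove|derive|step[- ]by[- ]step)\b", re.I), "reasoning"),
]


def fast_classify(prompt: str) -> Optional[str]:
    """Return a label from local regex matching, or None if nothing matches."""
    for pattern, label in FAST_PATH:
        if pattern.search(prompt):
            return label
    return None


def classify(prompt: str,
             llm_classifier: Callable[[str], str]) -> Tuple[str, str]:
    """Use the regex path when possible; otherwise ask the classifier model."""
    label = fast_classify(prompt)
    if label is not None:
        return label, "regex"  # no API call was made
    return llm_classifier(prompt), "llm"
```

Because the first stage is pure pattern matching, every decision it makes can be traced to a single rule, which fits the post's point that this approach is simpler to debug than a learned router.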

Example: a customer support bot doing 10K conversations/month went from ~$250/mo (everything pinned to Opus 4.6) to ~$40/mo with routing. Most conversations were FAQ-level questions that a smaller model handled fine.
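
The arithmetic behind this example can be checked with a short Python sketch. It uses only the approximate figures quoted above; the function and variable names are illustrative.

```python
def savings_summary(conversations: int, pinned_usd: float, routed_usd: float) -> dict:
    """Summarize monthly savings from the approximate figures in the post."""
    return {
        "saved_usd": pinned_usd - routed_usd,
        "saved_fraction": (pinned_usd - routed_usd) / pinned_usd,
        "per_conversation_pinned": pinned_usd / conversations,
        "per_conversation_routed": routed_usd / conversations,
    }


# ~10K conversations/month, ~$250/mo pinned vs ~$40/mo routed
summary = savings_summary(10_000, 250.0, 40.0)
print(f"saved about ${summary['saved_usd']:.0f}/mo "
      f"({summary['saved_fraction']:.0%})")
```

On these numbers the bill drops by roughly $210 a month, or about 84%, and the average cost per conversation falls from about $0.025 to about $0.004.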

Stack: Next.js, Vercel, Neon PostgreSQL, OpenRouter upstream. Hosting cost: ~$20/month.

We ran a head-to-head benchmark: same 15 prompts through Opus, GPT-4o, Gemini Pro, and the router. Simple tasks cost 66% less with routing. Complex tasks produced 2x more detailed output because the router picked specialized models per task type. Full data: https://dev.to/robinbanner/we-benchmarked-4-ai-api-strategie...

Architecture writeup: https://dev.to/robinbanner/inside-komilions-architecture-how... — there's a free tier if you want to try it.