Kronaxis Router – Don't pay frontier prices when a local LLM is enough
LLM cost routing with LoRA awareness when LiteLLM already handles basic proxying.
role-model is a protocol for assigning the right model for the right job. Use local and cloud AI together, or route between several cloud providers.
Protocol-first routing contract beats ad-hoc LiteLLM configs for hybrid AI deployments.
Engineers building hybrid local/cloud AI infrastructure
LiteLLM · LangChain · vLLM
role-model is mostly deterministic, with fallback to a controller model, that routes requests based on a chosen routing strategy. the protocol is structured around assigning domains and roles to models, where requests sent by consumer applications like Pi have task types to enrich routing metadata and thereby accuracy. you can to run the built-in benchmark to compare performance of models across speed, quality and cost, as well as observed performance on real tasks. I have a diagram on how routing works in [0].
The runtime supports local models, either directly to your local endpoint (LM Studio, llama.cpp etc), or routing between multiple local models via vendored llama-swap.
Since there was another model router post yesterday where people discussed the basics of routing, I will focus on discussing some of the interesting learnings I've made building and testing this:
1. Model routing is essentially trying to predict the future: which model will perform optimally (based on criteria defined by the user) on this request?
2. After you have routed the request, you want to evaluate if it was the right decision or if some other model would have performed better
3. You also realize that having the router assess difficulty (among other things) to make decisions by itself is far from ideal - we'd prefer to have the consumer application work with the router to define what the request needs
4. You also realize that it becomes much easier, decisions become much accurate, and the outcomes of routing becomes more impactful when there is more of a distinction between models
For point 2, I will be launching evals that you can run locally to benchmark models in your pool on the same requests. The outcomes here can then be used for point 1, as input when routing new requests.
For point 3, I've built the pi-role-model package for Pi, which lets the Pi agent inject role_model.intent metadata including difficulty, preferred roles or even specfic model ids, required capabilities (say tool use or image input) and so on. You should be able to customized this further in Pi, and route in additional ways by changing metadata. This is why I've also built the role-model routing protocol.
For point 4, what model routing really does as a second order effect is create a market for specialized models - models that may or may not be smaller, could be cheaper or more expensive, may be locally runnable. It makes little sense to route between two frontier models (GPT 5.5 and Opus 4.8); it makes more sense to route between models where one of the factors of quality, speed, cost is a multiple of the other candidate models, and it makes even more sense to have specialized domain models: code, prose, math and science, visuals and so on. It is at this stage model routing becomes really valuable.
While role-model has a reference runtime that I'm continuously building out (there's lots to do to improve routing, as well as give users more granular control over routing decisions, and also ways to improve cross-model caching and also add techniques like FastContext), the ultimate goal of role-model is for there to be a standard protocol for inference requests that is used by consumer applications, so that the provider, be it a router middleware or an inference provider, will be able to route to a model that strikes the best balance between cost, speed and quality and also respects user choices, and even lets the user control these preferences to use local models for some tasks and allow cloud for others.
Links:
[0] role-model - the case for a model routing protocol: https://try.works/role-model-the-case-for-a-model-routing-pr...
[1] GitHub: https://github.com/try-works/role-model
[2] Docs: https://role-model.dev/
LLM cost routing with LoRA awareness when LiteLLM already handles basic proxying.
Specialized routing logic for MoE models without a demo or benchmarks.
Smart local‑first routing that only escalates to expensive cloud planners when necessary is the standout idea — combined with per‑run cost accounting and full Ollama offline support it solves a real operational itch. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping + cache, MCP server mode) that feels useful for teams wanting a no‑friction orchestrator, but it’s playing in a crowded space of agent frameworks so the novelty is incremental rather than revolutionary.
LLM governance framework, but early-stage spec with no working code—Phase 0 skeleton promised.
Erlang actor model for agent messaging when most frameworks use REST APIs.
This stitches Arch-Router into Plano so OpenClaw traffic can be steered to different models by task preference — e.g., cheap k2.5 for calendar/email and Opus 4.6 for heavy app-building — which is a sensible, pragmatic way to shave inference costs without manual swapping. The demo looks usable (config.yaml + README + diagram) but stops at integration; I'd like to see performance/latency comparisons, failure handling and more real-world routing rules before I'd trust it in production.