GitHub Repository

role-model is a protocol for assigning the right model for the right job. Use local and cloud AI together, or route between several cloud providers.

9 stars · TypeScript

role-model, a router for hybrid local/cloud AI

by try-working·Jun 28, 2026·1 point·1 comment

AI Analysis

●●● Banger · Big Brain · Zero to One

Protocol-first routing contract beats ad-hoc LiteLLM configs for hybrid AI deployments.

Strengths
  • Protocol layer creates durable contract for routing decisions across tools
  • Observability artifacts record why each routing decision was made
  • Supports local-local, local-cloud, and cloud-cloud routing in one runtime
Weaknesses
  • Adoption depends on other tools implementing the role-model protocol
  • LiteLLM and LangChain already handle basic model routing without new protocol
Category
Target Audience

Engineers building hybrid local/cloud AI infrastructure

Similar To

LiteLLM · LangChain · vLLM

Post Description

Hey everyone, I'm launching role-model today: a routing protocol, a reference router runtime, and an extension for Pi that allows for better-informed routing decisions.

The role-model router is mostly deterministic, with fallback to a controller model, and routes requests based on a chosen routing strategy. The protocol is structured around assigning domains and roles to models; requests sent by consumer applications like Pi carry task types that enrich the routing metadata and thereby improve accuracy. You can run the built-in benchmark to compare models on speed, quality and cost, as well as on observed performance on real tasks. I have a diagram of how routing works in [0].
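To make the "deterministic first, controller as fallback" idea concrete, here is a minimal sketch. Every name in it (`ModelEntry`, `RoutedRequest`, `roleForTask`, `route`, the sample model ids) is hypothetical and not taken from the role-model spec; it only illustrates the shape of the decision.

```typescript
// Hypothetical shapes; none of these names come from the role-model protocol.
type ModelEntry = { id: string; domains: string[]; roles: string[] };
type RoutedRequest = { prompt: string; domain?: string; taskType?: string };
type ControllerFn = (req: RoutedRequest, candidates: ModelEntry[]) => string;

// Assumed mapping from a task type to the role that should handle it.
const roleForTask: Record<string, string> = {
  "code-edit": "coder",
  summarize: "writer",
};

function route(req: RoutedRequest, pool: ModelEntry[], controller: ControllerFn): string {
  const role = req.taskType ? roleForTask[req.taskType] : undefined;
  const candidates = pool.filter(
    (m) =>
      (role === undefined || m.roles.indexOf(role) !== -1) &&
      (req.domain === undefined || m.domains.indexOf(req.domain) !== -1),
  );
  // Deterministic path: the metadata narrows the pool to exactly one model.
  const first = candidates[0];
  if (candidates.length === 1 && first !== undefined) return first.id;
  // Ambiguous or empty match: defer to the controller model.
  return controller(req, candidates.length > 0 ? candidates : pool);
}

// Sample pool and a stand-in controller (a real one would call a model).
const samplePool: ModelEntry[] = [
  { id: "local-coder", domains: ["code"], roles: ["coder"] },
  { id: "cloud-general", domains: ["code", "prose"], roles: ["coder", "writer"] },
];
const stubController: ControllerFn = () => "controller-choice";

console.log(route({ prompt: "tl;dr this", taskType: "summarize" }, samplePool, stubController));
```

In this sketch, the richer the metadata the consumer sends, the more often the first branch is taken and the less often the controller has to guess.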

The runtime supports local models, either by connecting directly to your local endpoint (LM Studio, llama.cpp, etc.) or by routing between multiple local models via a vendored llama-swap.

Since there was another model router post yesterday where people discussed the basics of routing, I will focus on some of the interesting things I've learned while building and testing this:

1. Model routing is essentially trying to predict the future: which model will perform optimally (based on criteria defined by the user) on this request?

2. After you have routed the request, you want to evaluate if it was the right decision or if some other model would have performed better

3. You also realize that having the router assess difficulty (among other things) to make decisions by itself is far from ideal - we'd prefer to have the consumer application work with the router to define what the request needs

4. You also realize that routing becomes much easier, decisions become much more accurate, and the outcomes of routing become more impactful when there is a clearer distinction between models

For point 2, I will be launching evals that you can run locally to benchmark models in your pool on the same requests. The outcomes here can then be used for point 1, as input when routing new requests.
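As a rough illustration of how eval outcomes could feed back into routing, the sketch below averages per-model results for a task type and picks the best model under user-chosen weights. The record fields, the weighting scheme and the normalization are all assumptions made for this example, not the format of the upcoming evals.

```typescript
// Hypothetical record of one eval run; field names are illustrative only.
type EvalResult = { modelId: string; taskType: string; quality: number; latencyMs: number; costUsd: number };
type Weights = { quality: number; speed: number; cost: number };
type Summary = { modelId: string; quality: number; latencyMs: number; costUsd: number };

// Average each model's results for one task type.
function summarize(results: EvalResult[], taskType: string): Summary[] {
  const sums = new Map<string, { n: number; quality: number; latencyMs: number; costUsd: number }>();
  for (const r of results) {
    if (r.taskType !== taskType) continue;
    const s = sums.get(r.modelId) ?? { n: 0, quality: 0, latencyMs: 0, costUsd: 0 };
    s.n += 1;
    s.quality += r.quality;
    s.latencyMs += r.latencyMs;
    s.costUsd += r.costUsd;
    sums.set(r.modelId, s);
  }
  return Array.from(sums.entries()).map(([modelId, s]) => ({
    modelId,
    quality: s.quality / s.n,
    latencyMs: s.latencyMs / s.n,
    costUsd: s.costUsd / s.n,
  }));
}

// Pick the model with the best weighted score. Latency and cost are scaled
// against the worst observed value so that every term lies in [0, 1]
// (assuming quality is already reported in [0, 1]).
function bestModel(results: EvalResult[], taskType: string, w: Weights): string | undefined {
  const rows = summarize(results, taskType);
  if (rows.length === 0) return undefined;
  const maxLatency = Math.max(...rows.map((r) => r.latencyMs));
  const maxCost = Math.max(...rows.map((r) => r.costUsd));
  let best: { id: string; score: number } | undefined;
  for (const r of rows) {
    const speed = maxLatency > 0 ? 1 - r.latencyMs / maxLatency : 0;
    const cheapness = maxCost > 0 ? 1 - r.costUsd / maxCost : 0;
    const score = w.quality * r.quality + w.speed * speed + w.cost * cheapness;
    if (best === undefined || score > best.score) best = { id: r.modelId, score };
  }
  return best?.id;
}

// Made-up numbers for two models evaluated on the same requests.
const sampleEvals: EvalResult[] = [
  { modelId: "local-small", taskType: "code", quality: 0.7, latencyMs: 200, costUsd: 0 },
  { modelId: "cloud-large", taskType: "code", quality: 0.9, latencyMs: 1000, costUsd: 0.01 },
];
console.log(bestModel(sampleEvals, "code", { quality: 1, speed: 0, cost: 0 }));
```

The same eval data yields different winners under different weights, which is why the user-defined criteria from point 1 matter.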

For point 3, I've built the pi-role-model package for Pi, which lets the Pi agent inject role_model.intent metadata including difficulty, preferred roles or even specific model ids, required capabilities (say tool use or image input) and so on. You should be able to customize this further in Pi, and route in additional ways by changing the metadata. This is why I've also built the role-model routing protocol.
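A sketch of what such intent metadata could look like and how a router might act on it follows. Only the `role_model.intent` key and the kinds of information it carries come from the description above; the exact field names, the value sets, and the choice to treat model ids as a hard pin rather than a preference are assumptions for illustration.

```typescript
// Assumed shape for role_model.intent; the field names are hypothetical.
type Capability = "tool_use" | "image_input";
interface RoleModelIntent {
  difficulty?: "low" | "medium" | "high";
  preferred_roles?: string[];
  model_ids?: string[];
  required_capabilities?: Capability[];
}
interface ModelInfo {
  id: string;
  roles: string[];
  capabilities: Capability[];
}

// Attach the intent to an otherwise ordinary request body.
function withIntent(body: Record<string, unknown>, intent: RoleModelIntent) {
  return { ...body, role_model: { intent } };
}

// Drop models that miss a hard requirement, then list preferred roles first.
function eligible(pool: ModelInfo[], intent: RoleModelIntent): ModelInfo[] {
  const required = intent.required_capabilities ?? [];
  const pinned = intent.model_ids ?? [];
  const roles = intent.preferred_roles ?? [];
  const ok = pool.filter(
    (m) =>
      required.every((c) => m.capabilities.indexOf(c) !== -1) &&
      (pinned.length === 0 || pinned.indexOf(m.id) !== -1),
  );
  const rank = (m: ModelInfo) => (m.roles.some((r) => roles.indexOf(r) !== -1) ? 0 : 1);
  return ok.slice().sort((a, b) => rank(a) - rank(b));
}

const intentPool: ModelInfo[] = [
  { id: "local-small", roles: ["writer"], capabilities: [] },
  { id: "cloud-vision", roles: ["coder"], capabilities: ["image_input", "tool_use"] },
  { id: "local-coder", roles: ["coder"], capabilities: ["tool_use"] },
];
const sampleBody = withIntent(
  { messages: [{ role: "user", content: "Refactor this function" }] },
  { difficulty: "high", preferred_roles: ["coder"], required_capabilities: ["tool_use"] },
);
console.log(eligible(intentPool, sampleBody.role_model.intent).map((m) => m.id));
```

Because the consumer states what the request needs, the router no longer has to infer difficulty or capabilities from the prompt alone.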

For point 4, what model routing really does as a second-order effect is create a market for specialized models: models that may or may not be smaller, could be cheaper or more expensive, and may be locally runnable. It makes little sense to route between two frontier models (GPT 5.5 and Opus 4.8); it makes more sense to route between models where one of quality, speed or cost differs by a multiple between candidates, and it makes even more sense to have specialized domain models: code, prose, math and science, visuals and so on. It is at this stage that model routing becomes really valuable.
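One way to express the "differs by a multiple" heuristic is a simple ratio check across the three factors. This is a sketch only: the `ModelProfile` fields, the default threshold of 2, and the sample numbers are invented for illustration and do not describe real models.

```typescript
// Hypothetical per-model profile. Higher quality and tokensPerSec are better;
// lower costPerMTok is better. All numbers used below are made up.
type ModelProfile = { id: string; quality: number; tokensPerSec: number; costPerMTok: number };

// Ratio of the larger value to the smaller one, always >= 1.
function ratio(a: number, b: number): number {
  const lo = Math.min(a, b);
  const hi = Math.max(a, b);
  if (hi === 0) return 1; // both zero: no difference
  return lo === 0 ? Infinity : hi / lo;
}

// Two models are worth routing between when at least one factor
// differs by the given multiple.
function worthRoutingBetween(a: ModelProfile, b: ModelProfile, multiple = 2): boolean {
  return [
    ratio(a.quality, b.quality),
    ratio(a.tokensPerSec, b.tokensPerSec),
    ratio(a.costPerMTok, b.costPerMTok),
  ].some((r) => r >= multiple);
}

const frontierA: ModelProfile = { id: "frontier-a", quality: 0.92, tokensPerSec: 60, costPerMTok: 15 };
const frontierB: ModelProfile = { id: "frontier-b", quality: 0.9, tokensPerSec: 70, costPerMTok: 12 };
const localSmall: ModelProfile = { id: "local-small", quality: 0.7, tokensPerSec: 90, costPerMTok: 0.5 };

console.log(worthRoutingBetween(frontierA, frontierB)); // false: no factor differs by 2x
console.log(worthRoutingBetween(frontierA, localSmall)); // true: cost differs by 30x
```

Specialized domain models would need a per-domain quality score rather than the single `quality` number assumed here.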

While role-model has a reference runtime that I'm continuously building out (there's lots to do: improving routing, giving users more granular control over routing decisions, improving cross-model caching, and adding techniques like FastContext), the ultimate goal is a standard protocol for inference requests that consumer applications use. With it, the provider, be it a router middleware or an inference provider, can route to a model that strikes the best balance between cost, speed and quality while respecting user choices. Users could even control these preferences themselves, using local models for some tasks and allowing cloud for others.

Links:

[0] role-model - the case for a model routing protocol: https://try.works/role-model-the-case-for-a-model-routing-pr...

[1] GitHub: https://github.com/try-works/role-model

[2] Docs: https://role-model.dev/

Similar Projects

AI/ML · ●● Solid

Kronaxis Router – Don't pay frontier prices when a local LLM is enough

LLM cost routing with LoRA awareness when LiteLLM already handles basic proxying.

Big Brain · Solve My Problem
JasonDuke
202mo ago
AI/ML · ●● Solid

LLM-use – cost-effective LLM orchestrator for agents

Smart local‑first routing that only escalates to expensive cloud planners when necessary is the standout idea — combined with per‑run cost accounting and full Ollama offline support it solves a real operational itch. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping + cache, MCP server mode) that feels useful for teams wanting a no‑friction orchestrator, but it’s playing in a crowded space of agent frameworks so the novelty is incremental rather than revolutionary.

Niche Gem · Big Brain
justvugg
214mo ago

Preference-aware routing for OpenClaw via Plano

This stitches Arch-Router into Plano so OpenClaw traffic can be steered to different models by task preference — e.g., cheap k2.5 for calendar/email and Opus 4.6 for heavy app-building — which is a sensible, pragmatic way to shave inference costs without manual swapping. The demo looks usable (config.yaml + README + diagram) but stops at integration; I'd like to see performance/latency comparisons, failure handling and more real-world routing rules before I'd trust it in production.

Niche Gem · Solve My Problem
sparacha
104mo ago