Ranvier – Prefix-aware routing for LLM inference
Routes each LLM request to a GPU that already holds the KV cache for the request's prefix, skipping redundant prefill computation.
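The routing idea can be illustrated with a small sketch. It is not Ranvier's API or implementation. `Worker`, `route`, `prefix_block_keys` and `BLOCK_SIZE` are hypothetical names, and prompts are assumed to be token-ID lists split into fixed-size blocks. The router picks the worker whose cache matches the longest run of leading blocks and breaks ties by current load:

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical KV-cache block size, in tokens


def prefix_block_keys(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[tuple[int, ...]]:
    """One key per full block; each key is the entire prefix up to that block,
    because a block's KV entries depend on every earlier token."""
    n_full = len(tokens) // block_size
    return [tuple(tokens[: (i + 1) * block_size]) for i in range(n_full)]


@dataclass
class Worker:
    name: str
    load: int = 0  # hypothetical load signal: requests routed here (never decremented in this sketch)
    cached: set[tuple[int, ...]] = field(default_factory=set)  # prefix keys assumed resident in KV cache

    def matched_blocks(self, keys: list[tuple[int, ...]]) -> int:
        """Count leading blocks already cached; stop at the first miss."""
        count = 0
        for key in keys:
            if key not in self.cached:
                break
            count += 1
        return count


def route(workers: list[Worker], tokens: list[int]) -> Worker:
    keys = prefix_block_keys(tokens)
    # Longest cached prefix wins; among equal matches, prefer the lower load.
    best = max(workers, key=lambda w: (w.matched_blocks(keys), -w.load))
    best.load += 1
    best.cached.update(keys)  # assume the chosen worker now holds KV for this prompt (no eviction modeled)
    return best
```

Two prompts that share an 8-token system prefix land on the same worker, and an unrelated prompt goes to the idle one:

```python
workers = [Worker("gpu0"), Worker("gpu1")]
system = [1, 2, 3, 4, 5, 6, 7, 8]
route(workers, system + [9, 10, 11, 12]).name    # "gpu0"
route(workers, system + [20, 21, 22, 23]).name   # "gpu0" (2 blocks reused)
route(workers, [100, 101, 102, 103]).name        # "gpu1" (no match, lower load)
```

A production router would also need eviction, load decrements on completion, and cheaper hashed keys. Those parts are omitted here.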
Tier – Tier-based tool routing for AI agents of any size
Makes 1.5B-parameter models 10% more accurate by hiding 90% of tool descriptions from the model's context.
Developers building local AI agents or running models on edge devices
LangChain · LlamaIndex · LiteLLM
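The following sketch shows one way tiered tool exposure could hide most tool descriptions. It is not Tier's documented method. It assumes a hypothetical two-stage scheme. The model first sees only category names. After it picks one, it sees full descriptions only for that category's tools. `Tool`, `tier1_prompt`, `tier2_prompt`, `hidden_fraction` and the sample tools are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    category: str
    description: str


# Hypothetical tool registry, grouped by category.
TOOLS = [
    Tool("get_weather", "web", "Fetch the current weather for a city."),
    Tool("search_web", "web", "Run a web search and return top results."),
    Tool("read_file", "files", "Read a text file from the local disk."),
    Tool("write_file", "files", "Write text to a file on the local disk."),
    Tool("add_event", "calendar", "Create a calendar event."),
]


def tier1_prompt(tools: list[Tool]) -> str:
    """Stage 1: expose only category names, no tool descriptions."""
    categories = sorted({t.category for t in tools})
    return "Pick one category: " + ", ".join(categories)


def tier2_prompt(tools: list[Tool], category: str) -> str:
    """Stage 2: full descriptions, but only for tools in the chosen category."""
    return "\n".join(f"- {t.name}: {t.description}" for t in tools if t.category == category)


def hidden_fraction(tools: list[Tool], category: str) -> float:
    """Share of tool descriptions the model never sees in stage 2."""
    shown = sum(1 for t in tools if t.category == category)
    return 1 - shown / len(tools)
```

With five tools, choosing `files` shows two descriptions and hides the other three (60%). Larger registries split into more categories hide a larger share. The trade-off is that the small model must choose the right category before it sees the tool details.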
Information-density scoring beats semantic similarity for retrieval in scientific RAG.
LoRA-aware LLM cost routing, even though LiteLLM already handles basic proxying.
Facial recognition ensemble paper, not a shipped product or reproducible codebase.
Ancient Rome Q&A benchmark shows an 81-percentage-point accuracy lift but lacks evidence that it holds up against adversarial inputs.
The claimed 94% GPU reduction needs verifiable benchmarks to stand out.