I reduced LLM inference GPU calls by 94% using semantic routing
94% GPU reduction claim needs verifiable benchmarks to stand out.

Semantic routing with distance/direction/contrast predicates beats topic-based brokers for agents.
Engineers building multi-agent AI systems
Kafka · RabbitMQ · Redis Pub/Sub
Today we are launching Semantik, a message broker that routes by meaning instead of topics. Messages carry embeddings and metadata, and subscribers define what they care about using SemQL, a query language for high-dimensional space. SemQL has three predicates: distance (how similar a message is to a reference), direction (whether a message aligns with a concept), and contrast (similar to X but not Y). Semantik behaves like a vector database with a rolling window, paired with SQL.
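To make the three predicates concrete, here is a minimal Python sketch of one way they could be evaluated over embeddings. This is not SemQL syntax, which the post does not show. The function names, thresholds, and the choice of cosine similarity are all assumptions for illustration, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def distance(msg, anchor, max_dist):
    """Hypothetical 'distance' predicate: cosine distance to anchor within max_dist."""
    return 1.0 - cosine(msg, anchor) <= max_dist

def direction(msg, axis, min_align):
    """Hypothetical 'direction' predicate: message aligns with a concept axis."""
    return cosine(msg, axis) >= min_align

def contrast(msg, like, unlike, margin):
    """Hypothetical 'contrast' predicate: closer to `like` than `unlike` by a margin."""
    return cosine(msg, like) - cosine(msg, unlike) >= margin

# Toy embeddings (assumed, not from the post).
billing = [1.0, 0.0, 0.0]
outage = [0.0, 1.0, 0.0]
# One assumed way to build a concept axis: the difference of two anchors.
billing_not_outage = [b - o for b, o in zip(billing, outage)]

msg = [0.9, 0.1, 0.0]  # mostly about billing
print(distance(msg, billing, 0.1))             # close to the billing anchor
print(direction(msg, billing_not_outage, 0.5)) # points along the concept axis
print(contrast(msg, billing, outage, 0.5))     # like billing, unlike outage
```

A subscriber would combine such predicates into a filter, so that only messages passing the filter are delivered to it.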
The secret sauce behind OpenClaw is channels, which multiplex incoming messages into a running LLM conversation. Semantik lets agents jack into semantic namespaces and retrieve only the information they care about, skipping the rigid plumbing around traditional message brokers. It solves the "who needs to know" problem of coordinating between agents.
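The "who needs to know" idea can be sketched as a toy in-memory broker: each agent registers an anchor embedding, and a published message is delivered only to agents whose anchor is close enough. This is an illustrative sketch, not Semantik's API. `ToyBroker`, its methods, the window size, and the 2-dimensional embeddings are all hypothetical.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyBroker:
    """Hypothetical broker: rolling window of messages plus semantic subscriptions."""

    def __init__(self, window=100):
        # deque(maxlen=...) silently drops the oldest message when full.
        self.window = deque(maxlen=window)
        self.subs = {}  # agent name -> (anchor embedding, max cosine distance)

    def subscribe(self, agent, anchor, max_dist):
        self.subs[agent] = (anchor, max_dist)

    def _matches(self, embedding, agent):
        anchor, max_dist = self.subs[agent]
        return 1.0 - cosine(embedding, anchor) <= max_dist

    def publish(self, embedding, payload):
        """Store the message and return the agents that should receive it."""
        self.window.append((embedding, payload))
        return [agent for agent in self.subs if self._matches(embedding, agent)]

    def replay(self, agent):
        """Return payloads still in the window that match the agent's subscription."""
        return [p for e, p in self.window if self._matches(e, agent)]

broker = ToyBroker(window=2)
broker.subscribe("billing-agent", [1.0, 0.0], 0.2)
broker.subscribe("ops-agent", [0.0, 1.0], 0.2)

print(broker.publish([0.95, 0.05], "invoice overdue"))
print(broker.publish([0.05, 0.95], "disk full"))
print(broker.publish([1.0, 0.0], "refund request"))
print(broker.replay("billing-agent"))
```

No topic names appear anywhere: the publisher does not need to know which agents exist, and the window size bounds how far back a late subscriber can look.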
Feedback greatly appreciated! Reach us using the forms on the webpage.
Architecture diagrams and buzzwords without working code or a live demo to validate claims.
Multi-model LLM router with semantic cache, but caching and fallback already exist (Anthropic, LangSmith, Unify).
Semantic caching for LLM APIs already exists (Anthropic prompt caching, LangChain, Miniplex, vLLM); gateway routing is table stakes.
Direct agent-to-agent messaging beats manual tab-switching between Claude and Codex.
This stitches Arch-Router into Plano so OpenClaw traffic can be steered to different models by task preference — e.g., cheap k2.5 for calendar/email and Opus 4.6 for heavy app-building — which is a sensible, pragmatic way to shave inference costs without manual swapping. The demo looks usable (config.yaml + README + diagram) but stops at integration; I'd like to see performance/latency comparisons, failure handling and more real-world routing rules before I'd trust it in production.