Turn your Google accounts into a free, load-balanced LLM API gateway
Multi-account rotation with cooldowns beats single-account rate limits.
A fast, native-protocol LLM gateway with weighted pool composition and correct billing-vs-client failure handling.
Mid-request failover reroutes streaming responses before your client sees a byte.
Teams running production LLM applications with multiple vendor dependencies
LiteLLM · Portkey · Helicone
One thing led to another and I ended up writing Busbar: an LLM gateway, written in Rust (I have a thing for Rust lately). You point your existing OpenAI/Anthropic/Gemini SDK at it, change the model to a pool name, and that name now load-balances across the vendors. Your client code doesn't change and never learns it even happened.
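To make the pool idea concrete, here is a minimal sketch of how a pool name could spread requests across weighted backends. It uses smooth weighted round-robin, a common technique chosen here purely for illustration; it is not necessarily how Busbar picks a backend. The `Pool` and `Member` types, the backend names, and the weights are all hypothetical.

```rust
/// One backend inside a pool. Hypothetical type, not Busbar's real one.
#[derive(Debug)]
struct Member {
    name: &'static str,
    weight: i64,  // configured share of traffic (assumed non-negative)
    current: i64, // running score used by the algorithm
}

/// A named pool the client addresses instead of a concrete model.
struct Pool {
    members: Vec<Member>,
}

impl Pool {
    fn new(spec: &[(&'static str, i64)]) -> Self {
        let members = spec
            .iter()
            .map(|&(name, weight)| Member { name, weight, current: 0 })
            .collect();
        Pool { members }
    }

    /// Smooth weighted round-robin: add each member's weight to its score,
    /// pick the highest score, then subtract the total weight from the winner.
    /// Over time each member is chosen in proportion to its weight, without
    /// long bursts to a single backend.
    fn pick(&mut self) -> Option<&'static str> {
        let total: i64 = self.members.iter().map(|m| m.weight).sum();
        if total <= 0 {
            return None; // empty pool, or every weight is zero
        }
        for m in self.members.iter_mut() {
            m.current += m.weight;
        }
        let best = self.members.iter_mut().max_by_key(|m| m.current)?;
        best.current -= total;
        Some(best.name)
    }
}
```

With `Pool::new(&[("anthropic", 3), ("openai", 1)])`, repeated calls to `pick()` return "anthropic" three times for every one "openai". The client only ever sees the pool name.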
My central idea is "protocols, not providers". I implement six protocols - Anthropic, OpenAI, Gemini, Bedrock, Responses, Cohere - losslessly. You define a provider in three lines of YAML, mainly specifying the protocol that provider speaks.
Your client speaks a protocol in to Busbar and Busbar speaks a protocol out to the provider.
- Each protocol translates request and response, streamed and buffered, in both directions. Same-protocol calls pass through untouched; cross-protocol calls reconcile the awkwardness (a field one dialect requires and another makes optional).
- A circuit breaker that knows whose fault a failure is. It stops routing to a backend that's genuinely failing, but it won't penalize a model for a request that was simply too big (it retries on a larger-context model instead), and it won't blame a backend when the caller sent a bad request. A healthy model never gets pulled from rotation for something that wasn't its fault. These are all issues I have personally hit, and I wanted to fix them once in Busbar rather than ten times across ten applications.
- Hand-rolled AWS implementations, so I am not reliant on the AWS SDKs: SigV4 signing and a from-scratch AWS eventstream frame decoder for Bedrock.
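The fault-aware circuit breaker described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not Busbar's actual code: the `Fault`, `Action`, and `Breaker` names are hypothetical, the status-code mapping in `classify` is a guess at a reasonable policy, and time-based recovery (half-open probing) is left out to keep the example short.

```rust
/// Who is to blame for a failed upstream call. Hypothetical classification.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Fault {
    Backend,        // the provider is failing (5xx, rate limiting)
    Caller,         // the client sent a bad request
    ContextTooLong, // the request is valid but too big for this model
}

/// What the gateway does next.
#[derive(Debug, PartialEq)]
enum Action {
    FailOver,             // try another backend in the pool
    RetryOnLargerContext, // reroute to a model with a bigger window
    ReturnToCaller,       // surface the error; nothing upstream is wrong
}

/// Assumed mapping from an upstream response to a fault.
/// `context_overflow` stands in for parsing the provider's error body.
fn classify(status: u16, context_overflow: bool) -> Fault {
    if context_overflow {
        Fault::ContextTooLong
    } else if status == 429 || status >= 500 {
        Fault::Backend
    } else {
        Fault::Caller
    }
}

/// Per-backend breaker that only counts failures the backend caused.
struct Breaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl Breaker {
    fn new(threshold: u32) -> Self {
        Breaker { consecutive_failures: 0, threshold }
    }

    /// Open means: stop routing to this backend.
    fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Only `Fault::Backend` moves the breaker toward open.
    fn record_failure(&mut self, fault: Fault) -> Action {
        match fault {
            Fault::Backend => {
                self.consecutive_failures += 1;
                Action::FailOver
            }
            Fault::ContextTooLong => Action::RetryOnLargerContext,
            Fault::Caller => Action::ReturnToCaller,
        }
    }
}
```

The design point is that `record_failure` takes a `Fault` rather than a bare error: a malformed request or an oversized prompt produces an action but leaves the failure counter untouched, so a healthy backend stays in rotation.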
It's at 1.0.0-rc.2: feature-complete and API-stable, with release-candidate validation underway before 1.0.0. I have been using it on my projects and it's solving my problems nicely.
Solo project, AGPL-3.0. The AGPL choice is open to discussion; I know it matters for a request-path component.
Feedback is very welcome, particularly on where the translation might still be lossy in edge cases. Contributions and conversation are welcome too!
Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.
Reverse-engineers free Gemini API; smart quota rotation, but against Google's terms of service.
Unified API gateway for Ollama + vLLM with real-time GPU telemetry and drain mode.
Bifrost combines an OpenAI-compatible front door with adaptive load balancing, semantic caching, automatic failover, cluster mode and a built-in web UI — you can spin it up with npx or Docker in seconds. The performance claims (sub-100µs overhead at 5k RPS, '50x faster than LiteLLM') and multi-provider routing are the project's selling points; I want to see independent benchmarks and deeper docs on guardrails/provider quirks before trusting it for critical workloads.
LiteLLM already does this with more providers, more features, and way more maturity.