GitHub Repository

A generative AI load balancer and token accounting system.

10 stars · Python

AI load balancer and API translator

by sheneman42 · Mar 6, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem · Slick

Unified API gateway for Ollama + vLLM with real-time GPU telemetry and drain mode.
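Drain mode is a common load-balancer pattern: a draining backend finishes its in-flight requests but is given no new ones, so it can be taken offline cleanly. The Python sketch below shows that routing rule under stated assumptions. The `Backend` fields, the `NoBackendAvailable` exception, and the least-in-flight selection policy are hypothetical illustrations. The listing does not say how the gateway actually weighs GPU telemetry when it picks a node.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    kind: str                # hypothetical label, e.g. "ollama" or "vllm"
    models: set              # model names this backend can serve
    in_flight: int = 0       # requests currently running on this backend
    draining: bool = False   # draining: finish current work, accept nothing new


class NoBackendAvailable(Exception):
    """Raised when no non-draining backend serves the requested model."""


def pick_backend(backends, model):
    """Return the least-loaded backend that serves `model` and is not draining.

    Least-in-flight is an assumed policy for illustration; a real gateway
    might weigh GPU memory or utilization telemetry instead.
    """
    candidates = [b for b in backends if model in b.models and not b.draining]
    if not candidates:
        raise NoBackendAvailable(model)
    return min(candidates, key=lambda b: b.in_flight)


backends = [
    Backend("gpu1", "ollama", {"llama3"}, in_flight=2),
    Backend("gpu2", "vllm", {"llama3"}, in_flight=0, draining=True),
    Backend("gpu3", "vllm", {"llama3", "qwen"}, in_flight=1),
]
chosen = pick_backend(backends, "llama3")
print(chosen.name)  # gpu3, because gpu2 is idle but draining
```

Because the draining flag is checked before load, an idle node that is being drained is never chosen, while its existing requests are left to finish.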

Strengths
  • Fair-share scheduling with Deficit Round Robin + burst credits is a non-trivial queue discipline
  • A GPU sidecar agent with real-time telemetry per node and per backend is genuinely useful for operators
  • Dual dashboards (public status + authenticated admin), audit logging, and Azure AD SSO show production maturity
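Deficit Round Robin itself is a standard fair-queuing algorithm. Each active tenant earns a fixed quantum of credit per round and sends queued requests while their cost fits within its accumulated deficit. The Python sketch below assumes request cost is a token estimate. The `Tenant` class, its fields, and the way burst credits are spent (a one-off top-up on the tenant's first turn) are hypothetical illustrations, not this project's implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    quantum: int                 # credit added per round (assumed unit: tokens)
    burst_credits: int = 0       # hypothetical one-off extra credit
    deficit: int = 0             # unspent credit carried between rounds
    queue: deque = field(default_factory=deque)  # (request_id, cost) pairs


def drr_schedule(tenants, max_rounds=100):
    """Yield (tenant_name, request_id) in Deficit Round Robin order."""
    for _ in range(max_rounds):          # safety cap, e.g. if a quantum is 0
        if not any(t.queue for t in tenants):
            return
        for t in tenants:
            if not t.queue:
                continue                 # only active tenants earn credit
            # Add the per-round quantum; burst credits (an assumption here)
            # are spent once, on the tenant's first turn.
            t.deficit += t.quantum + t.burst_credits
            t.burst_credits = 0
            # Send head-of-line requests while their cost fits the deficit.
            while t.queue and t.queue[0][1] <= t.deficit:
                request_id, cost = t.queue.popleft()
                t.deficit -= cost
                yield t.name, request_id
            if not t.queue:
                t.deficit = 0            # classic DRR: an emptied queue forfeits leftover credit


tenants = [
    Tenant("A", quantum=100, queue=deque([("a1", 60), ("a2", 60), ("a3", 60)])),
    Tenant("B", quantum=100, burst_credits=50, queue=deque([("b1", 150), ("b2", 30)])),
]
order = list(drr_schedule(tenants))
print(order)
```

In this run, tenant B's 50 burst credits let its 150-token request go out in the first round instead of waiting for a second quantum. Resetting the deficit when a queue empties is what stops an idle tenant from banking credit and later starving the others.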
Weaknesses
  • Self-hosted LLM clusters serve a narrow audience; most orgs use OpenAI or Anthropic directly
  • Requires running Ollama/vLLM nodes yourself; there is no managed-service advantage
Target Audience

Teams running self-hosted LLM inference clusters who need unified API routing and quota management

Similar To

vLLM's built-in OpenAI API server · Ollama's REST API · LiteLLM proxy

Similar Projects

Infrastructure · ●● Solid

LLM-Gateway – Zero-Trust LLM Gateway

Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.

Big Brain · Solve My Problem
michaelquigley
712mo ago