Back to browse
Clusterflock: An AI orchestrator for networked hardware

Clusterflock: An AI orchestrator for networked hardware

by notum·Apr 4, 2026·3 points·0 comments

AI Analysis

●●SolidSolve My ProblemShip It

Hardware-aware model bin-packing across mixed GPUs beats manual vLLM config.

Strengths
  • Auto-profiles VRAM across mixed hardware: consumer GPUs, Mac, DGX in one cluster.
  • OpenAI-compatible API at port 1919 works with LangChain, LiteLLM, any SDK.
  • Native llama.cpp parallelism enables tight VRAM packing with multiple models per device.
Weaknesses
  • v0.7.10 suggests early stage, unclear scaling beyond small multi-GPU setups.
  • vLLM, Ray Serve, and Kubernetes solutions already handle distributed inference.
Target Audience

ML engineers, self-hosted AI enthusiasts, teams with multiple GPUs

Similar To

vLLM · Ray Serve · TGI

Post Description

Hi HN!

We built Clusterflock to solve our own headaches with managing AI agents across distributed setups, different VRAM and RAM allowances, and the need to easily try out new models.

While the focus on infrastructure (we built this specifically for networked hardware) it does ship with a powerful mission runner (or orchestrator), which is multi-session and asynchronous.

Here is what it does best:

Hardware-aware auto-downloading: It profiles your networked hardware and automatically pulls down the best models for your specific setup (currently only from HuggingFace).

Tight packing: Native parallelism via llama.cpp, you can allow it to fit multiple smaller models on same device.

It is fully open-source. We wanted a painless way to deploy agentic clusters, and we hope you find it useful too.

Website: https://clusterflock.net

Happy to hear feedback. Flocks very much given.

Similar Projects

Developer Tools●●Solid

Network-AI – A Distributed Mutex for AI Agent Swarms

Offers a very practical surface — a 2-line @lock("resource_id") decorator plus Redis or file-backed locks and timeout handling to avoid zombie agents. The project pairs that mutex model with a shared blackboard, 12 adapter integrations, and built-in AES-256/HMAC and rate-limiting, so it reads like an orchestration layer rather than just a lock. Impressive test coverage and adapters suggest solid engineering, though I'd want an explicit comparison to existing Redis lock patterns (Redlock) and more distributed-safety docs.

Solve My ProblemNiche Gem
jovanaccount
104mo ago