GitHub Repository

microVM orchestrators to boot from docker images, packed with L7 reverse proxy

31 stars · Go

Herd – Session-affine process pool for Go

by sankalpnarula·Mar 7, 2026·1 point·0 comments

AI Analysis

●●● Banger · Solve My Problem · Big Brain

1:1 session-to-process routing prevents context storms on stateful binaries—elegant constraint.

Strengths
  • Solves a real, underappreciated problem: stateful binary multi-tenancy without Kubernetes
  • Singleflight + session affinity approach is clever and minimal—zero dependencies
  • Auto-scaling, TTL eviction, health checks packed into tiny API surface
Weaknesses
  • Early-stage with only 31 stars; no production deployments or community validation yet
  • Requires users to understand session affinity pattern—niche mental model
Target Audience

Backend engineers running stateful binaries like Ollama, headless browsers, or REPLs at scale

Similar To

Kubernetes StatefulSets · Firecracker · HAProxy

Post Description

Hey HN,

Herd is a zero-dependency Go library that manages fleets of OS subprocesses and routes HTTP traffic to them with strict 1:1 session affinity.

If you put heavy, stateful binaries (like Ollama, headless Chromium, or Python REPLs) behind a standard reverse proxy and get a spike in traffic, it usually ends badly. You either trigger a massive CUDA/Metal context storm that OOM-kills the host machine, or you bleed state across different users' sessions.

Herd handles this without needing a heavy control plane like Kubernetes StatefulSets or Firecracker. It gives you automatic process lifecycle management and a built-in reverse proxy in about 10 lines of Go.

How it works under the hood:

- It spawns OS-level subprocesses via exec.Cmd.

- It routes incoming HTTP traffic based on any custom Session ID you define (a header, a cookie, a path parameter).

- If a session exists, it routes to that exact pinned OS process.

- If it doesn't, it safely acquires a singleflight lock, spawns a new process, waits for the /health endpoint, and proxies the request.

- If a process crashes, the blast radius is contained to one session, and the pool auto-recovers.
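The routing steps above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not Herd's actual API: `Pool`, `Acquire`, `Release`, `worker`, the `spawn` hook, and the `X-Session-ID` header and `session_id` cookie names are all hypothetical, and a `sync.Once` per session stands in for the singleflight lock. A real worker would wrap an `exec.Cmd` and wait on `/health` before serving.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
)

// worker stands in for a spawned subprocess (hypothetical type).
type worker struct{ id int }

// entry lets concurrent callers for one session share a single spawn.
type entry struct {
	once sync.Once
	w    *worker
	err  error
}

// Pool pins each session ID to exactly one worker.
type Pool struct {
	mu       sync.Mutex
	sessions map[string]*entry
	spawn    func(session string) (*worker, error) // assumed spawn hook
}

func NewPool(spawn func(string) (*worker, error)) *Pool {
	return &Pool{sessions: make(map[string]*entry), spawn: spawn}
}

// Acquire returns the worker pinned to session, spawning it at most once
// even if many requests for a new session arrive at the same time.
func (p *Pool) Acquire(session string) (*worker, error) {
	p.mu.Lock()
	e, ok := p.sessions[session]
	if !ok {
		e = &entry{}
		p.sessions[session] = e
	}
	p.mu.Unlock()

	// Only the first caller runs spawn; the rest block until it returns.
	e.once.Do(func() { e.w, e.err = p.spawn(session) })
	return e.w, e.err
}

// Release forgets a session, e.g. after its process crashes or its TTL
// expires, so the next request spawns a fresh worker.
func (p *Pool) Release(session string) {
	p.mu.Lock()
	delete(p.sessions, session)
	p.mu.Unlock()
}

// sessionID reads the session key from a header, then falls back to a cookie.
func sessionID(r *http.Request) string {
	if v := r.Header.Get("X-Session-ID"); v != "" {
		return v
	}
	if c, err := r.Cookie("session_id"); err == nil {
		return c.Value
	}
	return ""
}

func main() {
	var spawns int32
	pool := NewPool(func(session string) (*worker, error) {
		n := atomic.AddInt32(&spawns, 1)
		return &worker{id: int(n)}, nil
	})

	req, _ := http.NewRequest(http.MethodGet, "http://gateway.example/api", nil)
	req.Header.Set("X-Session-ID", "alice")
	session := sessionID(req)

	// 50 concurrent requests for the same new session trigger one spawn.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool.Acquire(session)
		}()
	}
	wg.Wait()
	fmt.Println("session:", session, "spawns:", atomic.LoadInt32(&spawns))
	// prints "session: alice spawns: 1"
}
```

Because each session owns its own entry, a failed or crashed worker only affects that one key; calling `Release` for it is enough to let the next request respawn.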

To test the concurrency constraints, I hurled 200 concurrent LLM inference requests at a Herd gateway backed by a pool capped at 10 Ollama (Qwen3:0.6B) workers on an M4 Pro Mac. All 200 requests completed (200/200) with none dropped: the gateway acted as a backpressure queue, drip-feeding requests to the workers without thrashing the host's unified memory.
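One common way to get this kind of queueing is a counting semaphore built from a buffered channel. The sketch below is an assumption about the general technique, not a description of Herd's internals. The hypothetical helper `runCapped` starts 200 tasks but lets at most 10 run at once, and reports how many finished and the peak concurrency it observed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runCapped starts `jobs` goroutines but allows at most `limit` of them to
// execute work() at the same time. Excess callers block on the semaphore,
// which is the backpressure. It returns the number of completed jobs and
// the highest concurrency observed.
func runCapped(jobs, limit int, work func()) (completed, peak int32) {
	slots := make(chan struct{}, limit) // one slot per allowed worker
	var inFlight int32
	var wg sync.WaitGroup

	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			slots <- struct{}{}        // wait here while all slots are taken
			defer func() { <-slots }() // free the slot when done

			n := atomic.AddInt32(&inFlight, 1)
			// Record the new peak if this is the highest concurrency so far.
			for {
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}

			work()

			atomic.AddInt32(&inFlight, -1)
			atomic.AddInt32(&completed, 1)
		}()
	}
	wg.Wait()
	return completed, peak
}

func main() {
	// Simulated inference call; a real gateway would proxy to a worker here.
	done, peak := runCapped(200, 10, func() { time.Sleep(time.Millisecond) })
	fmt.Printf("completed=%d peak=%d\n", done, peak)
}
```

Nothing is rejected in this design: requests beyond the cap simply wait, so the trade-off is added latency under load rather than dropped work.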

It’s MIT licensed. Would love for you to check out the repo, try to break the singleflight lock, or review the architecture.

Repo: https://github.com/HackStrix/herd

Architecture & Mermaid Diagrams: https://github.com/HackStrix/herd/blob/main/docs/ARCHITECTUR...
