Herd – Session-affine process pool for Go


by sankalpnarula · Mar 7, 2026 · 1 point · 0 comments


AI Analysis

●●● Banger · Solve My Problem · Big Brain

Strict 1:1 session-to-process routing prevents context storms on stateful binaries. An elegant constraint.

Strengths

• Solves a real, underappreciated problem: stateful binary multi-tenancy without Kubernetes
• Singleflight + session affinity approach is clever and minimal, with zero dependencies
• Auto-scaling, TTL eviction, and health checks packed into a tiny API surface

Weaknesses

• Early-stage with zero stars; no production deployments or community validation yet
• Requires users to understand the session affinity pattern, a niche mental model

Post Description

Hey HN,

Herd is a zero-dependency Go library that manages fleets of OS subprocesses and routes HTTP traffic to them with strict 1:1 session affinity.

If you put heavy, stateful binaries (like Ollama, headless Chromium, or Python REPLs) behind a standard reverse proxy and get a spike in traffic, it usually ends badly. You either trigger a massive CUDA/Metal context storm that OOM-kills the host machine, or you bleed state across different users' sessions.

Herd handles this without needing a heavy control plane like Kubernetes StatefulSets or Firecracker. It gives you automatic process lifecycle management and a built-in reverse proxy in about 10 lines of Go.

How it works under the hood:

- It spawns OS-level subprocesses via exec.Cmd.

- It routes incoming HTTP traffic based on any custom Session ID you define (a header, a cookie, a path parameter).

- If a session exists, it routes to that exact pinned OS process.

- If it doesn't, it safely acquires a singleflight lock, spawns a new process, waits for the /health endpoint, and proxies the request.

- If a process crashes, the blast radius is contained to one session, and the pool auto-recovers.

To test the concurrency constraints, I hurled 200 concurrent LLM inference requests at a Herd gateway backed by a pool capped at 10 Ollama (Qwen3:0.6B) workers on an M4 Pro Mac. All 200 requests completed (200/200) with zero dropped requests. The capped pool acted as a backpressure queue, drip-feeding requests to the workers without thrashing the host's unified memory.

It’s MIT licensed. Would love for you to check out the repo, try to break the singleflight lock, or review the architecture.

Repo: https://github.com/HackStrix/herd

Architecture & Mermaid Diagrams: https://github.com/HackStrix/herd/blob/main/docs/ARCHITECTUR...