I wrote an LLM inference engine in pure Go – 48 tok/s zero dependencies
Pure Go LLM inference, zero dependencies, 48 tok/s—genuinely novel for Go ecosystem.

In-process LLM inference in PHP beats the usual Python sidecar pattern.
PHP developers building AI features
llama.cpp · Ollama · LM Studio
Pure Go LLM inference, zero dependencies, 48 tok/s—genuinely novel for Go ecosystem.
Custom GGUF parser with mmap beats llama.cpp load times, but zero stars means unproven claims.
Pure Vulkan compute enables LLMs inside game loops without CUDA lock-in.
Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.
Unified API gateway for Ollama + vLLM with real-time GPU telemetry and drain mode.
Explicit kernel control over TVM-style black boxes, but benchmarks show mixed wins vs Transformers.js.