GitHub Repository

Golang inference engine and deep learning primitives

19 stars

I wrote an LLM inference engine in pure Go – 48 tok/s zero dependencies

by computerex·Mar 7, 2026·2 points·0 comments

AI Analysis

●●● Banger · Zero to One · Wizardry · Big Brain

Pure Go LLM inference with zero dependencies at 48 tok/s; genuinely novel for the Go ecosystem.

Strengths
  • Zero external dependencies (SIMD is optional) and a pure Go implementation lower deployment friction dramatically
  • A declarative architecture spec resolved at load time means adding a new model is mostly configuration, not a code rewrite
  • Covers 25+ quantization formats, Whisper, and multi-turn chat: serious breadth for a single developer
Weaknesses
  • ~48 tok/s on small models is significantly slower than llama.cpp; it won't replace llama.cpp for latency-critical apps
  • Only Apple Silicon and Linux support is mentioned; Windows support is unclear
Target Audience

Go developers needing local LLM inference without Python/C++ dependencies

Similar To

llama.cpp · Ollama · tinygrad

Post Description

dlgo is a pure Go deep learning inference engine. It loads GGUF models and runs them on CPU with no dependencies beyond the standard library (SIMD acceleration is optional via CGo).

I built this because I wanted to add local LLM inference to a Go project without shelling out to Python or linking against llama.cpp. The whole thing is go get github.com/computerex/dlgo and you're running models.

It supports LLaMA, Qwen 2/3/3.5, Gemma 2/3, Phi-2/4, SmolLM2, Mistral, and Whisper speech-to-text. Architectures are expressed as a declarative per-layer spec resolved at load time, so adding a new model family is mostly just describing its layer structure rather than writing a new forward pass.
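
One way to picture a declarative per-layer spec is the sketch below: each architecture is a small function that returns a slice of layer descriptions, and a single generic forward pass would then dispatch on those descriptions. Every name here (layerSpec, archSpec, registry, resolve, and the "uniform" and "alternating" architectures) is hypothetical and chosen for illustration; dlgo's actual types may look quite different.

```go
package main

import "fmt"

// attnKind selects the attention variant a layer uses (hypothetical).
type attnKind int

const (
	fullAttention attnKind = iota
	slidingWindow
)

// layerSpec describes one transformer block as plain data.
type layerSpec struct {
	Attn     attnKind
	Window   int  // only meaningful for slidingWindow
	GatedFFN bool // gated feed-forward vs. plain MLP
}

// archSpec turns model hyperparameters into per-layer specs at load time.
type archSpec func(nLayers int) []layerSpec

// registry maps an architecture name to its spec builder.
// Adding a model family means adding one entry here.
var registry = map[string]archSpec{
	"uniform": func(n int) []layerSpec {
		layers := make([]layerSpec, n)
		for i := range layers {
			layers[i] = layerSpec{Attn: fullAttention, GatedFFN: true}
		}
		return layers
	},
	"alternating": func(n int) []layerSpec {
		layers := make([]layerSpec, n)
		for i := range layers {
			if i%2 == 0 {
				layers[i] = layerSpec{Attn: slidingWindow, Window: 4096, GatedFFN: true}
			} else {
				layers[i] = layerSpec{Attn: fullAttention, GatedFFN: true}
			}
		}
		return layers
	},
}

// resolve looks up the architecture and expands it into concrete layers.
func resolve(arch string, nLayers int) ([]layerSpec, error) {
	build, ok := registry[arch]
	if !ok {
		return nil, fmt.Errorf("unknown architecture %q", arch)
	}
	return build(nLayers), nil
}

func main() {
	layers, err := resolve("alternating", 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, l := range layers {
		fmt.Printf("layer %d: %+v\n", i, l)
	}
}
```

The appeal of this shape is that the forward pass is written once against layerSpec, so a hybrid model only needs a new builder rather than a new execution path.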

Performance on a single CPU thread with Q4_K_M quantization: ~31 tok/s for LLaMA 3.2 1B, ~48 tok/s for Qwen3 0.6B, ~16 tok/s for Qwen3.5 2B (which has a hybrid attention + Gated Delta Network architecture). Not going to beat llama.cpp on raw speed, but it's fast enough to be useful and the ergonomics of a native Go library are hard to beat.

Supports 25+ GGML quantization formats (Q4_0 through Q8_0, all K-quants, I-quants, F16, BF16, F32). The GGUF parser, dequantization, tokenizer, forward pass, and sampling are all implemented from scratch.
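
As an illustration of what dequantization means for the simplest of these formats, the sketch below expands Q4_0 blocks into float32 weights. It follows the GGML Q4_0 layout (32 weights per block, one scale, two 4-bit values per byte, each offset by 8), with one simplification that is an assumption of this example: the scale is held as a float32, whereas the on-disk format stores it as a 16-bit float. The names q4Block and dequantizeQ40 are hypothetical and not taken from dlgo.

```go
package main

import "fmt"

const q4BlockSize = 32 // weights per Q4_0 block

// q4Block is a simplified Q4_0 block. The real format stores Scale as a
// 16-bit float; it is widened to float32 here to keep the sketch short.
type q4Block struct {
	Scale  float32
	Quants [q4BlockSize / 2]byte // two 4-bit values per byte
}

// dequantizeQ40 expands blocks into float32 weights.
// Within a block, low nibbles hold elements 0..15 and high nibbles hold
// elements 16..31. Each nibble is shifted by 8 into the range [-8, 7]
// and then multiplied by the block scale.
func dequantizeQ40(blocks []q4Block) []float32 {
	out := make([]float32, len(blocks)*q4BlockSize)
	for i, b := range blocks {
		base := i * q4BlockSize
		for j, q := range b.Quants {
			lo := int(q&0x0F) - 8
			hi := int(q>>4) - 8
			out[base+j] = float32(lo) * b.Scale
			out[base+j+q4BlockSize/2] = float32(hi) * b.Scale
		}
	}
	return out
}

func main() {
	var blk q4Block
	blk.Scale = 0.5
	for i := range blk.Quants {
		blk.Quants[i] = 0x88 // both nibbles are 8, which decodes to zero
	}
	blk.Quants[0] = 0xF0 // low nibble 0 -> -8, high nibble 15 -> +7

	w := dequantizeQ40([]q4Block{blk})
	fmt.Println(w[0], w[16]) // prints: -4 3.5
}
```

The K-quant and I-quant formats use more elaborate block layouts, but they follow the same basic pattern of per-block scales applied to packed integers.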

Code: https://github.com/computerex/dlgo

Similar Projects

AI/ML · ●● Solid

WayInfer – Native GGUF engine that runs models larger than your RAM

Custom GGUF parser with mmap beats llama.cpp load times, but zero stars means unproven claims.

Wizardry · Bold Bet
ahmedm24
102mo ago
Infrastructure · ●● Solid

LLM-Gateway – Zero-Trust LLM Gateway

Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.

Big Brain · Solve My Problem
michaelquigley
712mo ago