GitHub Repository

Zero-dependency C99 GPT-2 engine for edge AI. Sub-1M parameter models train on-device in seconds. Organelle Pipeline Architecture (OPA) coordinates specialised micro-models — 91% win rates on 11 logic games with 30K–160K parameters. Composition beats capacity.

106 stars · C

MicroGPT-C – C99 GPT for Edge Training and Tiny Model Pipelines

by Ajay__soni · Feb 23, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain

A C99 port of Karpathy's microgpt that argues tiny coordinated models can beat a single larger model on logic games.

Strengths
  • Composition-over-capacity insight: 30K–460K parameter organelles working together consistently outperform single models
  • Pure C99 with zero dependencies and on-device training in seconds; genuinely rare for GPT work
  • Validated on real signal (market regimes 57% accuracy) vs. noise (lottery 50% entropy floor)
Weaknesses
  • Limited to logic games and toy domains; unclear real-world utility beyond proof-of-concept
  • Constrained by C99 ecosystem; harder to iterate than Python or JAX for ML researchers
Category
Target Audience

C/C++ systems engineers, edge AI researchers, low-latency embedded ML specialists

Similar To

Karpathy's nanoGPT · llm.c

Post Description

TL;DR: Pure C99 GPT-2 engine, zero dependencies. Sub-1M-parameter “organelles” coordinate via a Planner-Worker-Judge pipeline and win up to 91 % of logic games against random opponents, all on CPU.

I’m a C/C++ architect focused on low-latency systems. Last year I tried building agentic pipelines with SLMs/LLMs and hit the usual wall: latency and orchestration overhead killed real-time edge use cases.

Initial research video: https://www.youtube.com/watch?v=q-rs9VZ1-0I

So I asked: how far can you push specialised logic at <1M parameters with nothing but local CPU?

MicroGPT-C is a from-scratch C99 port of Karpathy’s microgpt.py (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...). Zero deps, single-header, localised KV cache. Speed was never the goal (Andrej’s nanoGPT and llm.c already showed what’s possible). The real experiment was orchestration.

Organelle Pipeline Architecture (OPA): Agile-style Planners, Workers and Judges talking through tiny structured strings (board=XO_|valid=1,3) parsed by a safety-gated stack VM (3.7–5.8 M ops/s). A 64 K model needed 181 manual interventions; a 460 K model trained on those traces internalised everything and needed zero.
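
To make the wire format concrete, here is a minimal sketch of reading a message like board=XO_|valid=1,3 and gating a proposed move against it. This is not the repo's stack VM; it is a plain hand-written parser, and the names OpaMsg, opa_parse and opa_move_allowed are hypothetical. It also assumes that valid= carries a comma-separated list of legal move indices, which the post does not spell out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define OPA_MAX_BOARD 64   /* illustrative limits, not taken from the repo */
#define OPA_MAX_MOVES 16

/* Hypothetical message layout for "board=...|valid=..." strings. */
typedef struct {
    char board[OPA_MAX_BOARD];
    int  valid[OPA_MAX_MOVES];
    int  n_valid;
} OpaMsg;

/* Parse '|'-separated key=value fields. Unknown keys are ignored.
 * Returns false on malformed input or when no board field is present. */
static bool opa_parse(const char *s, OpaMsg *msg)
{
    bool have_board = false;
    memset(msg, 0, sizeof *msg);

    while (*s) {
        const char *end = strchr(s, '|');
        size_t len = end ? (size_t)(end - s) : strlen(s);
        const char *eq = memchr(s, '=', len);
        if (!eq) return false;                 /* field without '=' */

        size_t klen = (size_t)(eq - s);
        const char *val = eq + 1;
        size_t vlen = len - klen - 1;

        if (klen == 5 && memcmp(s, "board", 5) == 0) {
            if (vlen == 0 || vlen >= OPA_MAX_BOARD) return false;
            memcpy(msg->board, val, vlen);
            msg->board[vlen] = '\0';
            have_board = true;
        } else if (klen == 5 && memcmp(s, "valid", 5) == 0) {
            const char *p = val, *vend = val + vlen;
            while (p < vend) {                 /* comma-separated integers */
                char *next;
                long v = strtol(p, &next, 10);
                if (next == p) return false;   /* not a number */
                if (msg->n_valid == OPA_MAX_MOVES) return false;
                msg->valid[msg->n_valid++] = (int)v;
                p = next;
                if (p < vend) {
                    if (*p != ',') return false;
                    p++;
                }
            }
        }

        if (!end) break;
        s = end + 1;
    }
    return have_board;
}

/* Judge-style gate: accept a move only if the message lists it as valid. */
static bool opa_move_allowed(const OpaMsg *msg, int move)
{
    for (int i = 0; i < msg->n_valid; i++)
        if (msg->valid[i] == move) return true;
    return false;
}
```

Under these assumptions, a Worker's proposed move would be checked with opa_move_allowed() before it touches the board, so anything outside the listed moves is rejected rather than played.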

Beyond the research: fully auditable AI, great for education (~3 k lines of readable C), rapid prototyping, and embedded.

Personal itch: fraud/risk engines. I want agents that hunt “unknown-unknowns” in a sandbox where every decision is inspectable.

Happy to talk implementation, the 97 tests, 22 benchmarks, or anything else. FAQ: https://github.com/enjector/microgpt-c/blob/main/FAQ.md

Quick try (macOS/Linux/Windows):

  git clone https://github.com/enjector/microgpt-c && cd microgpt-c
  mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release && cmake --build .
  ./connect4_demo   # 460 K params, ~21 min train, 88 % win rate vs random

Performance (Apple M2 Max):
  • 4.2 K params names: 685 k tok/s train, 110 k tok/s infer
  • 841 K Shakespeare char: 28 k / 16 k tok/s
  • 510 K Shakespeare word: 12.5 k / 40 k tok/s

Full leaderboard (11 games), market-regime experiment (57 % holdout = 2.8× baseline), and the book PDF: https://github.com/enjector/microgpt-c/blob/main/docs/book/M...

GitHub: https://github.com/enjector/microgpt-c
