Andrej Karpathy's microgpt.py to C99 microgpt.c – 4,600x faster
Pure C99 GPT with SIMD beats Python 4,600x; drop two files into any project.

Karpathy's microgpt ported to TypeScript, favoring readable code over the brevity Olympics.
ML students, educational content creators, browser-based ML enthusiasts
Karpathy's microgpt (Python original) · TensorFlow.js · ONNX.js
You can try it in the playground: https://microgpt-ts.vercel.app/playground
There are preset datasets (baby names, Pokemon, company names, movie titles, etc.) or you can paste your own text. The playground shows live loss curves as the model trains, and you can step through generation one token at a time to see the probability distribution at each step.
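To make the "probability distribution at each step" concrete, here is a minimal sketch of how one generation step could turn raw model scores into a sampled token. The names `softmax` and `sampleIndex`, and the temperature parameter, are assumptions for illustration and are not taken from the playground's code.

```typescript
// Illustrative sketch only; function names are hypothetical, not from microgpt-ts.

/** Turn raw logits into probabilities; a lower temperature sharpens the distribution. */
function softmax(logits: number[], temperature: number = 1): number[] {
  const scaled = logits.map((x) => x / temperature);
  const maxLogit = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

/** Draw one index from a probability distribution by walking its cumulative sum. */
function sampleIndex(probs: number[], rand: () => number = Math.random): number {
  const r = rand();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i] ?? 0;
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}
```

Passing a fixed `rand` (for example `() => 0.5`) makes a step reproducible, which is handy when stepping through generation by hand.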
One difference from Karpathy's original is style. His microgpt is a single Python script optimized for brevity. This version splits the code into a few small files, types everything, and uses named helper functions (dotProduct, transpose, mean) instead of terse one-liners. The tradeoff is a bit more code, but it's easier to read and follow.
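For a sense of what such named helpers might look like, here is a rough sketch. Only the names `dotProduct`, `transpose`, and `mean` come from the description above; the signatures, the `Vector` and `Matrix` aliases, and the error handling are assumptions and may differ from the actual repo.

```typescript
// Hypothetical signatures: the helper names are from the post, the bodies are a guess.
type Vector = number[];
type Matrix = number[][];

/** Sum of element-wise products of two equal-length vectors. */
function dotProduct(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  return a.reduce((sum, ai, i) => sum + ai * (b[i] ?? 0), 0);
}

/** Swap rows and columns of a rectangular matrix. */
function transpose(m: Matrix): Matrix {
  const first = m[0];
  if (first === undefined) return [];
  return first.map((_, col) => m.map((row) => row[col] ?? 0));
}

/** Arithmetic mean; throws on empty input rather than returning NaN. */
function mean(xs: Vector): number {
  if (xs.length === 0) throw new Error("mean of empty vector");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
```

Each helper costs a few more lines than an inline one-liner, but a call site such as `dotProduct(query, key)` says what it does without a comment.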
I built it up following the same progression as the blog post: bigram count table, then MLP with manual gradients, then autograd, single-head attention, multi-head + layer loop, and finally Adam. Each step is a separate PR and tag on GitHub [2] so you can follow along or check out any snapshot.
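The first step in that progression, a bigram count table, can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the tagged snapshot: the `"."` boundary marker and the names `bigramCounts` and `nextCharProbs` are hypothetical.

```typescript
// Assumption: "." marks both the start and the end of a word.
const BOUNDARY = ".";

// counts[prev][next] = how often `next` followed `prev` in the training words.
type CountTable = Record<string, Record<string, number>>;

function bigramCounts(words: string[]): CountTable {
  const counts: CountTable = {};
  for (const word of words) {
    let prev = BOUNDARY;
    for (const ch of word.split("").concat(BOUNDARY)) {
      const row = counts[prev] ?? (counts[prev] = {});
      row[ch] = (row[ch] ?? 0) + 1;
      prev = ch;
    }
  }
  return counts;
}

/** Normalize one row of the table into next-character probabilities. */
function nextCharProbs(counts: CountTable, prev: string): Record<string, number> {
  const row = counts[prev] ?? {};
  const chars = Object.keys(row);
  const total = chars.reduce((sum, ch) => sum + (row[ch] ?? 0), 0);
  const probs: Record<string, number> = {};
  for (const ch of chars) probs[ch] = (row[ch] ?? 0) / total;
  return probs;
}
```

Generating a name from this table means starting at the boundary marker and repeatedly sampling from `nextCharProbs` until the boundary comes up again; the later steps replace the table with learned parameters.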
Explains attention mechanisms to five-year-olds while building LLaMA 3 from scratch.
Tiny footprint (~500 lines vs 400k for OpenClaw), but the local AI assistant space is crowded.
Tiny auditable AI agent: read the whole thing over coffee, modify it yourself.
Type a name and you can watch characters turn into IDs, 16-dim embeddings get added to positional encodings, and causal attention matrices animate per head, all matched numerically to Karpathy's 244-line microGPT. The implementation is pure TypeScript (no ML libs) and includes a scrollable sidebar with the reference math, which makes it a low-friction learning tool: more pedagogical deep dive than research innovation.
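The pipeline described there (characters to IDs, token plus positional embeddings, causal attention) can be sketched as follows. This is a simplified illustration, not the visualizer's code: all function names are hypothetical, the vocabulary is assumed to be the sorted unique characters, and the attention uses the standard scaled dot-product form with a single head and no learned projections.

```typescript
/** Assumed vocabulary: the sorted unique characters of the training text. */
function buildVocab(text: string): string[] {
  const seen: Record<string, true> = {};
  for (const ch of text.split("")) seen[ch] = true;
  return Object.keys(seen).sort();
}

/** Map each character to its index in the vocabulary. */
function encode(text: string, vocab: string[]): number[] {
  return text.split("").map((ch) => {
    const id = vocab.indexOf(ch);
    if (id < 0) throw new Error(`unknown character: ${ch}`);
    return id;
  });
}

/** Look up a token vector per ID and add the vector for its position. */
function embed(ids: number[], tokenTable: number[][], positionTable: number[][]): number[][] {
  return ids.map((id, pos) => {
    const tok = tokenTable[id];
    const posVec = positionTable[pos];
    if (tok === undefined || posVec === undefined) throw new Error("index out of range");
    return tok.map((t, d) => t + (posVec[d] ?? 0));
  });
}

/** Row i holds softmax(q_i . k_j / sqrt(d)) over j <= i, and 0 for future positions. */
function causalAttentionWeights(q: number[][], k: number[][]): number[][] {
  return q.map((qi, i) => {
    // Query i may only look at keys 0..i; later keys are masked out.
    const scores = k.slice(0, i + 1).map(
      (kj) => qi.reduce((sum, x, d) => sum + x * (kj[d] ?? 0), 0) / Math.sqrt(qi.length)
    );
    const maxScore = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - maxScore));
    const total = exps.reduce((a, b) => a + b, 0);
    const weights = exps.map((e) => e / total);
    return k.map((_, j) => weights[j] ?? 0);
  });
}
```

Each row of the resulting matrix sums to 1 and is zero above the diagonal, which is the lower-triangular pattern an animated causal attention matrix displays.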
Karpathy's microgpt in C99, proves tiny coordinated models beat single large models on logic.