How-to-Train-Your-GPT
Build a LLaMA-style model from scratch with zero ML prerequisites or math.
This repository contains the code for all the book that I am writing `My adventures with LLM` book.
Loads real Meta and OpenAI weights, not just training from scratch.
ML engineers and students learning LLM internals
Andrej Karpathy's nanoGPT · Hugging Face Course · The Annotated Transformer
The progression:
- Ch1: Vanilla encoder-decoder transformer (English to Hindi translation) - Ch2: GPT-2 124M from scratch, loads real OpenAI pretrained weights - Ch3: Llama 3.2-3B by swapping 4 components of GPT-2 (LayerNorm to RMSNorm, learned PE to RoPE, GELU to SwiGLU, MHA to GQA), loads Meta's pretrained weights - Ch4: KV cache, MQA, GQA (inference optimisation) - Ch5: DeepSeek MLA (absorption trick, decoupled RoPE), DeepSeekMoE, Multi-Token Prediction, FP8 quantisation
All code is open source: https://github.com/S1LV3RJ1NX/mal-code
The book provides the explanations, derivations, diagrams, and narrative: https://leanpub.com/adventures-with-llms (free sample available)
I wrote it because most resources stop at GPT-2 and I wanted something that covered what's actually in production models today. Happy to answer questions about any of the implementations.
Build a LLaMA-style model from scratch with zero ML prerequisites or math.
Another transformer-from-scratch tutorial in a saturated space.
Yet another ML-from-scratch teaching repo in a crowded educational space.
Multi-model debates with synthesis, but MCP servers already chain APIs.
Claude debates GPT and Gemini in parallel rounds; costs $0.02–0.05 per brainstorm.
LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.