GitHub Repository

This repository contains all the code for the book I am writing, `My adventures with LLM`.

27 stars · Python

A book that builds GPT-2, Llama 3, DeepSeek from scratch in PyTorch

by s1lv3rj1nx·Apr 15, 2026·2 points·1 comment

AI Analysis

●● Solid · Niche Gem · Big Brain

Loads real Meta and OpenAI weights, not just training from scratch.

Strengths
  • Progressive architecture swaps from GPT-2 to Llama 3 show exactly what changed
  • DeepSeek MLA, MoE, and FP8 quantization coverage is current and rare
  • All code is runnable PyTorch, no pseudocode or hand-waving
Weaknesses
  • Educational LLM content is crowded: Karpathy's material and the Hugging Face courses already exist
  • Book is paid on Leanpub, limiting reach compared to free alternatives
Target Audience

ML engineers and students learning LLM internals

Similar To

Andrej Karpathy's nanoGPT · Hugging Face Course · The Annotated Transformer

Post Description

I'm a software engineer who works with LLMs professionally (Forward Deployed Engineer at TrueFoundry). Over the past year I built up implementations of five LLM architectures from scratch and wrote a book around them.

The progression:

- Ch1: Vanilla encoder-decoder transformer (English to Hindi translation)
- Ch2: GPT-2 124M from scratch, loads real OpenAI pretrained weights
- Ch3: Llama 3.2-3B by swapping 4 components of GPT-2 (LayerNorm to RMSNorm, learned PE to RoPE, GELU to SwiGLU, MHA to GQA), loads Meta's pretrained weights
- Ch4: KV cache, MQA, GQA (inference optimisation)
- Ch5: DeepSeek MLA (absorption trick, decoupled RoPE), DeepSeekMoE, Multi-Token Prediction, FP8 quantisation
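
To make the first of the Ch3 swaps concrete, here is a minimal sketch of LayerNorm next to RMSNorm. It is plain, dependency-free Python for illustration only, not the repository's PyTorch code; the function names `layer_norm` and `rms_norm`, the list-based inputs, and the `eps` default are all assumptions.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm over one vector: subtract the mean, divide by the std, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm over one vector: divide by the root mean square, then scale.

    Unlike LayerNorm there is no mean subtraction and no shift (beta) term.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gamma)]


if __name__ == "__main__":
    # Hypothetical 4-dimensional activation vector with identity scale and zero shift.
    x = [1.0, 2.0, 3.0, 4.0]
    ones, zeros = [1.0] * 4, [0.0] * 4
    print("LayerNorm:", layer_norm(x, ones, zeros))
    print("RMSNorm:  ", rms_norm(x, ones))
```

The two functions differ only in the centering step and the shift term: LayerNorm output has zero mean, while RMSNorm only rescales the vector so that its mean square is close to 1.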

All code is open source: https://github.com/S1LV3RJ1NX/mal-code

The book provides the explanations, derivations, diagrams, and narrative: https://leanpub.com/adventures-with-llms (free sample available)

I wrote it because most resources stop at GPT-2 and I wanted something that covered what's actually in production models today. Happy to answer questions about any of the implementations.

Similar Projects

Education · ●●● Banger

How-to-Train-Your-GPT

Build a LLaMA-style model from scratch with zero ML prerequisites or math.

Cozy · Big Brain
RaiyanYahya
101mo ago
AI/ML · ●● Solid

PyTorch on Java

LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.

Niche Gem · Big Brain
pdsminer
202d ago