GitHub Repository

GPT-2-style LLM built from scratch in C/CUDA with hand-written backprop, BPE tokenizer, FlashAttention, pretraining, and SFT.

0 stars · Cuda

NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

by vforno·Jun 19, 2026·2 points·0 comments

AI Analysis

●● Solid · Wizardry · Big Brain

Hand-written FlashAttention and full gradient checks in pure CUDA with no PyTorch.

Strengths
  • Complete from-scratch pipeline: tokenizer, pretraining, and SFT in one repo
  • CPU reference implementation validates CUDA gradients via full-model check
  • Residual blocks explained as Forward-Euler ODE discretization
Weaknesses
  • A 116M-parameter model trained on a single GPU produces fluent but shallow output
  • Educational LLM-from-scratch projects already exist (e.g. nanoGPT)
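The Forward-Euler reading of residual blocks listed under Strengths rests on one observation: the residual update x ← x + f(x) is a Forward-Euler step of size h = 1 for the ODE dx/dt = f(x). The sketch below illustrates that correspondence only. `toy_block`, `residual_step`, `residual_stack`, and `DIM` are hypothetical names, and the block function f(x) = -x stands in for the attention and MLP sublayers so the result can be compared with the exact solution exp(-t).

```c
#include <stddef.h>

#define DIM 4  /* hypothetical hidden size for the illustration */

/* Stand-in for a transformer sublayer: f(x) = -x, so dx/dt = -x. */
void toy_block(const double *x, double *out) {
    for (size_t i = 0; i < DIM; i++) out[i] = -x[i];
}

/* One residual update x <- x + h * f(x).
 * With h = 1 this is the ordinary residual connection. */
void residual_step(double *x, double h) {
    double fx[DIM];
    toy_block(x, fx);
    for (size_t i = 0; i < DIM; i++) x[i] += h * fx[i];
}

/* A stack of `layers` residual blocks is `layers` Euler steps,
 * integrating the ODE from t = 0 to t = layers * h. */
void residual_stack(double *x, size_t layers, double h) {
    for (size_t l = 0; l < layers; l++) residual_step(x, h);
}
```

Starting from x = 1 and running `residual_stack(x, 100, 0.01)` integrates to t = 1 and lands close to exp(-1), as expected of a first-order method. A real network fixes h = 1 and learns f instead.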
Category
Target Audience

ML engineers and students learning transformer internals

Similar To

nanoGPT · llm.c · Karpathy's implementations

Similar Projects

AI/ML · ●●● Banger

MicroGPT-C – C99 GPT for Edge Training and Tiny Model Pipelines

Karpathy's microgpt in C99, proves tiny coordinated models beat single large models on logic.

Wizardry · Big Brain
Ajay__soni
103mo ago

Education · ●●● Banger

How-to-Train-Your-GPT

Build a LLaMA-style model from scratch with zero ML prerequisites or math.

Cozy · Big Brain
RaiyanYahya
101mo ago