GitHub Repository

GPT-2-style LLM built from scratch in C/CUDA with hand-written backprop, BPE tokenizer, FlashAttention, pretraining, and SFT.

7 stars · Cuda

NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

by vforno · Jun 28, 2026 · 20 points · 3 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Niche Gem

Hand-written CUDA kernels and backprop with no PyTorch dependency.

Strengths
  • Full gradient check validation against CPU reference implementation
  • Hand-written FlashAttention and byte-level BPE tokenizer from scratch
  • Complete training pipeline (pretraining and SFT) at ~116M parameters
Weaknesses
  • Educational artifact, not a capable assistant, per the author's own assessment
  • Single-GPU training limits scale compared to production models
Target Audience

ML engineers and students wanting to understand LLM internals

Similar To

nanoGPT · llama2.c · Karpathy's educational implementations

Post Description

Hi everyone,

I started working on NanoEuler after the ban of Anthropic's fable, because my ambition and dream is to work in the AI field at Anthropic. Two reasons led me to create NanoEuler: (1) interfacing with LLMs does not mean understanding how they are built, and (2) working on an LLM at a very low level lets me understand the relationship between parameters, data, and model growth, how the GPU works, and how some layers can be optimized.

So I approached it as research, growing NanoEuler step by step, starting from Shakespeare.txt and studying what a text generation model learns at 23 million parameters. For example, at that size NanoEuler had learned that "Name:" starts a line, and it followed it with a line that made sense.

I wrote everything in CUDA because I did not want any intermediary between the model and what it had to do during training and inference. Adding SFT and much more, even in small ways, was really useful for understanding the various steps needed to turn an LLM into a chatbot.

Any feedback, help, or suggestions are absolutely welcome!

Similar Projects

AI/ML · ●●● Banger

MicroGPT-C – C99 GPT for Edge Training and Tiny Model Pipelines

Karpathy's microgpt in C99, proves tiny coordinated models beat single large models on logic.

Wizardry · Big Brain
Ajay__soni
104mo ago

Education · ●●● Banger

How-to-Train-Your-GPT

Build a LLaMA-style model from scratch with zero ML prerequisites or math.

Cozy · Big Brain
RaiyanYahya
101mo ago