TRiP – a complete transformer engine in C built from scratch just by me
From-scratch C transformer engine with training and vision, built by one person.
A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.
Unified video and image model trained from scratch on fewer than 128 GPUs.
ML researchers and engineers experimenting with efficient multimodal models
Show-o · Emu3 · Chameleon
- Homepage: https://lance-project.github.io/
- Paper: https://arxiv.org/abs/2605.18678
- Model: https://huggingface.co/bytedance-research/Lance
P.S. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.
From-scratch C transformer engine with training and vision, built by one person.
Karpathy's microgpt in C99; argues that tiny coordinated models beat a single large model on logic.
This is a practical, no-nonsense play: someone trained YOLOX from scratch, released MIT-licensed weights, and packaged a path toward running it on iOS. The value is procedural (dataset curation, a training recipe, and an export/convert-for-iOS pipeline) rather than a conceptual breakthrough. I'd like to see clear mAP numbers, model size, and on-device latency benchmarks before recommending it for production.
Build a LLaMA-style model from scratch with zero ML prerequisites or math.
Train a working LLM in 5 minutes on free Colab with a fish personality.
Native ternary training beats post-training quantization for memory efficiency.