I built a tiny LLM to demystify how language models work
Train a working LLM in 5 minutes on free Colab with a fish personality.
Running a large language model on a PlayStation 2
Streams LLM weights from CD-ROM during inference to fit 77MB models in 32MB RAM.
Embedded AI developers, retro computing enthusiasts, ML engineers
llama.cpp · MLC-LLM · TinyLLM
The Emotion Engine has 32 MB of RAM total, so the trick is streaming weights from CD-ROM one matrix at a time during the forward pass — only activations, KV cache and embeddings live in RAM. This means models bigger than the RAM can still run, they just read more from disc.
Had to build a custom quantized format (PSNT), hack endianness, write a tokenizer pipeline, and most of the PS2 SDK from scratch (releasing that separately). The model itself is also custom — a 10M param Llama-style architecture I trained specifically for this.
And it works. On real hardware.
Train a working LLM in 5 minutes on free Colab with a fish personality.
Smart context window solution, but LLM-based summarization has its own failure modes.
Smart LLM routing cuts costs, but competing against established OpenRouter and vLLM ecosystems.
Native multilingual training covers GDPR Article 9 categories others skip.
Explains transformers to an 11-year-old with interactive playgrounds better than most courses.
Prefix notation language that cuts LLM token usage by 70% compared to Python or C.