Tiny-vLLM – high-performance LLM inference engine in C++ and CUDA
Builds vLLM from scratch with PagedAttention kernels when llama.cpp already exists.

Classroom-friendly Ollama wrapper, but it's a setup script plus a Medium article, not a product.
Computer science teachers and students learning AI fundamentals
Ollama (core tech) · Hugging Face educational resources · Local-first AI classroom setups
Lesson plan included on the GitHub page. Put students in front of Kali and have them follow the directions!
Thanks for looking.
S2 stream protocol enables agent-to-agent chat without central server middleware.
Type a name and watch characters turn into IDs, 16-dim embeddings get added to positional encodings, and causal attention matrices animate per head, all matched numerically to Karpathy's 244-line microGPT. The implementation is pure TypeScript (no ML libs) and includes a scrollable sidebar with the reference math, which makes this an excellent, low-friction learning tool: more pedagogical deep dive than research innovation.
Funny satire on AI hype, but literally just a script to waste money.
Another C-based scripting language when Lua, Python, and dozens of others already exist.
Generates setup scripts for GitHub repos, but Devbox and Nix already solve this.