1gbps Tokenizer written in Assembly. 20x faster than HuggingFace
Handwritten assembly tokenizer claiming 20x speedup over HuggingFace on SSE2.

Runs 32B models on 8GB VRAM by streaming weights, beating vLLM's memory constraints.
ML engineers deploying LLMs on consumer hardware or cost-sensitive infrastructure
vLLM · TGI · ExLlamaV2
Handwritten assembly tokenizer claiming 20x speedup over HuggingFace on SSE2.
SIEVE cache beats LRU with one-line swap, but only matters if you're bottlenecked on cache.
SSD-backed KV cache cuts coding agent TTFT from 90s to 1s, packed in a native macOS app.
CVE scanning for cached packages beats plain disk cleanup tools.
Cache-aware execution cuts eval costs while tracking grounding and relevance metrics.
Binary JSON with table reuse, but CBOR and MessagePack already own this space.