1gbps Tokenizer written in Assembly. 20x faster than HuggingFace
Handwritten assembly tokenizer claiming 20x speedup over HuggingFace on SSE2.

20x faster MoE inference on existing hardware with hash-verified output correctness.
AI infrastructure engineers, ML researchers, hyperscalers
cuBLAS · vLLM · TensorRT-LLM
Handwritten assembly tokenizer claiming 20x speedup over HuggingFace on SSE2.
20x faster knip—performance leap is real, but dependency linters are crowded and knip already solved this.
450k context on 32GB VRAM using turboquant KV cache compression.
Explicit kernel control over TVM-style black boxes, but benchmarks show mixed wins vs Transformers.js.
Turns codebase scanning into assigned tasks with manager progress tracking.
Native Swift inference with SSD streaming runs 100B MoE models without kernel panics.