GitHub Repository

A hyper-optimized tokenizer written in handwritten assembly. Made for SSE2 CPU architectures.

5 stars · Assembly

1gbps Tokenizer written in Assembly. 20x faster than HuggingFace

by dogmaticdev·Apr 25, 2026·3 points·2 comments

AI Analysis

●● Solid · Wizardry · Niche Gem

Handwritten assembly tokenizer claiming 20x speedup over HuggingFace on SSE2.

Strengths
  • Handwritten assembly implementation offers extreme low-level control over CPU instructions.
  • SSE2 is part of the x86-64 baseline, so it runs on virtually any x86 hardware from the last two decades.
  • Claims 1gbps throughput, which could significantly reduce preprocessing latency in pipelines.
Weaknesses
  • Tokenization is rarely the bottleneck compared to model inference and memory bandwidth.
  • SSE2 limits performance on modern CPUs with AVX2 or AVX-512 instructions available.
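
To put the throughput claim in perspective, here is a rough back-of-envelope sketch. It is not a benchmark of the project. The 10 GB corpus size is a hypothetical input, and because the listing does not say whether "1gbps" means gigabits or gigabytes per second, both readings are shown. The baseline is simply the claimed rate divided by the claimed 20x speedup.

```python
# Back-of-envelope sketch; every input here is an assumption, not a measurement.

CORPUS_BYTES = 10e9      # hypothetical 10 GB text corpus
CLAIMED_SPEEDUP = 20     # "20x faster" as stated in the listing

# "1gbps" is ambiguous in the listing, so both readings are covered.
RATES = {
    "1 gigabit/s": 1e9 / 8,   # 125 MB/s
    "1 gigabyte/s": 1e9,
}


def seconds_to_tokenize(corpus_bytes: float, bytes_per_second: float) -> float:
    """Time to stream a corpus through a tokenizer at a fixed rate."""
    return corpus_bytes / bytes_per_second


def estimate(bytes_per_second: float) -> tuple[float, float]:
    """Return (time at the claimed rate, time at the implied 20x slower baseline)."""
    fast = seconds_to_tokenize(CORPUS_BYTES, bytes_per_second)
    baseline = fast * CLAIMED_SPEEDUP
    return fast, baseline


for label, rate in RATES.items():
    fast, baseline = estimate(rate)
    print(f"{label}: claimed {fast:.0f} s, implied baseline {baseline:.0f} s")
```

Under either reading the saving is tens of seconds to tens of minutes per 10 GB, which matters mostly for offline preprocessing jobs rather than per-request inference latency.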
Target Audience

ML infrastructure engineers, high-throughput inference providers

Similar To

HuggingFace tokenizers · tiktoken

Similar Projects

Developer Tools · ●● Solid

Rev-dep – 20x faster knip.dev alternative built in Go

20x faster than knip: the performance leap is real, but the dependency-linter space is crowded and knip already solves this problem.

Ship It · Solve My Problem
jayu_dev
46133mo ago