Wordchipper – Rust BPE tokenizer, 9x faster than tiktoken
Nine times faster than tiktoken-rs with swappable lexer backends for benchmarking.
Fast exact BPE tokenizer. Byte-identical to tiktoken, 7x faster
Phrase mining beats tiktoken's compression by 1.21x with one-third the vocabulary size, but it is niche for token optimization.
Novel BPE variant using tf-idf scoring produces shorter encodings than classic BPE.
Token-efficient word IDs for LLMs, but it's a narrow utility library.
VP-Tree + SIMD beats Faiss 39× on exact L2, 257× on binary search.
Shows actual token boundaries visually, not just a count like other tools.