Wordchipper – Rust BPE tokenizer, 9x faster than tiktoken
Nine times faster than tiktoken-rs, with swappable lexer backends for benchmarking.
Universal (general sequence) Byte-Pair Encoding
A novel BPE variant that uses tf-idf scoring to produce shorter encodings than classic BPE.
ML engineers and NLP researchers
tiktoken · sentencepiece · Hugging Face tokenizers
No-friction map link sharing across platforms—but link shorteners and native map apps already handle this.
Token-efficient word IDs for LLMs, but it's a narrow utility library.
Ancient geometry meets Fourier analysis—neat math, but its application to LLMs and databases is unproven.
One API unifies prompt tuning, code optimization, and blackbox search—beats domain-specific tools.
Replaces UUIDs with space-separated words to slash token costs in LLM prompts and tool calls.