GitHub Repository

Reproducible tiktoken benchmark showing the hidden language tax in LLM APIs. Same content, 8 languages, 1.5x-3.3x token cost ratio.

1 star · Python

Reproducible benchmark – OpenAI charges 1.5x-3.3x more for non-English

by vfalbor · Apr 20, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Dark Horse · Big Brain

Exposes a 230% token tax on Arabic text that pricing discussions rarely mention.

Strengths

• Reproducible benchmark with no secrets needed; anyone can verify the numbers.
• Quantifies real cost impact: $5K vs $16K monthly at 1M requests.
• Clearly explains the BPE tokenization mechanics behind the language bias.
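
The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names are hypothetical, and it assumes the $5K figure is the English baseline and that cost scales linearly with token count. Under those assumptions a 3.3x ratio corresponds to a 230% overhead and a monthly bill close to the $16K quoted above.

```python
def overhead_percent(token_ratio: float) -> float:
    """Extra cost relative to the English baseline, as a percentage."""
    return (token_ratio - 1.0) * 100.0


def monthly_cost(baseline_cost: float, token_ratio: float) -> float:
    """Scale an English-baseline monthly bill by a token ratio.

    Assumes cost is proportional to token count (a simplification).
    """
    return baseline_cost * token_ratio


if __name__ == "__main__":
    english_monthly = 5_000.0  # assumed English baseline: $5K at 1M requests
    for ratio in (1.5, 3.3):   # low and high ends of the reported range
        print(
            f"{ratio}x -> +{overhead_percent(ratio):.0f}% overhead, "
            f"${monthly_cost(english_monthly, ratio):,.0f}/month"
        )
```

The per-language ratios themselves would come from counting tokens for the same content in each language, which this sketch does not attempt.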

Weaknesses

• Benchmark only, not a tool: a useful insight, but nothing to integrate.
• Acknowledges the problem but proposes no mitigation strategies.

Similar Projects

Security · ●● Solid

ACE – A dynamic benchmark measuring the cost to break AI agents

Measures AI agent security in dollars to exploit, not just binary pass or fail rates.

Big Brain

zachdotai

93 · 2mo ago

AI/ML · ●●● Banger

Wordchipper – Rust BPE tokenizer, 9x faster than tiktoken

Nine times faster than tiktoken-rs with swappable lexer backends for benchmarking.

Wizardry · Big Brain

antimora

20 · 2mo ago

Developer Tools · ●● Solid

Claude-ts – Translation proxy to fix non-English token waste in Claude

Fixes multilingual token waste by translating to English before Claude, not after.

Solve My Problem · Big Brain

kiimdonglin

40 · 3mo ago

AI/ML · ●●● Banger

LLM Sycophancy Benchmark: Opposite-Narrator Contradictions

Opposite-narrator test catches models agreeing with both sides of the same dispute.

Big Brain · Dark Horse

zone411

30 · 3mo ago

AI/ML · ●● Solid

Ragprobe – measure RAG domain difficulty before deploying, no embeddings

Predicts RAG benchmark transfer failure using vocabulary specificity—no embeddings needed.

Big Brain · Niche Gem

metawake

10 · 2mo ago

AI/ML · ●● Solid

ErrataBench - A Proofreading Benchmark for LLMs

51 models, 1613 runs, $558 spent — finally proofreading benchmarks with real numbers.

Niche Gem · Big Brain

artursapek

30 · 2mo ago