GitHub Repository

Reproducible tiktoken benchmark showing the hidden language tax in LLM APIs. Same content, 8 languages, 1.5x-3.3x token cost ratio.
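A minimal sketch of how such a per-language token ratio could be computed. This is not the repository's own code: the function names `token_ratios` and `tiktoken_counter`, the `"en"` baseline key, and the `cl100k_base` encoding are assumptions chosen for illustration. The only tiktoken calls used are `tiktoken.get_encoding` and `Encoding.encode`.

```python
from typing import Callable, Dict


def token_ratios(
    samples: Dict[str, str],
    count_tokens: Callable[[str], int],
    baseline: str = "en",
) -> Dict[str, float]:
    """Return each language's token count divided by the baseline's count."""
    counts = {lang: count_tokens(text) for lang, text in samples.items()}
    base = counts[baseline]
    if base == 0:
        raise ValueError("baseline sample produced zero tokens")
    return {lang: n / base for lang, n in counts.items()}


def tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Build a token counter backed by tiktoken (third-party: pip install tiktoken).

    The encoding name is an assumption; the benchmark may use a different one.
    """
    import tiktoken  # imported lazily so token_ratios works without it installed

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))
```

Calling `token_ratios(samples, tiktoken_counter())` with the same passage translated into each language yields a ratio of 1.0 for the baseline and a multiplier for every other language. Passing the counter as a parameter keeps the ratio logic independent of any particular tokenizer.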

1 star · Python

Reproducible benchmark – OpenAI charges 1.5x-3.3x more for non-English

by vfalbor·Apr 20, 2026·1 point·0 comments

AI Analysis

●● Solid · Dark Horse · Big Brain

Exposes a 230% token tax on Arabic text that pricing discussions rarely mention.

Strengths
  • Reproducible benchmark with no secrets needed, so anyone can verify the numbers.
  • Quantifies real cost impact: $5K vs $16K monthly at 1M requests.
  • Explains BPE tokenization mechanics behind the language bias clearly.
Weaknesses
  • Benchmark only, not a tool: a useful insight, but nothing to integrate.
  • Proposes no mitigation strategies; it only acknowledges the problem.
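The figures quoted in this analysis can be cross-checked with simple arithmetic. The sketch below assumes that cost scales linearly with token count and that the $5K figure is the English baseline; neither assumption is confirmed by the listing, and both function names are hypothetical.

```python
def scaled_monthly_cost(baseline_cost_usd: float, token_ratio: float) -> float:
    """Monthly cost when every request uses token_ratio times the baseline tokens.

    Assumes billing is strictly proportional to token count.
    """
    return baseline_cost_usd * token_ratio


def surcharge_percent(token_ratio: float) -> float:
    """Express a token ratio as a percentage increase over the baseline."""
    return (token_ratio - 1.0) * 100.0


if __name__ == "__main__":
    ratio = 3.3  # upper end of the 1.5x-3.3x range quoted in the listing
    print(f"cost: ${scaled_monthly_cost(5_000, ratio):,.0f}")
    print(f"surcharge: {surcharge_percent(ratio):.0f}%")
```

Under these assumptions, a 3.3x ratio corresponds to the 230% surcharge and takes a $5K monthly bill to roughly the $16K quoted above.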
Category
Target Audience

ML engineers, product teams with multilingual users

Similar To

tiktoken · OpenAI pricing calculator
