ACE – A dynamic benchmark measuring the cost to break AI agents
Measures AI agent security in dollars to exploit, not just binary pass or fail rates.
Reproducible tiktoken benchmark showing the hidden language tax in LLM APIs. Same content, 8 languages, 1.5x-3.3x token cost ratio.
Exposes 230% Arabic token tax that nobody talks about in pricing.
ML engineers, product teams with multilingual users
tiktoken · OpenAI pricing calculator
Measures AI agent security in dollars to exploit, not just binary pass or fail rates.
Nine times faster than tiktoken-rs with swappable lexer backends for benchmarking.
Fixes multilingual token waste by translating to English before Claude, not after.
Opposite-narrator test catches models agreeing with both sides of same dispute.
Predicts RAG benchmark transfer failure using vocabulary specificity—no embeddings needed.
51 models, 1613 runs, $558 spent — finally proofreading benchmarks with real numbers.