I built a live race between LLMs to see which generates tokens fastest

I built a live race between LLMs to see which generates tokens fastest

by baristaGeek · Mar 4, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Eye Candy

Live drag-race UI, but the latency benchmarks depend on each provider's API, and the project is not architecturally novel.

Strengths
  • Visually engaging real-time race format makes benchmarking fun and memorable
  • Measures both TTFT (time to first token) and TPS (tokens per second, i.e. throughput), two critical metrics
  • Works today: public live demo with current model versions
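The two metrics in the strengths list can be derived from token arrival timestamps. The following is a minimal Python sketch under stated assumptions, not the project's actual code: `StreamStats`, `record_token_times`, and `stream_stats` are hypothetical names. It also assumes TPS is measured over the decode window only, meaning the tokens after the first divided by the time between the first and last token. Some benchmarks instead divide total tokens by total time, which folds TTFT into the throughput figure.

```python
import math
import time
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class StreamStats:
    ttft_s: float  # seconds from request start to the first token
    tps: float     # decode throughput, tokens per second


def record_token_times(stream: Iterable[str]) -> tuple[float, list[float]]:
    """Consume a token iterator and timestamp each token as it arrives.

    Assumption: the request is sent lazily on first iteration (as with a
    generator). If it was sent earlier, the measured TTFT will be too low.
    """
    start = time.perf_counter()
    arrivals = []
    for _token in stream:
        arrivals.append(time.perf_counter())
    return start, arrivals


def stream_stats(request_start: float, token_times: Sequence[float]) -> StreamStats:
    """Compute TTFT and TPS from token arrival timestamps in seconds.

    Assumption: TPS covers only the decode window, i.e. the (n - 1) gaps
    between the first and last token, so the wait already counted in TTFT
    is not counted twice.
    """
    if not token_times:
        raise ValueError("no tokens received")
    points = [request_start, *token_times]
    if any(b < a for a, b in zip(points, points[1:])):
        raise ValueError("timestamps must be non-decreasing")
    ttft = token_times[0] - request_start
    window = token_times[-1] - token_times[0]
    # One token (or identical timestamps) leaves no measurable decode window.
    tps = (len(token_times) - 1) / window if window > 0 else math.inf
    return StreamStats(ttft_s=ttft, tps=tps)


if __name__ == "__main__":
    def fake_stream():
        time.sleep(0.2)  # simulated wait before the first token
        for tok in ["Hello", ",", " world", "!"]:
            yield tok
            time.sleep(0.05)  # simulated gap between tokens

    print(stream_stats(*record_token_times(fake_stream())))
```

To time a real provider, you would wrap its streaming iterator in `record_token_times`. Streamed chunks do not always correspond one-to-one with tokens, so if a provider reports an output-token count, that count is likely a better numerator. This is an assumption about provider APIs; the post does not specify it.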
Weaknesses
  • Results vary by geography, network, and time of day; a single run is a snapshot, not reliable comparative data
  • No statistical significance testing, variance analysis, or methodology transparency, which makes the results easy PR fodder for vendors
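Both weaknesses come down to reporting one run instead of a distribution. One standard remedy is to repeat each race and summarize the spread. The sketch below is a minimal Python illustration, not anything the project is known to do. `summarize_runs` is a hypothetical name, and the percentile bootstrap on the median is one common choice among several.

```python
import random
import statistics
from typing import Sequence


def summarize_runs(samples: Sequence[float], n_boot: int = 2000, seed: int = 0) -> dict:
    """Summarize repeated measurements (e.g. TTFT in seconds) for one model.

    Returns the median, mean, sample standard deviation, and a percentile
    bootstrap 95% confidence interval for the median.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    if n_boot < 100:
        raise ValueError("use at least 100 bootstrap resamples")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    # Resample the runs with replacement and record each resample's median.
    boot_medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(n_boot)
    )
    # Trim 2.5% of the bootstrap medians from each tail.
    lo = boot_medians[int(0.025 * n_boot)]
    hi = boot_medians[int(0.975 * n_boot) - 1]
    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "ci95_median": (lo, hi),
    }
```

Running each model many times, interleaving the runs, and ideally spreading them across times of day would make these summaries more meaningful than a single race. When two models' intervals overlap heavily, the race winner is best treated as inconclusive.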
Category
Target Audience

Developers choosing between LLM APIs for latency-sensitive applications

Similar To

LMSYS Chatbot Arena (voting-based ranking) · OpenAI's official LLM benchmarks · Anthropic's model comparison pages

Similar Projects

Developer Tools · ●● Solid

ccclub – See which of your friends is burning the most on Claude Code

Pulls data straight from ~/.claude/projects and uploads only aggregated metrics (tokens, cost, calls) via a 6-letter invite-code flow: nice and surgical. The one-command npx init + join UX and a show-data privacy-audit button make adoption trivial. However, it is useful only to groups already using Claude Code, and it requires trusting that aggregated uploads are enough for your threat model.

Niche Gem · Slick
mazzystar
203mo ago