I built a live race between LLMs to see which generates tokens fastest

by baristaGeek · Mar 4, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Eye Candy

A live drag-race UI, but the latency it measures depends on each provider's API, and the benchmark itself is not architecturally novel.

Strengths

• Visually engaging real-time race format makes benchmarking fun and memorable
• Measures both TTFT (time to first token) and TPS (tokens per second, i.e. throughput), two critical metrics
• Works today: public live demo with current model versions
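
For readers unfamiliar with the two metrics, here is a minimal sketch of how TTFT and TPS can be derived from token arrival timestamps. It is not the project's code: the function name, the inputs, and the choice to measure throughput only after the first token arrives are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float  # time to first token, in seconds
    tps: float     # tokens per second once generation has started


def stream_metrics(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute TTFT and TPS for one streamed response.

    Hypothetical inputs: `request_start` and each entry of `token_times`
    are timestamps in seconds from one monotonic clock, for example
    time.perf_counter().
    """
    if not token_times:
        raise ValueError("no tokens were received")

    # TTFT: delay between sending the request and seeing the first token.
    ttft = token_times[0] - request_start

    # TPS over the generation phase only, so a slow first token
    # does not drag down the throughput figure.
    gen_time = token_times[-1] - token_times[0]
    tokens_after_first = len(token_times) - 1
    tps = tokens_after_first / gen_time if gen_time > 0 else 0.0

    return StreamMetrics(ttft_s=ttft, tps=tps)
```

Some tools instead divide the total token count by the total elapsed time, which folds TTFT into the throughput number, so figures from different benchmarks are comparable only when they share a definition.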

Weaknesses

• Results vary by geography, network, and time of day, so a single run is a snapshot, not reliable comparative data
• No statistical significance testing, variance analysis, or methodology transparency, which makes the results PR fodder for vendors
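
As a rough illustration of the variance analysis the weaknesses above call for, the sketch below summarizes repeated runs of one model using only Python's standard `statistics` module. The function name and the returned fields are hypothetical, and this covers descriptive statistics only, not a significance test.

```python
import statistics
from typing import Dict, Sequence


def summarize_runs(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize repeated measurements, e.g. TPS from several runs of one model."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")

    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)

    return {
        "n": float(len(samples)),
        "mean": mean,
        "median": statistics.median(samples),
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean else float("nan"),
    }
```

A high coefficient of variation for one model suggests that a single race outcome says little about its typical speed.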

Similar Projects

Developer Tools · ●● Solid

TokenAdvisor – paste a prompt, see what to cut to lower your LLM bill

Multi-vendor token comparison with specific cut recommendations and dollar savings at scale.

Solve My Problem · Slick

Emadiali83

211mo ago

Open Source · ●● Solid

XML, Markdown, or JSON: Which gives LLMs the most reliable boundaries?

Settles the delimiter format debate with data: Markdown fails under adversarial inputs on MiniMax.

Big Brain

systima

324mo ago

AI/ML · ●● Solid

Sentinel – browser agent using 3x+ fewer tokens (open benchmark)

Self-benchmark shows Sentinel uses 57x fewer tokens than browser-use on hard tasks.

Big Brain · Niche Gem

isoldex

112mo ago

Developer Tools · ●● Solid

ccclub – See which of your friends is burning the most on Claude Code

Pulls data straight from ~/.claude/projects and uploads only aggregated metrics (tokens, cost, calls) via a 6-letter invite code flow: nice and surgical. The one-command npx init + join UX and a show-data privacy audit button make adoption trivial, but it is useful only to groups already using Claude Code, and it requires trusting that aggregate-only uploads fit your threat model.

Niche Gem · Slick

mazzystar

205mo ago

AI/ML · ● Mid

Benchmark multiple LLMs to compare quality, speed, and cost

Yet another prompt benchmarking UI when Promptfoo and LangSmith already exist.

Slick · Ship It

henriklipp

303mo ago

AI/ML · ●● Solid

Preseason – see which developer tools each LLM picks

Tracks which dev tools AI agents actually choose across thousands of prompts.

Dark Horse · Niche Gem

betocmn

103mo ago