We benchmarked 18 LLMs on OCR (7K+ calls) – cheaper models win

by TimoKerr·Apr 22, 2026·5 points·1 comment

AI Analysis

●●● Banger · Big Brain · Solve My Problem

7,560 runs proving cheaper models beat expensive ones on production OCR tasks.

Strengths

• pass^n consistency metric measures reliability across repeated runs, not single-shot accuracy
• Cost-per-successful-outcome metric reflects what production budgeting actually depends on
• 42 real business documents across receipts, invoices, and logistics, not synthetic data
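
The two metrics named above can be sketched as follows. The digest does not state the benchmark's exact formulas, so this is a minimal illustration under assumptions: pass^k is computed with the combinatorial estimator used in other repeated-run evaluations (the fraction of k-run subsets in which every run succeeded), and cost per successful outcome is taken as total spend divided by the number of successful runs. All function names and sample numbers are hypothetical.

```python
from math import comb


def pass_hat_k(successes: int, runs: int, k: int) -> float:
    """Estimate the chance that k runs drawn from `runs` ALL succeed.

    Assumed estimator: C(successes, k) / C(runs, k).
    """
    if not 1 <= k <= runs:
        raise ValueError("k must be between 1 and runs")
    # math.comb returns 0 when k > successes, so one failure gives 0 at k == runs
    return comb(successes, k) / comb(runs, k)


def mean_pass_hat_k(per_doc_successes: list[int], runs: int, k: int) -> float:
    """Average the per-document estimate over all documents."""
    total = sum(pass_hat_k(c, runs, k) for c in per_doc_successes)
    return total / len(per_doc_successes)


def cost_per_success(total_cost: float, successes: int) -> float:
    """Total spend divided by successful runs (assumed definition)."""
    if successes == 0:
        return float("inf")  # no successes: every dollar was wasted
    return total_cost / successes


# Hypothetical data: 3 documents, 10 runs each, successes counted per document.
doc_successes = [10, 10, 9]
print(mean_pass_hat_k(doc_successes, runs=10, k=1))   # plain average accuracy
print(mean_pass_hat_k(doc_successes, runs=10, k=10))  # all-ten-runs consistency
print(cost_per_success(total_cost=0.30, successes=sum(doc_successes)))
```

With k = 1 the estimator reduces to ordinary accuracy, while larger k penalizes any document that fails even occasionally, which is why it separates reliable models from merely accurate ones.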

Weaknesses

• Limited to 42 documents, so it may not cover edge cases in specific industries
• The benchmark is the product: there is no OCR tool, only evaluation data

Target Audience

Developers building OCR pipelines, ML engineers

Similar To

LangSmith · HELM · Papers With Code leaderboards

Similar Projects

AI/ML · ●● Solid

I benchmarked how good LLMs are at proofreading English

Agent-loop proofreading evals for cases where HELM and LMSys are too generic.

Solve My Problem · Ship It

artursapek

321mo ago

AI/ML · ●●● Banger

Benchmarking LLMs through autonomous games of Blood on the Clocktower

Social deduction games test deception and theory of mind better than standard benchmarks.

Rabbit Hole · Crowd Pleaser · Zero to One

cjami

102mo ago

AI/ML · ●● Solid

LLM Debate Benchmark

Side-swapped debate matchups expose model weaknesses standard benchmarks miss.

Big Brain · Dark Horse

zone411

932mo ago

AI/ML · ●●● Banger

Llama CPU Benchmarks

Proves speculative decoding slows down 4B models on 4-core CPUs despite marketing claims.

Big Brain · Dark Horse

muthuishere

2024d ago

Developer Tools · ● Mid

OpenCode Benchmark Dashboard

Benchmarks OpenCode models locally, but lacks preloaded datasets and only works with configured OpenAI-compatible APIs.

Niche Gem

grigio

103mo ago

AI/ML · ●● Solid

WebGPU LLM inference comprehensive benchmark

Sequential-dispatch methodology corrects 20x overestimation in prior WebGPU benchmarks.

Big Brain · Niche Gem

yu3zhou4

222mo ago