GitHub Repository

7 stars · TypeScript

A/B test your own VLMs for document parsing (Self-hosted Arena)

by matthew624 · Feb 19, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Solve My Problem · Slick · Niche Gem

Document parsing A/B test arena with Elo ranking: a niche but real alternative to OCR Arena.
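For readers unfamiliar with Elo ranking, the following is a minimal sketch of the standard Elo update applied to one blind battle between two models. It is not taken from the project's code: the names `updateElo` and `expectedScore`, the K-factor of 32, and the 1500 starting rating are illustrative assumptions.

```typescript
// Standard Elo update for a single battle between model A and model B.
// K_FACTOR is an assumed value; the project may use a different one.
const K_FACTOR = 32;

type Outcome = "a" | "b" | "tie";

// Probability that A beats B under the Elo model.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Returns the new [ratingA, ratingB] after one voted battle.
function updateElo(ratingA: number, ratingB: number, outcome: Outcome): [number, number] {
  const expectedA = expectedScore(ratingA, ratingB);
  const scoreA = outcome === "a" ? 1 : outcome === "b" ? 0 : 0.5;
  const delta = K_FACTOR * (scoreA - expectedA);
  // Whatever A gains, B loses, so the rating pool stays constant.
  return [ratingA + delta, ratingB - delta];
}

// Two fresh models at an assumed 1500 start; A wins the blind vote.
const [newA, newB] = updateElo(1500, 1500, "a");
console.log(newA, newB); // 1516 1484
```

An upset win against a higher-rated model moves the ratings further than an expected win, which is why a modest number of votes can already separate models of clearly different quality.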

Strengths

• Fair blind-battle design: weighted matchmaking gives underrepresented models a comparable share of tests.
• Real-time token streaming via SSE plus Markdown/LaTeX rendering makes result comparison immediate and readable.
• Docker one-click deploy and multi-provider support (Anthropic, OpenAI, Ollama, etc.) lower setup friction.
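The weighted matchmaking mentioned above could work roughly as sketched below. This is an assumption about one reasonable approach, not the project's actual algorithm: each model is drawn with a weight inversely proportional to its battle count, so models with fewer battles are picked more often. The names `ModelStats`, `pickWeighted`, and `pickPair` are hypothetical.

```typescript
// Hypothetical per-model record; the field names are assumptions.
interface ModelStats {
  name: string;
  battles: number;
}

// Fewer battles means a larger weight; +1 avoids division by zero.
function weightOf(model: ModelStats): number {
  return 1 / (model.battles + 1);
}

// Draw one model at random, biased toward underrepresented models.
// `rand` is injectable so the draw can be made deterministic in tests.
function pickWeighted(models: ModelStats[], rand: () => number = Math.random): ModelStats {
  const total = models.reduce((sum, m) => sum + weightOf(m), 0);
  let threshold = rand() * total;
  let fallback: ModelStats | null = null;
  for (const model of models) {
    fallback = model;
    threshold -= weightOf(model);
    if (threshold < 0) return model;
  }
  if (fallback === null) throw new Error("need at least one model");
  return fallback; // guards against floating-point rounding at the upper edge
}

// Draw two distinct models for one blind battle.
function pickPair(models: ModelStats[], rand: () => number = Math.random): [ModelStats, ModelStats] {
  if (models.length < 2) throw new Error("need at least two models");
  const first = pickWeighted(models, rand);
  const second = pickWeighted(models.filter((m) => m !== first), rand);
  return [first, second];
}

const pool: ModelStats[] = [
  { name: "new-model", battles: 0 },
  { name: "veteran-a", battles: 9 },
  { name: "veteran-b", battles: 9 },
];
const [left, right] = pickPair(pool);
console.log(left.name, "vs", right.name);
```

With these weights, a model that has never battled is ten times as likely to be drawn as one with nine battles, so battle counts tend to even out over time.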

Weaknesses

• Solves a real pain point, but the audience is narrow: teams with self-hosted VLMs and privacy-sensitive documents.
• 90% Claude Code attribution raises questions about how much is custom architecture versus scaffolded boilerplate.

Category

Developer Tools

Target Audience

ML engineers evaluating custom document parsing models, teams comparing VLMs privately

Similar To

OCR Arena · Hugging Face Model Arena

Similar Projects

AI/ML · ●● Solid

Imagedojo.ai – Blind arena for Google, OpenAI, and xAI image generators

LMSYS Arena for images, but the leaderboard lacks volume: 359 images are too few for statistical confidence.

Crowd Pleaser · Eye Candy

vtail

154mo ago

AI/ML · ●● Solid

Stickblade Arena – I made two LLMs sword-fight to benchmark them

Physics-based sword fighting creates a fun, visual blind test that breaks the standard chat benchmark mold.

Crowd Pleaser · Cozy

pioneer37

2011d ago

AI/ML · ●●● Banger

ParseBench – Document parsing benchmark for AI agents

First benchmark measuring semantic correctness over text similarity for document parsing.

Big Brain · Dark Horse

pierre

953mo ago

AI/ML · ●● Solid

I built a blind taste test for Claude and Codex designs

Fun blind taste test revealing distinct aesthetic biases in AI models.

Rabbit Hole · Cozy

rubenflamshep

403d ago

AI/ML · ●● Solid

Local Document Parsing for Agents

LlamaIndex open-sources its parser core, but LlamaParse cloud still handles complex layouts.

Solve My Problem · Ship It

cheesyFish

2014mo ago

AI/ML · ●● Solid

I blind-test 6 LLMs daily by having them summarize the same story

Crowd-sourced LLM leaderboard that actually tests summarization quality instead of static benchmarks.

Dark Horse · Niche Gem

SnipVote

1013d ago