AI Benchy – AI benchmarks and comparisons

Author: XCSme

by XCSme·Mar 6, 2026·1 point·0 comments

AI Analysis

●Mid · Solve My Problem

Clean leaderboard, but LMSys and HELM already solve model benchmarking comprehensively.

Strengths

•Multi-dimensional scoring (cost, latency, correctness) beyond raw benchmarks
•Covers 55 current models including frontier releases like Gemini 3 Flash
•Responsive design with language localization (15+ languages) and deep-dive model pages

Weaknesses

•No novel evaluation methodology—mirrors existing benchmark frameworks (MMLU, code tasks)
•No explanation of scoring weights or test rigor; methodology page content missing
•Crowded category: LMSys Chatbot Arena, HELM, Hugging Face Leaderboards already dominate

Post Description

My last submission didn't gain any traction, and I have improved the platform a lot since then. I am really happy with how it turned out. I am really focused on small UX things, and on desktop it's quite fun to just play around, especially on the model page: https://aibenchy.com/model/google-gemini-3-flash-preview-med...