
GitHub Repository

Benchmark local LLM inference speed (tokens/sec) on your own hardware — llama.cpp native + cloud APIs, 124-model catalog, optimal-quant picker, and an MCP serve mode.

2 stars · Python

InferBench – Benchmark local LLM engines with one click

by JoniMartin·Jun 5, 2026·2 points·0 comments


AI Analysis

●● Solid · Ship It · Solve My Problem

One-click LLM benchmarking with real tok/s metrics, where llama.cpp on its own requires manual setup.

Strengths

•Auto-bootstrap downloads engine binaries and GGUF models without requiring a Python or Node installation.
•Measures real metrics: TTFT, throughput, VRAM peak, and offline quality scoring.
•Cross-platform installers for Windows, macOS, and Linux with embedded FastAPI backend.
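
To make the TTFT and throughput metrics named above concrete, here is a minimal sketch of how they can be derived from a stream of generated tokens. It is not InferBench's implementation: the function name `measure_stream`, its `clock` parameter, and the choice to compute throughput over the decode phase only (excluding the first token) are assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report TTFT (seconds) and throughput (tok/s).

    Hypothetical helper for illustration only; `tokens` can be any iterable
    that yields one item per generated token.
    """
    start = clock()
    first: Optional[float] = None
    last = start
    count = 0

    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # timestamp of the first token
        last = now
        count += 1

    if first is None:
        # The stream produced nothing, so neither metric is defined.
        return {"tokens": 0, "ttft_s": None, "tok_per_s": None}

    # Assumption: throughput covers the decode phase only, i.e. the tokens
    # after the first one divided by the time elapsed since the first one.
    decode_time = last - first
    tok_per_s = (count - 1) / decode_time if decode_time > 0 else None

    return {"tokens": count, "ttft_s": first - start, "tok_per_s": tok_per_s}
```

Passing the clock as a parameter keeps the timing logic testable with fixed timestamps. In real use the iterable would be whatever streaming interface the engine exposes, and prompt processing time would show up in `ttft_s` rather than in `tok_per_s`.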

Weaknesses

•LLM benchmarking tools already exist in various forms across the ecosystem.
•Electron adds overhead to a benchmarking tool that could be lighter.

Category

Target Audience

Developers running local LLMs, hardware enthusiasts

Similar To

llama.cpp benchmarks · LM Studio · Ollama

Similar Projects

AI/ML · ● Mid

Ebbforge - 10M agent Rust swarm engine, 8 fundamental benchmarks

Rust swarm vs LLM agents is clever positioning, but benchmarks are self-designed and lack third-party validation.

Big Brain · Wizardry

agent-world

214mo ago

AI/ML · ●●●● Gem

New Benchmark from SWE-bench team is 0% solved

Agents fail completely at rebuilding binaries from scratch without source code.

Big Brain · Bold Bet · Zero to One

lieret

2432mo ago

AI/ML · ●●● Banger

LLM Sycophancy Benchmark: Opposite-Narrator Contradictions

Opposite-narrator test catches models agreeing with both sides of the same dispute.

Big Brain · Dark Horse

zone411

304mo ago

Developer Tools · ●● Solid

UFFS, a Rust NTFS search engine benchmarked against Everything

Direct MFT parsing beats Everything by 2.8× in 30/30 benchmark cells.

Wizardry · Solve My Problem

rnio

1113d ago

AI/ML · ●● Solid

LLM Debate Benchmark

Side-swapped debate matchups expose model weaknesses standard benchmarks miss.

Big Brain · Dark Horse

zone411

934mo ago

AI/ML · ●● Solid

ErrataBench - A Proofreading Benchmark for LLMs

51 models, 1613 runs, $558 spent — finally proofreading benchmarks with real numbers.

Niche Gem · Big Brain

artursapek

303mo ago