JazzBench, an LLM reasoning benchmark using jazz improvisation

Author: mikerubini

by mikerubini·Jun 9, 2026·2 points·0 comments

AI Analysis

○ Pass · Big Brain

Interesting eval philosophy, but this is a blog post with no shipped code or tool.

AI/ML · ●●● Banger

62k-puzzle benchmark reveals reasoning depth, cost variance, and stark US vs. China model gaps.

Big Brain · Crowd Pleaser · Solve My Problem

bluecoconut

50 · 3mo ago

AI/ML · ●●● Banger

Mafia-as-benchmark with learning-between-batches mechanism; public, inspectable sessions.

Zero to One · Big Brain · Wizardry

kkonstantin

10 · 3mo ago

AI/ML · ●●● Banger

Sealed-batch auctions remove inference speed bias from LLM trading benchmarks.

Big Brain · Dark Horse

Entropnt

40 · 1mo ago

AI/ML · ●●● Banger

Cuts token costs 70% with receipts proving no accuracy drop on hard evals.

Zero to One · Solve My Problem

Jbunga

563327d ago

AI/ML · ●●● Banger

Task-specific LLM benchmarking beats generic leaderboards that ignore your actual workload.

Big Brain · Dark Horse · Zero to One

gauravvij137

30 · 3mo ago

AI/ML · ● Mid

Civilization matches expose model divergence that static benchmarks miss—but it's a spectacle, not a measurement.

Rabbit Hole · Big Brain

mbh159

12243mo ago