JazzBench, an LLM reasoning benchmark using jazz improvisation

by mikerubini·Jun 9, 2026·2 points·0 comments

AI Analysis

Pass · Big Brain

An interesting eval philosophy, but this is a blog post: no code or tool has shipped.

Strengths
  • Novel framing of evals around taste and constraint satisfaction
  • Charlie Parker solos provide genuine ground truth for scoring
Weaknesses
  • No repository, no code, no way to actually run JazzBench
  • A spec and manifesto only, with no implementation to evaluate
Category
Target Audience

LLM researchers, AI engineers building evals

Similar Projects