GitHub Repository

Benchmark your CLAUDE.md against your own PRs

66 stars · Python

Mdarena – Benchmark your Claude.md against your own PRs

by hudsongr·Apr 5, 2026·22 points·4 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Mining your own PRs as benchmarks beats generic SWE-bench tasks for agent config tuning.

Strengths
  • Uses real test execution like SWE-bench, not LLM-as-judge flakiness.
  • Auto-detects test commands from CI configs across multiple languages.
  • Statistical significance testing with paired t-tests for real comparisons.
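
To illustrate the paired t-test mentioned above, here is a minimal sketch using only the Python standard library. It is not mdarena's actual code: the function name `paired_t_statistic` and the per-PR pass rates are hypothetical, chosen only to show why pairing matters (each PR is scored under both configurations, so the test runs on the per-PR differences).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, candidate):
    """Return (t, degrees_of_freedom) for two paired samples.

    Hypothetical helper for illustration; not taken from mdarena.
    """
    if len(baseline) != len(candidate):
        raise ValueError("paired samples must have equal length")
    # Work on per-item differences: this removes per-PR difficulty as a factor.
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up pass rates for the same five PRs under two CLAUDE.md variants.
baseline = [0.50, 0.60, 0.40, 0.70, 0.55]
candidate = [0.60, 0.65, 0.55, 0.70, 0.65]

t, df = paired_t_statistic(baseline, candidate)
print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic would then be compared against a t-distribution with `df` degrees of freedom to obtain a p-value. The standard library has no t-distribution CDF, so that step is left out here.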
Weaknesses
  • Niche audience: only matters if you're already using CLAUDE.md files.
  • No Windows support mentioned; the tooling is Unix-centric throughout.
Target Audience

Developers using Claude Code with CLAUDE.md agent configuration files

Similar To

SWE-bench · Aider · Claude Code

Similar Projects

AI/ML · ●●● Banger

Llama CPU Benchmarks

Proves speculative decoding slows down 4B models on 4-core CPUs despite marketing claims.

Big Brain · Dark Horse
muthuishere
2023d ago