GitHub Repository

Benchmark your CLAUDE.md against your own PRs

66 stars · Python

Mdarena – Benchmark your Claude.md against your own PRs

by hudsongr·Apr 5, 2026·22 points·4 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Mining your own PRs as benchmarks beats generic SWE-bench tasks for agent config tuning.

Strengths

• Uses real test execution like SWE-bench, not LLM-as-judge flakiness.
• Auto-detects test commands from CI configs across multiple languages.
• Statistical significance testing with paired t-tests for real comparisons.
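
To make the paired t-test point concrete, here is a minimal sketch of the statistic itself, using only the Python standard library. It is not taken from the project: the function name `paired_t_statistic` and the per-PR pass rates are hypothetical, and the tool's actual implementation may differ.

```python
import math
import statistics


def paired_t_statistic(baseline, candidate):
    """Return (t, degrees_of_freedom) for two paired samples.

    Each index is assumed to be the same PR scored under two
    different CLAUDE.md variants (an illustrative assumption).
    """
    if len(baseline) != len(candidate):
        raise ValueError("paired samples must have the same length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-PR test pass rates, for illustration only.
baseline = [0.50, 0.60, 0.40, 0.70, 0.55]
candidate = [0.60, 0.65, 0.50, 0.70, 0.65]

t, df = paired_t_statistic(baseline, candidate)
print(f"t = {t:.3f}, df = {df}")
```

Pairing matters because each PR acts as its own control: the test looks at per-PR differences, so variation in PR difficulty does not drown out the effect of the config change. With 4 degrees of freedom, the two-sided 5% critical value is about 2.776, so a larger |t| would count as significant at that level.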

Weaknesses

• Niche audience: only matters if you already use CLAUDE.md files.
• No Windows support mentioned; tooling is Unix-centric throughout.

Similar Projects

Developer Tools · ● Mid

Mo – checks GitHub PRs against decisions approved in Slack

Slack-to-PR decision tracking, but landing page shows a different product entirely.

Solve My Problem · Niche Gem

oscarcaldera

202mo ago

AI/ML · ●● Solid

Fine-tuned 3B outperforms Claude Haiku on constrained generation

Fine-tuned 3B Qwen matches Haiku on jokes, validating small models for constrained agent tasks.

Big Brain · Niche Gem

serendip-ml

103mo ago

Developer Tools · ● Mid

Skylos – A Python dead code finder benchmarked against 9 libraries

Dead code finder benchmarked across FastAPI, Pydantic, and Flask, but Vulture and Bandit already solve this.

Solve My Problem

duriantaco

313mo ago

AI/ML · ●●● Banger

Llama CPU Benchmarks

Proves speculative decoding slows down 4B models on 4-core CPUs despite marketing claims.

Big Brain · Dark Horse

muthuishere

2023d ago

AI/ML · ●●● Banger

OpenCastor Agent Harness Evaluator Leaderboard

263k config search space benchmarked across robot fleets; nothing like this exists for robotics AI.

Zero to One · Big Brain · Niche Gem

craigm26

312mo ago