GitHub Repository

Lint, benchmark, and score your AI coding instructions. Stop guessing, start measuring.

4 stars · TypeScript

agenteval – static analysis for AI coding instruction file


by lukasm1703 · Apr 3, 2026 · 7 points · 0 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Finally treats AI instructions like code, with linting, benchmarks, and CI gates.

Strengths

• Harvest command builds eval tasks from git history automatically
• Statically catches dead references, contradictions, and token budget overruns
• Self-contained binary with no runtime dependency: install with curl and go
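To make the "dead references" check concrete, here is a minimal sketch of how such a static check could work. This is not agenteval's code: the function name `findDeadReferences`, the assumption that file references appear as backtick-quoted paths with an extension, and the idea of passing in a pre-built set of repository paths are all hypothetical choices for illustration.

```typescript
// Hypothetical sketch of a dead-reference check; NOT agenteval's implementation.
// Assumption: file references in an instruction file are written as
// backtick-quoted paths with an extension, e.g. `src/config.ts`.
const PATH_PATTERN = /`([\w./-]+\.[A-Za-z0-9]+)`/g;

// Returns each referenced path (deduplicated, in order of first appearance)
// that is missing from the set of paths known to exist in the repository.
function findDeadReferences(instructions: string, existingPaths: Set<string>): string[] {
  const dead: string[] = [];
  for (const match of instructions.matchAll(PATH_PATTERN)) {
    const raw = match[1];
    if (!raw) continue;
    const ref = raw.replace(/^\.\//, ""); // normalize "./foo.ts" to "foo.ts"
    if (!existingPaths.has(ref) && !dead.includes(ref)) {
      dead.push(ref);
    }
  }
  return dead;
}

// Usage: a real tool would build existingPaths by walking the repo
// (or from `git ls-files`); here it is hard-coded for illustration.
const claudeMd = "Run `scripts/test.sh` before committing. Config lives in `./src/config.ts`.";
const repoFiles = new Set(["src/config.ts", "package.json"]);
console.log(findDeadReferences(claudeMd, repoFiles));
```

Matching only backtick-quoted paths keeps false positives low at the cost of missing bare paths written in prose; a production linter would likely need a richer heuristic.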

Weaknesses

• Emerging category, so long-term adoption among teams is unclear
• Benchmark quality depends on how relevant the harvested tasks are to your codebase

Similar Projects

Developer Tools · ●●● Banger

Mdarena – Benchmark your Claude.md against your own PRs

Mining your own PRs as benchmarks beats generic SWE-bench tasks for agent config tuning.

Big Brain · Solve My Problem

hudsongr

2243mo ago

Developer Tools · ●● Solid

Claude Pilot – Claude Code is powerful. Pilot makes it reliable

TDD enforcement and context preservation for Claude Code workflows, but the hooks-on-every-edit pattern is already established.

Solve My Problem

rittermax

205mo ago

Developer Tools · ● Mid

Skylos – A Python dead code finder benchmarked against 9 libraries

Dead code finder benchmarked across FastAPI, Pydantic, and Flask, but Vulture and Bandit already solve this.

Solve My Problem

duriantaco

314mo ago

Developer Tools · ●●● Banger

Cheddar-bench – unsupervised benchmark for coding agents

Unsupervised bug benchmark that uses agents as both attackers and defenders, with a novel scoring methodology.

Big Brain · Wizardry · Ship It

przadka

904mo ago

Developer Tools · ●● Solid

Singularity-Claude – Self-Evolving Skills for Claude Code

Recursive repair loops improve skills automatically, unlike static Claude Code defaults.

Big Brain · Niche Gem · Ship It

shmayro

214mo ago

Developer Tools · ●● Solid

A Claude Code statusline that shows live World Cup scores

World Cup scores in the Claude Code statusline: a clever MCP integration, but only useful while the tournament runs.

Cozy · Ship It

arturogarrido

611mo ago