
GitHub Repository

A test runner for agentskills.io-style AI agent skills

584 stars · TypeScript

Agent-skills-eval – Test whether Agent Skills improve outputs

by darkrishabh · May 7, 2026 · 79 points · 37 comments


AI Analysis

●● Solid · Solve My Problem · Ship It

Lightweight A/B testing for SKILL.md files when LangSmith feels too heavy.

Strengths

• Local A/B tests graded by judge models remove the need for heavyweight SaaS platforms.
• Generates static HTML reports with side-by-side output comparisons for easier debugging.
• Integrates directly into CI workflows to catch skill regressions before deployment.

Weaknesses

• Tied to the emerging SKILL.md standard, which may not gain widespread adoption.
• Judge-model grading can be inconsistent, depending on which evaluator model is chosen.

Category

Developer Tools

Target Audience

AI agent developers and prompt engineers

Similar To

LangSmith · Arize Phoenix · PromptLayer

Similar Projects

Developer Tools · ●● Solid

An agent skill for eval-driven development of LLM-powered app

Agent-native eval workflow beats LangSmith's manual dashboard setup.

Big Brain · Ship It

yol

10 · 3mo ago

Developer Tools · ● Mid

Agent-evals – Claude skill to build your own evals

Claude Skill for agent evals, but LangSmith and Arize already own this.

Solve My Problem

sauercrowd

911mo ago

AI/ML · ●● Solid

Promptloop – create, run, and improve prompt evals from the terminal

Terminal-native prompt evals with diff proposals beat web dashboards.

Ship It · Niche Gem

velapod

133 · 20d ago

Developer Tools · ●● Solid

Skill Eval – A framework for testing the quality of AI agent skills

Test suite for LLM agent skills; fills a real gap in agent eval tooling.

Solve My Problem · Niche Gem

mgechev

10 · 3mo ago

AI/ML · ● Mid

Claude Code skills for building LLM evals

Structured eval workflow for Claude Code when LangSmith and Braintrust already exist.

Niche Gem · Ship It

paulaq

20 · 1mo ago

AI/ML · ●● Solid

Skill Lab – CLI tool for testing and optimizing agent skills

Security scanning catches data exfiltration before skills go live.

Niche Gem · Ship It

qu4rk5314

10 · 2mo ago