Agent-skills-eval – Test whether Agent Skills improve outputs
Lightweight A/B testing for SKILL.md files when LangSmith feels too heavy.
156 test docs, 16 AI skills, 1 human: an honest experiment in AI-native SDLC, not yet production-ready auth.
Researchers in AI-driven development, enterprises seeking Auth0 alternatives, skeptics of AI code quality
Auth0 · Keycloak · Okta
Security scanning catches data exfiltration before skills go live.
Auto-generates API tests from OpenAPI specs, even though Schemathesis and Postman already exist.
Keycloak replacement for dev testing that's actually lightweight and YAML-configurable.
Curated prompt library, even though dozens of skill packs already exist for Claude.
Agent-native eval workflow beats LangSmith's manual dashboard setup.