Rubric – test what your LLM agent did, not just what it said

Name: Rubric – test what your LLM agent did, not just what it said
Availability: InStock
Author: kareemrashed

by kareemrashed·Jun 12, 2026·1 point·0 comments

AI Analysis

●●SolidSolve My Problem

Tests tool calls and trace quality when LangSmith only checks output strings.

Strengths

•Zero required dependencies means fully local evaluation without API calls.
•LangGraph integration extracts tool calls from existing message structures automatically.

Weaknesses

Testing framework for AI agents with LLM judges and SQLite result tracking.

Solve My ProblemShip It

fdefitte

314mo ago

Security●●●●Gem

Proves text safety ≠ tool-call safety; catches hidden harmful executions deterministically.

Zero to OneBig BrainWizardry

acartag7

203mo ago

AST-based validation for function calling tests, but BFCL already covers this ground.

Ship ItNiche Gem

gauravvij137

303mo ago

VCR for LLM calls—eliminates API costs and non-determinism in agent testing.

Solve My ProblemShip ItSlick

beyhang

103mo ago

pytest-native testing for AI agents with 101 built-in safety attack probes.

Solve My ProblemSlick

xydac

301mo ago

Jest for LLMs—CI-native eval that fails builds on quality drops, not dashboards.

Ship ItSolve My ProblemBig Brain

fdefitte

303mo ago