Back to browse
GitHub Repository

Independent framework to test, benchmark, and evaluate LLMs & AI agents locally.

10 starsPython

Rubric – test what your LLM agent did, not just what it said

by kareemrashed·Jun 12, 2026·1 point·0 comments

AI Analysis

●●SolidSolve My Problem

Tests tool calls and trace quality when LangSmith only checks output strings.

Strengths
  • Zero required dependencies means fully local evaluation without API calls.
  • LangGraph integration extracts tool calls from existing message structures automatically.
Weaknesses
  • Agent evaluation is crowded with LangSmith, Arize Phoenix, and others.
  • Limited to LangGraph; no native support for AutoGen or CrewAI agents.
Category
Target Audience

LLM agent developers, AI engineering teams

Similar To

LangSmith · Arize Phoenix · Braintrust

Similar Projects