GitHub Repository

A memory layer that tracks evidence, claims, and decisions to make multi-turn LLM judges and reviewer agents more inspectable and stable.

3 stars · Python

I made a small helper for checking model-graded answers

Author: ML0037

by ML0037 · Jun 18, 2026 · 1 point · 0 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Flags LLM judge verdicts unsupported by evidence without needing a second model.

Strengths

• Breaks judge runs into claims-evidence-verdicts chains for manual inspection.
• Detects position bias, verbosity bias, and rubric coverage gaps automatically.
• Local CLI viewer flags problematic verdicts without adding inference costs.

Weaknesses

• Niche audience limits adoption to researchers doing serious LLM eval work.
• Web dashboard mentioned but not yet implemented in the current release.

Post Description

I made this while checking model-graded answers, and it helped me check the odd cases by hand. Not sure if it's useful to anyone else.

TL;DR: it breaks an LLM judge run into claims -> evidence -> verdicts and flags any verdict that is not supported by the evidence, so I can check it manually.
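To make the claims -> evidence -> verdicts idea concrete, here is a minimal sketch of a structural check that flags a verdict without calling a second model. It is not the project's actual code: the `Claim` and `Verdict` classes, the `unsupported_claims` and `needs_manual_check` functions, and the rule that "supported" means every claim cites at least one evidence id that exists in the store are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement the judge made, with the evidence ids it cites (hypothetical shape)."""
    text: str
    evidence_ids: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    """A judge verdict together with the claims it rests on (hypothetical shape)."""
    label: str
    claims: list[Claim] = field(default_factory=list)


def unsupported_claims(verdict: Verdict, evidence: dict[str, str]) -> list[Claim]:
    """Return claims that cite no evidence, or cite an id missing from the store."""
    return [
        claim
        for claim in verdict.claims
        if not claim.evidence_ids
        or any(eid not in evidence for eid in claim.evidence_ids)
    ]


def needs_manual_check(verdict: Verdict, evidence: dict[str, str]) -> bool:
    """Flag a verdict that has no claims at all, or at least one unsupported claim."""
    return not verdict.claims or bool(unsupported_claims(verdict, evidence))
```

A check like this only looks at structure (does every claim point at recorded evidence?), so it costs no extra inference. It cannot tell whether the cited evidence actually justifies the claim, which is why flagged and borderline cases still go to a human.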

Similar Projects

AI/ML · ●●● Banger

I made a small helper for checking model-graded answers

Structurally verifies LLM judge reasoning instead of paying for a second model check.

Big Brain · Solve My Problem · Dark Horse

ML0037

204d ago

AI/ML · ● Mid

mmcheck -- Check if a model supports multimodal inputs.

Useful utility but checking HuggingFace cards directly works too.

Niche Gem

init0

102mo ago

Developer Tools · ●●● Banger

AgentCost – Track, control, and optimize your AI spending (MIT)

One-line wrapping eliminates invisible LLM spend; real cost forecasting and model recommendations.

Solve My Problem · Slick

agentcostin

313mo ago

AI/ML · ●● Solid

DocForge – Multi-Agent RAG That Fact-Checks Its Own Answers

Multi-agent fact-checking loop, but RAG hallucination fixes are table stakes now.

Big Brain · Ship It

toheed11

114mo ago

Developer Tools · ● Mid

DBWarden – A database migration tool for Python/SQLAlchemy projects

SQLAlchemy migration tool, but Alembic already dominates this space completely.

Dark Horse

emi_gandini

103mo ago

SaaS · ●● Solid

I built a tool to check if side hustle income claims are real

Creator incentive classification beats generic fact-checkers at spotting course scams.

Solve My Problem · Niche Gem

ynxshiny

204d ago