GitHub Repository

A memory layer that tracks evidence, claims, and decisions to make multi-turn LLM judges and reviewer agents more inspectable and stable.

2 stars · Python

I made a small helper for checking model-graded answers

Author: ML0037

by ML0037 · Jun 14, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem · Dark Horse

Structurally verifies LLM judge reasoning instead of paying for a second model check.

Strengths

•Structural verification flags ignored references without needing a second LLM judge call.
•Cites specific research papers on judge bias to justify the design decisions clearly.
•Local CLI viewer allows inspecting flagged runs immediately without cloud dashboard setup.

Weaknesses

•Web dashboard mentioned in README appears unfinished compared to the local CLI viewer.
•Requires modifying eval prompts to fit the claim-evidence structure, adding integration friction.

Post Description

I made this while checking model-graded answers, and it helped me check the odd cases by hand. Not sure if it’s useful to anyone else.

TL;DR: it breaks an LLM judge run into claims -> evidence -> verdicts and flags any verdict that is not supported by the evidence, so I can check it manually.
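
To make the claims -> evidence -> verdicts idea concrete, here is a minimal sketch of what such a structural check could look like. It is not the project's actual API: the `Claim` and `Verdict` classes, the `flag_unsupported` function, and the sample data are all hypothetical names made up for illustration. It also only checks structure (does every cited claim exist and point at real evidence), not whether the evidence actually says what the claim says.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement the judge makes about the answer (hypothetical schema)."""
    claim_id: str
    text: str
    evidence_ids: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    """A judge decision plus the claims it says it rests on (hypothetical schema)."""
    label: str
    claim_ids: list[str] = field(default_factory=list)


def flag_unsupported(verdict, claims, evidence):
    """Return the reasons a verdict should be checked by hand (empty list = looks fine)."""
    by_id = {c.claim_id: c for c in claims}
    reasons = []
    if not verdict.claim_ids:
        reasons.append("verdict cites no claims")
    for cid in verdict.claim_ids:
        claim = by_id.get(cid)
        if claim is None:
            reasons.append(f"verdict cites unknown claim {cid}")
            continue
        if not claim.evidence_ids:
            reasons.append(f"claim {cid} has no evidence")
        for eid in claim.evidence_ids:
            if eid not in evidence:
                reasons.append(f"claim {cid} cites missing evidence {eid}")
    return reasons


# Made-up example run: one grounded claim, one ungrounded claim, one dangling reference.
evidence = {"e1": "Reference answer: the capital of France is Paris."}
claims = [
    Claim("c1", "The answer names Paris.", ["e1"]),
    Claim("c2", "The answer is well formatted.", []),
]
verdict = Verdict("correct", ["c1", "c2", "c3"])

flags = flag_unsupported(verdict, claims, evidence)
for reason in flags:
    print(reason)
```

In this made-up run the verdict gets flagged twice: once because claim c2 has no evidence attached, and once because c3 was never stated as a claim. A verdict that rests only on c1 would come back with no flags. Everything runs locally on the parsed judge output, so no second model call is involved.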