
Tri·TFM Lens – 5-axis quality evaluation for ChatGPT/Gemini responses

by siris950·Mar 6, 2026·2 points·1 comment

AI Analysis

Mid · Big Brain · Eye Candy

5-axis evaluation with calibrated Fact scoring (r=0.96 across models), but shallow actionability.

Strengths
  • Novel calibration method for Fact axis (falsifiability ceiling) transfers across models with r=0.96 correlation
  • Surprising empirical findings (RLHF compensation for shallow prompts) suggest real insight
  • Clean UI with Balance score abstraction (STABLE/DRIFTING/DOM) is intuitive
Weaknesses
  • No clear use case: color-coded scores don't suggest how to improve responses, only critique them
  • No link or working demo provided; unclear if extension is actually available to install or just a concept
Category
Target Audience

Users of ChatGPT and Gemini who want quality assessment of AI responses

Similar To

OpenAI's response quality metrics (internal) · Anthropic's constitutional AI evaluation · ScamScore fraud detection frameworks

Post Description

I built a Chrome extension that evaluates AI chatbot responses across 5 dimensions: Emotion (tone fit), Fact (verifiability), Narrative (structure), Depth (explains WHY vs just WHAT), and Bias (directional framing).

One click next to any ChatGPT or Gemini response → 2 seconds → full quality profile with a Balance score (STABLE/DRIFTING/DOM).

Some results that surprised me:

- "How are you?" → DRIFTING. High emotion, zero facts, zero depth.
- "Why don't antibiotics work on viruses?" → STABLE, Fact=0.95, Depth=0.75
- Persuasive prompts → Bias=+0.72. The model doesn't pretend to be neutral.
- Philosophical answers → Fact=0.40 even with citations. Citing Kant doesn't make unfalsifiable claims verifiable.

The Fact axis uses a 3-step calibration: classify the question as falsifiable or not → apply a ceiling → score within it. This transfers across models at r=0.96.
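The last two calibration steps, and what the r=0.96 figure measures, could be sketched roughly as below. This is a minimal illustration, not the extension's actual code: the names `FACT_CEILING`, `calibrateFact` and `pearson` are hypothetical, the ceiling values are placeholders (0.4 is borrowed from the philosophical example above purely for illustration), and linear scaling is an assumed reading of "score within it". Step 1, classifying the question, is taken as an input here.

```javascript
// Hypothetical ceilings per question class. The post does not publish the
// real values; 0.4 is only an illustrative placeholder.
const FACT_CEILING = { falsifiable: 1.0, unfalsifiable: 0.4 };

// Steps 2 and 3: look up the ceiling for the question class, then place the
// judge's raw score (expected in 0..1) inside it by linear scaling.
// Linear scaling is an assumption; a hard cap would be another option.
function calibrateFact(questionClass, rawScore) {
  const ceiling = FACT_CEILING[questionClass];
  if (ceiling === undefined) {
    throw new Error(`unknown question class: ${questionClass}`);
  }
  const clamped = Math.min(1, Math.max(0, rawScore));
  return ceiling * clamped;
}

// Pearson correlation between two equal-length score arrays, for example the
// calibrated Fact scores that two judge models assign to the same prompts.
// Returns NaN if either array has zero variance.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two equal-length arrays with at least 2 items");
  }
  const mean = (a) => a.reduce((sum, v) => sum + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}
```

Under these assumptions, an unfalsifiable question can never score above its ceiling no matter how well cited the answer is, which matches the Kant example. Running `pearson` over per-prompt Fact scores from two different models is one way a cross-model agreement figure like the one quoted could be computed.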

Interesting negative finding: RLHF-trained models compensate for shallow prompts by adding unsolicited explanations. The Depth axis rubric works (5/5 on controlled responses) but in practice models over-explain everything.

Stack: Manifest V3, vanilla JS, Gemini Flash API as judge, Balance computed client-side. Uses your own API key, no data stored.
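A judge call from the extension might look something like the sketch below. It assumes the public Gemini REST `generateContent` endpoint with the user's key sent in the `x-goog-api-key` header. The prompt wording, the JSON key names, the score ranges (0..1 for four axes, -1..1 for the signed Bias axis) and the helper names `buildJudgeRequest` and `parseProfile` are all hypothetical; the post does not describe them.

```javascript
// Builds the arguments for fetch() for one judge call. The model name is
// supplied by the caller because the post only says "Gemini Flash".
function buildJudgeRequest(apiKey, model, responseText) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    encodeURIComponent(model) +
    ":generateContent";
  // Hypothetical judge prompt; the real rubric is not given in the post.
  const prompt =
    "Score the following chatbot response on five axes and reply with JSON " +
    'only, using the keys "emotion", "fact", "narrative", "depth", "bias".\n\n' +
    responseText;
  return {
    url,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // the user's own key, sent only to the API
      },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: { responseMimeType: "application/json", temperature: 0 },
      }),
    },
  };
}

const AXES = ["emotion", "fact", "narrative", "depth", "bias"];

// Validates the judge's JSON reply and returns a plain five-axis profile.
// Assumed ranges: bias in -1..1 (it is reported with a sign), others in 0..1.
function parseProfile(jsonText) {
  const raw = JSON.parse(jsonText);
  const profile = {};
  for (const axis of AXES) {
    const value = raw[axis];
    const min = axis === "bias" ? -1 : 0;
    if (typeof value !== "number" || Number.isNaN(value) || value < min || value > 1) {
      throw new Error(`invalid ${axis} score: ${value}`);
    }
    profile[axis] = value;
  }
  return profile;
}

// Usage inside an async function (model name is only an example):
//   const { url, options } = buildJudgeRequest(key, "gemini-2.0-flash", text);
//   const data = await (await fetch(url, options)).json();
//   const profile = parseProfile(data.candidates[0].content.parts[0].text);
```

Keeping request building and reply validation as pure functions means nothing needs to be stored, and the Balance label can then be derived from the returned profile entirely on the client.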

Research paper with full methodology and 100-prompt validation available on request.

Similar Projects

Productivity · ●● Solid

Desktop Agent Center – Local AI Automation via Hotkeys

Global hotkey to AI with zero API costs, but just wraps existing web UIs.
