Tri·TFM Lens – 5-axis quality evaluation for ChatGPT/Gemini responses

Author: siris950

by siris950·Mar 6, 2026·2 points·1 comment


AI Analysis

● Mid · Big Brain · Eye Candy

5-axis evaluation with calibrated Fact scoring (r=0.96 across models), but shallow actionability.

Strengths

•Novel calibration method for Fact axis (falsifiability ceiling) transfers across models with r=0.96 correlation
•Surprising empirical findings (RLHF compensation for shallow prompts) suggest real insight
•Clean UI with Balance score abstraction (STABLE/DRIFTING/DOM) is intuitive

Weaknesses

•No clear use case: color-coded scores don't suggest how to improve responses, only critique them
•No link or working demo provided; unclear if extension is actually available to install or just a concept

Post Description

I built a Chrome extension that evaluates AI chatbot responses across 5 dimensions: Emotion (tone fit), Fact (verifiability), Narrative (structure), Depth (explains WHY vs just WHAT), and Bias (directional framing).

One click next to any ChatGPT or Gemini response → 2 seconds → full quality profile with a Balance score (STABLE/DRIFTING/DOM).

Some results that surprised me:

- "How are you?" → DRIFTING. High emotion, zero facts, zero depth.
- "Why don't antibiotics work on viruses?" → STABLE, Fact=0.95, Depth=0.75
- Persuasive prompts → Bias=+0.72. The model doesn't pretend to be neutral.
- Philosophical answers → Fact=0.40 even with citations. Citing Kant doesn't make unfalsifiable claims verifiable.

The Fact axis uses a 3-step calibration: classify the question as falsifiable or not → apply a ceiling → score within it. This transfers across models at r=0.96.
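
The three steps can be sketched roughly as follows. This is an illustration only, not the extension's code: the name `calibrateFact`, the ceiling values, and the choice to scale the raw score by the ceiling are all assumptions. The post only says that a ceiling is applied and the answer is scored within it. The 0.4 ceiling is picked to mirror the philosophical-answer example above.

```javascript
// Hypothetical ceilings: the post does not publish the real values.
const FACT_CEILINGS = { falsifiable: 1.0, unfalsifiable: 0.4 };

/**
 * Sketch of a 3-step Fact calibration.
 * @param {boolean} isFalsifiable - step 1 result, supplied by the caller
 *   (for example, by asking the judge model to classify the question).
 * @param {number} rawScore - uncalibrated verifiability score, expected in [0, 1].
 * @param {{falsifiable: number, unfalsifiable: number}} ceilings - assumed caps.
 * @returns {number} calibrated Fact score in [0, ceiling].
 */
function calibrateFact(isFalsifiable, rawScore, ceilings = FACT_CEILINGS) {
  if (typeof rawScore !== "number" || Number.isNaN(rawScore)) {
    throw new TypeError("rawScore must be a number");
  }
  // Step 2: pick the ceiling that matches the question type.
  const ceiling = isFalsifiable ? ceilings.falsifiable : ceilings.unfalsifiable;
  // Step 3: clamp the raw score to [0, 1], then scale it into [0, ceiling].
  const clamped = Math.min(1, Math.max(0, rawScore));
  return clamped * ceiling;
}

console.log(calibrateFact(true, 0.95)); // falsifiable question, full range
console.log(calibrateFact(false, 1.0)); // unfalsifiable question, capped
```

Under these assumed ceilings, even a heavily cited answer to an unfalsifiable question cannot score above the cap, which is the behavior the post describes for philosophical answers.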

Interesting negative finding: RLHF-trained models compensate for shallow prompts by adding unsolicited explanations. The Depth axis rubric works (5/5 on controlled responses) but in practice models over-explain everything.

Stack: Manifest V3, vanilla JS, Gemini Flash API as judge, Balance computed client-side. Uses your own API key, no data stored.
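
With an LLM as judge and the Balance score computed client-side, the extension presumably has to turn the judge's text reply into numeric axis scores before doing anything else. A minimal sketch of that parsing step is below. The function name `parseJudgeReply`, the JSON reply shape, the lowercase axis keys, and the score ranges (four axes in [0, 1], Bias signed in [-1, 1], inferred from the example values in the post) are all assumptions, not the extension's actual format.

```javascript
// Assumed ranges per axis; inferred from values like Fact=0.95 and Bias=+0.72.
const AXIS_RANGES = {
  emotion: [0, 1],
  fact: [0, 1],
  narrative: [0, 1],
  depth: [0, 1],
  bias: [-1, 1],
};

/**
 * Extract and validate the five axis scores from a judge model's reply.
 * Hypothetical helper: assumes the judge was asked to answer with a JSON object.
 * @param {string} replyText - raw text returned by the judge model.
 * @returns {Object<string, number>} one clamped score per axis.
 */
function parseJudgeReply(replyText) {
  // Judges often wrap JSON in prose or code fences, so take the outermost {...}.
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in judge reply");
  }
  const raw = JSON.parse(replyText.slice(start, end + 1));

  const scores = {};
  for (const [axis, [min, max]] of Object.entries(AXIS_RANGES)) {
    const value = raw[axis];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`Missing or non-numeric score for axis "${axis}"`);
    }
    // Clamp rather than reject, so a slightly out-of-range score is still usable.
    scores[axis] = Math.min(max, Math.max(min, value));
  }
  return scores;
}
```

Validating every axis before computing anything downstream keeps a malformed judge reply from silently producing a misleading profile.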

Research paper with full methodology and 100-prompt validation available on request.

Similar Projects

SaaS ● Mid

I scanned 35 SaaS products across ChatGPT, Claude, Perplexity, Gemini

AI representation auditing for SaaS, but the category already has SEO/discovery tools.

Solve My Problem

gissurthor

203mo ago

Productivity ●● Solid

Desktop Agent Center – Local AI Automation via Hotkeys

Global hotkey to AI with zero API costs, but just wraps existing web UIs.

Solve My Problem · Niche Gem

Tint6666

1128d ago

Data ● Mid

Measuring brand share in AI answers – a Y Combinator case study

Thorough methodology, but it's a one-off report: no recurring product or unique insight beyond the YC analysis.

Big Brain

roman10

433mo ago

AI/ML ●● Solid

A multi-model interface where LLMs debate with each other

Orchestrates real-time skepticism between models to catch hallucinations before you see them.

Solve My Problem · Ship It

capibara13

4920d ago

Developer Tools ●● Solid

AsdPrompt – Vimium-style keyboard navigation for AI chat responses

Vimium for Claude responses: select and drill down entirely from the keyboard, but the audience is narrow.

Slick · Solve My Problem

contrary2belief

203mo ago

Developer Tools ●●● Banger

Geo-lint – open-source linter for GEO (AI search visibility)

First GEO linter catches what AI agents actually cite. 92 automated rules for ChatGPT, Perplexity, Gemini.

Zero to One · Solve My Problem · Niche Gem

ijonis

113mo ago