I scanned 35 SaaS products across ChatGPT, Claude, Perplexity, Gemini
AI representation auditing for SaaS, but the category already has SEO/discovery tools.
5-axis evaluation with calibrated Fact scoring (r=0.96 across models), but shallow actionability.
Users of ChatGPT and Gemini who want quality assessment of AI responses
OpenAI's response quality metrics (internal) · Anthropic's constitutional AI evaluation · ScamScore fraud detection frameworks
One click next to any ChatGPT or Gemini response → 2 seconds → full quality profile with a Balance score (STABLE/DRIFTING/DOM).
Some results that surprised me:
- "How are you?" → DRIFTING. High emotion, zero facts, zero depth.
- "Why don't antibiotics work on viruses?" → STABLE, Fact=0.95, Depth=0.75.
- Persuasive prompts → Bias=+0.72. The model doesn't pretend to be neutral.
- Philosophical answers → Fact=0.40 even with citations. Citing Kant doesn't make unfalsifiable claims verifiable.
The Fact axis uses a 3-step calibration: classify the question as falsifiable or not → apply a ceiling → score within it. This transfers across models at r=0.96.
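A minimal sketch of that classify → ceiling → score flow, for illustration only. The ceiling values, the function names, and the choice to scale the raw score into the ceiling are all assumptions; the post does not publish the actual numbers or formula. The 0.40 ceiling is picked only because it matches the philosophical-answer result above.

```javascript
// Hypothetical ceilings: the real calibration values are not given in the post.
const FACT_CEILING = { falsifiable: 1.0, unfalsifiable: 0.4 };

// Step 1 (stand-in): in the extension the judge model would classify the
// question; here the caller passes the label in directly.
function factCeiling(isFalsifiable) {
  return isFalsifiable ? FACT_CEILING.falsifiable : FACT_CEILING.unfalsifiable;
}

// Steps 2 and 3: apply the ceiling, then score within it.
// rawSupport is an assumed 0..1 estimate of how well the claims are supported.
function calibratedFact(isFalsifiable, rawSupport) {
  const clamped = Math.min(1, Math.max(0, rawSupport));
  const ceiling = factCeiling(isFalsifiable);
  // Scale into [0, ceiling] and round to two decimals for display.
  return Math.round(clamped * ceiling * 100) / 100;
}
```

Under these assumptions a fully supported answer to an unfalsifiable question still cannot exceed the ceiling, which is the behavior the Kant example describes.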
Interesting negative finding: RLHF-trained models compensate for shallow prompts by adding unsolicited explanations. The Depth axis rubric works (5/5 on controlled responses) but in practice models over-explain everything.
Stack: Manifest V3, vanilla JS, Gemini Flash API as judge, Balance computed client-side. It uses your own API key, and no data is stored.
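For readers curious what a judge call with your own key could look like, here is a rough sketch against the public Gemini REST endpoint. The model name, the prompt wording, the JSON reply format, and every function name are assumptions rather than the extension's actual code. The axis keys are the ones named in this post (the fifth axis is not named here).

```javascript
// Assumed model name; the post only says "Gemini Flash".
const MODEL = "gemini-1.5-flash";
const ENDPOINT =
  `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`;

// Build the fetch arguments. The prompt text is a hypothetical placeholder.
function buildJudgeRequest(apiKey, responseText) {
  const prompt =
    "Rate the following AI response. Reply with JSON only, for example " +
    '{"fact":0.5,"depth":0.5,"bias":0.0,"emotion":0.5}.\n\n' + responseText;
  return {
    url: ENDPOINT,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Pull the first JSON object out of the judge's text reply, or return null.
function parseJudgeReply(apiJson) {
  const text = apiJson?.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
  const match = text.match(/\{[\s\S]*\}/); // tolerates code fences around the JSON
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}

// Send the request from the client; nothing is persisted by this sketch.
async function judge(apiKey, responseText) {
  const { url, options } = buildJudgeRequest(apiKey, responseText);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Judge request failed: ${res.status}`);
  return parseJudgeReply(await res.json());
}
```

Splitting request building and reply parsing from the network call keeps the scoring logic testable without an API key.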
Research paper with full methodology and 100-prompt validation available on request.
Global hotkey to AI with zero API costs, but just wraps existing web UIs.
Thorough methodology, but it's a one-off report—no recurring product or unique insight beyond the YC analysis.
Orchestrates real-time skepticism between models to catch hallucinations before you see them.
Vimium for Claude responses—select and drill-down entirely from keyboard, but narrow audience.
First GEO linter catches what AI agents actually cite. 92 automated rules for ChatGPT, Perplexity, Gemini.