Analytics that tells AI product teams where their AI fails user

Name: Analytics that tells AI product teams where their AI fails user
Availability: InStock
Author: Lindadao

by Lindadao·Feb 19, 2026·1 point·0 comments

Visit Project View on HN

AI Analysis

●MidShip ItBig Brain

Langfuse/Helicone angle—LLM-as-judge quality scoring—but no live product or differentiation yet.

Strengths

•Three-worker pipeline (intent classifier, quality scorer, task completion detector) is thoughtful architecture for multi-axis evaluation.
•Demo data reveals actionable insight (API integration failure at 75%, scaffolding success at 78%)—shows how the analytics could guide product decisions.
•Designed for PMs first (insights layer, revenue-at-risk estimates) rather than engineers drilling logs.

Weaknesses

•Entirely conceptual: sample data only, no live product, no real user conversations ingested—validation is a deck, not shipping code.
•Langfuse and Helicone already ship quality scoring, session replay, and retention metrics; unclear what unique insight Convometrics adds beyond 'we also run GPT-4o-mini as judge.'

Post Description

Traditional analytics tracks clicks. For AI products, you need to know: what was the user trying to do, did the AI help, and did they succeed?

I built a demo of this. It ingests AI conversations and runs 3 workers (GPT-4o-mini): intent classifier, quality scorer (LLM-as-judge), and task completion detector. Results show up in a dashboard designed for PMs, not engineers.

Stack: Python SDK (zero deps, async) → FastAPI → Supabase → GPT-4o-mini workers → Next.js dashboard.

Demo with sample data (not live product, validating the concept): https://dashboard-xi-taupe-75.vercel.app

The sample data models an AI app builder. Interesting patterns: scaffolding works great (78% success), but API integrations fail 75% of the time, and users who enter bug-fix loops almost always churn.

Key design question: is the "insights layer" (auto-generated recommendations, revenue-at-risk estimates, root cause identification) valuable enough to differentiate from Langfuse/Helicone adding product analytics to their existing tracing tools?

Looking for honest feedback, especially from AI product builders.

Similar Projects

SaaS●●Solid

I built a SaaS analytics tool because I got tired of GA4

GA4 + Stripe unified dashboard, but PostHog and Mixpanel already own this space.

Solve My ProblemShip It

wscld

203mo ago

SaaS●●Solid

Upvotics – Track Reddit conversations where people need your product

Tracks conversations over time and surfaces intent (questions, complaints, competitor mentions) rather than one-off keyword hits, which is the right mental model for hunting leads. The rule-checker and in-browser AI composer are smart UX moves — helping you avoid ban-happy mods while giving ready-to-post suggestions. It isn't reinventing social listening, but those subreddit-aware touches make it actually usable for Reddit outreach if the detection and moderation logic hold up.

Niche GemShip It

Yaramsa-Gautham

103mo ago

SaaS●Mid