AHD – an open-source linter and eval framework for AI-generated UI
Deterministic linter for AI UI slop with 39 named tells nobody else tracks.

Pre-2022 human posts vs AI models in a crowd-sourced detection benchmark.
AI researchers, data scientists, HN users interested in AI capabilities
Human or AI · AI or Not
The dataset: 16K human posts from Reddit, Hacker News, and Yelp, each paired with AI generations from 6 models across two providers (Anthropic and OpenAI) at three capability tiers. Same prompt, length-matched, no adversarial coaching: just the model’s natural voice with platform context. Every vote is logged with model, tier, source, response time, and position.
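To picture the vote log, here is a minimal sketch of what a single record might look like. The field names, types, and example values are assumptions for illustration (including the reading of "position" as the slot where the AI post was shown); the post only says that model, tier, source, response time, and position are recorded.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoteRecord:
    """One logged vote on a human/AI pair (hypothetical schema)."""
    model: str             # which AI model wrote the generated post
    tier: str              # capability tier of that model
    source: str            # platform the human post came from
    response_time_ms: int  # how long the voter took to answer
    ai_position: int       # slot where the AI post was shown (0 or 1), assumed meaning
    picked_position: int   # slot the voter flagged as AI

    @property
    def correct(self) -> bool:
        # A vote is correct when the voter picked the slot holding the AI post.
        return self.picked_position == self.ai_position


def to_json_line(vote: VoteRecord) -> str:
    """Serialize a vote as one JSON line, ready to append to a log file."""
    return json.dumps({**asdict(vote), "correct": vote.correct}, sort_keys=True)


example = VoteRecord(
    model="model-a",  # placeholder name, not a real model ID
    tier="mid",
    source="hn",
    response_time_ms=4200,
    ai_position=1,
    picked_position=1,
)
print(to_json_line(example))
```

Keeping one flat record per vote would make it straightforward to slice results later by model, tier, or source.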
Early findings from testing: Reddit posts are easy to spot (humans are too casual for AI to mimic), while HN is significantly harder.
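A per-source comparison like this comes down to grouping votes by platform and computing the share that correctly picked the AI post. The sketch below shows that arithmetic. The function name and the vote tuples are made-up placeholders, not the study's code or results.

```python
from collections import defaultdict


def accuracy_by_source(votes):
    """Return {source: fraction of votes that correctly identified the AI post}.

    `votes` is an iterable of (source, correct) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for source, correct in votes:
        totals[source] += 1
        if correct:
            hits[source] += 1
    return {source: hits[source] / totals[source] for source in totals}


# Placeholder votes for illustration only; not data from the study.
sample_votes = [
    ("reddit", True), ("reddit", True), ("reddit", True), ("reddit", False),
    ("hn", True), ("hn", False),
]
print(accuracy_by_source(sample_votes))
```

If each vote is a two-way choice, 0.5 is chance level, so a source whose accuracy sits near 0.5 is one where voters cannot reliably tell the AI post apart.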
I'll release the full dataset on Hugging Face, and I'll publish a paper if this crowdsourced study collects enough data.
If you play the HN-only mode, you’re helping calibrate how detectable AI is on here specifically.
Would love feedback on the pairs: are any trivially obvious? Are some genuinely hard?
Waze for Indian LPG: crowdsourced stock status saves hours of waiting in line.
Honest venue for AI-coded projects without HN gate-keeping anxiety.
Git-based slop metric is clever, but the author admits results are often wrong.
Interactive quiz proving AI detectors fail to spot obvious machine text.
HN meets invite-only exclusivity, but community norms aren't sticky.