HN Bot Detector - Detects LLM-Generated Comments on Hacker News
Clever n-gram TF-IDF detection of LLM paraphrases catches smart evasion; solves real HN problem but narrow use case.

Fingerprints LLM-generated HN comments (curly quotes, em-dashes, 3-example pattern).
Hacker News moderators, LLM researchers studying generated-text fingerprints
I have RSI so I use voice and an LLM to type. I dictate my thoughts, the model shapes the sentences. I got lazy about where the line was and automated too much.
After getting unbanned I went through all the comments dang has flagged for LLM posting over the years (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...) and looked for patterns. Some are obvious, some surprised me:
- curly/typographic quotes (“ ” instead of " "), or even ’ vs ' (that’s points to an LLM, that's points to a human)
- humans typing in a browser text box produce straight ASCII. finding curly quotes in a plain HN comment means the text was generated elsewhere and pasted in
- exactly 3 paragraphs of 1-2 sentences each - extremely common LLM output shape
- examples always come in threes - "for example, X, Y, and Z"
- → arrows and — em dashes (sometimes replaced with – en dashes to evade detection)
- overly sycophantic openers - "great point", "this is really interesting" before saying anything
- fake personal framing - "in practice I've found..." immediately followed by a generic claim
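A rough sketch of how the surface checks in this list could be written in Python. The function names, regexes, and phrase lists are illustrative assumptions, not the detector's actual code, and the personal-framing check only spots the phrase itself, not whether the claim after it is generic.

```python
import re

# Hypothetical phrase list; the post only gives these two examples.
SYCOPHANTIC_OPENERS = ("great point", "this is really interesting")

# Curly double/single quotes (U+201C, U+201D, U+2018, U+2019).
CURLY_QUOTES = re.compile("[\u201c\u201d\u2018\u2019]")
# Right arrow, em dash, and en dash (the en dash covers the evasion case).
ARROWS_DASHES = re.compile("[\u2192\u2014\u2013]")
# "for example, X, Y, and Z": a lead-in followed by three comma-separated items.
TRIPLE_EXAMPLE = re.compile(
    r"\b(?:for example|such as),?\s+[^,.]+,\s+[^,.]+,\s+and\s+[^,.]+",
    re.IGNORECASE,
)
# "in practice I've found", with a straight or curly apostrophe.
PERSONAL_FRAMING = re.compile(
    "\\bin (?:my )?(?:practice|experience),? I['\u2019]ve found\\b",
    re.IGNORECASE,
)


def has_three_short_paragraphs(text: str) -> bool:
    """True if the text is exactly 3 paragraphs of 1-2 sentences each."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(paragraphs) != 3:
        return False
    for paragraph in paragraphs:
        # Naive sentence split: whitespace that follows ., ! or ?
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        if not 1 <= len(sentences) <= 2:
            return False
    return True


def fired_signals(text: str) -> list[str]:
    """Return the names of the heuristics that fire on a comment."""
    fired = []
    if CURLY_QUOTES.search(text):
        fired.append("curly_quotes")
    if ARROWS_DASHES.search(text):
        fired.append("arrows_or_dashes")
    if has_three_short_paragraphs(text):
        fired.append("three_short_paragraphs")
    if TRIPLE_EXAMPLE.search(text):
        fired.append("examples_in_threes")
    if text.lstrip().lower().startswith(SYCOPHANTIC_OPENERS):
        fired.append("sycophantic_opener")
    if PERSONAL_FRAMING.search(text):
        fired.append("personal_framing")
    return fired
```

Calling `fired_signals(comment_text)` returns a list of signal names, so each check stays cheap and explainable. How the real detector weights these signals into a score is not shown here.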
Built a detector around these + some heavier signals (TF-IDF cosine similarity across a user's comment history, optional Anthropic/OpenAI LLM pass). You can paste any HN comment URL/ID or just raw text and see what fires.
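For the TF-IDF part, here is a minimal stdlib-only sketch of cosine similarity across a list of comments. It assumes word unigrams plus bigrams and a smoothed IDF, both chosen for illustration; the function names are hypothetical, and how the real detector thresholds or combines this score is not described in the post.

```python
import math
from collections import Counter
from itertools import combinations


def word_ngrams(text: str, n_max: int = 2) -> list[str]:
    """Lowercased word n-grams for n = 1..n_max."""
    words = text.lower().split()
    return [
        " ".join(words[i : i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf_vectors(comments: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector per comment (smoothed IDF, an assumption)."""
    term_counts = [Counter(word_ngrams(c)) for c in comments]
    # Document frequency: in how many comments each term appears.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(comments)
    return [
        {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
        for counts in term_counts
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(comments: list[str]) -> float:
    """Average cosine similarity over all pairs of a user's comments."""
    vectors = tfidf_vectors(comments)
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A history where every comment reuses the same phrasing pushes `mean_pairwise_similarity` toward 1.0, while comments with no shared n-grams score 0.0.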
I ran my own banned comments through it. They score 70-85. Sounds about right.
https://hn-bot-detector.vercel.app/
gh: https://github.com/umairnadeem/hn-bot-detector
I wrote this post myself btw
Teaches commenters to argue better; Akismet and Disqus only delete.
Explains attention mechanisms to five-year-olds while building LLaMA 3 from scratch.
NPS-scored sentiment graphs for HN threads, but Remove.bg and Sentiment140 already do this.
Cleaner inline reader for HN comment subtrees without the navigation clutter.
Auto-timestamped voice comments for YouTube, but the Chrome Web Store listing is currently unavailable.