I built an LLM comment detector for HN (I got banned)

by umairnadeem123·Feb 26, 2026·4 points·6 comments

AI Analysis

● Mid · Big Brain

Fingerprints LLM-generated HN comments (curly quotes, em-dashes, 3-example pattern).

Strengths

•Concrete pattern detection (typographic quotes, em-dashes, paragraph structure) reveals how LLMs leak artifacts
•Real-world post-ban analysis is honest self-reflection, not marketing spin
•Data source is public (dang's moderation queries), reproducible and transparent

Weaknesses

•Not a product: it's an Algolia search-result view of existing moderation decisions
•No detection tool you can run yourself; purely observational analysis of already-flagged comments
•Patterns are easily evaded and already known in LLM detection literature

Post Description

Got banned from HN a few days ago for LLM posting. I honestly deserved it. 100+ comments in a few days, that's just abusive.

I have RSI so I use voice and LLM to type. Dictate my thoughts, model shapes the sentences. I got lazy about where the line was and automated too much.

After getting unbanned I went through all the comments dang has flagged for LLM posting over the years (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...) and looked for patterns. Some are obvious, some surprised me:

- curly/typographic quotes (“ ” instead of " "), or even ’ vs ' in apostrophes (that’s reads as LLM, that's reads as human)

- humans typing in a browser text box produce straight ASCII. finding curly quotes in a plain HN comment means the text was generated elsewhere and pasted in

- exactly 3 paragraphs of 1-2 sentences each - extremely common LLM output shape

- examples always come in threes - "for example, X, Y, and Z"

- → arrows and — em dashes (sometimes replaced with – en dashes or plain - hyphens to evade detection)

- overly sycophantic openers - "great point", "this is really interesting" before saying anything

- fake personal framing - "in practice I've found..." immediately followed by a generic claim
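
A rough sketch of how a few of the surface patterns above could be checked, in Python. This is not the detector's actual code: the function name `surface_signals`, the signal names, the opener list, and the regexes are all assumptions made for illustration, and the post does not say how signals are weighted into a score.

```python
import re

# Hypothetical opener list, taken from the two examples in the post.
SYCOPHANTIC_OPENERS = ("great point", "this is really interesting")


def _sentence_count(paragraph: str) -> int:
    # Naive split on ., ! or ? followed by whitespace; good enough for a sketch.
    return len([s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s])


def surface_signals(text: str) -> dict[str, bool]:
    """Return which surface-level fingerprints fire for a comment (illustrative only)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    return {
        # U+201C/U+201D curly double quotes, U+2018/U+2019 curly single quotes
        "curly_quotes": bool(re.search("[\u201c\u201d\u2018\u2019]", text)),
        # U+2014 em dash, U+2192 rightwards arrow
        "em_dash_or_arrow": bool(re.search("[\u2014\u2192]", text)),
        # exactly 3 paragraphs of 1-2 sentences each
        "three_short_paragraphs": len(paragraphs) == 3
        and all(1 <= _sentence_count(p) <= 2 for p in paragraphs),
        # comment starts with a stock compliment
        "sycophantic_opener": text.strip().lower().startswith(SYCOPHANTIC_OPENERS),
        # "X, Y, and Z" shaped list; will also fire on ordinary human lists
        "triple_example": bool(re.search(r"\w+, [\w ]+, and \w+", text)),
    }


if __name__ == "__main__":
    sample = "Great point. I think \u201cthis\u201d matters \u2014 a lot."
    for name, fired in surface_signals(sample).items():
        print(f"{name}: {fired}")
```

Each check on its own is weak (plenty of humans paste from an editor that auto-curls quotes), so a sketch like this is only useful as one input among several.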

Built a detector around these + some heavier signals (TF-IDF cosine similarity across a user's comment history, optional Anthropic/OpenAI LLM pass). You can paste any HN comment URL/ID or just raw text and see what fires.
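
For the TF-IDF part, here is a minimal pure-Python sketch of cosine similarity across a set of comments. The post does not say how the detector computes or uses this signal, so everything here is an assumption: the helper names (`tfidf_vectors`, `cosine`, `mean_pairwise_similarity`), the tokenizer, the smoothed IDF formula, and the idea that a high average similarity between a user's comments hints at templated output.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for each document."""
    tokenized = [re.findall(r"[a-z0-9']+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so terms shared by
        # every document still carry some weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(comments: list[str]) -> float:
    """Average cosine similarity over all pairs of comments (hypothetical signal)."""
    vecs = tfidf_vectors(comments)
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    if not pairs:
        return 0.0
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    history = [
        "great point, in practice I've found caching helps",
        "great point, in practice I've found indexing helps",
        "lol no, that benchmark is from 2012",
    ]
    print(round(mean_pairwise_similarity(history), 3))
```

Pairwise comparison is quadratic in the number of comments, which is fine for a single user's history but would need a different approach at site scale.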

I ran my own banned comments through it. They score 70-85. Sounds about right.

https://hn-bot-detector.vercel.app/

gh: https://github.com/umairnadeem/hn-bot-detector

I wrote this post myself btw