HTML to Markdown with CSS selector and XPath annotations

Author: andrew_zhong

by andrew_zhong · Apr 6, 2026 · 4 points · 0 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Embeds DOM selectors in Markdown comments so scrapers don't need an LLM on every run.

Strengths

• Preserves DOM structure in the Markdown output, solving a real token-waste problem.
• The CLI runs via npx with no install and accepts input piped directly from curl for quick testing.

Weaknesses

• Narrow audience: only matters if you're building LLM-based scrapers at scale.
• No benchmark showing actual token savings versus traditional HTML-to-Markdown conversion.

Post Description

HTML-to-Markdown converters produce clean, readable content for both humans and LLMs — but the DOM structure is lost along the way. You can always feed Markdown to an LLM to extract structured information, but that costs tokens on every page, every time.

What if the LLM could also see where each piece of content lives in the DOM? Then it can generate robust scraping code — stable selectors and XPaths that run without any LLM in the loop, saving tokens and improving accuracy on long or repetitive pages.
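To make "scraping code that runs without any LLM in the loop" concrete, here is a minimal sketch of what such generated code might look like. The field names, the XPaths, and the sample page are all hypothetical. The sketch uses Python's standard-library `xml.etree.ElementTree`, which only parses well-formed markup and supports a limited XPath subset; a real scraper would more likely use a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors, as an LLM might write them once after seeing an
# annotated page. After that, no model call is needed to extract the fields.
SELECTORS = {
    "title": ".//div[@id='product']/h1",
    "price": ".//div[@id='product']/span[@class='price']",
}


def scrape(page: str) -> dict:
    """Extract each field with its fixed XPath; a missing node yields None."""
    root = ET.fromstring(page)  # assumes well-formed (XHTML-like) markup
    result = {}
    for field, xpath in SELECTORS.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        result[field] = node.text.strip() if has_text else None
    return result


SAMPLE_PAGE = (
    "<html><body><div id='product'>"
    "<h1>Widget</h1><span class='price'>$9.99</span>"
    "</div></body></html>"
)
print(scrape(SAMPLE_PAGE))
```

For the sample page this prints `{'title': 'Widget', 'price': '$9.99'}`. The same function can then run on every page that shares the layout, which is where the token savings described above would come from.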

Scrapedown does exactly this: it converts HTML to Markdown and annotates each element with its CSS selector and/or XPath, so an LLM can produce precise, reusable scraper code in one shot.
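The idea can be illustrated with a small sketch. This is not Scrapedown's implementation or its output format: the `annotate` function, the `<!-- css: ... -->` comment style, and the tag-to-Markdown mapping are assumptions made for illustration. It uses only Python's standard-library `html.parser`, handles a handful of block tags, and assumes every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Tag-to-Markdown prefixes for this sketch only (hypothetical, not Scrapedown's rules).
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
# Void elements never get a closing tag, so they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Annotator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, selector step) for every open element
        self.counts = [{}]  # same-tag sibling counters, one dict per open parent
        self.block = None   # (tag, selector, depth) of the block being collected
        self.text = []
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        elem_id = dict(attrs).get("id")
        step = f"#{elem_id}" if elem_id else f"{tag}:nth-of-type({siblings[tag]})"
        self.stack.append((tag, step))
        self.counts.append({})
        if tag in PREFIX and self.block is None:
            self.block = (tag, self._selector(), len(self.stack))
            self.text = []

    def handle_data(self, data):
        if self.block is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack or self.stack[-1][0] != tag:
            return  # the sketch ignores stray or mismatched closing tags
        if self.block is not None and self.block[2] == len(self.stack):
            block_tag, selector, _ = self.block
            content = " ".join("".join(self.text).split())
            self.lines.append(f"{PREFIX[block_tag]}{content} <!-- css: {selector} -->")
            self.block = None
        self.stack.pop()
        self.counts.pop()

    def _selector(self):
        steps = [step for _, step in self.stack]
        # An id is unique in a valid document, so the path can start at the nearest one.
        for i in range(len(steps) - 1, -1, -1):
            if steps[i].startswith("#"):
                return " > ".join(steps[i:])
        return " > ".join(steps)


def annotate(html: str) -> str:
    parser = Annotator()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `annotate('<div id="main"><h1>Title</h1><ul><li>One</li><li>Two</li></ul></div>')` yields three lines: `# Title <!-- css: #main > h1:nth-of-type(1) -->`, `- One <!-- css: #main > ul:nth-of-type(1) > li:nth-of-type(1) -->`, and `- Two <!-- css: #main > ul:nth-of-type(1) > li:nth-of-type(2) -->`. Placing the selector in an HTML comment keeps the rendered Markdown readable while leaving the DOM location visible to a model that reads the raw text.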