GitHub Repository

HTML to Markdown with CSS selector and XPath annotations

11 stars · TypeScript

by andrew_zhong·Apr 6, 2026·4 points·0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Embeds DOM selectors in Markdown comments so scrapers don't need an LLM on every run.

Strengths
  • Preserves DOM structure in markdown output, solving a real token-waste problem.
  • CLI runs via npx with no install and accepts HTML piped directly from curl for quick testing.
Weaknesses
  • Narrow audience: only matters if you're building LLM-based scrapers at scale.
  • No benchmark showing actual token savings versus traditional HTML-to-Markdown conversion.
Target Audience

Developers building LLM-powered web scrapers

Similar To

JinaAI · Firecrawl · html2text

Post Description

HTML-to-Markdown converters produce clean, readable content for both humans and LLMs — but the DOM structure is lost along the way. You can always feed Markdown to an LLM to extract structured information, but that costs tokens on every page, every time.

What if the LLM could also see where each piece of content lives in the DOM? It could then generate robust scraping code: stable selectors and XPaths that run without any LLM in the loop, saving tokens and improving accuracy on long or repetitive pages.

Scrapedown does exactly this: it converts HTML to Markdown and annotates each element with its CSS selector and/or XPath, so an LLM can produce precise, reusable scraper code in one shot.
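
To make the idea concrete, here is a minimal sketch of selector annotation in TypeScript. It is not Scrapedown's implementation or its output format: the `El` node model, the helper names (`el`, `cssSelector`, `xpath`, `toAnnotatedMarkdown`), and the `<!-- css: … | xpath: … -->` comment layout are all assumptions made up for illustration. It only shows how each Markdown line could carry the location of its source element.

```typescript
// Hypothetical minimal element model; a real converter would walk a parsed DOM.
interface El {
  tag: string;
  id?: string;
  text?: string;
  children: El[];
  parent?: El;
}

// Build an element and wire up parent links for its children.
function el(tag: string, opts: { id?: string; text?: string } = {}, children: El[] = []): El {
  const node: El = { tag, id: opts.id, text: opts.text, children };
  for (const child of children) child.parent = node;
  return node;
}

// 1-based position of `node` among siblings that share its tag.
function indexAmongSameTag(node: El): number {
  if (!node.parent) return 1;
  const same = node.parent.children.filter((c) => c.tag === node.tag);
  return same.indexOf(node) + 1;
}

// CSS selector: anchor at the nearest element with an id, else use :nth-of-type steps.
function cssSelector(node: El): string {
  const parts: string[] = [];
  let cur: El | undefined = node;
  while (cur) {
    if (cur.id) {
      parts.unshift(`#${cur.id}`);
      break;
    }
    parts.unshift(cur.parent ? `${cur.tag}:nth-of-type(${indexAmongSameTag(cur)})` : cur.tag);
    cur = cur.parent;
  }
  return parts.join(" > ");
}

// Absolute XPath with a positional predicate on every step.
function xpath(node: El): string {
  const parts: string[] = [];
  let cur: El | undefined = node;
  while (cur) {
    parts.unshift(`${cur.tag}[${indexAmongSameTag(cur)}]`);
    cur = cur.parent;
  }
  return "/" + parts.join("/");
}

// Emit Markdown for headings and paragraphs, each followed by an annotation comment.
// The comment layout is an assumption for this sketch, not a documented format.
function toAnnotatedMarkdown(node: El, out: string[] = []): string[] {
  const prefix: Record<string, string> = { h1: "# ", h2: "## ", p: "" };
  const known = Object.prototype.hasOwnProperty.call(prefix, node.tag);
  if (known && node.text !== undefined) {
    out.push(`${prefix[node.tag]}${node.text} <!-- css: ${cssSelector(node)} | xpath: ${xpath(node)} -->`);
  }
  for (const child of node.children) toAnnotatedMarkdown(child, out);
  return out;
}

// Example page: <html><body><div id="main"><h1>…</h1><p>…</p><p>…</p></div></body></html>
const page = el("html", {}, [
  el("body", {}, [
    el("div", { id: "main" }, [
      el("h1", { text: "Title" }),
      el("p", { text: "First" }),
      el("p", { text: "Second" }),
    ]),
  ]),
]);

const lines = toAnnotatedMarkdown(page);
console.log(lines.join("\n"));
```

For the sample tree, the second paragraph is emitted as `Second <!-- css: #main > p:nth-of-type(2) | xpath: /html[1]/body[1]/div[1]/p[2] -->`. An LLM that reads such annotations once can write a scraper that queries `#main > p:nth-of-type(2)` directly on later runs. Anchoring at the nearest `id` is one plausible way to keep selectors short and less brittle; a production tool would likely weigh classes and attributes as well.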

Similar Projects