Snitchmd – Cloudflare-protected URLs into clean Markdown via Docker

Author: syabro

by syabro · Apr 29, 2026 · 8 points · 1 comment


AI Analysis

●● Solid · Solve My Problem · Ship It

Beats Firecrawl on token count for Cloudflare sites when you need local execution.

Strengths

• CloakBrowser integration gets past Cloudflare 403s on pages where plain curl is blocked outright.
• rs-trafilatura extraction cuts context noise sharply: 187k tokens down to 0.9k on the author's docs.docker.com example.
• A local Docker cache avoids redundant browser launches when the same URL is fetched repeatedly.

Weaknesses

• Cannot solve interactive CAPTCHAs such as reCAPTCHA v2 or hCaptcha image challenges ("click the traffic lights").
• It is glue code wrapping two existing OSS tools, with no novel scraping engine.

Post Description

Shmauthor here. Built this for myself, putting it out in case it's useful.

I needed any URL as clean Markdown for LLM context, including Cloudflare-protected and other anti-bot sites. curl gets HTTP 403 on those, raw HTML is 80%+ navigation noise that eats context, and paid SaaS (Firecrawl, Jina) wasn't an option for me.

It's a Docker wrapper around two existing OSS tools: CloakBrowser (stealth Chromium that passes Cloudflare) and rs-trafilatura (HTML → Markdown). No new scraper, just glue. It runs locally, so my URLs stay on my box.
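For readers who want a picture of what "just glue" can look like, here is a rough Python sketch of a fetch → extract → cache pipeline. It is not Snitchmd's actual code: `fetch_html` and `html_to_markdown` are hypothetical callables standing in for the stealth-browser fetch and the HTML-to-Markdown extraction, and the SHA-256 file-name scheme for the cache is an assumption made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def url_to_markdown(
    url: str,
    fetch_html: Callable[[str], str],        # hypothetical: stealth-browser fetch
    html_to_markdown: Callable[[str], str],  # hypothetical: content extractor
    cache_dir: Path,
) -> str:
    """Return Markdown for `url`, reusing a cached copy when one exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Assumed cache layout: one .md file per URL, named by the URL's SHA-256.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cached = cache_dir / f"{key}.md"
    if cached.exists():
        # Cache hit: skip the (expensive) browser launch entirely.
        return cached.read_text(encoding="utf-8")

    # Cache miss: fetch the page, extract the main content, store the result.
    markdown = html_to_markdown(fetch_html(url))
    cached.write_text(markdown, encoding="utf-8")
    return markdown
```

Passing the two stages in as callables keeps the sketch independent of any particular browser or extractor, and makes the caching behaviour easy to check with stubs.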

Token reduction (raw curl HTML vs snitchmd, tiktoken cl100k_base):

- cloudflare.com/learning/bots — curl: HTTP 403 → snitchmd: 0.8k

- docs.docker.com/engine/install — 187k → 0.9k

- en.wikipedia.org/wiki/LLM — 222.7k → 29.7k
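As a quick check on what those figures mean in relative terms, the sketch below turns the two rows that have a baseline into percentage reductions (the cloudflare.com row has none, since curl only got a 403). The counts are the rounded "k" values from the list, so the percentages are approximate. `count_tokens` shows one way such counts could be measured with the third-party tiktoken package; how the author actually ran the measurement is not stated, so treat that helper as an assumption.

```python
def reduction_percent(raw_tokens: float, clean_tokens: float) -> float:
    """Percentage of tokens removed going from raw HTML to clean Markdown."""
    return round(100 * (raw_tokens - clean_tokens) / raw_tokens, 1)


def count_tokens(text: str) -> int:
    """Count cl100k_base tokens; needs `pip install tiktoken` (not run here)."""
    import tiktoken  # third-party; imported lazily so the rest runs without it

    encoding = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats strings like "<|endoftext|>" as plain text.
    return len(encoding.encode(text, disallowed_special=()))


# Rounded figures copied from the list above: (raw curl HTML, snitchmd output).
reported = {
    "docs.docker.com/engine/install": (187_000, 900),
    "en.wikipedia.org/wiki/LLM": (222_700, 29_700),
}

for page, (raw, clean) in reported.items():
    print(f"{page}: {reduction_percent(raw, clean)}% fewer tokens")
```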

Heads up: it passes Cloudflare, but it can't solve "click the traffic lights" captchas (reCAPTCHA v2, hCaptcha).

MIT licensed. Happy to answer questions.

Similar Projects

Developer Tools · ●● Solid

Save, an API that turns any URL into clean Markdown for LLMs

HTML-to-Markdown for LLMs when JinaAI and Firecrawl already exist.

Solve My Problem · Slick

jswallez

301mo ago

Developer Tools · ●● Solid

Pagecast – Publish Markdown/HTML Reports to Cloudflare Pages

Replaces localhost tunnels for sharing Claude artifacts with stable Cloudflare URLs.

Solve My Problem · Cozy

amaldavid

54131mo ago

Developer Tools · ● Mid

Klovr – Convert any webpage to Markdown (Cloudflare covers only 5%)

Nice, focused product: site-specific extraction rules (CSS selectors/metadata overrides), edge-first delivery (<500ms p99) and SDKs for Node/Python make it quick to drop into an LLM pipeline and claim 40–60% token savings. That said, HTML→Markdown is a crowded niche (Pandoc, Jina, Firecrawl and dozens of scrapers already exist), so Klovr needs clearer differentiation — e.g. demonstrable extraction accuracy, enterprise-grade rule sharing, or unique model-aware trimming — to move beyond 'handy utility'.

Solve My Problem · Slick

vaibhavlodha98

215mo ago
