GitHub Repository
99 stars · TypeScript

Snitchmd – Cloudflare-protected URLs into clean Markdown via Docker

by syabro · Apr 29, 2026 · 8 points · 1 comment

AI Analysis

●● Solid · Solve My Problem · Ship It

A local alternative to Firecrawl for Cloudflare-protected sites, with large token reductions versus raw HTML.

Strengths
  • CloakBrowser integration bypasses Cloudflare 403s where standard curl fails completely.
  • rs-trafilatura extraction slashes context noise from 187k tokens down to under 1k.
  • Local Docker cache prevents redundant browser launches for repeated URL fetches.
Weaknesses
  • Cannot solve interactive image-selection CAPTCHAs ("click the traffic lights") such as reCAPTCHA v2 or hCaptcha.
  • Just glue code wrapping two existing OSS tools with no novel scraping engine.
Target Audience

Developers building RAG pipelines or LLM agents needing web context

Similar To

Firecrawl · Jina AI Reader · Crawl4AI

Post Description

Author here. Built this for myself, putting it out in case it's useful.

I needed any URL as clean Markdown for LLM context, including Cloudflare/anti-bot sites. curl gets HTTP 403 on those, raw HTML is 80%+ navigation noise that eats context, and paid SaaS (Firecrawl, Jina) wasn't an option for me.

It's a Docker wrapper around two existing OSS tools: CloakBrowser (stealth Chromium that passes Cloudflare) and rs-trafilatura (HTML → Markdown). No new scraper, just glue. It runs locally, so my URLs stay on my box.
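To illustrate the fetch, extract, and cache flow described here and in the analysis above, the following is a minimal sketch and not snitchmd's actual code. The function `cached_markdown`, the cache layout (one file per SHA-256 of the URL), and the two stand-in callables are all assumptions made for illustration. In the real tool the fetch step would be the stealth browser and the extract step would be rs-trafilatura.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_markdown(
    url: str,
    cache_dir: Path,
    fetch_html: Callable[[str], str],
    to_markdown: Callable[[str], str],
) -> str:
    """Return Markdown for url, running the fetch step only on a cache miss.

    Hypothetical helper: the name, signature, and on-disk layout are
    illustrative assumptions, not taken from the project.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.md"
    if path.exists():
        # Cache hit: no browser launch is needed.
        return path.read_text(encoding="utf-8")
    markdown = to_markdown(fetch_html(url))
    path.write_text(markdown, encoding="utf-8")
    return markdown


# Stand-ins for the browser fetch and the HTML-to-Markdown extractor.
calls: list[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)  # record each simulated browser launch
    return "<html><body><nav>menu</nav><p>hello</p></body></html>"


def fake_extract(html: str) -> str:
    return "hello"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_markdown("https://example.com/a", cache, fake_fetch, fake_extract)
    second = cached_markdown("https://example.com/a", cache, fake_fetch, fake_extract)

print(len(calls))  # number of simulated browser launches
```

Passing the fetch and extract steps in as callables keeps the cache logic testable without launching a browser. A real setup would also need some expiry policy, which this sketch leaves out.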

Token reduction (raw curl HTML vs snitchmd, tiktoken cl100k_base):

- cloudflare.com/learning/bots — curl: HTTP 403 → snitchmd: 0.8k

- docs.docker.com/engine/install — 187k → 0.9k

- en.wikipedia.org/wiki/LLM — 222.7k → 29.7k
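As a quick worked check, the figures above can be turned into percentage reductions. This is a small sketch using only the numbers quoted in the list. The helper name `reduction_pct` is made up for illustration, and the Cloudflare row is left out because curl returned HTTP 403 and there is no baseline token count.

```python
def reduction_pct(before_k: float, after_k: float) -> float:
    """Percentage of tokens removed, given counts in thousands of tokens."""
    if before_k <= 0:
        raise ValueError("before_k must be positive")
    return (1 - after_k / before_k) * 100


# Figures quoted in the post (thousands of cl100k_base tokens).
measurements = {
    "docs.docker.com/engine/install": (187.0, 0.9),
    "en.wikipedia.org/wiki/LLM": (222.7, 29.7),
}

results = {
    url: round(reduction_pct(before, after), 1)
    for url, (before, after) in measurements.items()
}

for url, pct in results.items():
    print(f"{url}: {pct}% fewer tokens")
# docs.docker.com/engine/install: 99.5% fewer tokens
# en.wikipedia.org/wiki/LLM: 86.7% fewer tokens
```

The smaller reduction on the Wikipedia page is expected, since most of that page is article text and not navigation.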

Heads up: it passes Cloudflare but can't solve "click the traffic lights" captchas (reCAPTCHA v2, hCaptcha).

MIT licensed. Happy to answer questions.

Similar Projects

Klovr – Convert any webpage to Markdown (Cloudflare covers only 5%)

Nice, focused product: site-specific extraction rules (CSS selectors/metadata overrides), edge-first delivery (<500ms p99) and SDKs for Node/Python make it quick to drop into an LLM pipeline and claim 40–60% token savings. That said, HTML→Markdown is a crowded niche (Pandoc, Jina, Firecrawl and dozens of scrapers already exist), so Klovr needs clearer differentiation — e.g. demonstrable extraction accuracy, enterprise-grade rule sharing, or unique model-aware trimming — to move beyond 'handy utility'.

Solve My Problem · Slick
vaibhavlodha98
213mo ago