GitHub Repository

Schema-first, self-healing HTML extraction powered by LLMs

9 stars · TypeScript

Pluckr – LLM-powered HTML scraper that caches selectors and auto-heals

by pankaj3112 · Feb 25, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Big Brain · Solve My Problem

LLM-generated selector caching beats manual scraping, but Jina AI and Beautiful Soup handle this more cheaply.

Strengths
  • Selector auto-healing via LLM addresses a real web-scraping pain point: sites change layouts constantly
  • Schema-first (Zod) gives type safety and declarative extraction logic
  • Pluggable storage (in-memory, SQLite, Redis) scales from solo to distributed teams
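
The cache-then-heal pattern behind these strengths can be sketched roughly as follows. This is a minimal illustration, not Pluckr's actual API: `SelectorStore`, `MemoryStore`, `extractField`, `tagQuery`, and the `propose` callback (standing in for the LLM call) are all hypothetical names, and the toy `tagQuery` treats a selector as a bare tag name instead of using a real CSS selector engine. Schema validation (Zod in the project) is left out to keep the sketch dependency-free.

```typescript
// Hypothetical pluggable storage interface; an SQLite or Redis backend
// would implement the same two methods.
interface SelectorStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, selector: string): Promise<void>;
}

// In-memory backend, enough for a single process.
class MemoryStore implements SelectorStore {
  private map = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.map.get(key);
  }
  async set(key: string, selector: string): Promise<void> {
    this.map.set(key, selector);
  }
}

// Runs a selector against HTML; returns null when nothing matches.
type Query = (html: string, selector: string) => string | null;

// Stand-in for the LLM call that proposes a selector for a field (assumption).
type ProposeSelector = (html: string, field: string) => Promise<string>;

// Toy query: the "selector" is just a tag name, matched with a regex.
const tagQuery: Query = (html, tag) => {
  const m = html.match(new RegExp(`<${tag}[^>]*>([^<]*)</${tag}>`));
  return m ? (m[1] ?? "").trim() : null;
};

// Try the cached selector first; call the proposer only when there is no
// cached selector or the cached one no longer matches.
async function extractField(
  html: string,
  field: string,
  store: SelectorStore,
  query: Query,
  propose: ProposeSelector
): Promise<{ value: string | null; healed: boolean }> {
  const cached = await store.get(field);
  if (cached !== undefined) {
    const value = query(html, cached);
    if (value !== null) return { value, healed: false };
  }
  // Cache miss or stale selector: ask for a new one and cache it if it works.
  const selector = await propose(html, field);
  const value = query(html, selector);
  if (value !== null) await store.set(field, selector);
  return { value, healed: true };
}
```

Under these assumptions, the proposer is only invoked on the first extraction and again when a layout change breaks the cached selector; unchanged pages are served from the cache alone.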
Weaknesses
  • API costs compound per page change; pure regex or DOM parsing may cost less long-term
  • No Windows-specific guidance and no CLI; being JavaScript/Node.js-only limits accessibility
Target Audience

Backend developers, data engineers, web scraping teams using Node.js

Similar To

Apify · Puppeteer · Scrapy with adaptive selectors
