GitHub Repository

High-performance web crawler and scraper for TypeScript, powered by Bun and Playwright

10 stars · TypeScript

Feedstock – Web Crawler for TypeScript Built on Bun and Playwright

by tylergibbs1·Apr 11, 2026·3 points·0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Accessibility tree snapshots cut LLM payloads by up to 10x, in a space where Firecrawl already exists.

Strengths
  • Fetch-first engine avoids browser overhead unless JavaScript rendering is required
  • UCB1 bandit and Q-learning crawlers adapt to site structure dynamically
  • bun:sqlite caching with ETag and content hashing reduces redundant requests
Weaknesses
  • Learning-based strategies are unmeasured by the author's own admission, so it is unclear whether they actually help
  • Web crawling for LLMs is extremely crowded with Firecrawl, JinaAI, Crawl4AI
Target Audience

Developers building LLM pipelines or needing structured web data

Similar To

Firecrawl · Crawl4AI · JinaAI Reader

Post Description

Web crawler for TypeScript. Runs on Bun, uses Playwright by default, also speaks CDP and Lightpanda.

The part I want feedback on is the CLI. It's built for LLMs, not humans: output is JSON when piped, feedstock schema crawl dumps every parameter at runtime, and --fields url,markdown lets you pull just the fields you need so a crawl result doesn't eat your whole context window. Other bits worth a look:

Fetch-first engine. Tries plain HTTP before booting a browser, escalates only if the page needs JS.
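
A minimal sketch of the fetch-first idea, not feedstock's actual code. The names needsBrowser, fetchFirst, and Loader are hypothetical, and the escalation heuristic (script tags present but almost no visible text) is an assumption chosen for illustration. The HTTP and browser loaders are injected so the example runs without a network or a browser.

```typescript
// Dependencies are injected so the sketch runs without a network or browser.
type Loader = (url: string) => Promise<string>;

interface FetchResult {
  html: string;
  engine: "http" | "browser";
}

// Assumed heuristic (not taken from feedstock): a page probably needs
// JavaScript if it ships <script> tags but has almost no visible text.
function needsBrowser(html: string, minTextLength = 200): boolean {
  const scriptCount = (html.match(/<script\b/gi) ?? []).length;
  const visibleText = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return scriptCount > 0 && visibleText.length < minTextLength;
}

// Try plain HTTP first; only call the (expensive) renderer when needed.
async function fetchFirst(
  url: string,
  fetchHtml: Loader,
  renderHtml: Loader,
): Promise<FetchResult> {
  const html = await fetchHtml(url);
  if (!needsBrowser(html)) return { html, engine: "http" };
  return { html: await renderHtml(url), engine: "browser" };
}

// Two stand-in pages: a content-heavy static page and an empty SPA shell.
const staticPage = `<html><body><article>${"word ".repeat(100)}</article></body></html>`;
const spaShell = `<html><body><div id="root"></div><script src="/app.js"></script></body></html>`;

fetchFirst(
  "https://example.com/",
  async () => spaShell,
  async () => "<html>rendered</html>",
).then((r) => console.log(r.engine)); // prints "browser"
```

In a real crawler the two loaders would wrap an HTTP client and a browser page load; a stricter heuristic might also look at response headers or known framework markers.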

Deep crawl with BFS, DFS, a UCB1 bandit, and a Q-learning focused crawler. The learning ones seem to help on big docs sites but I haven't measured it carefully yet.
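
For readers unfamiliar with UCB1, here is a small sketch of the standard selection rule (mean reward plus an exploration bonus of sqrt(2 ln N / n)). How feedstock defines arms and rewards is not stated in the post; treating URL path prefixes as arms and "page was relevant" as a 0-to-1 reward is an assumption, and Arm, ucb1Score, selectArm, and update are hypothetical names.

```typescript
// One arm per link group; here a URL path prefix (an assumption).
interface Arm {
  id: string;
  pulls: number;       // how many pages from this group were fetched
  totalReward: number; // sum of rewards in [0, 1] observed so far
}

// Standard UCB1 score: untried arms get priority via Infinity.
function ucb1Score(arm: Arm, totalPulls: number): number {
  if (arm.pulls === 0) return Infinity;
  const mean = arm.totalReward / arm.pulls;
  const bonus = Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls);
  return mean + bonus;
}

// Pick the arm with the highest score; ties keep the earlier arm.
function selectArm(arms: Arm[]): Arm {
  if (arms.length === 0) throw new Error("no arms to choose from");
  const totalPulls = arms.reduce((sum, a) => sum + a.pulls, 0);
  return arms.reduce((best, arm) =>
    ucb1Score(arm, totalPulls) > ucb1Score(best, totalPulls) ? arm : best,
  );
}

// Record the outcome of fetching one page from the chosen arm.
function update(arm: Arm, reward: number): void {
  arm.pulls += 1;
  arm.totalReward += reward;
}

const arms: Arm[] = [
  { id: "/docs", pulls: 10, totalReward: 8 },
  { id: "/blog", pulls: 10, totalReward: 2 },
  { id: "/api", pulls: 0, totalReward: 0 },
];

console.log(selectArm(arms).id); // prints "/api" because it is still untried
```

The exploration bonus shrinks as an arm accumulates pulls, so the crawler gradually concentrates on the groups that have paid off while still sampling the rest occasionally.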

Accessibility tree snapshots instead of HTML. 3 to 10x smaller, easier to feed a model.
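
To illustrate why such a snapshot is compact, here is a sketch that serializes a tree of roles and accessible names into indented text. The AxNode shape and the serialize function are hypothetical; the post does not show feedstock's snapshot format, and obtaining the tree from a browser is outside this example.

```typescript
// Hypothetical node shape: a role, an optional accessible name, children.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

// Render one line per node, indented two spaces per depth level.
function serialize(node: AxNode, depth = 0): string {
  const indent = "  ".repeat(depth);
  const label = node.name ? `${node.role} "${node.name}"` : node.role;
  const lines = [`${indent}- ${label}`];
  for (const child of node.children ?? []) {
    lines.push(serialize(child, depth + 1));
  }
  return lines.join("\n");
}

const tree: AxNode = {
  role: "main",
  children: [
    { role: "heading", name: "Install" },
    { role: "link", name: "Docs" },
  ],
};

console.log(serialize(tree));
```

Markup, styling, and scripts never appear in this representation, which is where the size reduction comes from; the exact ratio depends on the page.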

Cache uses bun:sqlite with ETag, Last-Modified, and content hashing.
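
The three signals can be combined roughly as follows. This is a sketch under stated assumptions, not feedstock's cache: CacheEntry, conditionalHeaders, hashContent, and classify are hypothetical names, storage is left out entirely (the post says the real cache uses bun:sqlite), and a small FNV-1a hash stands in for whatever content hash the project uses so the example has no dependencies.

```typescript
interface CacheEntry {
  etag?: string;
  lastModified?: string;
  contentHash: string;
  body: string;
}

// Build conditional request headers from what the cache remembers.
function conditionalHeaders(entry?: CacheEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry?.etag) headers["If-None-Match"] = entry.etag;
  if (entry?.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// 32-bit FNV-1a over UTF-16 code units. Stand-in only: a real cache
// would likely use a stronger hash such as SHA-256.
function hashContent(body: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    h ^= body.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

type Freshness = "not-modified" | "unchanged-content" | "changed";

// 304 means the server confirmed the cached copy; otherwise fall back to
// comparing content hashes for servers that send no validators.
function classify(
  entry: CacheEntry | undefined,
  status: number,
  body: string,
): Freshness {
  if (entry && status === 304) return "not-modified";
  if (entry && hashContent(body) === entry.contentHash) return "unchanged-content";
  return "changed";
}

const cached: CacheEntry = {
  etag: '"abc"',
  lastModified: "Wed, 01 Apr 2026 00:00:00 GMT",
  contentHash: hashContent("hello"),
  body: "hello",
};

console.log(conditionalHeaders(cached));
console.log(classify(cached, 200, "hello")); // prints "unchanged-content"
```

Only the "changed" case requires re-processing the page; the other two let the crawler reuse the stored result.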

v0.5.0, Apache 2.0, 325 tests. Just pushed it so the star count is what it is.

https://github.com/tylergibbs1/feedstock
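
As an illustration of the --fields url,markdown idea from the CLI description, here is a sketch of projecting a result down to the requested fields before printing it as JSON. CrawlResult, pickFields, and the sample record are hypothetical; the real output shape of feedstock is not shown in the post.

```typescript
// Hypothetical shape of a crawl result; the real output may differ.
type CrawlResult = Record<string, unknown>;

// Keep only the requested top-level fields, e.g. "url,markdown".
// Unknown field names are ignored rather than treated as errors.
function pickFields(result: CrawlResult, fieldsArg: string): CrawlResult {
  const wanted = fieldsArg
    .split(",")
    .map((f) => f.trim())
    .filter((f) => f.length > 0);
  const out: CrawlResult = {};
  for (const key of wanted) {
    if (key in result) out[key] = result[key];
  }
  return out;
}

const sample: CrawlResult = {
  url: "https://example.com/docs",
  markdown: "# Docs",
  html: "<html>...</html>",
  links: ["https://example.com/a"],
};

// Compact JSON, as a downstream program or model would consume it.
console.log(JSON.stringify(pickFields(sample, "url,markdown")));
// prints {"url":"https://example.com/docs","markdown":"# Docs"}
```

Dropping the raw HTML and link list in this way is what keeps a single result small enough to pass to a model.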
