
High-performance web crawler and scraper for TypeScript, powered by Bun and Playwright

10 stars · TypeScript

Feedstock – Web Crawler for TypeScript Built on Bun and Playwright

Author: tylergibbs1

by tylergibbs1 · Apr 11, 2026 · 3 points · 0 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Accessibility tree snapshots cut LLM payloads by up to 10x, but it enters a space where Firecrawl already exists.

Strengths

• Fetch-first engine avoids browser overhead unless JavaScript rendering is required
• UCB1 bandit and Q-learning crawlers adapt to site structure dynamically
• bun:sqlite caching with ETag and content hashing reduces redundant requests

Weaknesses

• The author admits the learning-based strategies are unmeasured, so it is unclear whether they actually help
• The LLM web-crawling space is crowded: Firecrawl, Jina AI, Crawl4AI

Post Description

Web crawler for TypeScript. Runs on Bun, uses Playwright by default, also speaks CDP and Lightpanda.

The part I want feedback on is the CLI. It's built for LLMs, not humans: output is JSON when piped, `feedstock schema crawl` dumps every parameter at runtime, and `--fields url,markdown` lets you pull just the fields you need, so a crawl result doesn't eat your whole context window. Other bits worth a look:
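A minimal sketch of the two CLI behaviors described above: JSON when piped, and field projection from a comma-separated `--fields` value. The `CrawlResult` shape and the helpers `parseFieldsFlag`, `projectFields`, and `render` are hypothetical illustrations, not feedstock's actual code. In Bun or Node, the TTY check would come from `process.stdout.isTTY`.

```typescript
// Hypothetical crawl result shape; feedstock's real result type may differ.
type CrawlResult = {
  url: string;
  statusCode: number;
  markdown: string;
  html: string;
  links: string[];
};

// Parse a flag value such as "url,markdown" into a clean list of field names.
function parseFieldsFlag(value: string): string[] {
  return value
    .split(",")
    .map((f) => f.trim())
    .filter((f) => f.length > 0);
}

// Keep only the requested fields, in the order they were asked for.
// Unknown field names are silently skipped.
function projectFields<T extends Record<string, unknown>>(
  result: T,
  fields: string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in result) out[field] = result[field];
  }
  return out;
}

// Compact JSON when stdout is not a terminal (piped into a tool or model),
// simple "key: value" lines when a human is watching.
function render(data: Record<string, unknown>, isTTY: boolean): string {
  if (!isTTY) return JSON.stringify(data);
  return Object.entries(data)
    .map(([key, value]) => `${key}: ${String(value)}`)
    .join("\n");
}
```

In an entry point this might be wired up as `render(projectFields(result, parseFieldsFlag(flagValue)), Boolean(process.stdout.isTTY))`. Projecting before serializing is what keeps large fields such as `html` out of the model's context.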

Fetch-first engine. Tries plain HTTP before booting a browser, escalates only if the page needs JS.
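The post doesn't say how the engine decides that a page "needs JS". A minimal sketch under that caveat: fetch over plain HTTP, apply a crude heuristic (an empty single-page-app mount point, or very little visible text), and only then hand off to a browser. `visibleText`, `needsBrowser`, `fetchFirst`, the 200-character threshold, and the `renderWithBrowser` callback are all hypothetical.

```typescript
// Rough visible-text extraction: drop scripts/styles, strip tags, collapse whitespace.
function visibleText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical escalation heuristic, not feedstock's actual rule.
function needsBrowser(html: string, minTextLength = 200): boolean {
  // Common SPA mount points that ship empty and are filled in by JavaScript.
  const emptyMountPoint =
    /<div[^>]+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  const tooLittleText = visibleText(html).length < minTextLength;
  return emptyMountPoint || tooLittleText;
}

// Try plain HTTP first; fall back to a browser only when the heuristic says so
// or the request failed.
async function fetchFirst(
  url: string,
  renderWithBrowser: (url: string) => Promise<string>,
): Promise<{ html: string; usedBrowser: boolean }> {
  const res = await fetch(url);
  const html = await res.text();
  if (res.ok && !needsBrowser(html)) return { html, usedBrowser: false };
  return { html: await renderWithBrowser(url), usedBrowser: true };
}
```

`renderWithBrowser` could wrap Playwright (`await page.goto(url); return await page.content();`). The regex tag stripping is deliberately rough; it only has to separate "has real content" from "is an empty shell".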

Deep crawl with BFS, DFS, a UCB1 bandit, and a Q-learning focused crawler. The learning ones seem to help on big docs sites, but I haven't measured that carefully yet.
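UCB1 picks the option with the highest mean reward plus an exploration bonus of sqrt(2 ln N / n_i), where N is the total number of pulls and n_i the pulls of option i. The post doesn't say what feedstock's bandit arms or rewards are. In this sketch, as an assumption, each arm is a URL path prefix and the reward is 1 when a crawled page turned out useful and 0 otherwise; the `UCB1` class and its methods are hypothetical.

```typescript
// Hypothetical arm bookkeeping: arms are URL path prefixes such as "/docs".
type ArmStats = { pulls: number; totalReward: number };

class UCB1 {
  private arms = new Map<string, ArmStats>();
  private totalPulls = 0;

  addArm(arm: string): void {
    if (!this.arms.has(arm)) this.arms.set(arm, { pulls: 0, totalReward: 0 });
  }

  // Return an untried arm if any exists; otherwise the arm maximizing
  // mean reward + sqrt(2 ln N / n_i).
  select(): string | undefined {
    let best: string | undefined;
    let bestScore = -Infinity;
    for (const [arm, s] of this.arms) {
      if (s.pulls === 0) return arm;
      const mean = s.totalReward / s.pulls;
      const bonus = Math.sqrt((2 * Math.log(this.totalPulls)) / s.pulls);
      const score = mean + bonus;
      if (score > bestScore) {
        bestScore = score;
        best = arm;
      }
    }
    return best;
  }

  // Record the reward observed after crawling a URL from this arm.
  update(arm: string, reward: number): void {
    const s = this.arms.get(arm);
    if (!s) throw new Error(`unknown arm: ${arm}`);
    s.pulls += 1;
    s.totalReward += reward;
    this.totalPulls += 1;
  }
}
```

A crawler would call `select()` to choose which section to take the next URL from, then `update()` with the observed reward. Untried arms are returned first, so every section is sampled at least once before the bonus term starts trading exploration against exploitation.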

Accessibility tree snapshots instead of HTML. 3 to 10x smaller, easier to feed a model.
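Part of why accessibility snapshots are smaller is that they keep roles and accessible names and drop markup, attributes, scripts, and styles. A minimal sketch of serializing such a tree into indented lines follows. The `AXNode` shape and `serializeAXTree` are hypothetical, and the post doesn't say how feedstock captures its snapshots; recent Playwright versions offer `locator.ariaSnapshot()`, which returns a YAML-like text form directly.

```typescript
// Hypothetical node shape: role, optional accessible name, optional children.
type AXNode = { role: string; name?: string; children?: AXNode[] };

// Emit one `- role "name"` line per meaningful node, indented by depth.
// Unnamed "generic" wrappers (plain divs/spans) are skipped and their
// children are hoisted to the wrapper's depth.
function serializeAXTree(node: AXNode, depth = 0, lines: string[] = []): string[] {
  const isNoise = node.role === "generic" && !node.name;
  const childDepth = isNoise ? depth : depth + 1;
  if (!isNoise) {
    const label = node.name ? ` "${node.name}"` : "";
    lines.push(`${"  ".repeat(depth)}- ${node.role}${label}`);
  }
  for (const child of node.children ?? []) {
    serializeAXTree(child, childDepth, lines);
  }
  return lines;
}
```

Collapsing anonymous wrappers is where much of the savings comes from on div-heavy pages, since each wrapper costs markup in HTML but carries no meaning for a model.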

Cache uses bun:sqlite with ETag, Last-Modified, and content hashing.
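A minimal sketch of revalidation with ETag and Last-Modified plus a content hash, assuming a `Map` in place of the bun:sqlite table the post mentions. `CacheEntry`, `conditionalHeaders`, `storeResponse`, and `cachedFetch` are hypothetical names. The use of the hash here (detecting whether a re-downloaded body actually changed) is an assumption about its purpose.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache entry; a Map stands in for the bun:sqlite table.
type CacheEntry = {
  etag?: string;
  lastModified?: string;
  contentHash: string;
  body: string;
};

const cache = new Map<string, CacheEntry>();

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Build conditional request headers from the validators the server sent last time.
function conditionalHeaders(entry: CacheEntry | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry?.etag) headers["If-None-Match"] = entry.etag;
  if (entry?.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Store a fresh response and report whether its content differs from the cached copy.
function storeResponse(url: string, body: string, headers: Headers): boolean {
  const contentHash = sha256(body);
  const previous = cache.get(url);
  cache.set(url, {
    etag: headers.get("ETag") ?? undefined,
    lastModified: headers.get("Last-Modified") ?? undefined,
    contentHash,
    body,
  });
  return previous?.contentHash !== contentHash;
}

// Revalidate with the server; on 304 Not Modified, reuse the stored body.
async function cachedFetch(url: string): Promise<{ body: string; fromCache: boolean }> {
  const entry = cache.get(url);
  const res = await fetch(url, { headers: conditionalHeaders(entry) });
  if (res.status === 304 && entry) return { body: entry.body, fromCache: true };
  const body = await res.text();
  if (res.ok) storeResponse(url, body, res.headers); // only cache successful responses
  return { body, fromCache: false };
}
```

A 304 reply lets the crawler skip downloading the body again. The hash covers servers that send neither validator, or that send a new ETag for identical content, so downstream work such as re-extraction can still be skipped.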

v0.5.0, Apache 2.0, 325 tests. Just pushed it, so the star count is what it is.

https://github.com/tylergibbs1/feedstock