Back to browse
SnapAPI – Screenshot/PDF/Extract API Built with Fastify and Playwright

SnapAPI – Screenshot/PDF/Extract API Built with Fastify and Playwright

by Sleywill·Feb 19, 2026·2 points·1 comment

AI Analysis

●●SolidSlickSolve My Problem
The Take

SnapAPI packs screenshots, PDFs, video capture and structured extraction into one Fastify+Playwright endpoint, with practical features like ad/cookie blocking, element selectors and a markdown/article extraction backed by Readability. The real selling point is operational: handling browser contexts, crash recovery and fair billing so teams don't fight Playwright in production. Useful and pragmatic — but it's entering a crowded market where uptime, latency and pricing will decide whether it matters.

Target Audience

Backend developers, product engineers, startups building link previews, archiving, scraping or AI data pipelines

Post Description

Hi HN,

I'm a solo developer based in Tbilisi, Georgia, and I've been building SnapAPI for the past few months. It's a REST API that does four things: screenshots, PDFs, video recording, and structured data extraction from web pages.

*Why I built it:* I was working on a project that needed link previews. I tried self-hosting Playwright — it works, but managing browser contexts, memory leaks, cookie banners, and crash recovery is a surprising amount of operational work for what should be a simple task. The existing APIs (ScreenshotOne, Urlbox) are solid but expensive and fragmented — you need separate services for screenshots vs. data extraction.

*Architecture:*

- *Fastify* handles the API layer. Chose it over Express for the schema validation (I use it as the single source of truth for request validation) and the ~2x throughput difference under load. - *Playwright* over Puppeteer. The cross-browser support doesn't matter for this use case, but Playwright's `browser.newContext()` isolation and the built-in auto-wait are more reliable for capturing pages at the right moment. - *BullMQ + Redis* for job queuing. Screenshot requests go into a queue, a separate worker process picks them up. This decouples the API from the heavy rendering work and lets me set per-request timeouts without blocking the event loop. - *LRU cache* (in-memory + Redis) with content-hash keys. Same URL + same options = cache hit. This alone handles ~40% of requests on busy days. - *S3-compatible storage* for result delivery. Screenshots get uploaded to object storage, the API returns a pre-signed URL. Files auto-expire after 24h. - *Cookie banner blocking* uses @ghostery/adblocker-playwright with custom filter lists. It's not perfect (nothing is), but it catches ~90% of GDPR popups. For the rest, I inject custom JS to detect and dismiss common consent frameworks (OneTrust, CookieBot, etc.).

The whole thing runs on a single 4-core/8GB VPS in Amsterdam. Playwright is the bottleneck — each browser context uses ~80-120MB — so I cap concurrent renders at 6 and queue the rest. Under normal load (~500-1000 requests/day currently), p95 latency is around 3 seconds for a full-page screenshot.

*What's different from existing services:*

- Combined screenshot + PDF + video + data extraction in one API (most services only do screenshots) - AVIF output format (50-80% smaller than PNG, not widely supported elsewhere) - `extract` endpoint that returns clean markdown, plain text, or structured metadata — useful for RAG/LLM preprocessing pipelines - Starts at $9/month instead of $29-99/month

*What I'm still figuring out:*

- Video recording (MP4 of a browsing session) is the newest feature and the least optimized. FFmpeg encoding after capture adds 2-5 seconds of overhead. - Scaling beyond one server. Playwright doesn't cluster well — I'll probably need to go multi-server with a load balancer routing to render nodes. Haven't needed it yet. - Whether the $9 price point is sustainable. The compute cost per screenshot is ~$0.001, but there's a lot of variance depending on page complexity.

Free tier: 100 requests/month, no credit card. I'm not trying to bait-and-switch — the free tier is genuinely useful for side projects and testing.

Docs: https://snapapi.pics/docs.html

Happy to discuss the architecture, Playwright quirks, or the economics of running a screenshot API. I've learned a lot about browser automation edge cases that I didn't expect going in.

Similar Projects

AI-Powered Web Automation APIs (Screenshot, Scrape, SEO, PDF)

Packages common web automation tasks — screenshots, scrapes, SEO checks and PDFs — into APIs, which is convenient but very crowded territory. The live share is broken (the page shows 'zrok share ... not found'), so you can't test reliability or AI value‑adds; unless it provides robust semantic SEO insights, evasion/anti-bot handling, or superior extraction accuracy, it's another Puppeteer/Playwright wrapper.

Ship ItNiche Gem
openclaw_ai
203mo ago
AI/ML●●●Banger

PDF-Proof – Make Claude show where it found each value in a PDF

Verifies AI extractions by reading text back from highlighted regions — actual audit trails for PDFs.

Solve My ProblemBig BrainSlick
young_mete
402mo ago