GitHub Repository

A Python framework for building structured, resumable web crawlers — designed for domains where data quality matters.

5 stars · Python

Ladon – typed, resumable web crawlers in Python

by feeder81 · May 6, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Typed dataclasses beat Scrapy's weak Items for LLM pipeline correctness.
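
As a rough illustration of that verdict, the sketch below shows what a typed record can catch compared with a loose dict-like item. It uses only the standard library. `ArticleRecord`, its fields, and `parse_record` are hypothetical names chosen for this example and are not taken from Ladon's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical crawl record; not part of Ladon's actual API."""

    url: str
    title: str
    word_count: int

    def __post_init__(self) -> None:
        # Reject malformed records at construction time, i.e. during the
        # crawl, rather than discovering them later in post-processing.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"url must be absolute: {self.url!r}")
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if isinstance(self.word_count, bool) or not isinstance(self.word_count, int):
            raise TypeError("word_count must be an int")
        if self.word_count < 0:
            raise ValueError("word_count must be non-negative")


def parse_record(raw: dict) -> ArticleRecord:
    """Convert a loosely typed scraped dict into a validated record."""
    return ArticleRecord(
        url=str(raw["url"]),
        title=str(raw["title"]),
        word_count=int(raw["word_count"]),
    )
```

A downstream consumer such as an LLM data pipeline can then rely on every `ArticleRecord` having an absolute URL, a non-empty title, and an integer word count, because invalid input raises before the record exists.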

Strengths
  • SES protocol enforces schema correctness at the crawl stage, not post-processing.
  • Built-in HTTP layer handles retries, back-off, and robots.txt out of the box.
  • Resumable crawls via DuckDB audit tables skip already-processed items automatically.
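
The retry and resume behaviour listed above can be sketched in general terms as follows. This is not Ladon's implementation: the standard-library `sqlite3` module stands in for DuckDB so the example runs without extra dependencies, and the `audit` table layout, `fetch_with_retries`, and `crawl` are assumptions made for illustration. robots.txt handling is left out.

```python
import sqlite3
import time
from typing import Callable, Iterable


def fetch_with_retries(fetch: Callable[[str], str], url: str,
                       attempts: int = 3, base_delay: float = 0.5) -> str:
    """Call fetch(url), retrying with exponential back-off on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:  # a real crawler would catch narrower error types
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay)
            delay *= 2  # double the wait before the next attempt


def crawl(conn: sqlite3.Connection, urls: Iterable[str],
          fetch: Callable[[str], str], base_delay: float = 0.5) -> list[str]:
    """Fetch each URL once, skipping those the audit table marks as done."""
    # Hypothetical audit table: one row per successfully processed URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "url TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    processed = []
    for url in urls:
        done = conn.execute(
            "SELECT 1 FROM audit WHERE url = ? AND status = 'ok'", (url,)
        ).fetchone()
        if done:
            continue  # already handled in an earlier run
        fetch_with_retries(fetch, url, base_delay=base_delay)
        conn.execute(
            "INSERT OR REPLACE INTO audit (url, status) VALUES (?, 'ok')",
            (url,),
        )
        conn.commit()  # persist progress so an interrupted run can resume
        processed.append(url)
    return processed
```

Committing after every item trades some throughput for the guarantee that a crash loses at most the item in flight; rerunning `crawl` with the same connection then only touches URLs that are not yet recorded.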
Weaknesses
  • A smaller ecosystem than Scrapy's means fewer community adapters and plugins.
  • Python-only scope limits adoption for teams standardized on Go or Node crawlers.
Target Audience

Data engineers and Python developers building LLM training pipelines

Similar To

Scrapy · Beautiful Soup · Apache Nutch

Similar Projects

Developer Tools · ●● Solid

Lodum, a Python Serializer/Deserializer (a.k.a. Load/Dump) Library

Impressive engineering choices — bytecode/AST generation for ~64% faster dumps and explicit Pyodide/WASM support show someone wrestled real performance and portability problems. It bundles one API across JSON, YAML, TOML, MsgPack/CBOR/BSON and adds native numpy/pandas handling plus basic validators and schema output. Still, it lives in a crowded Python serialization space (pickle, orjson, pydantic/serde alternatives), so adoption will hinge on ecosystem compatibility and convincing users to switch.

Niche Gem · Wizardry
webmaven
204mo ago