GitHub Repository

Summary: Schema-free ETL mapper with in-process graph orchestration: turn any REST API, CSV, Parquet, or XML into typed Python objects, then coordinate multi-source pipelines in a bounded time window with Tideweaver.

4 stars · Python

Incorporator, Turn any API/File into typed Python graph with pipeline

Author: PyPlumber

by PyPlumber·May 15, 2026·3 points·1 comment

AI Analysis

●● Solid · Big Brain · Ship It

Dynamic Pydantic models beat manual schemas for messy API responses.

Strengths

•Runtime class building absorbs schema changes without validation errors.
•Async HTTPX integration handles pagination without manual while loops.
•Single interface for JSON, XML, CSV, Parquet, SQLite reduces format switching.

Weaknesses

•Schema-free data ingestion already solved by Pandas, Polars, and ORMs.
•Two GitHub stars suggest production readiness is untested.

Post Description

When landing data, I prefer to keep it as close to the original source as possible. Most of the Python data ingestion programs I saw treated Python more like SQL instead of harnessing object orientation. This was my attempt at translating my object-oriented columnar approach to Python. I originally built it with Requests and Pandas, but the overhead did not seem worth it. Claude helped refactor for async and Pydantic.

Now HTTPX’s async capabilities and Pydantic’s class building have taken this project over the top. By harnessing them, I shifted the codebase from data mapper to pipeline orchestrator. I added every format that seemed to have an established Python library. Right now I believe I support JSON, NDJSON, XML, CSV, TSV, PSV, SQLite, and HTML out of the box. Optional extras (~30 MB pyarrow) unlock Parquet, Feather, and ORC; Avro and XLSX have their own extras. I also added every compression format I could find. Benchmarks, at least on a Windows machine, are on par with other ELT packages.
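The analysis above credits the async layer with handling pagination without manual while loops in user code. A minimal sketch of that pattern follows, assuming a response shape with "results" and "next" keys. The paginate helper, the fake transport, and the URLs are hypothetical illustrations, not Incorporator's or HTTPX's API; in practice the fetch callable would wrap an async HTTP client.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Optional

Page = dict[str, Any]


async def paginate(
    fetch: Callable[[str], Awaitable[Page]], url: Optional[str]
) -> AsyncIterator[dict[str, Any]]:
    """Yield items page by page, following each page's "next" link until it is empty."""
    while url:
        page = await fetch(url)
        for item in page["results"]:
            yield item
        url = page.get("next")


# Hypothetical in-memory transport standing in for a real HTTP client.
PAGES: dict[str, Page] = {
    "/launch?page=1": {"results": [{"id": 1}, {"id": 2}], "next": "/launch?page=2"},
    "/launch?page=2": {"results": [{"id": 3}], "next": None},
}


async def fake_fetch(url: str) -> Page:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return PAGES[url]


async def collect() -> list[int]:
    # The caller only writes an async comprehension; the loop lives in paginate().
    return [item["id"] async for item in paginate(fake_fetch, "/launch?page=1")]


ids = asyncio.run(collect())
print(ids)
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access.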

By focusing on function wrappers that keep the developer’s syntax as simple as possible for the original data mapping calls, I was able to set up simple automated pipelines with one CLI command and one JSON reference file. The JSON uses basically the same syntax you would use in Python.

Both stream and fjord accept inflow and outflow Python code. Inflow code lets you set custom conversion functions and mappings for the incoming data. Outflow code lets you reshape the exported data into an entirely new object.
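To make the inflow/outflow idea concrete, here is a minimal stdlib sketch, assuming inflow is a per-row conversion function and outflow is a per-row reshaping function applied chunk by chunk. The names run_stream, CONVERTERS, inflow, and outflow, and the sample rows, are hypothetical; the library's real hook signatures may differ.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def chunked(rows: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of at most `size` rows so only one chunk is held at a time."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def run_stream(
    rows: Iterable[Record],
    inflow: Callable[[Record], Record],
    outflow: Callable[[Record], Record],
    size: int = 2,
) -> Iterator[list[Record]]:
    """Apply inflow, then outflow, to every row, one chunk at a time."""
    for chunk in chunked(rows, size):
        yield [outflow(inflow(row)) for row in chunk]


# Hypothetical inflow: per-field converters for incoming string data.
CONVERTERS: dict[str, Callable[[Any], Any]] = {
    "mass_kg": float,
    "launched": lambda s: s == "yes",
}


def inflow(row: Record) -> Record:
    return {k: CONVERTERS.get(k, lambda v: v)(v) for k, v in row.items()}


# Hypothetical outflow: build an entirely new object for export.
def outflow(row: Record) -> Record:
    return {"name": row["name"].upper(), "mass_t": row["mass_kg"] / 1000}


raw = [
    {"name": "falcon 9", "mass_kg": "549054", "launched": "yes"},
    {"name": "electron", "mass_kg": "12500", "launched": "yes"},
    {"name": "starship", "mass_kg": "5000000", "launched": "no"},
]
batches = list(run_stream(raw, inflow, outflow))
print(batches)
```

Because run_stream is a generator, a consumer that writes each batch out as it arrives never holds more than one chunk in memory.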

Also, because your pipeline is basically defined by a JSON file, you should eventually be able to automate the creation of the entire pipeline. Enjoy.

https://github.com/PyPlumber/Incorporator/

How you use it: Declare a subclass with no fields, point it at a URL, and it infers a Pydantic model from the response at runtime, with full strict typing, dot-notation, and an optional registry lookup by any key.

    class Launch(Incorporator):
        pass

    launches = await Launch.incorp(
        inc_url="https://ll.thespacedevs.com/2.2.0/launch/upcoming/"
    )
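For readers curious how a class can be inferred from a response at runtime, here is a minimal sketch of the general technique. It uses the standard library's dataclasses.make_dataclass as a stand-in for Pydantic's model building, and it is not Incorporator's actual code. The infer_class and load helpers and the sample payload are hypothetical.

```python
from dataclasses import fields, make_dataclass
from typing import Any


def infer_class(name: str, sample: dict[str, Any]) -> type:
    """Build a dataclass whose fields mirror the keys and value types of one record."""
    spec = [(key, type(value)) for key, value in sample.items()]
    return make_dataclass(name, spec)


def load(name: str, records: list[dict[str, Any]], key: str) -> dict[Any, Any]:
    """Instantiate every record and index the objects by the chosen key."""
    cls = infer_class(name, records[0])
    return {rec[key]: cls(**rec) for rec in records}


# Hypothetical payload standing in for a decoded JSON response.
records = [
    {"id": 1, "name": "Falcon 9", "success": True},
    {"id": 2, "name": "Electron", "success": False},
]

registry = load("Launch", records, key="id")
launch = registry[1]                      # registry lookup by key
print(launch.name)                        # dot-notation access
print([f.name for f in fields(launch)])   # fields inferred from the data
```

This sketch only records the observed types and assumes every key is a valid Python identifier. A validating library such as Pydantic would additionally enforce the types on each record.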

These functions handle the rest of your data mapping and export format needs:
- test() lets the framework write the call kwargs for you
- refresh() re-fetches with the seed call's params auto-replayed
- export() serialises to any of the 13 formats
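One common way to support many export formats behind a single call is a dispatch table keyed by format name. The sketch below shows that pattern with three stdlib-backed writers. It is an illustration under that assumption, not the library's export() implementation, and export_rows, WRITERS, and the sample rows are hypothetical names.

```python
import csv
import io
import json
from typing import Any, Callable

Record = dict[str, Any]


def to_json(rows: list[Record]) -> str:
    return json.dumps(rows)


def to_ndjson(rows: list[Record]) -> str:
    # One JSON document per line.
    return "\n".join(json.dumps(row) for row in rows)


def to_csv(rows: list[Record]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Dispatch table: adding a format means adding one entry here.
WRITERS: dict[str, Callable[[list[Record]], str]] = {
    "json": to_json,
    "ndjson": to_ndjson,
    "csv": to_csv,
}


def export_rows(rows: list[Record], fmt: str) -> str:
    """Serialise rows with the writer registered for `fmt`."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return writer(rows)


rows = [{"id": 1, "name": "Falcon 9"}, {"id": 2, "name": "Electron"}]
csv_text = export_rows(rows, "csv")
ndjson_text = export_rows(rows, "ndjson")
print(csv_text)
```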

Then these functions create a pipeline:
- stream() runs a chunked daemon with bounded memory. It has two modes: pass-through, or stateful, where updates are held in RAM and can be manipulated in real time.
- fjord() fans out N sources and fuses them through a user reducer. It accepts multiple sources and exports.
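The fan-out-and-reduce shape described for fjord() can be sketched with plain asyncio: run every source concurrently, then pass all results to a user-supplied reducer. This is a minimal illustration of the pattern, not the library's code. fan_out, the two sources, and join_on_pad are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

Record = dict[str, Any]
Source = Callable[[], Awaitable[list[Record]]]
Reducer = Callable[[list[list[Record]]], list[Record]]


async def fan_out(sources: list[Source], reducer: Reducer) -> list[Record]:
    """Run every source concurrently, then fuse the results with the reducer."""
    # gather() returns results in the same order as the sources were given.
    results = await asyncio.gather(*(source() for source in sources))
    return reducer(list(results))


# Hypothetical sources; real ones would call an API or read a file.
async def launches() -> list[Record]:
    await asyncio.sleep(0)
    return [{"pad_id": 7, "rocket": "Falcon 9"}]


async def pads() -> list[Record]:
    await asyncio.sleep(0)
    return [{"pad_id": 7, "site": "LC-39A"}]


def join_on_pad(results: list[list[Record]]) -> list[Record]:
    """User reducer: enrich each launch row with its pad row."""
    launch_rows, pad_rows = results
    by_id = {pad["pad_id"]: pad for pad in pad_rows}
    return [{**row, **by_id.get(row["pad_id"], {})} for row in launch_rows]


fused = asyncio.run(fan_out([launches, pads], join_on_pad))
print(fused)
```

Keeping the reducer a plain synchronous function means the fusion logic can be unit-tested without an event loop.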

After that all works, copy the parameters into pipeline.json, and the commands can be as simple as:

    incorporator validate pipeline.json
    incorporator fjord pipeline.json --logs
