GitHub Repository

OmniDocs 📄 - One-stop visual document processing framework

140 stars · Python

I built a unified inference layer for Document Processing Models

by Adithya-Kolavi · Feb 25, 2026 · 2 points · 0 comments

AI Analysis

Solid · Solve My Problem · Ship It

Abstraction layer for document AI models, but orchestrating existing tools isn't novel.

Strengths
  • Genuine multi-backend flexibility: same code runs on PyTorch, MLX (Apple Silicon), vLLM, and cloud APIs without rewrites
  • Type-safe Pydantic-based output design eliminates parsing boilerplate and enables structured extraction into custom schemas
Weaknesses
  • Aggregates existing models/APIs without novel inference technique; swapping one abstraction for another solves a real friction point but not a novel problem
  • Early-stage project (26 GitHub stars, active issues); ecosystem integration and maturity unclear
Category
Target Audience

ML engineers, AI pipeline builders working with document understanding and layout analysis tasks

Similar To

LangChain · Unstructured.io · AWS Textract

Post Description

Hey HN,

I'm Adithya, a 22-year-old researcher from India. I work with a lot of document processing models while building AI pipelines, and one pain point kept repeating: every model has its own inference code, preprocessing steps, and output format. Swapping models or testing new ones meant rewriting a lot of boilerplate each time.

So I built OmniDocs, an open-source library for running document processing models through a simple, unified API, with a vision-first approach to understanding documents.

Key features:

> Pick a task and a model, run inference with one interface
> Supports common document tasks: text extraction, OCR, table extraction, layout analysis, and structured extraction ...
> 16+ models supported out of the box (many more integrations to come)
> Runs locally on Mac or GPUs (MLX and vLLM backends supported)
> Works with VLM APIs like GPT, Claude, Gemini, and many more that support the Open Responses API spec
> Designed to quickly build and test document processing pipelines
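
To make the "pick a task and a model, run inference with one interface" idea concrete, here is a minimal sketch of how such a layer can be structured. It is not the OmniDocs API: `run`, `register`, `DocumentResult`, and the two dummy backends are hypothetical names invented for illustration, and the "models" are trivial string functions standing in for real inference code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DocumentResult:
    """Common output shape returned for every backend (hypothetical)."""
    task: str
    model: str
    text: str


# Maps (task, model) -> backend callable. Names are illustrative only.
_REGISTRY: Dict[Tuple[str, str], Callable[[bytes], str]] = {}


def register(task: str, model: str):
    """Decorator that records a backend under a (task, model) key."""
    def wrap(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
        _REGISTRY[(task, model)] = fn
        return fn
    return wrap


@register("ocr", "dummy-upper")
def _ocr_upper(document: bytes) -> str:
    # Stand-in for a real OCR model: decode the bytes and upper-case them.
    return document.decode("utf-8").upper()


@register("ocr", "dummy-reverse")
def _ocr_reverse(document: bytes) -> str:
    # A second stand-in backend with different behavior.
    return document.decode("utf-8")[::-1]


def run(task: str, model: str, document: bytes) -> DocumentResult:
    """Single entry point: look up the backend and normalize its output."""
    try:
        backend = _REGISTRY[(task, model)]
    except KeyError:
        raise ValueError(
            f"no backend registered for task={task!r}, model={model!r}"
        ) from None
    return DocumentResult(task=task, model=model, text=backend(document))


if __name__ == "__main__":
    # Swapping models is a one-string change; the calling code stays the same.
    for name in ("dummy-upper", "dummy-reverse"):
        print(run("ocr", name, b"invoice 42"))
```

The point of the sketch is that the caller only ever sees `run(...)` and one result type, so comparing two models means changing a string rather than rewriting preprocessing and output parsing.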

This has helped me prototype document workflows much faster and compare models easily.

Would love feedback on the API design, developer experience, and what integrations would make this more useful.

Repo: https://github.com/adithya-s-k/omnidocs

Similar Projects