Parseflow: how to parse documents when you're broke
Student-built extraction API competing directly with established players like LlamaParse.

Local Gemma 3 via llama.cpp beats cloud PDF extractors on privacy.
Privacy-conscious professionals handling sensitive documents
JinaAI · Firecrawl · Adobe Acrobat
Document extraction at 93% accuracy, but remove.bg-style competition already exists.
ProofPudding returns extraction results with explicit links back to the exact page and source text, supports native and scanned PDFs plus DOCX and images, and ships Python and TypeScript SDKs, which makes it handy for agents that need auditable facts. It's a pragmatic product (per-extraction pricing and confidence scores are nice), but the market is crowded. I want clarity on the underlying models, real-world accuracy numbers, and how it compares to Document AI and Textract in edge cases.
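A minimal sketch of what an auditable extraction record could look like, and how an agent might use it. The `Extraction` type, its field names, and the `Verify`/`Triage` helpers are hypothetical and are not ProofPudding's actual SDK; the point is only that a page number plus the exact source span lets the consumer re-check each fact, and the confidence score decides what gets routed to human review.

```go
package main

import (
	"fmt"
	"strings"
)

// Extraction is a hypothetical shape for one auditable fact: the value plus
// a pointer back to where it was found. Field names are illustrative only.
type Extraction struct {
	Field      string
	Value      string
	Page       int     // 1-based page number in the source document
	SourceText string  // exact span the value was read from
	Confidence float64 // model-reported score in [0, 1]
}

// Verify checks that the cited page exists and actually contains the cited text.
func Verify(pages []string, e Extraction) bool {
	if e.Page < 1 || e.Page > len(pages) {
		return false
	}
	return strings.Contains(pages[e.Page-1], e.SourceText)
}

// Triage splits extractions into accepted ones (verifiable and confident)
// and ones that should go to human review.
func Triage(pages []string, exts []Extraction, minConf float64) (accepted, review []Extraction) {
	for _, e := range exts {
		if Verify(pages, e) && e.Confidence >= minConf {
			accepted = append(accepted, e)
		} else {
			review = append(review, e)
		}
	}
	return accepted, review
}

func main() {
	// Toy page text standing in for a parsed two-page document.
	pages := []string{
		"Invoice #1042\nIssued: 2024-03-01",
		"Total due: $1,250.00",
	}
	exts := []Extraction{
		{"invoice_number", "1042", 1, "Invoice #1042", 0.98},
		{"total", "1250.00", 2, "Total due: $1,250.00", 0.95},
		{"due_date", "2024-04-01", 2, "Due: 2024-04-01", 0.90}, // cited text is not on page 2
	}
	accepted, review := Triage(pages, exts, 0.9)
	fmt.Println(len(accepted), len(review)) // 2 1
}
```

The mis-cited due date is flagged even though its confidence clears the threshold, which is the practical value of source links: confidence alone cannot catch a citation that points at the wrong place.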
An LLM infers the schema once, then Go handles the 10k-row extraction, avoiding token waste.
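The schema-once pattern can be sketched as follows, assuming CSV input. `inferSchema` is a hypothetical stub standing in for the single LLM call (a real pipeline would send the header and a few sample rows to a model and parse its reply); here it uses a trivial name heuristic so the sketch runs offline. Everything after that call is ordinary Go, so the per-row token cost is zero whether the file has 10 rows or 10k.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// FieldSpec describes one output field: which column to read and how to parse it.
type FieldSpec struct {
	Name   string
	Column int
	Kind   string // "string" or "float"
}

// inferSchema is a hypothetical stand-in for the one LLM call. It marks any
// column whose name mentions "amount" as numeric; a model would do better.
func inferSchema(header []string) []FieldSpec {
	var specs []FieldSpec
	for i, h := range header {
		kind := "string"
		if strings.Contains(strings.ToLower(h), "amount") {
			kind = "float"
		}
		specs = append(specs, FieldSpec{Name: h, Column: i, Kind: kind})
	}
	return specs
}

// extract applies the schema to every row with plain Go code, no model calls.
func extract(records [][]string, specs []FieldSpec) ([]map[string]any, error) {
	out := make([]map[string]any, 0, len(records))
	for rowNum, rec := range records {
		row := make(map[string]any, len(specs))
		for _, s := range specs {
			if s.Column >= len(rec) {
				return nil, fmt.Errorf("row %d: missing column %d", rowNum+1, s.Column)
			}
			raw := rec[s.Column]
			switch s.Kind {
			case "float":
				v, err := strconv.ParseFloat(raw, 64)
				if err != nil {
					return nil, fmt.Errorf("row %d, field %s: %w", rowNum+1, s.Name, err)
				}
				row[s.Name] = v
			default:
				row[s.Name] = raw
			}
		}
		out = append(out, row)
	}
	return out, nil
}

func main() {
	data := "name,amount\nalice,12.5\nbob,7\n"
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		panic(err)
	}
	specs := inferSchema(records[0]) // one schema decision for the whole file
	rows, err := extract(records[1:], specs)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows), rows[1]["amount"]) // 2 7
}
```

Returning an error on the first row that does not fit the inferred schema is a deliberate choice in this sketch: it surfaces schema drift instead of silently emitting bad values, and a caller could respond by re-running the schema step on the offending rows.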
Pure Rust parsers for legacy Office formats with zero external dependencies.
Offline Ollama + OCR keeps your documents private when cloud APIs won't.