Parseflow, how to parse documents when you're broke

Author: bollethegoalie

by bollethegoalie · May 21, 2026 · 2 points · 0 comments


AI Analysis

● Mid · Ship It · Bold Bet

Student-built extraction API competing directly with established players like LlamaParse.

Strengths

•Transparent pricing model targets small teams priced out of enterprise document solutions.
•Returns diagnostic metadata alongside chunks to help debug parsing failures.
•Supports async jobs and batch processing for high-volume document ingestion.

Weaknesses

•No open-source version, so users cannot verify the parsing logic or self-host for sensitive data.
•Crowded market with LlamaIndex, Unstructured, and AWS Textract already solving this.

Post Description

Hello HN, I built Parseflow, a simple, evidence-focused extraction API that takes PDF, DOCX, and TXT files and extracts and chunks their contents to improve LLM context and reduce token usage. If you want to try a demo, you can find it here: demo.parseflow.tech
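As a rough illustration of the chunking idea only (this is not Parseflow's implementation or API, which the post does not describe), here is a minimal Python sketch that splits already-extracted text into overlapping word-based chunks and records where each chunk starts. The names `Chunk` and `chunk_text` and the default sizes are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int       # position of the chunk within the document
    start_word: int  # offset of the chunk's first word, to trace it back to the source
    text: str


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[Chunk]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        piece = words[start:start + max_words]
        chunks.append(Chunk(len(chunks), start, " ".join(piece)))
        # Stop once the current chunk reaches the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + max_words >= len(words):
            break
    return chunks
```

Sending only the chunks relevant to a question, rather than the whole document, is one way a pipeline like this could cut token usage; the stored offsets make it possible to point back to where a chunk came from.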

I am still a student dev, graduating high school this year, so I still have a lot to learn. I am building this project to help pay for tuition this year, but also to learn. Any feedback, advice, or questions are much appreciated; I will try to respond to comments, or you can email me at [email protected]

Thanks, bollethegoalie

Similar Projects

AI/ML · ●● Solid

Local Document Parsing for Agents

LlamaIndex open-sources their parser core, but LlamaParse cloud still handles complex layouts.

Solve My Problem · Ship It

cheesyFish

2014mo ago

Productivity · ●● Solid

AutoRename-PDF – Open-source tool that uses AI to rename your PDFs

Offline Ollama + OCR keeps your documents private when cloud APIs won't.

Solve My Problem · Cozy

SPQRK

103mo ago

Developer Tools · ●●● Banger

Open-source .docx editor library for building document apps

Canonical OOXML parsing beats HTML conversion by preserving document semantics and layout fidelity.

Slick · Solve My Problem · Dark Horse

thisisjedr

107171mo ago

AI/ML · ●● Solid

ProofPudding – Document Extraction API with Citations (PDF/Docx)

ProofPudding returns extraction results with explicit links back to the exact page and source text, supports native and scanned PDFs plus DOCX/images, and ships Python/TypeScript SDKs — handy for agents that need auditable facts. It’s a pragmatic product (per-extraction pricing and confidence scores are nice), but the market is crowded; I want clarity on underlying models, real-world accuracy numbers, and how it compares to Document AI/Textract in edge cases.

Solve My Problem · Slick

garai

105mo ago

AI/ML · ● Mid

AI-Powered PDF to Markdown Converter

PDF-to-Markdown for LLMs when JinaAI and Firecrawl already exist.

Solve My Problem

QingWu

451mo ago

Productivity · ●● Solid

Parseflow – Extract data from any document. Entirely on your Mac

Local Gemma 3 via llama.cpp beats cloud PDF extractors on privacy.

Solve My Problem · Cozy

devtanna

2122d ago