Ocrbase.dev – pdf→.md/.json OCR for developers
94.5% accuracy, self-hostable, open source—beats Textract on cost and accuracy.
📄 PDF/IMG ->.MD/JSON Document OCR API for PaddleOCR and GLMOCR. Self-hostable.
Yet another OCR API wrapper when JinaAI and Firecrawl already exist.
Developers building document parsing pipelines
JinaAI Reader · Firecrawl · LlamaParse
94.5% accuracy, self-hostable, open source—beats Textract on cost and accuracy.
Local MLX model redacts PII without sending documents to the cloud.
CPU-only VLM OCR beats Tesseract accuracy without sending data to the cloud.
Empirical OCR vs. image embeddings shootout for scientific PDFs reveals complementary bottlenecks.
Feature launch from established 27.5K-star platform, not a standalone project.
Human-in-the-loop citation verification for local research files.