GitHub Repository

📄 PDF/IMG -> .MD/JSON Document OCR API for PaddleOCR and GLM-OCR. Self-hostable.

21 stars · TypeScript

ocrbase – PDF/IMG -> .MD/JSON Model-Agnostic OCR API

by adammajcher·Apr 16, 2026·1 point·0 comments

AI Analysis

Mid · Ship It

Yet another OCR API wrapper when JinaAI and Firecrawl already exist.

Strengths
  • Model-agnostic design lets you swap PaddleOCR and GLM-OCR via env vars.
  • Optional S3 staging and BullMQ queue for async processing at scale.
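
To illustrate what swapping backends via env vars can look like, here is a minimal TypeScript sketch. It is not taken from ocrbase: the variable names `OCR_BACKEND` and `OCR_BASE_URL`, the default URLs, and the `resolveBackend` helper are all hypothetical, chosen only to show the pattern of a thin routing layer in front of separately hosted models.

```typescript
// Hypothetical sketch: none of these names come from ocrbase itself.
type OcrBackend = "paddleocr" | "glm-ocr";

interface BackendConfig {
  backend: OcrBackend;
  baseUrl: string; // where the separately hosted model server is assumed to listen
}

// Assumed local defaults for illustration only.
const DEFAULT_URLS: Record<OcrBackend, string> = {
  "paddleocr": "http://localhost:8080",
  "glm-ocr": "http://localhost:8081",
};

// Type guard that narrows an arbitrary string to a supported backend name.
function isOcrBackend(value: string): value is OcrBackend {
  return value === "paddleocr" || value === "glm-ocr";
}

// Reads the backend choice from an env-like object (for example process.env).
// Falls back to "paddleocr" when OCR_BACKEND is unset and throws on unknown values.
function resolveBackend(env: Record<string, string | undefined>): BackendConfig {
  const raw = (env.OCR_BACKEND ?? "paddleocr").toLowerCase();
  if (!isOcrBackend(raw)) {
    throw new Error(`Unsupported OCR_BACKEND: ${raw}`);
  }
  return { backend: raw, baseUrl: env.OCR_BASE_URL ?? DEFAULT_URLS[raw] };
}
```

In a Node.js service this would be called once at startup as `resolveBackend(process.env)`, so changing the model means changing the deployment environment rather than the code.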
Weaknesses
  • Models must be self-hosted separately; ocrbase is only a thin routing layer.
  • Document parsing is already solved by established players with better docs.
Target Audience

Developers building document parsing pipelines

Similar To

JinaAI Reader · Firecrawl · LlamaParse
