
A Golang CLI for converting PDF files to JSONL for efficient use with LLMs

4 stars · Go

PDF 2 Context – Convert PDF text to JSONL files

by vegancap · May 8, 2026 · 4 points · 0 comments

AI Analysis

●● Solid · Ship It · Solve My Problem

A Go CLI with a built-in OCR fallback, although hosted services such as JinaAI and Firecrawl already handle this kind of conversion.

Strengths
  • Automatic fallback to Tesseract OCR when the text-extraction yield falls below a threshold.
  • Bubble Tea TUI provides real-time progress bars and worker status during processing.
  • Generates processing manifests with detailed statistics on chunk counts and failures.
Weaknesses
  • Depends on external system binaries like poppler and tesseract instead of bundling them.
  • Solves a common RAG preprocessing step already addressed by hosted API alternatives.
Target Audience

Developers building RAG pipelines or LLM applications

Similar To

JinaAI · Firecrawl · LangChain PDF loaders

Similar Projects

Developer Tools · ●● Solid

Markdown to WhatsApp Converter

Splits LLM Markdown into chat-sized WhatsApp messages while preserving lists, links, emails, tables and even Spanish punctuation. It applies a priority chain of processors, with structural splits first and semantic fallbacks after, and ships with zero dependencies plus 100% test coverage, which makes it a pragmatic, focused tool for messaging pipelines.

Niche Gem · Solve My Problem
daviddom
114mo ago