GitHub Repository

Listen to anything. TTS for documents, papers, and web pages.

24 stars · Python

Yapit – PDF and webpage reader with TTS that doesn't suck

by MaxWolf-01 · Apr 6, 2026 · 5 points · 1 comment

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Vision-LLM preprocessing fixes math and layout garbling that breaks Speechify.

Strengths
  • Vision-LLM pipeline converts complex PDFs to clean markdown before TTS.
  • WebGPU Kokoro runs locally in browser without server costs.
  • Markdown export via URL hack enables direct CLI and script workflows.
Weaknesses
  • Self-hosting requires Docker and significant RAM for local models.
  • Vision-LLM processing latency could be high for large academic documents.
Target Audience

Researchers, students, and developers who read academic papers

Similar To

Speechify · NaturalReader · ElevenLabs Reader

Post Description

Yapit converts PDFs and web pages to audio, with a vision-LLM pipeline that handles math and complex layout instead of garbling them. I built it because I read a lot of papers and content online, but drift off after two paragraphs. Listening while following along keeps me focused and lowers the bar to actually start.

Every TTS tool I tried broke on complex formatting. Papers with math, citations, figure references, page numbers in the middle of sentences. You either get garbled output or you're listening to raw LaTeX.

Yapit converts everything to markdown as a common format. For web pages, defuddle (https://github.com/kepano/defuddle) handles extraction: it strips clutter and keeps the main article content in a clean, consistent format. For PDFs, a vision LLM rewrites each page into markdown with annotation tags that separate what you see from what gets read aloud. Math is rendered visually but spoken as alt text. Citations like "[13]" or "(Schmidhuber, 1970)" are displayed but not read aloud. Page numbers and headers are removed entirely.
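To make the "what you see vs. what gets read aloud" split concrete, here is a minimal sketch of how annotated markdown could be projected into a display string and a speech string. The `<show>` and `<speak>` tag names are hypothetical, picked only for this illustration; Yapit's actual annotation syntax may look different.

```python
import re

# Hypothetical tag names for this sketch only; not Yapit's real annotation syntax.
SHOW_ONLY = re.compile(r"<show>(.*?)</show>", re.DOTALL)     # rendered, never spoken
SPEAK_ONLY = re.compile(r"<speak>(.*?)</speak>", re.DOTALL)  # spoken, never rendered


def display_text(annotated: str) -> str:
    """Text for the reader view: drop speak-only spans, unwrap show-only spans."""
    without_speech = SPEAK_ONLY.sub("", annotated)
    return SHOW_ONLY.sub(r"\1", without_speech)


def spoken_text(annotated: str) -> str:
    """Text for the TTS engine: drop show-only spans, unwrap speak-only spans."""
    without_display = SHOW_ONLY.sub("", annotated)
    unwrapped = SPEAK_ONLY.sub(r"\1", without_display)
    return re.sub(r"\s+", " ", unwrapped).strip()  # collapse leftover whitespace


sample = (
    "Attention is computed as "
    "<show>$QK^T$</show><speak>Q times K transposed</speak>"
    "<show> [13]</show>."
)
print(display_text(sample))
print(spoken_text(sample))
```

In this sketch the math keeps its visual form on screen while the TTS engine only ever receives the alt text, and the citation marker survives in the display string but never reaches the audio.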

Both extraction and audio are cached by content hash, so the same content is never processed or synthesized twice.
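A content-hash cache of this kind can be sketched in a few lines. This is an illustration under assumptions, not Yapit's implementation: the `ContentCache` class, the in-memory dict, and the `params` string (for example the voice) are all hypothetical names introduced here.

```python
import hashlib
from typing import Callable, Dict


class ContentCache:
    """Toy in-memory cache keyed by a hash of the content (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(kind: str, content: bytes, params: str = "") -> str:
        h = hashlib.sha256()
        for part in (kind.encode("utf-8"), params.encode("utf-8"), content):
            # Length-prefix each part so different splits cannot collide.
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
        return h.hexdigest()

    def get_or_compute(
        self,
        kind: str,
        content: bytes,
        compute: Callable[[bytes], bytes],
        params: str = "",
    ) -> bytes:
        k = self.key(kind, content, params)
        if k not in self._store:
            # Only runs the expensive step the first time this content is seen.
            self._store[k] = compute(content)
        return self._store[k]
```

Including the parameters in the key matters: the same paragraph synthesized with a different voice is different audio, so it should get its own cache entry, while identical text with identical settings is served from the cache.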

Self-hosting works with any OpenAI-compatible TTS server (vLLM-Omni, ...) and any OpenAI-compatible vision model for PDF extraction:

    git clone --depth 1 https://github.com/yapit-tts/yapit.git && cd yapit
    cp .env.selfhost.example .env.selfhost
    make self-host
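For readers unsure what "OpenAI-compatible TTS server" means in practice: such servers accept the same `POST /audio/speech` request shape as OpenAI's speech API. The sketch below only builds that request, without sending it. The base URL, model name, and voice name in the usage lines are placeholder assumptions for a local server, and this is not Yapit's own client code.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str,
    text: str,
    model: str,
    voice: str,
    api_key: str = "not-needed",  # many local servers ignore the key
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech synthesis request."""
    body = json.dumps(
        {"model": model, "input": text, "voice": voice, "response_format": "mp3"}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; substitute whatever your own server exposes.
req = build_speech_request("http://localhost:8000/v1", "Hello there.", "kokoro", "af_heart")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the audio bytes from a server that implements this endpoint.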

Kokoro TTS also runs in the browser via WebGPU on desktop.

Try it on Attention Is All You Need (all voices cached, no account needed): https://yapit.md/listen/3bde213b-3a5a-465f-9198-be65430b699e...

Or paste any URL:
https://yapit.md/https://arxiv.org/abs/1810.04805
https://yapit.md/https://x.com/karpathy/status/2039805659525...
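The URL pattern above is easy to script. A minimal sketch, assuming the source URL is simply appended to the site root as in the arXiv example; the `yapit_url` helper is hypothetical, and whether URLs with query strings need extra encoding is not something the post specifies.

```python
def yapit_url(source_url: str, base: str = "https://yapit.md/") -> str:
    """Prefix a web URL with the Yapit site root (hypothetical helper)."""
    if not source_url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    return base.rstrip("/") + "/" + source_url


print(yapit_url("https://arxiv.org/abs/1810.04805"))
```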

GitHub: https://github.com/yapit-tts/yapit (AGPL-3)
