GitHub Repository

Local-first CLI that turns Markdown scripts into multi-speaker podcast-style audio using Coqui XTTS v2.

35 stars · Python

Podvoice – Local-first CLI to turn Markdown into multi-speaker audio

Author: aman179102

by aman179102 · Feb 21, 2026 · 1 point · 0 comments


AI Analysis

●●● Banger · Solve My Problem · Niche Gem

Local multi-speaker TTS CLI with zero cloud dependencies beats ElevenLabs for podcast scripts.

Strengths

• Fully local inference with Coqui XTTS v2 eliminates API costs, network round-trips, and data privacy concerns, and keeps audio output reproducible.
• Clean Markdown syntax for speaker/emotion blocks is intuitive and sets a lower barrier than training scripts or parameter tuning.
• Small, modular Python codebase with GPU-optional execution makes it hackable for beginners and sustainable for maintainers.

Weaknesses

• Emotion tags are parsed but not yet interpreted by XTTS; this is future work, not a shipped differentiator today.
• Initial model download and multi-speaker inference are slow; although a GPU is optional, the practical need for one limits adoption on resource-constrained machines.

Post Description

Hi HN,

I built Podvoice because I wanted a simple way to turn Markdown podcast-style scripts into audio without relying on cloud TTS APIs.

It runs fully locally using Coqui XTTS v2. No API keys. No accounts. Just a CLI workflow.

You write something like:

[Host | calm] Hello and welcome.

[Guest | excited] Let’s talk about AI.

And it generates a single stitched audio file.
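
As a rough illustration of that workflow (not Podvoice's actual code, which is not shown here), the sketch below parses lines in the `[Speaker | emotion] text` format and concatenates per-line WAV clips into one file. The names `Segment`, `parse_script`, and `stitch_wavs` are hypothetical, the emotion tag is assumed to be optional, and the speech synthesis step itself (XTTS v2) is left out; the clips are assumed to exist already and to share one audio format.

```python
import re
import wave
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for lines such as "[Host | calm] Hello and welcome."
# Treating the "| emotion" part as optional is an assumption.
LINE_RE = re.compile(r"^\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]*?)\s*)?\]\s*(.+)$")


@dataclass
class Segment:
    speaker: str
    emotion: Optional[str]
    text: str


def parse_script(script: str) -> List[Segment]:
    """Turn a script into segments, skipping blank or non-matching lines."""
    segments = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            speaker, emotion, text = match.groups()
            segments.append(Segment(speaker, emotion or None, text.strip()))
    return segments


def stitch_wavs(paths: List[str], out_path: str) -> None:
    """Concatenate WAV clips that share channels, sample width, and rate."""
    if not paths:
        raise ValueError("no clips to stitch")
    with wave.open(out_path, "wb") as out:
        fmt = None
        for path in paths:
            with wave.open(path, "rb") as clip:
                params = clip.getparams()
                clip_fmt = (params.nchannels, params.sampwidth, params.framerate)
                if fmt is None:
                    # The first clip defines the output format.
                    fmt = clip_fmt
                    out.setparams(params)
                elif clip_fmt != fmt:
                    raise ValueError(f"format mismatch in {path}")
                out.writeframes(clip.readframes(params.nframes))
```

In a pipeline like this, each parsed `Segment` would be synthesized to its own clip with the voice chosen for `segment.speaker`, and the clip paths would then be passed to `stitch_wavs` in script order.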

Would love feedback on the idea, UX, or use cases I might be missing.

Similar Projects

AI/ML · ● Mid

Ilya's 30 papers, explained in audio

AI-narrated paper summaries when ElevenLabs and NotebookLM already do this.

Cozy

janpmz

429d ago

AI/ML · ●● Solid

Podscript – Podcast/YouTube Transcription CLI

Outputs ready-to-use Markdown with speaker diarization and timestamps, accepts Apple Podcasts/YouTube/RSS links, and can run fully locally or use ElevenLabs for higher-quality diarization. Not groundbreaking — speech-to-text pipelines already exist — but the one-command UX, RSS browsing/search flags, and explicit local-mode make it genuinely useful for folks who want tidy transcripts without wiring together multiple tools.

Solve My Problem · Niche Gem

timf34

105mo ago

Productivity · ●● Solid