GitHub Repository

Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support.

4,945 starsPython

Dograh – voice agents that pick Recordings over TTS using LLM

Name: Dograh – voice agents that pick Recordings over TTS using LLM
Availability: InStock
Author: a6kme

by a6kme·Mar 31, 2026·4 points·0 comments

Visit Project View on HN

AI Analysis

●●SolidSlickSolve My Problem

Open-source Vapi alternative with pre-recorded audio responses for lower latency calls.

Strengths

•Full stack handled: WebRTC, telephony, STT, LLM, TTS without vendor lock-in or platform fees.
•Visual workflow builder means non-engineers can modify call flows without redeploying code.
•Built-in AI testing personas let you validate bot behavior before production calls.

Weaknesses

•Voice agent platform space is crowded: Vapi, Retell, Bland all have more traction.
•Self-hosting adds operational overhead that most teams don't want to manage.

Post Description

TL;DR: Dograh is an open-source platform to build voice AI agents with drag-and-drop workflows. New in v1.20: Gemini 3.1 live support, Pre-recorded audio support for lower latency and more natural responses. Fully self-hostable, no vendor lock-in.

Hi HN,

We’re the Dograh team (YC alumni). While building voice bots, we found that wiring WebRTC/ Telephony + STT + LLM + TTS took more time than the bots themselves. Teams are spending weeks on plumbing - handling call flows, extracting variables, dealing with telephony edge cases, and redeploying for small changes. Tools like Vapi/Retell are easy to start with but come with lock-in and platform fees. So we built Dograh: a 100% open-source platform that handles the full stack, with a visual workflow builder and self-hosting by default.

Dograh v1.20 introduces two major additions: 1. Gemini 3.1 Live support Run fully real-time voice agents using Gemini’s streaming APIs, without stitching together separate STT + LLM + TTS components. 2. Pre-recorded audio (hybrid voice) Upload real voice clips (greetings, confirmations, etc.), and the agent plays them instantly while using TTS only for dynamic responses. This reduces latency, improves naturalness, and cuts TTS costs.

It also includes:

- Plug-and-play LLM / STT / TTS (including self-hosted models) - Telephony integrations (Twilio, Vonage, Telnyx) along with Call Transfer - Post-call QA, transcripts, and variable extraction - Observability via Langfuse (OpenTelemetry traces + prompt playground)

Try it now: If you have Docker, you can run the below command for a 2-minute setup (no API keys needed out of the box).

``` curl -o docker-compose.yaml https://raw.githubusercontent.com/dograh-hq/dograh/main/dock... REGISTRY=ghcr.io/dograh-hq ENABLE_TELEMETRY=true docker compose up --pull always ```

Looking Ahead: We’re expanding self-hosted model support: you can already bring any LLM (e.g. Llama, Qwen) or TTS (Kokoro, Voxtral) by configuring API endpoints. We are working on updates that will enable anyone to run everything on a single server - your AI models along with Dograh Orchestration.

Looking forward to hearing thoughts of the community.