Back to browse
GitHub Repository

Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support.

3,798 starsPython

Dograh – voice agents that pick Recordings over TTS using LLM

by a6kme·Mar 31, 2026·4 points·0 comments

AI Analysis

●●SolidSlickSolve My Problem

Open-source Vapi alternative with pre-recorded audio responses for lower latency calls.

Strengths
  • Full stack handled: WebRTC, telephony, STT, LLM, TTS without vendor lock-in or platform fees.
  • Visual workflow builder means non-engineers can modify call flows without redeploying code.
  • Built-in AI testing personas let you validate bot behavior before production calls.
Weaknesses
  • Voice agent platform space is crowded: Vapi, Retell, Bland all have more traction.
  • Self-hosting adds operational overhead that most teams don't want to manage.
Category
Target Audience

Developers building voice bots, call center automation teams

Similar To

Vapi · Retell AI · Bland AI

Post Description

TL;DR: Dograh is an open-source platform to build voice AI agents with drag-and-drop workflows. New in v1.20: Gemini 3.1 live support, Pre-recorded audio support for lower latency and more natural responses. Fully self-hostable, no vendor lock-in.

Hi HN,

We’re the Dograh team (YC alumni). While building voice bots, we found that wiring WebRTC/ Telephony + STT + LLM + TTS took more time than the bots themselves. Teams are spending weeks on plumbing - handling call flows, extracting variables, dealing with telephony edge cases, and redeploying for small changes. Tools like Vapi/Retell are easy to start with but come with lock-in and platform fees. So we built Dograh: a 100% open-source platform that handles the full stack, with a visual workflow builder and self-hosting by default.

Dograh v1.20 introduces two major additions: 1. Gemini 3.1 Live support Run fully real-time voice agents using Gemini’s streaming APIs, without stitching together separate STT + LLM + TTS components. 2. Pre-recorded audio (hybrid voice) Upload real voice clips (greetings, confirmations, etc.), and the agent plays them instantly while using TTS only for dynamic responses. This reduces latency, improves naturalness, and cuts TTS costs.

It also includes:

- Plug-and-play LLM / STT / TTS (including self-hosted models) - Telephony integrations (Twilio, Vonage, Telnyx) along with Call Transfer - Post-call QA, transcripts, and variable extraction - Observability via Langfuse (OpenTelemetry traces + prompt playground)

Try it now: If you have Docker, you can run the below command for a 2-minute setup (no API keys needed out of the box).

``` curl -o docker-compose.yaml https://raw.githubusercontent.com/dograh-hq/dograh/main/dock... REGISTRY=ghcr.io/dograh-hq ENABLE_TELEMETRY=true docker compose up --pull always ```

Looking Ahead: We’re expanding self-hosted model support: you can already bring any LLM (e.g. Llama, Qwen) or TTS (Kokoro, Voxtral) by configuring API endpoints. We are working on updates that will enable anyone to run everything on a single server - your AI models along with Dograh Orchestration.

Looking forward to hearing thoughts of the community.

Similar Projects

AI/ML●●Solid

Local Voice Assistant

This repo bundles a complete local audio loop — client captures audio, backend transcribes with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro or Qwen3-TTS for cloning — and reports ~1s round-trip on an RTX5070. It’s a practical, take-it-home demo for running privacy-first voice agents, though it’s still a demo: requires specific tooling (Ollama, GPU headroom), has obvious TODOs (VAD, better warmup for cloning), and isn’t reinventing the architecture.

WizardryNiche Gem
armcat
203mo ago