GitHub Repository

Test harness for voice agents. Import from Retell, VAPI, Bland, LiveKit. Run autonomous simulations. Evaluate with LLM judges.

24 stars · Python

Voicetest – open-source test harness for voice AI agents

by pldpld · Feb 17, 2026 · 3 points · 0 comments

AI Analysis

●●● Banger · Solve My Problem · Ship It · Big Brain

Unified test harness for voice agents across Retell, VAPI, LiveKit, Bland with LLM scoring.

Strengths
  • AgentGraph IR abstraction eliminates config translation friction across four major platforms.
  • LLM-based scoring (0.0-1.0 with reasoning) replaces manual QA listening sessions.
  • Includes compliance evaluators (HIPAA, PCI-DSS, brand voice) reducing specialized test burden.
Weaknesses
  • Limited to four platforms; won't help teams on other voice agent systems like Twilio.
  • Pricing model unclear; no mention of cost for LLM judge evaluations at scale.
Target Audience

Voice AI engineers and product teams building agents across multiple platforms

Similar To

Postman (API testing abstraction) · Playwright (cross-framework test standardization) · LoadImpact (performance testing across variants)

Post Description

We've been building voice agents across Retell, VAPI, LiveKit, and Bland, and the testing story is... rough. Every platform has its own config format, there's no shared way to define what "correct" looks like, and most teams end up doing manual QA by literally calling their agent and listening. So we built voicetest.

voicetest is an open source (Apache 2.0) test harness that works across voice AI platforms. You import your agent graph from any supported platform (or define one from scratch), write test scenarios with expected behaviors, and voicetest simulates conversations and evaluates them with LLM judges that score each turn 0.0-1.0 with written reasoning. It also ships global compliance evaluators for things like HIPAA, PCI-DSS, and brand voice consistency. The core abstraction is an AgentGraph IR that normalizes across platform formats, so you can convert between Retell, VAPI, LiveKit, and Bland configs and test them all the same way.
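To illustrate the normalization idea, here is a minimal sketch of how two different config shapes could be mapped onto one platform-neutral graph. It is not voicetest's actual AgentGraph API: the `Node` and `AgentGraph` classes, the two converter functions, and both input formats are invented for this example and do not mirror the real Retell, VAPI, LiveKit, or Bland schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One conversational state: a prompt plus named transitions (hypothetical)."""
    node_id: str
    prompt: str
    # Maps a transition condition to the id of the target node.
    transitions: dict[str, str] = field(default_factory=dict)


@dataclass
class AgentGraph:
    """Platform-neutral representation of an agent (hypothetical shape)."""
    entry: str
    nodes: dict[str, Node]


def from_states_format(config: dict) -> AgentGraph:
    """Convert an invented 'list of states' config into the neutral graph."""
    nodes = {}
    for state in config["states"]:
        edges = {e["when"]: e["goto"] for e in state.get("edges", [])}
        nodes[state["name"]] = Node(state["name"], state["prompt"], edges)
    return AgentGraph(entry=config["start"], nodes=nodes)


def from_flow_format(config: dict) -> AgentGraph:
    """Convert an invented 'flow map' config into the same neutral graph."""
    nodes = {
        node_id: Node(node_id, body["say"], dict(body.get("next", {})))
        for node_id, body in config["flow"].items()
    }
    return AgentGraph(entry=config["entry"], nodes=nodes)


# Two made-up configs describing the same two-step agent.
states_cfg = {
    "start": "greet",
    "states": [
        {"name": "greet", "prompt": "Greet the caller.",
         "edges": [{"when": "wants_booking", "goto": "book"}]},
        {"name": "book", "prompt": "Collect a date and time."},
    ],
}
flow_cfg = {
    "entry": "greet",
    "flow": {
        "greet": {"say": "Greet the caller.", "next": {"wants_booking": "book"}},
        "book": {"say": "Collect a date and time."},
    },
}

# Both inputs normalize to an identical graph, so one test suite can cover both.
assert from_states_format(states_cfg) == from_flow_format(flow_cfg)
```

The point of the sketch is the design choice: once every platform format converts into a single intermediate structure, test scenarios and evaluators only need to understand that one structure.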

Quick start:

```
uv tool install voicetest
voicetest demo --serve
```

That gives you a web UI at localhost with a sample agent, test cases, and evaluation results you can poke at. There's also a CLI, a TUI, and a REST API. It integrates into CI/CD with GitHub Actions, uses DuckDB for persistence, and includes a Docker Compose dev environment with LiveKit, Whisper STT, and Kokoro TTS. If you have a Claude Code subscription, voicetest can pass through to it instead of requiring separate API keys for evaluation.
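As a rough sketch of how per-turn judge scores could gate a CI job, the snippet below checks a list of 0.0-1.0 scores against two thresholds. Everything here is an assumption for illustration: the `TurnScore` record, the `gate` function, and the threshold values are hypothetical and are not taken from voicetest's CLI, REST API, or GitHub Actions integration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TurnScore:
    """One judged turn (hypothetical record, not voicetest's output schema)."""
    turn: int
    score: float      # judge score, expected to lie in [0.0, 1.0]
    reasoning: str    # the judge's written explanation


def gate(scores: list[TurnScore],
         min_mean: float = 0.8,
         min_turn: float = 0.5) -> tuple[bool, list[str]]:
    """Return (passed, failure messages) for a single simulated conversation.

    The thresholds are illustrative defaults, not recommended values.
    """
    failures = []
    for s in scores:
        if not 0.0 <= s.score <= 1.0:
            raise ValueError(f"turn {s.turn}: score {s.score} is outside [0.0, 1.0]")
        if s.score < min_turn:
            failures.append(
                f"turn {s.turn}: {s.score:.2f} < {min_turn:.2f} ({s.reasoning})"
            )
    if scores:
        avg = mean([s.score for s in scores])
        if avg < min_mean:
            failures.append(f"mean score {avg:.2f} < {min_mean:.2f}")
    return (not failures, failures)


# Example run with made-up scores: one weak turn drags the conversation down.
run = [
    TurnScore(1, 0.9, "greeted the caller correctly"),
    TurnScore(2, 0.4, "asked for a card number without confirming identity"),
    TurnScore(3, 1.0, "confirmed the booking"),
]
passed, messages = gate(run)
```

In a pipeline, a wrapper script could print `messages` and exit with a non-zero status when `passed` is false, which is enough for most CI systems to mark the job as failed.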

GitHub: https://github.com/voicetestdev/voicetest
Docs: https://voicetest.dev
API reference: https://voicetest.dev/api/
