GitHub Repository

VITS EVOlution: Lightweight, deployable voice cloning TTS model

8 stars · Python

Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

by ZDisket·Mar 18, 2026·4 points·4 comments

AI Analysis

●●● Banger · Wizardry · Dark Horse

5.6x realtime on CPU with voice cloning beats most local TTS options.

Strengths
  • 31M parameters with 0.18 real-time factor on server CPU is genuinely impressive.
  • Voice blending merges multiple speaker embeddings to create entirely new voices.
  • Apache 2.0 license with MIT phonemizer avoids commercial deployment restrictions.
Weaknesses
  • Audio quality and speaker similarity lag behind larger commercial models.
  • Training only on LibriTTS-R and VCTK limits voice diversity compared to proprietary datasets.
Target Audience

ML engineers, developers building offline voice features

Similar To

Coqui TTS · Piper · StyleTTS 2

Post Description

Hi guys and gals, I made a TTS model based on my highly upgraded VITS base, conditioned on external speaker embeddings (Resemble AI's Resemblyzer).

The model has ~31M parameters, is tuned for latency and local inference, and comes already exported to ONNX. I was trying to push the limits of what I could do with small, fast models. It runs 5.6x realtime on a server CPU.
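For readers comparing the "5.6x realtime" figure with the 0.18 real-time factor quoted in the listing, here is a minimal sketch of how the two numbers relate. The timing values are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the repo.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """Speed relative to realtime is the reciprocal of the RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical timing: 10 s of audio generated in 1.8 s of wall-clock time.
rtf = real_time_factor(1.8, 10.0)
print(f"RTF={rtf:.2f}, speedup={speedup(rtf):.1f}x")  # RTF=0.18, speedup=5.6x
```

An RTF below 1.0 means audio is produced faster than it plays back, which is what makes streaming output on CPU practical.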

It supports voice cloning and voice blending (mix two or more speakers to make a new voice). The license is Apache 2.0, and it uses DeepPhonemizer (MIT) for phonemization, so no license issues.
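One common way to blend voices in an embedding-conditioned model is to take a weighted average of the speaker embeddings and rescale the result to unit length. The sketch below shows that idea with NumPy. It is an assumption about how blending could work, not the repo's actual code: `blend_embeddings` is a hypothetical helper, and the random vectors only stand in for real 256-dimensional, unit-length Resemblyzer embeddings.

```python
import numpy as np


def blend_embeddings(embeddings, weights=None):
    """Weighted average of speaker embeddings, rescaled to unit L2 norm.

    Hypothetical helper for illustration; the project may blend differently.
    """
    embs = np.asarray(embeddings, dtype=np.float64)
    if embs.ndim != 2 or embs.shape[0] < 2:
        raise ValueError("need at least two embeddings of equal length")
    if weights is None:
        weights = np.ones(embs.shape[0])
    w = np.asarray(weights, dtype=np.float64)
    if w.shape != (embs.shape[0],) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative, one per embedding")
    w = w / w.sum()                      # weights sum to 1
    mixed = w @ embs                     # weighted average across speakers
    norm = np.linalg.norm(mixed)
    if norm == 0:
        raise ValueError("embeddings cancel out; cannot normalize")
    return mixed / norm                  # back to unit length


# Stand-in embeddings (random, then normalized), not real speaker data.
rng = np.random.default_rng(0)
speaker_a = rng.normal(size=256)
speaker_a /= np.linalg.norm(speaker_a)
speaker_b = rng.normal(size=256)
speaker_b /= np.linalg.norm(speaker_b)

# 70% speaker A, 30% speaker B.
new_voice = blend_embeddings([speaker_a, speaker_b], weights=[0.7, 0.3])
print(new_voice.shape)
```

Rescaling matters because an average of unit vectors is shorter than unit length, and a model trained on normalized embeddings may behave unpredictably on inputs outside that range.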

The repo contains the checkpoint, how to run it, and links to Colab and HuggingFace demos.

Now, because it's tiny, audio quality isn't the best, and because it was trained only on LibriTTS-R + VCTK (both fully open datasets), speaker similarity isn't as good as what larger models reach.

Regardless, I hope it's useful.

Similar Projects

AI/ML · ●● Solid

KokoClone – Zero-shot voice cloning using Kokoro TTS

Kokoro voice cloning with multilingual support, but voice cloning itself is crowded.

Niche Gem · Ship It
Ashish106
213mo ago
AI/ML · ●● Solid

TTS.ai

Twenty-seven open-source TTS models in one UI with no signup required for the free tier.

Slick · Crowd Pleaser
nadermx
301mo ago
AI/ML · ●● Solid

My 16MB vibe-coded voice cloning app

Shrinks the usual TTS bloat into a 16MB Electron-alternative wrapper while still letting you clone voices from a short sample and 'design' voices from text prompts. It handles model downloads for you, supports batch exports and macOS auto-updates — smart product trade-offs. Caveat: the app binary is tiny, but the underlying TTS models are downloaded on demand, so expect large model pulls behind the scenes.

Dark Horse · Wizardry · Ship It
yoav
203mo ago