GitHub Repository

VITS EVOlution: Lightweight, deployable voice cloning TTS model

8 stars · Python

Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

Author: ZDisket

by ZDisket · Mar 18, 2026 · 4 points · 4 comments


AI Analysis

●●● Banger · Wizardry · Dark Horse

5.6x realtime on CPU with voice cloning beats most local TTS options.

Strengths

• 31M parameters with a real-time factor of 0.18 on a server CPU is genuinely impressive.
• Voice blending merges multiple speaker embeddings to create entirely new voices.
• Apache 2.0 license with an MIT-licensed phonemizer avoids commercial deployment restrictions.
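
The two speed figures on this page describe the same measurement. Real-time factor (RTF) is synthesis time divided by the duration of the audio produced, and the "x realtime" speedup is its reciprocal. A minimal sketch of that arithmetic follows; the function names are illustrative and do not come from the repo.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """Speed relative to real time is the reciprocal of the RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Producing 5.6 s of audio in 1 s of compute is "5.6x realtime".
rtf = real_time_factor(synthesis_seconds=1.0, audio_seconds=5.6)
print(round(rtf, 2))           # 0.18
print(round(speedup(rtf), 1))  # 5.6
```

An RTF below 1.0 means audio is generated faster than it plays back, which is the threshold for real-time use.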

Weaknesses

• Audio quality and speaker similarity lag behind larger commercial models.
• Training only on LibriTTS-R and VCTK limits voice diversity compared to models trained on proprietary datasets.

Post Description

Hi guys and gals, I made a TTS model based on my highly upgraded VITS base, conditioned on external speaker embeddings (Resemble AI's Resemblyzer).

The model, with ~31M parameters (ONNX), is tuned for latency and local inference, and comes already exported. I was trying to push the limits of what I could do with small, fast models. It runs 5.6x realtime on a server CPU.

It supports voice cloning and voice blending (mix two or more speakers to make a new voice). The license is Apache 2.0, and it uses DeepPhonemizer (MIT) for the phonemization, so there are no license issues.
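
Because the model is conditioned on an external speaker embedding, one plausible way to blend voices is a weighted average of the per-speaker vectors followed by re-normalization. The sketch below assumes that approach; the post does not say how the repo implements blending. `blend_embeddings` and the toy 3-dimensional vectors are hypothetical, and real speaker embeddings are much longer.

```python
import math
from typing import List, Sequence


def blend_embeddings(
    embeddings: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of speaker embeddings, rescaled to unit length.

    Hypothetical helper: assumes blending is a linear mix of embeddings.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean, one dimension at a time.
    mixed = [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]

    # Rescale so the result has the same norm as a unit-length embedding.
    norm = math.sqrt(sum(x * x for x in mixed))
    if norm == 0:
        raise ValueError("embeddings cancelled out; cannot normalize")
    return [x / norm for x in mixed]


# Toy example: an equal mix of two orthogonal "speakers".
speaker_a = [1.0, 0.0, 0.0]
speaker_b = [0.0, 1.0, 0.0]
blended = blend_embeddings([speaker_a, speaker_b], weights=[0.5, 0.5])
```

Unequal weights would pull the blended voice toward one speaker. The result would then be passed to the model in place of a single speaker's embedding.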

The repo contains the checkpoint, how to run it, and links to Colab and HuggingFace demos.

Now, because it's tiny, audio quality isn't the best, and as it was trained on LibriTTS-R + VCTK (both fully open datasets), speaker similarity isn't as good as it could be.

Regardless, I hope it's useful.

Similar Projects

AI/ML · ●● Solid

KokoClone – Zero-shot voice cloning using Kokoro TTS

Kokoro voice cloning with multilingual support, but voice cloning itself is crowded.

Niche Gem · Ship It

Ashish106

213mo ago

AI/ML · ●● Solid

Voice gender classifier for European voice AI (1MB, ONNX, 4ms)

Enables grammatical gender inflection in EU voice agents with 4ms CPU inference.

Niche Gem · Slick

biduskamil

521mo ago

Developer Tools · ●● Solid

Kitten TTS Based Low-Latency Streaming Voice Assistant on CPU

Sub-sentence TTS streaming beats Piper/Sherpa-ONNX latency by token-level triggering on CPU.

Niche Gem · Wizardry

gauravvij137

303mo ago

AI/ML · ●● Solid

TTS.ai

Twenty-seven open-source TTS models in one UI with no signup required for the free tier.

Slick · Crowd Pleaser

nadermx

301mo ago

AI/ML · ●● Solid

My 16MB vibe-coded voice cloning app

Shrinks the usual TTS bloat into a 16MB Electron-alternative wrapper while still letting you clone voices from a short sample and 'design' voices from text prompts. It handles model downloads for you, supports batch exports and macOS auto-updates — smart product trade-offs. Caveat: the app binary is tiny, but the underlying TTS models are downloaded on demand, so expect large model pulls behind the scenes.

Dark Horse · Wizardry · Ship It

yoav

203mo ago

AI/ML · ●●● Banger

Three new Kitten TTS models – smallest less than 25MB

SOTA expressivity at 14M parameters beats cloud models for on-device TTS.

Wizardry · Niche Gem · Zero to One

rohan_joshi

5611812mo ago