GitHub Repository

8 stars · Python

Kitten TTS Based Low-Latency Streaming Voice Assistant on CPU

Author: gauravvij137

by gauravvij137 · Feb 26, 2026 · 3 points · 0 comments


AI Analysis

●● Solid · Niche Gem · Wizardry

Sub-sentence TTS streaming beats Piper/Sherpa-ONNX latency by token-level triggering on CPU.

Strengths

• Genuine latency innovation: 1.25s time to first audio (TTFA) by triggering synthesis on token counts rather than sentence boundaries, which is measurably faster
• Thorough benchmarking, with a comparison table (Piper, Sherpa-ONNX) and concrete metrics
• 32-core ONNX tuning and adaptive buffering show real optimization thought, not just 'runs on CPU'
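
The token-triggered synthesis mentioned in the first strength can be illustrated with a short sketch. This is not the repository's code: the function name `chunk_tokens`, the `max_tokens` threshold, and the punctuation set are all assumptions chosen for illustration. The idea is only that text is handed to the TTS engine after a small number of tokens instead of waiting for a full sentence.

```python
from typing import Iterable, Iterator

# Hypothetical sentence terminators; a real pipeline may use a richer set.
SENTENCE_END = (".", "!", "?")


def chunk_tokens(tokens: Iterable[str], max_tokens: int = 6) -> Iterator[str]:
    """Yield text chunks for TTS as soon as `max_tokens` tokens have
    arrived or a sentence ends, whichever happens first."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        ends_sentence = token.rstrip().endswith(SENTENCE_END)
        if len(buffer) >= max_tokens or ends_sentence:
            text = "".join(buffer).strip()
            buffer = []
            if text:  # skip chunks that contain only whitespace
                yield text
    # Flush whatever is left when the token stream ends.
    text = "".join(buffer).strip()
    if text:
        yield text
```

With `max_tokens=4`, a nine-token sentence is split into three chunks, so synthesis of the first chunk could begin after four tokens rather than after the final period. A smaller threshold lowers time to first audio but may hurt prosody, since the TTS model sees less context per chunk.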

Weaknesses

• Narrow audience: local voice assistants on CPU are a niche within edge AI, not a broad product category
• NEO AI agent narrative feels performative; the core value is the latency trick, which is already stated upfront

Post Description

We asked Neo AI to build a small voice assistant pipeline that runs with low latency on CPU instead of requiring a GPU.

The goal was to see how responsive an LLM → speech system can be on normal laptops or edge devices.

It includes:
- Voice Activity Detection
- CPU-friendly LLM + TTS streaming
- Async pipeline to reduce latency
- Modular LLM backend
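
As a rough sketch of how an async LLM → TTS pipeline can overlap generation and synthesis, the example below connects two coroutines with an `asyncio.Queue`. Everything here is an assumption for illustration: `fake_llm` and `fake_tts` are hypothetical stand-ins for the real model calls, and `chunk_size` is an arbitrary threshold. The repository's actual structure may differ.

```python
import asyncio


async def fake_llm(queue):
    """Hypothetical stand-in for a streaming LLM: emits tokens one by one."""
    for token in ["Sure", ",", " here", " is", " a", " short", " answer", "."]:
        await asyncio.sleep(0)  # yield control, as a real token stream would
        await queue.put(token)
    await queue.put(None)  # sentinel: no more tokens


async def fake_tts(queue, spoken, chunk_size=3):
    """Hypothetical stand-in for TTS: 'speaks' every `chunk_size` tokens
    instead of waiting for the full response."""
    buffer = []
    while True:
        token = await queue.get()
        if token is None:
            break
        buffer.append(token)
        if len(buffer) >= chunk_size:
            spoken.append("".join(buffer))  # a real system would synthesize here
            buffer.clear()
    if buffer:  # flush the remainder at end of stream
        spoken.append("".join(buffer))


async def run_pipeline():
    """Run producer and consumer concurrently and return the spoken chunks."""
    queue = asyncio.Queue(maxsize=8)  # bounded queue gives simple backpressure
    spoken = []
    await asyncio.gather(fake_llm(queue), fake_tts(queue, spoken))
    return spoken
```

Calling `asyncio.run(run_pipeline())` returns the chunks in the order they would have been spoken. Because the consumer starts working as soon as the first few tokens arrive, the first audio does not have to wait for the LLM to finish its response.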

Useful for local assistants, robotics prototypes, privacy-first setups, or benchmarking STT/LLM/TTS latency.

We’ve been experimenting with similar CPU-first pipelines inside NEO workflows for on-device agents, and this repo is a minimal standalone version.

Would love suggestions on lightweight STT/TTS models or latency tricks people have used on CPU.

Similar Projects

AI/ML · ●●● Banger

Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

5.6x realtime on CPU with voice cloning beats most local TTS options.

Wizardry · Dark Horse

ZDisket

444mo ago

Developer Tools · ●● Solid

I built a sub-500ms latency voice agent from scratch

Outperformed Vapi 2× on latency by treating voice as turn-taking, not transcription.

Big Brain · Wizardry

nicktikhonov

5701534mo ago

AI/ML · ●●● Banger

Three new Kitten TTS models – smallest less than 25MB

SOTA expressivity at 14M parameters beats cloud models for on-device TTS.

Wizardry · Niche Gem · Zero to One

rohan_joshi

5611814mo ago

AI/ML · ●● Solid

KokoClone – Zero-shot voice cloning using Kokoro TTS

Kokoro voice cloning with multilingual support, but voice cloning itself is crowded.

Niche Gem · Ship It

Ashish106

214mo ago

AI/ML · ● Mid

NalityAI – desktop voice assistant with 9 personalities built in Python

The nine personality modes are prompt variations wrapped in a Tkinter UI, using the Groq API.

Cozy

Lashaga

111mo ago

AI/ML · ●● Solid

Local Voice Assistant

Qwen3-TTS voice cloning without finetuning in ~1 second on RTX 5070.

Wizardry · Cozy

armcat

205mo ago