TTSLab – Text-to-speech that runs in the browser via WebGPU
Whisper + Kokoro entirely in-browser via WebGPU, no API keys or network requests.

Full voice agent (STT→LLM→TTS) runs locally on GPU, no backend needed.
Developers evaluating TTS/STT, researchers benchmarking speech models, product teams needing privacy-first voice features.
Transformers.js (ONNX in browser) · Ollama (local model runner, but desktop-focused)
No API keys, no backend, no data leaves your machine.
When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.
You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.
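The download-once behavior can be sketched as a get-or-fetch lookup. This is a minimal illustration, not TTSLab's actual code: `ModelStore`, `Downloader`, and `getOrDownload` are hypothetical names, and an in-memory `Map` stands in for whatever browser storage the app really uses.

```typescript
// Hypothetical sketch of "download once, then serve from a local cache".
// A Map stands in for persistent browser storage; the post does not
// describe TTSLab's real storage layer.
type ModelStore = Map<string, Uint8Array>;
type Downloader = (url: string) => Promise<Uint8Array>;

async function getOrDownload(
  url: string,
  store: ModelStore,
  download: Downloader,
): Promise<Uint8Array> {
  const cached = store.get(url);
  if (cached !== undefined) {
    return cached; // cache hit: no network access needed
  }
  const bytes = await download(url); // cache miss: fetch the weights
  store.set(url, bytes); // remember them for every later call
  return bytes;
}
```

Injecting the downloader keeps the sketch independent of any particular network or storage API. Note that two overlapping calls for the same URL would both download in this simple version.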
The most experimental feature is a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. Once the models are cached, you can hold a spoken conversation with an AI without a single network request.
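The speech-to-text → LLM → text-to-speech chain can be sketched as three awaited stages. This is a sketch under stated assumptions rather than TTSLab's implementation: the stage signatures, `VoiceAgentStages`, and `runVoiceTurn` are hypothetical, and each stage is injected so the real model runners can be swapped in.

```typescript
// Hypothetical stage signatures; the real TTSLab interfaces may differ.
type SpeechToText = (audio: Float32Array) => Promise<string>;
type LanguageModel = (prompt: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Float32Array>;

interface VoiceAgentStages {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

interface VoiceTurn {
  transcript: string; // what the user said
  reply: string; // what the LLM answered
  speech: Float32Array; // synthesized audio samples for the reply
}

// Runs one conversational turn: each stage consumes the previous stage's output.
async function runVoiceTurn(
  stages: VoiceAgentStages,
  micAudio: Float32Array,
): Promise<VoiceTurn> {
  const transcript = await stages.stt(micAudio);
  const reply = await stages.llm(transcript);
  const speech = await stages.tts(reply);
  return { transcript, reply, speech };
}
```

The stages run strictly in sequence here, so the turn latency is the sum of the three. A streaming variant could start synthesis before the LLM finishes, but that is outside this sketch.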
Currently supported models:
- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base
Other features:
- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models
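One common way to express speech-synthesis speed is the real-time factor (RTF): generation time divided by the duration of the audio produced, so values below 1 mean faster than real time. The post does not say which metric TTSLab reports, so the following is only an illustrative sketch, with `realTimeFactor` and `median` as hypothetical helper names.

```typescript
// Real-time factor: time spent generating divided by the length of the audio.
// RTF < 1 means the model synthesizes faster than the audio plays back.
function realTimeFactor(
  elapsedMs: number,
  sampleCount: number,
  sampleRate: number,
): number {
  if (sampleCount <= 0 || sampleRate <= 0) {
    throw new RangeError("sampleCount and sampleRate must be positive");
  }
  const audioMs = (sampleCount / sampleRate) * 1000;
  return elapsedMs / audioMs;
}

// Median of several runs, which is less sensitive to a slow first
// (warm-up) run than the mean.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("median of an empty list is undefined");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

For example, producing one second of 24 kHz audio (24,000 samples) in 500 ms gives an RTF of 0.5.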
Source: https://github.com/MbBrainz/ttslab (MIT)
Feedback I'd especially like:
1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.
Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.
20+ TTS models in one place, but Eleven Labs and Play.ht already own this space.
Shrinks the usual TTS bloat into a 16MB Electron-alternative wrapper while still letting you clone voices from a short sample and 'design' voices from text prompts. It handles model downloads for you, supports batch exports and macOS auto-updates — smart product trade-offs. Caveat: the app binary is tiny, but the underlying TTS models are downloaded on demand, so expect large model pulls behind the scenes.
This repo bundles a complete local audio loop (client captures audio, backend transcribes with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro or Qwen3-TTS for cloning) and reports ~1s round-trip on an RTX 5070. It’s a practical, take-it-home demo for running privacy-first voice agents, though it’s still a demo: it requires specific tooling (Ollama, GPU headroom), has obvious TODOs (VAD, better warmup for cloning), and isn’t reinventing the architecture.
Kokoro voice cloning with multilingual support, but voice cloning itself is crowded.
48 ASR models + WebGPU TTS offline beats Whisper-only alternatives like Otter.ai.