Free Live Speech Translator
Free Chrome-based translator when Google Translate already does live speech.
Live speech translation powered by on-device AI and cloud providers — OpenAI, Google Gemini, Palabra.ai, Kizuna AI, Volcengine, and more
48 ASR models + WebGPU TTS offline beats Whisper-only alternatives like Otter.ai.
Users needing private speech translation, developers building multilingual apps, meeting participants in remote calls.
Whisper (OpenAI) · Otter.ai · Google Live Translate
The latest release (v0.15) adds Local Inference mode — fully on-device ASR, translation, and TTS using WASM and WebGPU. No API key, no internet, no data leaving your machine. It ships with:
- 48 ASR models covering 99+ languages (sherpa-onnx WASM + Whisper WebGPU)
- 55+ translation language pairs (Opus-MT) plus multilingual LLMs (Qwen 2.5/3/3.5) via WebGPU
- 136 TTS models across 53 languages (Piper, Coqui, Mimic3, Matcha)
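Since the models above are split between WebGPU and WASM runtimes, a client has to decide which runtime to use on a given machine. The following is only a rough sketch of one way to do that, not the project's actual logic: `pickBackend`, `NavigatorLike`, and `GpuLike` are hypothetical names, and falling back to WASM when WebGPU is unavailable is an assumption. The only real API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles
// without pulling in WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Returns "webgpu" only if the browser exposes WebGPU AND an adapter
// can actually be obtained; otherwise falls back to "wasm" (assumed policy).
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) {
    return "wasm"; // API not exposed at all
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null adapter means no usable GPU
  } catch {
    return "wasm"; // treat any failure as "WebGPU unavailable"
  }
}
```

In a browser this would be called with the global `navigator` object; taking it as a parameter keeps the function easy to exercise outside a browser.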
For those who prefer cloud providers, it also supports OpenAI Realtime API, Google Gemini Live, Palabra.ai, Volcengine ST, Doubao AST 2.0, and any OpenAI-compatible endpoint.
The browser extension integrates with Google Meet, Teams, Zoom, Discord, Slack, and others — it can capture participant audio and inject translated speech via a virtual microphone.
Tech stack: React + Zustand + Vite, Electron Forge, sherpa-onnx compiled to WASM, HuggingFace Transformers.js for WebGPU inference. Models are downloaded on demand and cached in IndexedDB.
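The download-on-demand caching mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `ModelStore`, `MemoryStore`, `Downloader`, and `loadModel` are hypothetical names, and the in-memory `Map` stands in for an IndexedDB-backed store so the sketch runs outside a browser.

```typescript
// Hypothetical storage interface; in the browser an implementation
// would wrap an IndexedDB object store instead of a Map.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
}

// In-memory stand-in for the persistent cache (assumption for the sketch).
class MemoryStore implements ModelStore {
  private items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async put(key: string, data: Uint8Array): Promise<void> {
    this.items.set(key, data);
  }
}

// Hypothetical downloader signature; a real one might call fetch().
type Downloader = (modelId: string) => Promise<Uint8Array>;

// Returns the cached model bytes if present; otherwise downloads
// them once, stores them, and returns them.
async function loadModel(
  modelId: string,
  store: ModelStore,
  download: Downloader,
): Promise<Uint8Array> {
  const cached = await store.get(modelId);
  if (cached !== undefined) {
    return cached; // cache hit: no network access needed
  }
  const data = await download(modelId);
  await store.put(modelId, data);
  return data;
}
```

Keeping storage behind a small interface means the same lookup logic works whether the bytes live in IndexedDB, on disk, or in memory. Under this sketch, a model that has been fetched once loads without network access afterwards.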
I built this because existing translation tools require expensive API keys, send your audio to the cloud, or don't support enough languages. The local inference mode makes it practical for privacy-sensitive use cases and for people without reliable internet.
AGPL-3.0 licensed. Available on Windows, macOS, Linux, Chrome Web Store, and Edge Add-ons.
GitHub: https://github.com/kizuna-ai-lab/sokuji
Official site: https://sokuji.kizuna.ai
Keyboard-first macOS translation without context switching; runs entirely on-device.
Local Whisper + NLLB translation with 300ms latency overlay for Discord and games.
Four swappable engines, 80ms latency, no subscription; beats Apple's 60-second timeout hard.
Wasm-to-Go compiler enabling pure-Go SQLite driver across 20 platforms.
Whisper + Kokoro entirely in-browser via WebGPU, no API keys or network requests.