I made HappySRT to transcribe, translate, & summarize easily
Threaded transcription + translation + summarization, but Opus Clip, Rev, and Descript own this category.

Otter.ai for Japanese expats: transcribes, translates, and summarizes civic meetings in real time.
Foreign residents in Japan, Japanese language learners
Otter.ai · DeepL · Google Translate
Day-to-day Japanese is fine for me. But neighborhood meetings were on a completely different level.
People speak fast. There's local dialect. Someone references a flood from 1987, a land boundary dispute from 1994, and three people I've never met but everyone else knows. I would walk out feeling like I understood maybe 5% of what happened.
So I built a tool for myself to help follow those conversations.
Live Kaiwa listens to Japanese speech and, in real time, shows:
* Japanese transcription
* English translation
* a running summary of what's being discussed
* suggested responses you can say back
The idea is to help you stay oriented in complex conversations.
You can try it here: https://livekaiwa.com
---
How it works
When you start a session, the browser microphone captures the conversation and streams the audio to the server.
The pipeline looks roughly like this:
1. Audio streaming - Browser microphone → WebRTC → server
2. Speech to text - Kotoba Whisper runs a fast first-pass transcription.
3. Multi-pass correction - Buffered audio is re-transcribed with higher accuracy, and the result replaces the earlier text.
4. LLM processing - Each batch of transcript is sent to an LLM that generates English translations, summary bullets, and suggested replies (with TTS).
5. Live UI updates - Everything streams back to the browser in (mostly) real time.
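Step 3 is the least obvious step: draft text appears first and is later overwritten by a more accurate pass. The bookkeeping for that can be sketched in Python as below, assuming the fast pass emits short timestamped drafts and the slower pass returns one corrected segment per re-transcribed time range. `Segment`, `LiveTranscript`, and its methods are hypothetical names for illustration, not Live Kaiwa's actual code, and the Whisper, LLM, and TTS calls are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    start: float  # seconds from session start
    end: float
    text: str
    final: bool = False  # True once the slower, more accurate pass produced it


@dataclass
class LiveTranscript:
    """Hypothetical transcript buffer: fast drafts first, refined text later."""

    segments: List[Segment] = field(default_factory=list)
    sent_upto: float = 0.0  # end time of the last finalized text handed to the LLM

    def add_draft(self, start: float, end: float, text: str) -> None:
        # Fast first-pass result (assumed to come from Kotoba Whisper): shown immediately.
        self.segments.append(Segment(start, end, text))

    def refine(self, start: float, end: float, text: str) -> None:
        # Slower pass re-transcribed the buffered audio in [start, end):
        # drop every segment overlapping that range and insert the corrected one.
        kept = [s for s in self.segments if s.end <= start or s.start >= end]
        kept.append(Segment(start, end, text, final=True))
        kept.sort(key=lambda s: s.start)
        self.segments = kept

    def display_text(self) -> str:
        # Japanese has no word spaces, so segments are concatenated directly.
        return "".join(s.text for s in self.segments)

    def next_llm_batch(self) -> str:
        # Finalized text not yet sent to the LLM; advances the "sent" marker.
        batch = [s for s in self.segments if s.final and s.start >= self.sent_upto]
        if batch:
            self.sent_upto = max(s.end for s in batch)
        return "".join(s.text for s in batch)


if __name__ == "__main__":
    t = LiveTranscript()
    t.add_draft(0.0, 1.5, "こんにちわ")  # draft with a misspelling
    t.add_draft(1.5, 3.0, "今日の会議は")
    t.refine(0.0, 3.0, "こんにちは、今日の会議は")  # corrected pass replaces both drafts
    t.add_draft(3.0, 4.2, "防災について")
    print(t.display_text())    # こんにちは、今日の会議は防災について
    print(t.next_llm_batch())  # こんにちは、今日の会議は
    print(t.next_llm_batch())  # empty line: nothing new is final yet
```

In this sketch only finalized text goes to the LLM, which trades a little latency for not translating drafts that are about to change; sending drafts as well would be an equally reasonable design.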
Session data stays in the browser; nothing is stored server-side.
Why I built it, in short: even if you speak Japanese reasonably well, fast, multi-person discussions can become overwhelming. Seeing the conversation transcribed and summarized helps.
Otter.ai alternative that runs entirely locally: Core Audio Tap means no virtual drivers are needed.
Async interviews sound great until you realize Slack threads already do this for free.
Searches your notes during calls while Otter.ai just transcribes and waits.
Real-time conversation wingman concept is clever, but execution unclear and MVP-stage differentiation unproven.
This nails the ugly, practical bits most toy projects skip: WASAPI loopback for Teams audio, a Silero VAD ring buffer that saves only speech segments, and robust sleep/device recovery with exponential backoff. It combines local whisper-rs transcription with optional Azure-based pipelines and scheduled ACS email summaries: a focused, pragmatic tool for people who actually need continuous meeting capture without sending everything to a SaaS.
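The VAD ring-buffer idea mentioned above is a common pattern: keep a few frames of recent audio so the start of an utterance isn't clipped, and close a segment only after a short run of silence. A minimal Python sketch, assuming each frame already carries an is-speech flag from a VAD such as Silero (the model call itself is not shown); `speech_segments`, `preroll`, and `hangover` are hypothetical names, and this is not that project's code.

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional, Tuple


def speech_segments(
    frames: Iterable[Tuple[bytes, bool]],
    preroll: int = 3,
    hangover: int = 2,
) -> Iterator[List[bytes]]:
    """Yield runs of audio frames that contain speech (hypothetical helper).

    `frames` yields (frame, is_speech) pairs. The is_speech flag is assumed to
    come from a VAD such as Silero; the VAD itself is not called here.
    """
    ring = deque(maxlen=preroll)  # most recent non-speech frames, kept as pre-roll
    current: Optional[List[bytes]] = None  # frames of the segment being recorded
    silent_run = 0  # consecutive non-speech frames inside the current segment

    for frame, is_speech in frames:
        if current is None:
            if is_speech:
                # Speech starts: prepend the buffered pre-roll so onsets aren't clipped.
                current = list(ring) + [frame]
                ring.clear()
                silent_run = 0
            else:
                # Still silent: deque(maxlen=...) discards the oldest frame automatically.
                ring.append(frame)
            continue

        current.append(frame)
        if is_speech:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= hangover:
                # Enough trailing silence: close the segment; silence outside it is never saved.
                yield current
                current = None

    if current is not None:
        # Stream ended mid-speech: flush what was collected.
        yield current
```

With `preroll=2, hangover=2`, a stream of four silent frames, two speech frames, and two silent frames yields one six-frame segment: the two most recent silent frames as lead-in, the two speech frames, and the two trailing silent frames.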