Overlapping Speaker Transcription Model
Transcribes overlapping speakers in a single pass without needing separate diarization steps.

Reliable transcription with diarization, but Otter.ai, Rev, and AssemblyAI already own this.
Content creators, researchers, podcasters, and businesses needing video transcription without manual effort.
Otter.ai · Rev · AssemblyAI
Over the last few weeks I focused mostly on reliability. I changed the pipeline to extract audio first and then run transcription, which made long-file handling more stable. I also spent time improving failure handling so users see a clear message when a job fails, instead of raw model errors.
Pricing is intentionally simple right now: free users get 3 transcriptions per day, and there is one Unlimited plan at $20/month or $120/year.
I’d really appreciate feedback on the overall UX, whether the failure/retry behavior feels right, and whether the pricing is understandable for first-time users.
Transcribes overlapping speakers in a single pass without needing separate diarization steps.
It pairs WhisperX-grade transcription (speaker diarization and word-level timestamps) with optional multi-LLM analysis — summaries, Q&A, sentiment, topics and even fact-checking — plus YouTube import and standard export formats. Being vendor-agnostic and offering fact-checking is a smart differentiator, but the space is crowded (Descript/Otter/etc.); clearer accuracy numbers, pricing, or unique workflow hooks would make this stand out.
Echo deduplication and dual-channel audio for local meeting transcripts.
CPU-only ONNX transcription when Whisper.cpp already handles this well.
Local dictation with speaker diarization beats Whisper.cpp, but crowded against built-in macOS.
Native-script voice input for 11+ Indian languages, code-mixed, works everywhere.