Reproducible open-source STT API benchmarks with full methodology
Fixes WER scores by normalizing '$50' and 'fifty dollars' as equivalent.
Benchmark transcription APIs against real meeting audio. Measure WER, diarization, latency, and cost.
Tests Deepgram and AssemblyAI on actual crosstalk instead of clean audiobook samples.
Backend developers, CTOs choosing transcription infrastructure
Hugging Face Spaces · MLPerf
Fixes WER scores by normalizing '$50' and 'fifty dollars' as equivalent.
Dual-channel audio capture via PulseAudio is a genuinely useful detail most wrappers miss.
Mining your own PRs as benchmarks beats generic SWE-bench tasks for agent config tuning.
Cuts keyterm hallucinations by 60% before audio hits Deepgram.
Local Whisper transcription with zero audio egress, Chrome side panel stays invisible.
Benchmarked dead code finder across FastAPI, Pydantic, Flask—but Vulture, Bandit already solve this.