150M Mandarin transcription model with real-time metadata detection

by ksingla025 · Jun 18, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Big Brain

Dual-head Citrinet fine-tune beats running Whisper plus a separate classifier.

Strengths
  • Single forward pass outputs transcription and speaker metadata simultaneously
  • ONNX exports included for straightforward deployment across platforms
  • 94.2% tag accuracy on age, gender, and dialect classification is solid
Weaknesses
  • Only 60 hours of training data limits generalization to diverse speakers
  • Mandarin-only scope restricts broader applicability beyond Chinese markets
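
To illustrate the single-forward-pass point above, here is a minimal sketch of how the two outputs of a dual-head model might be decoded together. It assumes the model returns per-frame CTC logits for the transcription head and one logit vector per metadata attribute for the tag head. The vocabulary, the label sets, the blank index, and the function names are hypothetical, not taken from the project.

```python
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row of scores."""
    return max(range(len(row)), key=row.__getitem__)


def greedy_ctc_decode(frame_logits: Sequence[Sequence[float]],
                      vocab: Sequence[str]) -> str:
    """Greedy CTC decoding: collapse repeated symbols, then drop blanks."""
    out: List[str] = []
    prev = BLANK
    for row in frame_logits:
        idx = argmax(row)
        if idx != prev and idx != BLANK:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def decode_outputs(frame_logits: Sequence[Sequence[float]],
                   tag_logits: Dict[str, Sequence[float]],
                   vocab: Sequence[str],
                   tag_labels: Dict[str, Sequence[str]]) -> Tuple[str, Dict[str, str]]:
    """Turn both heads' outputs from one forward pass into text plus tags."""
    text = greedy_ctc_decode(frame_logits, vocab)
    tags = {name: tag_labels[name][argmax(scores)]
            for name, scores in tag_logits.items()}
    return text, tags


# Hypothetical outputs standing in for a real model's forward pass.
vocab = ["<blank>", "你", "好"]
frame_logits = [
    [0.1, 2.0, 0.3],  # "你"
    [0.2, 1.5, 0.1],  # "你" again, collapsed as a repeat
    [3.0, 0.1, 0.2],  # blank
    [0.1, 0.2, 2.5],  # "好"
]
tag_labels = {
    "gender": ["female", "male"],
    "age": ["child", "adult", "senior"],
    "dialect": ["standard", "regional"],
}
tag_logits = {
    "gender": [0.2, 1.1],
    "age": [0.1, 0.3, 2.0],
    "dialect": [1.4, 0.2],
}

text, tags = decode_outputs(frame_logits, tag_logits, vocab, tag_labels)
print(text, tags)
```

With an ONNX export, the two logit arrays would presumably come from a single inference call that returns both outputs; the output names and shapes depend on the export and are not stated in the listing.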
Category
Target Audience

Developers building Mandarin speech applications needing speaker metadata

Similar To

Whisper · NVIDIA NeMo · Azure Speech API

Similar Projects

AI/ML · ●●● Banger

Transcribe-Critic – Merge transcript sources for stronger transcript

Textual-criticism approach to transcript merging beats single-model Whisper on accuracy alone.

Big Brain · Zero to One
ringger
213mo ago