17MB pronunciation scorer beats human experts at phoneme level

Author: fabiosuizu

by fabiosuizu · Feb 20, 2026 · 4 points · 2 comments


AI Analysis

●●● Banger · Wizardry · Big Brain · Solve My Problem

Exceeds human inter-annotator agreement on phoneme scoring while being 70x smaller than wav2vec2-based SOTA models.

Strengths

• Quantized Citrinet-256 backbone reaches a phone-level PCC of 0.580, above human inter-annotator agreement (0.555), in 17MB: genuine constraint engineering.
• Sub-300ms CPU latency, plus availability as a REST API, an MCP server, and an Azure Marketplace listing, makes it production-viable for real-time feedback loops in language learning.
• The combination of CTC forced alignment, GOP scoring, and ensemble heads is technically non-obvious and is benchmarked on speechocean762, a standard academic dataset.

Weaknesses

• A 10-15% raw accuracy gap versus SOTA suggests limited upside for high-precision use cases beyond language learning apps.
• The demo UI is functional but barebones, with no clear onboarding to help developers understand when or why to use this instead of larger models.

Post Description

I built an English pronunciation assessment engine that fits in 17MB and runs in under 300ms on CPU.

Architecture: CTC forced alignment + GOP scoring + ensemble heads (MLP + XGBoost). No wav2vec2 or large self-supervised models — the entire pipeline uses a quantized NeMo Citrinet-256 as the acoustic backbone.
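To make the GOP (Goodness of Pronunciation) step concrete, here is a minimal sketch of one common posterior-based variant: for each aligned phone, average the log ratio between the expected phone's posterior and the best-scoring phone's posterior over the segment's frames. This is an illustration under assumptions, not the project's actual code: the function name `gop_scores`, the dict-per-frame input format, the `floor` parameter, and the toy numbers are all hypothetical, and the real pipeline may compute GOP differently before passing features to the ensemble heads.

```python
import math


def gop_scores(posteriors, alignment, floor=1e-10):
    """Simplified posterior-based GOP per aligned phone (illustrative only).

    posteriors: list of per-frame dicts mapping phone label -> probability.
    alignment:  list of (phone, start_frame, end_frame), end exclusive,
                such as a forced aligner might produce.
    Returns a list of (phone, gop). A score of 0.0 means the expected phone
    was the top-scoring phone in every frame; more negative is a worse match.
    """
    results = []
    for phone, start, end in alignment:
        frames = posteriors[start:end]
        if not frames:
            raise ValueError(f"empty segment for phone {phone!r}")
        total = 0.0
        for frame in frames:
            # Floor probabilities so log() never sees zero.
            target = max(frame.get(phone, 0.0), floor)
            best = max(max(frame.values()), floor)
            total += math.log(target) - math.log(best)
        results.append((phone, total / len(frames)))
    return results


# Toy posteriors over two phones for four frames (made-up numbers).
posteriors = [
    {"k": 0.9, "ae": 0.1},
    {"k": 0.8, "ae": 0.2},
    {"k": 0.3, "ae": 0.7},
    {"k": 0.4, "ae": 0.6},
]
good = gop_scores(posteriors, [("k", 0, 2), ("ae", 2, 4)])
bad = gop_scores(posteriors, [("ae", 0, 2), ("k", 2, 4)])
print(good)  # both phones match the acoustics
print(bad)   # swapped phones score clearly below zero
```

In a setup like the one described, per-phone values of this kind would plausibly serve as input features for the MLP and XGBoost heads, which then map them onto the human rating scale.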

Benchmarked on speechocean762 (standard academic benchmark, 2500 utterances):

- Phone accuracy (PCC): 0.580 (exceeds human inter-annotator agreement of 0.555)
- Sentence accuracy: 0.710 (exceeds human agreement of 0.675)
- Model is 70x smaller than wav2vec2-based SOTA
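The PCC figures above are Pearson correlation coefficients between model scores and human ratings. As a rough sketch of how such a number is computed, the snippet below implements Pearson correlation in plain Python. The function name `pearson` and the score lists are hypothetical, made-up values for illustration; they are not taken from speechocean762 or from the project's evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-phone scores: model predictions vs. one human rater.
model_scores = [1.8, 1.2, 0.4, 1.9, 0.9]
human_scores = [2.0, 1.0, 0.0, 2.0, 2.0]
print(f"PCC = {pearson(model_scores, human_scores):.3f}")
```

Human inter-annotator agreement can be measured the same way, by correlating one rater's scores against another's, which is what makes the model-versus-human comparison meaningful.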

Trade-off: we're ~10-15% below SOTA on raw accuracy. But for real-time feedback in language learning apps, the latency/size trade-off is worth it.

Available as REST API, MCP server (for AI agents), and on Azure Marketplace.

Demo: https://huggingface.co/spaces/fabiosuizu/pronunciation-asses...

Interested in feedback on the scoring approach and use cases people would find valuable.
