Evaluating Local LLMs as language translators for my app

by 3stacks·Jun 19, 2026·4 points·2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

A local 18 GB Gemma model ties frontier cloud models on Afrikaans translation.

Strengths

•Public harness and test sets enable full reproducibility of all results
•Dual metrics (COMET for meaning, chrF++ for surface) catch different failure modes
•Practical finding saves money: local models match cloud for specific language pairs
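
To illustrate why a surface metric needs a meaning metric alongside it, here is a minimal sketch of a character n-gram F-score in the spirit of chrF. It is a simplification, not the full chrF++ (it ignores word n-grams and whitespace handling details), and it is not taken from the project's harness. The function names `char_ngrams` and `simple_chrf` and the Afrikaans sample sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, dropping spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1]; not the full chrF++."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical sentences: an exact match versus a different wording.
exact = simple_chrf("die kat sit op die mat", "die kat sit op die mat")
reworded = simple_chrf("die kat rus op die mat", "die kat sit op die mat")
```

A reworded translation that preserves the meaning still loses surface overlap, so its score drops below the exact match. That gap is the kind of failure mode a learned meaning metric such as COMET is meant to cover, while a surface score stays useful for catching spelling and morphology errors.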

Weaknesses

•Only 200 sentences per language limits statistical confidence in rankings
•Author acknowledges being unable to verify that the test sources weren't in the models' training data
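
One common way to gauge how much a ranking can be trusted with a small test set is paired bootstrap resampling over per-sentence scores. The sketch below is an assumption about how that could be done, not something the project is known to implement. The function name `paired_bootstrap` and its parameters are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resamples in which system A outscores system B.

    scores_a and scores_b are per-sentence scores for the same sentences,
    in the same order. Values near 1.0 or 0.0 suggest a stable ranking;
    values near 0.5 suggest the test set cannot separate the two systems.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement, shared by both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```

With 200 sentences per language, two systems whose scores are close will often land well away from 0 or 1 here, which is a more honest statement than a bare ranking.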

Post Description

This is my first attempt at running an eval of this nature, so I would love some methodology feedback.

I can't guarantee the sources weren't already in the models' training data without getting novel translations from native speakers, but in my experience the top models feel very accurate. Even on somewhat obscure texts from a relatively small language, their translations generally beat Google Translate at conveying idiomatic meaning.