Llama 3.2 3B and Keiro Research achieve 85% on SimpleQA

by mannybruv·Mar 7, 2026·6 points·1 comment

AI Analysis

●●● Banger · Big Brain · Ship It

Retrieval-aware inference with a 3B model lands within about 3 points of a 671B system, suggesting context matters more than scale for this class of task.

Strengths
  • Replaces expensive parameter scaling with a focused retrieval loop, and ships real code.
  • Clear benchmark: 85.0% with a 3B-parameter reader, against 85.8% for Sonar Pro and 88.3% for the 671B OpenDeepSearch.
  • Economics shift: $0.005 per query commoditizes agent reasoning for anyone with a laptop.
Weaknesses
  • SimpleQA is a narrow benchmark; generalization to complex reasoning tasks is unclear.
  • Keiro is a closed API dependency, not a pure open-source win.
Target Audience

ML engineers, research teams, anyone building small language model systems

Similar To

Perplexity Sonar · DeepSeek-R1 with search · GPT-4 with Tavily

Post Description

ran this over the weekend. stack was Llama 3.2 3B running locally + Keiro Research API for retrieval.
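A minimal sketch of how a retrieve-then-read loop like this can be wired up. It is not the actual benchmark script (that is in the repo linked below). `retrieve` and `generate` are hypothetical placeholders for whatever calls the retrieval API and the local model, and the prompt wording and `top_k` are assumptions.

```python
from typing import Callable, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Pack the retrieved passages and the question into one prompt for the reader."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],  # hypothetical: question -> passages
    generate: Callable[[str], str],        # hypothetical: prompt -> completion
    top_k: int = 5,                        # assumed cap on passages per question
) -> str:
    """Retrieve context first, then let the small reader model answer from it."""
    passages = retrieve(question)[:top_k]
    return generate(build_prompt(question, passages)).strip()


# Stand-in stubs so the sketch runs without any network or model.
def fake_retrieve(question: str) -> List[str]:
    return ["Paris is the capital of France."]


def fake_generate(prompt: str) -> str:
    return " Paris "


print(answer("What is the capital of France?", fake_retrieve, fake_generate))
```

Passing the two callables in keeps the loop independent of any particular retrieval provider or model runtime, which makes it easy to swap the reader for a smaller or larger one.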

85.0% on 4,326 questions. where that lands:

ROMA (357B): 93.9%
OpenDeepSearch (671B): 88.3%
Sonar Pro: 85.8%
Llama 3.2 3B + Keiro: 85.0%
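For a sense of how much sampling noise sits on an 85.0% score over 4,326 questions, here is a rough sketch using the normal approximation to a binomial proportion. It assumes the questions are independent and graded as simple right/wrong, which may not match the grading in the benchmark script.

```python
import math


def accuracy_margin(acc_pct: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval, in percentage points."""
    p = acc_pct / 100.0
    return z * math.sqrt(p * (1.0 - p) / n) * 100.0


# 85.0% on 4,326 questions, z = 1.96 for an approximate 95% interval
margin = accuracy_margin(85.0, 4326)
print(f"+/- {margin:.2f} points")
```

Under these assumptions the margin comes out at roughly one percentage point, so a sub-point difference between two systems is about the size of the noise on a single run.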

the systems ahead of us are running models 100-200x larger. that's why they're ahead. not better retrieval, not better prompting — just way more parameters.

the interesting part is how small the gap is despite that. 3.3 points behind a 671B system. 0.8 behind Sonar Pro. at some point you have to ask what you're actually buying with all that compute for this class of task.
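The gaps and size ratios can be checked directly from the numbers in the post. A small sketch, taking the reader as a nominal 3B parameters (an assumption; the exact count is not given) and leaving Sonar Pro's size as unknown since the post does not state it:

```python
READER_PARAMS_B = 3.0  # Llama 3.2 3B, nominal size in billions (assumption)
OURS = 85.0            # Llama 3.2 3B + Keiro accuracy, in percent

# (system, accuracy %, parameters in billions or None if not stated in the post)
RESULTS = [
    ("ROMA", 93.9, 357.0),
    ("OpenDeepSearch", 88.3, 671.0),
    ("Sonar Pro", 85.8, None),
]


def compare(results, ours, reader_b):
    """Return (system, accuracy gap in points, size ratio vs the reader) per system."""
    rows = []
    for name, acc, params_b in results:
        gap = round(acc - ours, 1)
        ratio = round(params_b / reader_b) if params_b is not None else None
        rows.append((name, gap, ratio))
    return rows


for name, gap, ratio in compare(RESULTS, OURS, READER_PARAMS_B):
    size = f"{ratio}x larger" if ratio is not None else "size not stated"
    print(f"{name}: +{gap} points, {size}")
```

With these inputs the size ratios work out to about 119x and 224x, in line with the rough 100-200x figure, for gaps of 8.9 and 3.3 points.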

want to know how small the reader model can go before it starts mattering. in this setup it clearly wasn't the limiting factor. also curious whether smaller models with web access will perform as well as (if not better than) larger models for a lot of non-coding tasks.

Full benchmark script + results --> https://github.com/h-a-r-s-h-s-r-a-h/benchmark

Keiro research -- https://www.keirolabs.cloud/docs/api-reference/research
