Multi-agent autoresearch for ANE inference beats Apple's CoreML by 6×
Cross-chip agent knowledge sharing beats CoreML by 6× on Apple Silicon.

Retrieval-aware inference with a 3B model lands within 3.3 points of a 671B model, suggesting context matters as much as scale.
ML engineers, research teams, anyone building small language model systems
Perplexity Sonar · DeepSeek-R1 with search · GPT-4 with Tavily
85.0% on 4,326 questions. Here's where that lands:
ROMA (357B): 93.9%
OpenDeepSearch (671B): 88.3%
Sonar Pro: 85.8%
Llama 3.2 3B + Keiro: 85.0%
The top two systems run models roughly 120-220x larger (357B and 671B parameters vs 3B). That's why they're ahead: not better retrieval, not better prompting, just way more parameters.
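As a rough check on the size comparison, here is a small Python sketch that divides each larger model's parameter count by the 3B reader's. It uses the nominal total parameter counts from the leaderboard and leaves out Sonar Pro, whose size is not public. The names `PARAMS_B`, `READER_B`, and `size_ratio` are illustrative only, not taken from the benchmark repo.

```python
# Nominal total parameter counts (billions) from the leaderboard.
# Sonar Pro is omitted because its parameter count is not public.
PARAMS_B = {
    "ROMA": 357,
    "OpenDeepSearch": 671,
}

# Hypothetical constant: the reader model's nominal size (Llama 3.2 3B).
READER_B = 3


def size_ratio(params_b: float, reader_b: float = READER_B) -> float:
    """Return how many times larger a model is than the reader model."""
    return params_b / reader_b


if __name__ == "__main__":
    for name, params in PARAMS_B.items():
        print(f"{name}: {size_ratio(params):.0f}x the reader's parameters")
```

With a nominal 3B reader this prints about 119x for ROMA and 224x for OpenDeepSearch; the figures shift slightly if you plug in the reader's exact parameter count instead of the rounded "3B".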
The interesting part is how small the gap is despite that: 3.3 points behind a 671B model and 0.8 points behind Sonar Pro. At some point you have to ask what all that extra compute is actually buying you for this class of task.
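To see what those gaps mean in raw answers, the sketch below converts each leaderboard difference into percentage points and an approximate number of questions out of 4,326. The counts are approximate because the reported scores are rounded to one decimal. `SCORES`, `gap_points`, and `gap_questions` are hypothetical names used for illustration.

```python
TOTAL_QUESTIONS = 4326

# Reported accuracy (%) from the leaderboard.
SCORES = {
    "ROMA (357B)": 93.9,
    "OpenDeepSearch (671B)": 88.3,
    "Sonar Pro": 85.8,
    "Llama 3.2 3B + Keiro": 85.0,
}
OURS = "Llama 3.2 3B + Keiro"


def gap_points(other: str, ours: str = OURS) -> float:
    """Accuracy gap in percentage points, rounded to one decimal
    to absorb floating-point noise from the subtraction."""
    return round(SCORES[other] - SCORES[ours], 1)


def gap_questions(points: float, total: int = TOTAL_QUESTIONS) -> int:
    """Approximate number of questions a percentage-point gap represents."""
    return round(points / 100 * total)


if __name__ == "__main__":
    for name in SCORES:
        if name == OURS:
            continue
        pts = gap_points(name)
        print(f"{name}: +{pts} points, about {gap_questions(pts)} more questions")
```

For example, the 3.3-point gap to OpenDeepSearch works out to roughly 143 questions, and the 0.8-point gap to Sonar Pro to roughly 35.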
Next, I want to find out how small the reader model can get before it starts to matter; in this setup it clearly wasn't the limiting factor. I also want to test whether smaller models with web access perform as well as (if not better than) larger models on many non-coding tasks.
Full benchmark script + results --> https://github.com/h-a-r-s-h-s-r-a-h/benchmark
Keiro research -- https://www.keirolabs.cloud/docs/api-reference/research
Genetic algorithm evolves x86 kernels; runs 80B MoE on single GPU with CPU offload.
Encrypted semantic search via modular arithmetic—98% quality, 8x faster than homomorphic encryption.
3.9s cold starts vs 45s+ for quantized models—real infra pain solved tangibly.
5.6x realtime on CPU with voice cloning beats most local TTS options.
Git branch for LLM agents — 400x faster forking with preserved KV cache.