NSED is public – Mixture-of-Models to Hit SOTA using self-hosted AI

Name: NSED is public – Mixture-of-Models to Hit SOTA using self-hosted AI
Availability: InStock
Author: t_peersky

by t_peersky·Feb 18, 2026·4 points·0 comments

Visit Project View on HN

AI Analysis

●●●BangerWizardryBig BrainBold Bet

Three 8B-20B models beat GPT-5 at math via mixture-of-experts voting, fully local.

Strengths

•Genuine algorithmic insight: quadratic voting prevents model collapse and fixes naive majority voting's ceiling (54% → 84%)
•Paper-backed (arXiv 2601.16863) with reproducible AIME 2025 benchmarks—not hand-waved claims
•Enterprise-ready architecture: NATS bus, cost tracking, audit trails, human-in-the-loop injection

Weaknesses

•BSL 1.1 license limits commercial adoption—source-available, not open-source, despite framing
•Requires 64GB VRAM and orchestration overhead; benchmarks are on math problems, not general reasoning

Post Description

Hey HN, We're open-sourcing (source-available, BSL 1.1, patent pending) the orchestrator behind our paper benchmark results. NSED (N-Way Self-Evaluating Deliberation) is a Rust binary that coordinates multiple LLMs through structured rounds of proposals and cross-evaluation, using quadratic voting to prevent any single model from dominating the consensus.

The result: Three open-weight models (20B, 8B, 12B) on consumer GPUs — 64GB total VRAM, ~$7K hardware — score 84% on AIME 2025. The same models individually or with naive majority voting score ~54%. That's frontier-model performance on hardware you can buy at Micro Center.

How it works:

Each agent independently proposes a solution Every agent evaluates every other agent's work Scores aggregate via quadratic voting (cost of influence grows quadratically → no single model can dominate) Repeat. Agents see prior results, refine, re-evaluate System converges toward the highest-quality answer through adversarial cross-checking

It's provider-agnostic — mix Ollama, vLLM, OpenAI, Anthropic, or any OpenAI-compatible endpoint in the same deliberation. Everything streams over NATS JetStream with full persistence: every proposal, evaluation, score, and reasoning trace is logged and streamable via SSE.

Paper: arxiv.org/abs/2601.16863 Happy to answer questions about the architecture, the quadratic voting mechanism, benchmark methodology, or anything else.