Harden – 5 AI models audit your code, then debate each other's findings

Author: greatrat000

by greatrat000 · Mar 8, 2026 · 1 point · 0 comments


AI Analysis

● Mid · Big Brain

Multi-model debate orchestration is clever, but 'audit with AI' is crowded territory.

Strengths

•Ensemble approach with cross-examination reportedly reduces hallucinations (the author cites ~72% issue coverage for the best single model versus ~97% accuracy after cross-examination)
•Supports five different use cases (contracts, medical, claims, resumes, copy) beyond Solidity audits
•Transparent pricing tiers with crypto payment show serious Web3 positioning

Weaknesses

•Smart contract auditing already has professional firms and established tools (ConsenSys, Trail of Bits)
•No evidence of real security findings versus false positives on live contracts; the test data behind the accuracy figures is unclear

Post Description

I built harden because I kept copy-pasting code between ChatGPT, Claude, and Gemini trying to cross-check their reviews. Each one found things the others missed, but synthesizing their outputs manually was painful.

harden runs 5 frontier models (Claude, GPT-4o, Gemini, Mistral, DeepSeek) in parallel on the same input. They analyze independently, then cross-examine each other's findings. A coordinator synthesizes the debate into consensus findings and produces a fixed version.
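The independent first pass is a fan-out: the same input goes to every model concurrently and each model's findings are kept separate. A minimal sketch of that pattern in Python (harden itself is built on Node, and its internals are not described here). It assumes each model is wrapped in a hypothetical async callable that returns a list of finding strings. Recording a failed provider as "no findings" instead of aborting the round is also an assumption, not something the post specifies.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical interface: each model wrapper takes the input and returns its findings.
AuditFn = Callable[[str], Awaitable[List[str]]]


async def round_one(models: Dict[str, AuditFn], code: str) -> Dict[str, List[str]]:
    """Run every model on the same input concurrently and collect findings per model.

    No model sees another's output in this round, so the findings stay independent.
    A model that raises (timeout, rate limit) is recorded with no findings
    instead of failing the whole round.
    """
    names = list(models)
    results = await asyncio.gather(
        *(models[name](code) for name in names), return_exceptions=True
    )
    return {
        name: [] if isinstance(result, BaseException) else list(result)
        for name, result in zip(names, results)
    }


async def demo() -> Dict[str, List[str]]:
    # Stand-in models for illustration; real wrappers would call provider APIs.
    async def model_a(code: str) -> List[str]:
        return ["reentrancy in withdraw()"]

    async def model_b(code: str) -> List[str]:
        raise TimeoutError("provider timed out")

    return await round_one({"model_a": model_a, "model_b": model_b}, "contract Vault {}")
```

Running `asyncio.run(demo())` yields findings for `model_a` and an empty list for `model_b`, so one slow or failing provider does not block the other four.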

The key insight: no single model finds more than ~72% of issues. The union of all five hits ~94%. After cross-examination (where models must defend findings against skeptical peers), accuracy rises to ~97% and false positives drop ~60%.
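The ~72% and ~94% figures are recall numbers: the share of known issues that a model (or the group) reports. A minimal sketch of that arithmetic, using made-up toy issue IDs rather than harden's benchmark data; the function names are hypothetical.

```python
from typing import Dict, Set


def coverage(found: Set[str], ground_truth: Set[str]) -> float:
    """Fraction of known issues that appear in `found` (recall)."""
    return len(found & ground_truth) / len(ground_truth)


def union_coverage(per_model: Dict[str, Set[str]], ground_truth: Set[str]) -> float:
    """Recall of the ensemble: an issue counts if at least one model reported it."""
    found_by_any: Set[str] = set().union(*per_model.values())
    return coverage(found_by_any, ground_truth)


# Toy data: 10 known issues, two models with overlapping but different blind spots.
truth = {f"I{i}" for i in range(10)}
per_model = {
    "model_a": {f"I{i}" for i in range(0, 7)},  # finds I0..I6
    "model_b": {f"I{i}" for i in range(3, 9)},  # finds I3..I8
}
```

Here `model_a` alone covers 0.7 and `model_b` 0.6, while their union covers 0.9 (I0..I8). The union also accumulates every model's false positives, which is what the cross-examination step is meant to filter out.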

How it works:
- Round 1: All 5 models audit independently (no groupthink).
- Debate: Each model reviews the others' findings and provides evidence for or against them.
- Consolidation: Only findings that survive cross-examination make the report.
- Fix: The coordinator produces a revised version addressing the consensus issues.
- Round 2+: The same pipeline runs on the fixed version, catching fix-introduced bugs.
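A minimal sketch of the consolidation and re-audit steps. The post does not say what "surviving" cross-examination means, so this assumes a simple rule: a finding stays if more than half of the peers who reviewed it (excluding its author) judged it valid. `Finding`, `survives`, `consolidate`, `run_rounds`, and the callables passed to `run_rounds` are all hypothetical names, not harden's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Finding:
    author: str  # model that raised the finding in round 1
    claim: str
    votes: Dict[str, bool] = field(default_factory=dict)  # peer model -> "holds up?"


def survives(finding: Finding, min_support: float = 0.5) -> bool:
    """Keep a finding only if more than `min_support` of its peer reviewers agree.

    The author's own vote is ignored; the threshold is an illustrative assumption.
    """
    peer_votes = [v for peer, v in finding.votes.items() if peer != finding.author]
    if not peer_votes:
        return False
    return sum(peer_votes) / len(peer_votes) > min_support


def consolidate(findings: List[Finding]) -> List[str]:
    """Return claims that survive cross-examination, deduplicated, in first-seen order."""
    report: List[str] = []
    for f in findings:
        if survives(f) and f.claim not in report:
            report.append(f.claim)
    return report


def run_rounds(
    code: str,
    audit_and_debate: Callable[[str], List[Finding]],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Re-audit each fixed version until no consensus findings remain or rounds run out.

    Returns the final code and the number of rounds actually run.
    """
    for round_no in range(1, max_rounds + 1):
        report = consolidate(audit_and_debate(code))
        if not report:
            return code, round_no
        code = fix(code, report)
    return code, max_rounds
```

Under this rule, a finding with two peers for and one against survives, while a 1-1 split does not. The `max_rounds` cap keeps a fix-then-break cycle from looping forever.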

Started with smart contract audits but it generalizes — legal docs, resumes, fact-checking, financial analysis all benefit from multi-model consensus.

Free tier available.

Built with React, Node, SSE streaming for real-time progress. The debate transcripts are the most interesting part — watching GPT-4o argue with Claude about whether a reentrancy vector is exploitable is genuinely useful.
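Server-Sent Events is a plain-text protocol: each message is a few `field: value` lines ending in a blank line, and multi-line payloads are split across several `data:` lines that the browser's `EventSource` rejoins with newlines. A minimal sketch of that wire format in Python (harden's actual backend is Node, and event names such as `debate` are hypothetical):

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message.

    Each line of `data` becomes its own `data:` field so the client can rejoin
    them; the trailing blank line tells the client to dispatch the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"
```

Streaming each debate turn as its own named event lets the frontend append transcript lines as they arrive instead of waiting for the whole round to finish.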

Blog with more details on the multi-model approach: https://harden.center/blog