Harden – 5 AI models audit your code, then debate each other's findings

by greatrat000·Mar 8, 2026·1 point·0 comments

AI Analysis

MidBig Brain

Multi-model debate orchestration is clever, but 'audit with AI' is crowded territory.

Strengths
  • Ensemble approach with cross-examination is reported to raise accuracy from ~72% (best single model) to ~97% and to cut false positives by ~60%
  • Supports five different use cases (contracts, medical, claims, resumes, copy) beyond Solidity audit
  • Transparent pricing tiers with crypto payment shows serious Web3 positioning
Weaknesses
  • Smart contract auditing already has professional firms and established tools (ConsenSys, Trail of Bits)
  • No evidence of real security findings vs false positives on live contracts; test data unclear
Category
Target Audience

Smart contract developers and protocol teams

Similar To

ConsenSys Diligence · Trail of Bits · SlowMist

Post Description

I built harden because I kept copy-pasting code between ChatGPT, Claude, and Gemini trying to cross-check their reviews. Each one found things the others missed, but synthesizing their outputs manually was painful.

harden runs 5 frontier models (Claude, GPT-4o, Gemini, Mistral, DeepSeek) in parallel on the same input. They analyze independently, then cross-examine each other's findings. A coordinator synthesizes the debate into consensus findings and produces a fixed version.
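
A minimal sketch of the independent first pass, assuming each model sits behind an async call. The `audit_with` function and the shape of its return value are hypothetical stand-ins, not harden's actual API; a real version would call each provider's client here.

```python
import asyncio

MODELS = ["claude", "gpt-4o", "gemini", "mistral", "deepseek"]


async def audit_with(model: str, source: str) -> dict:
    """Hypothetical stand-in for a provider call; returns a placeholder report."""
    await asyncio.sleep(0)  # yield control the way a network request would
    return {"model": model, "findings": [f"placeholder finding from {model}"]}


async def round_one(source: str) -> list[dict]:
    # Every model receives only the raw input, never another model's output.
    tasks = [audit_with(model, source) for model in MODELS]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop failed calls so a single provider outage does not sink the round.
    return [r for r in results if not isinstance(r, Exception)]


reports = asyncio.run(round_one("contract Vault { }"))
```

`asyncio.gather` returns results in the order the tasks were passed in, so `reports` lines up with `MODELS` when no call fails.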

The key insight: no single model finds more than ~72% of issues. The union of all five hits ~94%. After cross-examination (where models must defend findings against skeptical peers), accuracy rises to ~97% and false positives drop ~60%.
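
To illustrate why the union beats any single model, here is a toy calculation on made-up data. The issue IDs and per-model hit sets below are invented for the example and are not harden's measurements; they only show the arithmetic behind "each one found things the others missed".

```python
# Hypothetical data: 10 known issues and which of them each model flagged.
known_issues = set(range(1, 11))
found_by = {
    "model_a": {1, 2, 3, 4, 5, 6, 7},
    "model_b": {1, 2, 3, 5, 8, 9},
    "model_c": {2, 4, 6, 7, 9},
}


def recall(found: set, known: set) -> float:
    """Fraction of known issues that appear in `found`."""
    return len(found & known) / len(known)


per_model = {name: recall(found, known_issues) for name, found in found_by.items()}
union_recall = recall(set().union(*found_by.values()), known_issues)

print(per_model)     # best single model covers 7 of 10 issues
print(union_recall)  # the union covers 9 of 10 issues
```

The union can never score below the best single model, and it gains most when the models miss different issues. It also accumulates every model's false positives, which is the problem the cross-examination step is meant to address.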

How it works:
- Round 1: All 5 models audit independently (no groupthink)
- Debate: Each model reviews others' findings, provides evidence for/against
- Consolidation: Only findings that survive cross-examination make the report
- Fix: Coordinator produces a revised version addressing consensus issues
- Round 2+: Same pipeline runs on the fixed version, catching fix-introduced bugs
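
The consolidation step could be sketched as a filter over debated findings. The post does not say what "survive cross-examination" means in practice, so the strict-majority rule, the `Finding` type, and the sample findings below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    title: str
    raised_by: str
    # Peer verdicts from the debate: True = upheld, False = rejected.
    peer_verdicts: dict = field(default_factory=dict)


def survives(finding: Finding, min_support: float = 0.5) -> bool:
    """ASSUMED rule: keep a finding only if a strict majority of peers uphold it."""
    if not finding.peer_verdicts:
        return False  # nobody reviewed it, so it cannot count as consensus
    upheld = sum(finding.peer_verdicts.values())
    return upheld / len(finding.peer_verdicts) > min_support


def consolidate(findings: list) -> list:
    return [f for f in findings if survives(f)]


debated = [
    Finding("Reentrancy in withdraw()", "claude",
            {"gpt-4o": True, "gemini": True, "mistral": True, "deepseek": False}),
    Finding("Unused variable is critical", "mistral",
            {"claude": False, "gpt-4o": False, "gemini": True, "deepseek": False}),
]
report = consolidate(debated)
```

Here the reentrancy finding is upheld by 3 of 4 peers and stays in the report, while the second is upheld by 1 of 4 and is dropped. A weighted or evidence-based rule would slot into `survives` without changing the rest.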

Started with smart contract audits but it generalizes — legal docs, resumes, fact-checking, financial analysis all benefit from multi-model consensus.

Free tier available.

Built with React, Node, SSE streaming for real-time progress. The debate transcripts are the most interesting part — watching GPT-4o argue with Claude about whether a reentrancy vector is exploitable is genuinely useful.
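
For readers unfamiliar with SSE, the wire format is plain text: each message is a set of `field: value` lines ended by a blank line. The sketch below formats one progress event. It is written in Python for illustration rather than the Node stack the post mentions, and the `debate` event name and payload fields are hypothetical, not harden's real schema.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message with a named event and JSON data."""
    data = json.dumps(payload)  # compact JSON stays on one line, so one data field
    return f"event: {event}\ndata: {data}\n\n"


frame = sse_event("debate", {"model": "gpt-4o", "stage": "rebuttal"})
print(frame, end="")
```

A server would write frames like this to a response with `Content-Type: text/event-stream`, and a browser client would receive them through `EventSource`.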

Blog with more details on the multi-model approach: https://harden.center/blog
