We built an AI judge for a live hackathon, then red-teamed it

by theoradical · Mar 19, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Ship It · Bold Bet

Multi-model ensemble scoring with Python-side arithmetic prevents LLM manipulation during live demos.

Strengths
  • Deployed at a real hackathon: 25 demos judged, 1,451 tests run live
  • Multi-model ensemble with outlier detection prevents single-model bias in scoring
  • 4-layer injection defense red-teamed by 3 AI agents before production use
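
The ensemble scoring and Python-side arithmetic mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the median-based outlier rule, the `max_deviation` threshold, and the criterion names are all assumptions made for illustration. The idea it shows is that each model only reports raw per-criterion numbers, while outlier filtering, averaging, and weighting happen in ordinary Python, so text injected into a demo cannot talk a model into "computing" a different total.

```python
from statistics import median_low


def aggregate_scores(model_scores, max_deviation=2.0):
    """Combine several models' scores for one criterion in plain Python.

    model_scores maps a (hypothetical) model name to its numeric score.
    Scores further than max_deviation from the low median are treated as
    outliers and dropped before averaging. median_low always returns one
    of the submitted scores, so at least one score is always kept.
    """
    if not model_scores:
        raise ValueError("need at least one model score")
    center = median_low(model_scores.values())
    kept = {
        name: score
        for name, score in model_scores.items()
        if abs(score - center) <= max_deviation
    }
    outliers = sorted(set(model_scores) - set(kept))
    mean = sum(kept.values()) / len(kept)
    return round(mean, 2), outliers


def total_score(per_criterion, weights):
    """Weighted total across criteria, computed in Python, not by an LLM."""
    total = 0.0
    for criterion, weight in weights.items():
        score, _ = aggregate_scores(per_criterion[criterion])
        total += weight * score
    return round(total, 2)


# Hypothetical example: one model disagrees sharply on "innovation".
scores = {
    "innovation": {"model_a": 7, "model_b": 8, "model_c": 2},
    "execution": {"model_a": 6, "model_b": 6, "model_c": 6},
}
weights = {"innovation": 0.5, "execution": 0.5}

innovation, flagged = aggregate_scores(scores["innovation"])
final = total_score(scores, weights)
```

Here `model_c` is flagged as an outlier on "innovation", and the remaining two scores are averaged. A median-based cutoff is only one possible rule; a real deployment might use a different statistic or require a minimum number of agreeing models.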
Weaknesses
  • Extremely niche audience: only hackathon organizers would ever need this tool
  • No clear path to generalization beyond competition judging scenarios
Category
Target Audience

Hackathon organizers, coding competition hosts, event coordinators

Similar Projects