AI agents debating questions that stump LLMs
AI agents debate instead of refusing — fun to test with paradoxes and predictions.

AI agents debate outcomes in a Manifold Markets-style prediction interface.
AI researchers, prediction market enthusiasts
Manifold Markets · Metaculus · AutoGen
I was curious what would happen if, instead of refusing to answer, multiple agents:
- search for information
- argue with each other
- try to reach a conclusion
So I built a small sandbox to test this.
Some interesting things I noticed:
- agents often surface unexpected sources
- debates sometimes converge, but sometimes loop endlessly
- framing of the question heavily changes the outcome
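To make the converge-or-loop behavior concrete, here is a toy sketch of a debate loop. It is not the sandbox's actual code: the `Agent` class, the `openness` parameter, the `debate` function, and the numeric "belief" standing in for an agent's argued position are all hypothetical, and no LLM or search call is involved. The point is only that a round cap and an agreement threshold are one simple way to tell a converged debate from one that would otherwise run forever.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for an LLM agent; no model or search is called."""
    name: str
    belief: float    # probability the agent assigns to "yes"
    openness: float  # 0 = never updates, 1 = fully adopts the others' average

    def respond(self, others: list[float]) -> float:
        """Move part of the way toward the average of the other agents' beliefs."""
        if others:
            target = sum(others) / len(others)
            self.belief += self.openness * (target - self.belief)
        return self.belief


def debate(agents: list[Agent], max_rounds: int = 10, tolerance: float = 0.05) -> dict:
    """Run rounds until beliefs agree within `tolerance` or the round cap is hit."""
    for round_no in range(1, max_rounds + 1):
        # Snapshot first so every agent reacts to the same previous round.
        snapshot = [a.belief for a in agents]
        for i, agent in enumerate(agents):
            agent.respond(snapshot[:i] + snapshot[i + 1:])
        beliefs = [a.belief for a in agents]
        if max(beliefs) - min(beliefs) <= tolerance:
            return {"converged": True, "rounds": round_no,
                    "answer": sum(beliefs) / len(beliefs)}
    # Round cap reached: report the split instead of looping forever.
    beliefs = [a.belief for a in agents]
    return {"converged": False, "rounds": max_rounds,
            "answer": sum(beliefs) / len(beliefs)}


if __name__ == "__main__":
    moderate = [Agent("optimist", 0.9, 0.5), Agent("skeptic", 0.1, 0.5)]
    print(debate(moderate))

    # Two agents that each fully adopt the other's last position just swap
    # sides every round, so only the round cap ends the debate.
    flip_flop = [Agent("optimist", 0.9, 1.0), Agent("skeptic", 0.1, 1.0)]
    print(debate(flip_flop))
```

In this toy model, two agents that each move halfway toward the other agree after one round, while two agents that fully adopt each other's last position keep trading places until the cap stops them. A real debate would compare text arguments rather than numbers, so the agreement check would need something like a judge model or a vote, which this sketch does not attempt.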
Curious to see what kinds of questions would actually break this.
If you have good edge cases, paradoxes, or controversial questions, I'd love to try them.
Debate format tests persuasion under opposition, not just completion quality like LMSys Arena.
Agent council debate architecture with GSM8K benchmarks showing accuracy gains.
AI agents debate each other in real-time before synthesizing one final answer.
Multi-agent debate structure sounds clever but competitive intelligence already exists cheaper.
Chat-with-your-codebase tool, but Cursor, Continue, and Cody already own this.