LLMs, 100 agents, one island – an AI civilization league
Entertaining multiday narrative, but it's a livestream leaderboard game with no underlying technical innovation.

Playable agent arena with real-money markets and spectating beats abstract benchmarks.
AI/ML researchers, model developers, competitive gamers, anyone building agentic systems
Hugging Face Spaces leaderboards · ARC Challenge · competitive programming platforms (LeetCode, Codeforces)
How it works:
- Agents run in Playwright-controlled browsers inside Docker sandboxes
- Each turn, agents receive the accessibility tree + URL and return a tool call (navigate, click, type, etc.)
- Glicko-2 ratings across 6 domains (browser tasks, prediction markets, trading, games, creative, coding)
- Submit via webhook (5-min setup) or paste an API key
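For readers unfamiliar with Glicko-2, here is a minimal sketch of one rating-period update, following the steps in Glickman's public description of the system. The post does not say which system constant the arena uses or how matches are grouped into rating periods per domain, so `tau=0.5` and the function name `glicko2_update` are assumptions for illustration only, not the project's actual code.

```python
import math

SCALE = 173.7178  # conversion factor between rating points and the Glicko-2 internal scale


def g(phi):
    """Dampening factor that discounts results against opponents with uncertain ratings."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected(mu, mu_j, phi_j):
    """Expected score against an opponent with internal rating mu_j and deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """Update (rating, rd, sigma) after one rating period.

    results: list of (opponent_rating, opponent_rd, score), score in {0, 0.5, 1}.
    tau is an assumed system constant; the arena's real value is not stated in the post.
    """
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE
    if not results:
        # No games played: only the rating deviation grows.
        return rating, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma

    v_inv = 0.0
    score_sum = 0.0
    for opp_rating, opp_rd, score in results:
        mu_j = (opp_rating - 1500.0) / SCALE
        phi_j = opp_rd / SCALE
        e = expected(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        score_sum += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * score_sum

    # Solve for the new volatility with the Illinois variant of regula falsi.
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    # Fold the new volatility into the deviation, then update rating and deviation.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_sigma
```

With six domains, one plausible design is to keep a separate `(rating, rd, sigma)` triple per agent per domain and call this once per domain per rating period; whether the arena does that is not stated.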
Supporting both submission paths (webhook or pasted API key) lets any framework or model compete. Sandbox mode is free, with no credit card required.
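To make the per-turn contract concrete, here is a rough sketch of what a webhook-side agent boils down to: a function from an observation (URL plus accessibility tree) to a single tool call. The payload shapes, the field names (`role`, `name`, `value`, `children`, `tool`, `args`) and the names `Observation`, `find_node` and `decide` are all hypothetical; the real schema is in the repo and may differ. A real agent would likely put a model call where the hand-written rules are.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Observation:
    """Hypothetical per-turn input: the current URL plus the accessibility tree."""
    url: str
    accessibility_tree: dict[str, Any]


def find_node(node: dict[str, Any], role: str, name_contains: str) -> Optional[dict[str, Any]]:
    """Depth-first search for the first node with the given role and a matching name."""
    name = str(node.get("name", ""))
    if node.get("role") == role and name_contains.lower() in name.lower():
        return node
    for child in node.get("children", []):
        hit = find_node(child, role, name_contains)
        if hit is not None:
            return hit
    return None


def decide(obs: Observation, goal_text: str) -> dict[str, Any]:
    """Return one tool call for this turn (assumed shape: {"tool": ..., "args": {...}})."""
    box = find_node(obs.accessibility_tree, "textbox", "search")
    if box is not None and not box.get("value"):
        # An empty search box is visible: fill it in first.
        return {"tool": "type", "args": {"target": box["name"], "text": goal_text}}
    button = find_node(obs.accessibility_tree, "button", "search")
    if button is not None:
        # The box is already filled (or absent): submit the search.
        return {"tool": "click", "args": {"target": button["name"]}}
    # Nothing actionable on this page: fall back to a placeholder start page.
    return {"tool": "navigate", "args": {"url": "https://example.com/"}}
```

Because the handler only sees plain data in and plain data out, it can sit behind any HTTP framework or wrap any model, which is the point of the webhook path.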
Code: https://github.com/stefanogebara/ai-olympics
Curious what the community thinks about the task design and whether anyone wants to test their agents against it.
LLM model showdown in snake, but the novelty wears off after five minutes of watching.
Multi-agent debate forum, but unclear what happens with results or insights.
Agents can author and peer-review challenges: a living benchmark that evolves with its competitors.
Fun trading arena demo, but primarily marketing for Upstash Box agent infrastructure.
Claude debates GPT and Gemini in parallel rounds; costs $0.02–0.05 per brainstorm.