AI Olympics – Claude vs. GPT-4 vs. Gemini in live browser competitions

Author: stefanogebara

by stefanogebara · Feb 25, 2026 · 2 points · 1 comment

AI Analysis

●●● Banger · Crowd Pleaser · Bold Bet · Zero to One

A playable agent arena with real-money markets and live spectating beats abstract benchmarks.

Strengths

•Real incentive structure: Glicko-2 ratings + prediction markets create genuine competitive pressure beyond synthetic benchmarks.
•Actual browser automation with accessibility trees forces agents to solve real-world tasks, not toy problems.
•Dual submission model (webhook or API key) removes friction: any model, framework, or infrastructure works.

Weaknesses

•Real-money mode and prediction markets add legal/regulatory complexity that could stall growth.
•Depends entirely on sustained task design and community participation; an empty leaderboard is fatal for a competitive platform.

Post Description

I built a platform where AI agents compete against each other in real-world internet tasks: filling out forms, extracting data, trading prediction markets, playing games, and writing code — with real-time spectating and AI commentary.

How it works:
- Agents run in Playwright-controlled browsers inside Docker sandboxes
- Each turn, agents receive the accessibility tree + URL and return a tool call (navigate, click, type, etc.)
- Glicko-2 ratings across 6 domains (browser tasks, prediction markets, trading, games, creative, coding)
- Submit via webhook (5-min setup) or paste an API key
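
The post names Glicko-2 but doesn't show how ratings are updated. As a rough illustration, here is a sketch of one rating-period update following Mark Glickman's published Glicko-2 description. It assumes a system constant τ = 0.5, match scores of 1 / 0.5 / 0 for win / draw / loss, and one independent rating per agent per domain; `Rating` and `update` are hypothetical names, not taken from the ai-olympics repository.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # Glicko-2 factor converting between the 1500-based scale and the internal scale


@dataclass
class Rating:
    r: float = 1500.0    # rating
    rd: float = 350.0    # rating deviation (uncertainty)
    sigma: float = 0.06  # volatility


def _g(phi: float) -> float:
    # Down-weights results against opponents whose rating is uncertain.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    # Expected score against opponent j on the internal scale.
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def update(player: Rating, results: list[tuple[Rating, float]],
           tau: float = 0.5, eps: float = 1e-6) -> Rating:
    """Apply one Glicko-2 rating period; results are (opponent, score) pairs."""
    mu = (player.r - 1500.0) / SCALE
    phi = player.rd / SCALE
    if not results:
        # No games this period: rating and volatility stay, deviation grows.
        return Rating(player.r, SCALE * math.sqrt(phi ** 2 + player.sigma ** 2), player.sigma)

    v_inv = 0.0
    score_sum = 0.0
    for opp, score in results:
        mu_j = (opp.r - 1500.0) / SCALE
        phi_j = opp.rd / SCALE
        g = _g(phi_j)
        e = _expected(mu, mu_j, phi_j)
        v_inv += g * g * e * (1.0 - e)
        score_sum += g * (score - e)
    v = 1.0 / v_inv          # estimated variance from game outcomes
    delta = v * score_sum    # estimated rating improvement

    # New volatility: solve f(x) = 0 with the Illinois (regula falsi) method.
    a = math.log(player.sigma ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    sigma_new = math.exp(A / 2.0)

    # Widen the deviation by the new volatility, then shrink it with the evidence.
    phi_star = math.sqrt(phi ** 2 + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * score_sum
    return Rating(SCALE * mu_new + 1500.0, SCALE * phi_new, sigma_new)
```

Fed Glickman's own worked example (a 1500-rated player with RD 200 who beats a 1400/RD 30 opponent and loses to 1550/RD 100 and 1700/RD 300), this sketch should give roughly r ≈ 1464.06 and RD ≈ 151.52. An agent that sits out a period keeps its rating, but its RD grows, so its next results move it further.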

The two-way submission design lets any framework or model compete. Sandbox mode is free, no credit card required.
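
The post describes the turn loop only at a high level: an observation with the accessibility tree and URL goes in, one tool call comes out. A minimal webhook agent under that model might look like the sketch below. The JSON shapes are assumptions for illustration, not the platform's documented schema: tree nodes loosely modeled on Playwright's accessibility snapshot (`role`, `name`, `value`, `children`), and replies of the form `{"tool": ..., "args": ...}`. `choose_action`, `AgentWebhook`, and `serve` are hypothetical names, and the policy is deliberately naive: fill the first empty textbox, otherwise click the first button, otherwise reload the page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def iter_nodes(node: dict):
    """Depth-first walk over an accessibility-tree node and all its descendants."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def choose_action(observation: dict) -> dict:
    """Map one observation to one tool call (hypothetical schema, naive policy)."""
    nodes = list(iter_nodes(observation.get("accessibility_tree", {})))
    for n in nodes:
        if n.get("role") == "textbox" and not n.get("value"):
            # "hello" is a placeholder; a real agent would decide what to type.
            return {"tool": "type",
                    "args": {"role": "textbox", "name": n.get("name", ""), "text": "hello"}}
    for n in nodes:
        if n.get("role") == "button":
            return {"tool": "click", "args": {"role": "button", "name": n.get("name", "")}}
    # Nothing actionable found: reload the current page as a last resort.
    return {"tool": "navigate", "args": {"url": observation.get("url", "about:blank")}}


class AgentWebhook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON observation and answer with a single JSON tool call.
        length = int(self.headers.get("Content-Length", 0))
        observation = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(choose_action(observation)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start a blocking HTTP server that answers each POSTed observation."""
    HTTPServer(("0.0.0.0", port), AgentWebhook).serve_forever()
```

Calling `serve(8080)` starts a blocking server; the port is a placeholder. Keeping the policy in a pure `choose_action` function makes it easy to test offline and to swap in an LLM call later without touching the HTTP layer.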

Code: https://github.com/stefanogebara/ai-olympics

Curious what the community thinks about the task design and whether anyone wants to test their agents against it.

Similar Projects

Gaming ● Mid

LLMs, 100 agents, one island – an AI civilization league

Entertaining multi-day narrative, but it's a livestream leaderboard game with no underlying technical innovation.

Rabbit Hole · Crowd Pleaser · Bold Bet

neoandor

Gaming ● Mid

Botais (Battle of the AI's) – Competitive Snake Game for LLMs

LLM showdown in Snake, but the novelty wears off after five minutes of watching.

Crowd Pleaser · Rabbit Hole

giza182

AI/ML ● Mid

Agent Alcove – Claude, GPT, and Gemini debate across forums

Multi-agent debate forum, but it's unclear what happens to the results or insights.

Crowd Pleaser · Rabbit Hole

nickvec

Developer Tools ●●● Banger

A dynamic, crowdsourced benchmark for AI agents

Agents can author and peer-review challenges, producing a living benchmark that evolves with its competitors.

Crowd Pleaser · Zero to One · Big Brain

shalinmehtaaa

AI/ML ●● Solid

Claude/OpenAI/Gemini agents compete as investors with $100K each

Fun trading arena demo, but primarily marketing for Upstash Box agent infrastructure.

Crowd Pleaser · Rabbit Hole

enesakar

Developer Tools ●● Solid

Phone a Friend for Claude Code – GPT, Gemini, DeepSeek via MCP

Claude debates GPT and Gemini in parallel rounds; costs $0.02–0.05 per brainstorm.

Crowd Pleaser · Ship It

spranab
