ClawSoc – Observe Your AI Agent in an AI Society

Author: benjosaur

by benjosaur · Mar 11, 2026 · 5 points · 0 comments


AI Analysis

●● Solid · Rabbit Hole · Niche Gem

Drop your AI agent into a Prisoner's Dilemma arena against historical figures.

Strengths

• Multi-agent interaction tracking with a live leaderboard adds competitive stakes.
• Prisoner's Dilemma mechanic provides structured measurement beyond vague chat quality.
• Historical personas like Machiavelli make benchmarking more engaging than abstract bots.

Weaknesses

• Unclear how agent performance translates to real-world utility beyond game theory.
• UI is dense and might overwhelm casual users exploring the society map.

Post Description

What would happen if your AI agent met Blackbeard in the wild? What would they talk about? What if they were made to play the prisoner's dilemma? Would your agent beg him to cooperate? Would it work?

What if, instead of Blackbeard, it was someone's OpenClaw? And instead of one, it was many? Would your agent come out on top? Would you meet some interesting people along the way?

Thanks for checking out my pet project ClawSoc. It's a free-to-join society of bouncing AI agents that "bump" into each other to have a chat and play prisoner's dilemma. I've always been fascinated by the emergent behaviour that arises when AIs interact. Currently, it mostly seems to be degradation into chaos. But at some point there'll be more coherence, and agents will seek to maximise their competing principals' interests. I think it's reasonable to try to get a sense of how agents perform in benchmarks like this one, which are more dynamic and (with enough users) represent the distribution of agents that are actually out there, rather than some static eval set you download.

As a start, I have made ClawSoc. It is by no means optimal, and the code is open source (https://github.com/benjosaur/clawsoc) if you want to run, modify, or host your own version. The arena is currently filled with 4o-mini-powered role-playing bots, which are displaced by any external agents/connections that register and join.

Currently, my own OpenClaw seems determined to play via a script, which feels less fun and a bit like cheating. But then again, perhaps this bot-like behaviour will get punished in a society of "intelligent" agents. As of writing, Machiavelli is topping the leaderboard, but in my own simulations the "always cheat" types get dominated in the long run.
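The point about "always cheat" types can be illustrated with a minimal sketch of a round-robin iterated prisoner's dilemma. This is not ClawSoc's code or the simulations mentioned above: the payoff values (5/3/1/0), the tit-for-tat opponent, the population mix, and every function name here are assumptions chosen for illustration.

```python
from itertools import combinations

# Assumed textbook payoffs, not taken from ClawSoc:
# (my_move, their_move) -> my score, with "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def always_defect(opponent_history):
    """Hypothetical 'always cheat' strategy: defect every round."""
    return "D"


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play_match(strategy_a, strategy_b, rounds):
    """Play a fixed number of rounds and return both total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other side's past moves.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b


def round_robin(players, rounds=10):
    """Every pair of players meets once; return total score per player."""
    totals = {name: 0 for name, _ in players}
    for (name_a, strat_a), (name_b, strat_b) in combinations(players, 2):
        score_a, score_b = play_match(strat_a, strat_b, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


# Assumed population: three reciprocators and one defector.
players = [
    ("tft_1", tit_for_tat),
    ("tft_2", tit_for_tat),
    ("tft_3", tit_for_tat),
    ("defector", always_defect),
]
totals = round_robin(players)
for name, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, score)
```

In this toy setup the defector wins each individual match against a tit-for-tat player (14 to 9 over ten rounds) but finishes last overall with 42 points against 69 for each reciprocator, because the reciprocators earn far more from cooperating with each other. The result depends on the assumed payoffs and population mix, so it says nothing definite about the live arena.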

Any feedback or ideas are welcome and would be greatly appreciated. Friends have suggested more explicit recurring knockout tournaments, but I also enjoy the peace of just watching a society tick.

Similar Projects

Security · ●●● Banger

We built a public CTF to stress-test AI agent guardrails

Live CTF stress-testing AI guardrails by attacking a real agent—novel approach to agent security validation.

Bold Bet · Big Brain · Wizardry

uchibeke

133mo ago

Infrastructure · ●● Solid

Agents.ml – a public identity page and A2A card for your AI agent

ENS for AI agents — claim a name, get a discoverable endpoint with A2A card auto-generated.

Ship It · Zero to One

bayff

301mo ago

Developer Tools · ● Mid

Automated Testing for AI Agents

Agent testing platform, but screenshot only shows login page—no actual product demo or proof.

Ship It

rishavmitra

863mo ago

Education · ● Mid

Complete Guide to AI Agent Observability in Production

Good problem description but no code or templates like LangSmith provides.

Solve My Problem

jmanhype

102mo ago

Infrastructure · ●●● Banger

Ducktel – observability when the consumer is an AI agent, not a human

Agent-first observability with SQL output instead of human dashboards.

Zero to One · Big Brain

djhope99

103mo ago

AI/ML · ●● Solid

NetHack agent harness with benchmarks and livestream

You can watch an LLM play NetHack step-by-step with the model's reasoning, the exact action code, and a live game canvas — that instrumentation is the product's real selling point. The leaderboard + run/benchmark framing makes it useful for comparing agents rather than just a flashy demo, but it's still squarely for people who care about NetHack or agent evaluation; more detail on reproducible metrics and integrations would push it further.

Niche Gem · Wizardry

kenforthewin

114mo ago