Language1 – Benchmarking LLM comprehension of vague prompts via Taboo

by kaandemirel · Jun 18, 2026 · 1 point · 0 comments

AI Analysis

Solid · Rabbit Hole · Big Brain

Reverse Taboo gameplay doubles as data collection for an LLM prompt-comprehension benchmark.

Strengths
  • Dual-purpose design: entertaining game plus research data collection for LLM evaluation.
  • Tracks token consumption and solve time across multiple model providers for comparison.
  • No signup required for instant play with interactive simulation for learning.
Weaknesses
  • Benchmark value depends on sufficient player participation and data volume.
  • Game mechanic itself is straightforward despite the research angle.
Target Audience

LLM researchers, prompt engineers, word game enthusiasts

Similar To

Taboo · Heads Up · LLM benchmark suites

Post Description

Hi HN,

I built Language1 (https://language1.app), a word game where you play "reverse Taboo" against an LLM.

How it works: You are given a target word (e.g., "Apple") and a list of forbidden "taboo" words (e.g., "fruit", "red", "tree"). Your goal is to write a prompt that guides the LLM to output the exact target word, without using any of the forbidden words.
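The two rules above (no forbidden words in the prompt, exact target word in the reply) can be sketched roughly as follows. This is only an illustration, not the app's actual code: the function names `find_taboo_words` and `is_solved` are hypothetical, and both the whole-word matching and the case-insensitive comparison are assumptions about how "exact" is interpreted.

```python
import re


def find_taboo_words(prompt: str, forbidden: list[str]) -> list[str]:
    """Return the forbidden words that appear as whole words in the prompt.

    Assumption: matching is case-insensitive and whole-word only, so
    "reddish" would not count as a use of "red".
    """
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return [w for w in forbidden if w.lower() in words]


def is_solved(model_output: str, target: str) -> bool:
    """True when the reply is the target word.

    Assumption: surrounding whitespace, quotes, and end punctuation are
    ignored, and case does not matter.
    """
    cleaned = model_output.strip().strip(".!?\"'")
    return cleaned.lower() == target.lower()


taboo = ["fruit", "red", "tree"]
print(find_taboo_words("A crunchy red snack", taboo))
print(find_taboo_words("Newton's famous falling object", taboo))
print(is_solved(" Apple. ", "Apple"))
```

A stricter variant might also reject inflected forms such as "fruits" or "trees"; whether the real game does so is not stated in the post.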

The Benchmark Goal: I plan to use the gameplay data to build a benchmark dataset. The goal is to evaluate how well LLMs handle unclear prompts, metaphors, analogies, and vague explanations under semantic constraints.

Game Modes:

Single Player: Play through a pool of challenges to test your prompt precision. You compete against other players globally across attempts, solve time, and token consumption (measured via standard cl100k_base encoding). You can play instantly without registering, or sign in (one-click Google login) to submit scores to the leaderboards.

Multiplayer Races: Real-time lobbies of up to 10 players racing through 3 rounds. Note: Since the game is new, public lobbies might be empty at first, but you can create private lobbies to play with friends.

Available Models:

Anonymous users play with the default Gemma 3 Instruct model. Free registered users can choose between multiple models to test and compare reasoning styles, including Llama 3 8B, Liquid LFM 24B, Amazon Nova Micro, and Ministral 8B.

The Tech & Guardrails: The app is built with a React frontend and a Node.js/AWS Lambda backend. To keep things fair, we built a validation guard that parses input clues to block easy bypasses like letter-spacing (e.g., "A-P-P-L-E"), translations, ciphers, and Base64. You have to rely purely on semantic reasoning to guide the model.
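To make the guard idea concrete, here is a minimal sketch of two of the checks mentioned above: spelled-out letters and Base64. It is written in Python for brevity even though the real backend is Node.js, and it makes no claim about how the actual guard works. The function names `spelled_out_runs` and `leaks_target` are hypothetical, and translations and ciphers are not covered.

```python
import base64
import re


def spelled_out_runs(clue: str) -> list[str]:
    """Join runs of two or more consecutive single-letter tokens.

    "A-P-P-L-E" and "a p p l e" both produce the run "apple".
    """
    runs, current = [], []
    for token in re.findall(r"[a-z]+", clue.lower()):
        if len(token) == 1:
            current.append(token)
            continue
        if len(current) > 1:
            runs.append("".join(current))
        current = []
    if len(current) > 1:
        runs.append("".join(current))
    return runs


def leaks_target(clue: str, target: str) -> bool:
    """Heuristic check: does the clue smuggle in the target word?"""
    target_l = target.lower()

    # 1. The target written out directly as a word.
    if target_l in re.findall(r"[a-z]+", clue.lower()):
        return True

    # 2. The target spelled letter by letter with separators.
    if any(target_l in run for run in spelled_out_runs(clue)):
        return True

    # 3. The target hidden inside a Base64-looking token.
    for token in re.findall(r"[A-Za-z0-9+/]{4,}={0,2}", clue):
        try:
            decoded = base64.b64decode(token, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
        if target_l in decoded.decode("utf-8", errors="ignore").lower():
            return True

    return False


print(leaks_target("Spell it: A-P-P-L-E", "Apple"))
print(leaks_target("Newton's falling inspiration", "Apple"))
```

Checks like these are cheap string heuristics. Catching translations or ciphers would likely need something heavier, such as a dictionary lookup or a second model pass, and the post does not say which approach the app uses.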

The game is completely free, has no ads, and is playable instantly in the browser.

I'd love to hear your thoughts on the gameplay and see what creative semantic tricks you use to guide the LLM!
