Rogue-Bench – LLMs play the game Rogue
Using 1980s Rogue as an LLM benchmark is genuinely novel and technically clever.

Roguelike built entirely in Windows File Explorer with drag-drop mechanics, ships March 2026.
Windows users, indie game enthusiasts, retro/experimental game fans
Flappy Bird in Mac Finder (inspiration) · HyperRogue (constraint-based roguelike design)
This is my game: a tiny dungeon crawler played in Windows File Explorer. Your player character is a folder that you drag and drop into other folders to move. Items are equipped by dropping them into your equipment folder, some items are used by deleting them, and monsters can be looted for their files.
I got the idea to do something in the file explorer after I saw this version of Flappy Bird in the Mac Finder: https://github.com/nolenroyalty/flappy-dird
It was fairly straightforward to make, using just a file watcher, shortcuts, and (optionally) the Windows Explorer API to detect whether the player folder is open in an Explorer window, so that renaming the folder can be delayed until it is no longer in use. It only uses files and folders it creates itself, and doesn't look outside of its executable's folder.
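To make the file-watcher idea concrete, here is a minimal sketch in Python of how a move or a deletion could be detected. It is not the game's actual code: it swaps a real file watcher for simple before/after snapshots of the game folder, the names `snapshot` and `diff_snapshots` are hypothetical, and it assumes every file and folder name in the game tree is unique.

```python
import os
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map every file/folder name under root to the name of its parent folder.

    Assumption (not from the original post): names are unique in the tree,
    so a name is enough to identify the player, an item, or a monster.
    """
    entries: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        parent = os.path.basename(dirpath)
        for name in dirnames + filenames:
            entries[name] = parent
    return entries


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> list[tuple]:
    """Return (kind, name, old_parent, new_parent) events between two snapshots.

    Only the immediate parent is compared, so dragging the player folder
    reports one move for the folder itself, not one per file inside it.
    """
    events = []
    for name, old_parent in before.items():
        if name not in after:
            # e.g. an item that was "used" by deleting it
            events.append(("deleted", name, old_parent, None))
        elif after[name] != old_parent:
            # e.g. the player folder dragged into another room folder
            events.append(("moved", name, old_parent, after[name]))
    return events
```

A game loop under these assumptions would take a snapshot, wait briefly, take another, and feed the resulting events to the game rules. A real watcher (the approach the post describes) avoids the polling delay, but the event handling would look similar.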
The project lent itself very well to TDD, especially since there are a lot of interactions that are quite tedious to manually test again and again.
It's also available on Itch (no account required): https://juhrjuhr.itch.io/directory-dungeon
Retro dungeon crawler, but maze generation and roguelike mechanics are the actual game.
Voice-driven AI boss fights with live combat taunts: clever, but mostly a hackathon novelty.
Ultima UI meets Valheim survival, but the browser RPG space is already crowded.
Gamified unit testing: watch a hero walk your code paths, collect gems for coverage.
Faithful fan recreation of Scoundrel, but browser card games are a crowded category.