Voiced, image-based D&D inspired AI-native RPG

by tommywilczek · Mar 6, 2026 · 1 point · 5 comments

AI Analysis

●●● Banger · Wizardry · Crowd Pleaser · Eye Candy

An AI game master that actually controls the world rather than just narrating it: music, NPCs, items, cutscenes.

Strengths

• Structured AI commands orchestrate state changes (music, NPC positions, items) instead of relying on freeform narrative text
• Hand-authored world, with characters voiced via Inworld, avoids the hallucinations of a purely AI-generated setting
• Solo dev shipping real-time inference in the browser, with Flux 2 Klein image generation and a separate quest journal agent

Weaknesses

• Alpha stage, with known hallucinations and real per-turn inference costs that are unsustainable long-term
• Smaller audience than AI Dungeon or ChatGPT-based games; longevity depends on whether monetization proves viable

Post Description

I'm a solo dev and I built a visual novel-style RPG where you type what you want to do and an AI game master responds in real time. Free alpha, plays in the browser.

What makes it different from AI Dungeon: the AI doesn't just generate text. It emits structured commands that change the music, move NPCs between locations, give/remove items, swap character portraits based on emotional reactions, and trigger cutscenes. Cinematic stills are generated on the fly with Flux 2 Klein 4B, and characters are voiced in real time via Inworld. Separate AI agents maintain a quest journal and write save summaries. The result feels more like a tabletop RPG session than a chatbot conversation.
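
As a rough illustration of the structured-command idea (not the game's actual code), the sketch below assumes the model replies with a JSON object holding a `narration` string and a `commands` list. The command names (`set_music`, `move_npc`, `give_item`, and so on), the field names, and the `GameState` class are all hypothetical, chosen only to mirror the effects listed above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Minimal game state; every field name here is hypothetical."""
    music: str = "silence"
    npc_locations: dict[str, str] = field(default_factory=dict)
    inventory: set[str] = field(default_factory=set)
    portraits: dict[str, str] = field(default_factory=dict)
    pending_cutscenes: list[str] = field(default_factory=list)


def apply_command(state: GameState, cmd: dict) -> None:
    """Apply one structured command to the state; reject unknown types."""
    kind = cmd.get("type")
    if kind == "set_music":
        state.music = cmd["track"]
    elif kind == "move_npc":
        state.npc_locations[cmd["npc"]] = cmd["to"]
    elif kind == "give_item":
        state.inventory.add(cmd["item"])
    elif kind == "remove_item":
        state.inventory.discard(cmd["item"])
    elif kind == "set_portrait":
        state.portraits[cmd["npc"]] = cmd["emotion"]
    elif kind == "cutscene":
        # Queue the prompt; image generation would happen elsewhere.
        state.pending_cutscenes.append(cmd["prompt"])
    else:
        raise ValueError(f"unknown command type: {kind!r}")


def apply_turn(state: GameState, model_output: str) -> str:
    """Parse the model's JSON reply, apply its commands, return the narration."""
    reply = json.loads(model_output)
    for cmd in reply.get("commands", []):
        apply_command(state, cmd)
    return reply.get("narration", "")
```

Raising on an unrecognized command type is one possible design choice: it keeps a hallucinated command from silently changing the world, at the cost of needing a fallback when the model misbehaves.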

The world is hand-crafted, not AI-generated. I wrote all the locations, characters, and lore by hand (Himalayan fantasy setting inspired by travel through Nepal and Bhutan). The AI's job is to run the game inside that authored world. Everyone explores the same world, every playthrough is different.

Stack: Godot 4.5 client, FastAPI backend, WebSocket streaming. Some AI calls use Gemini 3.1 Flash Lite, others use Claude Haiku 4.5 (cannot wait for 4.6). Cutscene images generated on the fly with Flux 2 Klein 4B. Voice TTS via Inworld.
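
To make the WebSocket streaming part concrete, here is a minimal sketch of how one turn might be split into JSON text frames. The frame schema (`event`, `text`, `command`) and both function names are assumptions for illustration, not the project's protocol. The sketch uses only the standard library; in a FastAPI handler, each yielded string could be passed to `WebSocket.send_text`.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable


async def turn_frames(
    narration_chunks: Iterable[str], commands: list[dict]
) -> AsyncIterator[str]:
    """Yield one JSON text frame per event in a turn (hypothetical schema)."""
    for chunk in narration_chunks:
        yield json.dumps({"event": "narration", "text": chunk})
        await asyncio.sleep(0)  # hand control back to the event loop between frames
    for cmd in commands:
        yield json.dumps({"event": "command", "command": cmd})
    # A final marker lets the client know the turn is complete.
    yield json.dumps({"event": "done"})


async def collect(frames: AsyncIterator[str]) -> list[dict]:
    """Stand-in for a client: decode every frame as it arrives."""
    return [json.loads(frame) async for frame in frames]
```

Sending narration chunks before commands means the client can start rendering text while the rest of the turn is still being produced; a real client would act on each frame as it arrives rather than collecting them all.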

Every turn costs real money in AI inference and I'm covering it until the $100 runs out (which will be a while because these models are SO cheap to run). Happy to answer questions about the architecture.
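
For a sense of how far a fixed budget stretches, the arithmetic can be sketched as below. The post gives no token counts or prices, so both functions take them as parameters; the function and parameter names are hypothetical. `Decimal` is used instead of floats so that the division by a very small per-turn cost does not pick up rounding error.

```python
from decimal import Decimal


def cost_per_turn(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: Decimal,
    usd_per_million_output: Decimal,
) -> Decimal:
    """Dollar cost of one model call from token counts and per-million-token prices."""
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / Decimal(1_000_000)


def turns_within_budget(budget_usd: Decimal, per_turn_usd: Decimal) -> int:
    """Whole number of turns a budget covers at a fixed per-turn cost."""
    if per_turn_usd <= 0:
        raise ValueError("per_turn_usd must be positive")
    return int(budget_usd // per_turn_usd)
```

A turn that also triggers image generation or TTS would add those charges to the per-turn cost before dividing; the sketch covers only the text-model portion.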