An LLM that's better at writing
Novel fine-tuning algorithm for writing, but the demo model is too small to prove the concept.

LLM with simulated exhaustion state—forces grounded prose when stressed, prevents inventory hallucinations.
Interactive fiction enthusiasts, local LLM hobbyists, game developers experimenting with AI agents
Dwarf Fortress (procedural simulation grounding narrative) · AI Dungeon (interactive fiction with LLMs) · Custom character AI systems with state management
I love local LLMs for interactive fiction, but I got tired of the "Naked Model" paradox. If you tell a standard RLHF'd model "I drink the health potion," it will gleefully describe you drinking it—even if your inventory is completely empty. They have no object permanence and are biased to be sycophantic assistants.
To fix this, I built BoneAmanita, an architecture that puts a fine-tuned LLM inside a Python-simulated body.
How it works: It’s a two-part system:
The Brain (GGUF): A custom 3B model (fine-tuned on Llama 3.2 via Unsloth). I scrubbed out the "helpful assistant" RLHF and trained it strictly on atmospheric, sensory, and philosophical prose.
The Body (Python Engine): A local terminal hypervisor that runs a physical simulation. It tracks variables like "ATP" (stamina), "ROS" (trauma), "Voltage," and Cortisol.
The Feedback Loop: The Python engine intercepts every turn and dynamically rewrites the LLM's system prompt based on its metabolic state. If you stress the engine out with high-entropy actions, its simulated Cortisol spikes. The Python engine injects a strict prompt override forcing the LLM to output short, fragmented, defensive sentences. It literally gets exhausted.Solving the Hallucination Problem (The Gordon Shock): To enforce hard physics, the Python engine manages a strict inventory state. If you attempt an impossible action (e.g., washing a car in a forest), an internal interceptor ("Gordon") catches the premise violation before the LLM can "Yes, and..." you. Gordon violently injects a CRITICAL OVERRIDE into the context window, forcing the LLM to coldly reject the action and ground you in reality.
It boots into 4 modes (Adventure, Conversation, Creative, Technical) depending on how strict you want the physics engine to be.
You can pull the brain straight through Ollama: ollama pull hf.co/aedmark/vsl-cryosomatic-hypervisor
And run the Python hypervisor here: https://github.com/aedmark/BoneAmanita
It’s completely free, local, and released under The Unlicense.
Come play!
Novel fine-tuning algorithm for writing, but the demo model is too small to prove the concept.
Wraps mlx-lm fine-tuning into a guided desktop UI, but local LLM tools are crowded.
Fine-tune LLMs on Apple Neural Engine using reverse-engineered private frameworks — genuinely novel approach.
Ancient Rome Q&A benchmark shows 81pp accuracy lift, but lacks adversarial defense evidence.
Mountain Curriculum routing: 5× compute to hard samples, skip mastered ones.
SHA-256 deterministic RNG beats Python hash for reproducible dataset generation.