Replicating Thinking Machines Interaction Model demo for $0.01 [video]
Sub-cent CPU-only voice agent with vision-keyed proactivity beats cloud APIs on cost.
CPU-only voice agent approximating Thinking Machines' Interaction Models demo
Runs Thinking Machines-style voice agent on a laptop CPU with no GPU required.
Developers interested in local AI and voice interfaces
Open Interpreter · Voiceflow · Rhasspy
Runs real-time vision-keyed voice agents on a laptop CPU without custom silicon or training.
Identity and access control between agents addresses the single-user assumption that most frameworks make.
This repo bundles a complete local audio loop: the client captures audio, and the backend transcribes it with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro, or with Qwen3-TTS for voice cloning. It reports a round trip of about 1 s on an RTX 5070. It’s a practical, take-it-home demo for running privacy-first voice agents, though it’s still a demo: it requires specific tooling (Ollama, GPU headroom), has obvious TODOs (voice activity detection, better warmup for cloning), and doesn’t reinvent the architecture.
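The loop described above (speech-to-text, then LLM, then text-to-speech) can be sketched roughly as follows. This is a minimal illustration and not the repo's actual code: `VoiceLoop`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the three stages are placeholder functions standing in for Parakeet, the Ollama-served LLM, and Kokoro or Qwen3-TTS. The per-stage timing is only there to show where a round-trip figure like the one quoted would come from.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VoiceLoop:
    """One conversational turn: audio in -> text -> reply -> audio out."""
    transcribe: Callable[[bytes], str]   # stand-in for the Parakeet step
    generate: Callable[[str], str]       # stand-in for the LLM behind Ollama
    synthesize: Callable[[str], bytes]   # stand-in for Kokoro or Qwen3-TTS
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name, fn, arg):
        # Run one stage and record how long it took, in seconds.
        start = time.perf_counter()
        result = fn(arg)
        self.timings[name] = time.perf_counter() - start
        return result

    def turn(self, audio: bytes) -> bytes:
        text = self._timed("stt", self.transcribe, audio)
        reply = self._timed("llm", self.generate, text)
        speech = self._timed("tts", self.synthesize, reply)
        # Round-trip time is the sum of the three stage timings.
        self.timings["total"] = sum(self.timings[k] for k in ("stt", "llm", "tts"))
        return speech


# Placeholder stages (assumptions for illustration only, not real model calls).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return "You said: " + prompt


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


loop = VoiceLoop(fake_stt, fake_llm, fake_tts)
out = loop.turn(b"hello")
```

Swapping a placeholder for a real model call would only require a function with the same input and output types, which also makes it easy to see which stage dominates the total.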
CarPlay coding sessions over SSH are a commute workflow nobody else is tackling.
Prompt engineering library dressed up as metacognition infrastructure.