Recreate Thinking Machines 276B voice demo with duct tape and 8B model
Runs Thinking Machines-style voice agent on a laptop CPU with no GPU required.
CPU-only voice agent approximating Thinking Machines' Interaction Models demo
Runs real-time vision-keyed voice agents on a laptop CPU without custom silicon or training.
Developers building local-first AI agents, hobbyists
Thinking Machines Interaction Models · Open Interpreter · Home Assistant
Sub-cent CPU-only voice agent with vision-keyed proactivity beats cloud APIs on cost.
Replicates Thinking Machines' multimodal demo on a CPU laptop with commodity models.
CarPlay coding sessions over SSH are a commute workflow nobody else is tackling.
This repo bundles a complete local audio loop: the client captures audio, and the backend transcribes it with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro, or with Qwen3-TTS for cloning. It reports a round trip of about 1 s on an RTX 5070. It’s a practical, take-it-home demo for running privacy-first voice agents, though it’s still a demo: it requires specific tooling (Ollama, GPU headroom), has obvious TODOs (VAD, better warmup for cloning), and isn’t reinventing the architecture.
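The loop described above (transcribe, query the LLM, synthesize) can be sketched in a few lines. This is a minimal illustration, not the repo's actual code: `run_turn`, `Turn`, and `ollama_generate` are hypothetical names, the speech-to-text and text-to-speech stages are passed in as plain callables rather than real Parakeet or Kokoro calls, and the model tag `mistral` is an assumption. The only real interface used is Ollama's local HTTP endpoint `/api/generate` on its default port 11434.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(prompt: str, model: str = "mistral") -> str:
    """Send one non-streaming prompt to Ollama and return the reply text.

    The model tag is an assumption; use whichever quantized model is pulled locally.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


@dataclass
class Turn:
    transcript: str   # text recognized from the user's audio
    reply: str        # text produced by the LLM
    audio: bytes      # synthesized speech for the reply
    seconds: float    # wall-clock time for the whole round trip


def run_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical STT stage (e.g. a Parakeet wrapper)
    generate: Callable[[str], str],       # LLM stage (e.g. ollama_generate)
    synthesize: Callable[[str], bytes],   # hypothetical TTS stage (e.g. a Kokoro wrapper)
) -> Turn:
    """Run one capture-to-speech round trip and time it."""
    start = time.perf_counter()
    transcript = transcribe(audio_in)
    reply = generate(transcript)
    audio_out = synthesize(reply)
    return Turn(transcript, reply, audio_out, time.perf_counter() - start)
```

Keeping each stage as an injected callable means the loop can be exercised with stubs before any model is installed, and the `seconds` field gives a direct way to check a round-trip figure like the one quoted above on your own hardware.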
5.6x realtime on CPU with voice cloning beats most local TTS options.
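For readers unfamiliar with the "x realtime" figure, here is a small worked example. It assumes the common convention that the factor is the duration of generated audio divided by the time spent generating it; the function name `realtime_factor` is hypothetical and the sample numbers are illustrative, not measurements from the project.

```python
def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return seconds of audio produced per second of compute.

    Values above 1.0 mean synthesis runs faster than playback.
    """
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Illustrative numbers only: 28 s of speech synthesized in 5 s of compute.
factor = realtime_factor(28.0, 5.0)
```

Under that convention, a factor of 5.6 means a 28-second reply would take about 5 seconds to synthesize.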