Cheap-IM – CPU-only voice agent approximating Thinking Machines' demo

Author: mrkn1

by mrkn1 · May 17, 2026 · 4 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Wizardry · Dark Horse

Runs real-time vision-keyed voice agents on a laptop CPU without custom silicon or training.

Strengths

• Orchestrates YOLO11 pose detection and Silero VAD in a single async loop for sub-second reactions.
• Achieves 'slouch detection' and friend identification using purely off-the-shelf local models.
• Handles mid-conversation interrupts while background workers generate charts via API calls.
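
The single-loop orchestration described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the YOLO11 pose detector and Silero VAD are replaced by synthetic events on a queue, and every name here (`agent_loop`, `speak`, `background_chart`, the event strings) is hypothetical. It shows only the control flow: one loop reacts to vision and voice-activity events, cancels in-progress speech when the user starts talking, and leaves background work running.

```python
import asyncio

async def speak(text, log):
    """Stand-in for TTS playback; can be cancelled between words."""
    try:
        for word in text.split():
            log.append(f"say:{word}")
            await asyncio.sleep(0.05)
    except asyncio.CancelledError:
        log.append("speech interrupted")
        raise

async def background_chart(log):
    """Stand-in for a slow remote API call that should survive interrupts."""
    await asyncio.sleep(0.1)
    log.append("chart ready")

async def agent_loop(events, log):
    """One loop handles vision and VAD events (both simulated here)."""
    speech = None
    workers = []
    while True:
        kind, payload = await events.get()
        if kind == "stop":
            break
        if kind == "vad_speech_start" and speech and not speech.done():
            # Barge-in: the user started talking, so stop our own speech.
            speech.cancel()
            await asyncio.gather(speech, return_exceptions=True)
        elif kind == "pose" and payload == "slouch":
            # A vision event triggers proactive speech.
            speech = asyncio.create_task(speak("please sit up straight now", log))
        elif kind == "request_chart":
            # Background work runs independently of the conversation.
            workers.append(asyncio.create_task(background_chart(log)))
    # Let background workers finish before shutting down.
    await asyncio.gather(*workers)
    if speech is not None:
        await asyncio.gather(speech, return_exceptions=True)

async def demo():
    events = asyncio.Queue()
    log = []
    loop_task = asyncio.create_task(agent_loop(events, log))
    await events.put(("request_chart", None))
    await events.put(("pose", "slouch"))
    await asyncio.sleep(0.02)  # the agent starts speaking
    await events.put(("vad_speech_start", None))  # user interrupts
    await events.put(("stop", None))
    await loop_task
    return log

def run_demo():
    return asyncio.run(demo())
```

Calling `run_demo()` returns the event log. In a real agent the queue would presumably be fed by the detector and VAD callbacks rather than by hand, and cancellation would also need to flush any audio already buffered for playback.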

Weaknesses

• Still relies on remote LLM APIs (DeepInfra) rather than fully local inference for reasoning.
• Zero stars and no releases yet, suggesting early alpha status despite the impressive demo.

Similar Projects

AI/ML · ●●● Banger

Recreate Thinking Machines 276B voice demo with duct tape and 8B model

Runs Thinking Machines-style voice agent on a laptop CPU with no GPU required.

Wizardry · Dark Horse · Big Brain

mrkn1

305d ago

AI/ML · ●●● Banger

Replicating Thinking Machines Interaction Model demo for $0.01 [video]

Sub-cent CPU-only voice agent with vision-keyed proactivity beats cloud APIs on cost.

Wizardry · Big Brain

mrkn1

1029d ago

AI/ML · ●●● Banger

Realtime voice agent that sees, hears, and interrupts – on a CPU laptop

Replicates Thinking Machines' multimodal demo on a CPU laptop with commodity models.

Wizardry · Bold Bet

mrkn1

116d ago

Productivity · ●● Solid

Voice control coding agents on your machine via smartwatch / CarPlay

Coding sessions over SSH from CarPlay are a commute workflow nobody else is tackling.

Niche Gem · Big Brain

Zante

7016d ago

AI/ML · ●● Solid

Local Voice Assistant

This repo bundles a complete local audio loop: the client captures audio, and the backend transcribes it with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro or Qwen3-TTS for cloning. It reports a ~1s round trip on an RTX 5070. It's a practical, take-it-home demo for running privacy-first voice agents, though it's still a demo: it requires specific tooling (Ollama, GPU headroom), has obvious TODOs (VAD, better warmup for cloning), and isn't reinventing the architecture.

Wizardry · Niche Gem

armcat

204mo ago

AI/ML · ●●● Banger

Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

5.6x realtime on CPU with voice cloning beats most local TTS options.

Wizardry · Dark Horse

ZDisket

443mo ago