SuperVoiceMode dictation experiment became an AI voice interface
Local transcription with AI correction beats cloud latency, but voice assistants are crowded.

AI handles real phone calls end-to-end: navigates trees, waits on hold, speaks 70+ languages.
People with phone anxiety or speech disabilities, and busy professionals avoiding calls
Google Duplex · Clara · Fireflies.ai
Voice agent that actually reads WhatsApp and controls Android: OpenClaw for your pocket.
Automates annoying calls with natural voice, but GPT-4o chains cost more than just calling yourself.
CTF-style flags for voice prompt injection make learning LLM security actually fun.
Seven unanswered audio alerts trigger a phone call, which works even through iPhone Silent Mode.
This repo bundles a complete local audio loop: the client captures audio, the backend transcribes it with Parakeet, queries a quantized Mistral LLM via Ollama, then renders speech with Kokoro, or with Qwen3-TTS for cloning. It reports ~1s round-trip on an RTX 5070. It’s a practical, take-it-home demo for running privacy-first voice agents, though it’s still a demo: it requires specific tooling (Ollama, GPU headroom), has obvious TODOs (VAD, better warmup for cloning), and isn’t reinventing the architecture.
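The loop described above (transcribe, query the LLM, synthesize) can be sketched as three pluggable stages with per-stage timing. This is a minimal illustration, not the repo's code: `VoiceLoop`, `run_turn`, and the `fake_*` stubs are hypothetical names, and the stubs stand in for Parakeet, the Ollama-served Mistral model, and Kokoro/Qwen3-TTS so the sketch runs without a GPU.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class VoiceLoop:
    """Hypothetical wrapper around the three backend stages."""
    transcribe: Callable[[bytes], str]   # ASR stage (Parakeet in the repo)
    generate: Callable[[str], str]       # LLM stage (quantized Mistral via Ollama)
    synthesize: Callable[[str], bytes]   # TTS stage (Kokoro or Qwen3-TTS)

    def run_turn(self, audio: bytes) -> Tuple[bytes, Dict[str, float]]:
        """Run one audio-in, audio-out turn and record seconds spent per stage."""
        timings: Dict[str, float] = {}

        def timed(name, stage, value):
            start = time.perf_counter()
            result = stage(value)
            timings[name] = time.perf_counter() - start
            return result

        text = timed("asr", self.transcribe, audio)
        reply = timed("llm", self.generate, text)
        speech = timed("tts", self.synthesize, reply)
        timings["total"] = timings["asr"] + timings["llm"] + timings["tts"]
        return speech, timings


# Stub stages (assumptions for illustration only): real ones would call
# the actual ASR, LLM, and TTS backends.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


loop = VoiceLoop(fake_transcribe, fake_generate, fake_synthesize)
reply_audio, timings = loop.run_turn(b"hello")
print(reply_audio, sorted(timings))
```

Keeping each stage behind a plain callable makes it easy to see where a round-trip budget like the reported ~1s is spent, and to swap Kokoro for Qwen3-TTS without touching the loop.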