GitHub Repository

Talk to your coding agents by voice. Realtime, fully local macOS dictation that streams words as you speak and grounds LLM polishing in the exact Claude Code session under your cursor — Ghostty, iTerm2, Terminal.app, even a herdr pane. 100% on-device on Apple Silicon.

47 stars · Swift

Localvoxtral – Local real-time dictation on macOS with streaming STT

by T0mSIlver · Feb 24, 2026 · 1 point · 2 comments

AI Analysis

●● Solid · Wizardry · Ship It

Streaming speech-to-text on-device beats Whisper's wait-for-silence UX pattern.

Strengths

• The Voxtral Realtime architecture streams words mid-utterance rather than after speech ends, as Whisper clones do.
• Fully local inference path: a voxmlx fork with a WebSocket server and memory optimizations, running on an M1 Pro.
• Native Swift menu bar app with a global shortcut, auto-paste, and microphone selection; it feels polished.

Weaknesses

• Currently macOS-only; the lack of Windows or Linux support limits the audience.
• Depends on a relatively early Mistral model; real-world accuracy versus commercial Whisper-plus-post-processing pipelines is unclear.

Post Description

I built a native macOS menu bar app for real-time dictation that can run fully on-device.

Most dictation tools, even local ones, use Whisper or similar offline models: you record, then wait for the transcript. Localvoxtral uses Mistral's Voxtral Realtime, one of the first open-source speech models with a natively streaming architecture. Words appear as you speak, not after you stop. It feels closer to someone typing along as you talk.
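
The difference between record-then-transcribe and streaming can be illustrated with a toy timing model. The sketch below is not Voxtral or Whisper code: `WORDS`, `SECONDS_PER_WORD`, and both functions are hypothetical, and it assumes an idealized recognizer with zero compute time so that only the architectural delay shows up.

```python
from typing import Iterator, List, Tuple

# Hypothetical utterance and speaking rate; both are illustrative assumptions.
WORDS: List[str] = ["press", "a", "shortcut", "and", "speak"]
SECONDS_PER_WORD = 0.4


def offline_transcribe(words: List[str]) -> List[Tuple[float, str]]:
    """Record-then-transcribe: no text is available until the speaker stops."""
    end_of_speech = len(words) * SECONDS_PER_WORD
    return [(end_of_speech, " ".join(words))]


def streaming_transcribe(words: List[str]) -> Iterator[Tuple[float, str]]:
    """Streaming: each word is emitted right after it has been spoken."""
    for i, word in enumerate(words, start=1):
        yield (i * SECONDS_PER_WORD, word)


if __name__ == "__main__":
    for t, text in offline_transcribe(WORDS):
        print(f"offline   t={t:.1f}s  {text}")
    for t, text in streaming_transcribe(WORDS):
        print(f"streaming t={t:.1f}s  {text}")
```

In this model the offline path delivers its only result once the whole utterance is over, while the streaming path delivers its first word after a single word's worth of audio. Real systems add inference latency on top of both paths.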

Press a shortcut, speak, and text gets typed directly into whatever app you're in. No cloud, no subscription, no data leaving your machine.

Two backend options:

• voxmlx on Apple Silicon: I forked voxmlx to add a WebSocket server and memory optimizations. Runs a 4-bit quantized model on an M1 Pro. Audio and inference stay fully on-device.
• vLLM on NVIDIA GPU: tested on an RTX 3090, noticeably faster.
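
A streaming backend consumes audio in small pieces rather than as one finished recording. As a rough sketch of the client side, the snippet below slices raw PCM audio into fixed-duration chunks that could be sent as binary WebSocket messages. The sample rate, sample width, chunk duration, and the function name `pcm_chunks` are assumptions for illustration; the post does not document the wire format that the voxmlx fork or vLLM expects.

```python
from typing import Iterator

# Assumed audio format: 16 kHz, mono, 16-bit PCM. Not taken from the project.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100  # assumed chunk duration


def pcm_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each covering chunk_ms of audio.

    The final chunk may be shorter if the audio does not divide evenly.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small to hold a single sample")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(pcm_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[0]))
```

In a live client, each chunk would be sent as soon as the microphone produces it, which is what lets the server return words before the speaker has finished. Smaller chunks lower latency at the cost of more messages.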

The app is native Swift (~97%), lives in the menu bar, and stays out of your way. Configurable shortcut, mic selection, auto-paste. GitHub: https://github.com/T0mSIlver/localvoxtral

A pre-built DMG is available in Releases.