Rapid-MLX – Run local LLMs on Mac, 2-3x faster than alternatives
Claims 4.2x Ollama speed with 0.08s cached TTFT on Apple Silicon.
OpenClaw local TTS plugin powered by mlx-audio, zero API key, zero cloud dependency
MLX-powered local TTS plugin for OpenClaw: elegant, but the audience is Apple Silicon only.
Apple Silicon Mac users running OpenClaw; developers needing private/offline TTS
OpenClaw built-in Edge TTS · oobabooga local TTS · Bark
It wraps mlx-audio and handles the full lifecycle: bootstraps its own Python environment via uv, downloads the model on first run, manages the server process, auto-restarts on crash, and exposes a standard OpenAI-compatible /v1/audio/speech endpoint.
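Because the endpoint follows the OpenAI speech API shape, a client only needs to POST a JSON body with `model`, `input`, and `voice`. The sketch below shows one way to do that with the Python standard library. The host, port, model name, and voice name are placeholders assumed for illustration, not values taken from the plugin; substitute whatever address and identifiers your installation reports.

```python
import json
import urllib.request

# Assumption: placeholder address; the plugin's actual host and port may differ.
BASE_URL = "http://127.0.0.1:8000"


def build_speech_request(text, model="kokoro", voice="af_heart", base_url=BASE_URL):
    """Build a POST request in the OpenAI speech API shape (model, input, voice).

    The default model and voice names are illustrative assumptions.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.audio", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello from a local model")` would write the response body to disk, provided a server is listening at the assumed address. Any existing OpenAI SDK pointed at the same base URL should work the same way.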
Installation:
openclaw plugin install @cosformula/openclaw-mlx-audio
Four models out of the box:
• Kokoro-82M: ~400 MB RAM, fastest, good for English/Japanese
• Qwen3-TTS-0.6B: ~1.4 GB RAM, best Chinese quality, 3-second voice cloning
• Qwen3-TTS-1.7B VoiceDesign: generate voices from text descriptions
• Chatterbox: 16 languages, ~3.5 GB RAM
Works on 8 GB Macs with Kokoro or Qwen3-0.6B. A proxy layer injects model-specific parameters so OpenClaw's TTS client needs zero changes.
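The parameter-injection idea can be pictured as a small merge step between the client and the TTS server. The following is a minimal sketch of that idea only, not the plugin's code. The model keys, parameter names, and default values are all hypothetical, and it assumes that values the client sets explicitly take precedence over injected defaults.

```python
# Hypothetical per-model defaults; the plugin's real parameter names
# and values are not documented in the text above.
MODEL_DEFAULTS = {
    "kokoro": {"voice": "af_heart", "lang_code": "a"},
    "qwen3-tts-0.6b": {"temperature": 0.7},
}


def inject_model_params(body, model_key, defaults=MODEL_DEFAULTS):
    """Return a copy of the client's request body with model defaults filled in.

    Assumption: fields the client set explicitly win over injected defaults.
    Unknown models pass through unchanged.
    """
    merged = dict(defaults.get(model_key, {}))
    merged.update(body)
    return merged
```

With this shape the client keeps sending a generic request, and only the proxy needs to know what each model expects.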
Why not just run mlx-audio directly? You can. This plugin removes the setup friction: no Python version juggling, no pip install, no manual server management. It also adds OOM detection, memory pre-checks, startup progress tracking, and hot config reload.
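A memory pre-check of the kind mentioned above could be as simple as comparing a model's approximate footprint against available RAM before starting the server. The sketch below uses the RAM figures quoted in the model list. The headroom value, the function name, and the choice to pass available memory in as an argument are assumptions for illustration; how the plugin measures free memory is not described here.

```python
# Approximate RAM needs in MB, taken from the model list above.
MODEL_RAM_MB = {
    "Kokoro-82M": 400,
    "Qwen3-TTS-0.6B": 1400,
    "Chatterbox": 3500,
}


def memory_precheck(model, available_mb, headroom_mb=512):
    """Return (ok, message) for loading `model` with `available_mb` free.

    headroom_mb is a hypothetical safety margin, not a documented value.
    """
    needed = MODEL_RAM_MB.get(model)
    if needed is None:
        return False, f"unknown model: {model}"
    required = needed + headroom_mb
    if available_mb < required:
        return False, f"{model} needs about {required} MB, only {available_mb} MB available"
    return True, f"{model} fits: {required} MB required, {available_mb} MB available"
```

Failing fast with a readable message gives the user something to act on, unlike a crash partway through model loading.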
Claims 4.2x Ollama speed with 0.08s cached TTFT on Apple Silicon.
Full MLX power in Ruby: lazy arrays, Metal GPU, transformer layers. The risk is Ruby adoption.
Free local CharacterAI with voice cloning under 10s audio, plus ESP32 hardware integration.
Unlocks Apple's locked LLM with OpenAI-compatible server for existing SDKs.
Native Swift inference with SSD streaming runs 100B MoE models without kernel panics.
The repo does one practical thing well: it quantifies the real-world impact of Apple Silicon's unified memory on analytics by running six TPC-H queries plus a GPU-favorable QX, and it ships the raw charts and code. It is specific and empirical: you get MLX vs NumPy vs DuckDB numbers and PNGs rather than hand-wavy claims. It is narrowly scoped to M4 hardware and fairly small scales, though, so its conclusions are useful for experimentation rather than sweeping generalization.