Back to browse
GitHub Repository

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

2,526 starsPython

Rapid-MLX – Run local LLMs on Mac, 2-3x faster than alternatives

by raullen·Apr 18, 2026·9 points·4 comments

AI Analysis

●●●BangerWizardrySolve My Problem

Claims 4.2x Ollama speed with 0.08s cached TTFT on Apple Silicon.

Strengths
  • Specific tok/s benchmarks per model and RAM configuration
  • 1900+ tests with 17 tool parsers for reliable function calling
  • Drop-in OpenAI replacement works with Cursor and Claude Code
Weaknesses
  • Apple Silicon only, no Linux or Windows support
  • Performance claims need independent verification beyond author benchmarks
Category
Target Audience

Mac developers running local LLMs for coding assistants

Similar To

Ollama · llama.cpp · LM Studio

Similar Projects

AI/ML●●●Banger

I built a free CharacterAI that runs locally

Free local CharacterAI with voice cloning under 10s audio, plus ESP32 hardware integration.

Zero to OneWizardrySolve My Problem
akadeb
842mo ago