GitHub Repository

⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, macOS + iOS iPhone app.

688 stars · Swift

SwiftLM – Qwen Chat on iPhone, 100B+ MoE on M5 Pro 64GB (Native Swift)

by aegis_camera·Apr 1, 2026·1 point·2 comments

AI Analysis

●●● Banger · Wizardry · Niche Gem

Native Swift inference with SSD streaming runs 100B MoE models without kernel panics.

Strengths
  • SSD streaming swaps MoE layers directly from NVMe to GPU without thrashing unified memory.
  • Hybrid TurboQuant achieves V3 quality at V2 speeds using custom Metal shaders.
  • Zero Python dependencies means no GIL overhead and single binary deployment.
Weaknesses
  • SSD streaming marked experimental; stability unproven compared to mature llama.cpp.
  • Apple Silicon lock-in excludes Windows and Linux users entirely.
Category
Target Audience

Apple Silicon developers, local LLM enthusiasts, iOS engineers

Similar To

Ollama · llama.cpp · LM Studio

Similar Projects

AI/ML · ●● Solid

Running Gemma 4 on an iPhone 13 Pro

Clean Swift wrapper for Gemma 4 with vision and audio on iPhone.

Niche Gem · Ship It
dengjiuhong
101mo ago