oMLX – SSD-backed KV cache cuts coding agent TTFT from 90s to 1s on Mac
SSD-backed KV cache cuts coding agent TTFT from 90s to 1s, packed in a native macOS app.
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
SSD-cached KV blocks dodge re-prefill tax on context shifts—Claude Code now viable locally.
Apple Silicon Mac users running local LLMs, especially those using coding agents like Claude Code.
Ollama · vLLM · LM Studio
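Coding agents such as Claude Code shift their context often, and each shift can force the server to re-prefill the prompt from scratch. oMLX avoids this with paged SSD caching: every KV cache block is persisted to disk, and when a previously seen prefix returns, its blocks are restored from SSD instead of being recomputed. In long coding sessions this cuts time to first token (TTFT) from 90s to 1s.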
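To make the idea concrete, here is a minimal sketch of a block-level prefix cache persisted to disk. It is not oMLX's implementation: the names `DiskBlockCache`, `block_keys`, `fake_prefill` and the tiny `BLOCK_TOKENS` value are all hypothetical, and the "KV blocks" are placeholder tuples rather than real tensors. It only illustrates how chained block hashes let a server find the longest prefix it has already computed.

```python
import hashlib
import os
import pickle
import tempfile

BLOCK_TOKENS = 4  # assumed block size; tiny so the example is easy to follow


def block_keys(tokens):
    """One chained hash per full block: each key identifies the entire prefix up to that block."""
    keys, parent = [], b""
    full = len(tokens) - len(tokens) % BLOCK_TOKENS
    for i in range(0, full, BLOCK_TOKENS):
        chunk = ",".join(map(str, tokens[i:i + BLOCK_TOKENS])).encode()
        parent = hashlib.sha256(parent + chunk).digest()
        keys.append(parent.hex())
    return keys


def fake_prefill(tokens):
    """Stand-in for the expensive prefill: one placeholder block per full block of tokens."""
    full = len(tokens) - len(tokens) % BLOCK_TOKENS
    return [("kv", tuple(tokens[i:i + BLOCK_TOKENS])) for i in range(0, full, BLOCK_TOKENS)]


class DiskBlockCache:
    """Hypothetical cache that stores each block as one file named by its prefix hash."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".kv")

    def save(self, tokens, blocks):
        """Persist every block that is not already on disk."""
        for key, block in zip(block_keys(tokens), blocks):
            path = self._path(key)
            if not os.path.exists(path):
                with open(path, "wb") as f:
                    pickle.dump(block, f)

    def restore(self, tokens):
        """Load the longest cached prefix; return (blocks, number of tokens covered)."""
        blocks = []
        for key in block_keys(tokens):
            path = self._path(key)
            if not os.path.exists(path):
                break  # first miss ends the shared prefix
            with open(path, "rb") as f:
                blocks.append(pickle.load(f))
        return blocks, len(blocks) * BLOCK_TOKENS


cache = DiskBlockCache(tempfile.mkdtemp())
first = list(range(10))                      # two full blocks plus two leftover tokens
cache.save(first, fake_prefill(first))
second = list(range(8)) + [99, 98, 97, 96]   # same first 8 tokens, then new content
blocks, covered = cache.restore(second)
print(covered)  # 8: only the last block of `second` would need prefill
```

Because each key hashes the previous key together with the current chunk, two prompts share a block file only when everything before that block is identical, which is the property prefix reuse needs.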
It also supports continuous batching for concurrent requests, multi-model serving (LLM + embedding + reranker) with LRU eviction, a block-level KV cache with prefix sharing and copy-on-write, OpenAI- and Anthropic-compatible APIs, and tool calling.
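As a rough illustration of multi-model serving with LRU eviction, the sketch below keeps several models resident under a memory budget and unloads the least recently used one when a new model does not fit. The `ModelPool` class, the `loader` callable, the model names, and the sizes in GB are all assumptions made for this example; oMLX's actual policy and API may differ.

```python
from collections import OrderedDict


class ModelPool:
    """Hypothetical pool that keeps models in memory up to a budget, evicting least recently used."""

    def __init__(self, budget_gb, loader):
        self.budget_gb = budget_gb
        self.loader = loader          # assumed callable: name -> (model, size_gb)
        self.loaded = OrderedDict()   # name -> (model, size_gb), least recently used first

    def _used_gb(self):
        return sum(size for _, size in self.loaded.values())

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name][0]
        # A real server would check the size before loading; this sketch loads first for brevity.
        model, size_gb = self.loader(name)
        while self.loaded and self._used_gb() + size_gb > self.budget_gb:
            self.loaded.popitem(last=False)  # drop the least recently used model
        self.loaded[name] = (model, size_gb)
        return model


# Made-up sizes for an LLM, an embedding model, a reranker, and a second LLM.
SIZES = {"llm-a": 8, "embed": 1, "rerank": 2, "llm-b": 8}
pool = ModelPool(budget_gb=16, loader=lambda name: (f"<{name}>", SIZES[name]))

for name in ["llm-a", "embed", "rerank"]:
    pool.get(name)
pool.get("embed")   # touch embed so it becomes most recently used
pool.get("llm-b")   # 11 GB in use + 8 GB exceeds 16 GB, so llm-a is evicted
print(list(pool.loaded))  # ['rerank', 'embed', 'llm-b']
```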
Ships as a signed macOS menubar app with a web dashboard.
GitHub: https://github.com/jundot/omlx
Persists KV cache to SSD—makes local LLMs actually usable for real coding.