GitHub Repository

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

16,475 stars · Python

oMLX – Native Mac inference server that persists KV cache to SSD

Author: jundot

by jundot·Feb 19, 2026·1 point·0 comments


AI Analysis

●●● Banger · Solve My Problem · Wizardry · Ship It

SSD-cached KV blocks dodge re-prefill tax on context shifts—Claude Code now viable locally.

Strengths

• Paged SSD KV cache is genuinely clever: solves the specific pain of coding agents that invalidate prefixes mid-session.
• Menubar app + OpenAI-compatible API + built-in dashboard removes friction: a real product, not a research demo.
• Continuous batching + multi-model LRU eviction + copy-on-write shows solid engineering depth beyond the core idea.

Weaknesses

• Apple Silicon only: a large addressable market, but it excludes Windows/Linux users (most of the inference server market).
• Text-only LLMs, no VLM/OCR yet, which limits use cases vs. vLLM or Ollama's broader model support.

Post Description

I built an open-source LLM inference server optimized for Apple Silicon. The main motivation was coding agents - tools like Claude Code send requests where the context prefix keeps shifting, invalidating KV cache. A few turns later the agent circles back, and your Mac has to re-prefill the entire context from scratch.

oMLX solves this with paged SSD caching. Every KV cache block is persisted to disk. When a previous prefix returns, it's restored instantly instead of being recomputed. This makes long coding sessions significantly faster.
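To make the idea concrete, here is a minimal sketch of block-level prefix caching on disk. It is not oMLX's actual code: the class name `SSDBlockCache`, the helper `block_keys`, the 4-token block size, and the one-file-per-block layout are all assumptions chosen for illustration, and the KV payload is a placeholder byte string rather than real tensors. Each block's key is a hash chained to the previous block's key, so a key identifies the whole prefix up to that block.

```python
import hashlib
import tempfile
from pathlib import Path

BLOCK_SIZE = 4  # tokens per block; tiny on purpose (assumed value)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Chain-hash every full block so each key depends on the entire prefix."""
    keys, parent = [], b""
    full = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).hexdigest()
        keys.append(digest)
        parent = digest.encode()
    return keys


class SSDBlockCache:
    """Hypothetical on-disk store: one file per KV block, named by its chained hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, kv_bytes):
        (self.root / key).write_bytes(kv_bytes)

    def longest_prefix(self, tokens):
        """Return (tokens covered by cached blocks, restored payloads in order)."""
        restored = []
        for key in block_keys(tokens):
            path = self.root / key
            if not path.exists():
                break  # first miss ends the reusable prefix
            restored.append(path.read_bytes())
        return len(restored) * BLOCK_SIZE, restored


# Persist the blocks of an earlier turn, then look up a later request
# that shares its first eight tokens.
cache = SSDBlockCache(tempfile.mkdtemp())
earlier_turn = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for i, key in enumerate(block_keys(earlier_turn)):
    cache.put(key, f"kv-block-{i}".encode())

covered, payloads = cache.longest_prefix([1, 2, 3, 4, 5, 6, 7, 8, 42, 43])
diverged, _ = cache.longest_prefix([1, 2, 3, 4, 99, 98, 97, 96])
```

In this sketch only the tokens past `covered` would still need prefill; a request that diverges after the first block reuses just that one block.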

It also supports continuous batching for concurrent requests, multi-model serving (LLM + embedding + reranker) with LRU eviction, block-level KV cache with prefix sharing and copy-on-write, OpenAI and Anthropic compatible APIs, and tool calling.
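As an illustration of the multi-model LRU eviction mentioned above, the following sketch keeps models loaded until a memory budget would be exceeded, then unloads the least recently used ones. The `ModelPool` class, the `loader` callback, and the sizes are hypothetical and are not taken from the oMLX codebase.

```python
from collections import OrderedDict


class ModelPool:
    """Hypothetical pool: keeps models in memory, evicting least recently used first."""

    def __init__(self, budget_gb, loader):
        self.budget_gb = budget_gb
        self.loader = loader          # assumed callback: name -> (model, size_gb)
        self.loaded = OrderedDict()   # name -> (model, size_gb), oldest first

    def _used_gb(self):
        return sum(size for _, size in self.loaded.values())

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name][0]
        model, size_gb = self.loader(name)
        # Evict from the oldest end until the new model fits in the budget.
        while self.loaded and self._used_gb() + size_gb > self.budget_gb:
            self.loaded.popitem(last=False)
        self.loaded[name] = (model, size_gb)
        return model


# Stand-in loader with made-up sizes in GB.
SIZES = {"llm": 6, "embed": 1, "rerank": 1, "llm2": 5}
pool = ModelPool(budget_gb=10, loader=lambda name: (f"<{name}>", SIZES[name]))

for name in ("llm", "embed", "rerank"):
    pool.get(name)
pool.get("embed")   # touch: "embed" becomes most recently used
pool.get("llm2")    # does not fit alongside "llm", so "llm" is evicted
```

The same ordering trick would apply to any resource with a budget, for example cached KV blocks held in RAM before they fall back to SSD.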

Ships as a signed macOS menubar app with a web dashboard.

GitHub: https://github.com/jundot/omlx

Similar Projects

AI/ML · ●●● Banger

oMLX – SSD-backed KV cache cuts coding agent TTFT from 90s to 1s on Mac

SSD-backed KV cache cuts coding agent TTFT from 90s to 1s, packed in a native macOS app.

Wizardry · Slick

jundot

413mo ago

Developer Tools · ●●● Banger

OMLX – coding agents on local LLMs without the painful reprefill

Persists KV cache to SSD—makes local LLMs actually usable for real coding.

Wizardry · Solve My Problem

jundot

303mo ago

AI/ML · ● Mid

Mlx-code – I built a "backyard shed" AI coding agent for Mac

Local MLX agent for Mac when Cursor and Copilot already dominate the market.

Cozy · Ship It

JosefAlbers

201mo ago

Design · ● Mid

FOSS Lightroom alternative with video support (macOS)

HDR/EDR video grading on macOS, but explicitly not daily-driver ready yet.

Ship It

surrTurr

103mo ago

AI/ML · ●●● Banger

Orion – Native Training LLMs on the Apple Neural Engine Without CoreML

Direct ANE access bypasses CoreML to enable training—genuinely novel Apple Silicon unlock.

Wizardry · Zero to One · Big Brain

mechramc

213mo ago

AI/ML · ● Mid

Running OpenClaw on a managed Mac Mini 4 instance

Shows how to run OpenClaw agents on a rented Mac mini M4 and use the 38 TOPS Neural Engine for low-latency local inference while offloading heavy work to Scaleway's Generative APIs. Practical details — hourly billing, remote desktop access, and step-by-step tutorials — make it useful for PoCs, but it's essentially a cloud-provider integration rather than a new agent platform.

Niche Gem · Solve My Problem

enthusaist

204mo ago