RunAnywhere – Faster AI Inference on Apple Silicon
Custom Metal shaders beat llama.cpp and MLX—1.67x faster on M4 Max.
Bit-exact f64 emulation on Metal GPUs where Apple's native double support is missing.
Graphics programmers and simulation engineers on Apple Silicon
SoftFloat · Cuda Softfloat
Native macOS VMs with APFS snapshots beat Docker for agent isolation.
Autonomous agent wrote custom Metal kernels boosting decode speed 42% over upstream llama.cpp.
MLX-powered local TTS plugin for OpenClaw—elegant but audience is Apple Silicon only.
Classic treemap UI back on native Apple Silicon, but disk space visualizers already exist.
The repo does one practical thing well: it quantifies the real-world impact of Apple Silicon's unified memory on analytics by running six TPC-H queries plus a GPU-favorable QX and shipping the raw charts and code. It is specific and empirical: you get MLX vs. NumPy vs. DuckDB numbers and PNGs rather than hand-wavy claims. It is narrowly scoped to M4 hardware and fairly small scales, though, so its conclusions are useful for experimentation rather than sweeping generalization.