Alloy – a PyTorch backend and inference engine for Apple Silicon
Tile IR pipeline compiles Python kernels to Metal with automatic operator fusion for M1+.
Community-driven benchmark suite for MLX inference engines on Apple Silicon
Standardized MLX benchmarking to replace today's manual, one-off engine comparisons.
Apple Silicon ML developers comparing inference engines
MLPerf · lm-eval-harness · Perplexity Benchmarks
The repo does one practical thing well: it quantifies the real-world impact of Apple Silicon's unified memory on analytics by running six TPC-H queries plus a GPU-favorable QX query, and it ships the raw charts and code. It is specific and empirical: you get MLX vs NumPy vs DuckDB numbers and PNGs rather than hand-wavy claims. It is narrowly scoped to M4 hardware and fairly small scales, though, so its conclusions are better suited to experimentation than to sweeping generalization.
Custom Metal shaders beat llama.cpp and MLX—1.67x faster on M4 Max.
LiteRT beats MLX on Gemma memory while CoreML sips power on the Neural Engine.
MLX-powered local TTS plugin for OpenClaw: elegant, but limited to Apple Silicon users.
GPU working set estimation catches memory overcommit before your 7B model swaps to SSD.