OpenGraviton – Run 500B+ parameter models on a consumer Mac Mini

by fatihturker·Mar 7, 2026·13 points·5 comments

AI Analysis

●● Solid · Big Brain · Wizardry

Ternary quantization and layer streaming for 140B-scale models on a Mac Mini, but the claims lack real-world validation.

Strengths
  • Novel 1.58-bit ternary quantization ({-1, 0, +1}) achieves 10x compression over FP16
  • Layer streaming via mmap bypasses RAM limits by reading weights from NVMe on demand
  • Combines speculative decoding, dynamic sparsity, and MoE routing into a unified system
Weaknesses
  • Benchmarks appear synthetic: the 140B stress test shows a 35GB footprint but says nothing about actual inference quality
  • No released code or working examples; claims are unverified against real models such as Mixtral
Target Audience

ML researchers and Apple Silicon users experimenting with large model inference on consumer hardware

Similar To

llama.cpp · GPTQ quantization · Ollama

Post Description

Hi HN,

I built OpenGraviton, an open-source AI inference engine designed to push the limits of running extremely large models on consumer hardware.

The system combines several techniques to drastically reduce memory and compute requirements:

  • 1.58-bit ternary quantization ({-1, 0, +1}) for ~10x compression
  • dynamic sparsity with Top-K pruning and MoE routing
  • mmap-based layer streaming to load weights directly from NVMe SSDs
  • speculative decoding to improve generation throughput
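To make the first item concrete, here is a minimal sketch of ternary quantization with bit packing. It is not taken from the OpenGraviton code: the absmean scaling rule, the 2-bit packing layout (four weights per byte), and the function names `ternary_quantize`, `pack_ternary`, and `unpack_ternary` are all assumptions chosen for illustration. A denser base-3 packing (five weights per byte, 1.6 bits each) would get closer to the 1.58-bit figure.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight (absmean rule).
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def pack_ternary(values: List[int]) -> bytes:
    """Pack four ternary values per byte, 2 bits each (code = value + 1)."""
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, q in enumerate(values[i:i + 4]):
            byte |= (q + 1) << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_ternary; `count` trims padding in the last byte."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(((byte >> (2 * j)) & 0b11) - 1)
    return values[:count]


weights = [0.9, -0.05, -1.2, 0.4, 0.0]
q, scale = ternary_quantize(weights)
packed = pack_ternary(q)
restored = unpack_ternary(packed, len(q))
approx = [v * scale for v in restored]  # dequantized approximation
```

Here five weights fit in two bytes instead of the ten bytes FP16 would need. The cost is that every weight collapses to one of three levels times the shared scale.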

These allow models far larger than system RAM to run locally.
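The layer-streaming idea can be sketched with Python's standard `mmap` module. This is an illustration only, not the project's implementation (which the post says uses Metal and C++). The fixed `LAYER_BYTES` layout, the demo file, and the function names are hypothetical.

```python
import mmap
import os
import tempfile

LAYER_BYTES = 1024  # hypothetical fixed size of one packed layer
NUM_LAYERS = 8      # hypothetical layer count


def write_demo_weights(path: str) -> None:
    """Write a stand-in weight file: layer i is LAYER_BYTES copies of byte i."""
    with open(path, "wb") as f:
        for layer in range(NUM_LAYERS):
            f.write(bytes([layer]) * LAYER_BYTES)


def stream_layers(path: str):
    """Yield one layer at a time from a memory-mapped file.

    Only the pages backing the current slice need to be resident;
    the OS reads them from disk on demand and may evict them later.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for layer in range(NUM_LAYERS):
                start = layer * LAYER_BYTES
                # Slicing copies just this layer's bytes out of the mapping.
                yield layer, mm[start:start + LAYER_BYTES]


def run_demo() -> list:
    """Stream every layer and return a per-layer checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_demo_weights(path)
        return [sum(chunk) for _, chunk in stream_layers(path)]


checksums = run_demo()
```

In a real engine each yielded chunk would be unpacked and fed to that layer's computation before the next one is touched, so peak memory tracks the size of a layer rather than the whole model.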

In early benchmarks, OpenGraviton reduced TinyLlama-1.1B from ~2.05GB (FP16) to ~0.24GB using ternary quantization. Synthetic stress tests at the 140B scale show that models which would normally require ~280GB FP16 can fit within ~35GB when packed with the ternary format.
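The 140B figures can be checked with simple arithmetic. The sketch below assumes decimal gigabytes and ignores scales and metadata; `model_size_gb` is a hypothetical helper. The post does not state the packed width, but ~35GB is what 2 bits per weight would give, while an ideal 1.58 bits per weight would come out somewhat smaller.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB, ignoring scales and metadata."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_140B = 140e9

fp16_gb = model_size_gb(PARAMS_140B, 16)         # 280.0
two_bit_gb = model_size_gb(PARAMS_140B, 2)       # 35.0
ideal_gb = model_size_gb(PARAMS_140B, 1.58)      # roughly 27.65

print(f"FP16: {fp16_gb:.1f} GB")
print(f"2-bit packed ternary: {two_bit_gb:.1f} GB")
print(f"Ideal 1.58-bit: {ideal_gb:.2f} GB")
```

Under these assumptions the 280GB to 35GB reduction is an 8x ratio, slightly below the ~10x that 16 / 1.58 would suggest.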

The project is optimized for Apple Silicon and currently uses custom Metal + C++ tensor unpacking.

Benchmarks, architecture, and details: https://opengraviton.github.io

GitHub: https://github.com/opengraviton
