I fit a 9-agent LLM pipeline into 1.5GB of RAM on iOS

by TheCosmicStage·Mar 5, 2026·2 points·0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Ship It

ExecuTorch ahead-of-time compilation, 4-bit quantization, and speculative decoding fit a 9-agent LLM pipeline into 1.5GB of RAM on iOS.

Strengths
  • The Blackboard pattern decouples the agents through a shared state object, avoiding the context degradation that comes from passing output sequentially from agent to agent. It solves a real architectural problem.
  • Ahead-of-time compilation of PyTorch graphs to .pte binaries eliminates wrapper overhead; speculative decoding yields a measured 2.2-3.6x speedup.
  • The tiered model strategy (1B/3B/11B) keeps an identical architecture across hardware, a constraint-driven design that balances capability against device limits.
Weaknesses
  • Pre-release tech spec with no live demo, ship date, or user testing; the vaporware risk outweighs the architectural innovation.
  • Whisper voice input and biometrics are promised but incomplete; the shipping timeline is unclear, and critical journaling features (export, sync, backup) are missing.
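
To put the 1B/3B/11B tiers and the 4-bit quantization from the post in perspective, here is a rough back-of-the-envelope sketch of weight-only memory. It assumes nominal parameter counts and a flat 4 bits per weight; the function name and the tier table are illustrative, not taken from the project, and the KV cache, activations, quantization scales, and runtime overhead are all left out.

```python
# Weight-only footprint estimate for a quantized model.
# Assumptions (not from the post): nominal parameter counts and a flat
# bits-per-weight figure; real 4-bit schemes also store per-group scales,
# and the KV cache, activations, and runtime overhead are not counted.

def weight_footprint_gb(num_params: float, bits_per_weight: int = 4) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9

# Hypothetical table of the three tiers named in the analysis.
TIERS = {"1B": 1e9, "3B": 3e9, "11B": 11e9}

for name, params in TIERS.items():
    print(f"{name}: {weight_footprint_gb(params):.2f} GB at 4-bit, "
          f"{weight_footprint_gb(params, 16):.2f} GB at 16-bit")
```

Under these assumptions, nine separate copies of even the 1B model would need about 4.5 GB for weights alone. One plausible reading, which the post does not state, is that the agent personas share a single set of weights and differ only in prompts and state.
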
Category
Target Audience

Mobile developers, AI/ML engineers interested in on-device inference

Similar To

Ollama · llama.cpp · MLX (Apple Silicon)

Post Description

"Hey HN. I've been building a completely offline AI journal. The biggest hurdle was the memory footprint of running multiple agent personas. I ended up bypassing standard wrappers and using Meta's ExecuTorch to compile the PyTorch graphs ahead-of-time for the Apple Neural Engine, plus 4-bit quantization. Happy to answer any questions about the CoreML backend or managing the 'Blackboard' state object for the agents without killing the battery."

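The 'Blackboard' state object mentioned in the post can be illustrated with a minimal sketch. This is not the author's implementation: the class, the agent names, and the keyword rules are hypothetical stand-ins, and each stub function marks the place where a real pipeline would presumably call the on-device model. The point it shows is that every agent reads from and writes to one shared object instead of receiving the previous agent's full output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Blackboard:
    """Shared state that every agent reads from and writes to."""
    entry: str                                   # the journal entry under analysis
    notes: Dict[str, str] = field(default_factory=dict)

    def post(self, agent_name: str, note: str) -> None:
        self.notes[agent_name] = note


# An agent maps the current board to a short note (hypothetical interface).
Agent = Callable[[Blackboard], str]


def mood_agent(board: Blackboard) -> str:
    # Stub rule standing in for a model call.
    return "negative" if "tired" in board.entry.lower() else "neutral"


def theme_agent(board: Blackboard) -> str:
    # Stub rule standing in for a model call.
    return "work" if "deadline" in board.entry.lower() else "general"


def summary_agent(board: Blackboard) -> str:
    # Reads only the notes it needs from the board, not a chained transcript.
    mood = board.notes.get("mood", "unknown")
    theme = board.notes.get("theme", "unknown")
    return f"mood={mood}, theme={theme}"


def run_pipeline(entry: str, agents: List[Tuple[str, Agent]]) -> Blackboard:
    board = Blackboard(entry=entry)
    for name, agent in agents:
        board.post(name, agent(board))
    return board


board = run_pipeline(
    "Tired after the deadline.",
    [("mood", mood_agent), ("theme", theme_agent), ("summary", summary_agent)],
)
print(board.notes["summary"])  # mood=negative, theme=work
```

Because each agent sees the original entry plus a few short notes, the prompt for a later agent does not grow with the number of agents that ran before it.
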
Similar Projects

AI/ML · ●● Solid

LLM-use – cost-effective LLM orchestrator for agents

Smart local‑first routing that only escalates to expensive cloud planners when necessary is the standout idea — combined with per‑run cost accounting and full Ollama offline support it solves a real operational itch. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping + cache, MCP server mode) that feels useful for teams wanting a no‑friction orchestrator, but it’s playing in a crowded space of agent frameworks so the novelty is incremental rather than revolutionary.

Niche Gem · Big Brain
justvugg
21 · 3mo ago
AI/ML · ●●● Banger

Whichllm – Find and run the best local LLM for your hardware

One command finds and runs the best local LLM for your exact hardware specs.

Solve My Problem · Big Brain · Niche Gem
andyyyy64
30 · 2mo ago
AI/ML · ●● Solid

Memex – A local-first AI journal that keeps everything as Markdown

A local-first AI journal with a multi-agent architecture, in a space where most competitors store everything in the cloud.

Dark Horse · Solve My Problem
sparkleMing
10 · 2d ago