I ran Qwen3.5 35B on my iPhone at 5.6 tok/SEC

Author: alexintosh

by alexintosh·Mar 21, 2026·4 points·2 comments


AI Analysis

●● Solid · Wizardry · Bold Bet

Runs a 19.5 GB Qwen3.5 model on an iPhone with 12 GB of RAM via memory swapping.

Strengths

• Achieves a usable 5.6 tok/s despite heavy memory swapping on iOS.
• Demonstrates that 4-bit quantization works efficiently within mobile hardware constraints.
• Video proof validates on-device inference with no cloud dependency.
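
The sizes quoted above can be sanity-checked with some quick arithmetic. The sketch below is not from the post: it assumes a nominal 35 billion parameters at exactly 4 bits each, and it treats the remainder of the quoted 19.5 GB as overhead (quantization scales, any higher-precision layers), which is a guess. The function name is hypothetical.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB, ignoring scales, metadata, and KV cache."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal 35B parameters, all stored at 4 bits.
weights_gb = quantized_size_gb(35e9, 4)

model_size_gb = 19.5  # model size quoted in the summary
ram_gb = 12.0         # iPhone RAM quoted in the summary

# Assumption: whatever exceeds the raw 4-bit weights is format overhead.
overhead_gb = model_size_gb - weights_gb

# Lower bound on data that cannot be resident at once; the real gap is
# larger because iOS and the app itself also need memory.
shortfall_gb = model_size_gb - ram_gb

print(f"raw 4-bit weights: {weights_gb:.1f} GB")
print(f"implied overhead:  {overhead_gb:.1f} GB")
print(f"exceeds RAM by:    {shortfall_gb:.1f} GB or more")
```

Under these assumptions at least 7.5 GB of the model has to be paged in and out during inference, which is why the swapping matters for the reported speed.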

Weaknesses

• No linked repository or build instructions to replicate the setup.
• Relies on a specific iPhone model with a unified memory architecture.

Post Description

I'm pretty confident I can run the 397B-A17B next.

Similar Projects

AI/ML · ●● Solid

Qwen3.6-35B-A3B on a 16 GB M1 Pro with SSD-streamed MoE

SSD-streamed MoE lets 16GB M1s run 35B models, but it's a specialized fork of antirez's ds4.

Big Brain · Niche Gem

andreaborio

24412d ago

AI/ML · ● Mid

Tok/s on a 35B MoE model using a $100 AMD crypto APU and Vulkan

Clever hardware hack but this is a config guide, not a shipped tool.

Niche Gem

akandr

214mo ago

AI/ML · ●●● Banger

Samosa Chat - Run Qwen3.6-35B-A3B Locally on a 16 GB Mac

Fits a 35B MoE model into 16GB RAM by running entirely on CPU without GPU acceleration.

Wizardry · Big Brain

dwa3592

6915d ago

AI/ML · ●●● Banger

2-bit Qwen3.6-35B-A3B with ~100% FP8 quality retention

Runs a 35B MoE model on 24GB VRAM with 2-bit quantization and minimal quality loss.

Wizardry · Big Brain · Solve My Problem

phe2019

2015h ago

AI/ML · ●●● Banger

NVFP4 on Desktop Blackwell – 122B MoE on a Single RTX PRO 6000 31 tok/s

Bypasses NVIDIA's artificial FP4 lock: a 122B MoE on a single desktop GPU at 31 tok/s.

Wizardry · Dark Horse

jcartu

204mo ago

AI/ML · ●●● Banger

Running Gemma-4 26B at 124 tokens/SEC on a CPU, no GPU

26B model at 124 tok/s on CPU by compressing the output head, not the experts.

Wizardry · Big Brain · Zero to One

arun-prasath

10129d ago