I ran Qwen3.5 35B on my iPhone at 5.6 tok/SEC

by alexintosh·Mar 21, 2026·4 points·2 comments

AI Analysis

●● Solid · Wizardry · Bold Bet

Runs a 19.5GB Qwen3.5 model on an iPhone with 12GB of RAM via memory swapping.

Strengths
  • Achieves a usable 5.6 tok/sec despite heavy memory swapping on iOS.
  • Shows that 4-bit quantization is efficient enough to work within mobile hardware constraints.
  • Video proof validates on-device inference without cloud dependency.
Weaknesses
  • No linked repository or build instructions to replicate the setup.
  • Relies on a specific iPhone model with unified memory architecture.
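
The memory figures in the summary can be checked with a little arithmetic. The sketch below uses only the numbers quoted on this page (19.5GB model, 12GB RAM, 35B parameters). It assumes 1 GB = 10^9 bytes and, optimistically, that the whole 12GB is available to the model. The function names are hypothetical and do not come from the post.

```python
# Figures quoted on this page; the unit convention is an assumption.
MODEL_BYTES = 19.5e9   # "19.5GB" model, assuming 1 GB = 10**9 bytes
RAM_BYTES = 12e9       # "12GB RAM" iPhone
PARAMS = 35e9          # "35B" parameters, from the title


def bits_per_weight(model_bytes: float, params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return model_bytes * 8 / params


def min_nonresident_fraction(model_bytes: float, ram_bytes: float) -> float:
    """Lower bound on the share of the model that cannot sit in RAM at once.

    Optimistic: it pretends the OS and the app itself use no memory.
    """
    if model_bytes <= ram_bytes:
        return 0.0
    return (model_bytes - ram_bytes) / model_bytes


bpw = bits_per_weight(MODEL_BYTES, PARAMS)
frac = min_nonresident_fraction(MODEL_BYTES, RAM_BYTES)
print(f"average bits per weight: {bpw:.2f}")
print(f"minimum share of model outside RAM: {frac:.1%}")
```

Under these assumptions the model averages roughly 4.5 bits per weight, which is consistent with 4-bit quantization plus some overhead, and well over a third of it has to live outside RAM at any moment. The real share is higher, since iOS and the app need memory too.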
Category
Target Audience

Mobile AI developers, quantization enthusiasts

Similar To

MLC LLM · llama.cpp

Post Description

I'm pretty confident I can run the 397B-A17B next.

Similar Projects

AI/ML · Mid

Qwen Lens Studio – multimodal app on Qwen3.6-35B-A3B, runs on Ollama

Yet another multimodal wrapper when Cursor and Continue already dominate this space.

Ship It
vijgaurav
301mo ago
AI/ML · ●●● Banger

SwiftLM – Qwen Chat on iPhone, 100B+ MoE on M5 Pro 64GB (Native Swift)

Native Swift inference with SSD streaming runs 100B MoE models without kernel panics.

Wizardry · Niche Gem
aegis_camera
122mo ago