Tok/s on a 35B MoE model using a $100 AMD crypto APU and Vulkan
Clever hardware hack, but this is a config guide, not a shipped tool.

Runs a 19.5GB Qwen3.5 model on an iPhone with 12GB of RAM via memory swapping.
Mobile AI developers, quantization enthusiasts
MLC LLM · llama.cpp
Bypasses NVIDIA's artificial FP4 lock: runs a 122B MoE on a single desktop GPU at 31 tok/s.
Temporary public endpoint for Qwen3.6-35B quant on a spot instance.
Kernel ttm.pages_limit workaround unlocks 16GB UMA for Vulkan inference on repurposed crypto hardware.
Yet another multimodal wrapper when Cursor and Continue already dominate this space.
Native Swift inference with SSD streaming runs 100B MoE models without kernel panics.