
Google's MRT2 → Core ML, full Apple Neural Engine execution, zero GPU. Runs locally on iPhone without burning your hand.

10 stars · Python

Magenta Real-Time Music Generation on iPhone, Without the GPU

by MediaSquirrel · Jun 10, 2026 · 7 points · 0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Zero to One

Splits a 230M-param model across ANE and CPU to avoid thermal throttling on iPhone.

Strengths
  • Model partitioning across silicon components based on hardware affinity is genuinely novel.
  • Published rigorous metrics: 12-digit correlation, zero GPU time, 14ms p99 latency.
  • Runs 10 minutes straight on iPhone 12 Pro without melting or thermal throttling.
Weaknesses
  • iOS-only; no Android or cross-platform path demonstrated yet.
  • Specific to Magenta RealTime 2; unclear if technique generalizes to other models.
Target Audience

Mobile developers, musicians, on-device AI engineers

Similar To

Off Grid AI · MLX · Core ML tools

Post Description

Last Thursday, DeepMind released Magenta RealTime 2, an open-source music generation model. They said it could run on a Mac, but not on an iPhone.

As a v̵i̵b̵e̵ ̵c̵o̵d̵i̵n̵g̵ ̵a̵d̵d̵i̵c̵t̵ agentic AI maxxi and a person who has melted iPhones before (link below), I took that as a personal challenge and made it my weekend project.

On Saturday, I got it to run for 10 minutes straight on a 2020 iPhone 12 Pro without melting the phone or, shockingly, touching the GPU.

How? I chopped the model into 5 pieces and spread them across different parts of Apple's system-on-a-chip (SoC).
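To make the partitioning idea concrete, here is a minimal Python sketch, not the author's code: it walks a model's operator list in order and greedily groups consecutive ops into segments, sending a segment to the Neural Engine only if every op in it appears in an ANE-supported set, and to the CPU otherwise. The op names and the `ANE_SUPPORTED` set are hypothetical placeholders; the real supported-op list depends on the Core ML and ANE version. In practice each segment would become its own Core ML model, and coremltools lets you state a device preference at conversion time (for example `compute_units=ct.ComputeUnit.CPU_AND_NE`), which Core ML treats as a preference rather than a guarantee.

```python
from dataclasses import dataclass

# Hypothetical set of op types assumed to run on the Neural Engine.
# The real list depends on the Core ML / ANE version and op parameters.
ANE_SUPPORTED = {"conv", "matmul", "layernorm", "gelu", "add", "softmax"}


@dataclass
class Segment:
    device: str      # "ANE" or "CPU"
    ops: list[str]   # consecutive ops assigned to that device


def partition(ops: list[str]) -> list[Segment]:
    """Greedily group consecutive ops into segments by device affinity."""
    segments: list[Segment] = []
    for op in ops:
        device = "ANE" if op in ANE_SUPPORTED else "CPU"
        if segments and segments[-1].device == device:
            # Same device as the previous op: extend the current segment.
            segments[-1].ops.append(op)
        else:
            # Device changes: start a new segment (a model split point).
            segments.append(Segment(device, [op]))
    return segments


if __name__ == "__main__":
    # Hypothetical op sequence; "topk_sample" and "dynamic_slice" stand in
    # for ops the ANE is assumed not to support.
    graph = ["conv", "layernorm", "matmul", "topk_sample",
             "matmul", "gelu", "dynamic_slice", "conv"]
    for seg in partition(graph):
        print(seg.device, seg.ops)
```

Every segment boundary means handing tensors from one compute unit to another, so fewer, larger segments are usually preferable; a real splitter would also weigh each segment's cost, not just op support.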

My past experience taught me that, if you can actually leverage it, the iPhone's NPU (Apple calls it the Neural Engine) is incredibly powerful and power-efficient. If you're doing sustained real-time generation for long periods on a device without a fan, you gotta use the Neural Engine or you will melt the device.

See: https://accelerateordie.com/p/we-melted-iphones-for-science

The Apple Neural Engine has a ton of constraints. The main ones are that it only accepts fixed-shape inputs and only supports some architectures, which is why I chopped the model into pieces.
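A common way to live with the fixed-shape constraint is to bake one sequence length into the converted model and pad shorter inputs up to it, passing a mask so the model can ignore the padding. The sketch below assumes a hypothetical `FIXED_LEN` of 8 and a pad ID of 0, and is not necessarily how the author handled it; in coremltools, the fixed shape itself would be declared at conversion time with something like `ct.TensorType(shape=(1, FIXED_LEN))`.

```python
# Hypothetical fixed sequence length baked into the converted model.
FIXED_LEN = 8
# Hypothetical token ID used for padding.
PAD_ID = 0


def pad_to_fixed(tokens: list[int], fixed_len: int = FIXED_LEN,
                 pad_id: int = PAD_ID) -> tuple[list[int], list[int]]:
    """Pad a variable-length sequence to the model's fixed input shape.

    Returns the padded tokens and a mask (1 = real token, 0 = padding).
    Raises ValueError if the input is longer than the fixed shape.
    """
    if len(tokens) > fixed_len:
        raise ValueError(
            f"sequence length {len(tokens)} exceeds fixed shape {fixed_len}")
    n_pad = fixed_len - len(tokens)
    padded = tokens + [pad_id] * n_pad
    mask = [1] * len(tokens) + [0] * n_pad
    return padded, mask


def chunk_fixed(tokens: list[int], fixed_len: int = FIXED_LEN,
                pad_id: int = PAD_ID) -> list[tuple[list[int], list[int]]]:
    """Split a long stream into fixed-length windows; only the last is padded."""
    return [pad_to_fixed(tokens[i:i + fixed_len], fixed_len, pad_id)
            for i in range(0, len(tokens), fixed_len)]


if __name__ == "__main__":
    padded, mask = pad_to_fixed([5, 9, 3])
    print(padded)  # [5, 9, 3, 0, 0, 0, 0, 0]
    print(mask)    # [1, 1, 1, 0, 0, 0, 0, 0]
```

For streaming generation, `chunk_fixed` feeds a long token stream through the model one fixed-size window at a time, so the converted model only ever sees the single shape it was compiled for.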

But it works! And I wrote zero lines of code by hand. Back when I was running VC-backed companies, I would have needed a small team of grumpy greybeard engineers to do this, and it would have taken 2–6 weeks. Now I can feed my own nerd fetish and do this stuff myself.

Next up: I'm building an iPhone app that ties into your heart rate, movement data, location, etc. to generate a real-time soundtrack to your life.

What a time to be alive!

Similar Projects

AI/ML · ●●● Banger

Valkyr LM Inference with Realtime Guarantees

Pure Vulkan compute enables LLMs inside game loops without CUDA lock-in.

Wizardry · Niche Gem
quatonion
301mo ago