I made an AI that reviews iPhone apps – 1h of autonomous GUI work

by bayes-song · Mar 27, 2026 · 3 points · 0 comments

AI Analysis

●● Solid · Big Brain · Ship It

Handles hour-long GUI tasks by running each workflow stage in a separate child session with its own context, which keeps long runs stable.

Strengths
  • Session architecture prevents context drift during long-horizon autonomous workflows.
  • Integrates real iPhone Mirroring instead of simulators for authentic mobile testing.
  • Local-first execution, controllable through eight different messaging channels.
Weaknesses
  • macOS-only support limits reach compared to cross-platform agents like OpenHands.
  • GUI agent space is saturating fast, with Anthropic and others solving similar problems.
Target Audience

Developers, QA engineers, automation enthusiasts

Similar To

Anthropic Computer Use · OpenInterpreter · OpenHands

Post Description

I've been building Understudy, an open-source GUI agent for macOS. Wanted to push the GUI stuff beyond the usual short demos, so I tried turning it into an iPhone app reviewer.

You give it one prompt. It browses the real App Store in Chrome, installs the app on a real iPhone through macOS iPhone Mirroring (not a simulator), opens the app and explores it (it had never seen Snapseed before), records clips and screenshots, composites a narrated review video locally with FFmpeg, uploads it to YouTube, then deletes the app. The whole run took about an hour, and I didn't touch the keyboard.
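
The post doesn't show how the FFmpeg step is invoked. As a rough sketch only, assuming the recorded clips share the same codec and resolution and the narration is a single audio file, the compositing could be built as an FFmpeg concat-demuxer list plus one command. The helper names `concat_list` and `ffmpeg_review_cmd` and all file names are hypothetical, not Understudy's code.

```python
def concat_list(clips: list[str]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file, one clip per line."""
    lines = []
    for clip in clips:
        # The concat demuxer quotes paths with single quotes; an embedded
        # single quote is written as '\'' (close quote, escaped quote, reopen).
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_review_cmd(list_file: str, narration: str, output: str) -> list[str]:
    """Hypothetical argv: concatenate the clips and lay a narration track over them."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # input 0: the clips (-safe 0 allows absolute paths)
        "-i", narration,                                # input 1: narration audio
        "-map", "0:v", "-map", "1:a",                   # video from the clips, audio from the narration
        "-c:v", "libx264", "-c:a", "aac",               # re-encode to a widely playable format
        "-shortest",                                    # stop when the shortest output stream ends
        output,
    ]
```

To run it, write `concat_list(...)` to the list file and pass the returned argv to `subprocess.run(cmd, check=True)`. Returning an argv list rather than a shell string avoids quoting problems with file names.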

The exploration part is what I'm happiest with. The agent reads the App Store description, goes "they say background removal works, let me try that," and then figures out an unfamiliar app on its own. It re-grounds on a fresh live screenshot before every action, so unexpected dialogs or UI changes don't kill it.
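
Re-grounding on every action amounts to an observe-act loop in which the model always decides from the current screen, never from a cached one. A minimal sketch, assuming three hypothetical callables (`capture`, `plan`, `act`) that stand in for the screenshot, model, and input layers; this is not Understudy's actual API, and `max_steps` is an illustrative cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # e.g. "tap", "type", or "done" (hypothetical action kinds)
    target: str = ""   # what the planner chose, described in words


def run_episode(
    capture: Callable[[], bytes],
    plan: Callable[[bytes, str], Action],
    act: Callable[[Action], None],
    goal: str,
    max_steps: int = 50,
) -> list[Action]:
    """Observe-act loop that re-grounds on a fresh screenshot before every action."""
    history: list[Action] = []
    for _ in range(max_steps):
        screen = capture()            # always take a new screenshot
        action = plan(screen, goal)   # decide from what is on screen right now
        if action.kind == "done":
            break
        act(action)
        history.append(action)
    return history
```

Because each step starts from a new observation, an unexpected permission dialog is just another screen for `plan` to handle rather than a state the agent has to predict in advance.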

The reason it can sustain an hour of work: each of the 6 stages runs as a separate child session with its own context. You can't fit an hour of screenshots into one window, so the isolation is necessary. Stages are typed — "workers" are deterministic (browser automation, device control), "skills" are agentic (the agent decides what to do). A "playbook" orchestrates both.
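
One way to picture typed stages running in isolated child sessions is sketched below. It is a hypothetical model, not Understudy's implementation: `Session`, `Stage`, and `run_playbook` are illustrative names, and the assumption that only a small output dict passes between stages (while screenshots stay inside each session) is ours, since the post doesn't say what crosses stage boundaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal


@dataclass
class Session:
    """A child session: its own context, discarded when the stage finishes."""
    stage: str
    context: list[str] = field(default_factory=list)


@dataclass
class Stage:
    name: str
    kind: Literal["worker", "skill"]   # worker = deterministic, skill = agentic (metadata here)
    run: Callable[[Session, dict], dict]


def run_playbook(stages: list[Stage], inputs: dict) -> dict:
    """Run stages in order; only each stage's compact output dict is carried forward."""
    state = dict(inputs)
    for stage in stages:
        session = Session(stage.name)        # fresh context for every stage
        output = stage.run(session, state)   # screenshots accumulate in session.context only
        state.update(output)                 # the next stage sees the summary, not the screenshots
    return state
```

In this sketch the context window never has to hold more than one stage's screenshots, which is the property the post relies on to sustain an hour of work.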

Result video (what the agent published): https://youtube.com/shorts/jliTvpTnsKY?feature=share

Process video (how it was built): https://youtu.be/gYMYI0bxkJs

X: https://x.com/LiangSong850509/status/2037612742392357218?s=2...

MIT license.

Similar Projects

Developer Tools · ●● Solid

I built an OpenClaw plugin for autonomous development saving 70% tokens

The repo actually implements an autonomous scheduling engine (work_heartbeat) with per-project isolation, role-based workers, and automated PR review loops — not just a toy demo. It's a bold, concrete attempt to run real dev work from chat (onboarding via channels, auto-created PRs), but it's niche and risky: the payoff depends on OpenClaw adoption and how comfortable you are giving agents commit/review power.

Bold Bet · Wizardry
laurentenhoor
204mo ago