I made an AI that reviews iPhone apps – 1h of autonomous GUI work

Author: bayes-song

by bayes-song · Mar 27, 2026 · 3 points · 0 comments


AI Analysis

●● Solid · Big Brain · Ship It

Handles hour-long GUI tasks by splitting workflows into separate child sessions for stability.

Strengths

• Session architecture prevents context drift during long-horizon autonomous workflows.
• Integrates real iPhone Mirroring instead of simulators for authentic mobile testing.
• Local-first execution with support for eight different messaging channels for control.

Weaknesses

• macOS-only support limits reach compared to cross-platform agents like OpenHands.
• GUI agent space is saturating fast with Anthropic and others solving similar problems.

Post Description

I've been building Understudy, an open-source GUI agent for macOS. Wanted to push the GUI stuff beyond the usual short demos, so I tried turning it into an iPhone app reviewer.

You give it one prompt. It browses the real App Store in Chrome, installs the app on a real iPhone through macOS iPhone Mirroring (not a simulator), then opens the app and explores it (it had never seen Snapseed before). Along the way it records clips and screenshots, composites a narrated review video locally with FFmpeg, uploads it to YouTube, then deletes the app. The whole run took about an hour, and I didn't touch the keyboard.

The exploration part is what I'm happiest with. The agent reads the App Store description, goes "they say background removal works, let me try that," and then figures out an unfamiliar app on its own. It re-grounds on the live screenshot before every action, so unexpected dialogs or UI changes don't kill it.
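To make the "look again before every action" idea concrete, here is a minimal sketch of an observe-decide-act loop. It is not Understudy's code: `Action`, `run_grounded_loop`, `FakeDevice`, and `decide` are hypothetical names, and plain strings stand in for screenshots. The only point it illustrates is that each step plans from a fresh observation, so a surprise dialog is just another screen to handle.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    kind: str         # hypothetical action vocabulary: "click" or "done"
    target: str = ""  # label of the UI element to act on


def run_grounded_loop(
    goal: str,
    capture: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 50,
) -> List[Action]:
    """Take a fresh observation before every action; never act on a stale one."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture()                      # re-ground on the live screen
        action = decide(goal, screen, history)  # plan only the next step
        if action.kind == "done":
            break
        act(action)
        history.append(action)
    return history


class FakeDevice:
    """Toy stand-in for a mirrored phone; a string plays the role of a screenshot."""

    def __init__(self) -> None:
        self.screens = ["permission_dialog", "home", "editor"]
        self.index = 0

    def capture(self) -> str:
        return self.screens[self.index]

    def act(self, action: Action) -> None:
        # Any click advances the toy UI to its next screen.
        self.index = min(self.index + 1, len(self.screens) - 1)


def decide(goal: str, screen: str, history: List[Action]) -> Action:
    if screen == "permission_dialog":  # unexpected dialog: dismiss it, then carry on
        return Action("click", "Allow")
    if screen == "home":
        return Action("click", "Open photo")
    return Action("done")


device = FakeDevice()
steps = run_grounded_loop("try background removal", device.capture, decide, device.act)
print([a.target for a in steps])  # ['Allow', 'Open photo']
```

In a real agent, `capture` would grab an actual screenshot and `decide` would be a model call; the loop shape stays the same.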

The reason it can sustain an hour of work: each of the 6 stages runs as a separate child session with its own context. You can't fit an hour of screenshots into one window, so the isolation is necessary. Stages are typed — "workers" are deterministic (browser automation, device control), "skills" are agentic (the agent decides what to do). A "playbook" orchestrates both.
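The stage isolation described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual API: `Session`, `Stage`, `run_playbook`, and the three example stages are hypothetical (the post does not name its six stages). Each stage gets a fresh context, and only small named artifacts cross the stage boundary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Literal, Tuple


@dataclass
class Session:
    """A child session: its context starts empty and is dropped after the stage."""
    context: List[str] = field(default_factory=list)


@dataclass
class Stage:
    name: str
    kind: Literal["worker", "skill"]  # worker = deterministic, skill = agentic
    run: Callable[[Session, Dict[str, str]], Dict[str, str]]


def run_playbook(
    stages: List[Stage], inputs: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, int]]:
    artifacts = dict(inputs)
    context_sizes: Dict[str, int] = {}
    for stage in stages:
        session = Session()                       # fresh context per stage
        outputs = stage.run(session, artifacts)
        context_sizes[stage.name] = len(session.context)
        artifacts.update(outputs)                 # only named outputs move forward
    return artifacts, context_sizes


# Hypothetical stages; strings stand in for screenshots and file paths.
def find_app(session: Session, artifacts: Dict[str, str]) -> Dict[str, str]:
    session.context.append(f"searched for {artifacts['app_name']}")
    return {"store_page": "store-page-for-" + artifacts["app_name"]}


def explore_app(session: Session, artifacts: Dict[str, str]) -> Dict[str, str]:
    for i in range(40):                           # many screenshots, one session
        session.context.append(f"screenshot {i}")
    return {"clips": "clips-dir"}


def render_video(session: Session, artifacts: Dict[str, str]) -> Dict[str, str]:
    session.context.append(f"rendering from {artifacts['clips']}")
    return {"video": "review.mp4"}


playbook = [
    Stage("find_app", "worker", find_app),
    Stage("explore_app", "skill", explore_app),
    Stage("render_video", "worker", render_video),
]
artifacts, context_sizes = run_playbook(playbook, {"app_name": "Snapseed"})
print(context_sizes)  # {'find_app': 1, 'explore_app': 40, 'render_video': 1}
```

The 40 screenshots from the exploration stage never reach the rendering stage; it only sees the `clips` artifact, which is what keeps each context window small.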

Result video (what the agent published): https://youtube.com/shorts/jliTvpTnsKY?feature=share

Process video (how it was built): https://youtu.be/gYMYI0bxkJs

X: https://x.com/LiangSong850509/status/2037612742392357218?s=2...

MIT license.

Similar Projects

Education · ○ Pass

AI agent that works autonomously while I'm offline

Compelling airplane-mode story, but the 'guide' is unverified claims dressed as a product.

ZeroDayCyber

213mo ago

SaaS · ● Mid

Usplus.ai – Build a company of AI agents and execute work autonomously

Multi-agent startup-in-a-box, but already crowded by Cognition, Devin, and generalist AI assistants.

Bold Bet

usplusAI

293mo ago

SaaS · ● Mid

Construct Computer – Agentic Cloud OS for Daily Work

Persistent agent infrastructure beats API calls, but still waitlist-only with no public demo or shipping product.

Bold Bet

ankushKun

2183mo ago

SaaS · ● Mid

Deploy OpenClaw in 1 minute and run Multiple agents

Managed multi-agent workspace, but ChatGPT, Claude Projects, and Anthropic's built-in task delegation already solve this.

Crowd Pleaser · Ship It

jacobsyc

103mo ago

Developer Tools · ●● Solid

I built an OpenClaw plugin for autonomous development saving 70% tokens

The repo implements an autonomous scheduling engine (work_heartbeat) with per-project isolation, role-based workers, and automated PR review loops, so it is more than a toy demo. It's a bold, concrete attempt to run real dev work from chat (onboarding via channels, auto-created PRs), but it's niche and risky: the payoff depends on OpenClaw adoption and on how comfortable you are giving agents commit and review power.

Bold Bet · Wizardry

laurentenhoor

204mo ago

Developer Tools · ●● Solid

Sugar – A task queue that lets AI coding agents work autonomously

Task queue for AI agents, but orchestrates existing tools without novel architecture.

Big Brain · Ship It

cdnsteve

103mo ago