AI agent that works autonomously while I'm offline
Compelling airplane-mode story, but the 'guide' is unverified claims dressed as a product.

Handles hour-long GUI tasks by splitting workflows into separate child sessions for stability.
Developers, QA engineers, automation enthusiasts
Anthropic Computer Use · OpenInterpreter · OpenHands
You give it one prompt. It browses the real App Store in Chrome, installs the app on a real iPhone through macOS iPhone Mirroring (not a simulator), opens the app, and explores it (it had never seen Snapseed before). It then records clips and screenshots, composites a narrated review video locally with FFmpeg, uploads it to YouTube, and deletes the app. The whole run took about an hour, and I didn't touch the keyboard.
The exploration part is what I'm happiest with. The agent reads the App Store description, goes "they say background removal works, let me try that," and then figures out an unfamiliar app on its own. It re-grounds on the live screenshot on every action, so unexpected dialogs or UI changes don't kill it.
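To make the re-grounding idea concrete, here is a minimal sketch of an observe-decide-act loop that takes a fresh observation on every step. It is not the project's code: `run_grounded_loop`, `FakeApp`, and `decide` are hypothetical names, and the "screenshot" is a plain string standing in for a real screen capture.

```python
from typing import Callable, List, Optional


def run_grounded_loop(
    capture: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> List[str]:
    """Pick each action from a fresh observation, never from a cached plan."""
    history: List[str] = []
    for _ in range(max_steps):
        screen = capture()       # re-ground: look at the live screen again
        action = decide(screen)  # choose an action from what is visible now
        if action is None:       # nothing left to do
            break
        act(action)
        history.append(action)
    return history


class FakeApp:
    """Hypothetical stand-in for a device; a dialog shows up unannounced."""

    def __init__(self) -> None:
        self.screens = ["home", "permission_dialog", "editor", "done"]
        self.index = 0

    def capture(self) -> str:
        return self.screens[self.index]

    def act(self, action: str) -> None:
        self.index += 1  # every action advances to the next screen


def decide(screen: str) -> Optional[str]:
    if screen == "done":
        return None
    if screen.endswith("_dialog"):
        return "dismiss_dialog"  # handled because the loop saw it, not because it was planned
    return f"tap_next_on_{screen}"
```

Because the decision is made from the current screen each time, the unexpected dialog in `FakeApp` is just another observation rather than a failure of a fixed script.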
The reason it can sustain an hour of work: each of the 6 stages runs as a separate child session with its own context. You can't fit an hour of screenshots into one window, so the isolation is necessary. Stages are typed — "workers" are deterministic (browser automation, device control), "skills" are agentic (the agent decides what to do). A "playbook" orchestrates both.
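The stage isolation described above can be sketched roughly as follows. This is an illustration under assumptions, not the repo's implementation: `Stage`, `ChildSession`, `run_playbook`, the six stage names, and which stages count as workers versus skills are all hypothetical. The point it shows is that each stage gets a fresh context and only a small handoff dict crosses stage boundaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Literal, Tuple

StageKind = Literal["worker", "skill"]  # worker = deterministic, skill = agentic


@dataclass
class Stage:
    name: str
    kind: StageKind
    run: Callable[[Dict], Dict]  # takes the handoff, returns a small result


@dataclass
class ChildSession:
    """Hypothetical isolated session: its context is dropped after the stage."""

    stage: Stage
    context: List = field(default_factory=list)

    def execute(self, handoff: Dict) -> Dict:
        self.context.append(("input", dict(handoff)))
        result = self.stage.run(dict(handoff))
        self.context.append(("output", result))
        return result


def run_playbook(stages: List[Stage], initial: Dict) -> Tuple[Dict, List]:
    handoff = dict(initial)
    log = []
    for stage in stages:
        session = ChildSession(stage)  # fresh context for every stage
        handoff.update(session.execute(handoff))
        log.append((stage.name, stage.kind, len(session.context)))
    return handoff, log


def make_stage(name: str, kind: StageKind) -> Stage:
    # Placeholder body; a real stage would drive a browser, a device, or an agent.
    return Stage(name, kind, lambda handoff: {name: "ok"})


# Assumed split of the hour-long run into six stages (names are illustrative).
PLAYBOOK = [
    make_stage("find_app", "worker"),
    make_stage("install_app", "worker"),
    make_stage("explore_app", "skill"),
    make_stage("compose_video", "worker"),
    make_stage("upload_video", "worker"),
    make_stage("delete_app", "worker"),
]
```

Calling `run_playbook(PLAYBOOK, {"app": "Snapseed"})` returns the merged handoff plus a per-stage log. Each session's context stays at two entries no matter how many stages ran before it, which is the property that keeps an hour of screenshots out of any single window.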
Result video (what the agent published): https://youtube.com/shorts/jliTvpTnsKY?feature=share
Process video (how it was built): https://youtu.be/gYMYI0bxkJs
X: https://x.com/LiangSong850509/status/2037612742392357218?s=2...
MIT license.
Multi-agent startup-in-a-box, but already crowded by Cognition, Devin, and generalist AI assistants.
Persistent agent infrastructure beats API calls, but still waitlist-only with no public demo or shipping product.
Managed multi-agent workspace, but ChatGPT, Claude Projects, and Anthropic's built-in task delegation already solve this.
The repo actually implements an autonomous scheduling engine (work_heartbeat) with per-project isolation, role-based workers, and automated PR review loops — not just a toy demo. It's a bold, concrete attempt to run real dev work from chat (onboarding via channels, auto-created PRs), but it's niche and risky: the payoff depends on OpenClaw adoption and how comfortable you are giving agents commit/review power.
Task queue for AI agents, but orchestrates existing tools without novel architecture.