PerceptAI – Give AI agents eyes on any screen, not just browsers

Author: Neerajj04

by Neerajj04 · May 10, 2026 · 1 point · 3 comments


AI Analysis

●● Solid · Ship It · Solve My Problem

Desktop automation beyond the DOM using Groq Vision and PyAutoGUI.

Strengths

• Targets legacy desktop apps, where DOM-based agents like Cursor fail completely.
• Self-healing logic keeps automation scripts from breaking on minor UI shifts.
• Combines EasyOCR with vision models for robust screen understanding, without needing the target app to expose an API.
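The self-healing and OCR points can be illustrated with a minimal sketch, assuming "self-healing" means re-locating an element by its visible text on every run instead of replaying fixed coordinates. This is not PerceptAI's code. The input mirrors the list that EasyOCR's `Reader.readtext()` returns (bounding-box corners, text, confidence); `locate_text`, `bbox_center`, and both thresholds are hypothetical, and the fuzzy matching uses Python's standard `difflib`, not anything PerceptAI is known to use.

```python
# Hypothetical helpers for illustration; not taken from PerceptAI.
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

# Shape of one EasyOCR readtext() entry: (four [x, y] corners, text, confidence).
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def bbox_center(bbox: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return the integer center of a four-corner bounding box."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return round((min(xs) + max(xs)) / 2), round((min(ys) + max(ys)) / 2)


def locate_text(
    results: List[OcrResult],
    target: str,
    min_similarity: float = 0.8,
    min_confidence: float = 0.3,
) -> Optional[Tuple[int, int]]:
    """Return the center of the OCR entry that best matches `target`, or None.

    Matching is fuzzy and case-insensitive, so a label that moved since the
    last run, or that OCR slightly misread, can still be found.
    """
    best_score, best_center = 0.0, None
    for bbox, text, confidence in results:
        if confidence < min_confidence:
            continue  # skip low-confidence OCR noise
        score = SequenceMatcher(None, target.lower(), text.strip().lower()).ratio()
        if score > best_score:
            best_score, best_center = score, bbox_center(bbox)
    return best_center if best_score >= min_similarity else None


if __name__ == "__main__":
    # Hand-written results in EasyOCR's format; "Submlt" mimics an OCR misread.
    results = [
        ([[10, 10], [90, 10], [90, 40], [10, 40]], "File", 0.95),
        ([[400, 300], [480, 300], [480, 330], [400, 330]], "Submlt", 0.71),
    ]
    print(locate_text(results, "Submit"))  # (440, 315)
```

Because the button is found by text rather than by a stored position, a layout change only breaks the step if the label itself disappears or changes beyond the similarity threshold.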

Weaknesses

• PyAutoGUI is notoriously fragile on high-DPI screens and multi-monitor setups.
• No mention of security sandboxing for autonomous actions on local machines.
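One common mitigation for the high-DPI weakness, sketched under assumptions: when a screenshot has more pixels than the logical screen the mouse moves in (as can happen on macOS Retina displays), coordinates found by OCR must be scaled down before clicking. With PyAutoGUI, the two sizes could come from `pyautogui.screenshot().size` and `pyautogui.size()`. The helper name `to_click_coords` is hypothetical, and it does nothing for multi-monitor layouts, where PyAutoGUI's support is limited.

```python
from typing import Tuple


def to_click_coords(
    x: float,
    y: float,
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point found in a screenshot into the coordinate space used for clicks.

    Hypothetical helper. On some high-DPI displays the screenshot has more
    pixels than the logical screen, so unscaled OCR coordinates overshoot.
    """
    scale_x = screen_size[0] / screenshot_size[0]
    scale_y = screen_size[1] / screenshot_size[1]
    return round(x * scale_x), round(y * scale_y)


if __name__ == "__main__":
    # A 2x display: 2880x1800 screenshot pixels, 1440x900 logical points.
    print(to_click_coords(880, 630, (2880, 1800), (1440, 900)))  # (440, 315)
```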

Post Description

I built PerceptAI because every agent tool I found only works on websites via DOM.

75% of real computer work happens in desktop apps, legacy software, and tools with zero APIs. Agents are completely blind to all of it.

PerceptAI uses EasyOCR + Groq Vision to read any screen and PyAutoGUI to act on it. A single plain-English instruction is executed autonomously, with self-healing and memory.
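A minimal sketch of how "self-healing and memory" might fit together in an execute loop, assuming memory means caching coordinates that worked before and self-healing means re-reading the screen after a failed action. `run_step`, `perceive`, and `act` are hypothetical names, not PerceptAI's API. In a real agent, `perceive` might combine a screenshot, EasyOCR, and a Groq Vision call, and `act` might call `pyautogui.click(x, y)` and then check that the screen changed. Turning the plain-English instruction into steps is not shown.

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[int, int]


def run_step(
    label: str,
    perceive: Callable[[str], Optional[Coord]],
    act: Callable[[Coord], bool],
    memory: Dict[str, Coord],
    max_attempts: int = 3,
) -> bool:
    """Act on the element called `label`, re-reading the screen when an attempt fails.

    Hypothetical sketch. `memory` caches the last coordinates that worked for
    each label, so a repeated step tries them first. If acting there fails
    (the UI shifted), the entry is dropped and the next attempt perceives anew.
    """
    for _ in range(max_attempts):
        coord = memory.get(label) or perceive(label)
        if coord is None:
            continue  # not visible yet; look again on the next attempt
        if act(coord):
            memory[label] = coord  # remember the location that worked
            return True
        memory.pop(label, None)  # stale or wrong location: forget it
    return False


if __name__ == "__main__":
    def fake_perceive(label: str) -> Optional[Coord]:
        return (440, 315)  # stand-in for screenshot + OCR + vision model

    def fake_act(coord: Coord) -> bool:
        return coord == (440, 315)  # stand-in for click-then-verify

    memory = {"Submit": (100, 100)}  # stale location from an earlier run
    print(run_step("Submit", fake_perceive, fake_act, memory), memory)
    # True {'Submit': (440, 315)}
```

A production loop would likely also wait between attempts so a slow-loading window has time to appear.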

Demo: percept-ai-phi.vercel.app
GitHub: github.com/Neeraj04-CY/PerceptAi

Would love feedback from anyone building agents.

Similar Projects

Developer Tools · ●● Solid

Lumen – vision-first browser agent (state of the art, open source)

Vision-only coordinates beat DOM selectors where Stagehand and browser-use still stumble on UI changes.

Big Brain · Solve My Problem

fearlessboi

213mo ago

Developer Tools · ● Mid

Automate Mac with Codex: macOS Control MCP Demo

Lets agents actually see the screen and act on it by returning OCR text with pixel coordinates and offering commands like click_at, type_text, and press_key. You can run it instantly with npx (it auto-creates a Python venv and hooks into Apple Vision/Quartz), and there are ready-made integration snippets for Claude, VS Code, and Cursor — a pragmatic, technically neat tool for closed-loop agent UI work. It’s limited to macOS 13+ and Apple APIs, but within that niche it removes a lot of friction.
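To make that command interface concrete, here is a hedged sketch of routing tool calls named `click_at`, `type_text`, and `press_key` to an input backend. It is not that tool's code (which uses Apple Vision and Quartz). The `{"name": ..., "args": ...}` call shape and the argument keys are assumptions. PyAutoGUI provides `click(x, y)`, `write(text)`, and `press(key)`, so passing the `pyautogui` module as the backend would drive real input, while a recording stub keeps the sketch testable. Rejecting unknown command names doubles as a small allowlist.

```python
from typing import Any, Callable, Dict, List, Tuple


def make_dispatcher(backend: Any) -> Callable[[Dict[str, Any]], None]:
    """Return a function that executes one tool call against `backend`.

    Hypothetical sketch; `backend` must provide click(x, y), write(text) and press(key).
    """
    handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {
        "click_at": lambda args: backend.click(args["x"], args["y"]),
        "type_text": lambda args: backend.write(args["text"]),
        "press_key": lambda args: backend.press(args["key"]),
    }

    def dispatch(call: Dict[str, Any]) -> None:
        name = call["name"]
        if name not in handlers:  # allowlist: refuse anything unrecognized
            raise ValueError(f"unknown command: {name}")
        handlers[name](call.get("args", {}))

    return dispatch


class RecordingBackend:
    """Test double that records calls instead of moving the real mouse."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, tuple]] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", (x, y)))

    def write(self, text: str) -> None:
        self.log.append(("write", (text,)))

    def press(self, key: str) -> None:
        self.log.append(("press", (key,)))


if __name__ == "__main__":
    backend = RecordingBackend()
    dispatch = make_dispatcher(backend)  # pass `pyautogui` here for real input
    dispatch({"name": "click_at", "args": {"x": 440, "y": 315}})
    dispatch({"name": "type_text", "args": {"text": "hello"}})
    dispatch({"name": "press_key", "args": {"key": "enter"}})
    print(backend.log)
```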

Wizardry · Niche Gem

peterhddcode

104mo ago

AI/ML · ●● Solid