
PerceptAI – Give AI agents eyes on any screen, not just browsers

by Neerajj04 · May 10, 2026 · 1 point · 3 comments

AI Analysis

●● Solid · Ship It · Solve My Problem

Desktop automation beyond the DOM using Groq Vision and PyAutoGUI.

Strengths
  • Targets legacy desktop apps where DOM-based agents like Cursor fail completely.
  • Self-healing logic prevents brittle automation scripts from breaking on minor UI shifts.
  • Combines EasyOCR with vision models for robust screen understanding without APIs.
Weaknesses
  • PyAutoGUI is notoriously fragile on high-DPI screens and multi-monitor setups.
  • No mention of security sandboxing for autonomous actions on local machines.
Category
Target Audience

RPA engineers, agent developers

Similar To

UiPath · Claude Computer Use · Adept

Post Description

I built PerceptAI because every agent tool I found only works on websites via DOM.

75% of real computer work happens in desktop apps, legacy software, and tools with zero APIs. Agents are completely blind to all of it.

PerceptAI uses EasyOCR + Groq Vision to read any screen and PyAutoGUI to act on it. You give it a single plain-English instruction and it carries it out autonomously, with self-healing and memory.
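
For anyone curious what a read-then-act loop like this can look like, here is a minimal sketch. It is not PerceptAI's actual code. It assumes OCR hits in the shape EasyOCR's readtext() returns (four corner points, text, confidence), and it treats "self-healing" as re-reading the screen and fuzzy-matching the label, which is only a guess at what the project does. The names box_center, locate, and click_label are hypothetical, and the Groq Vision and memory parts are left out.

```python
from difflib import SequenceMatcher


def box_center(bbox):
    """Center of an OCR box given as four [x, y] corner points."""
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    return (int(sum(xs) / len(xs)), int(sum(ys) / len(ys)))


def locate(ocr_hits, label, min_conf=0.4, fuzzy_cutoff=0.75):
    """Return the center of the hit that best matches `label`, or None.

    Fuzzy matching tolerates small OCR or UI text changes
    (for example "Save As" vs "Save As...").
    """
    wanted = label.strip().lower()
    best, best_score = None, 0.0
    for bbox, text, conf in ocr_hits:
        if conf < min_conf:
            continue  # skip low-confidence OCR reads
        score = SequenceMatcher(None, wanted, text.strip().lower()).ratio()
        if score > best_score:
            best, best_score = bbox, score
    if best is None or best_score < fuzzy_cutoff:
        return None
    return box_center(best)


def click_label(label, capture, click, retries=2):
    """Re-read the screen on each attempt so a moved control is found again.

    `capture` returns fresh OCR hits; `click` takes (x, y).
    Both are injected so the logic can be tried without a display.
    A real loop would probably also wait between attempts.
    """
    for _ in range(retries + 1):
        point = locate(capture(), label)
        if point is not None:
            click(*point)
            return point
    return None
```

With the real libraries, `capture` could wrap an easyocr.Reader(['en']).readtext(...) call on a screenshot and `click` could be pyautogui.click. That wiring is an assumption on my part, not something taken from the repo.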

Demo: percept-ai-phi.vercel.app
GitHub: github.com/Neeraj04-CY/PerceptAi

Would love feedback from anyone building agents.

Similar Projects

Automate Mac with Codex: macOS Control MCP Demo

Lets agents actually see the screen and act on it by returning OCR text with pixel coordinates and offering commands like click_at, type_text, and press_key. You can run it instantly with npx (it auto-creates a Python venv and hooks into Apple Vision/Quartz), and there are ready-made integration snippets for Claude, VS Code, and Cursor — a pragmatic, technically neat tool for closed-loop agent UI work. It’s limited to macOS 13+ and Apple APIs, but within that niche it removes a lot of friction.

Wizardry · Niche Gem
peterhddcode
104mo ago
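
The click_at, type_text, and press_key commands described for the macOS Control MCP project suggest a small command-dispatch layer between the agent and the machine. The sketch below shows one way such a layer might look. The command dictionary shape, the argument names (x, y, text, key), and the run_command helper are all hypothetical. The real tool's signatures are not given here, and it is built on Apple Vision/Quartz rather than the injected callables used in this example.

```python
def run_command(cmd, backend):
    """Dispatch one agent command dict to the matching backend callable.

    `cmd` looks like {"name": "click_at", "args": {"x": 10, "y": 20}}
    (a hypothetical shape). `backend` maps "click", "write", and "press"
    to callables that perform the action.
    """
    name = cmd["name"]
    args = cmd.get("args", {})
    if name == "click_at":
        return backend["click"](int(args["x"]), int(args["y"]))
    if name == "type_text":
        return backend["write"](str(args["text"]))
    if name == "press_key":
        return backend["press"](str(args["key"]))
    # Refuse anything outside the known command set.
    raise ValueError(f"unsupported command: {name}")
```

As a cross-platform stand-in, the backend could be {"click": pyautogui.click, "write": pyautogui.write, "press": pyautogui.press}. Rejecting unknown command names keeps the agent limited to an explicit allowlist of actions.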